Patrick Robertson
898faf6fe4
Further WIP - currently working on verify_signed
2025-02-25 12:08:08 +00:00
Patrick Robertson
f8e846d59a
Create facebook dropin - working for images + text. CAVEAT: only gets the first ~100 chars of the post at the moment
2025-02-25 11:44:35 +00:00
Patrick Robertson
01bf88a695
Merge branch 'main' into timestamping_rewrite
2025-02-24 12:03:14 +00:00
Patrick Robertson
73a2e2d752
Fix tests for moving orchestration to secrets/orchestration.yaml
2025-02-21 19:05:39 +00:00
Patrick Robertson
4174285898
Fix unit tests
2025-02-20 13:18:06 +00:00
Patrick Robertson
40b8359348
Implementation test with 2 x orchestrators with different configs
2025-02-20 11:18:28 +00:00
Patrick Robertson
7dde8d609d
Merge main
2025-02-20 10:29:57 +00:00
Patrick Robertson
5211c5de18
Merge pull request #210 from bellingcat/logger_fix
Fix issue #200 + Refactor _LAZY_LOADED_MODULES
2025-02-19 15:11:42 +00:00
erinhmclark
47a634fc63
Add WACZ, Wayback and local storage tests.
2025-02-19 13:14:08 +00:00
Patrick Robertson
a9802dd004
Remove the global _LAZY_LOADED_MODULES and allow each instance of ArchivingOrchestrator to load its own modules
2025-02-19 12:25:35 +00:00
Patrick Robertson
222a94563f
WIP: Docs tidyups+add howto on logging and authentication
(Authentication is WIP)
2025-02-19 10:37:04 +00:00
erinhmclark
10a5ad62b8
Include Atlos tests, metadata fixture.
2025-02-19 09:18:41 +00:00
erinhmclark
f0fd9bf445
Updates tests to use pytest-mock.
2025-02-18 23:32:03 +00:00
erinhmclark
657fbd357d
Merge branch 'main' into tests/add_module_tests
2025-02-18 19:47:47 +00:00
erinhmclark
7b88df72cb
Update test_metadata_enricher.py
2025-02-18 19:46:57 +00:00
Patrick Robertson
3c543a3a6a
Various fixes for issues with new architecture (#208)
* Add formatters to the TOC - fixes #204
* Add 'steps' settings to the example YAML in the docs. Fixes #206
* Improved docs on authentication architecture
* Fix setting modules on the command line - they now override any module settings in the orchestration as opposed to appending
* Fix tests for gsheet-feeder: add a test service_account.json (note: not real keys in there)
* Rename the command line entrypoint to _command_line_run
Also: make it clear that code implementation should not call this
Make sure the command line entry returns (we don't want a generator)
* Fix unit tests to use new code entry points
* Version bump
* Move iterating of generator up to __main__
* Breakpoint
* two minor fixes
* Fix unit tests + add new '__main__' entry point implementation test
* Skip youtube tests if running on CI. Should still run them locally
* Fix full implementation run on GH actions
* Fix skipif test for GH Actions CI
* Add skipifs for truth - it blocks GH:
---------
Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2025-02-18 19:10:09 +00:00
erinhmclark
ce5a200d1f
Added tests, updated instagram_tbot_extractor.py raise failure.
2025-02-18 12:59:10 +00:00
erinhmclark
f4c623b11b
Merge branch 'main' into tests/add_module_tests
2025-02-17 09:03:04 +00:00
Patrick Robertson
6d43bc7d4d
Fix generator programmatic setup (#197)
* Fix returning a generator of a generator
* Move download test to pytest.mark.download
2025-02-15 17:36:44 +00:00
erinhmclark
71b41dd901
Remove accidental path, yet again.
2025-02-14 10:05:32 +00:00
erinhmclark
b0756a6a34
Remove accidental full path.
2025-02-14 09:57:44 +00:00
erinhmclark
319c1e8f92
Add more tests.
2025-02-14 09:48:37 +00:00
erinhmclark
3fce593aad
Merge branch 'main' into tests/add_module_tests
2025-02-12 19:33:29 +00:00
erinhmclark
cbe98c729d
Enricher tests
2025-02-12 19:32:40 +00:00
erinhmclark
d9d936c2ca
Thumbnail enricher fix seconds to minutes.
2025-02-12 12:22:27 +00:00
Patrick Robertson
d0c379a3ba
WIP - timestamping enricher
2025-02-11 18:18:19 +00:00
Patrick Robertson
3163cb793a
Fix timestamping enricher for new module structure (temp paths)
2025-02-11 15:26:40 +00:00
Patrick Robertson
7bb4d68a22
Merge branch 'load_modules' into timestamping_rewrite
2025-02-11 15:21:31 +00:00
Patrick Robertson
29901da601
Merge branch 'load_modules' into docs_update
2025-02-11 14:10:56 +00:00
Patrick Robertson
2f51d3917a
Further addition to docs: creating modules, configurations, installation
2025-02-11 13:49:30 +00:00
erinhmclark
d1d6cde008
Set mock timestamp without z format
2025-02-11 12:27:48 +00:00
erinhmclark
5e2e93382f
Test fixes for 3.10 compliance.
2025-02-11 12:17:42 +00:00
erinhmclark
f97ec6a9e0
Fixed S3 module import
2025-02-11 11:58:28 +00:00
erinhmclark
89d9140d15
Fixed setup/ config_setup reference
2025-02-11 11:47:11 +00:00
erinhmclark
1792e02d1d
skip authenticated tests in test_gdrive_storage.py
2025-02-11 11:34:36 +00:00
erinhmclark
18666ff027
skip authenticated tests in test_gsheet_feeder.py
2025-02-11 11:28:24 +00:00
erinhmclark
a69ac3e509
Fix file hash reference in S3 tests
2025-02-11 09:46:22 +00:00
erinhmclark
c4bb667cec
Merge branch 'load_modules' into add_module_tests
# Conflicts:
# src/auto_archiver/modules/s3_storage/s3_storage.py
# src/auto_archiver/utils/gsheet.py
# src/auto_archiver/utils/misc.py
2025-02-10 16:17:08 +00:00
erinhmclark
f311621e58
Small fixes.
Add timestamp helper method.
2025-02-10 15:57:42 +00:00
Patrick Robertson
f3f6b92817
Implementation test cleanup
2025-02-10 12:43:21 +00:00
Patrick Robertson
74207d7821
Implementation tests for auto-archiver
2025-02-10 13:27:11 +01:00
erinhmclark
e9ad1e1b85
Pass media to storage cdn_call
2025-02-06 22:01:55 +00:00
erinhmclark
266c7a14e6
Context related fixes, some more tests.
2025-02-06 16:53:00 +00:00
erinhmclark
67504a683e
Merge branch 'load_modules' into add_module_tests
2025-02-06 10:13:37 +00:00
erinhmclark
5b0bad832f
Updated test, test metadata
2025-02-06 10:11:56 +00:00
Patrick Robertson
6ab8fd2ee4
Tidy up setting modules as Orchestrator attributes on startup.
Don't override the values in config['steps'] – the config should be left as is
2025-02-06 10:20:05 +01:00
erinhmclark
52542812dc
Merge tests from version with context.
2025-02-05 16:42:58 +00:00
Patrick Robertson
78e6418249
Unit tests for csv feeder + fix some bugs
2025-02-04 13:37:26 +01:00
Patrick Robertson
c25d5cae84
Remove ArchivingContext completely
Context for a specific url/item is now passed around via the metadata (metadata.set_context('key', 'val') and metadata.get_context('key', default='something'))
The only other thing that was passed around in ArchivingContext was the storage info, which is already accessible now via self.config
2025-01-30 17:50:54 +01:00
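The per-item context described in the commit above can be pictured with a minimal sketch. The `Metadata` class below is a hypothetical stand-in written for illustration, not auto-archiver's actual class; only the `set_context` / `get_context` call shapes are taken from the commit message.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Metadata:
    """Hypothetical stand-in: per-item context travels with the metadata object."""

    url: str
    _context: dict[str, Any] = field(default_factory=dict)

    def set_context(self, key: str, val: Any) -> "Metadata":
        # Store a value that later pipeline steps can read for this item only.
        self._context[key] = val
        return self

    def get_context(self, key: str, default: Any = None) -> Any:
        # Return the stored value, or the caller's default if the key is unset.
        return self._context.get(key, default)


item = Metadata(url="https://my.url.com")
item.set_context("key", "val")
```

Because the context lives on each item rather than in a shared global, two items processed side by side cannot overwrite each other's values.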
Patrick Robertson
d76063c3f3
Fix unit tests
2025-01-30 16:46:53 +01:00
Patrick Robertson
d6b4b7a932
Further cleanup
* Removes (partly) the ArchivingOrchestrator
* Removes the cli_feeder module, and makes it the 'default', allowing you to pass URLs directly on the command line, without having to use the cumbersome --cli_feeder.urls. Just do auto-archiver https://my.url.com
* More unit tests
* Improved error handling
2025-01-30 16:44:40 +01:00
Patrick Robertson
fade68c6f4
Fix up unit tests - dataclass + subclasses not having @dataclass was breaking it
2025-01-30 13:45:24 +01:00
Patrick Robertson
b7d9145f6c
Further tidyups + refactoring for new structure
* Add implementation tests for orchestrator + logging tests
* Standardise method/class vars for extractors to see if they are suitable
* Fix bugs with removing default loguru logger (allows further customisation)
* Fix bug loading required fields from file
2025-01-30 13:21:10 +01:00
Patrick Robertson
00a7018f36
Fix up dependency checking (use 'dependencies' instead of 'external_dependencies' -> simpler/easier to remember)
2025-01-29 19:25:22 +01:00
Patrick Robertson
3d37c494aa
Tidy ups + unit tests:
1. Allow loading modules from --module_paths=/extra/path/here
2. Improved unit tests for module loading
3. Further small tidy ups/clean ups
2025-01-29 18:42:49 +01:00
Patrick Robertson
4c1c8953ca
Add unit tests for timestamping_enricher
2025-01-29 12:20:52 +01:00
Patrick Robertson
7a4871db6b
Fix up unit tests for new structure
2025-01-28 14:40:12 +01:00
Patrick Robertson
14e2479599
Merge branch 'more_mainifests' into load_modules
2025-01-27 11:05:56 +01:00
erinhmclark
aa7ca93a43
Update manifests and modules
2025-01-24 12:58:16 +00:00
Patrick Robertson
9befb9776c
Fix loading modules when entry_point isn't set
2025-01-23 21:08:54 +01:00
Patrick Robertson
b27bf8ffeb
Fix up loading/storing configs + unit tests
2025-01-23 20:32:19 +01:00
erinhmclark
1274a1b231
More manifests, base modules and rename from archiver to extractor.
2025-01-23 16:40:48 +00:00
erinhmclark
79684f8348
Set up feeder manifests (not merged by source yet)
2025-01-23 09:16:42 +00:00
Patrick Robertson
241b35002c
Initial changes to move to '__manifest__' format
2025-01-21 19:02:38 +01:00
Patrick Robertson
d3e3eb7639
unit tests for loading dropins
2025-01-21 16:59:45 +01:00
Patrick Robertson
dff0105659
Small fixups + implement Truth code for posts with multiple media
2025-01-20 18:40:46 +01:00
Patrick Robertson
fd2e7f973b
Further tidy-ups, also adds some ytdlp utils to 'utils'
2025-01-20 16:31:28 +01:00
Patrick Robertson
befc92deb4
Further unit test tidy ups
2025-01-17 17:29:13 +01:00
Patrick Robertson
d4893ee05e
Fix unit tests for base_archiver->generic_archiver rename
2025-01-17 17:08:00 +01:00
Patrick Robertson
17c1c9c360
Fix up core unit tests when a twitter api key isn't provided
2025-01-17 12:02:38 +01:00
Patrick Robertson
394bcd8d47
Further refactoring of youtubedl_archiver->base_archiver
* Keep twitter_api_archiver
* Remove unit tests for obsolete archivers
* Guess filename of media using the 'Content-Type' header
* Add mechanism to run 'expensive' tests last (see conftest.py) and also flag expensive tests to fail straight off (pytest.mark.incremental)
2025-01-17 11:56:08 +01:00
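The 'Content-Type' filename guess mentioned in the commit above could look roughly like the sketch below. It uses the standard-library `mimetypes` module; the helper name `guess_filename` and its parameters are assumptions made for illustration, not the project's actual code.

```python
import mimetypes
from typing import Optional


def guess_filename(stem: str, content_type: Optional[str]) -> str:
    """Hypothetical helper: append an extension inferred from a Content-Type header."""
    if not content_type:
        return stem
    # Drop header parameters such as "; charset=utf-8" before the lookup.
    mime = content_type.split(";", 1)[0].strip().lower()
    extension = mimetypes.guess_extension(mime)
    # Unknown MIME types yield None, in which case the stem is returned unchanged.
    return stem + extension if extension else stem


name = guess_filename("media", "image/png; charset=binary")
```

Returning the bare stem for a missing or unknown header keeps the download usable even when the server sends no useful type information.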
Patrick Robertson
3168bed0d9
Add (skipped) test for twitter extraction with youtubedlp
2025-01-15 19:00:57 +01:00
Patrick Robertson
5626bba815
Add test on bluesky and note on why it doesn't work
2025-01-15 18:31:20 +01:00
Patrick Robertson
74cf1f5f23
Merge branch 'main' into youtubedlp-rewrite
2025-01-15 17:47:23 +01:00
Patrick Robertson
4f2b9baa73
refactor youtubedlp archiver to work for all valid websites
1. Extract more metadata
2. Better extract thumbnail
3. Setup framework for specific sites to provide more granular metadata processing
2025-01-15 17:46:47 +01:00
Patrick Robertson
20726c1116
Remove tiktok-downloader - getting info is broken
TODO: switch to using youtube-dlp
2025-01-14 17:40:45 +01:00
Patrick Robertson
6f10270baf
Remove unittest and switch to pytest fully
2025-01-14 16:28:39 +01:00
Patrick Robertson
528b78db85
Flag tombstone tweets for twitter_syndication method
2025-01-13 18:17:24 +01:00
Patrick Robertson
57eacdc24a
Merge branch 'main' into feat/unittest
2025-01-13 18:06:55 +01:00
Patrick Robertson
bbef80de4c
Add unit tests for html_formatter, csv_db
2025-01-13 17:58:10 +01:00
Patrick Robertson
63973e2ce7
switch to pytest and pytest-recording
2025-01-13 16:23:20 +01:00
Patrick Robertson
e2bc84ccb9
Merge branch 'main' into feat/unittest
2025-01-13 13:15:13 +01:00
Patrick Robertson
3546d4ad79
Fix 'download_syndication' method for tweet archiving (now requires a token)
Plus add in unit tests for token generation + download syndication
2025-01-12 12:55:00 +01:00
Patrick Robertson
c932fb7416
Improved logging when attempting to download an invalid/deleted tweet
Plus: unit tests for non-existent tweet + invalid tweet ID
2025-01-12 12:00:45 +01:00
Patrick Robertson
9dc4eb35de
Switch to pytest and use vcr for request storing
2025-01-08 11:25:13 +01:00
Patrick Robertson
8c044c15f0
Add base test class for archivers with boilerplate code
Plus: create test class for twitter archiver. Currently WIP
2025-01-08 10:38:56 +01:00
Miguel Sozinho Ramalho
a697f0a212
adds an unauthenticated Bluesky archiver (#160)
* adds a TODO for next code iterations
* implements bsky archiver
* adds new archiver to example orchestration file
* Fix downloading media for posts with multiple images
(Images are stored in media/images)
* Setup a basic framework for unit tests
Use 'python -m unittest' from the project root to run
---------
Co-authored-by: Patrick Robertson <robertson.patrick@gmail.com>
2025-01-07 10:28:07 +00:00
Patrick Robertson
30d423c8e6
Setup a basic framework for unit tests
Use 'python -m unittest' from the project root to run
2024-12-31 14:29:52 +01:00