Commit graph

188 Commits (dev)

Author SHA1 Message Date
Patrick Robertson 898faf6fe4 Further WIP - currently working on verify_signed 2025-02-25 12:08:08 +00:00
Patrick Robertson f8e846d59a Create facebook dropin - working for images + text. CAVEAT: only gets the first ~100 chars of the post at the moment 2025-02-25 11:44:35 +00:00
Patrick Robertson 01bf88a695 Merge branch 'main' into timestamping_rewrite 2025-02-24 12:03:14 +00:00
Patrick Robertson 73a2e2d752 Fix tests for moving orchestration to secrets/orchestration.yaml 2025-02-21 19:05:39 +00:00
Patrick Robertson 4174285898 Fix unit tests 2025-02-20 13:18:06 +00:00
Patrick Robertson 40b8359348 Implementation test with 2 x orchestrators with different configs 2025-02-20 11:18:28 +00:00
Patrick Robertson 7dde8d609d Merge main 2025-02-20 10:29:57 +00:00
Patrick Robertson 5211c5de18
Merge pull request #210 from bellingcat/logger_fix
Fix issue #200 + Refactor _LAZY_LOADED_MODULES
2025-02-19 15:11:42 +00:00
erinhmclark 47a634fc63 Add WACZ, Wayback and local storage tests. 2025-02-19 13:14:08 +00:00
Patrick Robertson a9802dd004 Remove the global _LAZY_LOADED_MODULES and allow each instance of ArchivingOrchestrator to load its own modules 2025-02-19 12:25:35 +00:00
Patrick Robertson 222a94563f WIP: Docs tidyups+add howto on logging and authentication
(Authentication is WIP)
2025-02-19 10:37:04 +00:00
erinhmclark 10a5ad62b8 Include Atlos tests, metadata fixture. 2025-02-19 09:18:41 +00:00
erinhmclark f0fd9bf445 Updates tests to use pytest-mock. 2025-02-18 23:32:03 +00:00
erinhmclark 657fbd357d Merge branch 'main' into tests/add_module_tests 2025-02-18 19:47:47 +00:00
erinhmclark 7b88df72cb Update test_metadata_enricher.py 2025-02-18 19:46:57 +00:00
Patrick Robertson 3c543a3a6a
Various fixes for issues with new architecture (#208)
* Add formatters to the TOC - fixes #204

* Add 'steps' settings to the example YAML in the docs. Fixes #206

* Improved docs on authentication architecture

* Fix setting modules on the command line - they now override any module settings in the orchestration as opposed to appending

* Fix tests for gsheet-feeder: add a test service_account.json (note: not real keys in there)

* Rename the command line entrypoint to _command_line_run

Also: make it clear that code implementation should not call this
Make sure the command line entry returns (we don't want a generator)

* Fix unit tests to use new code-entry points

* Version bump

* Move iterating of generator up to __main__

* Breakpoint

* two minor fixes

* Fix unit tests + add new '__main__' entry point implementation test

* Skip youtube tests if running on CI. Should still run them locally

* Fix full implementation run on GH actions

* Fix skipif test for GH Actions CI

* Add skipifs for truth - it blocks GH:

---------

Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2025-02-18 19:10:09 +00:00
erinhmclark ce5a200d1f Added tests, updated instagram_tbot_extractor.py raise failure. 2025-02-18 12:59:10 +00:00
erinhmclark f4c623b11b Merge branch 'main' into tests/add_module_tests 2025-02-17 09:03:04 +00:00
Patrick Robertson 6d43bc7d4d
Fix generator programmatic setup (#197)
* Fix returning a generator of a generator

* Move download test to pytest.mark.download
2025-02-15 17:36:44 +00:00
erinhmclark 71b41dd901 Remove accidental path, yet again. 2025-02-14 10:05:32 +00:00
erinhmclark b0756a6a34 Remove accidental full path. 2025-02-14 09:57:44 +00:00
erinhmclark 319c1e8f92 Add more tests. 2025-02-14 09:48:37 +00:00
erinhmclark 3fce593aad Merge branch 'main' into tests/add_module_tests 2025-02-12 19:33:29 +00:00
erinhmclark cbe98c729d Enricher tests 2025-02-12 19:32:40 +00:00
erinhmclark d9d936c2ca Thumbnail enricher fix seconds to minutes. 2025-02-12 12:22:27 +00:00
Patrick Robertson d0c379a3ba WIP - timestamping enricher 2025-02-11 18:18:19 +00:00
Patrick Robertson 3163cb793a Fix timestamping enricher for new module structure (temp paths) 2025-02-11 15:26:40 +00:00
Patrick Robertson 7bb4d68a22 Merge branch 'load_modules' into timestamping_rewrite 2025-02-11 15:21:31 +00:00
Patrick Robertson 29901da601 Merge branch 'load_modules' into docs_update 2025-02-11 14:10:56 +00:00
Patrick Robertson 2f51d3917a Further addition to docs: creating modules, configurations, installation 2025-02-11 13:49:30 +00:00
erinhmclark d1d6cde008 Set mock timestamp without z format 2025-02-11 12:27:48 +00:00
erinhmclark 5e2e93382f Test fixes for 3.10 compliance. 2025-02-11 12:17:42 +00:00
erinhmclark f97ec6a9e0 Fixed S3 module import 2025-02-11 11:58:28 +00:00
erinhmclark 89d9140d15 Fixed setup/ config_setup reference 2025-02-11 11:47:11 +00:00
erinhmclark 1792e02d1d skip authenticated tests in test_gdrive_storage.py 2025-02-11 11:34:36 +00:00
erinhmclark 18666ff027 skip authenticated tests in test_gsheet_feeder.py 2025-02-11 11:28:24 +00:00
erinhmclark a69ac3e509 Fix file hash reference in S3 tests 2025-02-11 09:46:22 +00:00
erinhmclark c4bb667cec Merge branch 'load_modules' into add_module_tests
# Conflicts:
#	src/auto_archiver/modules/s3_storage/s3_storage.py
#	src/auto_archiver/utils/gsheet.py
#	src/auto_archiver/utils/misc.py
2025-02-10 16:17:08 +00:00
erinhmclark f311621e58 Small fixes.
Add timestamp helper method.
2025-02-10 15:57:42 +00:00
Patrick Robertson f3f6b92817 Implementation test cleanup 2025-02-10 12:43:21 +00:00
Patrick Robertson 74207d7821 Implementation tests for auto-archiver 2025-02-10 13:27:11 +01:00
erinhmclark e9ad1e1b85 Pass media to storage cdn_call 2025-02-06 22:01:55 +00:00
erinhmclark 266c7a14e6 Context related fixes, some more tests. 2025-02-06 16:53:00 +00:00
erinhmclark 67504a683e Merge branch 'load_modules' into add_module_tests 2025-02-06 10:13:37 +00:00
erinhmclark 5b0bad832f Updated test, test metadata 2025-02-06 10:11:56 +00:00
Patrick Robertson 6ab8fd2ee4 Tidy up setting modules as Orchestrator attributes on startup.
Don't override the values in config['steps'] – the config should be left as is
2025-02-06 10:20:05 +01:00
erinhmclark 52542812dc Merge tests from version with context. 2025-02-05 16:42:58 +00:00
Patrick Robertson 78e6418249 Unit tests for csv feeder + fix some bugs 2025-02-04 13:37:26 +01:00
Patrick Robertson c25d5cae84 Remove ArchivingContext completely
Context for a specific url/item is now passed around via the metadata (metadata.set_context('key', 'val') and metadata.get_context('key', default='something'))
The only other thing that was passed around in ArchivingContext was the storage info, which is already accessible now via self.config
2025-01-30 17:50:54 +01:00
Patrick Robertson d76063c3f3 Fix unit tests 2025-01-30 16:46:53 +01:00
Patrick Robertson d6b4b7a932 Further cleanup
* Removes (partly) the ArchivingOrchestrator
* Removes the cli_feeder module, and makes it the 'default', allowing you to pass URLs directly on the command line, without having to use the cumbersome --cli_feeder.urls. Just do auto-archiver https://my.url.com
* More unit tests
* Improved error handling
2025-01-30 16:44:40 +01:00
Patrick Robertson fade68c6f4 Fix up unit tests - dataclass + subclasses not having @dataclass was breaking it 2025-01-30 13:45:24 +01:00
Patrick Robertson b7d9145f6c Further tidyups + refactoring for new structure
* Add implementation tests for orchestrator + logging tests
* Standardise method/class vars for extractors to see if they are suitable
* Fix bugs with removing default loguru logger (allows further customisation)
* Fix bug loading required fields from file
2025-01-30 13:21:10 +01:00
Patrick Robertson 00a7018f36 Fix up dependency checking (use 'dependencies' instead of 'external_dependencies': simpler and easier to remember) 2025-01-29 19:25:22 +01:00
Patrick Robertson 3d37c494aa Tidy ups + unit tests:
1. Allow loading modules from --module_paths=/extra/path/here
2. Improved unit tests for module loading
3. Further small tidy ups/clean ups
2025-01-29 18:42:49 +01:00
Patrick Robertson 4c1c8953ca Add unit tests for timestamping_enricher 2025-01-29 12:20:52 +01:00
Patrick Robertson 7a4871db6b Fix up unit tests for new structure 2025-01-28 14:40:12 +01:00
Patrick Robertson 14e2479599 Merge branch 'more_mainifests' into load_modules 2025-01-27 11:05:56 +01:00
erinhmclark aa7ca93a43 Update manifests and modules 2025-01-24 12:58:16 +00:00
Patrick Robertson 9befb9776c Fix loading modules when entry_point isn't set 2025-01-23 21:08:54 +01:00
Patrick Robertson b27bf8ffeb Fix up loading/storing configs + unit tests 2025-01-23 20:32:19 +01:00
erinhmclark 1274a1b231 More manifests, base modules and rename from archiver to extractor. 2025-01-23 16:40:48 +00:00
erinhmclark 79684f8348 Set up feeder manifests (not merged by source yet) 2025-01-23 09:16:42 +00:00
Patrick Robertson 241b35002c Initial changes to move to '__manifest__' format 2025-01-21 19:02:38 +01:00
Patrick Robertson d3e3eb7639 unit tests for loading dropins 2025-01-21 16:59:45 +01:00
Patrick Robertson dff0105659 Small fixups + implement Truth code for posts with multiple media 2025-01-20 18:40:46 +01:00
Patrick Robertson fd2e7f973b Further tidy-ups, also adds some ytdlp utils to 'utils' 2025-01-20 16:31:28 +01:00
Patrick Robertson befc92deb4 Further unit test tidy ups 2025-01-17 17:29:13 +01:00
Patrick Robertson d4893ee05e Fix unit tests for base_archiver->generic_archiver rename 2025-01-17 17:08:00 +01:00
Patrick Robertson 17c1c9c360 Fix up core unit tests when a twitter api key isn't provided 2025-01-17 12:02:38 +01:00
Patrick Robertson 394bcd8d47 Further refactoring of youtubedl_archiver->base_archiver
* Keep twitter_api_archiver
* Remove unit tests for obsolete archivers
* Guess filename of media using the 'Content-Type' header
* Add mechanism to run 'expensive' tests last (see conftest.py) and also flag expensive tests to fail straight off (pytest.mark.incremental)
2025-01-17 11:56:08 +01:00
Patrick Robertson 3168bed0d9 Add (skipped) test for twitter extraction with youtubedlp 2025-01-15 19:00:57 +01:00
Patrick Robertson 5626bba815 Add test on bluesky and note on why it doesn't work 2025-01-15 18:31:20 +01:00
Patrick Robertson 74cf1f5f23 Merge branch 'main' into youtubedlp-rewrite 2025-01-15 17:47:23 +01:00
Patrick Robertson 4f2b9baa73 refactor youtubedlp archiver to work for all valid websites
1. Extract more metadata
2. Better extract thumbnail
3. Setup framework for specific sites to provide more granular metadata processing
2025-01-15 17:46:47 +01:00
Patrick Robertson 20726c1116 Remove tiktok-downloader - getting info is broken
TODO: switch to using youtube-dlp
2025-01-14 17:40:45 +01:00
Patrick Robertson 6f10270baf Remove unittest and switch to pytest fully 2025-01-14 16:28:39 +01:00
Patrick Robertson 528b78db85 Flag tombstone tweets for twitter_syndication method 2025-01-13 18:17:24 +01:00
Patrick Robertson 57eacdc24a Merge branch 'main' into feat/unittest 2025-01-13 18:06:55 +01:00
Patrick Robertson bbef80de4c Add unit tests for html_formatter, csv_db 2025-01-13 17:58:10 +01:00
Patrick Robertson 63973e2ce7 switch to pytest and pytest-recording 2025-01-13 16:23:20 +01:00
Patrick Robertson e2bc84ccb9 Merge branch 'main' into feat/unittest 2025-01-13 13:15:13 +01:00
Patrick Robertson 3546d4ad79 Fix 'download_syndication' method for tweet archiving (now requires a token)
Plus add in unit tests for token generation + download syndication
2025-01-12 12:55:00 +01:00
Patrick Robertson c932fb7416 Improved logging when attempting to download an invalid/deleted tweet
Plus: unit tests for non-existent tweet + invalid tweet ID
2025-01-12 12:00:45 +01:00
Patrick Robertson 9dc4eb35de Switch to pytest and use vcr for request storing 2025-01-08 11:25:13 +01:00
Patrick Robertson 8c044c15f0 Add base test class for archivers with boilerplate code
Plus: create test class for twitter archiver. Currently WIP
2025-01-08 10:38:56 +01:00
Miguel Sozinho Ramalho a697f0a212
adds an unauthenticated Bluesky archiver (#160)
* adds a TODO for next code iterations

* implements bsky archiver

* adds new archiver to example orchestration file

* Fix downloading media for posts with multiple images

(Images are stored in media/images)

* Setup a basic framework for unit tests

Use 'python -m unittest' from the project root to run

---------

Co-authored-by: Patrick Robertson <robertson.patrick@gmail.com>
2025-01-07 10:28:07 +00:00
Patrick Robertson 30d423c8e6 Setup a basic framework for unit tests
Use 'python -m unittest' from the project root to run
2024-12-31 14:29:52 +01:00