Commit graph

154 Commits (17d2d14680d51edd5bc91a2012b4fb03e18cb4b9)

Author SHA1 Message Date
erinhmclark 1792e02d1d skip authenticated tests in test_gdrive_storage.py 2025-02-11 11:34:36 +00:00
erinhmclark 18666ff027 skip authenticated tests in test_gsheet_feeder.py 2025-02-11 11:28:24 +00:00
erinhmclark a69ac3e509 Fix file hash reference in S3 tests 2025-02-11 09:46:22 +00:00
erinhmclark c4bb667cec Merge branch 'load_modules' into add_module_tests
# Conflicts:
#	src/auto_archiver/modules/s3_storage/s3_storage.py
#	src/auto_archiver/utils/gsheet.py
#	src/auto_archiver/utils/misc.py
2025-02-10 16:17:08 +00:00
erinhmclark f311621e58 Small fixes.
Add timestamp helper method.
2025-02-10 15:57:42 +00:00
Patrick Robertson f3f6b92817 Implementation test cleanup 2025-02-10 12:43:21 +00:00
Patrick Robertson 74207d7821 Implementation tests for auto-archiver 2025-02-10 13:27:11 +01:00
erinhmclark e9ad1e1b85 Pass media to storage cdn_call 2025-02-06 22:01:55 +00:00
erinhmclark 266c7a14e6 Context related fixes, some more tests. 2025-02-06 16:53:00 +00:00
erinhmclark 67504a683e Merge branch 'load_modules' into add_module_tests 2025-02-06 10:13:37 +00:00
erinhmclark 5b0bad832f Updated test, test metadata 2025-02-06 10:11:56 +00:00
Patrick Robertson 6ab8fd2ee4 Tidy up setting modules as Orchestrator attributes on startup.
Don't override the values in config['steps'] – the config should be left as is
2025-02-06 10:20:05 +01:00
erinhmclark 52542812dc Merge tests from version with context. 2025-02-05 16:42:58 +00:00
Patrick Robertson 78e6418249 Unit tests for csv feeder + fix some bugs 2025-02-04 13:37:26 +01:00
Patrick Robertson c25d5cae84 Remove ArchivingContext completely
Context for a specific url/item is now passed around via the metadata (metadata.set_context('key', 'val') and metadata.get_context('key', default='something'))
The only other thing that was passed around in ArchivingContext was the storage info, which is already accessible now via self.config
2025-01-30 17:50:54 +01:00
Patrick Robertson d76063c3f3 Fix unit tests 2025-01-30 16:46:53 +01:00
Patrick Robertson d6b4b7a932 Further cleanup
* Removes (partly) the ArchivingOrchestrator
* Removes the cli_feeder module, and makes it the 'default', allowing you to pass URLs directly on the command line, without having to use the cumbersome --cli_feeder.urls. Just do auto-archiver https://my.url.com
* More unit tests
* Improved error handling
2025-01-30 16:44:40 +01:00
Patrick Robertson fade68c6f4 Fix up unit tests - dataclass + subclasses not having @dataclass was breaking it 2025-01-30 13:45:24 +01:00
Patrick Robertson b7d9145f6c Further tidyups + refactoring for new structure
* Add implementation tests for orchestrator + logging tests
* Standardise method/class vars for extractors to see if they are suitable
* Fix bugs with removing default loguru logger (allows further customisation)
* Fix bug loading required fields from file
2025-01-30 13:21:10 +01:00
Patrick Robertson 00a7018f36 Fix up dependency checking (use 'dependencies' instead of 'external_dependencies' -> simpler/easier to remember) 2025-01-29 19:25:22 +01:00
Patrick Robertson 3d37c494aa Tidy ups + unit tests:
1. Allow loading modules from --module_paths=/extra/path/here
2. Improved unit tests for module loading
3. Further small tidy ups/clean ups
2025-01-29 18:42:49 +01:00
Patrick Robertson 4c1c8953ca Add unit tests for timestamping_enricher 2025-01-29 12:20:52 +01:00
Patrick Robertson 7a4871db6b Fix up unit tests for new structure 2025-01-28 14:40:12 +01:00
Patrick Robertson 14e2479599 Merge branch 'more_mainifests' into load_modules 2025-01-27 11:05:56 +01:00
erinhmclark aa7ca93a43 Update manifests and modules 2025-01-24 12:58:16 +00:00
Patrick Robertson 9befb9776c Fix loading modules when entry_point isn't set 2025-01-23 21:08:54 +01:00
Patrick Robertson b27bf8ffeb Fix up loading/storing configs + unit tests 2025-01-23 20:32:19 +01:00
erinhmclark 1274a1b231 More manifests, base modules and rename from archiver to extractor. 2025-01-23 16:40:48 +00:00
erinhmclark 79684f8348 Set up feeder manifests (not merged by source yet) 2025-01-23 09:16:42 +00:00
Patrick Robertson 241b35002c Initial changes to move to '__manifest__' format 2025-01-21 19:02:38 +01:00
Patrick Robertson d3e3eb7639 unit tests for loading dropins 2025-01-21 16:59:45 +01:00
Patrick Robertson dff0105659 Small fixups + implement Truth code for posts with multiple media 2025-01-20 18:40:46 +01:00
Patrick Robertson fd2e7f973b Further tidy-ups, also adds some ytdlp utils to 'utils' 2025-01-20 16:31:28 +01:00
Patrick Robertson befc92deb4 Further unit test tidy ups 2025-01-17 17:29:13 +01:00
Patrick Robertson d4893ee05e Fix unit tests for base_archiver->generic_archiver rename 2025-01-17 17:08:00 +01:00
Patrick Robertson 17c1c9c360 Fix up core unit tests when a twitter api key isn't provided 2025-01-17 12:02:38 +01:00
Patrick Robertson 394bcd8d47 Further refactoring of youtubedl_archiver->base_archiver
* Keep twitter_api_archiver
* Remove unit tests for obsolete archivers
* Guess filename of media using the 'Content-Type' header
* Add mechanism to run 'expensive' tests last (see conftest.py) and also flag expensive tests to fail straight off (pytest.mark.incremental)
2025-01-17 11:56:08 +01:00
Patrick Robertson 3168bed0d9 Add (skipped) test for twitter extraction with youtubedlp 2025-01-15 19:00:57 +01:00
Patrick Robertson 5626bba815 Add test on bluesky and note on why it doesn't work 2025-01-15 18:31:20 +01:00
Patrick Robertson 74cf1f5f23 Merge branch 'main' into youtubedlp-rewrite 2025-01-15 17:47:23 +01:00
Patrick Robertson 4f2b9baa73 refactor youtubedlp archiver to work for all valid websites
1. Extract more metadata
2. Better extract thumbnail
3. Setup framework for specific sites to provide more granular metadata processing
2025-01-15 17:46:47 +01:00
Patrick Robertson 20726c1116 Remove tiktok-downloader - getting info is broken
TODO: switch to using youtube-dlp
2025-01-14 17:40:45 +01:00
Patrick Robertson 6f10270baf Remove unittest and switch to pytest fully 2025-01-14 16:28:39 +01:00
Patrick Robertson 528b78db85 Flag tombstone tweets for twitter_syndication method 2025-01-13 18:17:24 +01:00
Patrick Robertson 57eacdc24a Merge branch 'main' into feat/unittest 2025-01-13 18:06:55 +01:00
Patrick Robertson bbef80de4c Add unit tests for html_formatter, csv_db 2025-01-13 17:58:10 +01:00
Patrick Robertson 63973e2ce7 switch to pytest and pytest-recording 2025-01-13 16:23:20 +01:00
Patrick Robertson e2bc84ccb9 Merge branch 'main' into feat/unittest 2025-01-13 13:15:13 +01:00
Patrick Robertson 3546d4ad79 Fix 'download_syndication' method for tweet archiving (now requires a token)
Plus add in unit tests for token generation + download syndication
2025-01-12 12:55:00 +01:00
Patrick Robertson c932fb7416 Improved logging when an invalid/deleted tweet is attempted to be downloaded
Plus: unit tests for non-existent tweet + invalid tweet ID
2025-01-12 12:00:45 +01:00
Patrick Robertson 9dc4eb35de Switch to pytest and use vcr for request storing 2025-01-08 11:25:13 +01:00
Patrick Robertson 8c044c15f0 Add base test class for archivers with boilerplate code
Plus: create test class for twitter archiver. Currently WIP
2025-01-08 10:38:56 +01:00
Miguel Sozinho Ramalho a697f0a212 adds an unauthenticated Bluesky archiver (#160)
* adds a TODO for next code iterations
* implements bsky archiver
* adds new archiver to example orchestration file
* Fix downloading media for posts with multiple images
(Images are stored in media/images)
* Setup a basic framework for unit tests
Use 'python -m unittest' from the project root to run
---------
Co-authored-by: Patrick Robertson <robertson.patrick@gmail.com>
2025-01-07 10:28:07 +00:00
Patrick Robertson 30d423c8e6 Setup a basic framework for unit tests
Use 'python -m unittest' from the project root to run
2024-12-31 14:29:52 +01:00