erinhmclark
1792e02d1d
skip authenticated tests in test_gdrive_storage.py
2025-02-11 11:34:36 +00:00
erinhmclark
18666ff027
skip authenticated tests in test_gsheet_feeder.py
2025-02-11 11:28:24 +00:00
erinhmclark
a69ac3e509
Fix file hash reference in S3 tests
2025-02-11 09:46:22 +00:00
erinhmclark
c4bb667cec
Merge branch 'load_modules' into add_module_tests
...
# Conflicts:
# src/auto_archiver/modules/s3_storage/s3_storage.py
# src/auto_archiver/utils/gsheet.py
# src/auto_archiver/utils/misc.py
2025-02-10 16:17:08 +00:00
erinhmclark
f311621e58
Small fixes.
...
Add timestamp helper method.
2025-02-10 15:57:42 +00:00
Patrick Robertson
f3f6b92817
Implementation test cleanup
2025-02-10 12:43:21 +00:00
Patrick Robertson
74207d7821
Implementation tests for auto-archiver
2025-02-10 13:27:11 +01:00
erinhmclark
e9ad1e1b85
Pass media to storage cdn_call
2025-02-06 22:01:55 +00:00
erinhmclark
266c7a14e6
Context related fixes, some more tests.
2025-02-06 16:53:00 +00:00
erinhmclark
67504a683e
Merge branch 'load_modules' into add_module_tests
2025-02-06 10:13:37 +00:00
erinhmclark
5b0bad832f
Updated test, test metadata
2025-02-06 10:11:56 +00:00
Patrick Robertson
6ab8fd2ee4
Tidy up setting modules as Orchestrator attributes on startup.
...
Don't override the values in config['steps'] – the config should be left as is
2025-02-06 10:20:05 +01:00
erinhmclark
52542812dc
Merge tests from version with context.
2025-02-05 16:42:58 +00:00
Patrick Robertson
78e6418249
Unit tests for csv feeder + fix some bugs
2025-02-04 13:37:26 +01:00
Patrick Robertson
c25d5cae84
Remove ArchivingContext completely
...
Context for a specific URL/item is now passed around via the metadata (metadata.set_context('key', 'val') and metadata.get_context('key', default='something')).
The only other thing that was passed around in ArchivingContext was the storage info, which is already accessible now via self.config
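A minimal sketch of the per-item context pattern described above. Only the method names set_context and get_context come from the commit message; the Metadata class shown here, its fields, and its storage dict are hypothetical stand-ins, not the project's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Metadata:
    """Hypothetical stand-in for the project's metadata object."""

    url: str
    # Assumed storage: a plain dict owned by this one item, so context
    # travels with the item instead of living in a global object.
    _context: dict = field(default_factory=dict, init=False, repr=False)

    def set_context(self, key: str, val: Any) -> "Metadata":
        # Store a value for this item only; return self to allow chaining.
        self._context[key] = val
        return self

    def get_context(self, key: str, default: Any = None) -> Any:
        # Fall back to `default` when the key was never set.
        return self._context.get(key, default)


item = Metadata(url="https://my.url.com")
item.set_context("key", "val")
found = item.get_context("key", default="something")
missing = item.get_context("other", default="something")
```

Because each item carries its own dict, two items processed in the same run cannot see each other's context values.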
2025-01-30 17:50:54 +01:00
Patrick Robertson
d76063c3f3
Fix unit tests
2025-01-30 16:46:53 +01:00
Patrick Robertson
d6b4b7a932
Further cleanup
...
* Removes (partly) the ArchivingOrchestrator
* Removes the cli_feeder module and makes it the 'default', so URLs can be passed directly on the command line without the cumbersome --cli_feeder.urls. Just do: auto-archiver https://my.url.com
* More unit tests
* Improved error handling
2025-01-30 16:44:40 +01:00
Patrick Robertson
fade68c6f4
Fix up unit tests - dataclass + subclasses not having @dataclass was breaking it
2025-01-30 13:45:24 +01:00
Patrick Robertson
b7d9145f6c
Further tidyups + refactoring for new structure
...
* Add implementation tests for orchestrator + logging tests
* Standardise method/class vars for extractors to see if they are suitable
* Fix bugs with removing default loguru logger (allows further customisation)
* Fix bug loading required fields from file
2025-01-30 13:21:10 +01:00
Patrick Robertson
00a7018f36
Fix up dependency checking (use 'dependencies' instead of 'external_dependencies' -> simpler/easier to remember)
2025-01-29 19:25:22 +01:00
Patrick Robertson
3d37c494aa
Tidy ups + unit tests:
...
1. Allow loading modules from --module_paths=/extra/path/here
2. Improved unit tests for module loading
3. Further small tidy ups/clean ups
2025-01-29 18:42:49 +01:00
Patrick Robertson
4c1c8953ca
Add unit tests for timestamping_enricher
2025-01-29 12:20:52 +01:00
Patrick Robertson
7a4871db6b
Fix up unit tests for new structure
2025-01-28 14:40:12 +01:00
Patrick Robertson
14e2479599
Merge branch 'more_mainifests' into load_modules
2025-01-27 11:05:56 +01:00
erinhmclark
aa7ca93a43
Update manifests and modules
2025-01-24 12:58:16 +00:00
Patrick Robertson
9befb9776c
Fix loading modules when entry_point isn't set
2025-01-23 21:08:54 +01:00
Patrick Robertson
b27bf8ffeb
Fix up loading/storing configs + unit tests
2025-01-23 20:32:19 +01:00
erinhmclark
1274a1b231
More manifests, base modules and rename from archiver to extractor.
2025-01-23 16:40:48 +00:00
erinhmclark
79684f8348
Set up feeder manifests (not merged by source yet)
2025-01-23 09:16:42 +00:00
Patrick Robertson
241b35002c
Initial changes to move to '__manifest__' format
2025-01-21 19:02:38 +01:00
Patrick Robertson
d3e3eb7639
unit tests for loading dropins
2025-01-21 16:59:45 +01:00
Patrick Robertson
dff0105659
Small fixups + implement Truth code for posts with multiple media
2025-01-20 18:40:46 +01:00
Patrick Robertson
fd2e7f973b
Further tidy-ups, also adds some ytdlp utils to 'utils'
2025-01-20 16:31:28 +01:00
Patrick Robertson
befc92deb4
Further unit test tidy ups
2025-01-17 17:29:13 +01:00
Patrick Robertson
d4893ee05e
Fix unit tests for base_archiver->generic_archiver rename
2025-01-17 17:08:00 +01:00
Patrick Robertson
17c1c9c360
Fix up core unit tests when a twitter api key isn't provided
2025-01-17 12:02:38 +01:00
Patrick Robertson
394bcd8d47
Further refactoring of youtubedl_archiver->base_archiver
...
* Keep twitter_api_archiver
* Remove unit tests for obsolete archivers
* Guess filename of media using the 'Content-Type' header
* Add mechanism to run 'expensive' tests last (see conftest.py) and also flag expensive tests to fail straight off (pytest.mark.incremental)
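One way the 'Content-Type' filename guess in the list above could work, as a sketch using only the standard library. The function name guess_filename and the "stem plus extension" scheme are assumptions for illustration; the commit does not say how the project builds the name.

```python
import mimetypes


def guess_filename(stem: str, content_type: str) -> str:
    """Append an extension derived from a Content-Type header value.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    # guess_extension returns None for unknown types; fall back to no extension.
    ext = mimetypes.guess_extension(mime) or ""
    return stem + ext


png_name = guess_filename("media", "image/png; foo=bar")
unknown_name = guess_filename("media", "application/x-not-a-real-type")
```

Unknown or missing types leave the stem unchanged rather than guessing, which keeps the fallback behaviour explicit.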
2025-01-17 11:56:08 +01:00
Patrick Robertson
3168bed0d9
Add (skipped) test for twitter extraction with youtubedlp
2025-01-15 19:00:57 +01:00
Patrick Robertson
5626bba815
Add test on bluesky and note on why it doesn't work
2025-01-15 18:31:20 +01:00
Patrick Robertson
74cf1f5f23
Merge branch 'main' into youtubedlp-rewrite
2025-01-15 17:47:23 +01:00
Patrick Robertson
4f2b9baa73
refactor youtubedlp archiver to work for all valid websites
...
1. Extract more metadata
2. Better extract thumbnail
3. Setup framework for specific sites to provide more granular metadata processing
2025-01-15 17:46:47 +01:00
Patrick Robertson
20726c1116
Remove tiktok-downloader - getting info is broken
...
TODO: switch to using youtube-dlp
2025-01-14 17:40:45 +01:00
Patrick Robertson
6f10270baf
Remove unittest and switch to pytest fully
2025-01-14 16:28:39 +01:00
Patrick Robertson
528b78db85
Flag tombstone tweets for twitter_syndication method
2025-01-13 18:17:24 +01:00
Patrick Robertson
57eacdc24a
Merge branch 'main' into feat/unittest
2025-01-13 18:06:55 +01:00
Patrick Robertson
bbef80de4c
Add unit tests for html_formatter, csv_db
2025-01-13 17:58:10 +01:00
Patrick Robertson
63973e2ce7
switch to pytest and pytest-recording
2025-01-13 16:23:20 +01:00
Patrick Robertson
e2bc84ccb9
Merge branch 'main' into feat/unittest
2025-01-13 13:15:13 +01:00
Patrick Robertson
3546d4ad79
Fix 'download_syndication' method for tweet archiving (now requires a token)
...
Plus add in unit tests for token generation + download syndication
2025-01-12 12:55:00 +01:00
Patrick Robertson
c932fb7416
Improved logging when attempting to download an invalid/deleted tweet
...
Plus: unit tests for non-existent tweet + invalid tweet ID
2025-01-12 12:00:45 +01:00
Patrick Robertson
9dc4eb35de
Switch to pytest and use vcr for request storing
2025-01-08 11:25:13 +01:00
Patrick Robertson
8c044c15f0
Add base test class for archivers with boilerplate code
...
Plus: create test class for twitter archiver. Currently WIP
2025-01-08 10:38:56 +01:00
Miguel Sozinho Ramalho
a697f0a212
adds an unauthenticated Bluesky archiver (#160)
...
* adds a TODO for next code iterations
* implements bsky archiver
* adds new archiver to example orchestration file
* Fix downloading media for posts with multiple images
(Images are stored in media/images)
* Setup a basic framework for unit tests
Use 'python -m unittest' from the project root to run
---------
Co-authored-by: Patrick Robertson <robertson.patrick@gmail.com>
2025-01-07 10:28:07 +00:00
Patrick Robertson
30d423c8e6
Setup a basic framework for unit tests
...
Use 'python -m unittest' from the project root to run
2024-12-31 14:29:52 +01:00