erinhmclark
c517d35bdf
Merge branch 'load_modules' into more_mainifests
...
# Conflicts:
# src/auto_archiver/databases/__init__.py
2025-01-22 18:19:43 +00:00
erinhmclark
99c8c69085
Manifests for databases
2025-01-22 18:18:13 +00:00
Patrick Robertson
ade5ea0f6f
Tidy up imports + start on loading modules - program now starts much faster
2025-01-22 18:45:58 +01:00
Patrick Robertson
b6b085854c
Switch back to using yaml with dot notation
...
(two simple helper functions to convert between dot and dict notation)
2025-01-22 17:40:51 +01:00
Patrick Robertson
54995ad6ab
Further tweaks based on __manifest__.py files
...
Loading configs now works
2025-01-22 13:11:43 +01:00
erinhmclark
7b3a1468cd
Create manifest files for archiver modules.
2025-01-22 10:21:27 +01:00
Patrick Robertson
4830f99300
Get parsing of manifest and combining with config file working
2025-01-21 20:03:10 +01:00
Patrick Robertson
241b35002c
Initial changes to move to '__manifest__' format
2025-01-21 19:02:38 +01:00
Patrick Robertson
03f3770223
Add __manifest__.py for generic_extractor
2025-01-21 18:00:45 +01:00
Patrick Robertson
bdfc855297
Ignore pylint statements for manifest files
2025-01-21 17:59:52 +01:00
Patrick Robertson
c41d93a634
Use already implemented helper to get version
2025-01-21 17:53:37 +01:00
Patrick Robertson
d4fff0b6eb
Merge pull request #175 from bellingcat/youtubedlp-rewrite
...
Create generic archiver for all valid youtube-dl URLs, add truthsocial extractor, unit tests for twitter_api extractor, utility methods for cleaning HTML and traversing objects
2025-01-21 17:33:39 +01:00
Patrick Robertson
cd2ae3763f
Minor adjustments
...
Co-authored-by: Miguel Sozinho Ramalho <19508417+msramalho@users.noreply.github.com>
2025-01-21 16:24:37 +00:00
Patrick Robertson
d3e3eb7639
unit tests for loading dropins
2025-01-21 16:59:45 +01:00
Patrick Robertson
9dde9b26d0
Patch in upstream changes to ytdlp for now
...
Seems like ytdlp may not merge https://github.com/yt-dlp/yt-dlp/pull/12098 anytime soon
2025-01-21 16:49:49 +01:00
Patrick Robertson
7c0dcbfd81
Re-add doc string to generic_archiver
...
(renamed from youtube_archiver)
2025-01-21 16:49:30 +01:00
Patrick Robertson
6388983815
Merge branch 'main' into youtubedlp-rewrite
2025-01-21 16:43:14 +01:00
Patrick Robertson
4bb4ebdf82
Further cleanup, abstracts 'dropins' out into generic files
2025-01-21 16:36:45 +01:00
Erin Clark
113a4db251
Merge pull request #177 from bellingcat/feat/documentation
...
Add Sphinx documentation and publish to RTD.
2025-01-21 09:54:41 +00:00
erinhmclark
e83ccc0d7f
Cleaning up configs reference and module level.
2025-01-21 09:48:46 +00:00
Patrick Robertson
dff0105659
Small fixups + implement Truth code for posts with multiple media
2025-01-20 18:40:46 +01:00
Patrick Robertson
fd2e7f973b
Further tidy-ups, also adds some ytdlp utils to 'utils'
2025-01-20 16:31:28 +01:00
Patrick Robertson
befc92deb4
Further unit test tidy ups
2025-01-17 17:29:13 +01:00
Patrick Robertson
d4893ee05e
Fix unit tests for base_archiver->generic_archiver rename
2025-01-17 17:08:00 +01:00
Patrick Robertson
9c5a9e1bcd
Rename BaseArchiver to GenericArchiver + some other tidyups
2025-01-17 17:06:04 +01:00
Patrick Robertson
5aa717452e
Quick test that the app actually runs in core tests
2025-01-17 17:02:54 +01:00
Patrick Robertson
5b20288d06
Add a 'version' arg to get the current running version
2025-01-17 16:59:57 +01:00
Patrick Robertson
59eb8f7520
Add TWITTER_BEARER_TOKEN to env for running download tests
2025-01-17 12:04:40 +01:00
Patrick Robertson
17c1c9c360
Fix up core unit tests when a twitter api key isn't provided
2025-01-17 12:02:38 +01:00
Patrick Robertson
394bcd8d47
Further refactoring of youtubedl_archiver->base_archiver
...
* Keep twitter_api_archiver
* Remove unit tests for obsolete archivers
* Guess filename of media using the 'Content-Type' header
* Add mechanism to run 'expensive' tests last (see conftest.py) and also flag expensive tests to fail straight off (pytest.mark.incremental)
2025-01-17 11:56:08 +01:00
erinhmclark
170f8d18a6
Add instructions to README.md, include build directories in .gitignore, and do a bit more tidying.
2025-01-16 20:46:10 +00:00
Erin Clark
f03ec42026
Merge pull request #174 from bellingcat/version_updates
...
Update versions for GH Actions and Geckodriver.
2025-01-16 14:31:26 +00:00
erinhmclark
6fabe2a189
Fixed twitter_archiver.py changes.
2025-01-16 09:56:54 +00:00
erinhmclark
a6aacfa3fb
Add example pre-generated configs.rst
2025-01-16 09:31:50 +00:00
erinhmclark
bbb3269c2b
Changes from main.
2025-01-16 09:30:32 +00:00
erinhmclark
235da33a1a
Update .readthedocs.yaml path
2025-01-16 09:24:46 +00:00
erinhmclark
d3eec5d90f
Basic docs structure for RTD
2025-01-15 21:45:29 +00:00
Patrick Robertson
3168bed0d9
Add (skipped) test for twitter extraction with youtubedlp
2025-01-15 19:00:57 +01:00
erinhmclark
33686ea851
Update versions for GH Actions and Geckodriver.
2025-01-15 17:35:42 +00:00
Patrick Robertson
5626bba815
Add test on bluesky and note on why it doesn't work
2025-01-15 18:31:20 +01:00
Patrick Robertson
3ff7a9444d
Update yt-dlp to latest version (2025.1.12) to add bsky support
2025-01-15 17:58:07 +01:00
Patrick Robertson
74cf1f5f23
Merge branch 'main' into youtubedlp-rewrite
2025-01-15 17:47:23 +01:00
Patrick Robertson
4f2b9baa73
refactor youtubedlp archiver to work for all valid websites
...
1. Extract more metadata
2. Better extract thumbnail
3. Setup framework for specific sites to provide more granular metadata processing
2025-01-15 17:46:47 +01:00
Patrick Robertson
c3dd19f309
Sniff filetype of downloaded media and add extension
...
Also download in chunks - fixes 2 x TODOs
2025-01-15 17:46:47 +01:00
Patrick Robertson
05e0c9de93
Merge pull request #169 from bellingcat/remove_dependencies
...
Tidy up and remove dependencies
2025-01-15 17:16:30 +01:00
Patrick Robertson
73b1a3902c
Merge pull request #172 from bellingcat/docker_compose
...
Add docker-compose for easy building and running of docker image in dev
2025-01-15 17:16:22 +01:00
Patrick Robertson
100996f1e5
Add docker-compose for easy building and running of docker image in dev
...
Just use docker compose up
2025-01-15 14:36:02 +01:00
Patrick Robertson
74a4a24a23
Remove toml - unused
...
(pytest etc. use tomli, which is installed)
2025-01-14 18:13:27 +01:00
Patrick Robertson
306df62a98
Fix all instances of utcnow()
2025-01-14 17:51:41 +01:00
Patrick Robertson
20726c1116
Remove tiktok-downloader - getting info is broken
...
TODO: switch to using yt-dlp
2025-01-14 17:40:45 +01:00