msramalho
cd6a2b6031
generic_extractor download tests adaptations
2025-06-11 20:05:35 +01:00
msramalho
d7a48e465b
fix copypasta
2025-06-11 18:04:49 +01:00
msramalho
f5be7a50c1
Testing LinkedIn Dropin for Antibot
2025-06-11 16:52:03 +01:00
msramalho
3cf51dd874
adds tracker remove feature and tests
2025-06-11 11:56:42 +01:00
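This is a minimal sketch of what a "tracker remove" step can look like. It is not the auto-archiver implementation. The helper name `remove_trackers` and the parameter list (`utm_*`, `fbclid`, `gclid`, `igshid`, `si`) are hypothetical choices made for illustration. The sketch parses the query string, drops known tracking keys, and rebuilds the URL with the path and fragment unchanged.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracker list, chosen for illustration only (not from the auto-archiver source).
TRACKING_PARAMS = {"fbclid", "gclid", "igshid", "si"}
TRACKING_PREFIXES = ("utm_",)


def remove_trackers(url: str) -> str:
    """Return `url` with common tracking query parameters removed (hypothetical helper)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        # Compare case-insensitively so UTM_Source is treated like utm_source.
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL; an empty query means urlunsplit emits no "?" at all.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(remove_trackers("https://example.com/a?utm_source=x&id=5#frag"))
```

Because the query is decoded and re-encoded, percent-escapes in the remaining parameters may be normalised (for example, `%20` becomes `+`). A real implementation may prefer to filter the raw `key=value` pairs to keep the original bytes.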
msramalho
69ddb72146
separate reddit tests
2025-06-11 11:27:11 +01:00
msramalho
1039e9631f
new reddit tests with .env.test
2025-06-11 11:22:23 +01:00
msramalho
8314833ae8
removes exclude_media_extensions option
2025-06-10 18:34:33 +01:00
msramalho
6bbc7fb47a
improves antibot flow and makes auth_wall detection optional
2025-06-10 16:29:07 +01:00
msramalho
287e823f43
improves twitter URL cleaning and introduces another bestquality check
2025-06-10 16:09:38 +01:00
msramalho
c815488daa
adds new URLs to ignore
2025-06-10 15:44:52 +01:00
msramalho
6f02493ff1
adds clips extraction to VK, though generic_extractor should still be run for those
2025-06-08 14:36:55 +01:00
msramalho
d13a5ef003
adds tests and minor improvements
2025-06-07 19:58:18 +01:00
msramalho
5491f3e9e7
fixing s3 storage tests
2025-06-04 14:41:00 +01:00
msramalho
264ba82ea0
finish removing screenshot_enricher references
2025-06-04 14:31:07 +01:00
msramalho
2c6be4447f
linting
2025-06-04 14:17:38 +01:00
msramalho
22408e2a98
adds test for antibot
2025-06-04 11:59:59 +01:00
msramalho
cbd189c97d
general cleanup
2025-06-04 11:53:01 +01:00
Miguel Sozinho Ramalho
6735fa890b
v1.0.1 dependency updates, generic extractor improvements (#307)
* wacz: allow exceptional cases where more than one resource image is available
* improves generic extractor edge-cases and yt-dlp updates
* REMOVES vk_extractor until further notice
* bumps browsertrix in docker image
* npm version bump on scripts/settings
* poetry updates
* Changed log level on gsheet_feeder_db started from warning to info (#301)
* closes 305 and further fixes finding local downloads from uncommon ytdlp extractors
* use ffmpeg -bitexact to avoid storing duplicate content
* formatting
* adds yt-dlp curl-cffi
* version bump
* linting
---------
Co-authored-by: Dave Mateer <davemateer@gmail.com>
2025-06-02 20:57:12 +01:00
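The `-bitexact` change only pays off if storage recognises identical files. ffmpeg documents its bitexact mode as writing only platform-, build- and time-independent data so that checksums are reproducible. The sketch below shows why that matters under one assumption: that stored media is keyed by a hash of its bytes. `ContentAddressedStore` is a hypothetical class and is not part of auto-archiver. Byte-identical outputs collapse to one object, while an otherwise identical file carrying extra metadata, such as an encoder version string, is stored a second time.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical in-memory store that keeps one copy per unique byte sequence."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Key each object by the SHA-256 of its bytes; identical bytes share a key.
        key = hashlib.sha256(data).hexdigest()
        # Only the first copy is kept; later identical uploads are no-ops.
        self.objects.setdefault(key, data)
        return key


store = ContentAddressedStore()
first = store.put(b"frame-data")
second = store.put(b"frame-data")  # deterministic re-encode: same bytes, same key
third = store.put(b"frame-data" + b"encoder=v2")  # embedded metadata changes the hash
print(len(store.objects))
```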
erinhmclark
144adaad5b
Only return success for instagram_tbot_extractor.py with content.
2025-03-31 14:14:36 +01:00
erinhmclark
c510c04643
Update config reference in test_generic_extractor.py
2025-03-28 13:43:46 +00:00
erinhmclark
dbcf19d1b8
Update path reference
2025-03-28 10:55:21 +00:00
erinhmclark
0840b7283c
Format
2025-03-28 10:43:00 +00:00
erinhmclark
b5dc1854a2
Merge branch 'main' into feat/yt-dlp-pots
2025-03-28 10:42:24 +00:00
erinhmclark
efab0f9a91
Add test
2025-03-28 10:37:22 +00:00
Patrick Robertson
b7949a489f
Simplify telethon unit tests for CI (don't use TestExtractorBase - it causes loading issues)
2025-03-26 23:51:21 +04:00
Patrick Robertson
e0e9f93065
Skip update checks for ytdlp when running tests
2025-03-26 23:41:20 +04:00
Patrick Robertson
e06b0c0585
Skip checking if docker is running for tests + more graceful test for filename
2025-03-26 23:03:48 +04:00
Patrick Robertson
95ea9fb231
Telethon unit tests + tidyup
2025-03-26 22:53:27 +04:00
Patrick Robertson
17d2d14680
Fix running 'cleanup' method on extractors that fail to start
2025-03-26 22:52:52 +04:00
erinhmclark
7872d9356c
Merge branch 'main' into feat/yt-dlp-pots
2025-03-26 17:00:38 +00:00
Patrick Robertson
d6be1ff84f
Merge branch 'main' into timestamping_rewrite
2025-03-26 14:37:51 +04:00
erinhmclark
040a864d5c
Merge branch 'refs/heads/main' into feat/yt-dlp-pots
# Conflicts:
# poetry.lock
2025-03-25 18:26:43 +00:00
erinhmclark
b4c33318c4
Merge branch 'main' into feat/yt-dlp-pots
# Conflicts:
# src/auto_archiver/modules/generic_extractor/__manifest__.py
# tests/test_modules.py
2025-03-25 15:16:31 +00:00
Patrick Robertson
a9fe959ea1
Fix unit tests for latest yt-dlp
(Yt-dlp title is now truncated)
2025-03-24 17:48:15 +04:00
Patrick Robertson
31fa7380f5
Fix up unit tests + issue when working with self-signed certs
2025-03-24 16:00:40 +04:00
Patrick Robertson
396ec03bae
Tidy up unit tests further + make more of them non-download
2025-03-24 15:26:22 +04:00
Patrick Robertson
e811196711
Ruff fixes
2025-03-24 15:10:46 +04:00
Patrick Robertson
dfde6f1995
Merge main into timestamping_enricher
2025-03-24 15:09:29 +04:00
Patrick Robertson
a066bf4ca9
Clean up comments
2025-03-21 14:47:50 +04:00
Patrick Robertson
14c56f4916
Provide better logs for screenshot enricher when auth is/isn't supported (cookies only)
2025-03-21 12:05:47 +04:00
Patrick Robertson
168dfb6254
Unit tests for url utils
2025-03-21 11:53:47 +04:00
Patrick Robertson
e6c5705f70
Merge pull request #261 from bellingcat/wacz_separate_profile
Wacz minor adjustments
2025-03-20 15:51:56 +00:00
Erin Clark
613ba0c05d
Merge pull request #262 from bellingcat/generic_extractor_args
Add flexible extractor_args to generic_extractor.py
This allows users to pass any of the options listed [here](https://github.com/yt-dlp/yt-dlp/blob/master/README.md#extractor-arguments) to yt-dlp extractor_args.
Example usage:
```yaml
generic_extractor:
  facebook_cookie:
  ...
  extractor_args:
    youtube:
      player_client: web,tv
    generic:
      is_live: true
```
2025-03-20 15:38:20 +00:00
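yt-dlp's Python API accepts `extractor_args` as a nested dict of lists of strings, `{extractor: {arg: [values]}}`, while the YAML above uses comma-separated strings and booleans. The sketch below shows one way to bridge the two. It is hypothetical: `to_ytdlp_extractor_args` is an illustrative name, and this is not necessarily how generic_extractor converts its config.

```python
from typing import Any


def to_ytdlp_extractor_args(config: dict[str, dict[str, Any]]) -> dict[str, dict[str, list[str]]]:
    """Hypothetical helper: normalise a YAML-style extractor_args mapping into
    the {extractor: {arg: [values]}} shape used by yt-dlp's Python API."""
    result: dict[str, dict[str, list[str]]] = {}
    for extractor, args in config.items():
        result[extractor] = {}
        for name, value in (args or {}).items():
            if isinstance(value, bool):
                # YAML `true` arrives as a Python bool; yt-dlp expects strings.
                values = [str(value).lower()]
            elif isinstance(value, (list, tuple)):
                values = [str(v) for v in value]
            else:
                # "web,tv" -> ["web", "tv"], mirroring the CLI's comma-separated syntax.
                values = [v.strip() for v in str(value).split(",")]
            result[extractor][name] = values
    return result


config = {"youtube": {"player_client": "web,tv"}, "generic": {"is_live": True}}
print(to_ytdlp_extractor_args(config))
```

The result can then be passed as `yt_dlp.YoutubeDL({"extractor_args": ...})`. Splitting on commas mirrors the `--extractor-args "youtube:player_client=web,tv"` CLI form, so the same values work in both places.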
erinhmclark
54f53886ef
Update tests for default config values
2025-03-20 14:57:26 +00:00
Patrick Robertson
0a5ba3385e
Fix small bug in twitter dropin
- Previously, 'content' was set to a JSON dump of the tweet; it should be set to full_text.
2025-03-20 18:55:22 +04:00
Patrick Robertson
6700250891
Add a test for checking module type on setup
2025-03-20 18:18:53 +04:00
Patrick Robertson
1e19ad77c6
Fix tests
2025-03-20 18:08:19 +04:00
Patrick Robertson
f22af5e123
Tweak WACZ enricher docs + add comment on WACZ_ENABLE_DOCKER
2025-03-20 16:48:30 +04:00
erinhmclark
2921061fde
Add flexible extractor_args to generic_extractor.py
2025-03-19 19:19:28 +00:00
erinhmclark
675de50ee7
Update module test to test for default config keys within loaded
2025-03-19 10:47:28 +00:00