Commit graph

20 Commits (main)

Author SHA1 Message Date
msramalho cd19181d8f minor improvements 2025-06-11 16:51:42 +01:00
msramalho 3cf51dd874 adds tracker remove feature and tests 2025-06-11 11:56:42 +01:00
msramalho 8314833ae8 removes exclude_media_extensions option 2025-06-10 18:34:33 +01:00
msramalho 287e823f43 improves twitter URL cleaning and introduces another bestquality check 2025-06-10 16:09:38 +01:00
msramalho c815488daa adds new URLs to ignore 2025-06-10 15:44:52 +01:00
Patrick Robertson 168dfb6254 Unit tests for url utils 2025-03-21 11:53:47 +04:00
erinhmclark 85abe1837a Ruff format with defaults. 2025-03-10 18:44:54 +00:00
Patrick Robertson 7734a551fa Move 'assert_valid_url' out into utils, don't use assert but raise
assert is recommended only for debugging
2025-02-20 11:19:29 +00:00
Patrick Robertson c574b694ed Set up screenshot enricher to use authentication/cookies 2025-02-03 17:25:59 +01:00
Patrick Robertson b7d9145f6c Further tidyups + refactoring for new structure
* Add implementation tests for orchestrator + logging tests
* Standardise method/class vars for extractors to see if they are suitable
* Fix bugs with removing default loguru logger (allows further customisation)
* Fix bug loading required fields from file
2025-01-30 13:21:10 +01:00
Galen Reich 381940f5a8
Fix Selenium headless invokation (#106)
Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2023-11-13 11:56:35 +01:00
msramalho ceb717ea65 exclude vk emojis 2023-08-17 18:11:26 +01:00
msramalho 6e4fb76940 exclude ok resource images from wacz enricher 2023-08-09 11:26:46 +01:00
msramalho 60a1f3a27a minor fixes 2023-07-31 16:08:48 +01:00
msramalho fb197f1064 excluding telegram embeds 2023-07-28 12:57:15 +01:00
msramalho aa71c85a98 improving ignored content from waczs 2023-07-28 12:19:14 +01:00
msramalho 59551b3b20 minor improvements: finding best twitter image quality 2023-07-27 21:36:15 +01:00
msramalho f086d89111 new escape message 2023-07-27 20:14:59 +01:00
msramalho dd034da844 feat: WACZ enricher can now be probed for media, and used as an archiver OR enricher 2023-07-27 15:42:10 +01:00
msramalho 5505255ea3 url auth wall detect 2023-02-17 15:45:58 +00:00