Commit graph

627 Commits (main)

Author SHA1 Message Date
Miguel Sozinho Ramalho 7cfe1e39cc
#135 fix cleanup of telethon session files (#139)
* closes #135

* version bump
2024-04-16 12:45:45 +01:00
Jett Chen cf8691bad7
Add yt-dlp based archiving for TwitterArchiver (#138)
* Add ytdlp archiving capability

* Add type annotation

* version bump

---------

Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2024-04-15 19:54:55 +01:00
R. Miles McCain f603400d0d
Add direct Atlos integration (#137)
* Add Atlos feeder

* Add Atlos db

* Add Atlos storage

* Fix Atlos storages

* Fix Atlos feeder

* Only include URLs in Atlos feeder once they're processed

* Remove print

* Add Atlos documentation to README

* Formatting fixes

* Don't archive existing material

* avoid KeyError in atlos_db

* version bump

---------

Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2024-04-15 19:25:17 +01:00
msramalho eb37f0b45b version bump 2024-04-15 19:02:54 +01:00
msramalho 75497f5773 minor bug fix when using an archiver_enricher in enrichers only 2024-04-15 19:02:40 +01:00
msramalho 623e555713 dependencies updates 2024-04-15 19:02:20 +01:00
msramalho 9c7824de57 browsertrix docker updates 2024-04-15 19:01:55 +01:00
msramalho f4827770e6 adds instagram no stories as success, and fix for telethon-based archivers. 2024-03-05 14:49:10 +00:00
msramalho 601572d76e strip url 2024-02-29 11:54:01 +00:00
msramalho d21e79a272 general security updates 2024-02-29 11:40:30 +00:00
msramalho ccf5f857ef adds configurable limits to instagram/youtube 2024-02-25 15:14:17 +00:00
msramalho 7de317d1b5 avoiding exception 2024-02-23 15:54:33 +00:00
msramalho 70075a1e5e improving insta archiver 2024-02-23 15:37:28 +00:00
msramalho 5b9bc4919a version bump 2024-02-23 14:08:23 +00:00
msramalho f0158ffd9c adds tagged posts and better parsing 2024-02-23 14:08:17 +00:00
msramalho bfb35a43a9 adds more details from yt-dlp 2024-02-23 14:08:05 +00:00
msramalho ef5b39c4f1 dind exception 2024-02-22 18:05:56 +00:00
msramalho 24ceafcb64 missing forward slash 2024-02-22 17:47:13 +00:00
msramalho 9fd4bb56a8 new attempt at dind wacz 2024-02-22 17:24:27 +00:00
msramalho 5324d562ba cleanup wacz patch 2024-02-21 18:14:30 +00:00
msramalho 5bf0a0206d version update 2024-02-21 17:26:07 +00:00
msramalho 4941823565 fix growing volume size in wacz_enricher 2024-02-21 17:25:55 +00:00
msramalho 27310c2911 fixes issue with api requests 2024-02-21 12:25:05 +00:00
msramalho eb973ba42d v0.9.1 fixes to bad parsing in ssl certificates 2024-02-20 19:31:19 +00:00
Miguel Sozinho Ramalho 7a21ae96af
V0.9.0 - closes several open issues: new enrichers and bug fixes (#133)
* clean orchestrator code, add archiver cleanup logic

* improves documentation for database.py

* telethon archivers isolate sessions into copied files

* closes #127

* closes #125

* closes #84

* meta enricher applies to all media

* closes #61 adds subtitles and comments

* minor update

* minor fixes to yt-dlp subtitles and comments

* closes #17 but logic is imperfect.

* closes #85 ssl enhancer

* minifies html, JS refactor for preview of certificates

* closes #91 adds freetsa timestamp authority

* version bump

* simplify download_url method

* skip ssl if nothing archived

* html preview improvements

* adds retrying lib

* manual download archiver improvements

* meta only runs when relevant data available

* new metadata convenience method

* html template improvements

* removes debug message

* does not close #91 yet, will need a few more certificate chain logging

* adds verbosity config

* new instagram api archiver

* adds proxy support we

* adds proxy/end support and bug fix for yt-dlp

* proxy support for webdriver

* adds socks proxy to wacz_enricher

* refactor recursivity in inner media and display

* infinite recursive display

* foolproofing timestamping authorities

* version to 0.9.0

* minor fixes from code-review
2024-02-20 18:05:29 +00:00
msramalho 5c49124ac6 Merge branch 'main' of https://github.com/bellingcat/auto-archiver 2024-02-13 15:44:53 +00:00
Kai b9d71d0b3f
Change submit-archive from basic to bearer auth (#128) 2024-02-06 15:24:15 +00:00
msramalho b9b831ce03 v8.0.1 2024-02-01 15:08:55 +00:00
msramalho 2a773a25e8 better handling of telethon data display 2024-02-01 15:08:23 +00:00
msramalho 719645fc2d minor improvement to html_template 2024-02-01 15:03:00 +00:00
Chu-An, Huang 71fcf5a089
fix: Correct the path of service account in google drive settings (#123)
* fix: Correct the path of service account in yaml file

* fix: Remove redefined function

* Update src/auto_archiver/storages/gd.py

* fix: remove unwanted drafting code

---------

Co-authored-by: Miguel Sozinho Ramalho <19508417+msramalho@users.noreply.github.com>
2024-02-01 15:02:04 +00:00
Tomas Apodaca 590d3fe824
Fix typo in readme (#121) 2024-01-24 21:17:31 +00:00
Miguel Sozinho Ramalho e6b6b83007
0.8.0 new features and dependency updates (#119)
* wacz can extract_screenshot only

* new meta enricher

* twitter api can use multiple authentication tokens in sequence

* cleanup non-dup logic

* meta info on archive duration

* minor html report update

* updated dependencies

* new version
2023-12-20 14:13:22 +00:00
msramalho 499832d146 fix datetime parsing 2023-12-13 18:41:48 +00:00
msramalho fa1163532b patching now optional value 2023-12-13 13:55:31 +00:00
msramalho 96f6ea8f09 v0.7.8 2023-12-13 13:03:39 +00:00
Miguel Sozinho Ramalho ff17dfd0aa
enables option to toggle db api writes (#118) 2023-12-13 12:54:47 +00:00
msramalho 0a3053bbc7 version update 2023-12-13 11:29:13 +00:00
Miguel Sozinho Ramalho e69660be82
chooses most complete result from api (#117) 2023-12-13 11:28:27 +00:00
Miguel Sozinho Ramalho a786d4bb0e
chooses most complete result from api (#116) 2023-12-13 11:26:46 +00:00
Miguel Sozinho Ramalho 128d4136e3
fixes empty api search results (#115) 2023-12-13 10:51:25 +00:00
Miguel Sozinho Ramalho 98fb574d89
fixing older db entries formats (#114) 2023-12-12 22:47:54 +00:00
Miguel Sozinho Ramalho 6f36e92e02
enables api_db cache queries if configured with new option (#113) 2023-12-12 19:20:26 +00:00
Miguel Sozinho Ramalho 3e56ef137d
reduce s3 duplicating while keeping random urls via hash (#112) 2023-12-12 19:12:03 +00:00
Jett Chen 9ee323a654
Set _mimetype for final media of html formatter (#111) 2023-12-11 11:47:04 +00:00
Kai 9eb39943c7
Extract text in wacz_enricher (#110) 2023-12-05 22:24:12 +00:00
msramalho 8624e9f177 version update 0.7.1 2023-11-13 11:58:43 +01:00
Galen Reich 381940f5a8
Fix Selenium headless invocation (#106)
Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2023-11-13 11:56:35 +01:00
msramalho 1382f8b795 version bump and release without commit 2023-09-22 10:18:58 +01:00
Dave Mateer fac8364762
Updated gd.py to work with shared folders (#102)
Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2023-09-22 10:17:54 +01:00