Commit Graph

952 Commits (15da907e811ba3f26a641bfd24abf0ba6aa8565c)

Author SHA1 Message Date
Patrick Robertson add83c9650 Remove snscrape from twitter_archiver
1. snscrape twitter downloader no longer works (ref: https://github.com/JustAnotherArchivist/snscrape/issues/1045)
2. snscrape limits python to versions <3.12
2025-01-07 19:40:19 +01:00
Miguel Sozinho Ramalho a697f0a212
adds an unauthenticated Bluesky archiver (#160)
* adds a TODO for next code iterations

* implements bsky archiver

* adds new archiver to example orchestration file

* Fix downloading media for posts with multiple images

(Images are stored in media/images)

* Setup a basic framework for unit tests

Use 'python -m unittest' from the project root to run

---------

Co-authored-by: Patrick Robertson <robertson.patrick@gmail.com>
2025-01-07 10:28:07 +00:00
Patrick Robertson bffa3a6254
Merge pull request #159 from bellingcat/print_pdf
Add 'print_pdf' option to the screenshot enricher. Fixes #132
2025-01-06 18:13:38 +01:00
Miguel Sozinho Ramalho ef471f41e1
adds better debug for wayback failures (#161) 2025-01-06 16:49:11 +00:00
Patrick Robertson 928518cda7
Allow setting cookies for yt-dl (#158) 2025-01-06 16:19:53 +00:00
Patrick Robertson 1bd017000e Add Github CI test workflow 2024-12-31 15:20:33 +01:00
Patrick Robertson 33e967ce4b Update pipfile for:
- pyopenssl==24.2.1
- youtube-dlp==2024.09.27
- numpy==2.1.3

Fixes building/local installs. Also fixes #155
2024-12-31 15:20:11 +01:00
Patrick Robertson 30d423c8e6 Setup a basic framework for unit tests
Use 'python -m unittest' from the project root to run
2024-12-31 14:29:52 +01:00
Patrick Robertson 0c803f15a5 Fix showing preview images in the .html file when using local storage
Local storage media urls are prefixed with '/', previously only http(s) media preview src were displayed
2024-12-31 09:29:31 +01:00
Patrick Robertson a46f9997ea Better logging when there's a timestamp parse error 2024-12-31 09:28:08 +01:00
msramalho 83da9ae089 adds pdf preview support for html formatter 2024-12-23 18:19:26 +00:00
Patrick Robertson 663c8ad93a Add 'print_pdf' option to the screenshot enricher. Fixes #132 2024-12-20 07:14:03 +01:00
msramalho e49550163f adds proxy_server option to wacz 2024-10-06 10:45:34 +06:00
msramalho e6f5981afc numpy version downgrade 2024-10-06 10:10:04 +06:00
msramalho c62bf1a34d yt-dlp version bump 2024-10-05 17:43:07 +06:00
msramalho b166d57e61 v0.12.0 bump 2024-08-21 13:34:34 +01:00
msramalho 11c3288267 closes #146 2024-08-21 13:33:58 +01:00
msramalho 004143a58a version bump v0.11.6 2024-07-18 11:27:39 +01:00
msramalho 686f0027c4 adds new entries to example orchestration file 2024-07-18 11:27:15 +01:00
dependabot[bot] b03cf32c73
Bump authlib from 1.3.0 to 1.3.1 (#144)
Bumps [authlib](https://github.com/lepture/authlib) from 1.3.0 to 1.3.1.
- [Release notes](https://github.com/lepture/authlib/releases)
- [Changelog](https://github.com/lepture/authlib/blob/master/docs/changelog.rst)
- [Commits](https://github.com/lepture/authlib/compare/v1.3.0...v1.3.1)

---
updated-dependencies:
- dependency-name: authlib
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-18 11:26:22 +01:00
msramalho dc9e64397e bumping yt-dlp 2024-07-18 11:23:09 +01:00
msramalho c7bc5e2988 cleanup 2024-05-15 11:04:29 +01:00
msramalho 1e375bd740 version bump 2024-05-14 16:42:15 +01:00
Miguel Sozinho Ramalho f8824691dd
refactors free twitter archiver strategies (#142) 2024-05-14 16:23:33 +01:00
msramalho 012cc36609 removes deprecated datetime method 2024-05-14 15:54:50 +01:00
Miguel Sozinho Ramalho 7cfe1e39cc
#135 fix cleanup of telethon session files (#139)
* closes #135

* version bump
2024-04-16 12:45:45 +01:00
Jett Chen cf8691bad7
Add yt-dlp based archiving for TwitterArchiver (#138)
* Add ytdlp archiving capability

* Add type annotation

* version bump

---------

Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2024-04-15 19:54:55 +01:00
R. Miles McCain f603400d0d
Add direct Atlos integration (#137)
* Add Atlos feeder

* Add Atlos db

* Add Atlos storage

* Fix Atlos storages

* Fix Atlos feeder

* Only include URLs in Atlos feeder once they're processed

* Remove print

* Add Atlos documentation to README

* Formatting fixes

* Don't archive existing material

* avoid KeyError in atlos_db

* version bump

---------

Co-authored-by: msramalho <19508417+msramalho@users.noreply.github.com>
2024-04-15 19:25:17 +01:00
msramalho eb37f0b45b version bump 2024-04-15 19:02:54 +01:00
msramalho 75497f5773 minor bug fix when using an archiver_enricher in enrichers only 2024-04-15 19:02:40 +01:00
msramalho 623e555713 dependencies updates 2024-04-15 19:02:20 +01:00
msramalho 9c7824de57 browsertrix docker updates 2024-04-15 19:01:55 +01:00
msramalho f4827770e6 adds instagram no stories as success, and fix for telethon-based archivers. 2024-03-05 14:49:10 +00:00
msramalho 601572d76e strip url 2024-02-29 11:54:01 +00:00
msramalho d21e79a272 general security updates 2024-02-29 11:40:30 +00:00
msramalho ccf5f857ef adds configurable limits to instagram/youtube 2024-02-25 15:14:17 +00:00
msramalho 7de317d1b5 avoiding exception 2024-02-23 15:54:33 +00:00
msramalho 70075a1e5e improving insta archiver 2024-02-23 15:37:28 +00:00
msramalho 5b9bc4919a version bump 2024-02-23 14:08:23 +00:00
msramalho f0158ffd9c adds tagged posts and better parsing 2024-02-23 14:08:17 +00:00
msramalho bfb35a43a9 adds more details from yt-dlp 2024-02-23 14:08:05 +00:00
msramalho ef5b39c4f1 dind exception 2024-02-22 18:05:56 +00:00
msramalho 24ceafcb64 missing forward slash 2024-02-22 17:47:13 +00:00
msramalho 9fd4bb56a8 new attempt at dind wacz 2024-02-22 17:24:27 +00:00
msramalho 5324d562ba cleanup wacz patch 2024-02-21 18:14:30 +00:00
msramalho 5bf0a0206d version update 2024-02-21 17:26:07 +00:00
msramalho 4941823565 fix growing volume size in wacz_enricher 2024-02-21 17:25:55 +00:00
msramalho 27310c2911 fixes issue with api requests 2024-02-21 12:25:05 +00:00
msramalho eb973ba42d v0.9.1 fixes to bad parsing in ssl certificates 2024-02-20 19:31:19 +00:00
Miguel Sozinho Ramalho 7a21ae96af
V0.9.0 - closes several open issues: new enrichers and bug fixes (#133)
* clean orchestrator code, add archiver cleanup logic

* improves documentation for database.py

* telethon archivers isolate sessions into copied files

* closes #127

* closes #125

* closes #84

* meta enricher applies to all media

* closes #61 adds subtitles and comments

* minor update

* minor fixes to yt-dlp subtitles and comments

* closes #17 but logic is imperfect.

* closes #85 ssl enhancer

* minifies html, JS refactor for preview of certificates

* closes #91 adds freetsa timestamp authority

* version bump

* simplify download_url method

* skip ssl if nothing archived

* html preview improvements

* adds retrying lib

* manual download archiver improvements

* meta only runs when relevant data available

* new metadata convenience method

* html template improvements

* removes debug message

* does not close #91 yet, will need a bit more certificate chain logging

* adds verbosity config

* new instagram api archiver

* adds proxy support we

* adds proxy/end support and bug fix for yt-dlp

* proxy support for webdriver

* adds socks proxy to wacz_enricher

* refactor recursivity in inner media and display

* infinite recursive display

* foolproofing timestamping authorities

* version to 0.9.0

* minor fixes from code-review
2024-02-20 18:05:29 +00:00