Commit graph

368 Commits (1b51f49d8f6b8cfd30fd244bb267a54786a22e0f)

Author SHA1 Message Date
msramalho 63d1abbe4b tiktok archiver though info is no longer working 2023-01-18 16:56:35 +00:00
msramalho 1def8bb03d instagram archiver 2023-01-18 16:16:23 +00:00
msramalho 725bab8240 twitter archivers 2023-01-18 00:15:18 +00:00
msramalho f1bc83818d template updates 2023-01-17 17:01:25 +00:00
msramalho 47dc788143 thumbnails enricher 2023-01-17 16:29:27 +00:00
msramalho 74e50eccf1 hash enricher and media refactor 2023-01-13 02:12:08 +00:00
msramalho 6ca46417fe local storage + multiple storage support 2023-01-12 02:09:39 +00:00
msramalho 0cb593fd21 wayback enricher ready 2023-01-11 00:03:47 +00:00
msramalho d4825196f1 html template working with jinja templates 2023-01-10 00:22:16 +00:00
msramalho aac16fa8c2 minor comments 2023-01-09 22:24:44 +00:00
msramalho 1cdc006b27 s3 storaging + WIP gsheets DB 2023-01-04 18:02:44 +00:00
msramalho bb512b36c9 gsheet feeder + db WIP 2023-01-04 16:37:36 +00:00
msramalho 96845305a3 media concept implemented 2022-12-14 19:01:20 +00:00
msramalho 9c056d001c merge logic started 2022-12-14 16:11:06 +00:00
msramalho 53ffa2d4ae telethon_archiver working for multiple media 2022-12-14 15:37:34 +00:00
msramalho b3860cfec1 telethon join channels working 2022-12-14 14:01:39 +00:00
msramalho 955891a411 WIP feeder 2022-12-10 12:03:46 +00:00
msramalho 9dc709d3b9 demo feeder logic working 2022-11-24 15:44:25 +00:00
msramalho 618e7ed0a3 subproperties in config 2022-11-24 11:53:21 +00:00
msramalho 65dd155c90 WIP refactor logic 2022-11-15 15:00:52 +00:00
msramalho 6a0ce5ced1 orchestrator design structure 2022-11-11 02:08:48 +00:00
msramalho 04263094ad WIP docker changes for cli and auto_archiver 2022-11-10 17:46:40 +00:00
msramalho 390b84eb22 dockerization complete 2022-11-08 15:55:33 +00:00
msramalho 81eadd4672 disable browsertrix on docker, see #66 2022-11-08 14:22:13 +00:00
msramalho a8f7055696 reduces uncontrolled exceptions 2022-11-08 13:59:59 +00:00
msramalho 09f47383a3 dockerfile improvements 2022-11-08 13:59:35 +00:00
msramalho 629cd586db adds session_file for missing archivers 2022-11-08 13:59:09 +00:00
msramalho 889eb1d270 Merge branch 'dev' into dockerize 2022-11-02 17:01:00 +00:00
msramalho 50e03ba565 closes #65 with simpler solution 2022-11-02 16:59:44 +00:00
msramalho a9df992f66 WiP 2022-11-02 16:51:32 +00:00
msramalho c8fa077df7 docker initial files 2022-10-31 17:10:55 +00:00
msramalho 29e1872e87 fix: rm stopped containers only 2022-10-31 10:41:27 +00:00
msramalho 7a700acd8e hotfix for #65 2022-10-31 10:35:01 +00:00
msramalho 22363cb8b9 adds information on browsertrix usage 2022-10-20 11:59:23 +01:00
msramalho ac4f1b6132 readme updates 2022-10-19 11:37:04 +01:00
msramalho 4d2b7b4040 reverse order of login attempts 2022-10-19 11:27:17 +01:00
msramalho 54c572258c fix tty 2022-10-18 17:46:40 +01:00
msramalho 6c80a5b82d session file logic 2022-10-18 17:35:59 +01:00
msramalho 63f53358d3 adds traceback 2022-10-18 16:38:12 +01:00
msramalho 3f121d800e catch bad instagram login 2022-10-18 16:36:27 +01:00
msramalho 93be1af93f adds instagram post/profile 2022-10-18 15:45:10 +01:00
msramalho f0f844a569 improves browsertrix configurations 2022-10-18 11:21:10 +01:00
msramalho df502f3bde updates yt-dlp 2022-10-18 11:20:53 +01:00
msramalho 26903190fd adds wacz link 2022-10-17 14:41:34 +01:00
Miguel Sozinho Ramalho 683f2d7500 Merge pull request #64 from bellingcat/dev 2022-10-17 14:40:15 +01:00
Miguel Sozinho Ramalho 23a4dc20c5 Merge pull request #63 from edsu/browsertrix-crawler 2022-10-17 14:39:34 +01:00
msramalho 57464f1506 refactors for edges in browsertrix and s3 upload, adds timeout parameter 2022-10-17 14:07:31 +01:00
msramalho dc0ca8bdd6 adds browsertrix to all archivers flows 2022-10-17 14:06:50 +01:00
Ed Summers 20ca50dc90 Clean up browsertrix-crawler files 2022-10-11 16:49:19 -04:00
Remove any local browsertrix-crawler files after the WACZ has been
copied to storage. Note, until this issue has a release on DockerHub the
local files won't be able to be deleted since Docker on Linux creates
the files as root:

https://github.com/webrecorder/browsertrix-crawler/issues/170

The code will catch this exception and log a warning instead of failing
and losing the work that has been completed.
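The cleanup described in this commit can be sketched as follows, assuming a Python code base and a local directory holding the crawl output. `cleanup_crawl_dir` is a hypothetical name, not taken from the repository. It deletes the directory and, if deletion fails (for instance with a `PermissionError` on root-owned files), logs a warning and returns instead of raising, so the already-stored WACZ is not lost.

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


def cleanup_crawl_dir(crawl_dir: Path) -> bool:
    """Hypothetical helper: remove local crawl files after the WACZ is stored.

    Returns True if the directory was removed. On failure (e.g. files created
    as root by Docker), logs a warning and returns False instead of raising.
    """
    try:
        shutil.rmtree(crawl_dir)
        return True
    except OSError as e:  # PermissionError and FileNotFoundError are OSError subclasses
        logger.warning("could not delete %s: %s", crawl_dir, e)
        return False
```

Returning a boolean rather than re-raising keeps the caller's archiving flow going; the caller can still decide whether a leftover directory matters.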
Ed Summers c34fb9cf10 Add browsertrix profile config option 2022-10-11 16:21:42 -04:00
This commit adds a browsertrix profile option to the configuration. To
avoid passing the browsertrix config to every Archiver, the Archiver
constructors (including the base) were modified to accept a Storage and
a Config instance. Some of the constructors then pick out the pieces they
need from the Config, in addition to calling the parent constructor. To
avoid a circular import that this change created, the Config object now
defines the default hash function to use, rather than having it be a
static property of the Archiver class.
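A minimal sketch of the constructor pattern this commit describes, assuming Python. `Config`, `Storage` and `Archiver` mirror the names in the message, but the fields (`browsertrix_profile`, `hash_algorithm`), the `get_hash` method and the `BrowsertrixArchiver` subclass are hypothetical illustrations, not the project's actual API.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    # Hypothetical field for the new browsertrix profile option
    browsertrix_profile: Optional[str] = None
    # Default hash function lives on Config, not as a static Archiver property
    hash_algorithm: str = "sha256"


class Storage:
    """Placeholder storage backend (hypothetical)."""

    def __init__(self, root: str) -> None:
        self.root = root


class Archiver:
    # Base constructor receives Storage and Config instead of individual options
    def __init__(self, storage: Storage, config: Config) -> None:
        self.storage = storage
        self.config = config

    def get_hash(self, data: bytes) -> str:
        # Use whichever algorithm the Config names (any hashlib.new() name)
        h = hashlib.new(self.config.hash_algorithm)
        h.update(data)
        return h.hexdigest()


class BrowsertrixArchiver(Archiver):  # hypothetical subclass
    def __init__(self, storage: Storage, config: Config) -> None:
        super().__init__(storage, config)
        # Pick out only the piece of the Config this archiver needs
        self.profile = config.browsertrix_profile
```

Because `Archiver` depends on `Config` but `Config` no longer needs anything from `Archiver`, the import only runs in one direction.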