bellingcat Python script to automatically archive social media posts, videos, and images from a Google Sheets document.

Miguel Sozinho Ramalho 80d61e8b85 Merge pull request #341 from bellingcat/dev Address several small bugs, includes tiktok photos extraction, and data-saving for proxy usage in generic_extractor.		2025-07-05 20:28:00 +01:00
.github	make python api tests work on gh when no env is set	2025-06-30 02:20:51 +01:00
docs	updates docs to reflect new general approach extractor	2025-07-05 15:56:13 +01:00
scripts	concludes logging standardization refactor	2025-06-26 17:20:04 +01:00
src/auto_archiver	some wayback errors are expected and should be warnings	2025-07-05 18:31:39 +01:00
tests	adds antibot tiktok logic for photos closes #295	2025-07-05 18:31:12 +01:00
.dockerignore	docker initial files	2022-10-31 17:10:55 +00:00
.gitignore	Gitgnore to include launch.json and installtion docs to include build script.	2025-06-16 14:37:21 +01:00
.pre-commit-config.yaml	Fix pre-commit for ruff check	2025-03-14 13:40:57 +00:00
.pylintrc	Ignore pylint statements for manifest files	2025-01-21 17:59:52 +01:00
.readthedocs.yaml	installs ffmpeg in readthedocs	2025-06-18 00:29:36 +01:00
CONTRIBUTING.md	Fix links to docs	2025-02-12 11:41:54 +00:00
Dockerfile	browsertrix version bump	2025-06-17 19:22:20 +01:00
LICENSE	Add LICENSE	2021-06-24 16:14:32 +02:00
Makefile	Update style_guide.md to clarify pre-commit setup, add Docker commands to Makefile and merge ruff actions.	2025-03-13 20:26:29 +00:00
README.md	update repo badges	2025-03-31 16:19:29 +01:00
docker-compose.yaml	Cleanup docker-compose	2025-03-20 16:48:30 +04:00
poetry.lock	fixing pypaperclip see issue #339	2025-07-05 19:07:23 +01:00
pyproject.toml	fixing pypaperclip see issue #339	2025-07-05 19:07:23 +01:00

README.md

Auto Archiver

Auto Archiver is a Python tool to automatically archive content on the web in a secure and verifiable way. It takes URLs from different sources (e.g. a CSV file, Google Sheets, the command line, etc.) and archives the content of each one. It can archive social media posts, videos, images, and webpages. Content can be enriched, then saved either locally or remotely (S3 bucket, Google Drive). The status of the archiving process can be appended to a CSV report or, if using Google Sheets, written back to the original sheet.
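
To make the flow described above concrete (URLs in, content archived and enriched, status reported back), here is a minimal, self-contained sketch. It is not Auto Archiver's actual API: every name in it (ArchiveResult, feed_urls, archive, enrich, write_report, run_pipeline) is hypothetical, and the archiving and enrichment steps are placeholders that only inspect the URL string.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ArchiveResult:
    """Hypothetical record of what happened to one URL."""
    url: str
    status: str = "pending"
    metadata: dict = field(default_factory=dict)


def feed_urls(csv_text):
    """Yield URLs from the first column of CSV text (stand-in for a CSV source)."""
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0].strip():
            yield row[0].strip()


def archive(url):
    """Placeholder archiving step: a real tool would download the content here."""
    ok = url.startswith(("http://", "https://"))
    return ArchiveResult(url=url, status="archived" if ok else "failed")


def enrich(result):
    """Placeholder enrichment: attach one derived field to archived items."""
    if result.status == "archived":
        result.metadata["scheme"] = result.url.split("://", 1)[0]
    return result


def write_report(results):
    """Return a CSV report with one status row per URL."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url", "status"])
    for r in results:
        writer.writerow([r.url, r.status])
    return out.getvalue()


def run_pipeline(csv_text):
    """Feed, archive, and enrich each URL, then build the status report."""
    results = [enrich(archive(url)) for url in feed_urls(csv_text)]
    return results, write_report(results)
```

Calling run_pipeline on CSV text with one URL per line returns the per-URL results together with a CSV status report, mirroring the "append status to a CSV report" step; saving to local or remote storage is left out of this sketch.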

See the Auto Archiver documentation for more information.

Read the article about Auto Archiver on bellingcat.com.

Installation

View the Installation Guide for full instructions.

Advanced: To get started quickly using Docker:

docker pull bellingcat/auto-archiver && docker run -it --rm -v secrets:/app/secrets bellingcat/auto-archiver --config secrets/orchestration.yaml

Or pip:

pip install auto-archiver && auto-archiver --help
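
After a pip install, it can be handy to confirm from Python that the command-line entry point is reachable. The sketch below assumes only what the command above shows: an executable named auto-archiver that accepts --help. The helper name cli_help is hypothetical.

```python
import shutil
import subprocess


def cli_help(executable="auto-archiver"):
    """Return the CLI's --help text, or None if the executable is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # check=False: return whatever the tool printed instead of raising on a non-zero exit.
    completed = subprocess.run(
        [path, "--help"], capture_output=True, text=True, check=False
    )
    return completed.stdout
```

A return value of None suggests the install did not put the executable on PATH, for example because it went into a virtual environment that is not currently active.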

Contributing

We welcome contributions to the Auto Archiver project! See the Contributing Guide for how to get involved!