bellingcat/auto-archiver: Python script to automatically archive social media posts, videos, and images from a Google Sheets document.

Miguel Sozinho Ramalho 6735fa890b v1.0.1 dependency updates, generic extractor improvements (#307)	2025-06-02 20:57:12 +01:00

* wacz: allow exceptional cases where more than one resource image is available
* improves generic extractor edge-cases and yt-dlp updates
* REMOVES vk_extractor until further notice
* bumps browsertrix in docker image
* npm version bump on scripts/settings
* poetry updates
* Changed log level on gsheet_feeder_db started from warning to info (#301)
* closes 305 and further fixes finding local downloads from uncommon ytdlp extractors
* use ffmpeg -bitexact to reduce duplicate content storing
* formatting
* adds yt-dlp curl-cffi
* version bump
* linting

Co-authored-by: Dave Mateer <davemateer@gmail.com>

.github	update runner os to matrix os.	2025-03-31 12:00:24 +01:00
docs	v1.0.1 dependency updates, generic extractor improvements (#307)	2025-06-02 20:57:12 +01:00
scripts	v1.0.1 dependency updates, generic extractor improvements (#307)	2025-06-02 20:57:12 +01:00
src/auto_archiver	v1.0.1 dependency updates, generic extractor improvements (#307)	2025-06-02 20:57:12 +01:00
tests	v1.0.1 dependency updates, generic extractor improvements (#307)	2025-06-02 20:57:12 +01:00
.dockerignore	…
.gitignore	Script to auto-generate a service account	2025-03-17 15:42:43 +00:00
.pre-commit-config.yaml	…
.pylintrc	…
.readthedocs.yaml	…
CONTRIBUTING.md	…
Dockerfile	v1.0.1 dependency updates, generic extractor improvements (#307)	2025-06-02 20:57:12 +01:00
LICENSE	…
Makefile	…
README.md	update repo badges	2025-03-31 16:19:29 +01:00
docker-compose.yaml	Cleanup docker-compose	2025-03-20 16:48:30 +04:00
poetry.lock	v1.0.1 dependency updates, generic extractor improvements (#307)	2025-06-02 20:57:12 +01:00
pyproject.toml	v1.0.1 dependency updates, generic extractor improvements (#307)	2025-06-02 20:57:12 +01:00

Auto Archiver

Auto Archiver is a Python tool that automatically archives web content in a secure and verifiable way. It takes URLs from different sources (e.g. a CSV file, Google Sheets, or the command line) and archives the content of each one. It can archive social media posts, videos, images, and webpages. Content can be enriched, then saved either locally or remotely (S3 bucket, Google Drive). The status of the archiving process can be appended to a CSV report or, if using Google Sheets, written back to the original sheet.
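
The flow described above (read URLs from a source, archive each one, record a status) can be pictured with a small, self-contained sketch. This is not Auto Archiver's actual API: every name here (`ArchiveResult`, `feed_urls_from_csv`, `archive_all`, `write_report`, `fake_fetch`) is hypothetical, the "download" is a stand-in that makes no network requests, and the SHA-256 hash is only one assumed way of making stored content verifiable.

```python
from __future__ import annotations

import csv
import hashlib
import io
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class ArchiveResult:
    """Hypothetical record of one archiving attempt."""
    url: str
    status: str
    sha256: str  # hash of the fetched bytes; empty if the fetch failed


def feed_urls_from_csv(csv_text: str, column: str = "url") -> Iterator[str]:
    """Yield non-empty URLs from CSV text that has a header row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get(column) or "").strip()
        if url:
            yield url


def archive_all(urls: Iterable[str], fetch: Callable[[str], bytes]) -> list[ArchiveResult]:
    """Fetch each URL and record a status plus a content hash."""
    results = []
    for url in urls:
        try:
            content = fetch(url)
        except Exception as exc:  # one failing URL should not stop the run
            results.append(ArchiveResult(url, f"failed: {exc}", ""))
            continue
        digest = hashlib.sha256(content).hexdigest()
        results.append(ArchiveResult(url, "archived", digest))
    return results


def write_report(results: Iterable[ArchiveResult]) -> str:
    """Serialise the results as a CSV report, one row per URL."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "status", "sha256"])
    for result in results:
        writer.writerow([result.url, result.status, result.sha256])
    return out.getvalue()


def fake_fetch(url: str) -> bytes:
    """Stand-in downloader: returns the URL as bytes, fails on 'broken'."""
    if "broken" in url:
        raise ValueError("unreachable")
    return url.encode("utf-8")


if __name__ == "__main__":
    source = "url\nhttps://example.com/a\nhttps://example.com/broken\n"
    print(write_report(archive_all(feed_urls_from_csv(source), fake_fetch)))
```

In this sketch the fetch function is passed in as a parameter so that the feeding, archiving, and reporting steps stay independent of how content is actually downloaded. The real tool configures its steps differently; see the Auto Archiver documentation for that.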

See the Auto Archiver documentation for more information.

Read the article about Auto Archiver on bellingcat.com.

Installation

View the Installation Guide for full instructions.

Advanced:

To get started quickly using Docker:

docker pull bellingcat/auto-archiver && docker run -it --rm -v secrets:/app/secrets bellingcat/auto-archiver --config secrets/orchestration.yaml

Or pip:

pip install auto-archiver && auto-archiver --help

Contributing

We welcome contributions to the Auto Archiver project! See the Contributing Guide for how to get involved!