From 71636233cbd537a6eca1cd5b47eb895abbbf89aa Mon Sep 17 00:00:00 2001
From: msramalho <19508417+msramalho@users.noreply.github.com>
Date: Tue, 10 Jun 2025 17:07:10 +0100
Subject: [PATCH] adds migration information and VkDropin info.

---
 .../source/how_to/upgrading_1_0_1_to_1_1_0.md | 40 +++++++++++++++++++
 .../modules/generic_extractor/__manifest__.py |  2 +
 2 files changed, 42 insertions(+)
 create mode 100644 docs/source/how_to/upgrading_1_0_1_to_1_1_0.md

diff --git a/docs/source/how_to/upgrading_1_0_1_to_1_1_0.md b/docs/source/how_to/upgrading_1_0_1_to_1_1_0.md
new file mode 100644
index 0000000..7e8d398
--- /dev/null
+++ b/docs/source/how_to/upgrading_1_0_1_to_1_1_0.md
@@ -0,0 +1,40 @@
+# Upgrading from v1.0.1
+
+```{note} This how-to is only relevant for people who used Auto Archiver before June 2025 (version 1.0.1 or earlier).
+
+If you are new to Auto Archiver, you are already using the latest configuration format, and this how-to does not apply to you.
+```
+
+Auto Archiver 1.1.0 and later introduce breaking changes to the configuration format, so configuration files written for earlier versions need a few small modifications before they will work.
+
+
+## Dropping `vk_extractor` module
+We have dropped the `vk_extractor` module because of problems with a third-party project it relied on. Remove it from your configuration file; otherwise, you will see an error like:
+
+```{code} console
+Module 'vk_extractor' not found. Are you sure it's installed/exists?
+```
+
+## New `antibot_extractor_enricher` module
+We have added a new `antibot_extractor_enricher` module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. To enable it, add it to your configuration file like this:
+
+```{code} yaml
+steps:
+  extractor_enrichers:
+    - antibot_extractor_enricher
+```
+
+It ships with a set of Dropins, which we will continue to add to and maintain.
+
+> Dropin: a module that is loaded automatically. You don't need to add Dropins to your configuration steps for them to run, although some of them need `authentication` settings.
+
+One such Dropin is the VkDropin, which uses the automated browser to access VKontakte (VK) pages. If VK blocks access and asks you to log in, add a VK username and password to your configuration file using the [authentication settings](authentication_how_to.md):
+
+```{code} yaml
+authentication:
+  vk:
+    username: your_username
+    password: your_password
+```
+
+You can find all available Dropins in [the source code](https://github.com/bellingcat/auto-archiver/tree/main/src/auto_archiver/modules/antibot_extractor_enricher/dropins). Like the VkDropin, each Dropin usually needs its own authentication settings.
\ No newline at end of file
diff --git a/src/auto_archiver/modules/generic_extractor/__manifest__.py b/src/auto_archiver/modules/generic_extractor/__manifest__.py
index 72db630..52cf8b8 100644
--- a/src/auto_archiver/modules/generic_extractor/__manifest__.py
+++ b/src/auto_archiver/modules/generic_extractor/__manifest__.py
@@ -30,6 +30,8 @@ For a full list of video platforms supported by `yt-dlp`, see the
 custom dropins can be created to handle additional websites and passed to the archiver
 via the command line using the `--dropins` option (TODO!).
 
+You can see all currently implemented dropins in [the source code](https://github.com/bellingcat/auto-archiver/tree/main/src/auto_archiver/modules/generic_extractor).
+
 ### Auto-Updates
 
 The Generic Extractor will also automatically check for updates to `yt-dlp` (every 5 days by default).