improves docs for how-to and migrations

pull/319/head
msramalho 2025-06-11 13:37:03 +01:00
parent 3cf51dd874
commit e567bba6f9
No known key found for this signature in database
7 changed files with 20 additions and 11 deletions


@@ -15,8 +15,16 @@ We have dropped the `vk_extractor` because of problems in a project we relied on
Module 'vk_extractor' not found. Are you sure it's installed/exists?
```
## Dropping `screenshot_enricher` module
We have dropped the `screenshot_enricher` module because a new `antibot_extractor_enricher` (see below) module replaces its functionality more robustly and with less dependency hassle on geckodriver/firefox. You will need to remove it from your configuration file, otherwise you will see an error like:
```{code} console
Module 'screenshot_enricher' not found. Are you sure it's installed/exists?
```
## New `antibot_extractor_enricher` module and VkDropin
-We have added a new `antibot_extractor_enricher` module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. You can add it to your configuration file like this:
+We have added a new [`antibot_extractor_enricher`](../modules/autogen/extractor/antibot_extractor_enricher.md) module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. You can add it to your configuration file like this:
```{code} yaml
steps:
@@ -28,6 +36,8 @@ steps:
- antibot_extractor_enricher
```
It will take a full page screenshot, a PDF capture, extract HTML source code, and any other relevant media.
It comes with Dropins that we will be adding and maintaining.
> Dropin: A module with site-specific behaviours that is loaded automatically. You don't need to add them to your configuration steps for them to run. Sometimes they need `authentication` configurations though.
@@ -36,7 +46,7 @@ One such Dropin is the VkDropin which uses this automated browser to access VKon
```{code} yaml
 authentication:
-  vk:
+  vk.com:
     username: your_username
     password: your_password
```
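
The migration notes above ask users to remove `vk_extractor` and `screenshot_enricher` from their configuration. As a rough illustration only, the check could be sketched as below. `find_dropped_modules` is a hypothetical helper, not part of auto-archiver, and the shape of the parsed configuration (a `steps` entry holding either a flat list or a mapping of step type to module names, with step types such as `extractors` and `enrichers`) is an assumption rather than something this diff confirms.

```python
# Hypothetical helper, not part of auto-archiver: it reports modules that the
# migration notes list as dropped but that a parsed config still references.
DROPPED_MODULES = {"vk_extractor", "screenshot_enricher"}


def find_dropped_modules(config: dict) -> list[str]:
    """Return dropped module names still referenced under `steps`, sorted."""
    steps = config.get("steps") or {}
    # Assumption: `steps` is either a flat list of module names or a mapping
    # from step type to a list of module names.
    groups = steps.values() if isinstance(steps, dict) else [steps]
    found = set()
    for group in groups:
        for name in group or []:
            if name in DROPPED_MODULES:
                found.add(name)
    return sorted(found)


# Example config as a plain dict; the step-type keys are assumed for illustration.
config = {
    "steps": {
        "extractors": ["vk_extractor", "generic_extractor"],
        "enrichers": ["screenshot_enricher", "antibot_extractor_enricher"],
    }
}
print(find_dropped_modules(config))  # ['screenshot_enricher', 'vk_extractor']
```

Any name the helper returns would need to be removed from the configuration file before upgrading, otherwise the "Module ... not found" error quoted in the notes is expected.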


@@ -7,8 +7,7 @@ from slugify import slugify
from auto_archiver.core.metadata import Metadata, Media
from auto_archiver.utils import url as UrlUtil, get_datetime_from_str
from auto_archiver.core.extractor import Extractor
-from .dropin import GenericDropin, InfoExtractor
+from auto_archiver.modules.generic_extractor.dropin import GenericDropin, InfoExtractor
class Twitter(GenericDropin):