improves docs for how-to and migrations

pull/319/head
msramalho 2025-06-11 13:37:03 +01:00
parent 3cf51dd874
commit e567bba6f9
No known key found for this signature in database
7 changed files with 20 additions and 11 deletions

View File

@@ -15,8 +15,16 @@ We have dropped the `vk_extractor` because of problems in a project we relied on
Module 'vk_extractor' not found. Are you sure it's installed/exists?
```
## Dropping `screenshot_enricher` module
We have dropped the `screenshot_enricher` module because a new `antibot_extractor_enricher` (see below) module replaces its functionality more robustly and with less dependency hassle on geckodriver/firefox. You will need to remove it from your configuration file, otherwise you will see an error like:
```{code} console
Module 'screenshot_enricher' not found. Are you sure it's installed/exists?
```
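As a rough pre-upgrade check, the sketch below looks for dropped module names in an already-parsed configuration. `find_dropped_modules` and `DROPPED_MODULES` are hypothetical names, not part of auto-archiver, and the layout of `steps` (step types mapping to lists of module names) is an assumption about the configuration file.

```python
# Hypothetical helper, not part of auto-archiver.
# It flags the two modules this migration guide says were dropped.
DROPPED_MODULES = {"vk_extractor", "screenshot_enricher"}


def find_dropped_modules(config: dict) -> list[str]:
    """Return dropped module names still referenced under `steps`, sorted."""
    steps = config.get("steps") or {}
    found = set()
    # Assumption: `steps` maps a step type to a list of module names
    # (a single module name given as a string is also accepted).
    for modules in steps.values():
        if isinstance(modules, str):
            modules = [modules]
        for name in modules or []:
            if name in DROPPED_MODULES:
                found.add(name)
    return sorted(found)


# Illustrative configuration; the step type names are assumptions.
example = {
    "steps": {
        "feeders": ["cli_feeder"],
        "extractors": ["vk_extractor", "antibot_extractor_enricher"],
        "enrichers": ["screenshot_enricher", "antibot_extractor_enricher"],
    }
}
print(find_dropped_modules(example))
```

Any name this reports would need to be removed from the configuration file before running, to avoid the "not found" errors shown above.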
## New `antibot_extractor_enricher` module and VkDropin
-We have added a new `antibot_extractor_enricher` module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. You can add it to your configuration file like this:
+We have added a new [`antibot_extractor_enricher`](../modules/autogen/extractor/antibot_extractor_enricher.md) module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. You can add it to your configuration file like this:
```{code} yaml
steps:
@@ -28,6 +36,8 @@ steps:
- antibot_extractor_enricher
```
It will take a full page screenshot, a PDF capture, extract HTML source code, and any other relevant media.
It comes with Dropins that we will be adding and maintaining.
> Dropin: A module with site-specific behaviours that is loaded automatically. You don't need to add them to your configuration steps for them to run. Sometimes they need `authentication` configurations though.
@@ -36,7 +46,7 @@ One such Dropin is the VkDropin which uses this automated browser to access VKon
```{code} yaml
authentication:
-vk:
+vk.com:
username: your_username
password: your_password
```

View File

@@ -7,8 +7,7 @@ from slugify import slugify
from auto_archiver.core.metadata import Metadata, Media
from auto_archiver.utils import url as UrlUtil, get_datetime_from_str
from auto_archiver.core.extractor import Extractor
from auto_archiver.modules.generic_extractor.dropin import GenericDropin, InfoExtractor
from .dropin import GenericDropin, InfoExtractor
class Twitter(GenericDropin): class Twitter(GenericDropin):