mirror of https://github.com/bellingcat/auto-archiver
improves docs for how-to and migrations
parent
3cf51dd874
commit
e567bba6f9
@@ -15,8 +15,16 @@ We have dropped the `vk_extractor` because of problems in a project we relied on
Module 'vk_extractor' not found. Are you sure it's installed/exists?
```

## Dropping `screenshot_enricher` module
We have dropped the `screenshot_enricher` module because the new `antibot_extractor_enricher` module (see below) replaces its functionality more robustly and with fewer dependency headaches around geckodriver/Firefox. You will need to remove it from your configuration file; otherwise you will see an error like:

```{code} console
Module 'screenshot_enricher' not found. Are you sure it's installed/exists?
```
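If you have several configuration files to migrate, a small script can strip the two dropped modules, `vk_extractor` and `screenshot_enricher`, for you. The sketch below is a hypothetical helper and is not part of auto-archiver. It assumes your configuration has already been parsed (for example with a YAML loader) into a dict whose top-level `steps` key maps step names to lists of module names. It edits those lists in place. The module names in the example config are illustrative only.

```python
from __future__ import annotations

from typing import Any

# Hypothetical helper, not part of auto-archiver: modules dropped in this release.
DROPPED_MODULES = {"vk_extractor", "screenshot_enricher"}


def remove_dropped_modules(config: dict[str, Any]) -> list[str]:
    """Remove dropped modules from every list under config["steps"], in place.

    Assumes config["steps"] maps step names to lists of module names.
    Returns the removed names in the order they were found.
    """
    removed: list[str] = []
    steps = config.get("steps") or {}
    for step_name, modules in list(steps.items()):
        if not isinstance(modules, list):
            continue  # leave anything that is not a list of names untouched
        removed.extend(m for m in modules if m in DROPPED_MODULES)
        steps[step_name] = [m for m in modules if m not in DROPPED_MODULES]
    return removed


# Example with an illustrative, already-parsed configuration.
config = {
    "steps": {
        "extractors": ["generic_extractor", "vk_extractor"],
        "enrichers": ["screenshot_enricher", "hash_enricher"],
    }
}
print(remove_dropped_modules(config))  # ['vk_extractor', 'screenshot_enricher']
print(config["steps"])  # {'extractors': ['generic_extractor'], 'enrichers': ['hash_enricher']}
```

You would then write the updated dict back out with your YAML library of choice.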
## New `antibot_extractor_enricher` module and VkDropin
We have added a new `antibot_extractor_enricher` module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. You can add it to your configuration file like this:
We have added a new [`antibot_extractor_enricher`](../modules/autogen/extractor/antibot_extractor_enricher.md) module that uses a computer-controlled browser to extract content from websites that use anti-bot measures. You can add it to your configuration file like this:

```{code} yaml
steps:
@@ -28,6 +36,8 @@ steps:
- antibot_extractor_enricher
```

It takes a full-page screenshot and a PDF capture, and extracts the HTML source code and any other relevant media.

It ships with Dropins, which we will continue to add and maintain.

> Dropin: a module with site-specific behaviours that is loaded automatically. You don't need to add Dropins to your configuration `steps` for them to run, although some of them need an `authentication` configuration.
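To illustrate what "loaded automatically" can mean in practice, here is a minimal sketch of domain-based dropin selection. It is an assumption for illustration only and does not reproduce auto-archiver's actual mechanism. The `Dropin` base class, the `domains` attribute, `REGISTRY` and `find_dropin` are all hypothetical names. In this sketch, each dropin declares the sites it handles, and the matching one is picked from the URL's hostname, so it never has to appear in `steps`.

```python
from __future__ import annotations

from urllib.parse import urlparse


class Dropin:
    """Hypothetical base class: a dropin lists the domains it knows how to handle."""

    domains: tuple[str, ...] = ()


class VkDropin(Dropin):
    """Illustrative stand-in for a VK-specific dropin."""

    domains = ("vk.com",)


# Hypothetical registry: every dropin is known up front, so none of them
# has to be listed in the configuration `steps`.
REGISTRY: list[type[Dropin]] = [VkDropin]


def find_dropin(url: str) -> Dropin | None:
    """Return an instance of the first registered dropin matching the URL's host."""
    host = (urlparse(url).hostname or "").lower()
    for dropin_cls in REGISTRY:
        for domain in dropin_cls.domains:
            # Match the domain itself or any subdomain such as m.vk.com.
            if host == domain or host.endswith("." + domain):
                return dropin_cls()
    return None


print(type(find_dropin("https://vk.com/wall-1_2")).__name__)  # VkDropin
print(find_dropin("https://example.org/post"))  # None
```

Matching on the parsed hostname rather than a substring of the URL keeps look-alike hosts such as `notvk.com` from being picked up.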
@@ -36,7 +46,7 @@ One such Dropin is the VkDropin which uses this automated browser to access VKon
```{code} yaml
authentication:
vk:
vk.com:
username: your_username
password: your_password
```
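Because the `authentication` block is keyed by site, something has to match each URL against those keys. The sketch below is a hypothetical `credentials_for` helper, not auto-archiver's code. It assumes the parsed `authentication` section is a dict keyed by domain (such as the `vk.com` key above). It ignores a leading `www.` and falls back from a subdomain such as `m.vk.com` to its parent domain.

```python
from __future__ import annotations

from urllib.parse import urlparse


def credentials_for(authentication: dict[str, dict[str, str]], url: str) -> dict[str, str] | None:
    """Hypothetical lookup of site credentials from a domain-keyed mapping.

    Tries the URL's full host first, then each parent domain
    (m.vk.com -> vk.com), ignoring a leading "www.".
    """
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    labels = host.split(".")
    # Stop before the bare top-level domain (e.g. "com").
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in authentication:
            return authentication[candidate]
    return None


# Assumes the `vk.com` key from the example above, already parsed into a dict.
authentication = {"vk.com": {"username": "your_username", "password": "your_password"}}
print(credentials_for(authentication, "https://m.vk.com/id1"))
# {'username': 'your_username', 'password': 'your_password'}
```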
@@ -7,8 +7,7 @@ from slugify import slugify
from auto_archiver.core.metadata import Metadata, Media
from auto_archiver.utils import url as UrlUtil, get_datetime_from_str
from auto_archiver.core.extractor import Extractor

from .dropin import GenericDropin, InfoExtractor
from auto_archiver.modules.generic_extractor.dropin import GenericDropin, InfoExtractor


class Twitter(GenericDropin):
|
||||
|
|
Ładowanie…
Reference in New Issue