mirror of https://github.com/bellingcat/auto-archiver
Use 'Auto Archiver' naming for consistency.
auto-archiver is reserved in the docs for when talking about the command line usage
pull/211/head
parent 061f29c885
commit 40488e0869
@@ -1,8 +1,8 @@
 # Module Documentation

-These pages describe the core modules that come with `auto-archiver` and provide the main functionality for archiving websites on the internet. There are five core module types:
+These pages describe the core modules that come with Auto Archiver and provide the main functionality for archiving websites on the internet. There are five core module types:

-1. Feeders - these 'feed' information (the URLs) from various sources to the `auto-archiver` for processing
+1. Feeders - these 'feed' information (the URLs) from various sources to the Auto Archiver for processing
 2. Extractors - these 'extract' the page data for a given URL that is fed in by a feeder
 3. Enrichers - these 'enrich' the data extracted in the previous step with additional information
 4. Storage - these 'store' the data in a persistent location (on disk, Google Drive etc.)

@@ -1,6 +1,6 @@
 # Creating Your Own Modules

-Modules are what's used to extend `auto-archiver` to process different websites or media, and/or transform the data in a way that suits your needs. In most cases, the [Core Modules](../core_modules.md) should be sufficient for every day use, but the most common use-cases for making your own Modules include:
+Modules are what's used to extend Auto Archiver to process different websites or media, and/or transform the data in a way that suits your needs. In most cases, the [Core Modules](../core_modules.md) should be sufficient for every day use, but the most common use-cases for making your own Modules include:

 1. Extracting data from a website which doesn't work with the current core extractors.
 2. Enriching or altering the data before saving with additional information that the core enrichers do not offer.

@@ -21,7 +21,7 @@ When done, you should have a module structure as follows:
 │ └── awesome_extractor.py
 ```

-Check out the [core modules](https://github.com/bellingcat/auto-archiver/tree/main/src/auto_archiver/modules) in the `auto-archiver` repository for examples of the folder structure for real-world modules.
+Check out the [core modules](https://github.com/bellingcat/auto-archiver/tree/main/src/auto_archiver/modules) in the Auto Archiver repository for examples of the folder structure for real-world modules.

 ## Populating the Manifest File


@@ -1,6 +1,6 @@
 # How-To Guides

-The follow pages contain helpful how-to guides for comon use cases of the Auto-Archiver.
+The follow pages contain helpful how-to guides for common use cases of the Auto Archiver.
 ---

 ```{toctree}

@@ -1,8 +1,8 @@
 # Upgrading to 0.13 Configuration Format

-```{note} This how-to is only relevant for people who used Auto-Archiver before February 2025 (versions prior to 0.13).
+```{note} This how-to is only relevant for people who used Auto Archiver before February 2025 (versions prior to 0.13).

-If you are new to Auto-Archiver, then you are already using the latest configuration format and this how-to is not relevant for you.
+If you are new to Auto Archiver, then you are already using the latest configuration format and this how-to is not relevant for you.
 ```

 Version 0.13 of Auto Archiver has breaking changes in the configuration format, which means earlier configuration formats will not work without slight modifications.

@@ -55,7 +55,7 @@ steps:
 - html_formatter
 ```

-```{note} Auto-Archiver still only supports one feeder and formatter, but from v0.13 onwards they must be added to the configuration file as a list.
+```{note} Auto Archiver still only supports one feeder and formatter, but from v0.13 onwards they must be added to the configuration file as a list.
 ```

 2. Extractor (formerly Archiver) Steps Settings

@@ -1,8 +1,8 @@
 # Feeder Modules

-Feeder modules are used to feed URLs into the `auto-archiver` for processing. Feeders can take these URLs from a variety of sources, such as a file, a database, or the command line.
+Feeder modules are used to feed URLs into the Auto Archiver for processing. Feeders can take these URLs from a variety of sources, such as a file, a database, or the command line.

-The default feeder is the command line feeder (`cli_feeder`), which allows you to input URLs directly into the `auto-archiver` from the command line.
+The default feeder is the command line feeder (`cli_feeder`), which allows you to input URLs directly into `auto-archiver` from the command line.

 Command line feeder usage:
 ```{code} bash

@@ -13,7 +13,7 @@ from abc import abstractmethod
 from auto_archiver.core import Metadata, BaseModule

 class Enricher(BaseModule):
-    """Base classes and utilities for enrichers in the Auto-Archiver system.
+    """Base classes and utilities for enrichers in the Auto Archiver system.

     Enricher modules must implement the `enrich` method to define their behavior.
     """

@@ -1,5 +1,5 @@
 {
-"name": "Auto-Archiver API Database",
+"name": "Auto Archiver API Database",
 "type": ["database"],
 "entry_point": "api_db::AAApiDb",
 "requires_setup": True,

@@ -39,7 +39,7 @@
 },
 },
 "description": """
-Provides integration with the Auto-Archiver API for querying and storing archival data.
+Provides integration with the Auto Archiver API for querying and storing archival data.

 ### Features
 - **API Integration**: Supports querying for existing archives and submitting results.

@@ -49,6 +49,6 @@
 - **Optional Storage**: Archives results conditionally based on configuration.

 ### Setup
-Requires access to an Auto-Archiver API instance and a valid API token.
+Requires access to an Auto Archiver API instance and a valid API token.
 """,
 }
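The naming rule in the commit message ("Auto Archiver" in prose, `auto-archiver` reserved for command line usage) can be checked mechanically. A minimal sketch, assuming the docs and manifests are `.md` and `.py` files: it flags lines that still use the hyphenated, capitalised "Auto-Archiver" spelling and deliberately ignores lowercase `auto-archiver`. The helpers `find_old_spelling` and `scan_tree` are hypothetical and are not part of this commit or the repository.

```python
import re
from pathlib import Path

# Hyphenated, capitalised spelling that this commit replaces with "Auto Archiver".
# Matching is case-sensitive, so the lowercase `auto-archiver` (kept for the
# command line tool) is never reported.
OLD_SPELLING = re.compile(r"\bAuto-Archiver\b")


def find_old_spelling(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) pairs that still use 'Auto-Archiver'."""
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if OLD_SPELLING.search(line)
    ]


def scan_tree(
    root: str, suffixes: tuple[str, ...] = (".md", ".py")
) -> dict[str, list[tuple[int, str]]]:
    """Hypothetical helper: scan files under `root` whose suffix is in `suffixes`."""
    hits: dict[str, list[tuple[int, str]]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            found = find_old_spelling(
                path.read_text(encoding="utf-8", errors="replace")
            )
            if found:
                hits[str(path)] = found
    return hits
```

Calling `scan_tree("docs")` (assuming the documentation lives under a `docs/` directory) returns a dict mapping each matching file path to its offending lines. A regex cannot tell whether a backticked `` `auto-archiver` `` refers to the project or to the CLI, which is the distinction this diff draws by hand, so those cases still need a human review.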