# Upgrading from v0.12
```{note} This how-to is only relevant for people who used Auto Archiver before February 2025 (versions prior to 0.13).
If you are new to Auto Archiver, then you are already using the latest configuration format and this how-to is not relevant for you.
```
Versions 0.13 and later of Auto Archiver have breaking changes in the configuration format, which means earlier configuration files will not work without slight modifications.
## How do I know if I need to update my configuration format?
There are two simple ways to check if you need to update your format:
1. When you try to run auto-archiver using your existing configuration file, you get an error about no feeders or formatters being configured, such as:
```{code} console
AssertionError: No feeders were configured. Make sure to set at least one feeder in
your configuration file or on the command line (using --feeders)
```
2. Your configuration file contains a `feeder:` option, which belongs to the old format. An example of the old format:
```{code} yaml
steps:
  feeder: cli_feeder
  ...
```
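
The second check can also be scripted. The following is a minimal sketch, assuming the configuration file has already been parsed into a Python dictionary (for example with PyYAML's `yaml.safe_load`). The function name `uses_old_format` is hypothetical and not part of Auto Archiver; the key names it looks for are the old ones described in this guide.

```python
def uses_old_format(config: dict) -> bool:
    """Return True if the parsed config still uses pre-0.13 step names."""
    steps = config.get("steps") or {}
    # `feeder` and `formatter` (singular) and `archivers` are the old key
    # names covered in this guide; v0.13+ expects `feeders`, `formatters`
    # and `extractors` instead.
    old_keys = {"feeder", "formatter", "archivers"}
    return any(key in steps for key in old_keys)


old = {"steps": {"feeder": "cli_feeder", "formatter": "html_formatter"}}
new = {"steps": {"feeders": ["cli_feeder"], "formatters": ["html_formatter"]}}
print(uses_old_format(old))  # True
print(uses_old_format(new))  # False
```
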
The next two sections outline the two methods you have for updating your file.
## 1. Manually edit the configuration file and change the values.
This is recommended if you want to keep all your old settings. Follow the steps below to change the relevant settings:
#### a) Feeder & Formatter Steps Settings
The feeder and formatter settings have been changed from a single string to a list.
- `steps.feeder (string)` → `steps.feeders (list)`
- `steps.formatter (string)` → `steps.formatters (list)`

Example:
```{code} yaml
steps:
  feeder: cli_feeder
  ...
  formatter: html_formatter

# the above should be changed to:
steps:
  feeders:
    - cli_feeder
  ...
  formatters:
    - html_formatter
```
```{note} Auto Archiver still only supports one feeder and formatter, but from v0.13 onwards they must be added to the configuration file as a list.
```
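
If you have several configuration files to update, the change above can be applied programmatically. This is only a sketch, assuming the `steps` section has been loaded into a Python dictionary; `listify_steps` is a hypothetical helper name, not an Auto Archiver function.

```python
def listify_steps(steps: dict) -> dict:
    """Return a copy of `steps` with `feeder`/`formatter` converted to lists."""
    updated = dict(steps)
    for old_key, new_key in (("feeder", "feeders"), ("formatter", "formatters")):
        if old_key in updated:
            # Wrap the single module name in a one-item list under the new key.
            updated[new_key] = [updated.pop(old_key)]
    return updated


old_steps = {"feeder": "cli_feeder", "formatter": "html_formatter"}
print(listify_steps(old_steps))
# {'feeders': ['cli_feeder'], 'formatters': ['html_formatter']}
```

The helper returns a new dictionary and leaves the input untouched, so the original settings remain available for comparison.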
#### b) Extractor (formerly Archiver) Steps Settings
With v0.13 of Auto Archiver, `archivers` have been renamed to `extractors` to better reflect what they actually do - extract information from a URL. Change the configuration by renaming:
- `steps.archivers` → `steps.extractors`

The names of the actual modules have also changed, so for any extractor modules you have enabled, you will need to rename the `archiver` part to `extractor`. Some examples:
- `telethon_archiver` → `telethon_extractor`
- `wacz_archiver_enricher` → `wacz_extractor_enricher`
- `wayback_archiver_enricher` → `wayback_extractor_enricher`
- `vk_archiver` → `vk_extractor`

#### c) Module Renaming
The `youtube_archiver` has been renamed to `generic_extractor` as it is considered the default/fallback extractor. Read more about the [generic extractor](../modules/autogen/extractor/generic_extractor.md).
The `atlos` modules have been merged into one, as have the `gsheets` feeder and database.
- `atlos_feeder` → `atlos_feeder_db_storage`
- `atlos_storage` → `atlos_feeder_db_storage`
- `atlos_db` → `atlos_feeder_db_storage`
- `gsheet_feeder` → `gsheet_feeder_db`
- `gsheet_db` → `gsheet_feeder_db`

Example:
```{code} yaml
steps:
  feeders:
    - gsheet_feeder_db # formerly gsheet_feeder
  ...
  extractors: # formerly 'archivers'
    - telethon_extractor # formerly telethon_archiver
    - generic_extractor # formerly youtube_archiver
    - vk_extractor # formerly vk_archiver
  databases:
    - gsheet_feeder_db # formerly gsheet_db
  ...
```
````{note}
Don't forget to also rename the configuration settings. For example:

```{code} yaml
gsheet_feeder_db: # formerly gsheet_feeder
  service_account: secrets/service_account.json
  sheet: My Google Sheet
  ...
```
````
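
The renames from sections b) and c) can be collected into a single lookup table. The sketch below assumes the configuration has been parsed into a Python dictionary and that the entries under `steps` are already lists; `RENAMES` and `rename_modules` are hypothetical names used for illustration. Merging the settings blocks of modules that were combined (for example `gsheet_feeder` and `gsheet_db`) into one block is also an assumption, so review the result by hand.

```python
# Old module name -> new module name, taken from the lists in this guide.
RENAMES = {
    "telethon_archiver": "telethon_extractor",
    "wacz_archiver_enricher": "wacz_extractor_enricher",
    "wayback_archiver_enricher": "wayback_extractor_enricher",
    "vk_archiver": "vk_extractor",
    "youtube_archiver": "generic_extractor",
    "atlos_feeder": "atlos_feeder_db_storage",
    "atlos_storage": "atlos_feeder_db_storage",
    "atlos_db": "atlos_feeder_db_storage",
    "gsheet_feeder": "gsheet_feeder_db",
    "gsheet_db": "gsheet_feeder_db",
}


def rename_modules(config: dict) -> dict:
    """Return a copy of `config` with old module names replaced by new ones."""
    renamed = {}
    for key, value in config.items():
        if key == "steps":
            steps = {}
            for step, modules in value.items():
                # The `archivers` step itself was renamed to `extractors`.
                step = "extractors" if step == "archivers" else step
                if isinstance(modules, list):
                    modules = [RENAMES.get(m, m) for m in modules]
                steps[step] = modules
            renamed["steps"] = steps
            continue
        # Top-level settings blocks are keyed by module name, so rename them too.
        new_key = RENAMES.get(key, key)
        if isinstance(value, dict) and isinstance(renamed.get(new_key), dict):
            # Two old modules were merged into one: combine their settings.
            renamed[new_key] = {**renamed[new_key], **value}
        else:
            renamed[new_key] = value
    return renamed


old = {
    "steps": {
        "feeders": ["gsheet_feeder"],
        "archivers": ["telethon_archiver", "youtube_archiver"],
        "databases": ["gsheet_db"],
    },
    "gsheet_feeder": {"sheet": "My Google Sheet"},
}
print(rename_modules(old))
```
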
#### d) Redundant / Obsolete Modules
With v0.13 of Auto Archiver, the following modules have been removed and their features have been built into the `generic_extractor`. You should remove them from the `steps` section of your configuration file:
* `twitter_archiver` - use the `generic_extractor` for general extraction, or the `twitter_api_extractor` for API access.
* `tiktok_archiver` - use the `generic_extractor` to extract TikTok videos.
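
Removing these entries can be sketched as a simple filter over the `steps` section. As before, this assumes the section has been loaded into a Python dictionary; `OBSOLETE` and `drop_obsolete` are hypothetical names. The sketch only removes the old entries and does not add `generic_extractor` for you.

```python
# Modules removed in v0.13, as listed in this guide.
OBSOLETE = {"twitter_archiver", "tiktok_archiver"}


def drop_obsolete(steps: dict) -> dict:
    """Return a copy of `steps` without the removed modules."""
    cleaned = {}
    for step, modules in steps.items():
        if isinstance(modules, list):
            # Keep every module that is not in the obsolete set.
            modules = [m for m in modules if m not in OBSOLETE]
        cleaned[step] = modules
    return cleaned


steps = {
    "extractors": [
        "twitter_archiver",
        "telethon_extractor",
        "tiktok_archiver",
        "generic_extractor",
    ]
}
print(drop_obsolete(steps))
# {'extractors': ['telethon_extractor', 'generic_extractor']}
```
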
## 2. Auto-generate a new config, then copy over your settings.
With this method, Auto Archiver auto-generates a configuration file for you, and you then copy over the desired settings from your old config file. This is probably the easiest and quickest method to set up, but it may require some trial and error as you copy over your settings.
First, move your existing `orchestration.yaml` file to a different folder or rename it.
Then, you can generate a `simple` or `full` config using:
```{code} console
>>> # generate a simple config
>>> auto-archiver
>>> # config will be written to orchestration.yaml
>>>
>>> # generate a full config
>>> auto-archiver --mode=full
>>>
```
After this, copy over any settings from your old config to the new config.