Various docs improvements based on Friday Office Hours discussion

2025-03-17 09:23:43 +00:00 · 2025-03-17 09:23:43 +00:00 · 42162c5e3f
commit 42162c5e3f
--- a/docs/source/installation/faq.md
+++ b/docs/source/installation/faq.md
@ -0,0 +1,60 @@
+# Frequently Asked Questions
+
+
+### Q: What websites does the Auto Archiver support?
+**A:** The Auto Archiver works for a large variety of sites. Firstly, the Auto Archiver can download
+and archive any video website supported by yt-dlp, a powerful video-downloading tool ([full list of
+sites here](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)). Aside from these sites,
+there are dedicated 'Extractors' for specific websites. See the full list of available extractors
+on the [extractors](../modules/extractor.md) page. Some of the supported sites include:
+
+* Twitter
+* Instagram
+* Telegram
+* VKontakte
+* TikTok
+* Bluesky
+
+```{note} Which websites the Auto Archiver can archive depends on which extractors you have enabled in
+your configuration. See [configuration](./configurations.md) for more info.
+```
+
+### Q: Does the Auto Archiver only work for social media posts?
+**A:** No, the Auto Archiver can archive any web page on the internet, not just social media posts.
+However, for social media posts the Auto Archiver can extract more relevant and useful information (such as
+post comments, likes and author) which may not be available for a generic website. If you want to archive
+web pages more generally, make sure to enable the [](../modules/autogen/extractor/wacz_extractor_enricher.md)
+and the [](../modules/autogen/extractor/wayback_extractor_enricher.md).
+
+### Q: What kind of data is stored for each webpage that's archived?
+**A:** This depends on the website archived, but more generally, for social media posts any videos and photos in
+the post will be archived. For video sites, the video will be downloaded separately. For most of these sites, additional
+metadata such as published date, uploader/author and ratings/comments will also be saved. Additionally, further data can be
+saved depending on the enrichers that you have enabled. Some other types of data saved are timestamps if you have the 
+[](../modules/autogen/enricher/timestamping_enricher.md) or [](../modules/autogen/enricher/opentimestamps_enricher.md) enabled,
+screenshots of the web page with the [](../modules/autogen/enricher/screenshot_enricher.md), and for videos, thumbnails of the
+video with the [](../modules/autogen/enricher/thumbnail_enricher.md). You can also store hashes (SHA-256 or PDQ)
+with the various hash enrichers.
+
+### Q: Where is my data stored?
+**A:** With the default configuration, data is stored on your local computer in the `local_storage` folder. You can adjust these settings by
+changing the [storage modules](../modules/storage.md) you have enabled. For example, you could choose to store your data in an S3 bucket or 
+on Google Drive. 
+
+```{note}
+You can choose to store your data in multiple places, for example your local drive **and** an S3 bucket for redundancy.
+```
+
+### Q: What should I do if something doesn't work?
+**A:** First, read through the log files to see if you can find a specific reason why something isn't working. Learn more about logging
+and how to enable debug logging in the [Logging Howto](../how_to/logging.md).
+
+If you cannot find an answer in the logs, then try searching this documentation or the open and closed issues on the [GitHub Issue Tracker](https://github.com/bellingcat/auto-archiver/issues?q=is%3Aissue%20). If you still cannot find an answer, then consider opening an issue on the GitHub Issue Tracker or asking in the Bellingcat Discord
+'Auto Archiver' group.
+
+#### Common reasons why archiving might not work:
+
+* The website may have temporarily adjusted its settings - sometimes sites like Telegram or Twitter adjust their scraping protection settings. Often,
+waiting a day or two and then trying again can work.
+* The site requires you to be logged in - make sure the 
+* The website you're trying to archive has changed its settings/structure. Make sure you're using the latest version of Auto Archiver and try again.
--- a/docs/source/installation/installation.md
+++ b/docs/source/installation/installation.md
@ -1,5 +1,11 @@
 # Installation

+```{toctree}
+:maxdepth: 1
+
+upgrading.md
+```
+
 There are 3 main ways to use the auto-archiver. We recommend the 'docker' method for most uses. This installs all the requirements in one command.

 1. Easiest (recommended): [via docker](#installing-with-docker)
--- a/docs/source/installation/setup.md
+++ b/docs/source/installation/setup.md
@ -1,7 +1,6 @@
 # Getting Started

 ```{toctree}
-:maxdepth: 1
 :hidden:

 installation.md
@ -9,6 +8,7 @@ configurations.md
 config_editor.md
 authentication.md
 requirements.md
+faq.md
 config_cheatsheet.md
 ```

--- a/docs/source/installation/upgrading.md
+++ b/docs/source/installation/upgrading.md
@ -0,0 +1,30 @@
+
+# Upgrading
+
+If an update is available, then you will see a message in the logs when you
+run Auto Archiver. Here's what those logs look like:
+
+```{code} bash
+********* IMPORTANT: UPDATE AVAILABLE ********
+A new version of auto-archiver is available (v0.13.6, you have 0.13.4)
+Make sure to update to the latest version using: `pip install --upgrade auto-archiver`
+```
+
+Upgrading Auto Archiver depends on the way you installed it.
+
+## Docker
+
+To upgrade using docker, update the docker image with:
+
+```
+docker pull bellingcat/auto-archiver:latest 
+```
+
+## Pip
+
+To upgrade the pip package, use:
+
+```
+pip install --upgrade auto-archiver
+```
+
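Reviewer note on `upgrading.md` (not part of the patch): the update notice quoted in the new page has a regular shape, so a script could read both version numbers from it and confirm the installed version after upgrading. The sketch below is only an illustration. The names `parse_update_notice`, `version_tuple` and `installed_version` are hypothetical, and the regular expression is modelled solely on the sample notice in this diff, so the real message format may differ between releases. The package name `auto-archiver` is taken from the `pip install --upgrade auto-archiver` command above.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Pattern modelled on the sample notice in upgrading.md (an assumption,
# not a documented format): "... available (v0.13.6, you have 0.13.4)".
NOTICE_RE = re.compile(
    r"available \(v?(?P<latest>\d+(?:\.\d+)*), you have v?(?P<current>\d+(?:\.\d+)*)\)"
)


def parse_update_notice(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (latest, current) if the update notice appears in log_text, else None."""
    match = NOTICE_RE.search(log_text)
    if match is None:
        return None
    return match.group("latest"), match.group("current")


def version_tuple(version_string: str) -> Tuple[int, ...]:
    """Turn '0.13.6' into (0, 13, 6) so that versions compare numerically."""
    return tuple(int(part) for part in version_string.split("."))


def installed_version(package: str = "auto-archiver") -> Optional[str]:
    """Return the installed version of a pip package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# The notice text below is copied from the sample in upgrading.md.
sample = (
    "********* IMPORTANT: UPDATE AVAILABLE ********\n"
    "A new version of auto-archiver is available (v0.13.6, you have 0.13.4)\n"
)

result = parse_update_notice(sample)
if result is not None:
    latest, current = result
    if version_tuple(latest) > version_tuple(current):
        print(f"Upgrade needed: {current} -> {latest}")

# After upgrading with pip, this reports what is actually installed (or None).
print("Installed auto-archiver version:", installed_version())
```

Comparing integer tuples rather than raw strings matters here: as strings, `"0.9.0"` sorts after `"0.13.4"`, which would hide a pending upgrade. The tuple conversion assumes purely numeric version parts and would need adjusting for pre-release suffixes.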
--- a/src/auto_archiver/modules/generic_extractor/manifest.py
+++ b/src/auto_archiver/modules/generic_extractor/manifest.py
@ -15,6 +15,9 @@ supported by `yt-dlp`, such as YouTube, Facebook, and others. It provides functi
 for retrieving videos, subtitles, comments, and other metadata, and it integrates with
 the broader archiving framework.

+For a full list of video platforms supported by `yt-dlp`, see the
+[official documentation](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md).
+
 ### Features
 - Supports downloading videos and playlists.
 - Retrieves metadata like titles, descriptions, upload dates, and durations.