mirror of https://github.com/bellingcat/auto-archiver
Merge pull request #222 from bellingcat/feat/yt-dlp-pots
yt-dlp proposed extractor_args and PO Token client.
pull/287/head
commit
1d18399d70
|
@ -106,5 +106,117 @@ Finally,Some important things to remember:
|
|||
|
||||
## Authenticating on XXXX site with username/password
|
||||
|
||||
```{note} This section is still under construction 🚧
|
||||
```{note}
|
||||
This section is still under construction 🚧
|
||||
```
|
||||
|
||||
|
||||
# Proof of Origin Tokens
|
||||
|
||||
YouTube uses **Proof of Origin Tokens (POT)** as part of its bot detection system to verify that requests originate from valid clients. If a token is missing or invalid, some videos may return errors like "Sign in to confirm you're not a bot."
|
||||
|
||||
yt-dlp provides [a detailed guide to POTs](https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide).
|
||||
|
||||
### How Auto Archiver Uses POT
|
||||
This feature is enabled for the Generic Archiver via two yt-dlp plugins:
|
||||
|
||||
- **Client-side plugin**: [yt-dlp-get-pot](https://github.com/coletdjnz/yt-dlp-get-pot)
|
||||
Detects when a token is required and requests one from a provider.
|
||||
|
||||
- **Provider plugin**: [bgutil-ytdlp-pot-provider](https://github.com/Brainicism/bgutil-ytdlp-pot-provider)
|
||||
Includes both a Python plugin and a **Node.js server or script** to generate the token.
|
||||
|
||||
These are installed in our Poetry environment.
|
||||
|
||||
### Integration Methods
|
||||
|
||||
**Docker (Recommended)**:
|
||||
|
||||
When running the Auto Archiver using the Docker image, we use the [Node.js token generation script](https://github.com/Brainicism/bgutil-ytdlp-pot-provider/tree/master/server).
|
||||
This is to avoid managing a separate server process, and is handled automatically inside the Docker container when needed.
|
||||
|
||||
The script is already included in the Docker image. If you need to disable it, set the config option `bguils_po_token_method` under the `generic_extractor` section of your `orchestration.yaml` config file to "disabled".
|
||||
```yaml
|
||||
generic_extractor:
|
||||
bguils_po_token_method: "disabled"
|
||||
```
|
||||
|
||||
**PyPI / Local**:
|
||||
|
||||
When using the Auto Archiver PyPI package, or running locally, you need additional system dependencies to run the token generation script: either Docker, or Node.js and Yarn.
|
||||
|
||||
See the [bgutil-ytdlp-pot-provider](https://github.com/Brainicism/bgutil-ytdlp-pot-provider?tab=readme-ov-file#a-http-server-option) documentation for more details.
|
||||
|
||||
⚠️WARNING⚠️: The script method adds the server scripts to the home directory of the user running the Auto Archiver (under `~/bgutil-ytdlp-pot-provider`).
|
||||
|
||||
- You can set the config option `bguils_po_token_method` under the `generic_extractor` section of your `orchestration.yaml` config file to "script" to enable the token generation script process locally.
|
||||
- Alternatively you can run the bgutil-ytdlp-pot-provider server separately using their Docker image or Node.js server.
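
Before switching to the "script" method locally, it can help to confirm that the Node.js tooling is on your `PATH`. The sketch below mirrors the check the generic extractor performs for `node`, `yarn`, and `npx`. The function name `missing_tools` and its `which` parameter are hypothetical and only used for illustration.

```python
import shutil


def missing_tools(required=("node", "yarn", "npx"), which=shutil.which):
    """Return the names in `required` that cannot be found on PATH.

    `which` is injectable (hypothetical parameter) so the check can be
    exercised without touching the real environment.
    """
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools for the PO Token script:", ", ".join(missing))
    else:
        print("All tools needed for the PO Token script were found.")
```

If any tool is reported missing, install it or run the bgutil-ytdlp-pot-provider server separately instead.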
|
||||
|
||||
### Notes
|
||||
|
||||
- The token generation script is only triggered when needed by yt-dlp, so it should have no effect unless YouTube requests a POT.
|
||||
- If you're running the Auto Archiver in Docker, this is set up automatically.
|
||||
- If you're running locally, you'll need to run the setup script manually or enable the feature in your config.
|
||||
- You can set up both the server and the script, and the plugin will fall back from one to the other if needed. This is recommended for robustness!
|
||||
|
||||
### Configurations:
|
||||
|
||||
## Configurations Summary
|
||||
|
||||
| Option | Behavior | Docker Default? |
|
||||
|------------| ------------------------------------------------------------------------------------------------------------------------------------------ | --------------- |
|
||||
| `auto` | Docker: Automatically downloads and uses the token generation script. Local: Does nothing; assumes a separate server is running externally. | ✅ Yes |
|
||||
| `script` | Explicitly downloads and uses the token generation script, even locally. | ❌ No |
|
||||
| `disabled` | Disables token generation completely. | ❌ No |
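
The table can be read as a small decision function. The following is a minimal sketch of that logic, assuming the Docker image is detected through a boolean flag; the function name `should_set_up_script` is hypothetical and is not part of the Auto Archiver API.

```python
def should_set_up_script(method: str, in_docker: bool) -> bool:
    """Decide whether the token generation script should be set up.

    `method` is the value of `bguils_po_token_method`; `in_docker`
    indicates whether the Auto Archiver runs inside the Docker image.
    """
    if method == "disabled":
        # Token generation is switched off everywhere.
        return False
    if method == "auto":
        # Docker sets the script up; locally an external server is assumed.
        return in_docker
    if method == "script":
        # Explicit opt-in, regardless of environment.
        return True
    raise ValueError(f"Unknown bguils_po_token_method: {method!r}")
```

In other words, only `auto` behaves differently inside and outside Docker.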
|
||||
|
||||
Example configuration:
|
||||
|
||||
|
||||
```yaml
|
||||
generic_extractor:
|
||||
# ...
|
||||
bguils_po_token_method: "script"
|
||||
# For debugging add the verbose flag here:
|
||||
ytdlp_args: "--no-abort-on-error --abort-on-error --verbose"
|
||||
|
||||
```
|
||||
|
||||
**Advanced Configuration:**
|
||||
|
||||
If you change the default port of the bgutil-ytdlp-pot-provider server, you can pass the updated values using our `extractor_args` option for the generic extractor.
|
||||
|
||||
```yaml
|
||||
generic_extractor:
|
||||
ytdlp_args: "--no-abort-on-error --abort-on-error --verbose"
|
||||
ytdlp_update_interval: 5
|
||||
bguils_po_token_method: "script"
|
||||
extractor_args:
|
||||
youtube:
|
||||
getpot_bgutil_baseurl: "http://127.0.0.1:8080"
|
||||
player_client: web,tv
|
||||
```
|
||||
For more details on these options, see the [bgutil-ytdlp-pot-provider usage documentation](https://github.com/Brainicism/bgutil-ytdlp-pot-provider?tab=readme-ov-file#usage).
|
||||
|
||||
### Checking the logs
|
||||
|
||||
To verify that the POT process is working, look for the following lines in your log after adding the config option:
|
||||
|
||||
```shell
|
||||
[GetPOT] BgUtilScript: Generating POT via script: /Users/you/bgutil-ytdlp-pot-provider/server/build/generate_once.js
|
||||
[debug] [GetPOT] BgUtilScript: Executing command to get POT via script: /Users/you/.nvm/versions/node/v20.18.0/bin/node /Users/you/bgutil-ytdlp-pot-provider/server/build/generate_once.js -v ymCMy8OflKM
|
||||
[debug] [GetPOT] BgUtilScript: stdout:
|
||||
{"poToken":"MlMxojNFhEJvUzGeHEkVRSK_luXtwcDnwSNIOgaUutqB7t99nmlNvtWgYayboopG6ZopZgmQ-6PJCWEMHv89MIiFGGlJRY25Fkwzxmia_8uYgf5AWf==","generatedAt":"2025-03-26T10:45:26.156Z","visitIdentifier":"ymCMy8OflKM"}
|
||||
[debug] [GetPOT] Fetching gvs PO Token for tv client
|
||||
```
|
||||
|
||||
If the plugin cannot find the script or reach the server, you'll see something like this:
|
||||
```shell
|
||||
[debug] [GetPOT] Fetching player PO Token for tv client
|
||||
WARNING: [GetPOT] BgUtilScript: Script path doesn't exist: /Users/you/bgutil-ytdlp-pot-provider/server/build/generate_once.js. Please make sure the script has been transpiled correctly.
|
||||
WARNING: [GetPOT] BgUtilHTTP: Error reaching GET http://127.0.0.1:4416/ping (caused by TransportError). Please make sure that the server is reachable at http://127.0.0.1:4416.
|
||||
[debug] [GetPOT] No player PO Token provider available for tv client
|
||||
```
|
||||
|
||||
In this case check that the script has been transpiled correctly and is available at the path specified in the log,
|
||||
or that the server is running and reachable.
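
If you keep logs in a file, a quick scan for these messages can tell you which case you are in. This is a rough sketch that matches only the message fragments shown in the examples above; the exact wording may differ between plugin versions, and the function name `pot_status` is hypothetical.

```python
def pot_status(log_text: str) -> str:
    """Classify a log excerpt using the message fragments shown above."""
    failure_markers = (
        "Script path doesn't exist",
        "Error reaching GET",
        "PO Token provider available",  # from "No player PO Token provider available"
    )
    if any(marker in log_text for marker in failure_markers):
        return "provider unavailable"
    if "Generating POT via script" in log_text:
        return "token generated via script"
    return "no POT activity found"


if __name__ == "__main__":
    sample = "[GetPOT] BgUtilScript: Generating POT via script: /path/to/generate_once.js"
    print(pot_status(sample))
```

A result of "no POT activity found" is not necessarily an error, since the script is only triggered when YouTube requests a POT.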
|
||||
|
||||
|
|
|
@ -158,6 +158,21 @@ charset-normalizer = ["charset-normalizer"]
|
|||
html5lib = ["html5lib"]
|
||||
lxml = ["lxml"]
|
||||
|
||||
[[package]]
|
||||
name = "bgutil-ytdlp-pot-provider"
|
||||
version = "0.7.4"
|
||||
description = ""
|
||||
optional = false
|
||||
python-versions = ">=3.8"
|
||||
groups = ["main"]
|
||||
files = [
|
||||
{file = "bgutil_ytdlp_pot_provider-0.7.4-py3-none-any.whl", hash = "sha256:5f0b1d884fec66dff703c421ea06f5fc9b11022d9c0babdaa0cab13ed99b9d77"},
|
||||
{file = "bgutil_ytdlp_pot_provider-0.7.4.tar.gz", hash = "sha256:b6c1462b8f979540078085cd82462ef967b8b70cd0810d469243a31f5081e5c6"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
yt-dlp-get-pot = ">=0.1.1"
|
||||
|
||||
[[package]]
|
||||
name = "boto3"
|
||||
version = "1.37.18"
|
||||
|
@ -2265,23 +2280,6 @@ files = [
|
|||
[package.dependencies]
|
||||
rich = ">=11.0.0"
|
||||
|
||||
[[package]]
|
||||
name = "roman-numerals-py"
|
||||
version = "3.1.0"
|
||||
description = "Manipulate well-formed Roman numerals"
|
||||
optional = false
|
||||
python-versions = ">=3.9"
|
||||
groups = ["docs"]
|
||||
markers = "python_version >= \"3.12\""
|
||||
files = [
|
||||
{file = "roman_numerals_py-3.1.0-py3-none-any.whl", hash = "sha256:9da2ad2fb670bcf24e81070ceb3be72f6c11c440d73bd579fbeca1e9f330954c"},
|
||||
{file = "roman_numerals_py-3.1.0.tar.gz", hash = "sha256:be4bf804f083a4ce001b5eb7e3c0862479d10f94c936f6c4e5f250aa5ff5bd2d"},
|
||||
]
|
||||
|
||||
[package.extras]
|
||||
lint = ["mypy (==1.15.0)", "pyright (==1.1.394)", "ruff (==0.9.7)"]
|
||||
test = ["pytest (>=8)"]
|
||||
|
||||
[[package]]
|
||||
name = "rsa"
|
||||
version = "4.9"
|
||||
|
@ -2506,7 +2504,6 @@ description = "Python documentation generator"
|
|||
optional = false
|
||||
python-versions = ">=3.10"
|
||||
groups = ["docs"]
|
||||
markers = "python_version < \"3.12\""
|
||||
files = [
|
||||
{file = "sphinx-8.1.3-py3-none-any.whl", hash = "sha256:09719015511837b76bf6e03e42eb7595ac8c2e41eeb9c29c5b755c6b677992a2"},
|
||||
{file = "sphinx-8.1.3.tar.gz", hash = "sha256:43c1911eecb0d3e161ad78611bc905d1ad0e523e4ddc202a58a821773dc4c927"},
|
||||
|
@ -2536,43 +2533,6 @@ docs = ["sphinxcontrib-websupport"]
|
|||
lint = ["flake8 (>=6.0)", "mypy (==1.11.1)", "pyright (==1.1.384)", "pytest (>=6.0)", "ruff (==0.6.9)", "sphinx-lint (>=0.9)", "tomli (>=2)", "types-Pillow (==10.2.0.20240822)", "types-Pygments (==2.18.0.20240506)", "types-colorama (==0.4.15.20240311)", "types-defusedxml (==0.7.0.20240218)", "types-docutils (==0.21.0.20241005)", "types-requests (==2.32.0.20240914)", "types-urllib3 (==1.26.25.14)"]
|
||||
test = ["cython (>=3.0)", "defusedxml (>=0.7.1)", "pytest (>=8.0)", "setuptools (>=70.0)", "typing_extensions (>=4.9)"]
|
||||
|
||||
[[package]]
|
||||
name = "sphinx"
|
||||
version = "8.2.3"
|
||||
description = "Python documentation generator"
|
||||
optional = false
|
||||
python-versions = ">=3.11"
|
||||
groups = ["docs"]
|
||||
markers = "python_version >= \"3.12\""
|
||||
files = [
|
||||
{file = "sphinx-8.2.3-py3-none-any.whl", hash = "sha256:4405915165f13521d875a8c29c8970800a0141c14cc5416a38feca4ea5d9b9c3"},
|
||||
{file = "sphinx-8.2.3.tar.gz", hash = "sha256:398ad29dee7f63a75888314e9424d40f52ce5a6a87ae88e7071e80af296ec348"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
alabaster = ">=0.7.14"
|
||||
babel = ">=2.13"
|
||||
colorama = {version = ">=0.4.6", markers = "sys_platform == \"win32\""}
|
||||
docutils = ">=0.20,<0.22"
|
||||
imagesize = ">=1.3"
|
||||
Jinja2 = ">=3.1"
|
||||
packaging = ">=23.0"
|
||||
Pygments = ">=2.17"
|
||||
requests = ">=2.30.0"
|
||||
roman-numerals-py = ">=1.0.0"
|
||||
snowballstemmer = ">=2.2"
|
||||
sphinxcontrib-applehelp = ">=1.0.7"
|
||||
sphinxcontrib-devhelp = ">=1.0.6"
|
||||
sphinxcontrib-htmlhelp = ">=2.0.6"
|
||||
sphinxcontrib-jsmath = ">=1.0.1"
|
||||
sphinxcontrib-qthelp = ">=1.0.6"
|
||||
sphinxcontrib-serializinghtml = ">=1.1.9"
|
||||
|
||||
[package.extras]
|
||||
docs = ["sphinxcontrib-websupport"]
|
||||
lint = ["betterproto (==2.0.0b6)", "mypy (==1.15.0)", "pypi-attestations (==0.0.21)", "pyright (==1.1.395)", "pytest (>=8.0)", "ruff (==0.9.9)", "sphinx-lint (>=0.9)", "types-Pillow (==10.2.0.20240822)", "types-Pygments (==2.19.0.20250219)", "types-colorama (==0.4.15.20240311)", "types-defusedxml (==0.7.0.20240218)", "types-docutils (==0.21.0.20241128)", "types-requests (==2.32.0.20241016)", "types-urllib3 (==1.26.25.14)"]
|
||||
test = ["cython (>=3.0)", "defusedxml (>=0.7.1)", "pytest (>=8.0)", "pytest-xdist[psutil] (>=3.4)", "setuptools (>=70.0)", "typing_extensions (>=4.9)"]
|
||||
|
||||
[[package]]
|
||||
name = "sphinx-autoapi"
|
||||
version = "3.6.0"
|
||||
|
@ -3376,7 +3336,19 @@ secretstorage = ["cffi", "secretstorage"]
|
|||
static-analysis = ["autopep8 (>=2.0,<3.0)", "ruff (>=0.11.0,<0.12.0)"]
|
||||
test = ["pytest (>=8.1,<9.0)", "pytest-rerunfailures (>=14.0,<15.0)"]
|
||||
|
||||
[[package]]
|
||||
name = "yt-dlp-get-pot"
|
||||
version = "0.3.0"
|
||||
description = ""
|
||||
optional = false
|
||||
python-versions = ">=3.9"
|
||||
groups = ["main"]
|
||||
files = [
|
||||
{file = "yt_dlp_get_pot-0.3.0-py3-none-any.whl", hash = "sha256:a49a596a3e3c02cd9ce051192ea3fe8168cf24ece8954bed6aa331a87d86954f"},
|
||||
{file = "yt_dlp_get_pot-0.3.0.tar.gz", hash = "sha256:ac9530b9e7b3d667235b9119da475f595d2dc7e6f6bbf98b965011be454e8833"},
|
||||
]
|
||||
|
||||
[metadata]
|
||||
lock-version = "2.1"
|
||||
python-versions = ">=3.10,<3.13"
|
||||
content-hash = "ac5d473189adbadb3ee5d8a36e1898a39725755704e0677768303ae46bc246c8"
|
||||
content-hash = "c612e9f98ca5199092141bb04a0de4cd5314a8fdc8cb12c1d63eafe26bbf16aa"
|
||||
|
|
|
@ -56,6 +56,7 @@ dependencies = [
|
|||
"rfc3161-client (>=1.0.1,<2.0.0)",
|
||||
"cryptography (>44.0.1,<45.0.0)",
|
||||
"opentimestamps (>=0.4.5,<0.5.0)",
|
||||
"bgutil-ytdlp-pot-provider (>=0.7.3,<0.8.0)",
|
||||
]
|
||||
|
||||
[tool.poetry.group.dev.dependencies]
|
||||
|
|
|
@ -74,6 +74,11 @@ If you are having issues with the extractor, you can review the version of `yt-d
|
|||
"default": "inf",
|
||||
"help": "Use to limit the number of videos to download when a channel or long page is being extracted. 'inf' means no limit.",
|
||||
},
|
||||
"bguils_po_token_method": {
|
||||
"default": "auto",
|
||||
"help": "Set up a Proof of origin token provider. This process has additional requirements. See [authentication](https://auto-archiver.readthedocs.io/en/latest/how_to/authentication_how_to.html) for more information.",
|
||||
"choices": ["auto", "script", "disabled"],
|
||||
},
|
||||
"extractor_args": {
|
||||
"default": {},
|
||||
"help": "Additional arguments to pass to the yt-dlp extractor. See https://github.com/yt-dlp/yt-dlp/blob/master/README.md#extractor-arguments.",
|
||||
|
|
|
@ -1,10 +1,13 @@
|
|||
import shutil
|
||||
import sys
|
||||
import datetime
|
||||
import os
|
||||
import importlib
|
||||
import subprocess
|
||||
import zipfile
|
||||
|
||||
from typing import Generator, Type
|
||||
from urllib.request import urlretrieve
|
||||
|
||||
import yt_dlp
|
||||
from yt_dlp.extractor.common import InfoExtractor
|
||||
|
@ -26,59 +29,138 @@ class GenericExtractor(Extractor):
|
|||
_dropins = {}
|
||||
|
||||
def setup(self):
|
||||
# check for file .ytdlp-update in the secrets folder
|
||||
self.check_for_extractor_updates()
|
||||
self.setup_po_tokens()
|
||||
|
||||
def check_for_extractor_updates(self):
|
||||
"""Checks whether yt-dlp or its plugins need updating and triggers a restart if so."""
|
||||
if self.ytdlp_update_interval < 0:
|
||||
return
|
||||
|
||||
use_secrets = os.path.exists("secrets")
|
||||
path = os.path.join("secrets" if use_secrets else "", ".ytdlp-update")
|
||||
next_update_check = None
|
||||
if os.path.exists(path):
|
||||
with open(path, "r") as f:
|
||||
next_update_check = datetime.datetime.fromisoformat(f.read())
|
||||
update_file = os.path.join("secrets" if os.path.exists("secrets") else "", ".ytdlp-update")
|
||||
next_check = None
|
||||
if os.path.exists(update_file):
|
||||
with open(update_file, "r") as f:
|
||||
next_check = datetime.datetime.fromisoformat(f.read())
|
||||
|
||||
if not next_update_check or next_update_check < datetime.datetime.now():
|
||||
updated = self.update_ytdlp()
|
||||
if next_check and next_check > datetime.datetime.now():
|
||||
return
|
||||
|
||||
next_update_check = datetime.datetime.now() + datetime.timedelta(days=self.ytdlp_update_interval)
|
||||
with open(path, "w") as f:
|
||||
f.write(next_update_check.isoformat())
|
||||
yt_dlp_updated = self.update_package("yt-dlp")
|
||||
bgutil_updated = self.update_package("bgutil-ytdlp-pot-provider")
|
||||
|
||||
if not updated:
|
||||
return
|
||||
# Write the new timestamp
|
||||
with open(update_file, "w") as f:
|
||||
next_check = datetime.datetime.now() + datetime.timedelta(days=self.ytdlp_update_interval)
|
||||
f.write(next_check.isoformat())
|
||||
|
||||
if yt_dlp_updated or bgutil_updated:
|
||||
if os.environ.get("AUTO_ARCHIVER_ALLOW_RESTART", "1") != "1":
|
||||
logger.warning(
|
||||
"yt-dlp has been updated. Auto archiver should be restarted for these changes to take effect"
|
||||
)
|
||||
logger.warning("yt-dlp or plugin was updated — please restart auto-archiver manually")
|
||||
else:
|
||||
logger.warning("Restarting auto-archiver to apply yt-dlp update")
|
||||
logger.warning("yt-dlp or plugin was updated — restarting auto-archiver")
|
||||
logger.warning(" ======= RESTARTING ======= ")
|
||||
os.execv(sys.executable, [sys.executable] + sys.argv)
|
||||
|
||||
def update_ytdlp(self):
|
||||
logger.info("Checking and updating yt-dlp...")
|
||||
logger.info(
|
||||
f"Tip: change the 'ytdlp_update_interval' setting to control how often yt-dlp is updated. Set to -1 to disable or 0 to enable on every run. Current setting: {self.ytdlp_update_interval}"
|
||||
)
|
||||
def update_package(self, package_name: str) -> bool:
|
||||
logger.info(f"Checking and updating {package_name}...")
|
||||
from importlib.metadata import version as get_version
|
||||
|
||||
old_version = get_version("yt-dlp")
|
||||
old_version = get_version(package_name)
|
||||
try:
|
||||
# try and update with pip (this works inside poetry environment and in a normal virtualenv)
|
||||
result = subprocess.run(["pip", "install", "--upgrade", "yt-dlp"], check=True, capture_output=True)
|
||||
|
||||
if "Successfully installed yt-dlp" in result.stdout.decode():
|
||||
new_version = importlib.metadata.version("yt-dlp")
|
||||
logger.info(f"yt-dlp successfully (from {old_version} to {new_version})")
|
||||
result = subprocess.run(["pip", "install", "--upgrade", package_name], check=True, capture_output=True)
|
||||
if f"Successfully installed {package_name}" in result.stdout.decode():
|
||||
new_version = importlib.metadata.version(package_name)
|
||||
logger.info(f"{package_name} updated from {old_version} to {new_version}")
|
||||
return True
|
||||
logger.info(f"{package_name} already up to date")
|
||||
except Exception as e:
|
||||
logger.error(f"Error updating {package_name}: {e}")
|
||||
return False
|
||||
|
||||
def setup_po_tokens(self) -> None:
|
||||
"""Setup Proof of Origin Token method conditionally.
|
||||
Uses provider: https://github.com/Brainicism/bgutil-ytdlp-pot-provider.
|
||||
"""
|
||||
in_docker = os.environ.get("RUNNING_IN_DOCKER")
|
||||
if self.bguils_po_token_method == "disabled":
|
||||
# This allows disabling of the PO Token generation script in the Docker implementation.
|
||||
logger.warning("Proof of Origin Token generation is disabled.")
|
||||
return
|
||||
|
||||
if self.bguils_po_token_method == "auto" and not in_docker:
|
||||
logger.info(
|
||||
"Proof of Origin Token method not explicitly set. "
|
||||
"If you're running an external HTTP server separately, you can safely ignore this message. "
|
||||
"To reduce the likelihood of bot detection, enable one of the methods described in the documentation: "
|
||||
"https://auto-archiver.readthedocs.io/en/settings_page/installation/authentication.html#proof-of-origin-tokens"
|
||||
)
|
||||
return
|
||||
|
||||
# Either running in Docker, or "script" method is set beyond this point
|
||||
self.setup_token_generation_script()
|
||||
|
||||
def setup_token_generation_script(self) -> None:
|
||||
"""This function sets up the Proof of Origin Token generation script method for
|
||||
bgutil-ytdlp-pot-provider if enabled or in Docker."""
|
||||
missing_tools = [tool for tool in ("node", "yarn", "npx") if shutil.which(tool) is None]
|
||||
if missing_tools:
|
||||
logger.error(
|
||||
f"Cannot set up PO Token script; missing required tools: {', '.join(missing_tools)}. "
|
||||
"Install these tools or run bgutils via Docker. "
|
||||
"See: https://github.com/Brainicism/bgutil-ytdlp-pot-provider"
|
||||
)
|
||||
return
|
||||
try:
|
||||
from importlib.metadata import version as get_version
|
||||
|
||||
plugin_version = get_version("bgutil-ytdlp-pot-provider")
|
||||
base_dir = os.path.expanduser("~/bgutil-ytdlp-pot-provider")
|
||||
server_dir = os.path.join(base_dir, "server")
|
||||
version_file = os.path.join(server_dir, ".VERSION")
|
||||
transpiled_script = os.path.join(server_dir, "build", "generate_once.js")
|
||||
|
||||
# Skip setup if version is correct and transpiled script exists
|
||||
if os.path.isfile(transpiled_script) and os.path.isfile(version_file):
|
||||
with open(version_file) as vf:
|
||||
if vf.read().strip() == plugin_version:
|
||||
logger.info("PO Token script already set up and up to date.")
|
||||
else:
|
||||
logger.info("yt-dlp already up to date")
|
||||
return False
|
||||
# Remove an outdated directory and pull a new version
|
||||
if os.path.exists(base_dir):
|
||||
shutil.rmtree(base_dir)
|
||||
os.makedirs(base_dir, exist_ok=True)
|
||||
|
||||
zip_url = (
|
||||
f"https://github.com/Brainicism/bgutil-ytdlp-pot-provider/archive/refs/tags/{plugin_version}.zip"
|
||||
)
|
||||
zip_path = os.path.join(base_dir, f"{plugin_version}.zip")
|
||||
logger.info(f"Downloading bgutils release zip for version {plugin_version}...")
|
||||
urlretrieve(zip_url, zip_path)
|
||||
with zipfile.ZipFile(zip_path, "r") as z:
|
||||
z.extractall(base_dir)
|
||||
os.remove(zip_path)
|
||||
|
||||
extracted_root = os.path.join(base_dir, f"bgutil-ytdlp-pot-provider-{plugin_version}")
|
||||
shutil.move(os.path.join(extracted_root, "server"), server_dir)
|
||||
shutil.rmtree(extracted_root)
|
||||
logger.info("Installing dependencies and transpiling PoT Generator script...")
|
||||
subprocess.run(["yarn", "install", "--frozen-lockfile"], cwd=server_dir, check=True)
|
||||
subprocess.run(["npx", "tsc"], cwd=server_dir, check=True)
|
||||
|
||||
with open(version_file, "w") as vf:
|
||||
vf.write(plugin_version)
|
||||
|
||||
script_path = os.path.join(server_dir, "build", "generate_once.js")
|
||||
if not os.path.exists(script_path):
|
||||
logger.error("generate_once.js not found after transpilation.")
|
||||
return
|
||||
|
||||
self.extractor_args.setdefault("youtube", {})["getpot_bgutil_script"] = script_path
|
||||
logger.info(f"PO Token script configured at: {script_path}")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error updating yt-dlp: {e}")
|
||||
return False
|
||||
logger.error(f"Failed to set up PO Token script: {e}")
|
||||
|
||||
def suitable_extractors(self, url: str) -> Generator[str, None, None]:
|
||||
"""
|
||||
|
|
|
@ -24,7 +24,7 @@ TESTS_TO_RUN_LAST = ["test_generic_archiver", "test_twitter_api_archiver"]
|
|||
@pytest.fixture(autouse=True)
|
||||
def skip_check_for_update(mocker):
|
||||
update_ytdlp = mocker.patch(
|
||||
"auto_archiver.modules.generic_extractor.generic_extractor.GenericExtractor.update_ytdlp"
|
||||
"auto_archiver.modules.generic_extractor.generic_extractor.GenericExtractor.update_package"
|
||||
)
|
||||
update_ytdlp.return_value = False
|
||||
|
||||
|
|
|
@ -29,6 +29,7 @@ class TestGenericExtractor(TestExtractorBase):
|
|||
"proxy": None,
|
||||
"cookies_from_browser": False,
|
||||
"cookie_file": None,
|
||||
"pot_provider": False,
|
||||
}
|
||||
|
||||
def test_load_dropin(self):
|
||||
|
@ -291,3 +292,42 @@ class TestGenericExtractor(TestExtractorBase):
|
|||
post = self.extractor.download(make_item(url))
|
||||
assert "Bellingcat researcher Kolina Koltai delves deeper into Clothoff" in post.get("content")
|
||||
assert post.get_title() == "Bellingcat"
|
||||
|
||||
|
||||
class TestGenericExtractorPoToken:
|
||||
@pytest.fixture
|
||||
def extractor(self, mocker):
|
||||
extractor = GenericExtractor()
|
||||
extractor.extractor_args = {}
|
||||
extractor.setup_token_generation_script = mocker.Mock()
|
||||
return extractor
|
||||
|
||||
def test_po_token_disabled_does_not_call_setup(self, extractor):
|
||||
extractor.bguils_po_token_method = "disabled"
|
||||
extractor.in_docker = True
|
||||
extractor.setup_po_tokens()
|
||||
extractor.setup_token_generation_script.assert_not_called()
|
||||
|
||||
def test_po_token_default_in_docker_calls_setup(self, extractor, mocker):
|
||||
extractor.bguils_po_token_method = "auto"
|
||||
mocker.patch.dict(os.environ, {"RUNNING_IN_DOCKER": "1"})
|
||||
extractor.setup_po_tokens()
|
||||
extractor.setup_token_generation_script.assert_called_once()
|
||||
|
||||
def test_po_token_default_local_does_not_call_setup(self, extractor, caplog, mocker):
|
||||
extractor.bguils_po_token_method = "auto"
|
||||
# clears env vars for this test
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
extractor.setup_po_tokens()
|
||||
extractor.setup_token_generation_script.assert_not_called()
|
||||
assert "Proof of Origin Token method not explicitly set" in caplog.text
|
||||
|
||||
def test_po_token_script_always_calls_setup(self, extractor):
|
||||
extractor.bguils_po_token_method = "script"
|
||||
extractor.in_docker = False
|
||||
extractor.setup_po_tokens()
|
||||
extractor.setup_token_generation_script.assert_called_once()
|
||||
extractor.setup_token_generation_script.reset_mock()
|
||||
extractor.in_docker = True
|
||||
extractor.setup_po_tokens()
|
||||
extractor.setup_token_generation_script.assert_called_once()
|
||||
|
|