Compare commits

...

108 Commits

Author SHA1 Message Date
Haxy cf212d0a33
[ie/youtube] Add `mediaconnect` client (#9546)
Authored by: clienthax
2024-05-12 16:03:36 +00:00
alard 6db96268c5
[ie/TV5Monde] Fix extractor (#9143)
Closes #9118
Authored by: alard, seproDev

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
2024-05-11 23:58:15 +02:00
Eric Lam 800a43983e
[ie/EuroParlWebstream] Support new URL format (#9647)
Authored by: voidful, seproDev

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
2024-05-11 23:50:59 +02:00
DaPotato69 7e4259dff0
Better warning when requested subs format not found (#9873)
Closes #9760
Authored by: DaPotato69
2024-05-11 21:11:40 +00:00
Stefan Lobbenmeier f1f158976e
[cookies] Get chrome session cookies with `--cookies-from-browser` (#9747)
Partially addresses #5534
Authored by: StefanLobbenmeier
2024-05-11 17:25:39 +00:00
llamasblade 31b417e1d1
[ie/hytale] Use `CloudflareStreamIE` explicitly (#9672)
Authored by: llamasblade
2024-05-11 17:01:56 +00:00
Hugo Azevedo fc2879ecb0
[ie/alura] Fix extractor (#9658)
Authored by: hugohaa
2024-05-11 16:54:29 +00:00
rrgomes 0a1a8e3005
[ie/nfb] Fix extractors (#9650)
Authored by: rrgomes
2024-05-11 16:38:41 +00:00
c-basalt 4cc99d7b6c
[ie/BilibiliSpaceVideo] Fix extraction (#9905)
Closes #9892
Authored by: c-basalt
2024-05-10 22:34:53 +00:00
coletdjnz 3c7a287e28
[test] Add HTTP proxy tests (#9578)
Also fixes HTTPS proxies for curl_cffi

Authored by: coletdjnz
2024-05-11 10:06:58 +12:00
sepro 98d71d8c5e
[ie/commonmistakes] Raise error on blob URLs (#9897)
Authored by: seproDev
2024-05-10 19:20:55 +02:00
kclauhk 00a9f2e1f7
[ie/canalalpha] Fix extractor (#9675)
Authored by: kclauhk
2024-05-10 19:19:57 +02:00
Mozi 73f12119b5
[ie/netease:program] Improve `--no-playlist` message (#9488)
Authored by: pzhlkj6612
2024-05-10 19:13:35 +02:00
Alexandre Huot 6b54cccdcb
[ie/Qub] Fix extractor (#7019)
Closes #4989
Authored by: alexhuot1, dirkf
2024-05-08 22:10:06 +00:00
src-tinkerer c4b87dd885
[ie/ZenYandex] Fix extractor (#9813)
Closes #9803
Authored by: src-tinkerer
2024-05-08 21:27:30 +00:00
fireattack 2338827072
[ie/bilibili] Fix `--geo-verification-proxy` support (#9817)
Closes #9797
Authored by: fireattack
2024-05-08 21:24:44 +00:00
fireattack 06d52c8731
[ie/BilibiliSpaceVideo] Better error message (#9839)
Closes #9528
Authored by: fireattack
2024-05-08 21:09:38 +00:00
sepro df5c9e733a
[ie/vk] Improve format extraction (#9885)
Closes #5675
Authored by: seproDev
2024-05-08 23:02:22 +02:00
Mozi b38018b781
[ie/mixch] Extract comments (#9860)
Authored by: pzhlkj6612
2024-05-08 20:51:16 +00:00
Rasmus Antons 145dc6f656
[ie/boosty] Add cookies support (#9522)
Closes #9401
Authored by: RasmusAntons
2024-05-08 20:16:32 +00:00
bashonly 5904853ae5
[ie/crunchyroll] Support browser impersonation (#9857)
Closes #7442
Authored by: bashonly
2024-05-05 23:15:32 +00:00
Chris Caruso c8bf48f3a8
[ie/cbc.ca:player] Improve `_VALID_URL` (#9866)
Closes #9825
Authored by: carusocr
2024-05-05 23:02:24 +00:00
The-MAGI 351368cb9a
[ie/youporn] Fix extractor (#8827)
Closes #7967
Authored by: The-MAGI
2024-05-05 22:57:38 +00:00
sepro 96da952504
[core] Warn if lack of ffmpeg alters format selection (#9805)
Authored by: seproDev, pukkandan
2024-05-05 00:44:08 +02:00
bashonly bec9a59e8e
[networking] Add `extensions` attribute to `Response` (#9756)
CurlCFFIRH now provides an `impersonate` field in its responses' extensions

Authored by: bashonly
2024-05-04 22:19:42 +00:00
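As a hedged illustration of the new attribute (assuming the `curl_cffi` handler is installed and selected; the `impersonate` key name is taken from the PR description above):

    import yt_dlp

    with yt_dlp.YoutubeDL() as ydl:
        response = ydl.urlopen('https://example.com')
        # `extensions` is a plain dict; with CurlCFFIRH active, an
        # `impersonate` key reports the target that was actually used
        print(response.extensions.get('impersonate'))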
bashonly 036e0d92c6
[ie/patreon] Extract multiple embeds (#9850)
Closes #9848
Authored by: bashonly
2024-05-04 22:11:11 +00:00
bashonly cb2fb4a643
[ie/crunchyroll] Always make metadata available (#9772)
Closes #9750
Authored by: bashonly
2024-05-04 16:15:44 +00:00
bashonly 231c2eacc4
[ie/soundcloud] Extract `genres` (#9821)
Authored by: bashonly
2024-05-04 16:14:36 +00:00
bashonly c4853655cb
[ie/wrestleuniverse] Avoid partial stream formats (#9800)
Authored by: bashonly
2024-05-04 16:07:15 +00:00
Simon Sawicki ac817bc83e
[build] Migrate `linux_exe` to static musl builds (#9811)
Authored by: Grub4K, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2024-04-28 22:19:25 +00:00
bashonly 1a366403d9
[build] Run `macos_legacy` job on `macos-12` (#9804)
`macos-latest` has been bumped to `macos-14-arm64` which breaks the builds

Authored by: bashonly
2024-04-28 15:35:17 +00:00
Simon Sawicki 7e26bd53f9
[core/windows] Fix tests for `sys.executable` with spaces (Fix for 64766459e3)
Authored by: Grub4K
2024-04-28 15:47:55 +02:00
Simon Sawicki 64766459e3
[core/windows] Improve shell quoting and tests (#9802)
Authored by: Grub4K
2024-04-27 10:37:26 +02:00
bashonly 89f535e265
[ci] Fix `curl-cffi` installation (Bugfix for 02483bea1c)
Authored by: bashonly
2024-04-22 20:36:01 +00:00
bashonly ff38a011d5
[ie/crunchyroll] Fix auth and remove cookies support (#9749)
Closes #9745
Authored by: bashonly
2024-04-21 22:41:40 +00:00
bashonly 8056a3026e
[ie/theatercomplextown] Fix extractors (#9754)
Authored by: bashonly
2024-04-21 16:05:42 +00:00
Simon Sawicki 3ee1194288
[ie] Make `_search_nextjs_data` non fatal (#8937)
Authored by: Grub4K
2024-04-21 13:40:38 +02:00
bashonly e3b42d8b1b
[ie/facebook] Fix DASH formats extraction (#9734)
Closes #9720
Authored by: bashonly
2024-04-20 10:23:12 +00:00
bashonly c9ce57d9bf
[ie/patreon] Fix Vimeo embed extraction (#9712)
Fixes regression in 36b240f9a7

Closes #9709
Authored by: bashonly
2024-04-18 23:18:56 +00:00
bashonly 02483bea1c
[build] Normalize `curl_cffi` group to `curl-cffi` (#9698)
Closes #9682
Authored by: bashonly
2024-04-18 23:11:12 +00:00
bashonly 315b354429
[ie/afreecatv:live] Add `cdn` extractor-arg (#9666)
Closes #6497
Authored by: bashonly
2024-04-13 16:40:53 +00:00
bashonly 0c21c53885
[ie/jiosaavn] Extract via API and fix playlists (#9656)
Closes #9648
Authored by: bashonly
2024-04-13 16:08:25 +00:00
github-actions[bot] 168e72dcd3
Release 2024.04.09
Created by: Grub4K

:ci skip all :ci run dl
2024-04-09 17:03:28 +00:00
Simon Sawicki ff07792676
[core] Prevent RCE when using `--exec` with `%q` (CVE-2024-22423)
The shell escape function now properly escapes `%`, `\\` and `\n`. `utils.Popen` as well as `%q` output template expansion have been patched accordingly.

Prior to this fix using `--exec` together with `%q` when on Windows could cause remote code to execute. See https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-hjq6-52gw-2g7p for more details.

Authored by: Grub4K
2024-04-09 18:36:13 +02:00
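For context, a simplified sketch of the escaping idea (illustrative only, not yt-dlp's actual implementation): cmd.exe expands `%VAR%` even inside double quotes, so template output rendered via `%q` and passed to `--exec` could smuggle a command. One trick for emitting a literal percent is the sequence `%%cd:~,%`, since `%cd:~,%` expands to an empty string and the leftover `%` can no longer pair with a later one into a variable reference.

    def sketch_cmd_escape(arg: str) -> str:
        # Simplified sketch only -- not the patched shell_quote()
        arg = arg.replace('\\', '\\\\')      # escape backslashes first
        arg = arg.replace('"', '\\"')        # keep the quoted string intact
        arg = arg.replace('\n', ' ')         # a raw newline would split the command
        arg = arg.replace('%', '%%cd:~,%')   # defuse %VAR% expansion in cmd.exe
        return f'"{arg}"'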
bashonly 216f6a3cb5
[cleanup] Misc (#9426)
Authored by: bashonly, pukkandan
2024-04-09 16:12:26 +00:00
bashonly b19ae095fd
[build] Do not include `curl_cffi` in `macos_legacy` (#9653)
Authored by: bashonly
2024-04-08 23:20:58 +00:00
Simon Sawicki 9590cc6b47
Add new option `--progress-delta` (#9082)
Authored by: Grub4K
2024-04-08 22:47:38 +02:00
luiso1979 79a451e576
[networking] Respect `SSLKEYLOGFILE` environment variable (#9543)
Authored by: luiso1979
2024-04-08 21:53:30 +02:00
Leo Heitmann Ruiz df0e138fc0
[docs] Various manpage fixes
Authored by: leoheitmannruiz
2024-04-08 21:24:58 +02:00
bashonly 2e94602f24
[ie/jiosaavn] Support playlists (#9622)
Closes #9616
Authored by: bashonly
2024-04-07 20:55:46 +00:00
bashonly 4af9d5c2f6
[ie/nhk] Fix NHK World extractors (#9623)
Closes #9513
Authored by: bashonly
2024-04-07 16:59:38 +00:00
John Victor 36b240f9a7
[ie/patreon] Do not extract dead embed URLs (#9613)
Closes #8702
Authored by: johnvictorfs
2024-04-07 16:26:44 +00:00
bashonly fc53ec13ff
[ie/tiktok] Restore `carrier_region` API parameter (#9637)
Avoids some geo-blocks

Authored by: bashonly
2024-04-07 15:32:11 +00:00
Dmitry Meyer 2ab2651a4a
[cookies] Add `--cookies-from-browser` support for Firefox Flatpak (#9619)
Authored by: un-def
2024-04-07 15:28:59 +00:00
bashonly b15b0c1d21
[ie/vkplay] Fix `_VALID_URL` (#9636)
Closes #9635
Authored by: bashonly
2024-04-06 20:42:51 +00:00
bashonly c8a61a9100
[ie/kick] Support browser impersonation (#9611)
Closes #6748
Authored by: bashonly
2024-04-06 17:42:32 +00:00
Mozi f2fd449b46
[ie/joqrag] Fix live status detection (#9624)
Authored by: pzhlkj6612
2024-04-06 17:34:51 +00:00
Tomoka1 9415f1a5ef
[ie/afreecatv] Overhaul extractor (#9566)
Closes #4592, Closes #8862, Closes #9544
Authored by: bashonly, Tomoka1

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2024-04-06 17:23:16 +00:00
bashonly a48cc86d6f
[ie/dropbox] Fix formats extraction (#9627)
Closes #9533
Authored by: bashonly
2024-04-06 17:19:44 +00:00
bytedream 954e57e405
[ie/crunchyroll] Fix extractor (#9615)
Authored by: bytedream
2024-04-06 12:53:20 +02:00
Dong Heon Hee 9073ae6458
[ie/afreecatv:live] Fix extractor (#9348)
Closes #4466, Closes #9345
Authored by: hui1601
2024-04-04 16:48:05 +00:00
Offert4324 4cd9e251b9
[ie/medici] Fix extractor (#9518)
Closes #8813
Authored by: Offert4324
2024-04-04 16:45:19 +00:00
bashonly 0ae16ceb18
[ie/jiosaavn] Extract artists (#9612)
Closes #9607
Authored by: bashonly
2024-04-03 23:23:04 +00:00
bashonly 443e206ec4
[ie/jiosaavn] Fix format extensions (#9609)
Authored by: bashonly
2024-04-03 23:21:28 +00:00
bashonly 4c3b7a0769
[ie/mixch] Fix extractor (#9608)
Closes #9536
Authored by: bashonly, nipotan
2024-04-03 22:53:42 +00:00
bashonly 16be117729
Add option `--no-break-on-existing` (#9610)
Authored by: bashonly
2024-04-03 22:51:41 +00:00
trainman261 b49d5ffc53
[ie/cbc.ca:player] Support new URL format (#9561)
Closes #9534
Authored by: trainman261
2024-04-03 19:11:13 +00:00
HobbyistDev 36baaa10e0
[ie/Radio1Be] Add extractor (#9122)
Closes #8707
Authored by: HobbyistDev
2024-04-03 18:51:14 +00:00
Kacper Michajłow 02f93ff51b
[ie/twitch] Extract AV1 and HEVC formats (#9158)
Authored by: kasper93
2024-04-03 18:38:51 +00:00
Mozi c59de48e2b
[ie/mixch:archive] Fix extractor (#8761)
Closes #2373
Authored by: pzhlkj6612
2024-04-01 22:41:09 +00:00
Mozi 0284f1fee2
[ie/asobistage] Add extractor (#8735)
Authored by: pzhlkj6612
2024-04-01 22:29:14 +00:00
bashonly e8032503b9
[build] Print SHA sums to GHA logs (#9582)
Authored by: bashonly
2024-04-01 17:02:25 +00:00
bashonly 97362712a1
[ie/soundcloud] Support cookies (#9586)
Closes #997
Authored by: bashonly
2024-04-01 16:58:48 +00:00
bashonly 246571ae1d
[ie/soundcloud] Support retries for API rate-limit (#9585)
Authored by: bashonly
2024-04-01 16:21:46 +00:00
Simon Sawicki 32abfb00bd
[utils] `traverse_obj`: Convenience improvements (#9577)
Add support for:
- `http.cookies.Morsel`
- Multi type filters (`{type, type}`)

Authored by: Grub4K
2024-04-01 02:12:03 +02:00
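A hedged usage sketch of these improvements together with the unbranching from #9571 (made-up data; semantics as documented in the PRs):

    from yt_dlp.utils.traversal import traverse_obj

    data = {'formats': [{'id': None}, {'id': 42}, {'id': 'hd'}]}
    # `...` branches over all formats, `{int, str}` is a multi-type filter
    # (this PR), and `any` collapses the branch to its first match (#9571)
    assert traverse_obj(data, ('formats', ..., 'id', {int, str}, any)) == 42
    # `all` collects every match into a list instead
    assert traverse_obj(data, ('formats', ..., 'id', {int, str}, all)) == [42, 'hd']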
pukkandan c305a25c1b
[cleanup] Standardize `import datetime as dt` (#8978)
2024-04-01 05:32:15 +05:30
pukkandan e3a3ed8a98
[ie, cleanup] No `from` stdlib imports in extractors (#8978)
2024-04-01 05:31:09 +05:30
pukkandan a25a424323
[ie/youtube] Calculate more accurate `filesize`
YouTube provides slightly different duration for each format.
Calculating file-size based on this duration instead of the
video duration gives more accurate results.

Ref: https://github.com/yt-dlp/yt-dlp/issues/1400#issuecomment-2007441207
2024-04-01 04:56:09 +05:30
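The estimate itself is just bitrate times duration; a hedged sketch with illustrative names (not the extractor's actual code):

    def approx_filesize(tbr, format_duration):
        # tbr is in kbps (SI: 1000 bits/sec, see 86e3b82261 below)
        return round(tbr * 1000 / 8 * format_duration)

    # a 212.4 s audio-only format at 505.2 kbps comes to ~13.4 MB
    approx_filesize(505.2, 212.4)  # -> 13413060

Using the format-specific duration rather than the video duration is what sharpens the estimate.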
sepro 86e3b82261
[core] Fix `filesize_approx` calculation (#9560)
Reverts 22e4dfacb6

Despite being documented as `Kbit/s`, the extractors/manifests were returning bitrates in SI units of kilobits/sec.

Authored by: seproDev, pukkandan
2024-04-01 04:47:24 +05:30
pukkandan e7b17fce14
[ie/youtube] Update `android` params
Discovered by LuanRT - https://github.com/LuanRT/YouTube.js/pull/624

Closes #9554
2024-04-01 01:31:53 +05:30
bashonly a2d0840739
[ie/soundcloud] Adjust format sorting (#9584)
- Adapt to 86a972033e

Authored by: bashonly
2024-03-31 20:01:33 +00:00
pukkandan 86a972033e
Infer `acodec` for single-codec containers
2024-03-31 22:50:21 +05:30
bashonly 50c2935231
[ie] Add extractor impersonate API (#9474)
Authored by: bashonly, Grub4K, pukkandan
2024-03-30 23:18:07 +00:00
bashonly 0df63cce69
[ie/thisoldhouse] Support Brightcove embeds (#9576)
Closes #9570
Authored by: bashonly
2024-03-30 23:06:20 +00:00
bashonly 63f685f341
[ie/tiktok] Prefer non-bytevc2 formats (#9575)
Closes #9567
Authored by: bashonly
2024-03-30 22:54:00 +00:00
Simon Sawicki 3699eeb67c
[utils] `traverse_obj`: Allow unbranching using `all` and `any` (#9571)
Authored by: Grub4K
2024-03-30 19:54:43 +01:00
Simon Sawicki 979ce2e786
[test] `traversal`: Separate traversal tests (#9574)
Authored by: Grub4K
2024-03-30 19:32:07 +01:00
bashonly 58dd0f8d1e
[build] Optional dependencies cleanup (#9550)
Authored by: bashonly
2024-03-29 23:24:40 +00:00
bashonly cb61e20c26
[ie/tiktok] Fix API extraction (#9548)
Closes #9506
Authored by: bashonly, Grub4K

Co-authored-by: Simon Sawicki <contact@grub4k.xyz>
2024-03-29 23:20:14 +00:00
bashonly 9c42b7eef5
[fd/ffmpeg] Accept output args from info dict (#9278)
Authored by: bashonly
2024-03-29 23:16:46 +00:00
coletdjnz e5d4f11104
[rh:websockets] Workaround race condition causing issues on PyPy (#9514)
Authored by: coletdjnz
2024-03-23 11:27:10 +13:00
src-tinkerer bc2b8c0596
[ie/fathom] Add extractor (#9495)
Closes #8541
Authored by: src-tinkerer
2024-03-22 14:31:01 +00:00
sta1us aa7e9ae4f4
[ie/xvideos] Support new URL format (#9493) (#9502)
Closes #9493
Authored by: sta1us
2024-03-22 14:28:09 +00:00
Shreyas Minocha 07f5b2f757
[ie/box] Support URLs without file IDs (#9504)
Authored by: shreyasminocha
2024-03-20 23:26:37 +00:00
Daniel Vogt ff349ff94a
[ie/sharepoint] Add extractor (#6531)
Authored by: C0D3D3V, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2024-03-20 23:20:50 +00:00
Hasan Rüzgar f859ed3ba1
[ie/loom] Add extractors (#8686)
Closes #3715
Authored by: bashonly, hruzgar

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2024-03-20 23:14:37 +00:00
Aron Buzinkay 17d248a587
[ie/youtube:search] Fix params for uncensored results (#9456)
Closes #9156
Authored by: alb, pukkandan
2024-03-19 23:25:04 +00:00
sepro 388c979ac6
[docs] Update yt-dlp tagline (#9481)
Authored by: seproDev, bashonly, coletdjnz, Grub4K, pukkandan
2024-03-19 18:14:04 +01:00
sepro 22e4dfacb6
[ie/youtube] Fix tbr calculation (#9489)
Authored by: pukkandan

Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
2024-03-18 18:07:22 +01:00
Trustin 86d2f4d248
[ie/imgur] Fix extraction (#9471)
Closes #9458
Authored by: trwstin
2024-03-17 05:04:55 +00:00
coletdjnz 52f5be1f1e
[rh:curlcffi] Add support for `curl_cffi`
Authored by: coletdjnz, Grub4K, pukkandan, bashonly

Co-authored-by: Simon Sawicki <contact@grub4k.xyz>
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
Co-authored-by: bashonly <bashonly@protonmail.com>
2024-03-16 23:15:11 -05:00
coletdjnz 0b81d4d252
Add new options `--impersonate` and `--list-impersonate-targets`
Authored by: coletdjnz, Grub4K, pukkandan, bashonly

Co-authored-by: Simon Sawicki <contact@grub4k.xyz>
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
Co-authored-by: bashonly <bashonly@protonmail.com>
2024-03-16 23:14:13 -05:00
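A hedged sketch of the programmatic counterpart to the new flags (assumption: the `impersonate` parameter takes an `ImpersonateTarget`, which the CLI builds from strings like `chrome:windows-10`; the URL is a placeholder):

    from yt_dlp import YoutubeDL
    from yt_dlp.networking.impersonate import ImpersonateTarget

    # In the spirit of `--impersonate chrome:windows-10`
    opts = {'impersonate': ImpersonateTarget('chrome', os='windows', os_version='10')}
    with YoutubeDL(opts) as ydl:
        info = ydl.extract_info('https://example.com/video', download=False)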
coletdjnz f849d77ab5
[test] Workaround websocket server hanging (#9467)
Authored by: coletdjnz
2024-03-16 16:57:21 +13:00
bashonly f2868b26e9
[ie/SonyLIVSeries] Fix season extraction (#9423)
Authored by: bashonly
2024-03-14 23:21:27 +00:00
bashonly be77923ffe
[ie/crunchyroll] Extract `vo_adaptive_hls` formats by default (#9447)
Closes #9439
Authored by: bashonly
2024-03-14 21:42:35 +00:00
bashonly 8c05b3ebae
[ie/tiktok] Update API hostname (#9444)
Closes #9441
Authored by: bashonly
2024-03-14 21:35:46 +00:00
jazz1611 0da66980d3
[ie/gofile] Fix extractor (#9446)
Authored by: jazz1611
2024-03-14 21:34:10 +00:00
bashonly 17b96974a3
[build] Update changelog for tarball and sdist (#9425)
Closes #9417
Authored by: bashonly
2024-03-14 21:10:20 +00:00
127 changed files with 5582 additions and 2114 deletions

.github/banner.svg vendored

File diff suppressed because one or more lines are too long

Before: Size 24 KiB

After: Size 15 KiB

View file

@@ -12,6 +12,9 @@ on:
unix:
default: true
type: boolean
linux_static:
default: true
type: boolean
linux_arm:
default: true
type: boolean
@@ -27,9 +30,6 @@ on:
windows32:
default: true
type: boolean
meta_files:
default: true
type: boolean
origin:
required: false
default: ''
@@ -52,7 +52,11 @@ on:
default: stable
type: string
unix:
description: yt-dlp, yt-dlp.tar.gz, yt-dlp_linux, yt-dlp_linux.zip
description: yt-dlp, yt-dlp.tar.gz
default: true
type: boolean
linux_static:
description: yt-dlp_linux
default: true
type: boolean
linux_arm:
@@ -75,10 +79,6 @@ on:
description: yt-dlp_x86.exe
default: true
type: boolean
meta_files:
description: SHA2-256SUMS, SHA2-512SUMS, _update_spec
default: true
type: boolean
origin:
description: Origin
required: false
@@ -107,60 +107,31 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Needed for changelog
- uses: actions/setup-python@v5
with:
python-version: "3.10"
- uses: conda-incubator/setup-miniconda@v3
with:
miniforge-variant: Mambaforge
use-mamba: true
channels: conda-forge
auto-update-conda: true
activate-environment: ""
auto-activate-base: false
- name: Install Requirements
run: |
sudo apt -y install zip pandoc man sed
cat > ./requirements.txt << EOF
python=3.10.*
brotli-python
EOF
python devscripts/install_deps.py --print \
--exclude brotli --exclude brotlicffi \
--include secretstorage --include pyinstaller >> ./requirements.txt
mamba create -n build --file ./requirements.txt
- name: Prepare
run: |
python devscripts/update-version.py -c "${{ inputs.channel }}" -r "${{ needs.process.outputs.origin }}" "${{ inputs.version }}"
python devscripts/update_changelog.py -vv
python devscripts/make_lazy_extractors.py
- name: Build Unix platform-independent binary
run: |
make all tar
- name: Build Unix standalone binary
shell: bash -l {0}
run: |
unset LD_LIBRARY_PATH # Harmful; set by setup-python
conda activate build
python -m bundle.pyinstaller --onedir
(cd ./dist/yt-dlp_linux && zip -r ../yt-dlp_linux.zip .)
python -m bundle.pyinstaller
mv ./dist/yt-dlp_linux ./yt-dlp_linux
mv ./dist/yt-dlp_linux.zip ./yt-dlp_linux.zip
- name: Verify --update-to
if: vars.UPDATE_TO_VERIFICATION
run: |
binaries=("yt-dlp" "yt-dlp_linux")
for binary in "${binaries[@]}"; do
chmod +x ./${binary}
cp ./${binary} ./${binary}_downgraded
version="$(./${binary} --version)"
./${binary}_downgraded -v --update-to yt-dlp/yt-dlp@2023.03.04
downgraded_version="$(./${binary}_downgraded --version)"
[[ "$version" != "$downgraded_version" ]]
done
chmod +x ./yt-dlp
cp ./yt-dlp ./yt-dlp_downgraded
version="$(./yt-dlp --version)"
./yt-dlp_downgraded -v --update-to yt-dlp/yt-dlp@2023.03.04
downgraded_version="$(./yt-dlp_downgraded --version)"
[[ "$version" != "$downgraded_version" ]]
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
@@ -168,8 +139,39 @@ jobs:
path: |
yt-dlp
yt-dlp.tar.gz
yt-dlp_linux
yt-dlp_linux.zip
compression-level: 0
linux_static:
needs: process
if: inputs.linux_static
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Build static executable
env:
channel: ${{ inputs.channel }}
origin: ${{ needs.process.outputs.origin }}
version: ${{ inputs.version }}
run: |
mkdir ~/build
cd bundle/docker
docker compose up --build static
sudo chown "${USER}:docker" ~/build/yt-dlp_linux
- name: Verify --update-to
if: vars.UPDATE_TO_VERIFICATION
run: |
chmod +x ~/build/yt-dlp_linux
cp ~/build/yt-dlp_linux ~/build/yt-dlp_linux_downgraded
version="$(~/build/yt-dlp_linux --version)"
~/build/yt-dlp_linux_downgraded -v --update-to yt-dlp/yt-dlp@2023.03.04
downgraded_version="$(~/build/yt-dlp_linux_downgraded --version)"
[[ "$version" != "$downgraded_version" ]]
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: build-bin-${{ github.job }}
path: |
~/build/yt-dlp_linux
compression-level: 0
linux_arm:
@@ -247,6 +249,22 @@ jobs:
python3 devscripts/install_deps.py --print --include pyinstaller > requirements.txt
# We need to ignore wheels otherwise we break universal2 builds
python3 -m pip install -U --user --no-binary :all: -r requirements.txt
# We need to fuse our own universal2 wheels for curl_cffi
python3 -m pip install -U --user delocate
mkdir curl_cffi_whls curl_cffi_universal2
python3 devscripts/install_deps.py --print -o --include curl-cffi > requirements.txt
for platform in "macosx_11_0_arm64" "macosx_11_0_x86_64"; do
python3 -m pip download \
--only-binary=:all: \
--platform "${platform}" \
--pre -d curl_cffi_whls \
-r requirements.txt
done
python3 -m delocate.cmd.delocate_fuse curl_cffi_whls/curl_cffi*.whl -w curl_cffi_universal2
python3 -m delocate.cmd.delocate_fuse curl_cffi_whls/cffi*.whl -w curl_cffi_universal2
cd curl_cffi_universal2
for wheel in *cffi*.whl; do mv -n -- "${wheel}" "${wheel/x86_64/universal2}"; done
python3 -m pip install -U --user *cffi*.whl
- name: Prepare
run: |
@@ -280,7 +298,7 @@ jobs:
macos_legacy:
needs: process
if: inputs.macos_legacy
runs-on: macos-latest
runs-on: macos-12
steps:
- uses: actions/checkout@v4
@@ -342,7 +360,7 @@ jobs:
- name: Install Requirements
run: | # Custom pyinstaller built with https://github.com/yt-dlp/pyinstaller-builds
python devscripts/install_deps.py -o --include build
python devscripts/install_deps.py --include py2exe
python devscripts/install_deps.py --include py2exe --include curl-cffi
python -m pip install -U "https://yt-dlp.github.io/Pyinstaller-Builds/x86_64/pyinstaller-5.8.0-py3-none-any.whl"
- name: Prepare
@@ -427,10 +445,11 @@ jobs:
compression-level: 0
meta_files:
if: inputs.meta_files && always() && !cancelled()
if: always() && !cancelled()
needs:
- process
- unix
- linux_static
- linux_arm
- macos
- macos_legacy
@@ -447,8 +466,9 @@ jobs:
- name: Make SHA2-SUMS files
run: |
cd ./artifact/
sha256sum * > ../SHA2-256SUMS
sha512sum * > ../SHA2-512SUMS
# make sure SHA sums are also printed to stdout
sha256sum * | tee ../SHA2-256SUMS
sha512sum * | tee ../SHA2-512SUMS
- name: Make Update spec
run: |

View file

@@ -53,7 +53,7 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
- name: Install test requirements
run: python3 ./devscripts/install_deps.py --include dev
run: python3 ./devscripts/install_deps.py --include dev --include curl-cffi
- name: Run tests
continue-on-error: False
run: |

View file

@@ -27,6 +27,8 @@ jobs:
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.8'
- name: Install flake8
run: python3 ./devscripts/install_deps.py -o --include dev
- name: Make lazy extractors

View file

@@ -189,13 +189,8 @@ jobs:
if: |
!inputs.prerelease && env.target_repo == github.repository
run: |
python devscripts/update_changelog.py -vv
make doc
sed '/### /Q' Changelog.md >> ./CHANGELOG
echo '### ${{ env.version }}' >> ./CHANGELOG
python ./devscripts/make_changelog.py -vv -c >> ./CHANGELOG
echo >> ./CHANGELOG
grep -Poz '(?s)### \d+\.\d+\.\d+.+' 'Changelog.md' | head -n -1 >> ./CHANGELOG
cat ./CHANGELOG > Changelog.md
- name: Push to release
id: push_release
@@ -266,6 +261,7 @@ jobs:
pypi_project: ${{ needs.prepare.outputs.pypi_project }}
run: |
python devscripts/update-version.py -c "${{ env.channel }}" -r "${{ env.target_repo }}" -s "${{ env.suffix }}" "${{ env.version }}"
python devscripts/update_changelog.py -vv
python devscripts/make_lazy_extractors.py
sed -i -E '0,/(name = ")[^"]+(")/s//\1${{ env.pypi_project }}\2/' pyproject.toml

View file

@@ -600,3 +600,13 @@ xpadev-net
Xpl0itU
YoshichikaAAA
zhijinwuu
alb
hruzgar
kasper93
leoheitmannruiz
luiso1979
nipotan
Offert4324
sta1us
Tomoka1
trwstin

View file

@@ -4,6 +4,101 @@
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2024.04.09
#### Important changes
- Security: [[CVE-2024-22423](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-22423)] [Prevent RCE when using `--exec` with `%q` on Windows](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-hjq6-52gw-2g7p)
- The shell escape function now properly escapes `%`, `\` and `\n`.
- `utils.Popen` has been patched accordingly.
#### Core changes
- [Add new option `--progress-delta`](https://github.com/yt-dlp/yt-dlp/commit/9590cc6b4768e190183d7d071a6c78170889116a) ([#9082](https://github.com/yt-dlp/yt-dlp/issues/9082)) by [Grub4K](https://github.com/Grub4K)
- [Add new options `--impersonate` and `--list-impersonate-targets`](https://github.com/yt-dlp/yt-dlp/commit/0b81d4d252bd065ccd352722987ea34fe17f9244) by [bashonly](https://github.com/bashonly), [coletdjnz](https://github.com/coletdjnz), [Grub4K](https://github.com/Grub4K), [pukkandan](https://github.com/pukkandan)
- [Add option `--no-break-on-existing`](https://github.com/yt-dlp/yt-dlp/commit/16be117729150b2784f3b17755c886cb0cf73374) ([#9610](https://github.com/yt-dlp/yt-dlp/issues/9610)) by [bashonly](https://github.com/bashonly)
- [Fix `filesize_approx` calculation](https://github.com/yt-dlp/yt-dlp/commit/86e3b82261e8ebc6c6707c09544c9dfb8907c0fd) ([#9560](https://github.com/yt-dlp/yt-dlp/issues/9560)) by [pukkandan](https://github.com/pukkandan), [seproDev](https://github.com/seproDev)
- [Infer `acodec` for single-codec containers](https://github.com/yt-dlp/yt-dlp/commit/86a972033e05fea80e5fe7f2aff6723dbe2f3952) by [pukkandan](https://github.com/pukkandan)
- [Prevent RCE when using `--exec` with `%q` (CVE-2024-22423)](https://github.com/yt-dlp/yt-dlp/commit/ff07792676f404ffff6ee61b5638c9dc1a33a37a) by [Grub4K](https://github.com/Grub4K)
- **cookies**: [Add `--cookies-from-browser` support for Firefox Flatpak](https://github.com/yt-dlp/yt-dlp/commit/2ab2651a4a7be18939e2b4cb21be79fe477c797a) ([#9619](https://github.com/yt-dlp/yt-dlp/issues/9619)) by [un-def](https://github.com/un-def)
- **utils**
- `traverse_obj`
- [Allow unbranching using `all` and `any`](https://github.com/yt-dlp/yt-dlp/commit/3699eeb67cad333272b14a42dd3843d93fda1a2e) ([#9571](https://github.com/yt-dlp/yt-dlp/issues/9571)) by [Grub4K](https://github.com/Grub4K)
- [Convenience improvements](https://github.com/yt-dlp/yt-dlp/commit/32abfb00bdbd119ca675fdc6d1719331f0a2741a) ([#9577](https://github.com/yt-dlp/yt-dlp/issues/9577)) by [Grub4K](https://github.com/Grub4K)
#### Extractor changes
- [Add extractor impersonate API](https://github.com/yt-dlp/yt-dlp/commit/50c29352312f5662acf9a64b0012766f5c40af61) ([#9474](https://github.com/yt-dlp/yt-dlp/issues/9474)) by [bashonly](https://github.com/bashonly), [Grub4K](https://github.com/Grub4K), [pukkandan](https://github.com/pukkandan)
- **afreecatv**
- [Overhaul extractor](https://github.com/yt-dlp/yt-dlp/commit/9415f1a5ef88482ebafe3083e8bcb778ac512df7) ([#9566](https://github.com/yt-dlp/yt-dlp/issues/9566)) by [bashonly](https://github.com/bashonly), [Tomoka1](https://github.com/Tomoka1)
- live: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/9073ae6458f4c6a832aa832c67174c61852869be) ([#9348](https://github.com/yt-dlp/yt-dlp/issues/9348)) by [hui1601](https://github.com/hui1601)
- **asobistage**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/0284f1fee202302a78888420f933deae19d9f4e1) ([#8735](https://github.com/yt-dlp/yt-dlp/issues/8735)) by [pzhlkj6612](https://github.com/pzhlkj6612)
- **box**: [Support URLs without file IDs](https://github.com/yt-dlp/yt-dlp/commit/07f5b2f7570fd9ac85aed17f4c0118f6eac77beb) ([#9504](https://github.com/yt-dlp/yt-dlp/issues/9504)) by [shreyasminocha](https://github.com/shreyasminocha)
- **cbc.ca**: player: [Support new URL format](https://github.com/yt-dlp/yt-dlp/commit/b49d5ffc53a72d8245ba319ff07bdc5b8c6a4f0c) ([#9561](https://github.com/yt-dlp/yt-dlp/issues/9561)) by [trainman261](https://github.com/trainman261)
- **crunchyroll**
- [Extract `vo_adaptive_hls` formats by default](https://github.com/yt-dlp/yt-dlp/commit/be77923ffe842f667971019460f6005f3cad01eb) ([#9447](https://github.com/yt-dlp/yt-dlp/issues/9447)) by [bashonly](https://github.com/bashonly)
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/954e57e405f79188450eb30103a9308732cd318f) ([#9615](https://github.com/yt-dlp/yt-dlp/issues/9615)) by [bytedream](https://github.com/bytedream)
- **dropbox**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/a48cc86d6f6b20427553620c2ddb990ede6a4b41) ([#9627](https://github.com/yt-dlp/yt-dlp/issues/9627)) by [bashonly](https://github.com/bashonly)
- **fathom**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/bc2b8c0596fd6b75af24822c4f0f1da6783d71f7) ([#9495](https://github.com/yt-dlp/yt-dlp/issues/9495)) by [src-tinkerer](https://github.com/src-tinkerer)
- **gofile**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/0da66980d3193cad3dae0120cddddbfcabddf7a1) ([#9446](https://github.com/yt-dlp/yt-dlp/issues/9446)) by [jazz1611](https://github.com/jazz1611)
- **imgur**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/86d2f4d24849af0d1f3af7c0e2ac43bf8a058f74) ([#9471](https://github.com/yt-dlp/yt-dlp/issues/9471)) by [trwstin](https://github.com/trwstin)
- **jiosaavn**
- [Extract artists](https://github.com/yt-dlp/yt-dlp/commit/0ae16ceb1846cc4e609b70ce7c5d8e7458efceb2) ([#9612](https://github.com/yt-dlp/yt-dlp/issues/9612)) by [bashonly](https://github.com/bashonly)
- [Fix format extensions](https://github.com/yt-dlp/yt-dlp/commit/443e206ec41e64ca2aef61d8ef91640fb69b3113) ([#9609](https://github.com/yt-dlp/yt-dlp/issues/9609)) by [bashonly](https://github.com/bashonly)
- [Support playlists](https://github.com/yt-dlp/yt-dlp/commit/2e94602f241f6e41bdc48576c61089435529339b) ([#9622](https://github.com/yt-dlp/yt-dlp/issues/9622)) by [bashonly](https://github.com/bashonly)
- **joqrag**: [Fix live status detection](https://github.com/yt-dlp/yt-dlp/commit/f2fd449b46c4058222e1744f7a35caa20b2d003d) ([#9624](https://github.com/yt-dlp/yt-dlp/issues/9624)) by [pzhlkj6612](https://github.com/pzhlkj6612)
- **kick**: [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/c8a61a910096c77ce08dad5e1b2fbda5eb964156) ([#9611](https://github.com/yt-dlp/yt-dlp/issues/9611)) by [bashonly](https://github.com/bashonly)
- **loom**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/f859ed3ba1e8b129ae6a467592c65687e73fbca1) ([#8686](https://github.com/yt-dlp/yt-dlp/issues/8686)) by [bashonly](https://github.com/bashonly), [hruzgar](https://github.com/hruzgar)
- **medici**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/4cd9e251b9abada107b10830de997bf4d79ca369) ([#9518](https://github.com/yt-dlp/yt-dlp/issues/9518)) by [Offert4324](https://github.com/Offert4324)
- **mixch**
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/4c3b7a0769706f7f0ea24adf1f219d5ae82d2b07) ([#9608](https://github.com/yt-dlp/yt-dlp/issues/9608)) by [bashonly](https://github.com/bashonly), [nipotan](https://github.com/nipotan)
- archive: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/c59de48e2bb4c681b03b93b584a05f52609ce4a0) ([#8761](https://github.com/yt-dlp/yt-dlp/issues/8761)) by [pzhlkj6612](https://github.com/pzhlkj6612)
- **nhk**: [Fix NHK World extractors](https://github.com/yt-dlp/yt-dlp/commit/4af9d5c2f6aa81403ae2a8a5ae3cc824730f0b86) ([#9623](https://github.com/yt-dlp/yt-dlp/issues/9623)) by [bashonly](https://github.com/bashonly)
- **patreon**: [Do not extract dead embed URLs](https://github.com/yt-dlp/yt-dlp/commit/36b240f9a72af57eb2c9d927ebb7fd1c917ebf18) ([#9613](https://github.com/yt-dlp/yt-dlp/issues/9613)) by [johnvictorfs](https://github.com/johnvictorfs)
- **radio1be**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/36baaa10e06715ccba06b78885b2042c4844c826) ([#9122](https://github.com/yt-dlp/yt-dlp/issues/9122)) by [HobbyistDev](https://github.com/HobbyistDev)
- **sharepoint**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/ff349ff94aae0b2b148bd3670f7c91d39c2f1d8e) ([#6531](https://github.com/yt-dlp/yt-dlp/issues/6531)) by [bashonly](https://github.com/bashonly), [C0D3D3V](https://github.com/C0D3D3V)
- **sonylivseries**: [Fix season extraction](https://github.com/yt-dlp/yt-dlp/commit/f2868b26e917354203f82a370ad2396646edb813) ([#9423](https://github.com/yt-dlp/yt-dlp/issues/9423)) by [bashonly](https://github.com/bashonly)
- **soundcloud**
- [Adjust format sorting](https://github.com/yt-dlp/yt-dlp/commit/a2d0840739cddd585d24e0ce4796394fc8a4fa2e) ([#9584](https://github.com/yt-dlp/yt-dlp/issues/9584)) by [bashonly](https://github.com/bashonly)
- [Support cookies](https://github.com/yt-dlp/yt-dlp/commit/97362712a1f2b04e735bdf54f749ad99165a62fe) ([#9586](https://github.com/yt-dlp/yt-dlp/issues/9586)) by [bashonly](https://github.com/bashonly)
- [Support retries for API rate-limit](https://github.com/yt-dlp/yt-dlp/commit/246571ae1d867df8bf31a056bdf3bbbfd398366a) ([#9585](https://github.com/yt-dlp/yt-dlp/issues/9585)) by [bashonly](https://github.com/bashonly)
- **thisoldhouse**: [Support Brightcove embeds](https://github.com/yt-dlp/yt-dlp/commit/0df63cce69026d2f4c0cbb4dd36163e83eac93dc) ([#9576](https://github.com/yt-dlp/yt-dlp/issues/9576)) by [bashonly](https://github.com/bashonly)
- **tiktok**
- [Fix API extraction](https://github.com/yt-dlp/yt-dlp/commit/cb61e20c266facabb7a30f9ce53bd79dfc158475) ([#9548](https://github.com/yt-dlp/yt-dlp/issues/9548)) by [bashonly](https://github.com/bashonly), [Grub4K](https://github.com/Grub4K)
- [Prefer non-bytevc2 formats](https://github.com/yt-dlp/yt-dlp/commit/63f685f341f35f6f02b0368d1ba53bdb5b520410) ([#9575](https://github.com/yt-dlp/yt-dlp/issues/9575)) by [bashonly](https://github.com/bashonly)
- [Restore `carrier_region` API parameter](https://github.com/yt-dlp/yt-dlp/commit/fc53ec13ff1ee926a3e533a68cfca8acc887b661) ([#9637](https://github.com/yt-dlp/yt-dlp/issues/9637)) by [bashonly](https://github.com/bashonly)
- [Update API hostname](https://github.com/yt-dlp/yt-dlp/commit/8c05b3ebae23c5b444857549a85b84004c01a536) ([#9444](https://github.com/yt-dlp/yt-dlp/issues/9444)) by [bashonly](https://github.com/bashonly)
- **twitch**: [Extract AV1 and HEVC formats](https://github.com/yt-dlp/yt-dlp/commit/02f93ff51b3ff9436d60c4993562b366eaae8851) ([#9158](https://github.com/yt-dlp/yt-dlp/issues/9158)) by [kasper93](https://github.com/kasper93)
- **vkplay**: [Fix `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/b15b0c1d2106437ec61a5c436c543e8760eac160) ([#9636](https://github.com/yt-dlp/yt-dlp/issues/9636)) by [bashonly](https://github.com/bashonly)
- **xvideos**: [Support new URL format](https://github.com/yt-dlp/yt-dlp/commit/aa7e9ae4f48276bd5d0173966c77db9484f65a0a) ([#9502](https://github.com/yt-dlp/yt-dlp/issues/9502)) by [sta1us](https://github.com/sta1us)
- **youtube**
- [Calculate more accurate `filesize`](https://github.com/yt-dlp/yt-dlp/commit/a25a424323267e3f6f9f63c0b62df499bd7b8d46) by [pukkandan](https://github.com/pukkandan)
- [Update `android` params](https://github.com/yt-dlp/yt-dlp/commit/e7b17fce14775bd2448695c8eb7379b8d31d3537) by [pukkandan](https://github.com/pukkandan)
- search: [Fix params for uncensored results](https://github.com/yt-dlp/yt-dlp/commit/17d248a58781e2588d18a5ebe00c441d10011fcd) ([#9456](https://github.com/yt-dlp/yt-dlp/issues/9456)) by [alb](https://github.com/alb), [pukkandan](https://github.com/pukkandan)
#### Downloader changes
- **ffmpeg**: [Accept output args from info dict](https://github.com/yt-dlp/yt-dlp/commit/9c42b7eef547e826e9fcc7beb6706a2523949d05) ([#9278](https://github.com/yt-dlp/yt-dlp/issues/9278)) by [bashonly](https://github.com/bashonly)
#### Networking changes
- [Respect `SSLKEYLOGFILE` environment variable](https://github.com/yt-dlp/yt-dlp/commit/79a451e5763eda8b10d00684d5d3378f3255ee01) ([#9543](https://github.com/yt-dlp/yt-dlp/issues/9543)) by [luiso1979](https://github.com/luiso1979)
- **Request Handler**
- curlcffi: [Add support for `curl_cffi`](https://github.com/yt-dlp/yt-dlp/commit/52f5be1f1e0dc45bb397ab950f564721976a39bf) by [bashonly](https://github.com/bashonly), [coletdjnz](https://github.com/coletdjnz), [Grub4K](https://github.com/Grub4K), [pukkandan](https://github.com/pukkandan)
- websockets: [Workaround race condition causing issues on PyPy](https://github.com/yt-dlp/yt-dlp/commit/e5d4f11104ce7ea1717a90eea82c0f7d230ea5d5) ([#9514](https://github.com/yt-dlp/yt-dlp/issues/9514)) by [coletdjnz](https://github.com/coletdjnz)
#### Misc. changes
- **build**
- [Do not include `curl_cffi` in `macos_legacy`](https://github.com/yt-dlp/yt-dlp/commit/b19ae095fdddd43c2a2c67d10fbe0d9a645bb98f) ([#9653](https://github.com/yt-dlp/yt-dlp/issues/9653)) by [bashonly](https://github.com/bashonly)
- [Optional dependencies cleanup](https://github.com/yt-dlp/yt-dlp/commit/58dd0f8d1eee6bc9fdc57f1923bed772fa3c946d) ([#9550](https://github.com/yt-dlp/yt-dlp/issues/9550)) by [bashonly](https://github.com/bashonly)
- [Print SHA sums to GHA logs](https://github.com/yt-dlp/yt-dlp/commit/e8032503b9517465b0e86d776fc1e60d8795d673) ([#9582](https://github.com/yt-dlp/yt-dlp/issues/9582)) by [bashonly](https://github.com/bashonly)
- [Update changelog for tarball and sdist](https://github.com/yt-dlp/yt-dlp/commit/17b96974a334688f76b57d350e07cae8cda46877) ([#9425](https://github.com/yt-dlp/yt-dlp/issues/9425)) by [bashonly](https://github.com/bashonly)
- **cleanup**
- [Standardize `import datetime as dt`](https://github.com/yt-dlp/yt-dlp/commit/c305a25c1b16bcf7a5ec499c3b786ed1e2c748da) ([#8978](https://github.com/yt-dlp/yt-dlp/issues/8978)) by [pukkandan](https://github.com/pukkandan)
- ie: [No `from` stdlib imports in extractors](https://github.com/yt-dlp/yt-dlp/commit/e3a3ed8a981d9395c4859b6ef56cd02bc3148db2) by [pukkandan](https://github.com/pukkandan)
- Miscellaneous: [216f6a3](https://github.com/yt-dlp/yt-dlp/commit/216f6a3cb57824e6a3c859649ce058c199b1b247) by [bashonly](https://github.com/bashonly), [pukkandan](https://github.com/pukkandan)
- **docs**
- [Update yt-dlp tagline](https://github.com/yt-dlp/yt-dlp/commit/388c979ac63a8774339fac2516fe1cc852b4276e) ([#9481](https://github.com/yt-dlp/yt-dlp/issues/9481)) by [bashonly](https://github.com/bashonly), [coletdjnz](https://github.com/coletdjnz), [Grub4K](https://github.com/Grub4K), [pukkandan](https://github.com/pukkandan), [seproDev](https://github.com/seproDev)
- [Various manpage fixes](https://github.com/yt-dlp/yt-dlp/commit/df0e138fc02ae2764a44f2f59fc93c756c4d3ee2) by [leoheitmannruiz](https://github.com/leoheitmannruiz)
- **test**
- [Workaround websocket server hanging](https://github.com/yt-dlp/yt-dlp/commit/f849d77ab54788446b995d256e1ee0894c4fb927) ([#9467](https://github.com/yt-dlp/yt-dlp/issues/9467)) by [coletdjnz](https://github.com/coletdjnz)
- `traversal`: [Separate traversal tests](https://github.com/yt-dlp/yt-dlp/commit/979ce2e786f2ee3fc783b6dc1ef4188d8805c923) ([#9574](https://github.com/yt-dlp/yt-dlp/issues/9574)) by [Grub4K](https://github.com/Grub4K)
### 2024.03.10
#### Core changes

View file

@@ -2,7 +2,7 @@ all: lazy-extractors yt-dlp doc pypi-files
clean: clean-test clean-dist
clean-all: clean clean-cache
completions: completion-bash completion-fish completion-zsh
doc: README.md CONTRIBUTING.md issuetemplates supportedsites
doc: README.md CONTRIBUTING.md CONTRIBUTORS issuetemplates supportedsites
ot: offlinetest
tar: yt-dlp.tar.gz
@@ -10,9 +10,12 @@ tar: yt-dlp.tar.gz
# intended use: when building a source distribution,
# make pypi-files && python3 -m build -sn .
pypi-files: AUTHORS Changelog.md LICENSE README.md README.txt supportedsites \
completions yt-dlp.1 pyproject.toml setup.cfg devscripts/* test/*
completions yt-dlp.1 pyproject.toml setup.cfg devscripts/* test/*
.PHONY: all clean install test tar pypi-files completions ot offlinetest codetest supportedsites
.PHONY: all clean clean-all clean-test clean-dist clean-cache \
completions completion-bash completion-fish completion-zsh \
doc issuetemplates supportedsites ot offlinetest codetest test \
tar pypi-files lazy-extractors install uninstall
clean-test:
rm -rf test/testdata/sigs/player-*.js tmp/ *.annotations.xml *.aria2 *.description *.dump *.frag \
@@ -156,5 +159,14 @@ yt-dlp.tar.gz: all
Makefile yt-dlp.1 README.txt completions .gitignore \
setup.cfg yt-dlp yt_dlp pyproject.toml devscripts test
AUTHORS:
git shortlog -s -n HEAD | cut -f2 | sort > AUTHORS
AUTHORS: Changelog.md
@if [ -d '.git' ] && command -v git > /dev/null ; then \
echo 'Generating $@ from git commit history' ; \
git shortlog -s -n HEAD | cut -f2 | sort > $@ ; \
fi
CONTRIBUTORS: Changelog.md
@if [ -d '.git' ] && command -v git > /dev/null ; then \
echo 'Updating $@ from git commit history' ; \
$(PYTHON) devscripts/make_changelog.py -v -c > /dev/null ; \
fi

View file

@@ -17,7 +17,7 @@
</div>
<!-- MANPAGE: END EXCLUDED SECTION -->
yt-dlp is a [youtube-dl](https://github.com/ytdl-org/youtube-dl) fork based on the now inactive [youtube-dlc](https://github.com/blackjack4494/yt-dlc). The main focus of this project is adding new features and patches while also keeping up to date with the original project
yt-dlp is a feature-rich command-line audio/video downloader with support for [thousands of sites](supportedsites.md). The project is a fork of [youtube-dl](https://github.com/ytdl-org/youtube-dl) based on the now inactive [youtube-dlc](https://github.com/blackjack4494/yt-dlc).
<!-- MANPAGE: MOVE "USAGE AND OPTIONS" SECTION HERE -->
@@ -158,6 +158,7 @@ When using `--update`/`-U`, a release binary will only update to its current cha
You may also use `--update-to <repository>` (`<owner>/<repository>`) to update to a channel on a completely different repository. Be careful with what repository you are updating to though, there is no verification done for binaries from different repositories.
Example usage:
* `yt-dlp --update-to master` switch to the `master` channel and update to its latest release
* `yt-dlp --update-to stable@2023.07.06` upgrade/downgrade to release to `stable` channel tag `2023.07.06`
* `yt-dlp --update-to 2023.10.07` upgrade/downgrade to tag `2023.10.07` if it exists on the current channel
@@ -196,6 +197,15 @@ While all the other dependencies are optional, `ffmpeg` and `ffprobe` are highly
* [**websockets**](https://github.com/aaugustin/websockets)\* - For downloading over websocket. Licensed under [BSD-3-Clause](https://github.com/aaugustin/websockets/blob/main/LICENSE)
* [**requests**](https://github.com/psf/requests)\* - HTTP library. For HTTPS proxy and persistent connections support. Licensed under [Apache-2.0](https://github.com/psf/requests/blob/main/LICENSE)
#### Impersonation
The following provide support for impersonating browser requests. This may be required for some sites that employ TLS fingerprinting.
* [**curl_cffi**](https://github.com/yifeikong/curl_cffi) (recommended) - Python binding for [curl-impersonate](https://github.com/lwthiker/curl-impersonate). Provides impersonation targets for Chrome, Edge and Safari. Licensed under [MIT](https://github.com/yifeikong/curl_cffi/blob/main/LICENSE)
* Can be installed with the `curl-cffi` group, e.g. `pip install yt-dlp[default,curl-cffi]`
* Currently only included in `yt-dlp.exe` and `yt-dlp_macos` builds
### Metadata
* [**mutagen**](https://github.com/quodlibet/mutagen)\* - For `--embed-thumbnail` in certain formats. Licensed under [GPLv2+](https://github.com/quodlibet/mutagen/blob/master/COPYING)
@@ -389,6 +399,10 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
direct connection
--socket-timeout SECONDS Time to wait before giving up, in seconds
--source-address IP Client-side IP address to bind to
--impersonate CLIENT[:OS] Client to impersonate for requests. E.g.
chrome, chrome-110, chrome:windows-10. Pass
--impersonate="" to impersonate any client.
--list-impersonate-targets List available clients to impersonate.
-4, --force-ipv4 Make all connections via IPv4
-6, --force-ipv6 Make all connections via IPv6
--enable-file-urls Enable file:// URLs. This is disabled by
@@ -468,6 +482,9 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
--max-downloads NUMBER Abort after downloading NUMBER files
--break-on-existing Stop the download process when encountering
a file that is in the archive
--no-break-on-existing Do not stop the download process when
encountering a file that is in the archive
(default)
--break-per-input Alters --max-downloads, --break-on-existing,
--break-match-filter, and autonumber to
reset per input URL
@@ -741,6 +758,7 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
accessible under "progress" key. E.g.
--console-title --progress-template
"download-title:%(info.id)s-%(progress.eta)s"
--progress-delta SECONDS Time between progress output (default: 0)
-v, --verbose Print various debugging information
--dump-pages Print downloaded pages encoded using base64
to debug problems (very verbose)
@@ -1459,9 +1477,9 @@ The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `
- `width`: Width of the video, if known
- `height`: Height of the video, if known
- `aspect_ratio`: Aspect ratio of the video, if known
- `tbr`: Average bitrate of audio and video in KBit/s
- `abr`: Average audio bitrate in KBit/s
- `vbr`: Average video bitrate in KBit/s
- `tbr`: Average bitrate of audio and video in [kbps](## "1000 bits/sec")
- `abr`: Average audio bitrate in [kbps](## "1000 bits/sec")
- `vbr`: Average video bitrate in [kbps](## "1000 bits/sec")
- `asr`: Audio sampling rate in Hertz
- `fps`: Frame rate
- `audio_channels`: The number of audio channels
@@ -1486,7 +1504,7 @@ Any string comparison may be prefixed with negation `!` in order to produce an o
**Note**: None of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the website. Any other field made available by the extractor can also be used for filtering.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "bv[height<=?720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. You can also use the filters with `all` to download all formats that satisfy the filter, e.g. `-f "all[vcodec=none]"` selects all audio-only formats.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "bv[height<=?720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 kbps. You can also use the filters with `all` to download all formats that satisfy the filter, e.g. `-f "all[vcodec=none]"` selects all audio-only formats.
Format selectors can also be grouped using parentheses; e.g. `-f "(mp4,webm)[height<480]"` will download the best pre-merged mp4 and webm formats with a height lower than 480.
@@ -1518,10 +1536,10 @@ The available fields are:
- `fps`: Framerate of video
- `hdr`: The dynamic range of the video (`DV` > `HDR12` > `HDR10+` > `HDR10` > `HLG` > `SDR`)
- `channels`: The number of audio channels
- `tbr`: Total average bitrate in KBit/s
- `vbr`: Average video bitrate in KBit/s
- `abr`: Average audio bitrate in KBit/s
- `br`: Average bitrate in KBit/s, `tbr`/`vbr`/`abr`
- `tbr`: Total average bitrate in [kbps](## "1000 bits/sec")
- `vbr`: Average video bitrate in [kbps](## "1000 bits/sec")
- `abr`: Average audio bitrate in [kbps](## "1000 bits/sec")
- `br`: Average bitrate in [kbps](## "1000 bits/sec"), `tbr`/`vbr`/`abr`
- `asr`: Audio sample rate in Hz
**Deprecation warning**: Many of these fields have (currently undocumented) aliases, that may be removed in a future version. It is recommended to use only the documented field names.
@@ -1742,7 +1760,7 @@ The following extractors use this feature:
#### youtube
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The main clients are `web`, `android` and `ios` with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (e.g. `web_embedded`); and `mweb`, `mweb_embedscreen` and `tv_embedded` (agegate bypass) with no variants. By default, `ios,android,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. You can use `all` to use all the clients, and `default` for the default clients.
* `player_client`: Clients to extract video data from. The main clients are `web`, `android` and `ios` with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (e.g. `web_embedded`); and `mweb`, `mweb_embedscreen`, `mediaconnect` and `tv_embedded` (agegate bypass) with no variants. By default, `ios,android,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. You can use `all` to use all the clients, and `default` for the default clients.
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
@@ -1768,8 +1786,7 @@ The following extractors use this feature:
* `version`: The video version to extract - `uncut` or `simulcast`
#### crunchyrollbeta (Crunchyroll)
* `format`: Which stream type(s) to extract (default: `adaptive_hls`). Potentially useful values include `adaptive_hls`, `adaptive_dash`, `vo_adaptive_hls`, `vo_adaptive_dash`, `download_hls`, `download_dash`, `multitrack_adaptive_hls_v2`
* `hardsub`: Preference order for which hardsub versions to extract, or `all` (default: `None` = no hardsubs), e.g. `crunchyrollbeta:hardsub=en-US,None`
* `hardsub`: One or more hardsub versions to extract (in order of preference), or `all` (default: `None` = no hardsubs will be extracted), e.g. `crunchyrollbeta:hardsub=en-US,de-DE`
#### vikichannel
* `video_types`: Types of videos to download - one or more of `episodes`, `movies`, `clips`, `trailers`
@@ -1792,9 +1809,12 @@ The following extractors use this feature:
* `max_comments`: Maximum number of comments to extract - default is `120`
#### tiktok
* `api_hostname`: Hostname to use for mobile API requests, e.g. `api-h2.tiktokv.com`
* `app_version`: App version to call mobile APIs with - should be set along with `manifest_app_version`, e.g. `20.2.1`
* `manifest_app_version`: Numeric app version to call mobile APIs with, e.g. `221`
* `api_hostname`: Hostname to use for mobile API calls, e.g. `api22-normal-c-alisg.tiktokv.com`
* `app_name`: Default app name to use with mobile API calls, e.g. `trill`
* `app_version`: Default app version to use with mobile API calls - should be set along with `manifest_app_version`, e.g. `34.1.2`
* `manifest_app_version`: Default numeric app version to use with mobile API calls, e.g. `2023401020`
* `aid`: Default app ID to use with API calls, e.g. `1180`
* `app_info`: One or more app info strings in the format of `<iid>/[app_name]/[app_version]/[manifest_app_version]/[aid]`, where `iid` is the unique app install ID. `iid` is the only required value; all other values and their `/` separators can be omitted, e.g. `tiktok:app_info=1234567890123456789` or `tiktok:app_info=123,456/trill///1180,789//34.0.1/340001`
#### rokfinchannel
* `tab`: Which tab to download - one of `new`, `top`, `videos`, `podcasts`, `streams`, `stacks`
@@ -1817,6 +1837,9 @@ The following extractors use this feature:
#### jiosaavn
* `bitrate`: Audio bitrates to request. One or more of `16`, `32`, `64`, `128`, `320`. Default is `128,320`
#### afreecatvlive
* `cdn`: One or more CDN IDs to use with the API call for stream URLs, e.g. `gcp_cdn`, `gs_cdn_pc_app`, `gs_cdn_mobile_web`, `gs_cdn_pc_web`
**Note**: These options may be changed/removed in the future without concern for backward compatibility
<!-- MANPAGE: MOVE "INSTALLATION" SECTION HERE -->
@@ -1874,6 +1897,7 @@ Plugins can be installed using various methods and locations.
`.zip`, `.egg` and `.whl` archives containing a `yt_dlp_plugins` namespace folder in their root are also supported as plugin packages.
* e.g. `${XDG_CONFIG_HOME}/yt-dlp/plugins/mypluginpkg.zip` where `mypluginpkg.zip` contains `yt_dlp_plugins/<type>/myplugin.py`
Run yt-dlp with `--verbose` to check if the plugin has been loaded.

View file

@@ -0,0 +1,10 @@
services:
static:
build: static
environment:
channel: ${channel}
origin: ${origin}
version: ${version}
volumes:
- ~/build:/build
- ../..:/yt-dlp

View file

@@ -0,0 +1,21 @@
FROM alpine:3.19 as base
RUN apk --update add --no-cache \
build-base \
python3 \
pipx \
;
RUN pipx install pyinstaller
# Requires above step to prepare the shared venv
RUN ~/.local/share/pipx/shared/bin/python -m pip install -U wheel
RUN apk --update add --no-cache \
scons \
patchelf \
binutils \
;
RUN pipx install staticx
WORKDIR /yt-dlp
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT /entrypoint.sh

View file

@@ -0,0 +1,13 @@
#!/bin/ash
set -e
source ~/.local/share/pipx/venvs/pyinstaller/bin/activate
python -m devscripts.install_deps --include secretstorage
python -m devscripts.make_lazy_extractors
python devscripts/update-version.py -c "${channel}" -r "${origin}" "${version}"
python -m bundle.pyinstaller
deactivate
source ~/.local/share/pipx/venvs/staticx/bin/activate
staticx /yt-dlp/dist/yt-dlp_linux /build/yt-dlp_linux
deactivate

View file

@@ -28,7 +28,7 @@ def main():
}],
version_info={
'version': VERSION,
'description': 'A youtube-dl fork with additional features and patches',
'description': 'A feature-rich command-line audio/video downloader',
'comments': 'Official repository: <https://github.com/yt-dlp/yt-dlp>',
'product_name': 'yt-dlp',
'product_version': VERSION,

View file

@@ -126,5 +126,26 @@
"when": "4ce57d3b873c2887814cbec03d029533e82f7db5",
"short": "[ie] Support multi-period MPD streams (#6654)",
"authors": ["alard", "pukkandan"]
},
{
"action": "change",
"when": "aa7e9ae4f48276bd5d0173966c77db9484f65a0a",
"short": "[ie/xvideos] Support new URL format (#9502)",
"authors": ["sta1us"]
},
{
"action": "remove",
"when": "22e4dfacb61f62dfbb3eb41b31c7b69ba1059b80"
},
{
"action": "change",
"when": "e3a3ed8a981d9395c4859b6ef56cd02bc3148db2",
"short": "[cleanup:ie] No `from` stdlib imports in extractors",
"authors": ["pukkandan"]
},
{
"action": "add",
"when": "9590cc6b4768e190183d7d071a6c78170889116a",
"short": "[priority] Security: [[CVE-2024-22423](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-22423)] [Prevent RCE when using `--exec` with `%q` on Windows](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-hjq6-52gw-2g7p)\n - The shell escape function now properly escapes `%`, `\\` and `\\n`.\n - `utils.Popen` has been patched accordingly."
}
]

View file

@@ -10,6 +10,8 @@ import argparse
import re
import subprocess
from pathlib import Path
from devscripts.tomlparse import parse_toml
from devscripts.utils import read_file
@@ -17,17 +19,23 @@ from devscripts.utils import read_file
def parse_args():
parser = argparse.ArgumentParser(description='Install dependencies for yt-dlp')
parser.add_argument(
'input', nargs='?', metavar='TOMLFILE', default='pyproject.toml', help='Input file (default: %(default)s)')
'input', nargs='?', metavar='TOMLFILE', default=Path(__file__).parent.parent / 'pyproject.toml',
help='input file (default: %(default)s)')
parser.add_argument(
'-e', '--exclude', metavar='DEPENDENCY', action='append', help='Exclude a dependency')
'-e', '--exclude', metavar='DEPENDENCY', action='append',
help='exclude a dependency')
parser.add_argument(
'-i', '--include', metavar='GROUP', action='append', help='Include an optional dependency group')
'-i', '--include', metavar='GROUP', action='append',
help='include an optional dependency group')
parser.add_argument(
'-o', '--only-optional', action='store_true', help='Only install optional dependencies')
'-o', '--only-optional', action='store_true',
help='only install optional dependencies')
parser.add_argument(
'-p', '--print', action='store_true', help='Only print a requirements.txt to stdout')
'-p', '--print', action='store_true',
help='only print requirements to stdout')
parser.add_argument(
'-u', '--user', action='store_true', help='Install with pip as --user')
'-u', '--user', action='store_true',
help='install with pip as --user')
return parser.parse_args()
@ -37,24 +45,16 @@ def main():
optional_groups = project_table['optional-dependencies']
excludes = args.exclude or []
deps = []
targets = []
if not args.only_optional: # `-o` should exclude 'dependencies' and the 'default' group
deps.extend(project_table['dependencies'])
targets.extend(project_table['dependencies'])
if 'default' not in excludes: # `--exclude default` should exclude entire 'default' group
deps.extend(optional_groups['default'])
def name(dependency):
return re.match(r'[\w-]+', dependency)[0].lower()
target_map = {name(dep): dep for dep in deps}
targets.extend(optional_groups['default'])
for include in filter(None, map(optional_groups.get, args.include or [])):
target_map.update(zip(map(name, include), include))
targets.extend(include)
for exclude in map(name, excludes):
target_map.pop(exclude, None)
targets = list(target_map.values())
targets = [t for t in targets if re.match(r'[\w-]+', t).group(0).lower() not in excludes]
if args.print:
for target in targets:
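
The rewritten filter above drops a dependency by matching only the distribution name at the start of each requirement specifier. A minimal standalone sketch of that behaviour, using hypothetical requirement strings rather than the real pyproject.toml contents:

import re

targets = ['requests>=2.31.0', 'brotli; implementation_name=="cpython"', 'websockets>=12.0']
excludes = ['brotli']

# Same normalization as the devscript: the leading [\w-]+ run, lowercased
kept = [t for t in targets if re.match(r'[\w-]+', t).group(0).lower() not in excludes]
print(kept)  # ['requests>=2.31.0', 'websockets>=12.0']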

View file

@ -445,7 +445,32 @@ def get_new_contributors(contributors_path, commits):
return sorted(new_contributors, key=str.casefold)
if __name__ == '__main__':
def create_changelog(args):
logging.basicConfig(
datefmt='%Y-%m-%d %H-%M-%S', format='{asctime} | {levelname:<8} | {message}',
level=logging.WARNING - 10 * args.verbosity, style='{', stream=sys.stderr)
commits = CommitRange(None, args.commitish, args.default_author)
if not args.no_override:
if args.override_path.exists():
overrides = json.loads(read_file(args.override_path))
commits.apply_overrides(overrides)
else:
logger.warning(f'File {args.override_path.as_posix()} does not exist')
logger.info(f'Loaded {len(commits)} commits')
new_contributors = get_new_contributors(args.contributors_path, commits)
if new_contributors:
if args.contributors:
write_file(args.contributors_path, '\n'.join(new_contributors) + '\n', mode='a')
logger.info(f'New contributors: {", ".join(new_contributors)}')
return Changelog(commits.groups(), args.repo, args.collapsible)
def create_parser():
import argparse
parser = argparse.ArgumentParser(
@ -477,27 +502,9 @@ if __name__ == '__main__':
parser.add_argument(
'--collapsible', action='store_true',
help='make changelog collapsible (default: %(default)s)')
args = parser.parse_args()
logging.basicConfig(
datefmt='%Y-%m-%d %H-%M-%S', format='{asctime} | {levelname:<8} | {message}',
level=logging.WARNING - 10 * args.verbosity, style='{', stream=sys.stderr)
return parser
commits = CommitRange(None, args.commitish, args.default_author)
if not args.no_override:
if args.override_path.exists():
overrides = json.loads(read_file(args.override_path))
commits.apply_overrides(overrides)
else:
logger.warning(f'File {args.override_path.as_posix()} does not exist')
logger.info(f'Loaded {len(commits)} commits')
new_contributors = get_new_contributors(args.contributors_path, commits)
if new_contributors:
if args.contributors:
write_file(args.contributors_path, '\n'.join(new_contributors) + '\n', mode='a')
logger.info(f'New contributors: {", ".join(new_contributors)}')
print(Changelog(commits.groups(), args.repo, args.collapsible))
if __name__ == '__main__':
print(create_changelog(create_parser().parse_args()))

View file

@ -24,7 +24,7 @@ PREFIX = r'''%yt-dlp(1)
# NAME
yt\-dlp \- A youtube-dl fork with additional features and patches
yt\-dlp \- A feature\-rich command\-line audio/video downloader
# SYNOPSIS
@ -43,6 +43,27 @@ def filter_excluded_sections(readme):
'', readme)
def _convert_code_blocks(readme):
current_code_block = None
for line in readme.splitlines(True):
if current_code_block:
if line == current_code_block:
current_code_block = None
yield '\n'
else:
yield f' {line}'
elif line.startswith('```'):
current_code_block = line.count('`') * '`' + '\n'
yield '\n'
else:
yield line
def convert_code_blocks(readme):
return ''.join(_convert_code_blocks(readme))
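
The helper above converts Markdown fenced code blocks into the four-space-indented form the man-page pipeline expects, remembering the opening fence so longer fences inside a block survive. A small illustration of the transformation, assuming the convert_code_blocks() defined above is in scope:

readme = 'Example:\n```\nyt-dlp URL\n```\nDone.\n'
print(convert_code_blocks(readme))
# Prints:
# Example:
#
#     yt-dlp URL
#
# Done.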
def move_sections(readme):
MOVE_TAG_TEMPLATE = '<!-- MANPAGE: MOVE "%s" SECTION HERE -->'
sections = re.findall(r'(?m)^%s$' % (
@ -65,8 +86,10 @@ def move_sections(readme):
def filter_options(readme):
section = re.search(r'(?sm)^# USAGE AND OPTIONS\n.+?(?=^# )', readme).group(0)
section_new = section.replace('*', R'\*')
options = '# OPTIONS\n'
for line in section.split('\n')[1:]:
for line in section_new.split('\n')[1:]:
mobj = re.fullmatch(r'''(?x)
\s{4}(?P<opt>-(?:,\s|[^\s])+)
(?:\s(?P<meta>(?:[^\s]|\s(?!\s))+))?
@ -86,7 +109,7 @@ def filter_options(readme):
return readme.replace(section, options, 1)
TRANSFORM = compose_functions(filter_excluded_sections, move_sections, filter_options)
TRANSFORM = compose_functions(filter_excluded_sections, convert_code_blocks, move_sections, filter_options)
def main():

View file

@ -11,7 +11,7 @@ IMPORTANT: INVALID FILES OR MULTILINE STRINGS ARE NOT SUPPORTED!
from __future__ import annotations
import datetime
import datetime as dt
import json
import re
@ -115,9 +115,9 @@ def parse_value(data: str, index: int):
for func in [
int,
float,
datetime.time.fromisoformat,
datetime.date.fromisoformat,
datetime.datetime.fromisoformat,
dt.time.fromisoformat,
dt.date.fromisoformat,
dt.datetime.fromisoformat,
{'true': True, 'false': False}.get,
]:
try:
@ -179,7 +179,7 @@ def main():
data = file.read()
def default(obj):
if isinstance(obj, (datetime.date, datetime.time, datetime.datetime)):
if isinstance(obj, (dt.date, dt.time, dt.datetime)):
return obj.isoformat()
print(json.dumps(parse_toml(data), default=default))
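
The default hook is what lets json.dumps emit the datetime objects produced by the ISO parsers above. The same pattern in a self-contained sketch:

import datetime as dt
import json

def default(obj):
    if isinstance(obj, (dt.date, dt.time, dt.datetime)):
        return obj.isoformat()
    raise TypeError(f'not serializable: {type(obj)}')

print(json.dumps({'released': dt.date.fromisoformat('2024-05-12')}, default=default))
# {"released": "2024-05-12"}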

View file

@ -9,15 +9,15 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import argparse
import contextlib
import datetime as dt
import sys
from datetime import datetime, timezone
from devscripts.utils import read_version, run_process, write_file
def get_new_version(version, revision):
if not version:
version = datetime.now(timezone.utc).strftime('%Y.%m.%d')
version = dt.datetime.now(dt.timezone.utc).strftime('%Y.%m.%d')
if revision:
assert revision.isdecimal(), 'Revision must be a number'

View file

@ -0,0 +1,26 @@
#!/usr/bin/env python3
# Allow direct execution
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from pathlib import Path
from devscripts.make_changelog import create_changelog, create_parser
from devscripts.utils import read_file, read_version, write_file
# Always run after devscripts/update-version.py, and run before `make doc|pypi-files|tar|all`
if __name__ == '__main__':
parser = create_parser()
parser.description = 'Update an existing changelog file with an entry for a new release'
parser.add_argument(
'--changelog-path', type=Path, default=Path(__file__).parent.parent / 'Changelog.md',
help='path to the Changelog file')
args = parser.parse_args()
new_entry = create_changelog(args)
header, sep, changelog = read_file(args.changelog_path).partition('\n### ')
write_file(args.changelog_path, f'{header}{sep}{read_version()}\n{new_entry}\n{sep}{changelog}')
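
The partition('\n### ') call splits the existing changelog at its first release heading, so the new entry lands between the file header and all previous releases. A self-contained sketch of that splice with a hypothetical miniature Changelog.md:

changelog = '# Changelog\n\n### 2024.04.09\nolder entry\n'
version, new_entry = '2024.05.12', 'new entry'

header, sep, rest = changelog.partition('\n### ')
print(f'{header}{sep}{version}\n{new_entry}\n{sep}{rest}')
# The new "### 2024.05.12" section now precedes "### 2024.04.09"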

View file

@ -10,7 +10,7 @@ maintainers = [
{name = "bashonly", email = "bashonly@protonmail.com"},
{name = "coletdjnz", email = "coletdjnz@protonmail.com"},
]
description = "A youtube-dl fork with additional features and patches"
description = "A feature-rich command-line audio/video downloader"
readme = "README.md"
requires-python = ">=3.8"
keywords = [
@ -53,6 +53,7 @@ dependencies = [
[project.optional-dependencies]
default = []
curl-cffi = ["curl-cffi==0.5.10; implementation_name=='cpython'"]
secretstorage = [
"cffi",
"secretstorage",
@ -68,7 +69,10 @@ dev = [
"isort",
"pytest",
]
pyinstaller = ["pyinstaller>=6.3"]
pyinstaller = [
"pyinstaller>=6.3; sys_platform!='darwin'",
"pyinstaller==5.13.2; sys_platform=='darwin'", # needed for curl_cffi
]
py2exe = ["py2exe>=0.12"]
[project.urls]

View file

@ -47,7 +47,7 @@
- **aenetworks:show**
- **AeonCo**
- **afreecatv**: [*afreecatv*](## "netrc machine") afreecatv.com
- **afreecatv:live**: [*afreecatv*](## "netrc machine") afreecatv.com
- **afreecatv:live**: [*afreecatv*](## "netrc machine") afreecatv.com livestreams
- **afreecatv:user**
- **AirTV**
- **AitubeKZVideo**
@ -105,6 +105,7 @@
- **ArteTVPlaylist**
- **asobichannel**: ASOBI CHANNEL
- **asobichannel:tag**: ASOBI CHANNEL
- **AsobiStage**: ASOBISTAGE (アソビステージ)
- **AtresPlayer**: [*atresplayer*](## "netrc machine")
- **AtScaleConfEvent**
- **ATVAt**
@ -436,6 +437,7 @@
- **FacebookPluginsVideo**
- **fancode:live**: [*fancode*](## "netrc machine") (**Currently broken**)
- **fancode:vod**: [*fancode*](## "netrc machine") (**Currently broken**)
- **Fathom**
- **faz.net**
- **fc2**: [*fc2*](## "netrc machine")
- **fc2:embed**
@ -633,8 +635,9 @@
- **Jamendo**
- **JamendoAlbum**
- **JeuxVideo**: (**Currently broken**)
- **JioSaavnAlbum**
- **JioSaavnSong**
- **jiosaavn:album**
- **jiosaavn:playlist**
- **jiosaavn:song**
- **Joj**
- **JoqrAg**: 超!A&G+ 文化放送 (f.k.a. AGQR) Nippon Cultural Broadcasting, Inc. (JOQR)
- **Jove**
@ -716,6 +719,8 @@
- **Lnk**
- **LnkGo**
- **loc**: Library of Congress
- **loom**
- **loom:folder**
- **LoveHomePorn**
- **LRTStream**
- **LRTVOD**
@ -1136,6 +1141,7 @@
- **Radiko**
- **RadikoRadio**
- **radio.de**: (**Currently broken**)
- **Radio1Be**
- **radiocanada**
- **radiocanada:audiovideo**
- **RadioComercial**
@ -1288,6 +1294,7 @@
- **SeznamZpravyArticle**
- **Shahid**: [*shahid*](## "netrc machine")
- **ShahidShow**
- **SharePoint**
- **ShareVideosEmbed**
- **ShemarooMe**
- **ShowRoomLive**

View file

@ -1,4 +1,3 @@
import functools
import inspect
import pytest
@ -10,7 +9,9 @@ from yt_dlp.utils._utils import _YDLLogger as FakeLogger
@pytest.fixture
def handler(request):
RH_KEY = request.param
RH_KEY = getattr(request, 'param', None)
if not RH_KEY:
return
if inspect.isclass(RH_KEY) and issubclass(RH_KEY, RequestHandler):
handler = RH_KEY
elif RH_KEY in _REQUEST_HANDLERS:
@ -18,9 +19,46 @@ def handler(request):
else:
pytest.skip(f'{RH_KEY} request handler is not available')
return functools.partial(handler, logger=FakeLogger)
class HandlerWrapper(handler):
RH_KEY = handler.RH_KEY
def __init__(self, *args, **kwargs):
super().__init__(logger=FakeLogger, *args, **kwargs)
return HandlerWrapper
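
Returning a subclass instead of the previous functools.partial(handler, logger=FakeLogger) keeps class attributes such as RH_KEY reachable on the returned object, which the skip fixtures below depend on. A minimal standalone illustration of the difference:

import functools

class Example:
    RH_KEY = 'Example'
    def __init__(self, *, logger=None):
        self.logger = logger

partial_factory = functools.partial(Example, logger=None)
assert not hasattr(partial_factory, 'RH_KEY')  # partial objects hide class attributes

class Wrapper(Example):
    RH_KEY = Example.RH_KEY
    def __init__(self, *args, **kwargs):
        super().__init__(logger=None, *args, **kwargs)

assert Wrapper.RH_KEY == 'Example'  # the attribute survives wrapping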
def validate_and_send(rh, req):
rh.validate(req)
return rh.send(req)
@pytest.fixture(autouse=True)
def skip_handler(request, handler):
"""usage: pytest.mark.skip_handler('my_handler', 'reason')"""
for marker in request.node.iter_markers('skip_handler'):
if marker.args[0] == handler.RH_KEY:
pytest.skip(marker.args[1] if len(marker.args) > 1 else '')
@pytest.fixture(autouse=True)
def skip_handler_if(request, handler):
"""usage: pytest.mark.skip_handler_if('my_handler', lambda request: True, 'reason')"""
for marker in request.node.iter_markers('skip_handler_if'):
if marker.args[0] == handler.RH_KEY and marker.args[1](request):
pytest.skip(marker.args[2] if len(marker.args) > 2 else '')
@pytest.fixture(autouse=True)
def skip_handlers_if(request, handler):
"""usage: pytest.mark.skip_handlers_if(lambda request, handler: True, 'reason')"""
for marker in request.node.iter_markers('skip_handlers_if'):
if handler and marker.args[0](request, handler):
pytest.skip(marker.args[1] if len(marker.args) > 1 else '')
def pytest_configure(config):
config.addinivalue_line(
"markers", "skip_handler(handler): skip test for the given handler",
)
config.addinivalue_line(
"markers", "skip_handler_if(handler): skip test for the given handler if condition is true"
)
config.addinivalue_line(
"markers", "skip_handlers_if(handler): skip test for handlers when the condition is true"
)
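
With the markers registered above, tests can declare handler-specific skips declaratively. A hedged usage sketch (hypothetical test function, mirroring the fixture docstrings):

import pytest

@pytest.mark.skip_handler('Urllib', 'feature not supported by urllib')
@pytest.mark.skip_handlers_if(
    lambda request, handler: handler.RH_KEY == 'CurlCFFI', 'example condition')
def test_example(handler):
    with handler() as rh:
        ...  # exercise the request handler here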

View file

@ -338,3 +338,8 @@ def http_server_port(httpd):
def verify_address_availability(address):
if find_available_port(address) is None:
pytest.skip(f'Unable to bind to source address {address} (address may not exist)')
def validate_and_send(rh, req):
rh.validate(req)
return rh.send(req)

View file

@ -1906,6 +1906,15 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
expected_status=TEAPOT_RESPONSE_STATUS)
self.assertEqual(content, TEAPOT_RESPONSE_BODY)
def test_search_nextjs_data(self):
data = '<script id="__NEXT_DATA__" type="application/json">{"props":{}}</script>'
self.assertEqual(self.ie._search_nextjs_data(data, None), {'props': {}})
self.assertEqual(self.ie._search_nextjs_data('', None, fatal=False), {})
self.assertEqual(self.ie._search_nextjs_data('', None, default=None), None)
self.assertEqual(self.ie._search_nextjs_data('', None, default={}), {})
with self.assertRaises(DeprecationWarning):
self.assertEqual(self.ie._search_nextjs_data('', None, default='{}'), {})
if __name__ == '__main__':
unittest.main()

View file

@ -183,7 +183,7 @@ class TestFormatSelection(unittest.TestCase):
]
info_dict = _make_result(formats)
ydl = YDL({'format': 'best'})
ydl = YDL({'format': 'best', 'format_sort': ['abr', 'ext']})
ydl.sort_formats(info_dict)
ydl.process_ie_result(copy.deepcopy(info_dict))
downloaded = ydl.downloaded_info_dicts[0]
@ -195,7 +195,7 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'mp3-64')
ydl = YDL({'prefer_free_formats': True})
ydl = YDL({'prefer_free_formats': True, 'format_sort': ['abr', 'ext']})
ydl.sort_formats(info_dict)
ydl.process_ie_result(copy.deepcopy(info_dict))
downloaded = ydl.downloaded_info_dicts[0]

View file

@ -1,5 +1,5 @@
import datetime as dt
import unittest
from datetime import datetime, timezone
from yt_dlp import cookies
from yt_dlp.cookies import (
@ -138,7 +138,7 @@ class TestCookies(unittest.TestCase):
self.assertEqual(cookie.name, 'foo')
self.assertEqual(cookie.value, 'test%20%3Bcookie')
self.assertFalse(cookie.secure)
expected_expiration = datetime(2021, 6, 18, 21, 39, 19, tzinfo=timezone.utc)
expected_expiration = dt.datetime(2021, 6, 18, 21, 39, 19, tzinfo=dt.timezone.utc)
self.assertEqual(cookie.expires, int(expected_expiration.timestamp()))
def test_pbkdf2_sha1(self):

View file

@ -0,0 +1,379 @@
import abc
import base64
import contextlib
import functools
import json
import os
import random
import ssl
import threading
from http.server import BaseHTTPRequestHandler
from socketserver import ThreadingTCPServer
import pytest
from test.helper import http_server_port, verify_address_availability
from test.test_networking import TEST_DIR
from test.test_socks import IPv6ThreadingTCPServer
from yt_dlp.dependencies import urllib3
from yt_dlp.networking import Request
from yt_dlp.networking.exceptions import HTTPError, ProxyError, SSLError
class HTTPProxyAuthMixin:
def proxy_auth_error(self):
self.send_response(407)
self.send_header('Proxy-Authenticate', 'Basic realm="test http proxy"')
self.end_headers()
return False
def do_proxy_auth(self, username, password):
if username is None and password is None:
return True
proxy_auth_header = self.headers.get('Proxy-Authorization', None)
if proxy_auth_header is None:
return self.proxy_auth_error()
if not proxy_auth_header.startswith('Basic '):
return self.proxy_auth_error()
auth = proxy_auth_header[6:]
try:
auth_username, auth_password = base64.b64decode(auth).decode().split(':', 1)
except Exception:
return self.proxy_auth_error()
if auth_username != (username or '') or auth_password != (password or ''):
return self.proxy_auth_error()
return True
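
The mixin expects the standard Proxy-Authorization: Basic <base64(username:password)> header. A short standalone sketch of building and parsing such a value (illustrative credentials):

import base64

header = 'Basic ' + base64.b64encode(b'test:test').decode()
print(header)  # Basic dGVzdDp0ZXN0

# Parsing, as in do_proxy_auth() above
username, password = base64.b64decode(header[6:]).decode().split(':', 1)
assert (username, password) == ('test', 'test')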
class HTTPProxyHandler(BaseHTTPRequestHandler, HTTPProxyAuthMixin):
def __init__(self, *args, proxy_info=None, username=None, password=None, request_handler=None, **kwargs):
self.username = username
self.password = password
self.proxy_info = proxy_info
super().__init__(*args, **kwargs)
def do_GET(self):
if not self.do_proxy_auth(self.username, self.password):
self.server.close_request(self.request)
return
if self.path.endswith('/proxy_info'):
payload = json.dumps(self.proxy_info or {
'client_address': self.client_address,
'connect': False,
'connect_host': None,
'connect_port': None,
'headers': dict(self.headers),
'path': self.path,
'proxy': ':'.join(str(y) for y in self.connection.getsockname()),
})
self.send_response(200)
self.send_header('Content-Type', 'application/json; charset=utf-8')
self.send_header('Content-Length', str(len(payload)))
self.end_headers()
self.wfile.write(payload.encode())
else:
self.send_response(404)
self.end_headers()
self.server.close_request(self.request)
if urllib3:
import urllib3.util.ssltransport
class SSLTransport(urllib3.util.ssltransport.SSLTransport):
"""
Modified version of urllib3 SSLTransport to support server side SSL
This allows us to chain multiple TLS connections.
"""
def __init__(self, socket, ssl_context, server_hostname=None, suppress_ragged_eofs=True, server_side=False):
self.incoming = ssl.MemoryBIO()
self.outgoing = ssl.MemoryBIO()
self.suppress_ragged_eofs = suppress_ragged_eofs
self.socket = socket
self.sslobj = ssl_context.wrap_bio(
self.incoming,
self.outgoing,
server_hostname=server_hostname,
server_side=server_side
)
self._ssl_io_loop(self.sslobj.do_handshake)
@property
def _io_refs(self):
return self.socket._io_refs
@_io_refs.setter
def _io_refs(self, value):
self.socket._io_refs = value
def shutdown(self, *args, **kwargs):
self.socket.shutdown(*args, **kwargs)
else:
SSLTransport = None
class HTTPSProxyHandler(HTTPProxyHandler):
def __init__(self, request, *args, **kwargs):
certfn = os.path.join(TEST_DIR, 'testcert.pem')
sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
sslctx.load_cert_chain(certfn, None)
if isinstance(request, ssl.SSLSocket):
request = SSLTransport(request, ssl_context=sslctx, server_side=True)
else:
request = sslctx.wrap_socket(request, server_side=True)
super().__init__(request, *args, **kwargs)
class HTTPConnectProxyHandler(BaseHTTPRequestHandler, HTTPProxyAuthMixin):
protocol_version = 'HTTP/1.1'
default_request_version = 'HTTP/1.1'
def __init__(self, *args, username=None, password=None, request_handler=None, **kwargs):
self.username = username
self.password = password
self.request_handler = request_handler
super().__init__(*args, **kwargs)
def do_CONNECT(self):
if not self.do_proxy_auth(self.username, self.password):
self.server.close_request(self.request)
return
self.send_response(200)
self.end_headers()
proxy_info = {
'client_address': self.client_address,
'connect': True,
'connect_host': self.path.split(':')[0],
'connect_port': int(self.path.split(':')[1]),
'headers': dict(self.headers),
'path': self.path,
'proxy': ':'.join(str(y) for y in self.connection.getsockname()),
}
self.request_handler(self.request, self.client_address, self.server, proxy_info=proxy_info)
self.server.close_request(self.request)
class HTTPSConnectProxyHandler(HTTPConnectProxyHandler):
def __init__(self, request, *args, **kwargs):
certfn = os.path.join(TEST_DIR, 'testcert.pem')
sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
sslctx.load_cert_chain(certfn, None)
request = sslctx.wrap_socket(request, server_side=True)
self._original_request = request
super().__init__(request, *args, **kwargs)
def do_CONNECT(self):
super().do_CONNECT()
self.server.close_request(self._original_request)
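
For context: an HTTP CONNECT proxy receives a plain-text CONNECT request and, after answering 200, blindly tunnels bytes (typically a TLS handshake) over the same socket, which is what the handlers above emulate. A rough client-side sketch, assuming a proxy listening on a hypothetical local address:

import socket

sock = socket.create_connection(('127.0.0.1', 8080))  # assumed proxy address
sock.sendall(b'CONNECT example.com:443 HTTP/1.1\r\nHost: example.com:443\r\n\r\n')
reply = sock.recv(4096)  # expect b'HTTP/1.1 200 ...' before starting TLS
sock.close()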
@contextlib.contextmanager
def proxy_server(proxy_server_class, request_handler, bind_ip=None, **proxy_server_kwargs):
server = server_thread = None
try:
bind_address = bind_ip or '127.0.0.1'
server_type = ThreadingTCPServer if '.' in bind_address else IPv6ThreadingTCPServer
server = server_type(
(bind_address, 0), functools.partial(proxy_server_class, request_handler=request_handler, **proxy_server_kwargs))
server_port = http_server_port(server)
server_thread = threading.Thread(target=server.serve_forever)
server_thread.daemon = True
server_thread.start()
if '.' not in bind_address:
yield f'[{bind_address}]:{server_port}'
else:
yield f'{bind_address}:{server_port}'
finally:
server.shutdown()
server.server_close()
server_thread.join(2.0)
class HTTPProxyTestContext(abc.ABC):
REQUEST_HANDLER_CLASS = None
REQUEST_PROTO = None
def http_server(self, server_class, *args, **kwargs):
return proxy_server(server_class, self.REQUEST_HANDLER_CLASS, *args, **kwargs)
@abc.abstractmethod
def proxy_info_request(self, handler, target_domain=None, target_port=None, **req_kwargs) -> dict:
"""return a dict of proxy_info"""
class HTTPProxyHTTPTestContext(HTTPProxyTestContext):
# Standard HTTP Proxy for http requests
REQUEST_HANDLER_CLASS = HTTPProxyHandler
REQUEST_PROTO = 'http'
def proxy_info_request(self, handler, target_domain=None, target_port=None, **req_kwargs):
request = Request(f'http://{target_domain or "127.0.0.1"}:{target_port or "40000"}/proxy_info', **req_kwargs)
handler.validate(request)
return json.loads(handler.send(request).read().decode())
class HTTPProxyHTTPSTestContext(HTTPProxyTestContext):
# HTTP Connect proxy, for https requests
REQUEST_HANDLER_CLASS = HTTPSProxyHandler
REQUEST_PROTO = 'https'
def proxy_info_request(self, handler, target_domain=None, target_port=None, **req_kwargs):
request = Request(f'https://{target_domain or "127.0.0.1"}:{target_port or "40000"}/proxy_info', **req_kwargs)
handler.validate(request)
return json.loads(handler.send(request).read().decode())
CTX_MAP = {
'http': HTTPProxyHTTPTestContext,
'https': HTTPProxyHTTPSTestContext,
}
@pytest.fixture(scope='module')
def ctx(request):
return CTX_MAP[request.param]()
@pytest.mark.parametrize(
'handler', ['Urllib', 'Requests', 'CurlCFFI'], indirect=True)
@pytest.mark.parametrize('ctx', ['http'], indirect=True) # pure http proxy can only support http
class TestHTTPProxy:
def test_http_no_auth(self, handler, ctx):
with ctx.http_server(HTTPProxyHandler) as server_address:
with handler(proxies={ctx.REQUEST_PROTO: f'http://{server_address}'}) as rh:
proxy_info = ctx.proxy_info_request(rh)
assert proxy_info['proxy'] == server_address
assert proxy_info['connect'] is False
assert 'Proxy-Authorization' not in proxy_info['headers']
def test_http_auth(self, handler, ctx):
with ctx.http_server(HTTPProxyHandler, username='test', password='test') as server_address:
with handler(proxies={ctx.REQUEST_PROTO: f'http://test:test@{server_address}'}) as rh:
proxy_info = ctx.proxy_info_request(rh)
assert proxy_info['proxy'] == server_address
assert 'Proxy-Authorization' in proxy_info['headers']
def test_http_bad_auth(self, handler, ctx):
with ctx.http_server(HTTPProxyHandler, username='test', password='test') as server_address:
with handler(proxies={ctx.REQUEST_PROTO: f'http://test:bad@{server_address}'}) as rh:
with pytest.raises(HTTPError) as exc_info:
ctx.proxy_info_request(rh)
assert exc_info.value.response.status == 407
exc_info.value.response.close()
def test_http_source_address(self, handler, ctx):
with ctx.http_server(HTTPProxyHandler) as server_address:
source_address = f'127.0.0.{random.randint(5, 255)}'
verify_address_availability(source_address)
with handler(proxies={ctx.REQUEST_PROTO: f'http://{server_address}'},
source_address=source_address) as rh:
proxy_info = ctx.proxy_info_request(rh)
assert proxy_info['proxy'] == server_address
assert proxy_info['client_address'][0] == source_address
@pytest.mark.skip_handler('Urllib', 'urllib does not support https proxies')
def test_https(self, handler, ctx):
with ctx.http_server(HTTPSProxyHandler) as server_address:
with handler(verify=False, proxies={ctx.REQUEST_PROTO: f'https://{server_address}'}) as rh:
proxy_info = ctx.proxy_info_request(rh)
assert proxy_info['proxy'] == server_address
assert proxy_info['connect'] is False
assert 'Proxy-Authorization' not in proxy_info['headers']
@pytest.mark.skip_handler('Urllib', 'urllib does not support https proxies')
def test_https_verify_failed(self, handler, ctx):
with ctx.http_server(HTTPSProxyHandler) as server_address:
with handler(verify=True, proxies={ctx.REQUEST_PROTO: f'https://{server_address}'}) as rh:
# Accept SSLError, as it may not be feasible to tell whether it is a proxy or a request error.
# Note: if the request proto also does SSL verification, this may also be an error of the request itself.
# Until we can support passing custom cacerts to handlers, we cannot properly test this for all cases.
with pytest.raises((ProxyError, SSLError)):
ctx.proxy_info_request(rh)
def test_http_with_idn(self, handler, ctx):
with ctx.http_server(HTTPProxyHandler) as server_address:
with handler(proxies={ctx.REQUEST_PROTO: f'http://{server_address}'}) as rh:
proxy_info = ctx.proxy_info_request(rh, target_domain='中文.tw')
assert proxy_info['proxy'] == server_address
assert proxy_info['path'].startswith('http://xn--fiq228c.tw')
assert proxy_info['headers']['Host'].split(':', 1)[0] == 'xn--fiq228c.tw'
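
The expected hostname is the IDNA (punycode) form of the Unicode domain; Python's built-in codec should reproduce the same mapping, as a quick check:

assert '中文.tw'.encode('idna') == b'xn--fiq228c.tw'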
@pytest.mark.parametrize(
'handler,ctx', [
('Requests', 'https'),
('CurlCFFI', 'https'),
], indirect=True)
class TestHTTPConnectProxy:
def test_http_connect_no_auth(self, handler, ctx):
with ctx.http_server(HTTPConnectProxyHandler) as server_address:
with handler(verify=False, proxies={ctx.REQUEST_PROTO: f'http://{server_address}'}) as rh:
proxy_info = ctx.proxy_info_request(rh)
assert proxy_info['proxy'] == server_address
assert proxy_info['connect'] is True
assert 'Proxy-Authorization' not in proxy_info['headers']
def test_http_connect_auth(self, handler, ctx):
with ctx.http_server(HTTPConnectProxyHandler, username='test', password='test') as server_address:
with handler(verify=False, proxies={ctx.REQUEST_PROTO: f'http://test:test@{server_address}'}) as rh:
proxy_info = ctx.proxy_info_request(rh)
assert proxy_info['proxy'] == server_address
assert 'Proxy-Authorization' in proxy_info['headers']
@pytest.mark.skip_handler(
'Requests',
'bug in urllib3 causes unclosed socket: https://github.com/urllib3/urllib3/issues/3374'
)
def test_http_connect_bad_auth(self, handler, ctx):
with ctx.http_server(HTTPConnectProxyHandler, username='test', password='test') as server_address:
with handler(verify=False, proxies={ctx.REQUEST_PROTO: f'http://test:bad@{server_address}'}) as rh:
with pytest.raises(ProxyError):
ctx.proxy_info_request(rh)
def test_http_connect_source_address(self, handler, ctx):
with ctx.http_server(HTTPConnectProxyHandler) as server_address:
source_address = f'127.0.0.{random.randint(5, 255)}'
verify_address_availability(source_address)
with handler(proxies={ctx.REQUEST_PROTO: f'http://{server_address}'},
source_address=source_address,
verify=False) as rh:
proxy_info = ctx.proxy_info_request(rh)
assert proxy_info['proxy'] == server_address
assert proxy_info['client_address'][0] == source_address
@pytest.mark.skipif(urllib3 is None, reason='requires urllib3 to test')
def test_https_connect_proxy(self, handler, ctx):
with ctx.http_server(HTTPSConnectProxyHandler) as server_address:
with handler(verify=False, proxies={ctx.REQUEST_PROTO: f'https://{server_address}'}) as rh:
proxy_info = ctx.proxy_info_request(rh)
assert proxy_info['proxy'] == server_address
assert proxy_info['connect'] is True
assert 'Proxy-Authorization' not in proxy_info['headers']
@pytest.mark.skipif(urllib3 is None, reason='requires urllib3 to test')
def test_https_connect_verify_failed(self, handler, ctx):
with ctx.http_server(HTTPSConnectProxyHandler) as server_address:
with handler(verify=True, proxies={ctx.REQUEST_PROTO: f'https://{server_address}'}) as rh:
# Accept SSLError, as it may not be feasible to tell whether it is a proxy or a request error.
# Note: if the request proto also does SSL verification, this may also be an error of the request itself.
# Until we can support passing custom cacerts to handlers, we cannot properly test this for all cases.
with pytest.raises((ProxyError, SSLError)):
ctx.proxy_info_request(rh)
@pytest.mark.skipif(urllib3 is None, reason='requires urllib3 to test')
def test_https_connect_proxy_auth(self, handler, ctx):
with ctx.http_server(HTTPSConnectProxyHandler, username='test', password='test') as server_address:
with handler(verify=False, proxies={ctx.REQUEST_PROTO: f'https://test:test@{server_address}'}) as rh:
proxy_info = ctx.proxy_info_request(rh)
assert proxy_info['proxy'] == server_address
assert 'Proxy-Authorization' in proxy_info['headers']

Diff file is too large

View file

@ -286,8 +286,14 @@ def ctx(request):
return CTX_MAP[request.param]()
@pytest.mark.parametrize(
'handler,ctx', [
('Urllib', 'http'),
('Requests', 'http'),
('Websockets', 'ws'),
('CurlCFFI', 'http')
], indirect=True)
class TestSocks4Proxy:
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks4_no_auth(self, handler, ctx):
with handler() as rh:
with ctx.socks_server(Socks4ProxyHandler) as server_address:
@ -295,7 +301,6 @@ class TestSocks4Proxy:
rh, proxies={'all': f'socks4://{server_address}'})
assert response['version'] == 4
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks4_auth(self, handler, ctx):
with handler() as rh:
with ctx.socks_server(Socks4ProxyHandler, user_id='user') as server_address:
@ -305,7 +310,6 @@ class TestSocks4Proxy:
rh, proxies={'all': f'socks4://user:@{server_address}'})
assert response['version'] == 4
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks4a_ipv4_target(self, handler, ctx):
with ctx.socks_server(Socks4ProxyHandler) as server_address:
with handler(proxies={'all': f'socks4a://{server_address}'}) as rh:
@ -313,7 +317,6 @@ class TestSocks4Proxy:
assert response['version'] == 4
assert (response['ipv4_address'] == '127.0.0.1') != (response['domain_address'] == '127.0.0.1')
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks4a_domain_target(self, handler, ctx):
with ctx.socks_server(Socks4ProxyHandler) as server_address:
with handler(proxies={'all': f'socks4a://{server_address}'}) as rh:
@ -322,7 +325,6 @@ class TestSocks4Proxy:
assert response['ipv4_address'] is None
assert response['domain_address'] == 'localhost'
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_ipv4_client_source_address(self, handler, ctx):
with ctx.socks_server(Socks4ProxyHandler) as server_address:
source_address = f'127.0.0.{random.randint(5, 255)}'
@ -333,7 +335,6 @@ class TestSocks4Proxy:
assert response['client_address'][0] == source_address
assert response['version'] == 4
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
@pytest.mark.parametrize('reply_code', [
Socks4CD.REQUEST_REJECTED_OR_FAILED,
Socks4CD.REQUEST_REJECTED_CANNOT_CONNECT_TO_IDENTD,
@ -345,7 +346,6 @@ class TestSocks4Proxy:
with pytest.raises(ProxyError):
ctx.socks_info_request(rh)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_ipv6_socks4_proxy(self, handler, ctx):
with ctx.socks_server(Socks4ProxyHandler, bind_ip='::1') as server_address:
with handler(proxies={'all': f'socks4://{server_address}'}) as rh:
@ -354,7 +354,6 @@ class TestSocks4Proxy:
assert response['ipv4_address'] == '127.0.0.1'
assert response['version'] == 4
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_timeout(self, handler, ctx):
with ctx.socks_server(Socks4ProxyHandler, sleep=2) as server_address:
with handler(proxies={'all': f'socks4://{server_address}'}, timeout=0.5) as rh:
@ -362,9 +361,15 @@ class TestSocks4Proxy:
ctx.socks_info_request(rh)
@pytest.mark.parametrize(
'handler,ctx', [
('Urllib', 'http'),
('Requests', 'http'),
('Websockets', 'ws'),
('CurlCFFI', 'http')
], indirect=True)
class TestSocks5Proxy:
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks5_no_auth(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}) as rh:
@ -372,7 +377,6 @@ class TestSocks5Proxy:
assert response['auth_methods'] == [0x0]
assert response['version'] == 5
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks5_user_pass(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler, auth=('test', 'testpass')) as server_address:
with handler() as rh:
@ -385,7 +389,6 @@ class TestSocks5Proxy:
assert response['auth_methods'] == [Socks5Auth.AUTH_NONE, Socks5Auth.AUTH_USER_PASS]
assert response['version'] == 5
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks5_ipv4_target(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}) as rh:
@ -393,7 +396,6 @@ class TestSocks5Proxy:
assert response['ipv4_address'] == '127.0.0.1'
assert response['version'] == 5
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks5_domain_target(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}) as rh:
@ -401,7 +403,6 @@ class TestSocks5Proxy:
assert (response['ipv4_address'] == '127.0.0.1') != (response['ipv6_address'] == '::1')
assert response['version'] == 5
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks5h_domain_target(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5h://{server_address}'}) as rh:
@ -410,7 +411,6 @@ class TestSocks5Proxy:
assert response['domain_address'] == 'localhost'
assert response['version'] == 5
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks5h_ip_target(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5h://{server_address}'}) as rh:
@ -419,7 +419,6 @@ class TestSocks5Proxy:
assert response['domain_address'] is None
assert response['version'] == 5
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_socks5_ipv6_destination(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}) as rh:
@ -427,7 +426,6 @@ class TestSocks5Proxy:
assert response['ipv6_address'] == '::1'
assert response['version'] == 5
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_ipv6_socks5_proxy(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler, bind_ip='::1') as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}) as rh:
@ -438,7 +436,6 @@ class TestSocks5Proxy:
# XXX: is there any feasible way of testing IPv6 source addresses?
# Same would go for non-proxy source_address test...
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
def test_ipv4_client_source_address(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler) as server_address:
source_address = f'127.0.0.{random.randint(5, 255)}'
@ -448,7 +445,6 @@ class TestSocks5Proxy:
assert response['client_address'][0] == source_address
assert response['version'] == 5
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Requests', 'http'), ('Websockets', 'ws')], indirect=True)
@pytest.mark.parametrize('reply_code', [
Socks5Reply.GENERAL_FAILURE,
Socks5Reply.CONNECTION_NOT_ALLOWED,
@ -465,7 +461,6 @@ class TestSocks5Proxy:
with pytest.raises(ProxyError):
ctx.socks_info_request(rh)
@pytest.mark.parametrize('handler,ctx', [('Urllib', 'http'), ('Websockets', 'ws')], indirect=True)
def test_timeout(self, handler, ctx):
with ctx.socks_server(Socks5ProxyHandler, sleep=2) as server_address:
with handler(proxies={'all': f'socks5://{server_address}'}, timeout=1) as rh:

View file

@ -0,0 +1,444 @@
import http.cookies
import re
import xml.etree.ElementTree
import pytest
from yt_dlp.utils import dict_get, int_or_none, str_or_none
from yt_dlp.utils.traversal import traverse_obj
_TEST_DATA = {
100: 100,
1.2: 1.2,
'str': 'str',
'None': None,
'...': ...,
'urls': [
{'index': 0, 'url': 'https://www.example.com/0'},
{'index': 1, 'url': 'https://www.example.com/1'},
],
'data': (
{'index': 2},
{'index': 3},
),
'dict': {},
}
class TestTraversal:
def test_traversal_base(self):
assert traverse_obj(_TEST_DATA, ('str',)) == 'str', \
'allow tuple path'
assert traverse_obj(_TEST_DATA, ['str']) == 'str', \
'allow list path'
assert traverse_obj(_TEST_DATA, (value for value in ("str",))) == 'str', \
'allow iterable path'
assert traverse_obj(_TEST_DATA, 'str') == 'str', \
'single items should be treated as a path'
assert traverse_obj(_TEST_DATA, 100) == 100, \
'allow int path'
assert traverse_obj(_TEST_DATA, 1.2) == 1.2, \
'allow float path'
assert traverse_obj(_TEST_DATA, None) == _TEST_DATA, \
'`None` should not perform any modification'
def test_traversal_ellipsis(self):
assert traverse_obj(_TEST_DATA, ...) == [x for x in _TEST_DATA.values() if x not in (None, {})], \
'`...` should give all non-discarded values'
assert traverse_obj(_TEST_DATA, ('urls', 0, ...)) == list(_TEST_DATA['urls'][0].values()), \
'`...` selection for dicts should select all values'
assert traverse_obj(_TEST_DATA, (..., ..., 'url')) == ['https://www.example.com/0', 'https://www.example.com/1'], \
'nested `...` queries should work'
assert traverse_obj(_TEST_DATA, (..., ..., 'index')) == list(range(4)), \
'`...` query result should be flattened'
assert traverse_obj(iter(range(4)), ...) == list(range(4)), \
'`...` should accept iterables'
def test_traversal_function(self):
filter_func = lambda x, y: x == 'urls' and isinstance(y, list)
assert traverse_obj(_TEST_DATA, filter_func) == [_TEST_DATA['urls']], \
'function as query key should perform a filter based on (key, value)'
assert traverse_obj(_TEST_DATA, lambda _, x: isinstance(x[0], str)) == ['str'], \
'exceptions in the query function should be caught'
assert traverse_obj(iter(range(4)), lambda _, x: x % 2 == 0) == [0, 2], \
'function key should accept iterables'
# Wrong function signature should raise (debug mode)
with pytest.raises(Exception):
traverse_obj(_TEST_DATA, lambda a: ...)
with pytest.raises(Exception):
traverse_obj(_TEST_DATA, lambda a, b, c: ...)
def test_traversal_set(self):
# transformation/type, like `expected_type`
assert traverse_obj(_TEST_DATA, (..., {str.upper}, )) == ['STR'], \
'Function in set should be a transformation'
assert traverse_obj(_TEST_DATA, (..., {str})) == ['str'], \
'Type in set should be a type filter'
assert traverse_obj(_TEST_DATA, (..., {str, int})) == [100, 'str'], \
'Multiple types in set should be a type filter'
assert traverse_obj(_TEST_DATA, {dict}) == _TEST_DATA, \
'A single set should be wrapped into a path'
assert traverse_obj(_TEST_DATA, (..., {str.upper})) == ['STR'], \
'Transformation function should not raise'
expected = [x for x in map(str_or_none, _TEST_DATA.values()) if x is not None]
assert traverse_obj(_TEST_DATA, (..., {str_or_none})) == expected, \
'Function in set should be a transformation'
assert traverse_obj(_TEST_DATA, ('fail', {lambda _: 'const'})) == 'const', \
'Function in set should always be called'
# Sets with length < 1 or > 1 not including only types should raise
with pytest.raises(Exception):
traverse_obj(_TEST_DATA, set())
with pytest.raises(Exception):
traverse_obj(_TEST_DATA, {str.upper, str})
def test_traversal_slice(self):
_SLICE_DATA = [0, 1, 2, 3, 4]
assert traverse_obj(_TEST_DATA, ('dict', slice(1))) is None, \
'slice on a dictionary should not throw'
assert traverse_obj(_SLICE_DATA, slice(1)) == _SLICE_DATA[:1], \
'slice key should apply slice to sequence'
assert traverse_obj(_SLICE_DATA, slice(1, 2)) == _SLICE_DATA[1:2], \
'slice key should apply slice to sequence'
assert traverse_obj(_SLICE_DATA, slice(1, 4, 2)) == _SLICE_DATA[1:4:2], \
'slice key should apply slice to sequence'
def test_traversal_alternatives(self):
assert traverse_obj(_TEST_DATA, 'fail', 'str') == 'str', \
'multiple `paths` should be treated as alternative paths'
assert traverse_obj(_TEST_DATA, 'str', 100) == 'str', \
'alternatives should exit early'
assert traverse_obj(_TEST_DATA, 'fail', 'fail') is None, \
'alternatives should return `default` if exhausted'
assert traverse_obj(_TEST_DATA, (..., 'fail'), 100) == 100, \
'alternatives should track their own branching return'
assert traverse_obj(_TEST_DATA, ('dict', ...), ('data', ...)) == list(_TEST_DATA['data']), \
'alternatives on empty objects should search further'
def test_traversal_branching_nesting(self):
assert traverse_obj(_TEST_DATA, ('urls', (3, 0), 'url')) == ['https://www.example.com/0'], \
'tuple as key should be treated as branches'
assert traverse_obj(_TEST_DATA, ('urls', [3, 0], 'url')) == ['https://www.example.com/0'], \
'list as key should be treated as branches'
assert traverse_obj(_TEST_DATA, ('urls', ((1, 'fail'), (0, 'url')))) == ['https://www.example.com/0'], \
'double nesting in path should be treated as paths'
assert traverse_obj(['0', [1, 2]], [(0, 1), 0]) == [1], \
'do not fail early on branching'
expected = ['https://www.example.com/0', 'https://www.example.com/1']
assert traverse_obj(_TEST_DATA, ('urls', ((0, ('fail', 'url')), (1, 'url')))) == expected, \
'triple nesting in path should be treated as branches'
assert traverse_obj(_TEST_DATA, ('urls', ('fail', (..., 'url')))) == expected, \
'ellipsis as branch path start gets flattened'
def test_traversal_dict(self):
assert traverse_obj(_TEST_DATA, {0: 100, 1: 1.2}) == {0: 100, 1: 1.2}, \
'dict key should result in a dict with the same keys'
expected = {0: 'https://www.example.com/0'}
assert traverse_obj(_TEST_DATA, {0: ('urls', 0, 'url')}) == expected, \
'dict key should allow paths'
expected = {0: ['https://www.example.com/0']}
assert traverse_obj(_TEST_DATA, {0: ('urls', (3, 0), 'url')}) == expected, \
'tuple in dict path should be treated as branches'
assert traverse_obj(_TEST_DATA, {0: ('urls', ((1, 'fail'), (0, 'url')))}) == expected, \
'double nesting in dict path should be treated as paths'
expected = {0: ['https://www.example.com/1', 'https://www.example.com/0']}
assert traverse_obj(_TEST_DATA, {0: ('urls', ((1, ('fail', 'url')), (0, 'url')))}) == expected, \
'triple nesting in dict path should be treated as branches'
assert traverse_obj(_TEST_DATA, {0: 'fail'}) == {}, \
'remove `None` values when top level dict key fails'
assert traverse_obj(_TEST_DATA, {0: 'fail'}, default=...) == {0: ...}, \
'use `default` if key fails and `default`'
assert traverse_obj(_TEST_DATA, {0: 'dict'}) == {}, \
'remove empty values when dict key'
assert traverse_obj(_TEST_DATA, {0: 'dict'}, default=...) == {0: ...}, \
'use `default` when dict key and `default`'
assert traverse_obj(_TEST_DATA, {0: {0: 'fail'}}) == {}, \
'remove empty values when nested dict key fails'
assert traverse_obj(None, {0: 'fail'}) == {}, \
'default to dict if pruned'
assert traverse_obj(None, {0: 'fail'}, default=...) == {0: ...}, \
'default to dict if pruned and default is given'
assert traverse_obj(_TEST_DATA, {0: {0: 'fail'}}, default=...) == {0: {0: ...}}, \
'use nested `default` when nested dict key fails and `default`'
assert traverse_obj(_TEST_DATA, {0: ('dict', ...)}) == {}, \
'remove key if branch in dict key not successful'
def test_traversal_default(self):
_DEFAULT_DATA = {'None': None, 'int': 0, 'list': []}
assert traverse_obj(_DEFAULT_DATA, 'fail') is None, \
'default value should be `None`'
assert traverse_obj(_DEFAULT_DATA, 'fail', 'fail', default=...) == ..., \
'chained fails should result in default'
assert traverse_obj(_DEFAULT_DATA, 'None', 'int') == 0, \
'should not short-circuit on `None`'
assert traverse_obj(_DEFAULT_DATA, 'fail', default=1) == 1, \
'invalid dict key should result in `default`'
assert traverse_obj(_DEFAULT_DATA, 'None', default=1) == 1, \
'`None` is a deliberate sentinel and should become `default`'
assert traverse_obj(_DEFAULT_DATA, ('list', 10)) is None, \
'`IndexError` should result in `default`'
assert traverse_obj(_DEFAULT_DATA, (..., 'fail'), default=1) == 1, \
'if branched but not successful return `default` if defined, not `[]`'
assert traverse_obj(_DEFAULT_DATA, (..., 'fail'), default=None) is None, \
'if branched but not successful return `default` even if `default` is `None`'
assert traverse_obj(_DEFAULT_DATA, (..., 'fail')) == [], \
'if branched but not successful return `[]`, not `default`'
assert traverse_obj(_DEFAULT_DATA, ('list', ...)) == [], \
'if branched but object is empty return `[]`, not `default`'
assert traverse_obj(None, ...) == [], \
'if branched but object is `None` return `[]`, not `default`'
assert traverse_obj({0: None}, (0, ...)) == [], \
'if branched but state is `None` return `[]`, not `default`'
@pytest.mark.parametrize('path', [
('fail', ...),
(..., 'fail'),
100 * ('fail',) + (...,),
(...,) + 100 * ('fail',),
])
def test_traversal_branching(self, path):
assert traverse_obj({}, path) == [], \
'if branched but state is `None`, return `[]` (not `default`)'
assert traverse_obj({}, 'fail', path) == [], \
'if branching in last alternative and previous did not match, return `[]` (not `default`)'
assert traverse_obj({0: 'x'}, 0, path) == 'x', \
'if branching in last alternative and previous did match, return single value'
assert traverse_obj({0: 'x'}, path, 0) == 'x', \
'if branching in first alternative and non-branching path does match, return single value'
assert traverse_obj({}, path, 'fail') is None, \
'if branching in first alternative and non-branching path does not match, return `default`'
def test_traversal_expected_type(self):
_EXPECTED_TYPE_DATA = {'str': 'str', 'int': 0}
assert traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=str) == 'str', \
'accept matching `expected_type` type'
assert traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=int) is None, \
'reject non matching `expected_type` type'
assert traverse_obj(_EXPECTED_TYPE_DATA, 'int', expected_type=lambda x: str(x)) == '0', \
'transform type using type function'
assert traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=lambda _: 1 / 0) is None, \
'wrap expected_type function in try_call'
assert traverse_obj(_EXPECTED_TYPE_DATA, ..., expected_type=str) == ['str'], \
'eliminate items that expected_type fails on'
assert traverse_obj(_TEST_DATA, {0: 100, 1: 1.2}, expected_type=int) == {0: 100}, \
'type as expected_type should filter dict values'
assert traverse_obj(_TEST_DATA, {0: 100, 1: 1.2, 2: 'None'}, expected_type=str_or_none) == {0: '100', 1: '1.2'}, \
'function as expected_type should transform dict values'
assert traverse_obj(_TEST_DATA, ({0: 1.2}, 0, {int_or_none}), expected_type=int) == 1, \
'expected_type should not filter non final dict values'
assert traverse_obj(_TEST_DATA, {0: {0: 100, 1: 'str'}}, expected_type=int) == {0: {0: 100}}, \
'expected_type should transform deep dict values'
assert traverse_obj(_TEST_DATA, [({0: '...'}, {0: '...'})], expected_type=type(...)) == [{0: ...}, {0: ...}], \
'expected_type should transform branched dict values'
assert traverse_obj({1: {3: 4}}, [(1, 2), 3], expected_type=int) == [4], \
'expected_type regression for type matching in tuple branching'
assert traverse_obj(_TEST_DATA, ['data', ...], expected_type=int) == [], \
'expected_type regression for type matching in dict result'
def test_traversal_get_all(self):
_GET_ALL_DATA = {'key': [0, 1, 2]}
assert traverse_obj(_GET_ALL_DATA, ('key', ...), get_all=False) == 0, \
'if not `get_all`, return only first matching value'
assert traverse_obj(_GET_ALL_DATA, ..., get_all=False) == [0, 1, 2], \
'do not overflatten if not `get_all`'
def test_traversal_casesense(self):
_CASESENSE_DATA = {
'KeY': 'value0',
0: {
'KeY': 'value1',
0: {'KeY': 'value2'},
},
}
assert traverse_obj(_CASESENSE_DATA, 'key') is None, \
'dict keys should be case sensitive unless `casesense`'
assert traverse_obj(_CASESENSE_DATA, 'keY', casesense=False) == 'value0', \
'allow non-matching key case if `casesense`'
assert traverse_obj(_CASESENSE_DATA, [0, ('keY',)], casesense=False) == ['value1'], \
'allow non-matching key case in branch if `casesense`'
assert traverse_obj(_CASESENSE_DATA, [0, ([0, 'keY'],)], casesense=False) == ['value2'], \
'allow non-matching key case in branch path if `casesense`'
def test_traversal_traverse_string(self):
_TRAVERSE_STRING_DATA = {'str': 'str', 1.2: 1.2}
assert traverse_obj(_TRAVERSE_STRING_DATA, ('str', 0)) is None, \
'do not traverse into string if not `traverse_string`'
assert traverse_obj(_TRAVERSE_STRING_DATA, ('str', 0), traverse_string=True) == 's', \
'traverse into string if `traverse_string`'
assert traverse_obj(_TRAVERSE_STRING_DATA, (1.2, 1), traverse_string=True) == '.', \
'traverse into converted data if `traverse_string`'
assert traverse_obj(_TRAVERSE_STRING_DATA, ('str', ...), traverse_string=True) == 'str', \
'`...` should result in string (same value) if `traverse_string`'
assert traverse_obj(_TRAVERSE_STRING_DATA, ('str', slice(0, None, 2)), traverse_string=True) == 'sr', \
'`slice` should result in string if `traverse_string`'
assert traverse_obj(_TRAVERSE_STRING_DATA, ('str', lambda i, v: i or v == "s"), traverse_string=True) == 'str', \
'function should result in string if `traverse_string`'
assert traverse_obj(_TRAVERSE_STRING_DATA, ('str', (0, 2)), traverse_string=True) == ['s', 'r'], \
'branching should result in list if `traverse_string`'
assert traverse_obj({}, (0, ...), traverse_string=True) == [], \
'branching should result in list if `traverse_string`'
assert traverse_obj({}, (0, lambda x, y: True), traverse_string=True) == [], \
'branching should result in list if `traverse_string`'
assert traverse_obj({}, (0, slice(1)), traverse_string=True) == [], \
'branching should result in list if `traverse_string`'
def test_traversal_re(self):
mobj = re.fullmatch(r'0(12)(?P<group>3)(4)?', '0123')
assert traverse_obj(mobj, ...) == [x for x in mobj.groups() if x is not None], \
'`...` on a `re.Match` should give its `groups()`'
assert traverse_obj(mobj, lambda k, _: k in (0, 2)) == ['0123', '3'], \
'function on a `re.Match` should give groupno, value starting at 0'
assert traverse_obj(mobj, 'group') == '3', \
'str key on a `re.Match` should give group with that name'
assert traverse_obj(mobj, 2) == '3', \
'int key on a `re.Match` should give the group with that index'
assert traverse_obj(mobj, 'gRoUp', casesense=False) == '3', \
'str key on a `re.Match` should respect casesense'
assert traverse_obj(mobj, 'fail') is None, \
'failing str key on a `re.Match` should return `default`'
assert traverse_obj(mobj, 'gRoUpS', casesense=False) is None, \
'failing str key on a `re.Match` should return `default`'
assert traverse_obj(mobj, 8) is None, \
'failing int key on a `re.Match` should return `default`'
assert traverse_obj(mobj, lambda k, _: k in (0, 'group')) == ['0123', '3'], \
'function on a `re.Match` should give group name as well'
def test_traversal_xml_etree(self):
etree = xml.etree.ElementTree.fromstring('''<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>''')
assert traverse_obj(etree, '') == etree, \
'empty str key should return the element itself'
assert traverse_obj(etree, 'country') == list(etree), \
'str key should give all children with that tag name'
assert traverse_obj(etree, ...) == list(etree), \
'`...` as key should return all children'
assert traverse_obj(etree, lambda _, x: x[0].text == '4') == [etree[1]], \
'function as key should get element as value'
assert traverse_obj(etree, lambda i, _: i == 1) == [etree[1]], \
'function as key should get index as key'
assert traverse_obj(etree, 0) == etree[0], \
'int key should return the nth child'
expected = ['Austria', 'Switzerland', 'Malaysia', 'Costa Rica', 'Colombia']
assert traverse_obj(etree, './/neighbor/@name') == expected, \
'`@<attribute>` at end of path should give that attribute'
assert traverse_obj(etree, '//neighbor/@fail') == [None, None, None, None, None], \
'`@<nonexistent>` at end of path should give `None`'
assert traverse_obj(etree, ('//neighbor/@', 2)) == {'name': 'Malaysia', 'direction': 'N'}, \
'`@` should give the full attribute dict'
assert traverse_obj(etree, '//year/text()') == ['2008', '2011', '2011'], \
'`text()` at end of path should give the inner text'
assert traverse_obj(etree, '//*[@direction]/@direction') == ['E', 'W', 'N', 'W', 'E'], \
'full Python xpath features should be supported'
assert traverse_obj(etree, (0, '@name')) == 'Liechtenstein', \
'special transformations should act on current element'
assert traverse_obj(etree, ('country', 0, ..., 'text()', {int_or_none})) == [1, 2008, 141100], \
'special transformations should act on current element'
def test_traversal_unbranching(self):
assert traverse_obj(_TEST_DATA, [(100, 1.2), all]) == [100, 1.2], \
'`all` should give all results as list'
assert traverse_obj(_TEST_DATA, [(100, 1.2), any]) == 100, \
'`any` should give the first result'
assert traverse_obj(_TEST_DATA, [100, all]) == [100], \
'`all` should give list if non branching'
assert traverse_obj(_TEST_DATA, [100, any]) == 100, \
'`any` should give single item if non branching'
assert traverse_obj(_TEST_DATA, [('dict', 'None', 100), all]) == [100], \
'`all` should filter `None` and empty dict'
assert traverse_obj(_TEST_DATA, [('dict', 'None', 100), any]) == 100, \
'`any` should filter `None` and empty dict'
assert traverse_obj(_TEST_DATA, [{
'all': [('dict', 'None', 100, 1.2), all],
'any': [('dict', 'None', 100, 1.2), any],
}]) == {'all': [100, 1.2], 'any': 100}, \
'`all`/`any` should apply to each dict path separately'
assert traverse_obj(_TEST_DATA, [{
'all': [('dict', 'None', 100, 1.2), all],
'any': [('dict', 'None', 100, 1.2), any],
}], get_all=False) == {'all': [100, 1.2], 'any': 100}, \
'`all`/`any` should apply to dict regardless of `get_all`'
assert traverse_obj(_TEST_DATA, [('dict', 'None', 100, 1.2), all, {float}]) is None, \
'`all` should reset branching status'
assert traverse_obj(_TEST_DATA, [('dict', 'None', 100, 1.2), any, {float}]) is None, \
'`any` should reset branching status'
assert traverse_obj(_TEST_DATA, [('dict', 'None', 100, 1.2), all, ..., {float}]) == [1.2], \
'`all` should allow further branching'
assert traverse_obj(_TEST_DATA, [('dict', 'None', 'urls', 'data'), any, ..., 'index']) == [0, 1], \
'`any` should allow further branching'
def test_traversal_morsel(self):
values = {
'expires': 'a',
'path': 'b',
'comment': 'c',
'domain': 'd',
'max-age': 'e',
'secure': 'f',
'httponly': 'g',
'version': 'h',
'samesite': 'i',
}
morsel = http.cookies.Morsel()
morsel.set('item_key', 'item_value', 'coded_value')
morsel.update(values)
values['key'] = 'item_key'
values['value'] = 'item_value'
for key, value in values.items():
assert traverse_obj(morsel, key) == value, \
'Morsel should provide access to all values'
assert traverse_obj(morsel, ...) == list(values.values()), \
'`...` should yield all values'
assert traverse_obj(morsel, lambda k, v: True) == list(values.values()), \
'function key should yield all values'
assert traverse_obj(morsel, [(None,), any]) == morsel, \
'Morsel should not be implicitly changed to dict on usage'
class TestDictGet:
def test_dict_get(self):
FALSE_VALUES = {
'none': None,
'false': False,
'zero': 0,
'empty_string': '',
'empty_list': [],
}
d = {**FALSE_VALUES, 'a': 42}
assert dict_get(d, 'a') == 42
assert dict_get(d, 'b') is None
assert dict_get(d, 'b', 42) == 42
assert dict_get(d, ('a',)) == 42
assert dict_get(d, ('b', 'a')) == 42
assert dict_get(d, ('b', 'c', 'a', 'd')) == 42
assert dict_get(d, ('b', 'c')) is None
assert dict_get(d, ('b', 'c'), 42) == 42
for key, false_value in FALSE_VALUES.items():
assert dict_get(d, ('b', 'c', key)) is None
assert dict_get(d, ('b', 'c', key), skip_false_values=False) == false_value

View file

@ -2,7 +2,6 @@
# Allow direct execution
import os
import re
import sys
import unittest
import warnings
@ -45,7 +44,6 @@ from yt_dlp.utils import (
determine_ext,
determine_file_encoding,
dfxp2srt,
dict_get,
encode_base_n,
encode_compat_str,
encodeFilename,
@ -106,13 +104,11 @@ from yt_dlp.utils import (
sanitize_url,
shell_quote,
smuggle_url,
str_or_none,
str_to_int,
strip_jsonp,
strip_or_none,
subtitles_filename,
timeconvert,
traverse_obj,
try_call,
unescapeHTML,
unified_strdate,
@@ -755,28 +751,6 @@ class TestUtil(unittest.TestCase):
self.assertRaises(
ValueError, multipart_encode, {b'field': b'value'}, boundary='value')
def test_dict_get(self):
FALSE_VALUES = {
'none': None,
'false': False,
'zero': 0,
'empty_string': '',
'empty_list': [],
}
d = FALSE_VALUES.copy()
d['a'] = 42
self.assertEqual(dict_get(d, 'a'), 42)
self.assertEqual(dict_get(d, 'b'), None)
self.assertEqual(dict_get(d, 'b', 42), 42)
self.assertEqual(dict_get(d, ('a', )), 42)
self.assertEqual(dict_get(d, ('b', 'a', )), 42)
self.assertEqual(dict_get(d, ('b', 'c', 'a', 'd', )), 42)
self.assertEqual(dict_get(d, ('b', 'c', )), None)
self.assertEqual(dict_get(d, ('b', 'c', ), 42), 42)
for key, false_value in FALSE_VALUES.items():
self.assertEqual(dict_get(d, ('b', 'c', key, )), None)
self.assertEqual(dict_get(d, ('b', 'c', key, ), skip_false_values=False), false_value)
def test_merge_dicts(self):
self.assertEqual(merge_dicts({'a': 1}, {'b': 2}), {'a': 1, 'b': 2})
self.assertEqual(merge_dicts({'a': 1}, {'a': 2}), {'a': 1})
@@ -2039,359 +2013,6 @@ Line 1
warnings.simplefilter('ignore')
self.assertEqual(variadic('spam', allowed_types=[dict]), 'spam')
def test_traverse_obj(self):
_TEST_DATA = {
100: 100,
1.2: 1.2,
'str': 'str',
'None': None,
'...': ...,
'urls': [
{'index': 0, 'url': 'https://www.example.com/0'},
{'index': 1, 'url': 'https://www.example.com/1'},
],
'data': (
{'index': 2},
{'index': 3},
),
'dict': {},
}
# Test base functionality
self.assertEqual(traverse_obj(_TEST_DATA, ('str',)), 'str',
msg='allow tuple path')
self.assertEqual(traverse_obj(_TEST_DATA, ['str']), 'str',
msg='allow list path')
self.assertEqual(traverse_obj(_TEST_DATA, (value for value in ("str",))), 'str',
msg='allow iterable path')
self.assertEqual(traverse_obj(_TEST_DATA, 'str'), 'str',
msg='single items should be treated as a path')
self.assertEqual(traverse_obj(_TEST_DATA, None), _TEST_DATA)
self.assertEqual(traverse_obj(_TEST_DATA, 100), 100)
self.assertEqual(traverse_obj(_TEST_DATA, 1.2), 1.2)
# Test Ellipsis behavior
self.assertCountEqual(traverse_obj(_TEST_DATA, ...),
(item for item in _TEST_DATA.values() if item not in (None, {})),
msg='`...` should give all non discarded values')
self.assertCountEqual(traverse_obj(_TEST_DATA, ('urls', 0, ...)), _TEST_DATA['urls'][0].values(),
msg='`...` selection for dicts should select all values')
self.assertEqual(traverse_obj(_TEST_DATA, (..., ..., 'url')),
['https://www.example.com/0', 'https://www.example.com/1'],
msg='nested `...` queries should work')
self.assertCountEqual(traverse_obj(_TEST_DATA, (..., ..., 'index')), range(4),
msg='`...` query result should be flattened')
self.assertEqual(traverse_obj(iter(range(4)), ...), list(range(4)),
msg='`...` should accept iterables')
# Test function as key
self.assertEqual(traverse_obj(_TEST_DATA, lambda x, y: x == 'urls' and isinstance(y, list)),
[_TEST_DATA['urls']],
msg='function as query key should perform a filter based on (key, value)')
self.assertCountEqual(traverse_obj(_TEST_DATA, lambda _, x: isinstance(x[0], str)), {'str'},
msg='exceptions in the query function should be caught')
self.assertEqual(traverse_obj(iter(range(4)), lambda _, x: x % 2 == 0), [0, 2],
msg='function key should accept iterables')
if __debug__:
with self.assertRaises(Exception, msg='Wrong function signature should raise in debug'):
traverse_obj(_TEST_DATA, lambda a: ...)
with self.assertRaises(Exception, msg='Wrong function signature should raise in debug'):
traverse_obj(_TEST_DATA, lambda a, b, c: ...)
# Test set as key (transformation/type, like `expected_type`)
self.assertEqual(traverse_obj(_TEST_DATA, (..., {str.upper}, )), ['STR'],
msg='Function in set should be a transformation')
self.assertEqual(traverse_obj(_TEST_DATA, (..., {str})), ['str'],
msg='Type in set should be a type filter')
self.assertEqual(traverse_obj(_TEST_DATA, {dict}), _TEST_DATA,
msg='A single set should be wrapped into a path')
self.assertEqual(traverse_obj(_TEST_DATA, (..., {str.upper})), ['STR'],
msg='Transformation function should not raise')
self.assertEqual(traverse_obj(_TEST_DATA, (..., {str_or_none})),
[item for item in map(str_or_none, _TEST_DATA.values()) if item is not None],
msg='Function in set should be a transformation')
self.assertEqual(traverse_obj(_TEST_DATA, ('fail', {lambda _: 'const'})), 'const',
msg='Function in set should always be called')
if __debug__:
with self.assertRaises(Exception, msg='Sets with length != 1 should raise in debug'):
traverse_obj(_TEST_DATA, set())
with self.assertRaises(Exception, msg='Sets with length != 1 should raise in debug'):
traverse_obj(_TEST_DATA, {str.upper, str})
# Test `slice` as a key
_SLICE_DATA = [0, 1, 2, 3, 4]
self.assertEqual(traverse_obj(_TEST_DATA, ('dict', slice(1))), None,
msg='slice on a dictionary should not throw')
self.assertEqual(traverse_obj(_SLICE_DATA, slice(1)), _SLICE_DATA[:1],
msg='slice key should apply slice to sequence')
self.assertEqual(traverse_obj(_SLICE_DATA, slice(1, 2)), _SLICE_DATA[1:2],
msg='slice key should apply slice to sequence')
self.assertEqual(traverse_obj(_SLICE_DATA, slice(1, 4, 2)), _SLICE_DATA[1:4:2],
msg='slice key should apply slice to sequence')
# Test alternative paths
self.assertEqual(traverse_obj(_TEST_DATA, 'fail', 'str'), 'str',
msg='multiple `paths` should be treated as alternative paths')
self.assertEqual(traverse_obj(_TEST_DATA, 'str', 100), 'str',
msg='alternatives should exit early')
self.assertEqual(traverse_obj(_TEST_DATA, 'fail', 'fail'), None,
msg='alternatives should return `default` if exhausted')
self.assertEqual(traverse_obj(_TEST_DATA, (..., 'fail'), 100), 100,
msg='alternatives should track their own branching return')
self.assertEqual(traverse_obj(_TEST_DATA, ('dict', ...), ('data', ...)), list(_TEST_DATA['data']),
msg='alternatives on empty objects should search further')
# Test branch and path nesting
self.assertEqual(traverse_obj(_TEST_DATA, ('urls', (3, 0), 'url')), ['https://www.example.com/0'],
msg='tuple as key should be treated as branches')
self.assertEqual(traverse_obj(_TEST_DATA, ('urls', [3, 0], 'url')), ['https://www.example.com/0'],
msg='list as key should be treated as branches')
self.assertEqual(traverse_obj(_TEST_DATA, ('urls', ((1, 'fail'), (0, 'url')))), ['https://www.example.com/0'],
msg='double nesting in path should be treated as paths')
self.assertEqual(traverse_obj(['0', [1, 2]], [(0, 1), 0]), [1],
msg='do not fail early on branching')
self.assertCountEqual(traverse_obj(_TEST_DATA, ('urls', ((1, ('fail', 'url')), (0, 'url')))),
['https://www.example.com/0', 'https://www.example.com/1'],
msg='triple nesting in path should be treated as branches')
self.assertEqual(traverse_obj(_TEST_DATA, ('urls', ('fail', (..., 'url')))),
['https://www.example.com/0', 'https://www.example.com/1'],
msg='ellipsis as branch path start gets flattened')
# Test dictionary as key
self.assertEqual(traverse_obj(_TEST_DATA, {0: 100, 1: 1.2}), {0: 100, 1: 1.2},
msg='dict key should result in a dict with the same keys')
self.assertEqual(traverse_obj(_TEST_DATA, {0: ('urls', 0, 'url')}),
{0: 'https://www.example.com/0'},
msg='dict key should allow paths')
self.assertEqual(traverse_obj(_TEST_DATA, {0: ('urls', (3, 0), 'url')}),
{0: ['https://www.example.com/0']},
msg='tuple in dict path should be treated as branches')
self.assertEqual(traverse_obj(_TEST_DATA, {0: ('urls', ((1, 'fail'), (0, 'url')))}),
{0: ['https://www.example.com/0']},
msg='double nesting in dict path should be treated as paths')
self.assertEqual(traverse_obj(_TEST_DATA, {0: ('urls', ((1, ('fail', 'url')), (0, 'url')))}),
{0: ['https://www.example.com/1', 'https://www.example.com/0']},
msg='triple nesting in dict path should be treated as branches')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 'fail'}), {},
msg='remove `None` values when top level dict key fails')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 'fail'}, default=...), {0: ...},
msg='use `default` if key fails and `default`')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 'dict'}), {},
msg='remove empty values when dict key')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 'dict'}, default=...), {0: ...},
msg='use `default` when dict key and `default`')
self.assertEqual(traverse_obj(_TEST_DATA, {0: {0: 'fail'}}), {},
msg='remove empty values when nested dict key fails')
self.assertEqual(traverse_obj(None, {0: 'fail'}), {},
msg='default to dict if pruned')
self.assertEqual(traverse_obj(None, {0: 'fail'}, default=...), {0: ...},
msg='default to dict if pruned and default is given')
self.assertEqual(traverse_obj(_TEST_DATA, {0: {0: 'fail'}}, default=...), {0: {0: ...}},
msg='use nested `default` when nested dict key fails and `default`')
self.assertEqual(traverse_obj(_TEST_DATA, {0: ('dict', ...)}), {},
msg='remove key if branch in dict key not successful')
# Testing default parameter behavior
_DEFAULT_DATA = {'None': None, 'int': 0, 'list': []}
self.assertEqual(traverse_obj(_DEFAULT_DATA, 'fail'), None,
msg='default value should be `None`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, 'fail', 'fail', default=...), ...,
msg='chained fails should result in default')
self.assertEqual(traverse_obj(_DEFAULT_DATA, 'None', 'int'), 0,
msg='should not short circuit on `None`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, 'fail', default=1), 1,
msg='invalid dict key should result in `default`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, 'None', default=1), 1,
msg='`None` is a deliberate sentinel and should become `default`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, ('list', 10)), None,
msg='`IndexError` should result in `default`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, (..., 'fail'), default=1), 1,
msg='if branched but not successful return `default` if defined, not `[]`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, (..., 'fail'), default=None), None,
msg='if branched but not successful return `default` even if `default` is `None`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, (..., 'fail')), [],
msg='if branched but not successful return `[]`, not `default`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, ('list', ...)), [],
msg='if branched but object is empty return `[]`, not `default`')
self.assertEqual(traverse_obj(None, ...), [],
msg='if branched but object is `None` return `[]`, not `default`')
self.assertEqual(traverse_obj({0: None}, (0, ...)), [],
msg='if branched but state is `None` return `[]`, not `default`')
branching_paths = [
('fail', ...),
(..., 'fail'),
100 * ('fail',) + (...,),
(...,) + 100 * ('fail',),
]
for branching_path in branching_paths:
self.assertEqual(traverse_obj({}, branching_path), [],
msg='if branched but state is `None`, return `[]` (not `default`)')
self.assertEqual(traverse_obj({}, 'fail', branching_path), [],
msg='if branching in last alternative and previous did not match, return `[]` (not `default`)')
self.assertEqual(traverse_obj({0: 'x'}, 0, branching_path), 'x',
msg='if branching in last alternative and previous did match, return single value')
self.assertEqual(traverse_obj({0: 'x'}, branching_path, 0), 'x',
msg='if branching in first alternative and non-branching path does match, return single value')
self.assertEqual(traverse_obj({}, branching_path, 'fail'), None,
msg='if branching in first alternative and non-branching path does not match, return `default`')
# Testing expected_type behavior
_EXPECTED_TYPE_DATA = {'str': 'str', 'int': 0}
self.assertEqual(traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=str),
'str', msg='accept matching `expected_type` type')
self.assertEqual(traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=int),
None, msg='reject non matching `expected_type` type')
self.assertEqual(traverse_obj(_EXPECTED_TYPE_DATA, 'int', expected_type=lambda x: str(x)),
'0', msg='transform type using type function')
self.assertEqual(traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=lambda _: 1 / 0),
None, msg='wrap expected_type function in try_call')
self.assertEqual(traverse_obj(_EXPECTED_TYPE_DATA, ..., expected_type=str),
['str'], msg='eliminate items that expected_type fails on')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 100, 1: 1.2}, expected_type=int),
{0: 100}, msg='type as expected_type should filter dict values')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 100, 1: 1.2, 2: 'None'}, expected_type=str_or_none),
{0: '100', 1: '1.2'}, msg='function as expected_type should transform dict values')
self.assertEqual(traverse_obj(_TEST_DATA, ({0: 1.2}, 0, {int_or_none}), expected_type=int),
1, msg='expected_type should not filter non final dict values')
self.assertEqual(traverse_obj(_TEST_DATA, {0: {0: 100, 1: 'str'}}, expected_type=int),
{0: {0: 100}}, msg='expected_type should transform deep dict values')
self.assertEqual(traverse_obj(_TEST_DATA, [({0: '...'}, {0: '...'})], expected_type=type(...)),
[{0: ...}, {0: ...}], msg='expected_type should transform branched dict values')
self.assertEqual(traverse_obj({1: {3: 4}}, [(1, 2), 3], expected_type=int),
[4], msg='expected_type regression for type matching in tuple branching')
self.assertEqual(traverse_obj(_TEST_DATA, ['data', ...], expected_type=int),
[], msg='expected_type regression for type matching in dict result')
# Test get_all behavior
_GET_ALL_DATA = {'key': [0, 1, 2]}
self.assertEqual(traverse_obj(_GET_ALL_DATA, ('key', ...), get_all=False), 0,
msg='if not `get_all`, return only first matching value')
self.assertEqual(traverse_obj(_GET_ALL_DATA, ..., get_all=False), [0, 1, 2],
msg='do not overflatten if not `get_all`')
# Test casesense behavior
_CASESENSE_DATA = {
'KeY': 'value0',
0: {
'KeY': 'value1',
0: {'KeY': 'value2'},
},
}
self.assertEqual(traverse_obj(_CASESENSE_DATA, 'key'), None,
msg='dict keys should be case sensitive unless `casesense`')
self.assertEqual(traverse_obj(_CASESENSE_DATA, 'keY',
casesense=False), 'value0',
msg='allow non matching key case if `casesense`')
self.assertEqual(traverse_obj(_CASESENSE_DATA, (0, ('keY',)),
casesense=False), ['value1'],
msg='allow non matching key case in branch if `casesense`')
self.assertEqual(traverse_obj(_CASESENSE_DATA, (0, ((0, 'keY'),)),
casesense=False), ['value2'],
msg='allow non matching key case in branch path if `casesense`')
# Test traverse_string behavior
_TRAVERSE_STRING_DATA = {'str': 'str', 1.2: 1.2}
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', 0)), None,
msg='do not traverse into string if not `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', 0),
traverse_string=True), 's',
msg='traverse into string if `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, (1.2, 1),
traverse_string=True), '.',
msg='traverse into converted data if `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', ...),
traverse_string=True), 'str',
msg='`...` should result in string (same value) if `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', slice(0, None, 2)),
traverse_string=True), 'sr',
msg='`slice` should result in string if `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', lambda i, v: i or v == "s"),
traverse_string=True), 'str',
msg='function should result in string if `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', (0, 2)),
traverse_string=True), ['s', 'r'],
msg='branching should result in list if `traverse_string`')
self.assertEqual(traverse_obj({}, (0, ...), traverse_string=True), [],
msg='branching should result in list if `traverse_string`')
self.assertEqual(traverse_obj({}, (0, lambda x, y: True), traverse_string=True), [],
msg='branching should result in list if `traverse_string`')
self.assertEqual(traverse_obj({}, (0, slice(1)), traverse_string=True), [],
msg='branching should result in list if `traverse_string`')
# Test re.Match as input obj
mobj = re.fullmatch(r'0(12)(?P<group>3)(4)?', '0123')
self.assertEqual(traverse_obj(mobj, ...), [x for x in mobj.groups() if x is not None],
msg='`...` on a `re.Match` should give its `groups()`')
self.assertEqual(traverse_obj(mobj, lambda k, _: k in (0, 2)), ['0123', '3'],
msg='function on a `re.Match` should give groupno, value starting at 0')
self.assertEqual(traverse_obj(mobj, 'group'), '3',
msg='str key on a `re.Match` should give group with that name')
self.assertEqual(traverse_obj(mobj, 2), '3',
msg='int key on a `re.Match` should give group with that name')
self.assertEqual(traverse_obj(mobj, 'gRoUp', casesense=False), '3',
msg='str key on a `re.Match` should respect casesense')
self.assertEqual(traverse_obj(mobj, 'fail'), None,
msg='failing str key on a `re.Match` should return `default`')
self.assertEqual(traverse_obj(mobj, 'gRoUpS', casesense=False), None,
msg='failing str key on a `re.Match` should return `default`')
self.assertEqual(traverse_obj(mobj, 8), None,
msg='failing int key on a `re.Match` should return `default`')
self.assertEqual(traverse_obj(mobj, lambda k, _: k in (0, 'group')), ['0123', '3'],
msg='function on a `re.Match` should give group name as well')
# Test xml.etree.ElementTree.Element as input obj
etree = xml.etree.ElementTree.fromstring('''<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>''')
self.assertEqual(traverse_obj(etree, ''), etree,
msg='empty str key should return the element itself')
self.assertEqual(traverse_obj(etree, 'country'), list(etree),
msg='str key should yield all children with that tag name')
self.assertEqual(traverse_obj(etree, ...), list(etree),
msg='`...` as key should return all children')
self.assertEqual(traverse_obj(etree, lambda _, x: x[0].text == '4'), [etree[1]],
msg='function as key should get element as value')
self.assertEqual(traverse_obj(etree, lambda i, _: i == 1), [etree[1]],
msg='function as key should get index as key')
self.assertEqual(traverse_obj(etree, 0), etree[0],
msg='int key should return the nth child')
self.assertEqual(traverse_obj(etree, './/neighbor/@name'),
['Austria', 'Switzerland', 'Malaysia', 'Costa Rica', 'Colombia'],
msg='`@<attribute>` at end of path should give that attribute')
self.assertEqual(traverse_obj(etree, '//neighbor/@fail'), [None, None, None, None, None],
msg='`@<nonexistent>` at end of path should give `None`')
self.assertEqual(traverse_obj(etree, ('//neighbor/@', 2)), {'name': 'Malaysia', 'direction': 'N'},
msg='`@` should give the full attribute dict')
self.assertEqual(traverse_obj(etree, '//year/text()'), ['2008', '2011', '2011'],
msg='`text()` at end of path should give the inner text')
self.assertEqual(traverse_obj(etree, '//*[@direction]/@direction'), ['E', 'W', 'N', 'W', 'E'],
msg='full Python xpath features should be supported')
self.assertEqual(traverse_obj(etree, (0, '@name')), 'Liechtenstein',
msg='special transformations should act on current element')
self.assertEqual(traverse_obj(etree, ('country', 0, ..., 'text()', {int_or_none})), [1, 2008, 141100],
msg='special transformations should act on current element')
def test_http_header_dict(self):
headers = HTTPHeaderDict()
headers['ytdl-test'] = b'0'
@@ -2438,7 +2059,22 @@ Line 1
assert extract_basic_auth('http://user:pass@foo.bar') == ('http://foo.bar', 'Basic dXNlcjpwYXNz')
@unittest.skipUnless(compat_os_name == 'nt', 'Only relevant on Windows')
def test_Popen_windows_escaping(self):
def test_windows_escaping(self):
tests = [
'test"&',
'%CMDCMDLINE:~-1%&',
'a\nb',
'"',
'\\',
'!',
'^!',
'a \\ b',
'a \\" b',
'a \\ b\\',
# We replace \r with \n
('a\r\ra', 'a\n\na'),
]
def run_shell(args):
stdout, stderr, error = Popen.run(
args, text=True, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
@@ -2446,11 +2082,15 @@ Line 1
assert not error
return stdout
# Test escaping
assert run_shell(['echo', 'test"&']) == '"test""&"\n'
# Test if delayed expansion is disabled
assert run_shell(['echo', '^!']) == '"^!"\n'
assert run_shell('echo "^!"') == '"^!"\n'
for argument in tests:
if isinstance(argument, str):
expected = argument
else:
argument, expected = argument
args = [sys.executable, '-c', 'import sys; print(end=sys.argv[1])', argument, 'end']
assert run_shell(args) == expected
assert run_shell(shell_quote(args, shell=True)) == expected
if __name__ == '__main__':
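For reference, a minimal sketch of the quoting the test above exercises, assuming `shell_quote` from `yt_dlp.utils` with the `shell=True` mode applies the cmd.exe-style escaping the removed assertions spelled out:

from yt_dlp.utils import shell_quote

# Embedded double quotes are doubled and the argument is wrapped in quotes,
# matching the '"test""&"' expectation from the removed assertions above
print(shell_quote(['echo', 'test"&'], shell=True))  # -> echo "test""&"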

View file

@@ -7,6 +7,7 @@ import sys
import pytest
from test.helper import verify_address_availability
from yt_dlp.networking.common import Features
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
@@ -18,7 +19,7 @@ import random
import ssl
import threading
from yt_dlp import socks
from yt_dlp import socks, traverse_obj
from yt_dlp.cookies import YoutubeDLCookieJar
from yt_dlp.dependencies import websockets
from yt_dlp.networking import Request
@@ -32,8 +33,6 @@ from yt_dlp.networking.exceptions import (
)
from yt_dlp.utils.networking import HTTPHeaderDict
from test.conftest import validate_and_send
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
@@ -66,7 +65,9 @@ def process_request(self, request):
def create_websocket_server(**ws_kwargs):
import websockets.sync.server
wsd = websockets.sync.server.serve(websocket_handler, '127.0.0.1', 0, process_request=process_request, **ws_kwargs)
wsd = websockets.sync.server.serve(
websocket_handler, '127.0.0.1', 0,
process_request=process_request, open_timeout=2, **ws_kwargs)
ws_port = wsd.socket.getsockname()[1]
ws_server_thread = threading.Thread(target=wsd.serve_forever)
ws_server_thread.daemon = True
@@ -100,7 +101,21 @@ def create_mtls_wss_websocket_server():
return create_websocket_server(ssl_context=sslctx)
def ws_validate_and_send(rh, req):
rh.validate(req)
max_tries = 3
for i in range(max_tries):
try:
return rh.send(req)
except TransportError as e:
if i < (max_tries - 1) and 'connection closed during handshake' in str(e):
# websockets server sometimes hangs on new connections
continue
raise
@pytest.mark.skipif(not websockets, reason='websockets must be installed to test websocket request handlers')
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
class TestWebsSocketRequestHandlerConformance:
@classmethod
def setup_class(cls):
@@ -116,10 +131,9 @@ class TestWebsSocketRequestHandlerConformance:
cls.mtls_wss_thread, cls.mtls_wss_port = create_mtls_wss_websocket_server()
cls.mtls_wss_base_url = f'wss://127.0.0.1:{cls.mtls_wss_port}'
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
def test_basic_websockets(self, handler):
with handler() as rh:
ws = validate_and_send(rh, Request(self.ws_base_url))
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
assert 'upgrade' in ws.headers
assert ws.status == 101
ws.send('foo')
@@ -128,33 +142,29 @@ class TestWebsSocketRequestHandlerConformance:
# https://www.rfc-editor.org/rfc/rfc6455.html#section-5.6
@pytest.mark.parametrize('msg,opcode', [('str', 1), (b'bytes', 2)])
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
def test_send_types(self, handler, msg, opcode):
with handler() as rh:
ws = validate_and_send(rh, Request(self.ws_base_url))
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send(msg)
assert int(ws.recv()) == opcode
ws.close()
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
def test_verify_cert(self, handler):
with handler() as rh:
with pytest.raises(CertificateVerifyError):
validate_and_send(rh, Request(self.wss_base_url))
ws_validate_and_send(rh, Request(self.wss_base_url))
with handler(verify=False) as rh:
ws = validate_and_send(rh, Request(self.wss_base_url))
ws = ws_validate_and_send(rh, Request(self.wss_base_url))
assert ws.status == 101
ws.close()
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
def test_ssl_error(self, handler):
with handler(verify=False) as rh:
with pytest.raises(SSLError, match=r'ssl(?:v3|/tls) alert handshake failure') as exc_info:
validate_and_send(rh, Request(self.bad_wss_host))
ws_validate_and_send(rh, Request(self.bad_wss_host))
assert not issubclass(exc_info.type, CertificateVerifyError)
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
@pytest.mark.parametrize('path,expected', [
# Unicode characters should be encoded with uppercase percent-encoding
('/中文', '/%E4%B8%AD%E6%96%87'),
@@ -163,18 +173,17 @@ class TestWebsSocketRequestHandlerConformance:
])
def test_percent_encode(self, handler, path, expected):
with handler() as rh:
ws = validate_and_send(rh, Request(f'{self.ws_base_url}{path}'))
ws = ws_validate_and_send(rh, Request(f'{self.ws_base_url}{path}'))
ws.send('path')
assert ws.recv() == expected
assert ws.status == 101
ws.close()
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
def test_remove_dot_segments(self, handler):
with handler() as rh:
# This isn't a comprehensive test,
# but it should be enough to check whether the handler is removing dot segments
ws = validate_and_send(rh, Request(f'{self.ws_base_url}/a/b/./../../test'))
ws = ws_validate_and_send(rh, Request(f'{self.ws_base_url}/a/b/./../../test'))
assert ws.status == 101
ws.send('path')
assert ws.recv() == '/test'
@@ -182,15 +191,13 @@ class TestWebsSocketRequestHandlerConformance:
# We are restricted to known HTTP status codes in http.HTTPStatus
# Redirects are not supported for websockets
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
@pytest.mark.parametrize('status', (200, 204, 301, 302, 303, 400, 500, 511))
def test_raise_http_error(self, handler, status):
with handler() as rh:
with pytest.raises(HTTPError) as exc_info:
validate_and_send(rh, Request(f'{self.ws_base_url}/gen_{status}'))
ws_validate_and_send(rh, Request(f'{self.ws_base_url}/gen_{status}'))
assert exc_info.value.status == status
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
@pytest.mark.parametrize('params,extensions', [
({'timeout': sys.float_info.min}, {}),
({}, {'timeout': sys.float_info.min}),
@@ -198,9 +205,8 @@ class TestWebsSocketRequestHandlerConformance:
def test_timeout(self, handler, params, extensions):
with handler(**params) as rh:
with pytest.raises(TransportError):
validate_and_send(rh, Request(self.ws_base_url, extensions=extensions))
ws_validate_and_send(rh, Request(self.ws_base_url, extensions=extensions))
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
def test_cookies(self, handler):
cookiejar = YoutubeDLCookieJar()
cookiejar.set_cookie(http.cookiejar.Cookie(
@@ -210,52 +216,49 @@ class TestWebsSocketRequestHandlerConformance:
comment_url=None, rest={}))
with handler(cookiejar=cookiejar) as rh:
ws = validate_and_send(rh, Request(self.ws_base_url))
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
ws.close()
with handler() as rh:
ws = validate_and_send(rh, Request(self.ws_base_url))
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert 'cookie' not in json.loads(ws.recv())
ws.close()
ws = validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': cookiejar}))
ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': cookiejar}))
ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
ws.close()
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
def test_source_address(self, handler):
source_address = f'127.0.0.{random.randint(5, 255)}'
verify_address_availability(source_address)
with handler(source_address=source_address) as rh:
ws = validate_and_send(rh, Request(self.ws_base_url))
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('source_address')
assert source_address == ws.recv()
ws.close()
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
def test_response_url(self, handler):
with handler() as rh:
url = f'{self.ws_base_url}/something'
ws = validate_and_send(rh, Request(url))
ws = ws_validate_and_send(rh, Request(url))
assert ws.url == url
ws.close()
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
def test_request_headers(self, handler):
with handler(headers=HTTPHeaderDict({'test1': 'test', 'test2': 'test2'})) as rh:
# Global Headers
ws = validate_and_send(rh, Request(self.ws_base_url))
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
headers = HTTPHeaderDict(json.loads(ws.recv()))
assert headers['test1'] == 'test'
ws.close()
# Per request headers, merged with global
ws = validate_and_send(rh, Request(
ws = ws_validate_and_send(rh, Request(
self.ws_base_url, headers={'test2': 'changed', 'test3': 'test3'}))
ws.send('headers')
headers = HTTPHeaderDict(json.loads(ws.recv()))
@@ -280,7 +283,6 @@ class TestWebsSocketRequestHandlerConformance:
'client_certificate_password': 'foobar',
}
))
@pytest.mark.parametrize('handler', ['Websockets'], indirect=True)
def test_mtls(self, handler, client_cert):
with handler(
# Disable client-side validation of unacceptable self-signed testcert.pem
@@ -288,7 +290,45 @@ class TestWebsSocketRequestHandlerConformance:
verify=False,
client_cert=client_cert
) as rh:
validate_and_send(rh, Request(self.mtls_wss_base_url)).close()
ws_validate_and_send(rh, Request(self.mtls_wss_base_url)).close()
def test_request_disable_proxy(self, handler):
for proxy_proto in handler._SUPPORTED_PROXY_SCHEMES or ['ws']:
# Given handler is configured with a proxy
with handler(proxies={'ws': f'{proxy_proto}://10.255.255.255'}, timeout=5) as rh:
# When a proxy is explicitly set to None for the request
ws = ws_validate_and_send(rh, Request(self.ws_base_url, proxies={'http': None}))
# Then no proxy should be used
assert ws.status == 101
ws.close()
@pytest.mark.skip_handlers_if(
lambda _, handler: Features.NO_PROXY not in handler._SUPPORTED_FEATURES, 'handler does not support NO_PROXY')
def test_noproxy(self, handler):
for proxy_proto in handler._SUPPORTED_PROXY_SCHEMES or ['ws']:
# Given the handler is configured with a proxy
with handler(proxies={'ws': f'{proxy_proto}://10.255.255.255'}, timeout=5) as rh:
for no_proxy in (f'127.0.0.1:{self.ws_port}', '127.0.0.1', 'localhost'):
# When request no proxy includes the request url host
ws = ws_validate_and_send(rh, Request(self.ws_base_url, proxies={'no': no_proxy}))
# Then the proxy should not be used
assert ws.status == 101
ws.close()
@pytest.mark.skip_handlers_if(
lambda _, handler: Features.ALL_PROXY not in handler._SUPPORTED_FEATURES, 'handler does not support ALL_PROXY')
def test_allproxy(self, handler):
supported_proto = traverse_obj(handler._SUPPORTED_PROXY_SCHEMES, 0, default='ws')
# This is a bit of a hacky test, but it should be enough to check whether the handler is using the proxy.
# 0.1s might not be enough of a timeout if proxy is not used in all cases, but should still get failures.
with handler(proxies={'all': f'{supported_proto}://10.255.255.255'}, timeout=0.1) as rh:
with pytest.raises(TransportError):
ws_validate_and_send(rh, Request(self.ws_base_url)).close()
with handler(timeout=0.1) as rh:
with pytest.raises(TransportError):
ws_validate_and_send(
rh, Request(self.ws_base_url, proxies={'all': f'{supported_proto}://10.255.255.255'})).close()
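The three proxy tests above all drive the per-request `proxies` mapping; a short sketch of the shapes they exercise (the address and port are placeholders, as in the tests):

from yt_dlp.networking import Request

Request('ws://127.0.0.1:8765', proxies={'no': 'localhost'})  # bypass the configured proxy for this host
Request('ws://127.0.0.1:8765', proxies={'http': None})       # explicitly disable the proxy for one request
Request('ws://127.0.0.1:8765', proxies={'all': 'http://10.255.255.255'})  # proxy every supported scheme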
def create_fake_ws_connection(raised):

View file

@@ -1,7 +1,7 @@
import collections
import contextlib
import copy
import datetime
import datetime as dt
import errno
import fileinput
import http.cookiejar
@@ -25,7 +25,7 @@ import unicodedata
from .cache import Cache
from .compat import functools, urllib # isort: split
from .compat import compat_os_name, compat_shlex_quote, urllib_req_to_req
from .compat import compat_os_name, urllib_req_to_req
from .cookies import LenientSimpleCookie, load_cookies
from .downloader import FFmpegFD, get_suitable_downloader, shorten_protocol_name
from .downloader.rtmp import rtmpdump_version
@@ -42,6 +42,7 @@ from .networking.exceptions import (
SSLError,
network_exceptions,
)
from .networking.impersonate import ImpersonateRequestHandler
from .plugins import directories as plugin_directories
from .postprocessor import _PLUGIN_CLASSES as plugin_pps
from .postprocessor import (
@@ -99,8 +100,8 @@ from .utils import (
SameFileError,
UnavailableVideoError,
UserNotLive,
YoutubeDLError,
age_restricted,
args_to_str,
bug_reports_message,
date_from_str,
deprecation_warning,
@@ -139,11 +140,13 @@ from .utils import (
sanitize_filename,
sanitize_path,
sanitize_url,
shell_quote,
str_or_none,
strftime_or_none,
subtitles_filename,
supports_terminal_sequences,
system_identifier,
filesize_from_tbr,
timetuple_from_msec,
to_high_limit_path,
traverse_obj,
@@ -402,6 +405,8 @@ class YoutubeDL:
- "detect_or_warn": check whether we can do anything
about it, warn otherwise (default)
source_address: Client-side IP address to bind to.
impersonate: Client to impersonate for requests.
An ImpersonateTarget (from yt_dlp.networking.impersonate)
sleep_interval_requests: Number of seconds to sleep between requests
during extraction
sleep_interval: Number of seconds to sleep before each download when
@@ -476,7 +481,7 @@ class YoutubeDL:
nopart, updatetime, buffersize, ratelimit, throttledratelimit, min_filesize,
max_filesize, test, noresizebuffer, retries, file_access_retries, fragment_retries,
continuedl, xattr_set_filesize, hls_use_mpegts, http_chunk_size,
external_downloader_args, concurrent_fragment_downloads.
external_downloader_args, concurrent_fragment_downloads, progress_delta.
The following options are used by the post processors:
ffmpeg_location: Location of the ffmpeg/avconv binary; either the path
@@ -713,6 +718,13 @@ class YoutubeDL:
for msg in self.params.get('_deprecation_warnings', []):
self.deprecated_feature(msg)
if impersonate_target := self.params.get('impersonate'):
if not self._impersonate_target_available(impersonate_target):
raise YoutubeDLError(
f'Impersonate target "{impersonate_target}" is not available. '
f'Use --list-impersonate-targets to see available targets. '
f'You may be missing dependencies required to support this target.')
if 'list-formats' in self.params['compat_opts']:
self.params['listformats_table'] = False
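A minimal usage sketch for the new `impersonate` parameter (the URL is a placeholder; `ImpersonateTarget.from_str` is the same parser the CLI path below uses):

from yt_dlp import YoutubeDL
from yt_dlp.networking.impersonate import ImpersonateTarget

# Fails at construction time with the YoutubeDLError above if no installed
# request handler supports the requested target
with YoutubeDL({'impersonate': ImpersonateTarget.from_str('chrome')}) as ydl:
    ydl.download(['https://example.com/video'])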
@@ -811,7 +823,7 @@ class YoutubeDL:
self.report_warning(
'Long argument string detected. '
'Use -- to separate parameters and URLs, like this:\n%s' %
args_to_str(correct_argv))
shell_quote(correct_argv))
def add_info_extractor(self, ie):
"""Add an InfoExtractor object to the end of the list."""
@@ -1343,7 +1355,7 @@ class YoutubeDL:
value, fmt = escapeHTML(str(value)), str_fmt
elif fmt[-1] == 'q': # quoted
value = map(str, variadic(value) if '#' in flags else [value])
value, fmt = ' '.join(map(compat_shlex_quote, value)), str_fmt
value, fmt = shell_quote(value, shell=True), str_fmt
elif fmt[-1] == 'B': # bytes
value = f'%{str_fmt}'.encode() % str(value).encode()
value, fmt = value.decode('utf-8', 'ignore'), 's'
@@ -2124,6 +2136,11 @@ class YoutubeDL:
def _check_formats(self, formats):
for f in formats:
working = f.get('__working')
if working is not None:
if working:
yield f
continue
self.to_screen('[info] Testing format %s' % f['format_id'])
path = self.get_output_path('temp')
if not self._ensure_dir_exists(f'{path}/'):
@@ -2140,33 +2157,44 @@ class YoutubeDL:
os.remove(temp_file.name)
except OSError:
self.report_warning('Unable to delete temporary file "%s"' % temp_file.name)
f['__working'] = success
if success:
yield f
else:
self.to_screen('[info] Unable to download format %s. Skipping...' % f['format_id'])
def _select_formats(self, formats, selector):
return list(selector({
'formats': formats,
'has_merged_format': any('none' not in (f.get('acodec'), f.get('vcodec')) for f in formats),
'incomplete_formats': (all(f.get('vcodec') == 'none' for f in formats) # No formats with video
or all(f.get('acodec') == 'none' for f in formats)), # OR, No formats with audio
}))
def _default_format_spec(self, info_dict, download=True):
download = download and not self.params.get('simulate')
prefer_best = download and (
self.params['outtmpl']['default'] == '-'
or info_dict.get('is_live') and not self.params.get('live_from_start'))
def can_merge():
merger = FFmpegMergerPP(self)
return merger.available and merger.can_merge()
prefer_best = (
not self.params.get('simulate')
and download
and (
not can_merge()
or info_dict.get('is_live') and not self.params.get('live_from_start')
or self.params['outtmpl']['default'] == '-'))
compat = (
prefer_best
or self.params.get('allow_multiple_audio_streams', False)
or 'format-spec' in self.params['compat_opts'])
if not prefer_best and download and not can_merge():
prefer_best = True
formats = self._get_formats(info_dict)
evaluate_formats = lambda spec: self._select_formats(formats, self.build_format_selector(spec))
if evaluate_formats('b/bv+ba') != evaluate_formats('bv*+ba/b'):
self.report_warning('ffmpeg not found. The downloaded format may not be the best available. '
'Installing ffmpeg is strongly recommended: https://github.com/yt-dlp/yt-dlp#dependencies')
return (
'best/bestvideo+bestaudio' if prefer_best
else 'bestvideo*+bestaudio/best' if not compat
else 'bestvideo+bestaudio/best')
compat = (self.params.get('allow_multiple_audio_streams')
or 'format-spec' in self.params['compat_opts'])
return ('best/bestvideo+bestaudio' if prefer_best
else 'bestvideo+bestaudio/best' if compat
else 'bestvideo*+bestaudio/best')
def build_format_selector(self, format_spec):
def syntax_error(note, start):
@@ -2617,7 +2645,7 @@ class YoutubeDL:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
with contextlib.suppress(ValueError, OverflowError, OSError):
upload_date = datetime.datetime.fromtimestamp(info_dict[ts_key], datetime.timezone.utc)
upload_date = dt.datetime.fromtimestamp(info_dict[ts_key], dt.timezone.utc)
info_dict[date_key] = upload_date.strftime('%Y%m%d')
if not info_dict.get('release_year'):
@@ -2771,7 +2799,7 @@ class YoutubeDL:
get_from_start = not info_dict.get('is_live') or bool(self.params.get('live_from_start'))
if not get_from_start:
info_dict['title'] += ' ' + datetime.datetime.now().strftime('%Y-%m-%d %H:%M')
info_dict['title'] += ' ' + dt.datetime.now().strftime('%Y-%m-%d %H:%M')
if info_dict.get('is_live') and formats:
formats = [f for f in formats if bool(f.get('is_from_start')) == get_from_start]
if get_from_start and not formats:
@@ -2802,6 +2830,9 @@ class YoutubeDL:
format['url'] = sanitize_url(format['url'])
if format.get('ext') is None:
format['ext'] = determine_ext(format['url']).lower()
if format['ext'] in ('aac', 'opus', 'mp3', 'flac', 'vorbis'):
if format.get('acodec') is None:
format['acodec'] = format['ext']
if format.get('protocol') is None:
format['protocol'] = determine_protocol(format)
if format.get('resolution') is None:
@@ -2812,9 +2843,8 @@ class YoutubeDL:
format['aspect_ratio'] = try_call(lambda: round(format['width'] / format['height'], 2))
# For fragmented formats, "tbr" is often max bitrate and not average
if (('manifest-filesize-approx' in self.params['compat_opts'] or not format.get('manifest_url'))
and info_dict.get('duration') and format.get('tbr')
and not format.get('filesize') and not format.get('filesize_approx')):
format['filesize_approx'] = int(info_dict['duration'] * format['tbr'] * (1024 / 8))
format['filesize_approx'] = filesize_from_tbr(format.get('tbr'), info_dict.get('duration'))
format['http_headers'] = self._calc_headers(collections.ChainMap(format, info_dict), load_cookies=True)
# Safeguard against old/insecure infojson when using --load-info-json
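The new helper replaces the inline arithmetic removed above; a sketch of the computation it stands in for, assuming `tbr` is a total bitrate in kbit/s and reusing the 1024/8 factor from the removed line (the helper's exact factor is not shown in this diff; the function name here is illustrative):

def approx_filesize_bytes(tbr, duration):  # illustrative stand-in, not the real helper
    if tbr is None or duration is None:
        return None
    return int(duration * tbr * (1024 / 8))  # kbit/s -> bytes per second, times seconds

assert approx_filesize_bytes(128, 60) == 983040  # one minute at 128 kbit/s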
@@ -2914,12 +2944,7 @@ class YoutubeDL:
self.write_debug(f'Default format spec: {req_format}')
format_selector = self.build_format_selector(req_format)
formats_to_download = list(format_selector({
'formats': formats,
'has_merged_format': any('none' not in (f.get('acodec'), f.get('vcodec')) for f in formats),
'incomplete_formats': (all(f.get('vcodec') == 'none' for f in formats) # No formats with video
or all(f.get('acodec') == 'none' for f in formats)), # OR, No formats with audio
}))
formats_to_download = self._select_formats(formats, format_selector)
if interactive_format_selection and not formats_to_download:
self.report_error('Requested format is not available', tb=False, is_error=False)
continue
@@ -3046,7 +3071,7 @@ class YoutubeDL:
f = formats[-1]
self.report_warning(
'No subtitle format found matching "%s" for language %s, '
'using %s' % (formats_query, lang, f['ext']))
'using %s. Use --list-subs for a list of available subtitles' % (formats_query, lang, f['ext']))
subs[lang] = f
return subs
@@ -3864,8 +3889,8 @@ class YoutubeDL:
delim, (
format_field(f, 'filesize', ' \t%s', func=format_bytes)
or format_field(f, 'filesize_approx', '\t%s', func=format_bytes)
or format_field(try_call(lambda: format_bytes(int(info_dict['duration'] * f['tbr'] * (1024 / 8)))),
None, self._format_out('~\t%s', self.Styles.SUPPRESS))),
or format_field(filesize_from_tbr(f.get('tbr'), info_dict.get('duration')), None,
self._format_out('~\t%s', self.Styles.SUPPRESS), func=format_bytes)),
format_field(f, 'tbr', '\t%dk', func=round),
shorten_protocol_name(f.get('protocol', '')),
delim,
@@ -4077,6 +4102,22 @@ class YoutubeDL:
handler = self._request_director.handlers['Urllib']
return handler._get_instance(cookiejar=self.cookiejar, proxies=self.proxies)
def _get_available_impersonate_targets(self):
# todo(future): make available as public API
return [
(target, rh.RH_NAME)
for rh in self._request_director.handlers.values()
if isinstance(rh, ImpersonateRequestHandler)
for target in rh.supported_targets
]
def _impersonate_target_available(self, target):
# todo(future): make available as public API
return any(
rh.is_supported_target(target)
for rh in self._request_director.handlers.values()
if isinstance(rh, ImpersonateRequestHandler))
def urlopen(self, req):
""" Start an HTTP download """
if isinstance(req, str):
@@ -4108,9 +4149,13 @@ class YoutubeDL:
raise RequestError(
'file:// URLs are disabled by default in yt-dlp for security reasons. '
'Use --enable-file-urls to enable at your own risk.', cause=ue) from ue
if 'unsupported proxy type: "https"' in ue.msg.lower():
if (
'unsupported proxy type: "https"' in ue.msg.lower()
and 'requests' not in self._request_director.handlers
and 'curl_cffi' not in self._request_director.handlers
):
raise RequestError(
'To use an HTTPS proxy for this request, one of the following dependencies needs to be installed: requests')
'To use an HTTPS proxy for this request, one of the following dependencies needs to be installed: requests, curl_cffi')
elif (
re.match(r'unsupported url scheme: "wss?"', ue.msg.lower())
@@ -4120,6 +4165,13 @@ class YoutubeDL:
'This request requires WebSocket support. '
'Ensure one of the following dependencies are installed: websockets',
cause=ue) from ue
elif re.match(r'unsupported (?:extensions: impersonate|impersonate target)', ue.msg.lower()):
raise RequestError(
f'Impersonate target "{req.extensions["impersonate"]}" is not available.'
f' See --list-impersonate-targets for available targets.'
f' This request requires browser impersonation, however you may be missing dependencies'
f' required to support this target.')
raise
except SSLError as e:
if 'UNSAFE_LEGACY_RENEGOTIATION_DISABLED' in str(e):
@@ -4152,6 +4204,7 @@ class YoutubeDL:
'timeout': 'socket_timeout',
'legacy_ssl_support': 'legacyserverconnect',
'enable_file_urls': 'enable_file_urls',
'impersonate': 'impersonate',
'client_cert': {
'client_certificate': 'client_certificate',
'client_certificate_key': 'client_certificate_key',

View file

@@ -19,6 +19,7 @@ from .cookies import SUPPORTED_BROWSERS, SUPPORTED_KEYRINGS
from .downloader.external import get_external_downloader
from .extractor import list_extractor_classes
from .extractor.adobepass import MSO_INFO
from .networking.impersonate import ImpersonateTarget
from .options import parseOpts
from .postprocessor import (
FFmpegExtractAudioPP,
@@ -48,6 +49,7 @@ from .utils import (
float_or_none,
format_field,
int_or_none,
join_nonempty,
match_filter_func,
parse_bytes,
parse_duration,
@@ -388,6 +390,9 @@ def validate_options(opts):
f'Supported keyrings are: {", ".join(sorted(SUPPORTED_KEYRINGS))}')
opts.cookiesfrombrowser = (browser_name, profile, keyring, container)
if opts.impersonate is not None:
opts.impersonate = ImpersonateTarget.from_str(opts.impersonate.lower())
# MetadataParser
def metadataparser_actions(f):
if isinstance(f, str):
@@ -831,6 +836,7 @@ def parse_options(argv=None):
'noprogress': opts.quiet if opts.noprogress is None else opts.noprogress,
'progress_with_newline': opts.progress_with_newline,
'progress_template': opts.progress_template,
'progress_delta': opts.progress_delta,
'playliststart': opts.playliststart,
'playlistend': opts.playlistend,
'playlistreverse': opts.playlist_reverse,
@@ -911,6 +917,7 @@ def parse_options(argv=None):
'postprocessors': postprocessors,
'fixup': opts.fixup,
'source_address': opts.source_address,
'impersonate': opts.impersonate,
'call_home': opts.call_home,
'sleep_interval_requests': opts.sleep_interval_requests,
'sleep_interval': opts.sleep_interval,
@@ -980,6 +987,41 @@ def _real_main(argv=None):
traceback.print_exc()
ydl._download_retcode = 100
if opts.list_impersonate_targets:
known_targets = [
# List of simplified targets we know are supported,
# to help users know what dependencies may be required.
(ImpersonateTarget('chrome'), 'curl_cffi'),
(ImpersonateTarget('edge'), 'curl_cffi'),
(ImpersonateTarget('safari'), 'curl_cffi'),
]
available_targets = ydl._get_available_impersonate_targets()
def make_row(target, handler):
return [
join_nonempty(target.client.title(), target.version, delim='-') or '-',
join_nonempty((target.os or "").title(), target.os_version, delim='-') or '-',
handler,
]
rows = [make_row(target, handler) for target, handler in available_targets]
for known_target, known_handler in known_targets:
if not any(
known_target in target and handler == known_handler
for target, handler in available_targets
):
rows.append([
ydl._format_out(text, ydl.Styles.SUPPRESS)
for text in make_row(known_target, f'{known_handler} (not available)')
])
ydl.to_screen('[info] Available impersonate targets')
ydl.to_stdout(render_table(['Client', 'OS', 'Source'], rows, extra_gap=2, delim='-'))
return
if not actual_use:
if pre_process:
return ydl._download_retcode
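The same list is reachable programmatically through the private helper added to YoutubeDL above (underscore-prefixed, so not a stable public API; the inline todo notes it may be made public later). A sketch:

from yt_dlp import YoutubeDL

with YoutubeDL() as ydl:
    for target, handler_name in ydl._get_available_impersonate_targets():
        # target is an ImpersonateTarget; handler_name is the handler's RH_NAME
        print(target.client, target.version, target.os, handler_name)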

View file

@@ -1,6 +1,6 @@
import sys
from PyInstaller.utils.hooks import collect_submodules
from PyInstaller.utils.hooks import collect_submodules, collect_data_files
def pycryptodome_module():
@@ -25,10 +25,12 @@ def get_hidden_imports():
for module in ('websockets', 'requests', 'urllib3'):
yield from collect_submodules(module)
# These are auto-detected, but explicitly add them just in case
yield from ('mutagen', 'brotli', 'certifi', 'secretstorage')
yield from ('mutagen', 'brotli', 'certifi', 'secretstorage', 'curl_cffi')
hiddenimports = list(get_hidden_imports())
print(f'Adding imports: {hiddenimports}')
excludedimports = ['youtube_dl', 'youtube_dlc', 'test', 'ytdlp_plugins', 'devscripts', 'bundle']
datas = collect_data_files('curl_cffi', includes=['cacert.pem'])

View file

@@ -27,12 +27,9 @@ def compat_etree_fromstring(text):
compat_os_name = os._name if os.name == 'java' else os.name
if compat_os_name == 'nt':
def compat_shlex_quote(s):
import re
return s if re.match(r'^[-_\w./]+$', s) else s.replace('"', '""').join('""')
else:
from shlex import quote as compat_shlex_quote # noqa: F401
def compat_shlex_quote(s):
from ..utils import shell_quote
return shell_quote(s)
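For comparison, a standalone sketch of the cmd.exe quoting rule the removed branch implemented (double any embedded quote, then wrap the argument in quotes), which now lives behind `shell_quote`:

import re

def cmd_quote(s):  # illustrative only; real callers should use shell_quote
    return s if re.match(r'^[-_\w./]+$', s) else '"{}"'.format(s.replace('"', '""'))

assert cmd_quote('abc.txt') == 'abc.txt'
assert cmd_quote('a "b"') == '"a ""b"""'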
def compat_ord(c):

View file

@@ -1,6 +1,7 @@
import base64
import collections
import contextlib
import datetime as dt
import glob
import http.cookiejar
import http.cookies
@@ -15,7 +16,6 @@ import sys
import tempfile
import time
import urllib.request
from datetime import datetime, timedelta, timezone
from enum import Enum, auto
from hashlib import pbkdf2_hmac
@@ -194,7 +194,11 @@ def _firefox_browser_dirs():
yield os.path.expanduser('~/Library/Application Support/Firefox/Profiles')
else:
yield from map(os.path.expanduser, ('~/.mozilla/firefox', '~/snap/firefox/common/.mozilla/firefox'))
yield from map(os.path.expanduser, (
'~/.mozilla/firefox',
'~/snap/firefox/common/.mozilla/firefox',
'~/.var/app/org.mozilla.firefox/.mozilla/firefox',
))
def _firefox_cookie_dbs(roots):
@@ -343,6 +347,11 @@ def _process_chrome_cookie(decryptor, host_key, name, value, encrypted_value, pa
if value is None:
return is_encrypted, None
# In chrome, session cookies have expires_utc set to 0
# In our cookie-store, cookies that do not expire should have expires set to None
if not expires_utc:
expires_utc = None
return is_encrypted, http.cookiejar.Cookie(
version=0, name=name, value=value, port=None, port_specified=False,
domain=host_key, domain_specified=bool(host_key), domain_initial_dot=host_key.startswith('.'),
@@ -594,7 +603,7 @@ class DataParser:
def _mac_absolute_time_to_posix(timestamp):
return int((datetime(2001, 1, 1, 0, 0, tzinfo=timezone.utc) + timedelta(seconds=timestamp)).timestamp())
return int((dt.datetime(2001, 1, 1, 0, 0, tzinfo=dt.timezone.utc) + dt.timedelta(seconds=timestamp)).timestamp())
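A quick sanity check of the conversion (Mac absolute time counts seconds from 2001-01-01 UTC, which is 978307200 in POSIX time):

assert _mac_absolute_time_to_posix(0) == 978307200
assert _mac_absolute_time_to_posix(60) == 978307260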
def _parse_safari_cookies_header(data, logger):

View file

@@ -74,6 +74,10 @@ else:
if hasattr(xattr, 'set'): # pyxattr
xattr._yt_dlp__identifier = 'pyxattr'
try:
import curl_cffi
except ImportError:
curl_cffi = None
from . import Cryptodome

View file

@@ -4,6 +4,7 @@ import functools
import os
import random
import re
import threading
import time
from ..minicurses import (
@@ -63,6 +64,7 @@ class FileDownloader:
min_filesize: Skip files smaller than this size
max_filesize: Skip files larger than this size
xattr_set_filesize: Set ytdl.filesize user xattribute with expected size.
progress_delta: The minimum time between progress output, in seconds
external_downloader_args: A dictionary of downloader keys (in lower case)
and a list of additional command-line arguments for the
executable. Use 'default' as the name for arguments to be
@@ -88,6 +90,9 @@ class FileDownloader:
self.params = params
self._prepare_multiline_status()
self.add_progress_hook(self.report_progress)
if self.params.get('progress_delta'):
self._progress_delta_lock = threading.Lock()
self._progress_delta_time = time.monotonic()
def _set_ydl(self, ydl):
self.ydl = ydl
@@ -366,6 +371,12 @@ class FileDownloader:
if s['status'] != 'downloading':
return
if update_delta := self.params.get('progress_delta'):
with self._progress_delta_lock:
if time.monotonic() < self._progress_delta_time:
return
self._progress_delta_time += update_delta
s.update({
'_eta_str': self.format_eta(s.get('eta')).strip(),
'_speed_str': self.format_speed(s.get('speed')),
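The `progress_delta` plumbing above rate-limits progress output; a minimal standalone sketch of the same tick-advancing pattern (class name is illustrative; the real code additionally holds a lock, since hooks may fire from multiple threads):

import time

class ProgressThrottle:
    def __init__(self, delta):
        self._delta = delta
        self._next = time.monotonic()

    def should_report(self):
        if time.monotonic() < self._next:
            return False  # drop this update; too soon after the last one
        self._next += self._delta  # schedule the next permitted update
        return True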

View file

@@ -491,7 +491,7 @@ class FFmpegFD(ExternalFD):
if not self.params.get('verbose'):
args += ['-hide_banner']
args += traverse_obj(info_dict, ('downloader_options', 'ffmpeg_args'), default=[])
args += traverse_obj(info_dict, ('downloader_options', 'ffmpeg_args', ...))
# These exists only for compatibility. Extractors should use
# info_dict['downloader_options']['ffmpeg_args'] instead
@@ -615,6 +615,8 @@ class FFmpegFD(ExternalFD):
else:
args += ['-f', EXT_TO_OUT_FORMATS.get(ext, ext)]
args += traverse_obj(info_dict, ('downloader_options', 'ffmpeg_args_out', ...))
args += self._configuration_args(('_o1', '_o', ''))
args = [encodeArgument(opt) for opt in args]

View file

@@ -150,6 +150,7 @@ from .arte import (
)
from .arnes import ArnesIE
from .asobichannel import AsobiChannelIE, AsobiChannelTagURLIE
from .asobistage import AsobiStageIE
from .atresplayer import AtresPlayerIE
from .atscaleconf import AtScaleConfEventIE
from .atvat import ATVAtIE
@@ -386,7 +387,11 @@ from .comedycentral import (
ComedyCentralIE,
ComedyCentralTVIE,
)
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonmistakes import (
BlobIE,
CommonMistakesIE,
UnicodeBOMIE,
)
from .commonprotocols import (
MmsIE,
RtmpIE,
@@ -590,6 +595,7 @@ from .facebook import (
FacebookReelIE,
FacebookAdsIE,
)
from .fathom import FathomIE
from .fancode import (
FancodeVodIE,
FancodeLiveIE
@@ -874,6 +880,7 @@ from .jeuxvideo import JeuxVideoIE
from .jiosaavn import (
JioSaavnSongIE,
JioSaavnAlbumIE,
JioSaavnPlaylistIE,
)
from .jove import JoveIE
from .joj import JojIE
@@ -989,6 +996,10 @@ from .lnkgo import (
LnkGoIE,
LnkIE,
)
from .loom import (
LoomIE,
LoomFolderIE,
)
from .lovehomeporn import LoveHomePornIE
from .lrt import (
LRTVODIE,
@@ -1750,6 +1761,7 @@ from .shahid import (
ShahidIE,
ShahidShowIE,
)
from .sharepoint import SharePointIE
from .sharevideos import ShareVideosEmbedIE
from .sibnet import SibnetEmbedIE
from .shemaroome import ShemarooMeIE
@@ -2283,6 +2295,7 @@ from .vrt import (
VrtNUIE,
KetnetIE,
DagelijkseKostIE,
Radio1BeIE,
)
from .vtm import VTMIE
from .medialaan import MedialaanIE

View file

@@ -1,25 +1,65 @@
import functools
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
OnDemandPagedList,
date_from_str,
UserNotLive,
determine_ext,
filter_dict,
int_or_none,
qualities,
traverse_obj,
unified_strdate,
orderedSet,
unified_timestamp,
update_url_query,
url_or_none,
urlencode_postdata,
xpath_text,
urljoin,
)
from ..utils.traversal import traverse_obj
class AfreecaTVIE(InfoExtractor):
class AfreecaTVBaseIE(InfoExtractor):
_NETRC_MACHINE = 'afreecatv'
def _perform_login(self, username, password):
login_form = {
'szWork': 'login',
'szType': 'json',
'szUid': username,
'szPassword': password,
'isSaveId': 'false',
'szScriptVar': 'oLoginRet',
'szAction': '',
}
response = self._download_json(
'https://login.afreecatv.com/app/LoginAction.php', None,
'Logging in', data=urlencode_postdata(login_form))
_ERRORS = {
-4: 'Your account has been suspended due to a violation of our terms and policies.',
-5: 'https://member.afreecatv.com/app/user_delete_progress.php',
-6: 'https://login.afreecatv.com/membership/changeMember.php',
-8: "Hello! AfreecaTV here.\nThe username you have entered belongs to \n an account that requires a legal guardian's consent. \nIf you wish to use our services without restriction, \nplease make sure to go through the necessary verification process.",
-9: 'https://member.afreecatv.com/app/pop_login_block.php',
-11: 'https://login.afreecatv.com/afreeca/second_login.php',
-12: 'https://member.afreecatv.com/app/user_security.php',
0: 'The username does not exist or you have entered the wrong password.',
-1: 'The username does not exist or you have entered the wrong password.',
-3: 'You have entered your username/password incorrectly.',
-7: 'You cannot use your Global AfreecaTV account to access Korean AfreecaTV.',
-10: 'Sorry for the inconvenience. \nYour account has been blocked due to an unauthorized access. \nPlease contact our Help Center for assistance.',
-32008: 'You have failed to log in. Please contact our Help Center.',
}
result = int_or_none(response.get('RESULT'))
if result != 1:
error = _ERRORS.get(result, 'You have failed to log in.')
raise ExtractorError(
'Unable to login: %s said: %s' % (self.IE_NAME, error),
expected=True)
class AfreecaTVIE(AfreecaTVBaseIE):
IE_NAME = 'afreecatv'
IE_DESC = 'afreecatv.com'
_VALID_URL = r'''(?x)
@@ -34,7 +74,6 @@ class AfreecaTVIE(InfoExtractor):
)
(?P<id>\d+)
'''
_NETRC_MACHINE = 'afreecatv'
_TESTS = [{
'url': 'http://live.afreecatv.com:8079/app/index.cgi?szType=read_ucc_bbs&szBjId=dailyapril&nStationNo=16711924&nBbsNo=18605867&nTitleNo=36164052&szSkin=',
'md5': 'f72c89fe7ecc14c1b5ce506c4996046e',
@@ -87,6 +126,7 @@ class AfreecaTVIE(InfoExtractor):
'uploader': '♥이슬이',
'uploader_id': 'dasl8121',
'upload_date': '20170411',
'timestamp': 1491929865,
'duration': 213,
},
'params': {
@@ -120,219 +160,102 @@ class AfreecaTVIE(InfoExtractor):
'uploader_id': 'rlantnghks',
'uploader': '페이즈으',
'duration': 10840,
'thumbnail': 'http://videoimg.afreecatv.com/php/SnapshotLoad.php?rowKey=20230108_9FF5BEE1_244432674_1_r',
'thumbnail': r're:https?://videoimg\.afreecatv\.com/.+',
'upload_date': '20230108',
'timestamp': 1673218805,
'title': '젠지 페이즈',
},
'params': {
'skip_download': True,
},
}, {
# adult content
'url': 'https://vod.afreecatv.com/player/70395877',
'only_matching': True,
}, {
# subscribers only
'url': 'https://vod.afreecatv.com/player/104647403',
'only_matching': True,
}, {
# private
'url': 'https://vod.afreecatv.com/player/81669846',
'only_matching': True,
}]
@staticmethod
def parse_video_key(key):
video_key = {}
m = re.match(r'^(?P<upload_date>\d{8})_\w+_(?P<part>\d+)$', key)
if m:
video_key['upload_date'] = m.group('upload_date')
video_key['part'] = int(m.group('part'))
return video_key
def _perform_login(self, username, password):
login_form = {
'szWork': 'login',
'szType': 'json',
'szUid': username,
'szPassword': password,
'isSaveId': 'false',
'szScriptVar': 'oLoginRet',
'szAction': '',
}
response = self._download_json(
'https://login.afreecatv.com/app/LoginAction.php', None,
'Logging in', data=urlencode_postdata(login_form))
_ERRORS = {
-4: 'Your account has been suspended due to a violation of our terms and policies.',
-5: 'https://member.afreecatv.com/app/user_delete_progress.php',
-6: 'https://login.afreecatv.com/membership/changeMember.php',
-8: "Hello! AfreecaTV here.\nThe username you have entered belongs to \n an account that requires a legal guardian's consent. \nIf you wish to use our services without restriction, \nplease make sure to go through the necessary verification process.",
-9: 'https://member.afreecatv.com/app/pop_login_block.php',
-11: 'https://login.afreecatv.com/afreeca/second_login.php',
-12: 'https://member.afreecatv.com/app/user_security.php',
0: 'The username does not exist or you have entered the wrong password.',
-1: 'The username does not exist or you have entered the wrong password.',
-3: 'You have entered your username/password incorrectly.',
-7: 'You cannot use your Global AfreecaTV account to access Korean AfreecaTV.',
-10: 'Sorry for the inconvenience. \nYour account has been blocked due to an unauthorized access. \nPlease contact our Help Center for assistance.',
-32008: 'You have failed to log in. Please contact our Help Center.',
}
result = int_or_none(response.get('RESULT'))
if result != 1:
error = _ERRORS.get(result, 'You have failed to log in.')
raise ExtractorError(
'Unable to login: %s said: %s' % (self.IE_NAME, error),
expected=True)
def _real_extract(self, url):
video_id = self._match_id(url)
partial_view = False
adult_view = False
for _ in range(2):
data = self._download_json(
'https://api.m.afreecatv.com/station/video/a/view',
video_id, headers={'Referer': url}, data=urlencode_postdata({
'nTitleNo': video_id,
'nApiLevel': 10,
}))['data']
if traverse_obj(data, ('code', {int})) == -6221:
raise ExtractorError('The VOD does not exist', expected=True)
query = {
data = self._download_json(
'https://api.m.afreecatv.com/station/video/a/view', video_id,
headers={'Referer': url}, data=urlencode_postdata({
'nTitleNo': video_id,
'nStationNo': data['station_no'],
'nBbsNo': data['bbs_no'],
}
if partial_view:
query['partialView'] = 'SKIP_ADULT'
if adult_view:
query['adultView'] = 'ADULT_VIEW'
video_xml = self._download_xml(
'http://afbbs.afreecatv.com:8080/api/video/get_video_info.php',
video_id, 'Downloading video info XML%s'
% (' (skipping adult)' if partial_view else ''),
video_id, headers={
'Referer': url,
}, query=query)
'nApiLevel': 10,
}))['data']
flag = xpath_text(video_xml, './track/flag', 'flag', default=None)
if flag and flag == 'SUCCEED':
break
if flag == 'PARTIAL_ADULT':
self.report_warning(
'In accordance with local laws and regulations, underage users are restricted from watching adult content. '
'Only content suitable for all ages will be downloaded. '
'Provide account credentials if you wish to download restricted content.')
partial_view = True
continue
elif flag == 'ADULT':
if not adult_view:
adult_view = True
continue
error = 'Only users older than 19 are able to watch this video. Provide account credentials to download this content.'
else:
error = flag
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error), expected=True)
else:
raise ExtractorError('Unable to download video info')
error_code = traverse_obj(data, ('code', {int}))
if error_code == -6221:
raise ExtractorError('The VOD does not exist', expected=True)
elif error_code == -6205:
raise ExtractorError('This VOD is private', expected=True)
video_element = video_xml.findall('./track/video')[-1]
if video_element is None or video_element.text is None:
raise ExtractorError(
'Video %s does not exist' % video_id, expected=True)
video_url = video_element.text.strip()
title = xpath_text(video_xml, './track/title', 'title', fatal=True)
uploader = xpath_text(video_xml, './track/nickname', 'uploader')
uploader_id = xpath_text(video_xml, './track/bj_id', 'uploader id')
duration = int_or_none(xpath_text(
video_xml, './track/duration', 'duration'))
thumbnail = xpath_text(video_xml, './track/titleImage', 'thumbnail')
common_entry = {
'uploader': uploader,
'uploader_id': uploader_id,
'thumbnail': thumbnail,
}
info = common_entry.copy()
info.update({
'id': video_id,
'title': title,
'duration': duration,
common_info = traverse_obj(data, {
'title': ('title', {str}),
'uploader': ('writer_nick', {str}),
'uploader_id': ('bj_id', {str}),
'duration': ('total_file_duration', {functools.partial(int_or_none, scale=1000)}),
'thumbnail': ('thumb', {url_or_none}),
})
if not video_url:
entries = []
file_elements = video_element.findall('./file')
one = len(file_elements) == 1
for file_num, file_element in enumerate(file_elements, start=1):
file_url = url_or_none(file_element.text)
if not file_url:
continue
key = file_element.get('key', '')
upload_date = unified_strdate(self._search_regex(
r'^(\d{8})_', key, 'upload date', default=None))
if upload_date is not None:
# sometimes the upload date isn't included in the file name
# instead, another random ID is, which may parse as a valid
# date but be wildly out of a reasonable range
parsed_date = date_from_str(upload_date)
if parsed_date.year < 2000 or parsed_date.year >= 2100:
upload_date = None
file_duration = int_or_none(file_element.get('duration'))
format_id = key if key else '%s_%s' % (video_id, file_num)
if determine_ext(file_url) == 'm3u8':
formats = self._extract_m3u8_formats(
file_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls',
note='Downloading part %d m3u8 information' % file_num)
else:
formats = [{
'url': file_url,
'format_id': 'http',
}]
if not formats and not self.get_param('ignore_no_formats'):
continue
file_info = common_entry.copy()
file_info.update({
'id': format_id,
'title': title if one else '%s (part %d)' % (title, file_num),
'upload_date': upload_date,
'duration': file_duration,
'formats': formats,
entries = []
for file_num, file_element in enumerate(
traverse_obj(data, ('files', lambda _, v: url_or_none(v['file']))), start=1):
file_url = file_element['file']
if determine_ext(file_url) == 'm3u8':
formats = self._extract_m3u8_formats(
file_url, video_id, 'mp4', m3u8_id='hls',
note=f'Downloading part {file_num} m3u8 information')
else:
formats = [{
'url': file_url,
'format_id': 'http',
}]
entries.append({
**common_info,
'id': file_element.get('file_info_key') or f'{video_id}_{file_num}',
'title': f'{common_info.get("title") or "Untitled"} (part {file_num})',
'formats': formats,
**traverse_obj(file_element, {
'duration': ('duration', {functools.partial(int_or_none, scale=1000)}),
'timestamp': ('file_start', {unified_timestamp}),
})
entries.append(file_info)
entries_info = info.copy()
entries_info.update({
'_type': 'multi_video',
'entries': entries,
})
return entries_info
info = {
'id': video_id,
'title': title,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'thumbnail': thumbnail,
}
if determine_ext(video_url) == 'm3u8':
info['formats'] = self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
else:
app, playpath = video_url.split('mp4:')
info.update({
'url': app,
'ext': 'flv',
'play_path': 'mp4:' + playpath,
'rtmp_live': True, # downloading won't end without this
})
return info
if traverse_obj(data, ('adult_status', {str})) == 'notLogin':
if not entries:
self.raise_login_required(
'Only users older than 19 are able to watch this video', method='password')
self.report_warning(
'In accordance with local laws and regulations, underage users are '
'restricted from watching adult content. Only content suitable for all '
f'ages will be downloaded. {self._login_hint("password")}')
if not entries and traverse_obj(data, ('sub_upload_type', {str})):
self.raise_login_required('This VOD is for subscribers only', method='password')
if len(entries) == 1:
return {
**entries[0],
'title': common_info.get('title'),
}
common_info['timestamp'] = traverse_obj(entries, (..., 'timestamp'), get_all=False)
return self.playlist_result(entries, video_id, multi_video=True, **common_info)
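The multi-part assembly above reduces to a small pure function: build one entry per playable file, spread the shared metadata into each, and collapse a single-part VOD back into a plain video result. A minimal sketch over made-up inputs; build_result and the sample data are hypothetical, not part of the extractor:

def build_result(video_id, common_info, files):
    # one entry per playable file, sharing the VOD-level metadata
    entries = [{
        **common_info,
        'id': f.get('file_info_key') or f'{video_id}_{num}',
        'title': f'{common_info.get("title") or "Untitled"} (part {num})',
        'url': f['file'],
    } for num, f in enumerate(files, start=1)]
    if len(entries) == 1:
        # a single part keeps the plain title, as in the extractor above
        return {**entries[0], 'title': common_info.get('title')}
    return {'_type': 'multi_video', 'id': video_id, 'entries': entries, **common_info}

print(build_result('1', {'title': 'VOD'}, [{'file': 'https://example.invalid/a.m3u8'}])['title'])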
class AfreecaTVLiveIE(AfreecaTVIE): # XXX: Do not subclass from concrete IE
class AfreecaTVLiveIE(AfreecaTVBaseIE):
IE_NAME = 'afreecatv:live'
IE_DESC = 'afreecatv.com livestreams'
_VALID_URL = r'https?://play\.afreeca(?:tv)?\.com/(?P<id>[^/]+)(?:/(?P<bno>\d+))?'
_TESTS = [{
'url': 'https://play.afreecatv.com/pyh3646/237852185',
@@ -347,77 +270,97 @@ class AfreecaTVLiveIE(AfreecaTVIE): # XXX: Do not subclass from concrete IE
},
'skip': 'Livestream has ended',
}, {
'url': 'http://play.afreeca.com/pyh3646/237852185',
'url': 'https://play.afreecatv.com/pyh3646/237852185',
'only_matching': True,
}, {
'url': 'http://play.afreeca.com/pyh3646',
'url': 'https://play.afreecatv.com/pyh3646',
'only_matching': True,
}]
_LIVE_API_URL = 'https://live.afreecatv.com/afreeca/player_live_api.php'
_WORKING_CDNS = [
'gcp_cdn', # live-global-cdn-v02.afreecatv.com
'gs_cdn_pc_app', # pc-app.stream.afreecatv.com
'gs_cdn_mobile_web', # mobile-web.stream.afreecatv.com
'gs_cdn_pc_web', # pc-web.stream.afreecatv.com
]
_BAD_CDNS = [
'gs_cdn', # chromecast.afreeca.gscdn.com (cannot resolve)
'gs_cdn_chromecast', # chromecast.stream.afreecatv.com (HTTP Error 400)
'azure_cdn', # live-global-cdn-v01.afreecatv.com (cannot resolve)
'aws_cf', # live-global-cdn-v03.afreecatv.com (cannot resolve)
'kt_cdn', # kt.stream.afreecatv.com (HTTP Error 400)
]
_QUALITIES = ('sd', 'hd', 'hd2k', 'original')
def _extract_formats(self, channel_info, broadcast_no, aid):
stream_base_url = channel_info.get('RMD') or 'https://livestream-manager.afreecatv.com'
# If user has not passed CDN IDs, try API-provided CDN ID followed by other working CDN IDs
default_cdn_ids = orderedSet([
*traverse_obj(channel_info, ('CDN', {str}, all, lambda _, v: v not in self._BAD_CDNS)),
*self._WORKING_CDNS,
])
cdn_ids = self._configuration_arg('cdn', default_cdn_ids)
for attempt, cdn_id in enumerate(cdn_ids, start=1):
m3u8_url = traverse_obj(self._download_json(
urljoin(stream_base_url, 'broad_stream_assign.html'), broadcast_no,
f'Downloading {cdn_id} stream info', f'Unable to download {cdn_id} stream info',
fatal=False, query={
'return_type': cdn_id,
'broad_key': f'{broadcast_no}-common-master-hls',
}), ('view_url', {url_or_none}))
try:
return self._extract_m3u8_formats(
m3u8_url, broadcast_no, 'mp4', m3u8_id='hls', query={'aid': aid},
headers={'Referer': 'https://play.afreecatv.com/'})
except ExtractorError as e:
if attempt == len(cdn_ids):
raise
self.report_warning(
f'{e.cause or e.msg}. Retrying... (attempt {attempt} of {len(cdn_ids)})')
def _real_extract(self, url):
broadcaster_id, broadcast_no = self._match_valid_url(url).group('id', 'bno')
password = self.get_param('videopassword')
channel_info = traverse_obj(self._download_json(
self._LIVE_API_URL, broadcaster_id, data=urlencode_postdata({'bid': broadcaster_id})),
('CHANNEL', {dict})) or {}
info = self._download_json(self._LIVE_API_URL, broadcaster_id, fatal=False,
data=urlencode_postdata({'bid': broadcaster_id})) or {}
channel_info = info.get('CHANNEL') or {}
broadcaster_id = channel_info.get('BJID') or broadcaster_id
broadcast_no = channel_info.get('BNO') or broadcast_no
password_protected = channel_info.get('BPWD')
if not broadcast_no:
raise ExtractorError(f'Unable to extract broadcast number ({broadcaster_id} may not be live)', expected=True)
if password_protected == 'Y' and password is None:
raise UserNotLive(video_id=broadcaster_id)
password = self.get_param('videopassword')
if channel_info.get('BPWD') == 'Y' and password is None:
raise ExtractorError(
'This livestream is protected by a password, use the --video-password option',
expected=True)
formats = []
quality_key = qualities(self._QUALITIES)
for quality_str in self._QUALITIES:
params = {
token_info = traverse_obj(self._download_json(
self._LIVE_API_URL, broadcast_no, 'Downloading access token for stream',
'Unable to download access token for stream', data=urlencode_postdata(filter_dict({
'bno': broadcast_no,
'stream_type': 'common',
'type': 'aid',
'quality': quality_str,
}
if password is not None:
params['pwd'] = password
aid_response = self._download_json(
self._LIVE_API_URL, broadcast_no, fatal=False,
data=urlencode_postdata(params),
note=f'Downloading access token for {quality_str} stream',
errnote=f'Unable to download access token for {quality_str} stream')
aid = traverse_obj(aid_response, ('CHANNEL', 'AID'))
if not aid:
continue
'quality': 'master',
'pwd': password,
}))), ('CHANNEL', {dict})) or {}
aid = token_info.get('AID')
if not aid:
result = token_info.get('RESULT')
if result == 0:
raise ExtractorError('This livestream has ended', expected=True)
elif result == -6:
self.raise_login_required('This livestream is for subscribers only', method='password')
raise ExtractorError('Unable to extract access token')
stream_base_url = channel_info.get('RMD') or 'https://livestream-manager.afreecatv.com'
stream_info = self._download_json(
f'{stream_base_url}/broad_stream_assign.html', broadcast_no, fatal=False,
query={
'return_type': channel_info.get('CDN', 'gcp_cdn'),
'broad_key': f'{broadcast_no}-common-{quality_str}-hls',
},
note=f'Downloading metadata for {quality_str} stream',
errnote=f'Unable to download metadata for {quality_str} stream') or {}
formats = self._extract_formats(channel_info, broadcast_no, aid)
if stream_info.get('view_url'):
formats.append({
'format_id': quality_str,
'url': update_url_query(stream_info['view_url'], {'aid': aid}),
'ext': 'mp4',
'protocol': 'm3u8',
'quality': quality_key(quality_str),
})
station_info = self._download_json(
station_info = traverse_obj(self._download_json(
'https://st.afreecatv.com/api/get_station_status.php', broadcast_no,
query={'szBjId': broadcaster_id}, fatal=False,
note='Downloading channel metadata', errnote='Unable to download channel metadata') or {}
'Downloading channel metadata', 'Unable to download channel metadata',
query={'szBjId': broadcaster_id}, fatal=False), {dict}) or {}
return {
'id': broadcast_no,
@@ -427,6 +370,7 @@ class AfreecaTVLiveIE(AfreecaTVIE): # XXX: Do not subclass from concrete IE
'timestamp': unified_timestamp(station_info.get('broad_start')),
'formats': formats,
'is_live': True,
'http_headers': {'Referer': url},
}
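The CDN selection in _extract_formats above is an ordered fallback: the API-provided CDN is tried first, then the known-working ones, and an error is raised only once the last candidate fails. A standalone sketch of the same pattern; fetch_view_url is a hypothetical stand-in for the broad_stream_assign.html request:

def pick_stream(cdn_ids, fetch_view_url):
    seen = set()
    candidates = [c for c in cdn_ids if not (c in seen or seen.add(c))]  # ordered de-dup, like orderedSet()
    for attempt, cdn_id in enumerate(candidates, start=1):
        try:
            url = fetch_view_url(cdn_id)  # may raise, or return None on failure
            if url:
                return url
            raise ValueError(f'{cdn_id} returned no view_url')
        except ValueError as e:
            if attempt == len(candidates):
                raise  # every CDN failed; surface the last error
            print(f'{e}. Retrying... (attempt {attempt} of {len(candidates)})')

print(pick_stream(
    ['gcp_cdn', 'gcp_cdn', 'gs_cdn_pc_app'],
    lambda cdn: 'https://example.invalid/master.m3u8' if cdn == 'gs_cdn_pc_app' else None))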

View file

@@ -39,7 +39,7 @@ class AluraIE(InfoExtractor):
def _real_extract(self, url):
course, video_id = self._match_valid_url(url)
course, video_id = self._match_valid_url(url).group('course_name', 'id')
video_url = self._VIDEO_URL % (course, video_id)
video_dict = self._download_json(video_url, video_id, 'Searching for videos')
@@ -52,7 +52,7 @@ class AluraIE(InfoExtractor):
formats = []
for video_obj in video_dict:
video_url_m3u8 = video_obj.get('link')
video_url_m3u8 = video_obj.get('mp4')
video_format = self._extract_m3u8_formats(
video_url_m3u8, None, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)

View file

@@ -1,5 +1,5 @@
import functools
import re
from functools import partial
from .common import InfoExtractor
from ..utils import (
@@ -349,7 +349,7 @@ class ARDBetaMediathekIE(InfoExtractor):
r'(?P<title>.*)',
]
return traverse_obj(patterns, (..., {partial(re.match, string=title)}, {
return traverse_obj(patterns, (..., {functools.partial(re.match, string=title)}, {
'season_number': ('season_number', {int_or_none}),
'episode_number': ('episode_number', {int_or_none}),
'episode': ((
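The ARD hunk above only normalizes the import style, but the traversal trick it touches deserves a gloss: functools.partial(re.match, string=title) pins the string argument, so the traversal feeds each regex in the list to re.match as its pattern. A self-contained sketch; the title and patterns here are invented:

import functools
import re

title = 'Folge 5: Der Wald (S02/E05)'
patterns = [
    r'.*\(S(?P<season_number>\d+)/E(?P<episode_number>\d+)\)',
    r'Folge (?P<episode_number>\d+)',
]

# partial() leaves `pattern` as the one free argument, which is exactly how
# the traverse_obj path above maps re.match over the pattern list
matcher = functools.partial(re.match, string=title)
match = next(filter(None, map(matcher, patterns)), None)
print(match.groupdict() if match else None)  # {'season_number': '02', 'episode_number': '05'}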

View file

@@ -0,0 +1,154 @@
import functools
from .common import InfoExtractor
from ..utils import str_or_none, url_or_none
from ..utils.traversal import traverse_obj
class AsobiStageIE(InfoExtractor):
IE_DESC = 'ASOBISTAGE (アソビステージ)'
_VALID_URL = r'https?://asobistage\.asobistore\.jp/event/(?P<id>(?P<event>\w+)/(?P<type>archive|player)/(?P<slug>\w+))(?:[?#]|$)'
_TESTS = [{
'url': 'https://asobistage.asobistore.jp/event/315passionhour_2022summer/archive/frame',
'info_dict': {
'id': '315passionhour_2022summer/archive/frame',
'title': '315プロダクションプレゼンツ 315パッションアワー!!!',
'thumbnail': r're:^https?://[\w.-]+/\w+/\w+',
},
'playlist_count': 1,
'playlist': [{
'info_dict': {
'id': 'edff52f2',
'ext': 'mp4',
'title': '315passion_FRAME_only',
'thumbnail': r're:^https?://[\w.-]+/\w+/\w+',
},
}],
}, {
'url': 'https://asobistage.asobistore.jp/event/idolmaster_idolworld2023_goods/archive/live',
'info_dict': {
'id': 'idolmaster_idolworld2023_goods/archive/live',
'title': 'md5:378510b6e830129d505885908bd6c576',
'thumbnail': r're:^https?://[\w.-]+/\w+/\w+',
},
'playlist_count': 1,
'playlist': [{
'info_dict': {
'id': '3aef7110',
'ext': 'mp4',
'title': 'asobistore_station_1020_serverREC',
'thumbnail': r're:^https?://[\w.-]+/\w+/\w+',
},
}],
}, {
'url': 'https://asobistage.asobistore.jp/event/sidem_fclive_bpct/archive/premium_hc',
'playlist_count': 4,
'info_dict': {
'id': 'sidem_fclive_bpct/archive/premium_hc',
'title': '315 Production presents FNTASTIC COMBINATION LIVE BRAINPOWER!!/CONNECTIME!!!!',
'thumbnail': r're:^https?://[\w.-]+/\w+/\w+',
},
}, {
'url': 'https://asobistage.asobistore.jp/event/ijigenfes_utagassen/player/day1',
'only_matching': True,
}]
_API_HOST = 'https://asobistage-api.asobistore.jp'
_HEADERS = {}
_is_logged_in = False
@functools.cached_property
def _owned_tickets(self):
owned_tickets = set()
if not self._is_logged_in:
return owned_tickets
for path, name in [
('api/v1/purchase_history/list', 'ticket purchase history'),
('api/v1/serialcode/list', 'redemption history'),
]:
response = self._download_json(
f'{self._API_HOST}/{path}', None, f'Downloading {name}',
f'Unable to download {name}', expected_status=400)
if traverse_obj(response, ('payload', 'error_message'), 'error') == 'notlogin':
self._is_logged_in = False
break
owned_tickets.update(
traverse_obj(response, ('payload', 'value', ..., 'digital_product_id', {str_or_none})))
return owned_tickets
def _get_available_channel_id(self, channel):
channel_id = traverse_obj(channel, ('chennel_vspf_id', {str}))
if not channel_id:
return None
# if rights_type_id == 6, then 'No conditions (no login required - non-members are OK)'
if traverse_obj(channel, ('viewrights', lambda _, v: v['rights_type_id'] == 6)):
return channel_id
available_tickets = traverse_obj(channel, (
'viewrights', ..., ('tickets', 'serialcodes'), ..., 'digital_product_id', {str_or_none}))
if not self._owned_tickets.intersection(available_tickets):
self.report_warning(
f'You are not a ticketholder for "{channel.get("channel_name") or channel_id}"')
return None
return channel_id
def _real_initialize(self):
if self._get_cookies(self._API_HOST):
self._is_logged_in = True
token = self._download_json(
f'{self._API_HOST}/api/v1/vspf/token', None, 'Getting token', 'Unable to get token')
self._HEADERS['Authorization'] = f'Bearer {token}'
def _real_extract(self, url):
video_id, event, type_, slug = self._match_valid_url(url).group('id', 'event', 'type', 'slug')
video_type = {'archive': 'archives', 'player': 'broadcasts'}[type_]
webpage = self._download_webpage(url, video_id)
event_data = traverse_obj(
self._search_nextjs_data(webpage, video_id, default={}),
('props', 'pageProps', 'eventCMSData', {
'title': ('event_name', {str}),
'thumbnail': ('event_thumbnail_image', {url_or_none}),
}))
available_channels = traverse_obj(self._download_json(
f'https://asobistage.asobistore.jp/cdn/v101/events/{event}/{video_type}.json',
video_id, 'Getting channel list', 'Unable to get channel list'), (
video_type, lambda _, v: v['broadcast_slug'] == slug,
'channels', lambda _, v: v['chennel_vspf_id'] != '00000'))
entries = []
for channel_id in traverse_obj(available_channels, (..., {self._get_available_channel_id})):
if video_type == 'archives':
channel_json = self._download_json(
f'https://survapi.channel.or.jp/proxy/v1/contents/{channel_id}/get_by_cuid', channel_id,
'Getting archive channel info', 'Unable to get archive channel info', fatal=False,
headers=self._HEADERS)
channel_data = traverse_obj(channel_json, ('ex_content', {
'm3u8_url': 'streaming_url',
'title': 'title',
'thumbnail': ('thumbnail', 'url'),
}))
else: # video_type == 'broadcasts'
channel_json = self._download_json(
f'https://survapi.channel.or.jp/ex/events/{channel_id}', channel_id,
'Getting live channel info', 'Unable to get live channel info', fatal=False,
headers=self._HEADERS, query={'embed': 'channel'})
channel_data = traverse_obj(channel_json, ('data', {
'm3u8_url': ('Channel', 'Custom_live_url'),
'title': 'Name',
'thumbnail': 'Poster_url',
}))
entries.append({
'id': channel_id,
'title': channel_data.get('title'),
'formats': self._extract_m3u8_formats(channel_data.get('m3u8_url'), channel_id, fatal=False),
'is_live': video_type == 'broadcasts',
'thumbnail': url_or_none(channel_data.get('thumbnail')),
})
if not self._is_logged_in and not entries:
self.raise_login_required()
return self.playlist_result(entries, video_id, **event_data)
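The ticket gating above hinges on functools.cached_property: the purchase and serial-code histories are fetched once per run, and each channel is then admitted by set intersection. A toy sketch of that shape; the class and data are made up:

import functools

class TicketGate:
    def __init__(self, purchases):
        self._purchases = purchases

    @functools.cached_property
    def owned(self):
        # computed once, then cached on the instance
        return {p['digital_product_id'] for p in self._purchases}

    def allowed(self, channel):
        return bool(self.owned & set(channel.get('ticket_ids', [])))

gate = TicketGate([{'digital_product_id': 'A1'}])
print(gate.allowed({'ticket_ids': ['A1', 'B2']}))  # True
print(gate.allowed({'ticket_ids': ['C3']}))        # False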

View file

@@ -1,4 +1,4 @@
import datetime
import datetime as dt
from .common import InfoExtractor
from ..utils import (
@@ -71,9 +71,9 @@ class ATVAtIE(InfoExtractor):
content_ids = [{'id': id, 'subclip_start': content['start'], 'subclip_end': content['end']}
for id, content in enumerate(contentResource)]
time_of_request = datetime.datetime.now()
not_before = time_of_request - datetime.timedelta(minutes=5)
expire = time_of_request + datetime.timedelta(minutes=5)
time_of_request = dt.datetime.now()
not_before = time_of_request - dt.timedelta(minutes=5)
expire = time_of_request + dt.timedelta(minutes=5)
payload = {
'content_ids': {
content_id: content_ids,
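For context on the window computed above: the payload carries a not-before/expiry pair spanning ten minutes around the request time. The same arithmetic with the aliased import; the claim names nbf/exp are illustrative, not taken from the extractor:

import datetime as dt

time_of_request = dt.datetime.now(dt.timezone.utc)
window = {
    'nbf': int((time_of_request - dt.timedelta(minutes=5)).timestamp()),
    'exp': int((time_of_request + dt.timedelta(minutes=5)).timestamp()),
}
print(window['exp'] - window['nbf'])  # 600 seconds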

View file

@@ -1,4 +1,4 @@
import datetime
import datetime as dt
import hashlib
import hmac
@@ -12,7 +12,7 @@ class AWSIE(InfoExtractor): # XXX: Conventionally, base classes should end with
def _aws_execute_api(self, aws_dict, video_id, query=None):
query = query or {}
amz_date = datetime.datetime.now(datetime.timezone.utc).strftime('%Y%m%dT%H%M%SZ')
amz_date = dt.datetime.now(dt.timezone.utc).strftime('%Y%m%dT%H%M%SZ')
date = amz_date[:8]
headers = {
'Accept': 'application/json',

View file

@@ -1,4 +1,4 @@
from functools import partial
import functools
from .common import InfoExtractor
from ..utils import (
@@ -50,7 +50,7 @@ class BibelTVBaseIE(InfoExtractor):
**traverse_obj(data, {
'title': 'title',
'description': 'description',
'duration': ('duration', {partial(int_or_none, scale=1000)}),
'duration': ('duration', {functools.partial(int_or_none, scale=1000)}),
'timestamp': ('schedulingStart', {parse_iso8601}),
'season_number': 'seasonNumber',
'episode_number': 'episodeNumber',

View file

@@ -93,11 +93,11 @@ class BilibiliBaseIE(InfoExtractor):
return formats
def _download_playinfo(self, video_id, cid):
def _download_playinfo(self, video_id, cid, headers=None):
return self._download_json(
'https://api.bilibili.com/x/player/playurl', video_id,
query={'bvid': video_id, 'cid': cid, 'fnval': 4048},
note=f'Downloading video formats for cid {cid}')['data']
note=f'Downloading video formats for cid {cid}', headers=headers)['data']
def json2srt(self, json_data):
srt_data = ''
@@ -493,7 +493,8 @@ class BiliBiliIE(BilibiliBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(url, video_id)
headers = self.geo_verification_headers()
webpage, urlh = self._download_webpage_handle(url, video_id, headers=headers)
if not self._match_valid_url(urlh.url):
return self.url_result(urlh.url)
@@ -531,7 +532,7 @@ class BiliBiliIE(BilibiliBaseIE):
self._download_json(
'https://api.bilibili.com/x/player/pagelist', video_id,
fatal=False, query={'bvid': video_id, 'jsonp': 'jsonp'},
note='Extracting videos in anthology'),
note='Extracting videos in anthology', headers=headers),
'data', expected_type=list) or []
is_anthology = len(page_list_json) > 1
@@ -552,7 +553,7 @@ class BiliBiliIE(BilibiliBaseIE):
festival_info = {}
if is_festival:
play_info = self._download_playinfo(video_id, cid)
play_info = self._download_playinfo(video_id, cid, headers=headers)
festival_info = traverse_obj(initial_state, {
'uploader': ('videoInfo', 'upName'),
@@ -666,14 +667,15 @@ class BiliBiliBangumiIE(BilibiliBaseIE):
def _real_extract(self, url):
episode_id = self._match_id(url)
webpage = self._download_webpage(url, episode_id)
headers = self.geo_verification_headers()
webpage = self._download_webpage(url, episode_id, headers=headers)
if '您所在的地区无法观看本片' in webpage:
raise GeoRestrictedError('This video is restricted')
elif '正在观看预览,大会员免费看全片' in webpage:
self.raise_login_required('This video is for premium members only')
headers = {'Referer': url, **self.geo_verification_headers()}
headers['Referer'] = url
play_info = self._download_json(
'https://api.bilibili.com/pgc/player/web/v2/playurl', episode_id,
'Extracting episode', query={'fnval': '4048', 'ep_id': episode_id},
@@ -724,7 +726,7 @@ class BiliBiliBangumiIE(BilibiliBaseIE):
'duration': float_or_none(play_info.get('timelength'), scale=1000),
'subtitles': self.extract_subtitles(episode_id, episode_info.get('cid'), aid=aid),
'__post_extractor': self.extract_comments(aid),
'http_headers': headers,
'http_headers': {'Referer': url},
}
@@ -1043,15 +1045,17 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
try:
response = self._download_json('https://api.bilibili.com/x/space/wbi/arc/search',
playlist_id, note=f'Downloading page {page_idx}', query=query)
playlist_id, note=f'Downloading page {page_idx}', query=query,
headers={'referer': url})
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 412:
raise ExtractorError(
'Request is blocked by server (412), please add cookies, wait and try later.', expected=True)
raise
if response['code'] == -401:
if response['code'] in (-352, -401):
raise ExtractorError(
'Request is blocked by server (401), please add cookies, wait and try later.', expected=True)
f'Request is blocked by server ({-response["code"]}), '
'please add cookies, wait and try later.', expected=True)
return response['data']
def get_metadata(page_data):
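The widened check above folds the -352 and -401 response codes into one branch and quotes the code positively in the message. The same logic as a pure function; the payloads are invented:

def check(response):
    if response['code'] in (-352, -401):
        raise RuntimeError(
            f'Request is blocked by server ({-response["code"]}), '
            'please add cookies, wait and try later.')
    return response['data']

print(check({'code': 0, 'data': {'page': 1}}))  # {'page': 1}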

View file

@@ -1,7 +1,11 @@
import json
import urllib.parse
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import (
ExtractorError,
bug_reports_message,
int_or_none,
qualities,
str_or_none,
@@ -162,9 +166,19 @@ class BoostyIE(InfoExtractor):
def _real_extract(self, url):
user, post_id = self._match_valid_url(url).group('user', 'post_id')
auth_headers = {}
auth_cookie = self._get_cookies('https://boosty.to/').get('auth')
if auth_cookie is not None:
try:
auth_data = json.loads(urllib.parse.unquote(auth_cookie.value))
auth_headers['Authorization'] = f'Bearer {auth_data["accessToken"]}'
except (json.JSONDecodeError, KeyError):
self.report_warning(f'Failed to extract token from auth cookie{bug_reports_message()}')
post = self._download_json(
f'https://api.boosty.to/v1/blog/{user}/post/{post_id}', post_id,
note='Downloading post data', errnote='Unable to download post data')
note='Downloading post data', errnote='Unable to download post data', headers=auth_headers)
post_title = post.get('title')
if not post_title:
@@ -202,7 +216,9 @@ class BoostyIE(InfoExtractor):
'thumbnail': (('previewUrl', 'defaultPreview'), {url_or_none}),
}, get_all=False)})
if not entries:
if not entries and not post.get('hasAccess'):
self.raise_login_required('This post requires a subscription', metadata_available=True)
elif not entries:
raise ExtractorError('No videos found', expected=True)
if len(entries) == 1:
return entries[0]
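The cookie handling added above works because Boosty stores a URL-encoded JSON blob in its auth cookie, whose accessToken field becomes a Bearer header. A self-contained sketch; the cookie value below is fabricated:

import json
import urllib.parse

cookie_value = urllib.parse.quote(json.dumps({'accessToken': 'abc123'}))
try:
    auth_data = json.loads(urllib.parse.unquote(cookie_value))
    headers = {'Authorization': f'Bearer {auth_data["accessToken"]}'}
except (json.JSONDecodeError, KeyError):
    headers = {}  # fall back to anonymous access, as the extractor's warning path does
print(headers)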

View file

@@ -3,6 +3,7 @@ import urllib.parse
from .common import InfoExtractor
from ..utils import (
ExtractorError,
parse_iso8601,
update_url_query,
url_or_none,
@@ -11,8 +12,8 @@ from ..utils.traversal import traverse_obj
class BoxIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?app\.box\.com/s/(?P<shared_name>[^/?#]+)/file/(?P<id>\d+)'
_TEST = {
_VALID_URL = r'https?://(?:[^.]+\.)?app\.box\.com/s/(?P<shared_name>[^/?#]+)(?:/file/(?P<id>\d+))?'
_TESTS = [{
'url': 'https://mlssoccer.app.box.com/s/0evd2o3e08l60lr4ygukepvnkord1o1x/file/510727257538',
'md5': '1f81b2fd3960f38a40a3b8823e5fcd43',
'info_dict': {
@@ -25,14 +26,36 @@ class BoxIE(InfoExtractor):
'uploader_id': '235196876',
},
'params': {'skip_download': 'dash fragment too small'},
}
}, {
'url': 'https://utexas.app.box.com/s/2x6vanv85fdl8j2eqlcxmv0gp1wvps6e',
'info_dict': {
'id': '787379022466',
'ext': 'mp4',
'title': 'Webinar recording: Take the Leap!.mp4',
'uploader': 'Patricia Mosele',
'timestamp': 1615824864,
'upload_date': '20210315',
'uploader_id': '239068974',
},
'params': {'skip_download': 'dash fragment too small'},
}]
def _real_extract(self, url):
shared_name, file_id = self._match_valid_url(url).groups()
webpage = self._download_webpage(url, file_id)
request_token = self._parse_json(self._search_regex(
r'Box\.config\s*=\s*({.+?});', webpage,
'Box config'), file_id)['requestToken']
webpage = self._download_webpage(url, file_id or shared_name)
if not file_id:
post_stream_data = self._search_json(
r'Box\.postStreamData\s*=', webpage, 'Box post-stream data', shared_name)
shared_item = traverse_obj(
post_stream_data, ('/app-api/enduserapp/shared-item', {dict})) or {}
if shared_item.get('itemType') != 'file':
raise ExtractorError('The requested resource is not a file', expected=True)
file_id = str(shared_item['itemID'])
request_token = self._search_json(
r'Box\.config\s*=', webpage, 'Box config', file_id)['requestToken']
access_token = self._download_json(
'https://app.box.com/app-api/enduserapp/elements/tokens', file_id,
'Downloading token JSON metadata',
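The shared-link resolution above: when the URL carries no /file/<id> part, the page's Box.postStreamData object supplies the item type and numeric ID. A minimal regex-plus-json sketch over an inlined sample page (the real extractor uses _search_json):

import json
import re

webpage = 'Box.postStreamData = {"/app-api/enduserapp/shared-item": {"itemType": "file", "itemID": 787379022466}};'
data = json.loads(re.search(r'Box\.postStreamData\s*=\s*({.+?});', webpage).group(1))
shared_item = data.get('/app-api/enduserapp/shared-item') or {}
if shared_item.get('itemType') != 'file':
    raise ValueError('The requested resource is not a file')
file_id = str(shared_item['itemID'])
print(file_id)  # 787379022466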

View file

@@ -1,5 +1,5 @@
import functools
import re
from functools import partial
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
@@ -115,9 +115,9 @@ class BundestagIE(InfoExtractor):
note='Downloading metadata overlay', fatal=False,
), {
'title': (
{partial(get_element_text_and_html_by_tag, 'h3')}, 0,
{partial(re.sub, r'<span[^>]*>[^<]+</span>', '')}, {clean_html}),
'description': ({partial(get_element_text_and_html_by_tag, 'p')}, 0, {clean_html}),
{functools.partial(get_element_text_and_html_by_tag, 'h3')}, 0,
{functools.partial(re.sub, r'<span[^>]*>[^<]+</span>', '')}, {clean_html}),
'description': ({functools.partial(get_element_text_and_html_by_tag, 'p')}, 0, {clean_html}),
}))
return result

View file

@@ -40,7 +40,7 @@ class CanalAlphaIE(InfoExtractor):
'id': '24484',
'ext': 'mp4',
'title': 'Ces innovations qui veulent rendre l\'agriculture plus durable',
'description': 'md5:3de3f151180684621e85be7c10e4e613',
'description': 'md5:85d594a3b5dc6ccfc4a85aba6e73b129',
'thumbnail': 'https://static.canalalpha.ch/poster/magazine/magazine_10236.jpg',
'upload_date': '20211026',
'duration': 360,
@@ -58,14 +58,25 @@ class CanalAlphaIE(InfoExtractor):
'duration': 360,
},
'params': {'skip_download': True}
}, {
'url': 'https://www.canalalpha.ch/play/le-journal/topic/33500/encore-des-mesures-deconomie-dans-le-jura',
'info_dict': {
'id': '33500',
'ext': 'mp4',
'title': 'Encore des mesures d\'économie dans le Jura',
'description': 'md5:938b5b556592f2d1b9ab150268082a80',
'thumbnail': 'https://static.canalalpha.ch/poster/news/news_46665.jpg',
'upload_date': '20240411',
'duration': 105,
},
}]
def _real_extract(self, url):
id = self._match_id(url)
webpage = self._download_webpage(url, id)
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data_json = self._parse_json(self._search_regex(
r'window\.__SERVER_STATE__\s?=\s?({(?:(?!};)[^"]|"([^"]|\\")*")+})\s?;',
webpage, 'data_json'), id)['1']['data']['data']
webpage, 'data_json'), video_id)['1']['data']['data']
manifests = try_get(data_json, lambda x: x['video']['manifests'], expected_type=dict) or {}
subtitles = {}
formats = [{
@@ -75,15 +86,17 @@ class CanalAlphaIE(InfoExtractor):
'height': try_get(video, lambda x: x['res']['height'], expected_type=int),
} for video in try_get(data_json, lambda x: x['video']['mp4'], expected_type=list) or [] if video.get('$url')]
if manifests.get('hls'):
m3u8_frmts, m3u8_subs = self._parse_m3u8_formats_and_subtitles(manifests['hls'], video_id=id)
formats.extend(m3u8_frmts)
subtitles = self._merge_subtitles(subtitles, m3u8_subs)
fmts, subs = self._extract_m3u8_formats_and_subtitles(
manifests['hls'], video_id, m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
if manifests.get('dash'):
dash_frmts, dash_subs = self._parse_mpd_formats_and_subtitles(manifests['dash'])
formats.extend(dash_frmts)
subtitles = self._merge_subtitles(subtitles, dash_subs)
fmts, subs = self._extract_mpd_formats_and_subtitles(
manifests['dash'], video_id, mpd_id='dash', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
return {
'id': id,
'id': video_id,
'title': data_json.get('title').strip(),
'description': clean_html(dict_get(data_json, ('longDesc', 'shortDesc'))),
'thumbnail': data_json.get('poster'),
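The rewritten manifest handling above follows one accumulation pattern: extend a single formats list from each manifest and fold each manifest's subtitles into one target dict. A dependency-free sketch of that fold (the extractor uses self._merge_subtitles):

def merge_subtitles(new, target):
    for lang, tracks in new.items():
        target.setdefault(lang, []).extend(tracks)

formats, subtitles = [], {}
manifest_results = {
    'hls': ([{'format_id': 'hls-1080'}], {'fr': [{'url': 'https://example.invalid/fr.vtt'}]}),
    'dash': ([{'format_id': 'dash-1080'}], {'fr': [{'url': 'https://example.invalid/fr.srt'}]}),
}
for fmts, subs in manifest_results.values():
    formats.extend(fmts)
    merge_subtitles(subs, target=subtitles)
print(len(formats), len(subtitles['fr']))  # 2 2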

View file

@@ -151,7 +151,7 @@ class CBCIE(InfoExtractor):
class CBCPlayerIE(InfoExtractor):
IE_NAME = 'cbc.ca:player'
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/(?:video/)?|i/caffeine/syndicate/\?mediaId=))(?P<id>(?:\d\.)?\d+)'
_TESTS = [{
'url': 'http://www.cbc.ca/player/play/2683190193',
'md5': '64d25f841ddf4ddb28a235338af32e2c',
@@ -165,9 +165,52 @@ class CBCPlayerIE(InfoExtractor):
'uploader': 'CBCC-NEW',
},
'skip': 'Geo-restricted to Canada and no longer available',
}, {
'url': 'http://www.cbc.ca/i/caffeine/syndicate/?mediaId=2657631896',
'md5': 'e5e708c34ae6fca156aafe17c43e8b75',
'info_dict': {
'id': '2657631896',
'ext': 'mp3',
'title': 'CBC Montreal is organizing its first ever community hackathon!',
'description': 'md5:dd3b692f0a139b0369943150bd1c46a9',
'timestamp': 1425704400,
'upload_date': '20150307',
'uploader': 'CBCC-NEW',
'thumbnail': 'http://thumbnails.cbc.ca/maven_legacy/thumbnails/sonali-karnick-220.jpg',
'chapters': [],
'duration': 494.811,
'categories': ['AudioMobile/All in a Weekend Montreal'],
'tags': 'count:8',
'location': 'Quebec',
'series': 'All in a Weekend Montreal',
'season': 'Season 2015',
'season_number': 2015,
'media_type': 'Excerpt',
},
}, {
'url': 'http://www.cbc.ca/i/caffeine/syndicate/?mediaId=2164402062',
'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6',
'info_dict': {
'id': '2164402062',
'ext': 'mp4',
'title': 'Cancer survivor four times over',
'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
'timestamp': 1320410746,
'upload_date': '20111104',
'uploader': 'CBCC-NEW',
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/277/67/cancer_852x480_2164412612.jpg',
'chapters': [],
'duration': 186.867,
'series': 'CBC News: Windsor at 6:00',
'categories': ['News/Canada/Windsor'],
'location': 'Windsor',
'tags': ['cancer'],
'creators': ['Allison Johnson'],
'media_type': 'Excerpt',
},
}, {
# Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/
'url': 'http://www.cbc.ca/player/play/2657631896',
'url': 'https://www.cbc.ca/player/play/1.2985700',
'md5': 'e5e708c34ae6fca156aafe17c43e8b75',
'info_dict': {
'id': '2657631896',
@@ -189,7 +232,7 @@ class CBCPlayerIE(InfoExtractor):
'media_type': 'Excerpt',
},
}, {
'url': 'http://www.cbc.ca/player/play/2164402062',
'url': 'https://www.cbc.ca/player/play/1.1711287',
'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6',
'info_dict': {
'id': '2164402062',
@@ -206,38 +249,75 @@ class CBCPlayerIE(InfoExtractor):
'categories': ['News/Canada/Windsor'],
'location': 'Windsor',
'tags': ['cancer'],
'creator': 'Allison Johnson',
'creators': ['Allison Johnson'],
'media_type': 'Excerpt',
},
}, {
# Has subtitles
# These broadcasts expire after ~1 month, can find new test URL here:
# https://www.cbc.ca/player/news/TV%20Shows/The%20National/Latest%20Broadcast
'url': 'http://www.cbc.ca/player/play/2284799043667',
'md5': '9b49f0839e88b6ec0b01d840cf3d42b5',
'url': 'https://www.cbc.ca/player/play/1.7159484',
'md5': '6ed6cd0fc2ef568d2297ba68a763d455',
'info_dict': {
'id': '2284799043667',
'id': '2324213316001',
'ext': 'mp4',
'title': 'The National | Hockey coach charged, Green grants, Safer drugs',
'description': 'md5:84ef46321c94bcf7d0159bb565d26bfa',
'timestamp': 1700272800,
'duration': 2718.833,
'title': 'The National | School boards sue social media giants',
'description': 'md5:4b4db69322fa32186c3ce426da07402c',
'timestamp': 1711681200,
'duration': 2743.400,
'subtitles': {'eng': [{'ext': 'vtt', 'protocol': 'm3u8_native'}]},
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/907/171/thumbnail.jpeg',
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/607/559/thumbnail.jpeg',
'uploader': 'CBCC-NEW',
'chapters': 'count:5',
'upload_date': '20231118',
'upload_date': '20240329',
'categories': 'count:4',
'series': 'The National - Full Show',
'tags': 'count:1',
'creator': 'News',
'creators': ['News'],
'location': 'Canada',
'media_type': 'Full Program',
},
}, {
'url': 'https://www.cbc.ca/player/play/video/1.7194274',
'md5': '188b96cf6bdcb2540e178a6caa957128',
'info_dict': {
'id': '2334524995812',
'ext': 'mp4',
'title': '#TheMoment a rare white spirit moose was spotted in Alberta',
'description': 'md5:18ae269a2d0265c5b0bbe4b2e1ac61a3',
'timestamp': 1714788791,
'duration': 77.678,
'subtitles': {'eng': [{'ext': 'vtt', 'protocol': 'm3u8_native'}]},
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/201/543/THE_MOMENT.jpg',
'uploader': 'CBCC-NEW',
'chapters': 'count:0',
'upload_date': '20240504',
'categories': 'count:3',
'series': 'The National',
'tags': 'count:15',
'creators': ['encoder'],
'location': 'Canada',
'media_type': 'Excerpt',
},
}, {
'url': 'cbcplayer:1.7159484',
'only_matching': True,
}, {
'url': 'cbcplayer:2164402062',
'only_matching': True,
}, {
'url': 'http://www.cbc.ca/player/play/2657631896',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
if '.' in video_id:
webpage = self._download_webpage(f'https://www.cbc.ca/player/play/{video_id}', video_id)
video_id = self._search_json(
r'window\.__INITIAL_STATE__\s*=', webpage,
'initial state', video_id)['video']['currentClip']['mediaId']
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
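The dotted-ID support above resolves a 1.x player ID through the page's __INITIAL_STATE__ JSON to reach the numeric mediaId. A sketch with an inlined sample standing in for the downloaded webpage:

import json
import re

video_id = '1.7159484'
webpage = 'window.__INITIAL_STATE__ = {"video": {"currentClip": {"mediaId": "2324213316001"}}};'
if '.' in video_id:
    state = json.loads(re.search(
        r'window\.__INITIAL_STATE__\s*=\s*({.+?});', webpage).group(1))
    video_id = state['video']['currentClip']['mediaId']
print(video_id)  # 2324213316001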

View file

@@ -1,6 +1,6 @@
import base64
import codecs
import datetime
import datetime as dt
import hashlib
import hmac
import json
@@ -134,7 +134,7 @@ class CDAIE(InfoExtractor):
self._API_HEADERS['User-Agent'] = f'pl.cda 1.0 (version {app_version}; Android {android_version}; {phone_model})'
cached_bearer = self.cache.load(self._BEARER_CACHE, username) or {}
if cached_bearer.get('valid_until', 0) > datetime.datetime.now().timestamp() + 5:
if cached_bearer.get('valid_until', 0) > dt.datetime.now().timestamp() + 5:
self._API_HEADERS['Authorization'] = f'Bearer {cached_bearer["token"]}'
return
@@ -154,7 +154,7 @@ class CDAIE(InfoExtractor):
})
self.cache.store(self._BEARER_CACHE, username, {
'token': token_res['access_token'],
'valid_until': token_res['expires_in'] + datetime.datetime.now().timestamp(),
'valid_until': token_res['expires_in'] + dt.datetime.now().timestamp(),
})
self._API_HEADERS['Authorization'] = f'Bearer {token_res["access_token"]}'
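The bearer cache above stores an absolute valid_until timestamp and reuses the token only while more than five seconds of validity remain. The same shape with an in-memory dict standing in for self.cache; fetch_token is hypothetical:

import datetime as dt

cache = {}

def get_token(fetch_token):
    cached = cache.get('bearer') or {}
    if cached.get('valid_until', 0) > dt.datetime.now().timestamp() + 5:
        return cached['token']
    res = fetch_token()
    cache['bearer'] = {
        'token': res['access_token'],
        'valid_until': res['expires_in'] + dt.datetime.now().timestamp(),
    }
    return res['access_token']

print(get_token(lambda: {'access_token': 'tok', 'expires_in': 3600}))  # fetches: tok
print(get_token(lambda: {'access_token': 'new', 'expires_in': 3600}))  # cached: still tok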

View file

@@ -37,6 +37,7 @@ from ..networking.exceptions import (
IncompleteRead,
network_exceptions,
)
from ..networking.impersonate import ImpersonateTarget
from ..utils import (
IDENTITY,
JSON_LD_RE,
@@ -170,12 +171,12 @@ class InfoExtractor:
Automatically calculated from width and height
* dynamic_range The dynamic range of the video. One of:
"SDR" (None), "HDR10", "HDR10+, "HDR12", "HLG, "DV"
* tbr Average bitrate of audio and video in KBit/s
* abr Average audio bitrate in KBit/s
* tbr Average bitrate of audio and video in kbps (1000 bits/sec)
* abr Average audio bitrate in kbps (1000 bits/sec)
* acodec Name of the audio codec in use
* asr Audio sampling rate in Hertz
* audio_channels Number of audio channels
* vbr Average video bitrate in KBit/s
* vbr Average video bitrate in kbps (1000 bits/sec)
* fps Frame rate
* vcodec Name of the video codec in use
* container Name of the container format
@@ -246,7 +247,8 @@ class InfoExtractor:
* downloader_options A dictionary of downloader options
(For internal use only)
* http_chunk_size Chunk size for HTTP downloads
* ffmpeg_args Extra arguments for ffmpeg downloader
* ffmpeg_args Extra arguments for ffmpeg downloader (input)
* ffmpeg_args_out Extra arguments for ffmpeg downloader (output)
* is_dash_periods Whether the format is a result of merging
multiple DASH periods.
RTMP formats can also have the additional fields: page_url,
@@ -817,7 +819,7 @@ class InfoExtractor:
else:
return err.status in variadic(expected_status)
def _create_request(self, url_or_request, data=None, headers=None, query=None):
def _create_request(self, url_or_request, data=None, headers=None, query=None, extensions=None):
if isinstance(url_or_request, urllib.request.Request):
self._downloader.deprecation_warning(
'Passing a urllib.request.Request to _create_request() is deprecated. '
@@ -826,10 +828,11 @@ class InfoExtractor:
elif not isinstance(url_or_request, Request):
url_or_request = Request(url_or_request)
url_or_request.update(data=data, headers=headers, query=query)
url_or_request.update(data=data, headers=headers, query=query, extensions=extensions)
return url_or_request
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers=None, query=None, expected_status=None):
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None,
headers=None, query=None, expected_status=None, impersonate=None, require_impersonation=False):
"""
Return the response handle.
@@ -860,8 +863,31 @@ class InfoExtractor:
headers = (headers or {}).copy()
headers.setdefault('X-Forwarded-For', self._x_forwarded_for_ip)
extensions = {}
if impersonate in (True, ''):
impersonate = ImpersonateTarget()
requested_targets = [
t if isinstance(t, ImpersonateTarget) else ImpersonateTarget.from_str(t)
for t in variadic(impersonate)
] if impersonate else []
available_target = next(filter(self._downloader._impersonate_target_available, requested_targets), None)
if available_target:
extensions['impersonate'] = available_target
elif requested_targets:
message = 'The extractor is attempting impersonation, but '
message += (
'no impersonate target is available' if not str(impersonate)
else f'none of these impersonate targets are available: "{", ".join(map(str, requested_targets))}"')
info_msg = ('see https://github.com/yt-dlp/yt-dlp#impersonation '
'for information on installing the required dependencies')
if require_impersonation:
raise ExtractorError(f'{message}; {info_msg}', expected=True)
self.report_warning(f'{message}; if you encounter errors, then {info_msg}', only_once=True)
try:
return self._downloader.urlopen(self._create_request(url_or_request, data, headers, query))
return self._downloader.urlopen(self._create_request(url_or_request, data, headers, query, extensions))
except network_exceptions as err:
if isinstance(err, HTTPError):
if self.__can_accept_status_code(err, expected_status):
@@ -880,13 +906,14 @@ class InfoExtractor:
return False
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True,
encoding=None, data=None, headers={}, query={}, expected_status=None):
encoding=None, data=None, headers={}, query={}, expected_status=None,
impersonate=None, require_impersonation=False):
"""
Return a tuple (page content as string, URL handle).
Arguments:
url_or_request -- plain text URL as a string or
a urllib.request.Request object
a yt_dlp.networking.Request object
video_id -- Video/playlist/item identifier (string)
Keyword arguments:
@@ -911,13 +938,22 @@
returning True if it should be accepted
Note that this argument does not affect success status codes (2xx)
which are always accepted.
impersonate -- the impersonate target. Can be any of the following entities:
- an instance of yt_dlp.networking.impersonate.ImpersonateTarget
- a string in the format of CLIENT[:OS]
- a list or a tuple of CLIENT[:OS] strings or ImpersonateTarget instances
- a boolean value; True means any impersonate target is sufficient
require_impersonation -- flag to toggle whether the request should raise an error
if impersonation is not possible (bool, default: False)
"""
# Strip hashes from the URL (#1038)
if isinstance(url_or_request, str):
url_or_request = url_or_request.partition('#')[0]
urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query, expected_status=expected_status)
urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data,
headers=headers, query=query, expected_status=expected_status,
impersonate=impersonate, require_impersonation=require_impersonation)
if urlh is False:
assert not fatal
return False
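A usage sketch for the parameters introduced above, inside a hypothetical extractor. Per the docstring, impersonate accepts an ImpersonateTarget, a CLIENT[:OS] string, a list or tuple of those, or True to mean any available target:

from yt_dlp.extractor.common import InfoExtractor

class ExampleIE(InfoExtractor):  # hypothetical, not part of this diff
    _VALID_URL = r'https?://example\.invalid/(?P<id>\d+)'

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(
            url, video_id,
            impersonate=True,             # any available impersonate target will do
            require_impersonation=False)  # warn instead of raising when none exist
        return {'id': video_id, 'title': self._og_search_title(webpage)}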
@@ -1046,17 +1082,20 @@ class InfoExtractor:
return getattr(ie, parser)(content, *args, **kwargs)
def download_handle(self, url_or_request, video_id, note=note, errnote=errnote, transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}, expected_status=None):
fatal=True, encoding=None, data=None, headers={}, query={}, expected_status=None,
impersonate=None, require_impersonation=False):
res = self._download_webpage_handle(
url_or_request, video_id, note=note, errnote=errnote, fatal=fatal, encoding=encoding,
data=data, headers=headers, query=query, expected_status=expected_status)
data=data, headers=headers, query=query, expected_status=expected_status,
impersonate=impersonate, require_impersonation=require_impersonation)
if res is False:
return res
content, urlh = res
return parse(self, content, video_id, transform_source=transform_source, fatal=fatal, errnote=errnote), urlh
def download_content(self, url_or_request, video_id, note=note, errnote=errnote, transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}, expected_status=None):
fatal=True, encoding=None, data=None, headers={}, query={}, expected_status=None,
impersonate=None, require_impersonation=False):
if self.get_param('load_pages'):
url_or_request = self._create_request(url_or_request, data, headers, query)
filename = self._request_dump_filename(url_or_request.url, video_id)
@@ -1079,6 +1118,8 @@
'headers': headers,
'query': query,
'expected_status': expected_status,
'impersonate': impersonate,
'require_impersonation': require_impersonation,
}
if parser is None:
kwargs.pop('transform_source')
@@ -1697,12 +1738,16 @@ class InfoExtractor:
traverse_json_ld(json_ld)
return filter_dict(info)
def _search_nextjs_data(self, webpage, video_id, *, transform_source=None, fatal=True, **kw):
return self._parse_json(
self._search_regex(
r'(?s)<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>([^<]+)</script>',
webpage, 'next.js data', fatal=fatal, **kw),
video_id, transform_source=transform_source, fatal=fatal)
def _search_nextjs_data(self, webpage, video_id, *, fatal=True, default=NO_DEFAULT, **kw):
if default == '{}':
self._downloader.deprecation_warning('using `default=\'{}\'` is deprecated, use `default={}` instead')
default = {}
if default is not NO_DEFAULT:
fatal = False
return self._search_json(
r'<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>', webpage, 'next.js data',
video_id, end_pattern='</script>', fatal=fatal, default=default, **kw)
def _search_nuxt_data(self, webpage, video_id, context_name='__NUXT__', *, fatal=True, traverse=('data', 0)):
"""Parses Nuxt.js metadata. This works as long as the function __NUXT__ invokes is a pure function"""

View file

@@ -40,3 +40,19 @@ class UnicodeBOMIE(InfoExtractor):
'Your URL starts with a Byte Order Mark (BOM). '
'Removing the BOM and looking for "%s" ...' % real_url)
return self.url_result(real_url)
class BlobIE(InfoExtractor):
IE_DESC = False
_VALID_URL = r'blob:'
_TESTS = [{
'url': 'blob:https://www.youtube.com/4eb3d090-a761-46e6-8083-c32016a36e3b',
'only_matching': True,
}]
def _real_extract(self, url):
raise ExtractorError(
'You\'ve asked yt-dlp to download a blob URL. '
'A blob URL exists only locally in your browser. '
'It is not possible for yt-dlp to access it.', expected=True)

View file

@@ -1,4 +1,5 @@
import base64
import uuid
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
@@ -7,12 +8,11 @@ from ..utils import (
float_or_none,
format_field,
int_or_none,
join_nonempty,
jwt_decode_hs256,
parse_age_limit,
parse_count,
parse_iso8601,
qualities,
remove_start,
time_seconds,
traverse_obj,
url_or_none,
@@ -24,10 +24,15 @@ class CrunchyrollBaseIE(InfoExtractor):
_BASE_URL = 'https://www.crunchyroll.com'
_API_BASE = 'https://api.crunchyroll.com'
_NETRC_MACHINE = 'crunchyroll'
_REFRESH_TOKEN = None
_AUTH_HEADERS = None
_AUTH_EXPIRY = None
_API_ENDPOINT = None
_BASIC_AUTH = None
_CLIENT_ID = ('cr_web', 'noaihdevm_6iyg0a8l0q')
_BASIC_AUTH = 'Basic ' + base64.b64encode(':'.join((
't-kdgp2h8c3jub8fn0fq',
'yfLDfMfrYvKXh4JXS1LEI2cCqu1v5Wan',
)).encode()).decode()
_IS_PREMIUM = None
_LOCALE_LOOKUP = {
'ar': 'ar-SA',
'de': 'de-DE',
@@ -42,63 +47,78 @@ class CrunchyrollBaseIE(InfoExtractor):
'hi': 'hi-IN',
}
@property
def is_logged_in(self):
return bool(self._get_cookies(self._BASE_URL).get('etp_rt'))
def _set_auth_info(self, response):
CrunchyrollBaseIE._IS_PREMIUM = 'cr_premium' in traverse_obj(response, ('access_token', {jwt_decode_hs256}, 'benefits', ...))
CrunchyrollBaseIE._AUTH_HEADERS = {'Authorization': response['token_type'] + ' ' + response['access_token']}
CrunchyrollBaseIE._AUTH_EXPIRY = time_seconds(seconds=traverse_obj(response, ('expires_in', {float_or_none}), default=300) - 10)
def _request_token(self, headers, data, note='Requesting token', errnote='Failed to request token'):
try:
return self._download_json(
f'{self._BASE_URL}/auth/v1/token', None, note=note, errnote=errnote,
headers=headers, data=urlencode_postdata(data), impersonate=True)
except ExtractorError as error:
if not isinstance(error.cause, HTTPError) or error.cause.status != 403:
raise
if target := error.cause.response.extensions.get('impersonate'):
raise ExtractorError(f'Got HTTP Error 403 when using impersonate target "{target}"')
raise ExtractorError(
'Request blocked by Cloudflare. '
'Install the required impersonation dependency if possible, '
'or else navigate to Crunchyroll in your browser, '
'then pass the fresh cookies (with --cookies-from-browser or --cookies) '
'and your browser\'s User-Agent (with --user-agent)', expected=True)
def _perform_login(self, username, password):
if self.is_logged_in:
if not CrunchyrollBaseIE._REFRESH_TOKEN:
CrunchyrollBaseIE._REFRESH_TOKEN = self.cache.load(self._NETRC_MACHINE, username)
if CrunchyrollBaseIE._REFRESH_TOKEN:
return
upsell_response = self._download_json(
f'{self._API_BASE}/get_upsell_data.0.json', None, 'Getting session id',
query={
'sess_id': 1,
'device_id': 'whatvalueshouldbeforweb',
'device_type': 'com.crunchyroll.static',
'access_token': 'giKq5eY27ny3cqz',
'referer': f'{self._BASE_URL}/welcome/login'
})
if upsell_response['code'] != 'ok':
raise ExtractorError('Could not get session id')
session_id = upsell_response['data']['session_id']
login_response = self._download_json(
f'{self._API_BASE}/login.1.json', None, 'Logging in',
data=urlencode_postdata({
'account': username,
'password': password,
'session_id': session_id
}))
if login_response['code'] != 'ok':
raise ExtractorError('Login failed. Server message: %s' % login_response['message'], expected=True)
if not self.is_logged_in:
raise ExtractorError('Login succeeded but did not set etp_rt cookie')
def _update_auth(self):
if CrunchyrollBaseIE._AUTH_HEADERS and CrunchyrollBaseIE._AUTH_REFRESH > time_seconds():
return
if not CrunchyrollBaseIE._BASIC_AUTH:
cx_api_param = self._CLIENT_ID[self.is_logged_in]
self.write_debug(f'Using cxApiParam={cx_api_param}')
CrunchyrollBaseIE._BASIC_AUTH = 'Basic ' + base64.b64encode(f'{cx_api_param}:'.encode()).decode()
grant_type = 'etp_rt_cookie' if self.is_logged_in else 'client_id'
try:
auth_response = self._download_json(
f'{self._BASE_URL}/auth/v1/token', None, note=f'Authenticating with grant_type={grant_type}',
headers={'Authorization': CrunchyrollBaseIE._BASIC_AUTH}, data=f'grant_type={grant_type}'.encode())
login_response = self._request_token(
headers={'Authorization': self._BASIC_AUTH}, data={
'username': username,
'password': password,
'grant_type': 'password',
'scope': 'offline_access',
}, note='Logging in', errnote='Failed to log in')
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 403:
raise ExtractorError(
'Request blocked by Cloudflare; navigate to Crunchyroll in your browser, '
'then pass the fresh cookies (with --cookies-from-browser or --cookies) '
'and your browser\'s User-Agent (with --user-agent)', expected=True)
if isinstance(error.cause, HTTPError) and error.cause.status == 401:
raise ExtractorError('Invalid username and/or password', expected=True)
raise
CrunchyrollBaseIE._AUTH_HEADERS = {'Authorization': auth_response['token_type'] + ' ' + auth_response['access_token']}
CrunchyrollBaseIE._AUTH_REFRESH = time_seconds(seconds=traverse_obj(auth_response, ('expires_in', {float_or_none}), default=300) - 10)
CrunchyrollBaseIE._REFRESH_TOKEN = login_response['refresh_token']
self.cache.store(self._NETRC_MACHINE, username, CrunchyrollBaseIE._REFRESH_TOKEN)
self._set_auth_info(login_response)
def _update_auth(self):
if CrunchyrollBaseIE._AUTH_HEADERS and CrunchyrollBaseIE._AUTH_EXPIRY > time_seconds():
return
auth_headers = {'Authorization': self._BASIC_AUTH}
if CrunchyrollBaseIE._REFRESH_TOKEN:
data = {
'refresh_token': CrunchyrollBaseIE._REFRESH_TOKEN,
'grant_type': 'refresh_token',
'scope': 'offline_access',
}
else:
data = {'grant_type': 'client_id'}
auth_headers['ETP-Anonymous-ID'] = uuid.uuid4()
try:
auth_response = self._request_token(auth_headers, data)
except ExtractorError as error:
username, password = self._get_login_info()
if not username or not isinstance(error.cause, HTTPError) or error.cause.status != 400:
raise
self.to_screen('Refresh token has expired. Re-logging in')
CrunchyrollBaseIE._REFRESH_TOKEN = None
self.cache.store(self._NETRC_MACHINE, username, None)
self._perform_login(username, password)
return
self._set_auth_info(auth_response)
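Compressed, the token lifecycle implemented above is: reuse the auth headers while unexpired, refresh with the cached refresh token when present, fall back to an anonymous client_id grant otherwise, and on refresh failure drop the cache and re-login. A sketch with request_token as a hypothetical stand-in for the /auth/v1/token call:

import time

state = {'refresh_token': None, 'headers': None, 'expiry': 0}

def update_auth(request_token):
    if state['headers'] and state['expiry'] > time.time():
        return state['headers']  # still valid, reuse
    if state['refresh_token']:
        data = {'grant_type': 'refresh_token',
                'refresh_token': state['refresh_token'],
                'scope': 'offline_access'}
    else:
        data = {'grant_type': 'client_id'}  # anonymous session
    response = request_token(data)
    state['headers'] = {'Authorization': f'{response["token_type"]} {response["access_token"]}'}
    state['expiry'] = time.time() + response.get('expires_in', 300) - 10
    return state['headers']

print(update_auth(lambda data: {'token_type': 'Bearer', 'access_token': 'x', 'expires_in': 600}))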
def _locale_from_language(self, language):
config_locale = self._configuration_arg('metadata', ie_key=CrunchyrollBetaIE, casesense=True)
@@ -135,62 +155,73 @@ class CrunchyrollBaseIE(InfoExtractor):
raise ExtractorError(f'Unexpected response when downloading {note} JSON')
return result
def _extract_formats(self, stream_response, display_id=None):
requested_formats = self._configuration_arg('format') or ['adaptive_hls']
available_formats = {}
for stream_type, streams in traverse_obj(
stream_response, (('streams', ('data', 0)), {dict.items}, ...)):
if stream_type not in requested_formats:
def _extract_chapters(self, internal_id):
# if no skip events are available, a 403 xml error is returned
skip_events = self._download_json(
f'https://static.crunchyroll.com/skip-events/production/{internal_id}.json',
internal_id, note='Downloading chapter info', fatal=False, errnote=False)
if not skip_events:
return None
chapters = []
for event in ('recap', 'intro', 'credits', 'preview'):
start = traverse_obj(skip_events, (event, 'start', {float_or_none}))
end = traverse_obj(skip_events, (event, 'end', {float_or_none}))
# some chapters have no start and/or ending time, they will just be ignored
if start is None or end is None:
continue
for stream in traverse_obj(streams, lambda _, v: v['url']):
hardsub_lang = stream.get('hardsub_locale') or ''
format_id = join_nonempty(stream_type, format_field(stream, 'hardsub_locale', 'hardsub-%s'))
available_formats[hardsub_lang] = (stream_type, format_id, hardsub_lang, stream['url'])
chapters.append({'title': event.capitalize(), 'start_time': start, 'end_time': end})
return chapters
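The chapter mapping above, restated as a pure function over a sample skip-events payload; events missing a start or end are dropped, exactly as in the extractor:

def skip_events_to_chapters(skip_events):
    chapters = []
    for event in ('recap', 'intro', 'credits', 'preview'):
        start = (skip_events.get(event) or {}).get('start')
        end = (skip_events.get(event) or {}).get('end')
        if start is None or end is None:
            continue
        chapters.append({'title': event.capitalize(), 'start_time': start, 'end_time': end})
    return chapters

print(skip_events_to_chapters({
    'intro': {'start': 12.0, 'end': 101.5},
    'credits': {'start': 1290.0},  # no end: ignored
}))
# [{'title': 'Intro', 'start_time': 12.0, 'end_time': 101.5}]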
def _extract_stream(self, identifier, display_id=None):
if not display_id:
display_id = identifier
self._update_auth()
stream_response = self._download_json(
f'https://cr-play-service.prd.crunchyrollsvc.com/v1/{identifier}/console/switch/play',
display_id, note='Downloading stream info', errnote='Failed to download stream info',
headers=CrunchyrollBaseIE._AUTH_HEADERS)
available_formats = {'': ('', '', stream_response['url'])}
for hardsub_lang, stream in traverse_obj(stream_response, ('hardSubs', {dict.items}, lambda _, v: v[1]['url'])):
available_formats[hardsub_lang] = (f'hardsub-{hardsub_lang}', hardsub_lang, stream['url'])
requested_hardsubs = [('' if val == 'none' else val) for val in (self._configuration_arg('hardsub') or ['none'])]
if '' in available_formats and 'all' not in requested_hardsubs:
hardsub_langs = [lang for lang in available_formats if lang]
if hardsub_langs and 'all' not in requested_hardsubs:
full_format_langs = set(requested_hardsubs)
self.to_screen(f'Available hardsub languages: {", ".join(hardsub_langs)}')
self.to_screen(
'To get all formats of a hardsub language, use '
'To extract formats of a hardsub language, use '
'"--extractor-args crunchyrollbeta:hardsub=<language_code or all>". '
'See https://github.com/yt-dlp/yt-dlp#crunchyrollbeta-crunchyroll for more info',
only_once=True)
else:
full_format_langs = set(map(str.lower, available_formats))
audio_locale = traverse_obj(stream_response, ((None, 'meta'), 'audio_locale'), get_all=False)
audio_locale = traverse_obj(stream_response, ('audioLocale', {str}))
hardsub_preference = qualities(requested_hardsubs[::-1])
formats = []
for stream_type, format_id, hardsub_lang, stream_url in available_formats.values():
if stream_type.endswith('hls'):
if hardsub_lang.lower() in full_format_langs:
adaptive_formats = self._extract_m3u8_formats(
stream_url, display_id, 'mp4', m3u8_id=format_id,
fatal=False, note=f'Downloading {format_id} HLS manifest')
else:
adaptive_formats = (self._m3u8_meta_format(stream_url, ext='mp4', m3u8_id=format_id),)
elif stream_type.endswith('dash'):
adaptive_formats = self._extract_mpd_formats(
stream_url, display_id, mpd_id=format_id,
fatal=False, note=f'Downloading {format_id} MPD manifest')
formats, subtitles = [], {}
for format_id, hardsub_lang, stream_url in available_formats.values():
if hardsub_lang.lower() in full_format_langs:
adaptive_formats, dash_subs = self._extract_mpd_formats_and_subtitles(
stream_url, display_id, mpd_id=format_id, headers=CrunchyrollBaseIE._AUTH_HEADERS,
fatal=False, note=f'Downloading {f"{format_id} " if hardsub_lang else ""}MPD manifest')
self._merge_subtitles(dash_subs, target=subtitles)
else:
self.report_warning(f'Encountered unknown stream_type: {stream_type!r}', display_id, only_once=True)
continue
continue # XXX: Update this if/when meta mpd formats are working
for f in adaptive_formats:
if f.get('acodec') != 'none':
f['language'] = audio_locale
f['quality'] = hardsub_preference(hardsub_lang.lower())
formats.extend(adaptive_formats)
return formats
for locale, subtitle in traverse_obj(stream_response, (('subtitles', 'captions'), {dict.items}, ...)):
subtitles.setdefault(locale, []).append(traverse_obj(subtitle, {'url': 'url', 'ext': 'format'}))
def _extract_subtitles(self, data):
subtitles = {}
for locale, subtitle in traverse_obj(data, ((None, 'meta'), 'subtitles', {dict.items}, ...)):
subtitles[locale] = [traverse_obj(subtitle, {'url': 'url', 'ext': 'format'})]
return subtitles
return formats, subtitles
class CrunchyrollCmsBaseIE(CrunchyrollBaseIE):
@ -245,7 +276,11 @@ class CrunchyrollBetaIE(CrunchyrollCmsBaseIE):
'like_count': int,
'dislike_count': int,
},
'params': {'skip_download': 'm3u8', 'format': 'all[format_id~=hardsub]'},
'params': {
'skip_download': 'm3u8',
'extractor_args': {'crunchyrollbeta': {'hardsub': ['de-DE']}},
'format': 'bv[format_id~=hardsub]',
},
}, {
# Premium only
'url': 'https://www.crunchyroll.com/watch/GYE5WKQGR',
@ -306,6 +341,7 @@ class CrunchyrollBetaIE(CrunchyrollCmsBaseIE):
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
},
'params': {'skip_download': 'm3u8'},
'skip': 'no longer exists',
}, {
'url': 'https://www.crunchyroll.com/watch/G62PEZ2E6',
'info_dict': {
@ -359,31 +395,16 @@ class CrunchyrollBetaIE(CrunchyrollCmsBaseIE):
else:
raise ExtractorError(f'Unknown object type {object_type}')
# There might be multiple audio languages for one object (`<object>_metadata.versions`),
# so we need to get the id from `streams_link` instead, or we don't know which language to choose
streams_link = response.get('streams_link')
if not streams_link and traverse_obj(response, (f'{object_type}_metadata', 'is_premium_only')):
if not self._IS_PREMIUM and traverse_obj(response, (f'{object_type}_metadata', 'is_premium_only')):
message = f'This {object_type} is for premium members only'
if self.is_logged_in:
raise ExtractorError(message, expected=True)
self.raise_login_required(message)
if CrunchyrollBaseIE._REFRESH_TOKEN:
self.raise_no_formats(message, expected=True, video_id=internal_id)
else:
self.raise_login_required(message, method='password', metadata_available=True)
else:
result['formats'], result['subtitles'] = self._extract_stream(internal_id)
# We need to go from the unsigned to the signed API to avoid getting soft-banned
stream_response = self._call_cms_api_signed(remove_start(
streams_link, '/content/v2/cms/'), internal_id, lang, 'stream info')
result['formats'] = self._extract_formats(stream_response, internal_id)
result['subtitles'] = self._extract_subtitles(stream_response)
# if no intro chapter is available, a 403 without usable data is returned
intro_chapter = self._download_json(
f'https://static.crunchyroll.com/datalab-intro-v2/{internal_id}.json',
internal_id, note='Downloading chapter info', fatal=False, errnote=False)
if isinstance(intro_chapter, dict):
result['chapters'] = [{
'title': 'Intro',
'start_time': float_or_none(intro_chapter.get('startTime')),
'end_time': float_or_none(intro_chapter.get('endTime')),
}]
result['chapters'] = self._extract_chapters(internal_id)
def calculate_count(item):
return parse_count(''.join((item['displayed'], item.get('unit') or '')))
@ -512,7 +533,7 @@ class CrunchyrollMusicIE(CrunchyrollBaseIE):
'display_id': 'egaono-hana',
'title': 'Egaono Hana',
'track': 'Egaono Hana',
'artist': 'Goose house',
'artists': ['Goose house'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'genres': ['J-Pop'],
},
@ -525,11 +546,12 @@ class CrunchyrollMusicIE(CrunchyrollBaseIE):
'display_id': 'crossing-field',
'title': 'Crossing Field',
'track': 'Crossing Field',
'artist': 'LiSA',
'artists': ['LiSA'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'genres': ['Anime'],
},
'params': {'skip_download': 'm3u8'},
'skip': 'no longer exists',
}, {
'url': 'https://www.crunchyroll.com/watch/concert/MC2E2AC135',
'info_dict': {
@ -538,7 +560,7 @@ class CrunchyrollMusicIE(CrunchyrollBaseIE):
'display_id': 'live-is-smile-always-364joker-at-yokohama-arena',
'title': 'LiVE is Smile Always-364+JOKER- at YOKOHAMA ARENA',
'track': 'LiVE is Smile Always-364+JOKER- at YOKOHAMA ARENA',
'artist': 'LiSA',
'artists': ['LiSA'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'description': 'md5:747444e7e6300907b7a43f0a0503072e',
'genres': ['J-Pop'],
@ -566,16 +588,16 @@ class CrunchyrollMusicIE(CrunchyrollBaseIE):
if not response:
raise ExtractorError(f'No video with id {internal_id} could be found (possibly region locked?)', expected=True)
streams_link = response.get('streams_link')
if not streams_link and response.get('isPremiumOnly'):
message = f'This {response.get("type") or "media"} is for premium members only'
if self.is_logged_in:
raise ExtractorError(message, expected=True)
self.raise_login_required(message)
result = self._transform_music_response(response)
stream_response = self._call_api(streams_link, internal_id, lang, 'stream info')
result['formats'] = self._extract_formats(stream_response, internal_id)
if not self._IS_PREMIUM and response.get('isPremiumOnly'):
message = f'This {response.get("type") or "media"} is for premium members only'
if CrunchyrollBaseIE._REFRESH_TOKEN:
self.raise_no_formats(message, expected=True, video_id=internal_id)
else:
self.raise_login_required(message, method='password', metadata_available=True)
else:
result['formats'], _ = self._extract_stream(f'music/{internal_id}', internal_id)
return result
@ -587,7 +609,7 @@ class CrunchyrollMusicIE(CrunchyrollBaseIE):
'display_id': 'slug',
'title': 'title',
'track': 'title',
'artist': ('artist', 'name'),
'artists': ('artist', 'name', all),
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n') or None}),
'thumbnails': ('images', ..., ..., {
'url': ('source', {url_or_none}),
@ -611,7 +633,7 @@ class CrunchyrollArtistIE(CrunchyrollBaseIE):
'info_dict': {
'id': 'MA179CB50D',
'title': 'LiSA',
'genres': ['J-Pop', 'Anime', 'Rock'],
'genres': ['Anime', 'J-Pop', 'Rock'],
'description': 'md5:16d87de61a55c3f7d6c454b73285938e',
},
'playlist_mincount': 83,
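
A minimal, self-contained sketch of the hardsub selection the new `_extract_stream` performs — only requested languages get a full MPD download (semantics as in the diff; data illustrative):

def full_format_langs(available_langs, requested):
    # '' stands for the hardsub-free stream; 'none' maps to it
    requested = ['' if v == 'none' else v for v in (requested or ['none'])]
    hardsub_langs = [lang for lang in available_langs if lang]
    if hardsub_langs and 'all' not in requested:
        return set(requested)
    return {lang.lower() for lang in available_langs}

print(full_format_langs(['', 'de-DE', 'en-US'], ['de-DE']))  # {'de-DE'}
print(full_format_langs(['', 'de-DE', 'en-US'], ['all']))    # {'', 'de-de', 'en-us'}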

View file

@ -65,12 +65,14 @@ class DropboxIE(InfoExtractor):
formats, subtitles, has_anonymous_download = [], {}, False
for encoded in reversed(re.findall(r'registerStreamedPrefetch\s*\(\s*"[\w/+=]+"\s*,\s*"([\w/+=]+)"', webpage)):
decoded = base64.b64decode(encoded).decode('utf-8', 'ignore')
if not has_anonymous_download:
has_anonymous_download = self._search_regex(
r'(anonymous:\tanonymous)', decoded, 'anonymous', default=False)
transcode_url = self._search_regex(
r'\n.(https://[^\x03\x08\x12\n]+\.m3u8)', decoded, 'transcode url', default=None)
if not transcode_url:
continue
formats, subtitles = self._extract_m3u8_formats_and_subtitles(transcode_url, video_id, 'mp4')
has_anonymous_download = self._search_regex(r'(anonymous:\tanonymous)', decoded, 'anonymous', default=False)
break
# if downloads are enabled, we can get the original file
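
The reordered loop above boils down to: decode each prefetch blob, grab the first transcode URL, and record whether anonymous download is allowed. A standalone sketch with a stand-in blob (regexes copied from the diff):

import base64
import re

def probe_prefetch(encoded_blobs):
    for encoded in reversed(encoded_blobs):
        decoded = base64.b64decode(encoded).decode('utf-8', 'ignore')
        transcode = re.search(r'\n.(https://[^\x03\x08\x12\n]+\.m3u8)', decoded)
        if not transcode:
            continue
        anonymous = bool(re.search(r'anonymous:\tanonymous', decoded))
        return transcode.group(1), anonymous
    return None, False

blob = base64.b64encode(b'x\n.https://example.com/stream.m3u8\n').decode()
print(probe_prefetch([blob]))  # ('https://example.com/stream.m3u8', False)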

View file

@ -1,5 +1,5 @@
import json
from socket import timeout
import socket
from .common import InfoExtractor
from ..utils import (
@ -56,7 +56,7 @@ class DTubeIE(InfoExtractor):
try:
self.to_screen('%s: Checking %s video format URL' % (video_id, format_id))
self._downloader._opener.open(video_url, timeout=5).close()
except timeout:
except socket.timeout:
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, format_id))
continue

View file

@ -94,13 +94,14 @@ class EuropaIE(InfoExtractor):
class EuroParlWebstreamIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://multimedia\.europarl\.europa\.eu/[^/#?]+/
(?:(?!video)[^/#?]+/[\w-]+_)(?P<id>[\w-]+)
https?://multimedia\.europarl\.europa\.eu/
(?:\w+/)?webstreaming/(?:[\w-]+_)?(?P<id>[\w-]+)
'''
_TESTS = [{
'url': 'https://multimedia.europarl.europa.eu/pl/webstreaming/plenary-session_20220914-0900-PLENARY',
'info_dict': {
'id': '62388b15-d85b-4add-99aa-ba12ccf64f0d',
'display_id': '20220914-0900-PLENARY',
'ext': 'mp4',
'title': 'Plenary session',
'release_timestamp': 1663139069,
@ -125,6 +126,7 @@ class EuroParlWebstreamIE(InfoExtractor):
'url': 'https://multimedia.europarl.europa.eu/en/webstreaming/committee-on-culture-and-education_20230301-1130-COMMITTEE-CULT',
'info_dict': {
'id': '7355662c-8eac-445e-4bb9-08db14b0ddd7',
'display_id': '20230301-1130-COMMITTEE-CULT',
'ext': 'mp4',
'release_date': '20230301',
'title': 'Committee on Culture and Education',
@ -142,6 +144,19 @@ class EuroParlWebstreamIE(InfoExtractor):
'live_status': 'is_live',
},
'skip': 'Not live anymore'
}, {
'url': 'https://multimedia.europarl.europa.eu/en/webstreaming/20240320-1345-SPECIAL-PRESSER',
'info_dict': {
'id': 'c1f11567-5b52-470a-f3e1-08dc3c216ace',
'display_id': '20240320-1345-SPECIAL-PRESSER',
'ext': 'mp4',
'release_date': '20240320',
'title': 'md5:7c6c814cac55dea5e2d87bf8d3db2234',
'release_timestamp': 1710939767,
}
}, {
'url': 'https://multimedia.europarl.europa.eu/webstreaming/briefing-for-media-on-2024-european-elections_20240429-1000-SPECIAL-OTHER',
'only_matching': True,
}]
def _real_extract(self, url):
@ -166,6 +181,7 @@ class EuroParlWebstreamIE(InfoExtractor):
return {
'id': json_info['id'],
'display_id': display_id,
'title': traverse_obj(webpage_nextjs, (('mediaItem', 'title'), ('title', )), get_all=False),
'formats': formats,
'subtitles': subtitles,
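
A quick self-check of the loosened `_VALID_URL` against the test URLs from this hunk:

import re

pattern = re.compile(r'''(?x)
    https?://multimedia\.europarl\.europa\.eu/
    (?:\w+/)?webstreaming/(?:[\w-]+_)?(?P<id>[\w-]+)''')
for url in (
    'https://multimedia.europarl.europa.eu/pl/webstreaming/plenary-session_20220914-0900-PLENARY',
    'https://multimedia.europarl.europa.eu/en/webstreaming/20240320-1345-SPECIAL-PRESSER',
    'https://multimedia.europarl.europa.eu/webstreaming/briefing-for-media-on-2024-european-elections_20240429-1000-SPECIAL-OTHER',
):
    print(pattern.match(url).group('id'))
# 20220914-0900-PLENARY / 20240320-1345-SPECIAL-PRESSER / 20240429-1000-SPECIAL-OTHER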

View file

@ -560,7 +560,7 @@ class FacebookIE(InfoExtractor):
js_data, lambda x: x['jsmods']['instances'], list) or [])
def extract_dash_manifest(video, formats):
dash_manifest = video.get('dash_manifest')
dash_manifest = traverse_obj(video, 'dash_manifest', 'playlist', expected_type=str)
if dash_manifest:
formats.extend(self._parse_mpd_formats(
compat_etree_fromstring(urllib.parse.unquote_plus(dash_manifest)),
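
The traverse_obj change is a typed two-key fallback; a plain-Python equivalent:

def get_dash_manifest(video):
    # first string value wins; non-string values are skipped,
    # which is what expected_type=str enforces
    for key in ('dash_manifest', 'playlist'):
        value = video.get(key)
        if isinstance(value, str):
            return value
    return None

print(get_dash_manifest({'playlist': '<MPD/>'}))  # '<MPD/>'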

View file

@ -0,0 +1,54 @@
import json
from .common import InfoExtractor
from ..utils import (
extract_attributes,
float_or_none,
get_element_html_by_id,
parse_iso8601,
)
from ..utils.traversal import traverse_obj
class FathomIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fathom\.video/share/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://fathom.video/share/G9mkjkspnohVVZ_L5nrsoPycyWcB8y7s',
'md5': '0decd5343b8f30ae268625e79a02b60f',
'info_dict': {
'id': '47200596',
'ext': 'mp4',
'title': 'eCom Inucbator - Coaching Session',
'duration': 8125.380507,
'timestamp': 1699048914,
'upload_date': '20231103',
},
}, {
'url': 'https://fathom.video/share/mEws3bybftHL2QLymxYEDeE21vtLxGVm',
'md5': '4f5cb382126c22d1aba8a939f9c49690',
'info_dict': {
'id': '46812957',
'ext': 'mp4',
'title': 'Jon, Lawrence, Neman chat about practice',
'duration': 3571.517847,
'timestamp': 1698933600,
'upload_date': '20231102',
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
props = traverse_obj(
get_element_html_by_id('app', webpage), ({extract_attributes}, 'data-page', {json.loads}, 'props'))
video_id = str(props['call']['id'])
return {
'id': video_id,
'formats': self._extract_m3u8_formats(props['call']['video_url'], video_id, 'mp4'),
**traverse_obj(props, {
'title': ('head', 'title', {str}),
'duration': ('duration', {float_or_none}),
'timestamp': ('call', 'started_at', {parse_iso8601}),
}),
}
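
The new extractor's core step is reading an Inertia.js-style `data-page` attribute from the `#app` element; a stdlib-only sketch (the HTML below is a made-up example, not a real Fathom page):

import json
from html.parser import HTMLParser

class AppDiv(HTMLParser):
    page = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get('id') == 'app':
            self.page = json.loads(attrs['data-page'])

parser = AppDiv()
parser.feed('<div id="app" data-page=\'{"props": {"call": {"id": 47200596}}}\'></div>')
print(parser.page['props']['call']['id'])  # 47200596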

View file

@ -2104,22 +2104,6 @@ class GenericIE(InfoExtractor):
'age_limit': 0,
},
},
{
'note': 'JW Player embed with unicode-escape sequences in URL',
'url': 'https://www.medici.tv/en/concerts/lahav-shani-mozart-mahler-israel-philharmonic-abu-dhabi-classics',
'info_dict': {
'id': 'm',
'ext': 'mp4',
'title': 'Lahav Shani conducts the Israel Philharmonic\'s first-ever concert in Abu Dhabi',
'description': 'Mahler\'s ',
'uploader': 'www.medici.tv',
'age_limit': 0,
'thumbnail': r're:^https?://.+\.jpg',
},
'params': {
'skip_download': True,
},
},
{
'url': 'https://shooshtime.com/videos/284002/just-out-of-the-shower-joi/',
'md5': 'e2f0a4c329f7986280b7328e24036d60',

View file

@ -58,21 +58,18 @@ class GofileIE(InfoExtractor):
return
account_data = self._download_json(
'https://api.gofile.io/createAccount', None, note='Getting a new guest account')
'https://api.gofile.io/accounts', None, 'Getting a new guest account', data=b'{}')
self._TOKEN = account_data['data']['token']
self._set_cookie('.gofile.io', 'accountToken', self._TOKEN)
def _entries(self, file_id):
query_params = {
'contentId': file_id,
'token': self._TOKEN,
'wt': '4fd6sg89d7s6', # From https://gofile.io/dist/js/alljs.js
}
query_params = {'wt': '4fd6sg89d7s6'} # From https://gofile.io/dist/js/alljs.js
password = self.get_param('videopassword')
if password:
query_params['password'] = hashlib.sha256(password.encode('utf-8')).hexdigest()
files = self._download_json(
'https://api.gofile.io/getContent', file_id, note='Getting filelist', query=query_params)
f'https://api.gofile.io/contents/{file_id}', file_id, 'Getting filelist',
query=query_params, headers={'Authorization': f'Bearer {self._TOKEN}'})
status = files['status']
if status == 'error-passwordRequired':
@ -82,7 +79,7 @@ class GofileIE(InfoExtractor):
raise ExtractorError(f'{self.IE_NAME} said: status {status}', expected=True)
found_files = False
for file in (try_get(files, lambda x: x['data']['contents'], dict) or {}).values():
for file in (try_get(files, lambda x: x['data']['children'], dict) or {}).values():
file_type, file_format = file.get('mimetype').split('/', 1)
if file_type not in ('video', 'audio') and file_format != 'vnd.mts':
continue
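
The migrated Gofile flow, end to end, assuming the `requests` library (FILE_ID is a placeholder; the `wt` value is the one hard-coded in the diff):

import hashlib
import requests

token = requests.post('https://api.gofile.io/accounts', json={}).json()['data']['token']
params = {'wt': '4fd6sg89d7s6'}
password = None  # set to a string to mirror --video-password
if password:
    params['password'] = hashlib.sha256(password.encode('utf-8')).hexdigest()
files = requests.get(
    'https://api.gofile.io/contents/FILE_ID', params=params,
    headers={'Authorization': f'Bearer {token}'}).json()
children = files['data']['children']  # was data.contents before this change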

View file

@ -1,6 +1,6 @@
import base64
import binascii
import datetime
import datetime as dt
import hashlib
import hmac
import json
@ -422,7 +422,7 @@ class AwsIdp:
months = [None, 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
time_now = datetime.datetime.now(datetime.timezone.utc)
time_now = dt.datetime.now(dt.timezone.utc)
format_string = "{} {} {} %H:%M:%S UTC %Y".format(days[time_now.weekday()], months[time_now.month], time_now.day)
time_string = time_now.strftime(format_string)
return time_string

View file

@ -1,7 +1,8 @@
import re
from .cloudflarestream import CloudflareStreamIE
from .common import InfoExtractor
from ..utils import traverse_obj
from ..utils.traversal import traverse_obj
class HytaleIE(InfoExtractor):
@ -49,7 +50,7 @@ class HytaleIE(InfoExtractor):
entries = [
self.url_result(
f'https://cloudflarestream.com/{video_hash}/manifest/video.mpd?parentOrigin=https%3A%2F%2Fhytale.com',
title=self._titles.get(video_hash), url_transparent=True)
CloudflareStreamIE, title=self._titles.get(video_hash), url_transparent=True)
for video_hash in re.findall(
r'<stream\s+class\s*=\s*"ql-video\s+cf-stream"\s+src\s*=\s*"([a-f0-9]{32})"',
webpage)

View file

@ -76,6 +76,23 @@ class ImgurIE(ImgurBaseIE):
'thumbnail': 'https://i.imgur.com/jxBXAMCh.jpg',
'dislike_count': int,
},
}, {
# needs Accept header, ref: https://github.com/yt-dlp/yt-dlp/issues/9458
'url': 'https://imgur.com/zV03bd5',
'md5': '59df97884e8ba76143ff6b640a0e2904',
'info_dict': {
'id': 'zV03bd5',
'ext': 'mp4',
'title': 'Ive - Liz',
'timestamp': 1710491255,
'upload_date': '20240315',
'like_count': int,
'dislike_count': int,
'duration': 56.92,
'comment_count': int,
'release_timestamp': 1710491255,
'release_date': '20240315',
},
}]
def _real_extract(self, url):
@ -192,6 +209,7 @@ class ImgurIE(ImgurBaseIE):
'id': video_id,
'formats': formats,
'thumbnail': url_or_none(search('thumbnailUrl')),
'http_headers': {'Accept': '*/*'},
}
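
The Imgur fix comes down to sending a permissive Accept header to the media CDN (see the linked issue 9458); a standalone reproduction sketch, assuming the CDN still rejects narrower values:

import urllib.request

req = urllib.request.Request(
    'https://i.imgur.com/zV03bd5.mp4', headers={'Accept': '*/*'})
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # expect 200; narrower Accept values may be rejected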

View file

@ -1,89 +1,143 @@
import functools
import math
import re
from .common import InfoExtractor
from ..utils import (
InAdvancePagedList,
clean_html,
int_or_none,
js_to_json,
make_archive_id,
smuggle_url,
unsmuggle_url,
url_basename,
url_or_none,
urlencode_postdata,
urljoin,
)
from ..utils.traversal import traverse_obj
class JioSaavnBaseIE(InfoExtractor):
def _extract_initial_data(self, url, audio_id):
webpage = self._download_webpage(url, audio_id)
return self._search_json(
r'window\.__INITIAL_DATA__\s*=', webpage,
'init json', audio_id, transform_source=js_to_json)
_API_URL = 'https://www.jiosaavn.com/api.php'
_VALID_BITRATES = {'16', '32', '64', '128', '320'}
class JioSaavnSongIE(JioSaavnBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?:jiosaavn\.com/song/[^/?#]+/|saavn\.com/s/song/(?:[^/?#]+/){3})(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.jiosaavn.com/song/leja-re/OQsEfQFVUXk',
'md5': '3b84396d15ed9e083c3106f1fa589c04',
'info_dict': {
'id': 'OQsEfQFVUXk',
'ext': 'mp4',
'title': 'Leja Re',
'album': 'Leja Re',
'thumbnail': 'https://c.saavncdn.com/258/Leja-Re-Hindi-2018-20181124024539-500x500.jpg',
'duration': 205,
'view_count': int,
'release_year': 2018,
},
}, {
'url': 'https://www.saavn.com/s/song/hindi/Saathiya/O-Humdum-Suniyo-Re/KAMiazoCblU',
'only_matching': True,
}]
_VALID_BITRATES = ('16', '32', '64', '128', '320')
def _real_extract(self, url):
audio_id = self._match_id(url)
extract_bitrates = self._configuration_arg('bitrate', ['128', '320'], ie_key='JioSaavn')
if invalid_bitrates := [br for br in extract_bitrates if br not in self._VALID_BITRATES]:
@functools.cached_property
def requested_bitrates(self):
requested_bitrates = self._configuration_arg('bitrate', ['128', '320'], ie_key='JioSaavn')
if invalid_bitrates := set(requested_bitrates) - self._VALID_BITRATES:
raise ValueError(
f'Invalid bitrate(s): {", ".join(invalid_bitrates)}. '
+ f'Valid bitrates are: {", ".join(self._VALID_BITRATES)}')
+ f'Valid bitrates are: {", ".join(sorted(self._VALID_BITRATES, key=int))}')
return requested_bitrates
song_data = self._extract_initial_data(url, audio_id)['song']['song']
formats = []
for bitrate in extract_bitrates:
def _extract_formats(self, song_data):
for bitrate in self.requested_bitrates:
media_data = self._download_json(
'https://www.jiosaavn.com/api.php', audio_id, f'Downloading format info for {bitrate}',
self._API_URL, song_data['id'],
f'Downloading format info for {bitrate}',
fatal=False, data=urlencode_postdata({
'__call': 'song.generateAuthToken',
'_format': 'json',
'bitrate': bitrate,
'url': song_data['encrypted_media_url'],
}))
if not media_data.get('auth_url'):
if not traverse_obj(media_data, ('auth_url', {url_or_none})):
self.report_warning(f'Unable to extract format info for {bitrate}')
continue
formats.append({
ext = media_data.get('type')
yield {
'url': media_data['auth_url'],
'ext': media_data.get('type'),
'ext': 'm4a' if ext == 'mp4' else ext,
'format_id': bitrate,
'abr': int(bitrate),
'vcodec': 'none',
}
def _extract_song(self, song_data, url=None):
info = traverse_obj(song_data, {
'id': ('id', {str}),
'title': ('song', {clean_html}),
'album': ('album', {clean_html}),
'thumbnail': ('image', {url_or_none}, {lambda x: re.sub(r'-\d+x\d+\.', '-500x500.', x)}),
'duration': ('duration', {int_or_none}),
'view_count': ('play_count', {int_or_none}),
'release_year': ('year', {int_or_none}),
'artists': ('primary_artists', {lambda x: x.split(', ') if x else None}),
'webpage_url': ('perma_url', {url_or_none}),
})
if webpage_url := info.get('webpage_url') or url:
info['display_id'] = url_basename(webpage_url)
info['_old_archive_ids'] = [make_archive_id(JioSaavnSongIE, info['display_id'])]
return info
def _call_api(self, type_, token, note='API', params={}):
return self._download_json(
self._API_URL, token, f'Downloading {note} JSON', f'Unable to download {note} JSON',
query={
'__call': 'webapi.get',
'_format': 'json',
'_marker': '0',
'ctx': 'web6dot0',
'token': token,
'type': type_,
**params,
})
return {
'id': audio_id,
'formats': formats,
**traverse_obj(song_data, {
'title': ('title', 'text'),
'album': ('album', 'text'),
'thumbnail': ('image', 0, {url_or_none}),
'duration': ('duration', {int_or_none}),
'view_count': ('play_count', {int_or_none}),
'release_year': ('year', {int_or_none}),
}),
}
def _yield_songs(self, playlist_data):
for song_data in traverse_obj(playlist_data, ('songs', lambda _, v: v['id'] and v['perma_url'])):
song_info = self._extract_song(song_data)
url = smuggle_url(song_info['webpage_url'], {
'id': song_data['id'],
'encrypted_media_url': song_data['encrypted_media_url'],
})
yield self.url_result(url, JioSaavnSongIE, url_transparent=True, **song_info)
class JioSaavnSongIE(JioSaavnBaseIE):
IE_NAME = 'jiosaavn:song'
_VALID_URL = r'https?://(?:www\.)?(?:jiosaavn\.com/song/[^/?#]+/|saavn\.com/s/song/(?:[^/?#]+/){3})(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.jiosaavn.com/song/leja-re/OQsEfQFVUXk',
'md5': '3b84396d15ed9e083c3106f1fa589c04',
'info_dict': {
'id': 'IcoLuefJ',
'display_id': 'OQsEfQFVUXk',
'ext': 'm4a',
'title': 'Leja Re',
'album': 'Leja Re',
'thumbnail': r're:https?://c.saavncdn.com/258/Leja-Re-Hindi-2018-20181124024539-500x500.jpg',
'duration': 205,
'view_count': int,
'release_year': 2018,
'artists': ['Sandesh Shandilya', 'Dhvani Bhanushali', 'Tanishk Bagchi'],
'_old_archive_ids': ['jiosaavnsong OQsEfQFVUXk'],
},
}, {
'url': 'https://www.saavn.com/s/song/hindi/Saathiya/O-Humdum-Suniyo-Re/KAMiazoCblU',
'only_matching': True,
}]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url)
song_data = traverse_obj(smuggled_data, ({
'id': ('id', {str}),
'encrypted_media_url': ('encrypted_media_url', {str}),
}))
if 'id' in song_data and 'encrypted_media_url' in song_data:
result = {'id': song_data['id']}
else:
# only extract metadata if this is not a url_transparent result
song_data = self._call_api('song', self._match_id(url))['songs'][0]
result = self._extract_song(song_data, url)
result['formats'] = list(self._extract_formats(song_data))
return result
class JioSaavnAlbumIE(JioSaavnBaseIE):
IE_NAME = 'jiosaavn:album'
_VALID_URL = r'https?://(?:www\.)?(?:jio)?saavn\.com/album/[^/?#]+/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.jiosaavn.com/album/96/buIOjYZDrNA_',
@ -95,11 +149,46 @@ class JioSaavnAlbumIE(JioSaavnBaseIE):
}]
def _real_extract(self, url):
album_id = self._match_id(url)
album_view = self._extract_initial_data(url, album_id)['albumView']
display_id = self._match_id(url)
album_data = self._call_api('album', display_id)
return self.playlist_from_matches(
traverse_obj(album_view, (
'modules', lambda _, x: x['key'] == 'list', 'data', ..., 'title', 'action', {str})),
album_id, traverse_obj(album_view, ('album', 'title', 'text', {str})), ie=JioSaavnSongIE,
getter=lambda x: urljoin('https://www.jiosaavn.com/', x))
return self.playlist_result(
self._yield_songs(album_data), display_id, traverse_obj(album_data, ('title', {str})))
class JioSaavnPlaylistIE(JioSaavnBaseIE):
IE_NAME = 'jiosaavn:playlist'
_VALID_URL = r'https?://(?:www\.)?(?:jio)?saavn\.com/s/playlist/(?:[^/?#]+/){2}(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.jiosaavn.com/s/playlist/2279fbe391defa793ad7076929a2f5c9/mood-english/LlJ8ZWT1ibN5084vKHRj2Q__',
'info_dict': {
'id': 'LlJ8ZWT1ibN5084vKHRj2Q__',
'title': 'Mood English',
},
'playlist_mincount': 301,
}, {
'url': 'https://www.jiosaavn.com/s/playlist/2279fbe391defa793ad7076929a2f5c9/mood-hindi/DVR,pFUOwyXqIp77B1JF,A__',
'info_dict': {
'id': 'DVR,pFUOwyXqIp77B1JF,A__',
'title': 'Mood Hindi',
},
'playlist_mincount': 801,
}]
_PAGE_SIZE = 50
def _fetch_page(self, token, page):
return self._call_api(
'playlist', token, f'playlist page {page}', {'p': page, 'n': self._PAGE_SIZE})
def _entries(self, token, first_page_data, page):
page_data = first_page_data if not page else self._fetch_page(token, page + 1)
yield from self._yield_songs(page_data)
def _real_extract(self, url):
display_id = self._match_id(url)
playlist_data = self._fetch_page(display_id, 1)
total_pages = math.ceil(int(playlist_data['list_count']) / self._PAGE_SIZE)
return self.playlist_result(InAdvancePagedList(
functools.partial(self._entries, display_id, playlist_data),
total_pages, self._PAGE_SIZE), display_id, traverse_obj(playlist_data, ('listname', {str})))
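
The playlist pagination above reduces to simple page math fed into InAdvancePagedList (numbers from the 'Mood English' test entry):

import math

PAGE_SIZE = 50
list_count = 301
total_pages = math.ceil(list_count / PAGE_SIZE)
last_page_size = list_count - (total_pages - 1) * PAGE_SIZE
print(total_pages, last_page_size)  # 7 1 -> page 1 is reused, pages 2-7 fetched lazily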

View file

@ -1,4 +1,4 @@
import datetime
import datetime as dt
import urllib.parse
from .common import InfoExtractor
@ -50,8 +50,8 @@ class JoqrAgIE(InfoExtractor):
def _extract_start_timestamp(self, video_id, is_live):
def extract_start_time_from(date_str):
dt = datetime_from_str(date_str) + datetime.timedelta(hours=9)
date = dt.strftime('%Y%m%d')
dt_ = datetime_from_str(date_str) + dt.timedelta(hours=9)
date = dt_.strftime('%Y%m%d')
start_time = self._search_regex(
r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>[\s\d:]+–\s*(\d{1,2}:\d{1,2})',
self._download_webpage(
@ -60,7 +60,7 @@ class JoqrAgIE(InfoExtractor):
errnote=f'Failed to download program list of {date}') or '',
'start time', default=None)
if start_time:
return unified_timestamp(f'{dt.strftime("%Y/%m/%d")} {start_time} +09:00')
return unified_timestamp(f'{dt_.strftime("%Y/%m/%d")} {start_time} +09:00')
return None
start_timestamp = extract_start_time_from('today')
@ -80,14 +80,14 @@ class JoqrAgIE(InfoExtractor):
note='Downloading metadata', errnote='Failed to download metadata')
title = self._extract_metadata('Program_name', metadata)
if title == '放送休止':
if not title or title == '放送休止':
formats = []
live_status = 'is_upcoming'
release_timestamp = self._extract_start_timestamp(video_id, False)
msg = 'This stream is not currently live'
if release_timestamp:
msg += (' and will start at '
+ datetime.datetime.fromtimestamp(release_timestamp).strftime('%Y-%m-%d %H:%M:%S'))
+ dt.datetime.fromtimestamp(release_timestamp).strftime('%Y-%m-%d %H:%M:%S'))
self.raise_no_formats(msg, expected=True)
else:
m3u8_path = self._search_regex(

View file

@ -13,7 +13,8 @@ from ..utils import (
class KickBaseIE(InfoExtractor):
def _real_initialize(self):
self._request_webpage(HEADRequest('https://kick.com/'), None, 'Setting up session', fatal=False)
self._request_webpage(
HEADRequest('https://kick.com/'), None, 'Setting up session', fatal=False, impersonate=True)
xsrf_token = self._get_cookies('https://kick.com/').get('XSRF-TOKEN')
if not xsrf_token:
self.write_debug('kick.com did not set XSRF-TOKEN cookie')
@ -25,7 +26,7 @@ class KickBaseIE(InfoExtractor):
def _call_api(self, path, display_id, note='Downloading API JSON', headers={}, **kwargs):
return self._download_json(
f'https://kick.com/api/v1/{path}', display_id, note=note,
headers=merge_dicts(headers, self._API_HEADERS), **kwargs)
headers=merge_dicts(headers, self._API_HEADERS), impersonate=True, **kwargs)
class KickIE(KickBaseIE):
@ -82,26 +83,27 @@ class KickIE(KickBaseIE):
class KickVODIE(KickBaseIE):
_VALID_URL = r'https?://(?:www\.)?kick\.com/video/(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})'
_TESTS = [{
'url': 'https://kick.com/video/54244b5e-050a-4df4-a013-b2433dafbe35',
'md5': '73691206a6a49db25c5aa1588e6538fc',
'url': 'https://kick.com/video/58bac65b-e641-4476-a7ba-3707a35e60e3',
'md5': '3870f94153e40e7121a6e46c068b70cb',
'info_dict': {
'id': '54244b5e-050a-4df4-a013-b2433dafbe35',
'id': '58bac65b-e641-4476-a7ba-3707a35e60e3',
'ext': 'mp4',
'title': 'Making 710-carBoosting. Kinda No Pixel inspired. !guilded - !links',
'description': 'md5:a0d3546bf7955d0a8252ffe0fd6f518f',
'channel': 'kmack710',
'channel_id': '16278',
'uploader': 'Kmack710',
'uploader_id': '16412',
'upload_date': '20221206',
'timestamp': 1670318289,
'duration': 40104.0,
'title': '🤠REBIRTH IS BACK!!!!🤠!stake CODE JAREDFPS 🤠',
'description': 'md5:02b0c46f9b4197fb545ab09dddb85b1d',
'channel': 'jaredfps',
'channel_id': '26608',
'uploader': 'JaredFPS',
'uploader_id': '26799',
'upload_date': '20240402',
'timestamp': 1712097108,
'duration': 33859.0,
'thumbnail': r're:^https?://.*\.jpg',
'categories': ['Grand Theft Auto V'],
'categories': ['Call of Duty: Warzone'],
},
'params': {
'skip_download': 'm3u8',
},
'expected_warnings': [r'impersonation'],
}]
def _real_extract(self, url):

View file

@ -1,4 +1,4 @@
import datetime
import datetime as dt
import hashlib
import re
import time
@ -185,7 +185,7 @@ class LeIE(InfoExtractor):
publish_time = parse_iso8601(self._html_search_regex(
r'发布时间&nbsp;([^<>]+) ', page, 'publish time', default=None),
delimiter=' ', timezone=datetime.timedelta(hours=8))
delimiter=' ', timezone=dt.timedelta(hours=8))
description = self._html_search_meta('description', page, fatal=False)
return {

View file

@ -1,4 +1,4 @@
from itertools import zip_longest
import itertools
import re
from .common import InfoExtractor
@ -156,7 +156,7 @@ class LinkedInLearningIE(LinkedInLearningBaseIE):
def json2srt(self, transcript_lines, duration=None):
srt_data = ''
for line, (line_dict, next_dict) in enumerate(zip_longest(transcript_lines, transcript_lines[1:])):
for line, (line_dict, next_dict) in enumerate(itertools.zip_longest(transcript_lines, transcript_lines[1:])):
start_time, caption = line_dict['transcriptStartAt'] / 1000, line_dict['caption']
end_time = next_dict['transcriptStartAt'] / 1000 if next_dict else duration or start_time + 1
srt_data += '%d\n%s --> %s\n%s\n\n' % (line + 1, srt_subtitles_timecode(start_time),

View file

@ -0,0 +1,461 @@
import json
import textwrap
import urllib.parse
import uuid
from .common import InfoExtractor
from ..utils import (
ExtractorError,
determine_ext,
filter_dict,
get_first,
int_or_none,
parse_iso8601,
update_url,
url_or_none,
variadic,
)
from ..utils.traversal import traverse_obj
class LoomIE(InfoExtractor):
IE_NAME = 'loom'
_VALID_URL = r'https?://(?:www\.)?loom\.com/(?:share|embed)/(?P<id>[\da-f]{32})'
_EMBED_REGEX = [rf'<iframe[^>]+\bsrc=["\'](?P<url>{_VALID_URL})']
_TESTS = [{
# m3u8 raw-url, mp4 transcoded-url, cdn url == raw-url, json subs only
'url': 'https://www.loom.com/share/43d05f362f734614a2e81b4694a3a523',
'md5': 'bfc2d7e9c2e0eb4813212230794b6f42',
'info_dict': {
'id': '43d05f362f734614a2e81b4694a3a523',
'ext': 'mp4',
'title': 'A Ruler for Windows - 28 March 2022',
'uploader': 'wILLIAM PIP',
'upload_date': '20220328',
'timestamp': 1648454238,
'duration': 27,
},
}, {
# webm raw-url, mp4 transcoded-url, cdn url == transcoded-url, no subs
'url': 'https://www.loom.com/share/c43a642f815f4378b6f80a889bb73d8d',
'md5': '70f529317be8cf880fcc2c649a531900',
'info_dict': {
'id': 'c43a642f815f4378b6f80a889bb73d8d',
'ext': 'webm',
'title': 'Lilah Nielsen Intro Video',
'uploader': 'Lilah Nielsen',
'upload_date': '20200826',
'timestamp': 1598480716,
'duration': 20,
},
}, {
# m3u8 raw-url, mp4 transcoded-url, cdn url == raw-url, vtt sub and json subs
'url': 'https://www.loom.com/share/9458bcbf79784162aa62ffb8dd66201b',
'md5': '51737ec002969dd28344db4d60b9cbbb',
'info_dict': {
'id': '9458bcbf79784162aa62ffb8dd66201b',
'ext': 'mp4',
'title': 'Sharing screen with gpt-4',
'description': 'Sharing screen with GPT 4 vision model and asking questions to guide through blender.',
'uploader': 'Suneel Matham',
'chapters': 'count:3',
'upload_date': '20231109',
'timestamp': 1699518978,
'duration': 93,
},
}, {
# mpd raw-url, mp4 transcoded-url, cdn url == raw-url, no subs
'url': 'https://www.loom.com/share/24351eb8b317420289b158e4b7e96ff2',
'info_dict': {
'id': '24351eb8b317420289b158e4b7e96ff2',
'ext': 'webm',
'title': 'OMFG clown',
'description': 'md5:285c5ee9d62aa087b7e3271b08796815',
'uploader': 'MrPumkin B',
'upload_date': '20210924',
'timestamp': 1632519618,
'duration': 210,
},
'params': {'skip_download': 'dash'},
}, {
# password-protected
'url': 'https://www.loom.com/share/50e26e8aeb7940189dff5630f95ce1f4',
'md5': '5cc7655e7d55d281d203f8ffd14771f7',
'info_dict': {
'id': '50e26e8aeb7940189dff5630f95ce1f4',
'ext': 'mp4',
'title': 'iOS Mobile Upload',
'uploader': 'Simon Curran',
'upload_date': '20200520',
'timestamp': 1590000123,
'duration': 35,
},
'params': {'videopassword': 'seniorinfants2'},
}, {
# embed, transcoded-url endpoint sends empty JSON response
'url': 'https://www.loom.com/embed/ddcf1c1ad21f451ea7468b1e33917e4e',
'md5': '8488817242a0db1cb2ad0ea522553cf6',
'info_dict': {
'id': 'ddcf1c1ad21f451ea7468b1e33917e4e',
'ext': 'mp4',
'title': 'CF Reset User\'s Password',
'uploader': 'Aimee Heintz',
'upload_date': '20220707',
'timestamp': 1657216459,
'duration': 181,
},
'expected_warnings': ['Failed to parse JSON'],
}]
_WEBPAGE_TESTS = [{
'url': 'https://www.loom.com/community/e1229802a8694a09909e8ba0fbb6d073-pg',
'md5': 'ec838cd01b576cf0386f32e1ae424609',
'info_dict': {
'id': 'e1229802a8694a09909e8ba0fbb6d073',
'ext': 'mp4',
'title': 'Rexie Jane Cimafranca - Founder\'s Presentation',
'uploader': 'Rexie Cimafranca',
'upload_date': '20230213',
'duration': 247,
'timestamp': 1676274030,
},
}]
_GRAPHQL_VARIABLES = {
'GetVideoSource': {
'acceptableMimes': ['DASH', 'M3U8', 'MP4'],
},
}
_GRAPHQL_QUERIES = {
'GetVideoSSR': textwrap.dedent('''\
query GetVideoSSR($videoId: ID!, $password: String) {
getVideo(id: $videoId, password: $password) {
__typename
... on PrivateVideo {
id
status
message
__typename
}
... on VideoPasswordMissingOrIncorrect {
id
message
__typename
}
... on RegularUserVideo {
id
__typename
createdAt
description
download_enabled
folder_id
is_protected
needs_password
owner {
display_name
__typename
}
privacy
s3_id
name
video_properties {
avgBitRate
client
camera_enabled
client_version
duration
durationMs
format
height
microphone_enabled
os
os_version
recordingClient
recording_type
recording_version
screen_type
tab_audio
trim_duration
width
__typename
}
playable_duration
source_duration
visibility
}
}
}\n'''),
'GetVideoSource': textwrap.dedent('''\
query GetVideoSource($videoId: ID!, $password: String, $acceptableMimes: [CloudfrontVideoAcceptableMime]) {
getVideo(id: $videoId, password: $password) {
... on RegularUserVideo {
id
nullableRawCdnUrl(acceptableMimes: $acceptableMimes, password: $password) {
url
__typename
}
__typename
}
__typename
}
}\n'''),
'FetchVideoTranscript': textwrap.dedent('''\
query FetchVideoTranscript($videoId: ID!, $password: String) {
fetchVideoTranscript(videoId: $videoId, password: $password) {
... on VideoTranscriptDetails {
id
video_id
source_url
captions_source_url
__typename
}
... on GenericError {
message
__typename
}
__typename
}
}\n'''),
'FetchChapters': textwrap.dedent('''\
query FetchChapters($videoId: ID!, $password: String) {
fetchVideoChapters(videoId: $videoId, password: $password) {
... on VideoChapters {
video_id
content
__typename
}
... on EmptyChaptersPayload {
content
__typename
}
... on InvalidRequestWarning {
message
__typename
}
... on Error {
message
__typename
}
__typename
}
}\n'''),
}
_APOLLO_GRAPHQL_VERSION = '0a1856c'
def _call_graphql_api(self, operations, video_id, note=None, errnote=None):
password = self.get_param('videopassword')
return self._download_json(
'https://www.loom.com/graphql', video_id, note or 'Downloading GraphQL JSON',
errnote or 'Failed to download GraphQL JSON', headers={
'Accept': 'application/json',
'Content-Type': 'application/json',
'x-loom-request-source': f'loom_web_{self._APOLLO_GRAPHQL_VERSION}',
'apollographql-client-name': 'web',
'apollographql-client-version': self._APOLLO_GRAPHQL_VERSION,
}, data=json.dumps([{
'operationName': operation_name,
'variables': {
'videoId': video_id,
'password': password,
**self._GRAPHQL_VARIABLES.get(operation_name, {}),
},
'query': self._GRAPHQL_QUERIES[operation_name],
} for operation_name in variadic(operations)], separators=(',', ':')).encode())
def _call_url_api(self, endpoint, video_id):
response = self._download_json(
f'https://www.loom.com/api/campaigns/sessions/{video_id}/{endpoint}', video_id,
f'Downloading {endpoint} JSON', f'Failed to download {endpoint} JSON', fatal=False,
headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
data=json.dumps({
'anonID': str(uuid.uuid4()),
'deviceID': None,
'force_original': False, # HTTP error 401 if True
'password': self.get_param('videopassword'),
}, separators=(',', ':')).encode())
return traverse_obj(response, ('url', {url_or_none}))
def _extract_formats(self, video_id, metadata, gql_data):
formats = []
video_properties = traverse_obj(metadata, ('video_properties', {
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
'acodec': ('microphone_enabled', {lambda x: 'none' if x is False else None}),
}))
def get_formats(format_url, format_id, quality):
if not format_url:
return
ext = determine_ext(format_url)
query = urllib.parse.urlparse(format_url).query
if ext == 'm3u8':
# Extract pre-merged HLS formats to avoid buggy parsing of metadata in split playlists
format_url = format_url.replace('-split.m3u8', '.m3u8')
m3u8_formats = self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id=f'hls-{format_id}', fatal=False, quality=quality)
for fmt in m3u8_formats:
yield {
**fmt,
'url': update_url(fmt['url'], query=query),
'extra_param_to_segment_url': query,
}
elif ext == 'mpd':
dash_formats = self._extract_mpd_formats(
format_url, video_id, mpd_id=f'dash-{format_id}', fatal=False)
for fmt in dash_formats:
yield {
**fmt,
'extra_param_to_segment_url': query,
'quality': quality,
}
else:
yield {
'url': format_url,
'ext': ext,
'format_id': f'http-{format_id}',
'quality': quality,
**video_properties,
}
raw_url = self._call_url_api('raw-url', video_id)
formats.extend(get_formats(raw_url, 'raw', quality=1)) # original quality
transcoded_url = self._call_url_api('transcoded-url', video_id)
formats.extend(get_formats(transcoded_url, 'transcoded', quality=-1)) # transcoded quality
cdn_url = get_first(gql_data, ('data', 'getVideo', 'nullableRawCdnUrl', 'url', {url_or_none}))
# cdn_url is usually a dupe, but the raw-url/transcoded-url endpoints could return errors
valid_urls = [update_url(url, query=None) for url in (raw_url, transcoded_url) if url]
if cdn_url and update_url(cdn_url, query=None) not in valid_urls:
formats.extend(get_formats(cdn_url, 'cdn', quality=0)) # could be original or transcoded
return formats
def _real_extract(self, url):
video_id = self._match_id(url)
metadata = get_first(
self._call_graphql_api('GetVideoSSR', video_id, 'Downloading GraphQL metadata JSON'),
('data', 'getVideo', {dict})) or {}
if metadata.get('__typename') == 'VideoPasswordMissingOrIncorrect':
if not self.get_param('videopassword'):
raise ExtractorError(
'This video is password-protected, use the --video-password option', expected=True)
raise ExtractorError('Invalid video password', expected=True)
gql_data = self._call_graphql_api(['FetchChapters', 'FetchVideoTranscript', 'GetVideoSource'], video_id)
duration = traverse_obj(metadata, ('video_properties', 'duration', {int_or_none}))
return {
'id': video_id,
'duration': duration,
'chapters': self._extract_chapters_from_description(
get_first(gql_data, ('data', 'fetchVideoChapters', 'content', {str})), duration) or None,
'formats': self._extract_formats(video_id, metadata, gql_data),
'subtitles': filter_dict({
'en': traverse_obj(gql_data, (
..., 'data', 'fetchVideoTranscript',
('source_url', 'captions_source_url'), {
'url': {url_or_none},
})) or None,
}),
**traverse_obj(metadata, {
'title': ('name', {str}),
'description': ('description', {str}),
'uploader': ('owner', 'display_name', {str}),
'timestamp': ('createdAt', {parse_iso8601}),
}),
}
class LoomFolderIE(InfoExtractor):
IE_NAME = 'loom:folder'
_VALID_URL = r'https?://(?:www\.)?loom\.com/share/folder/(?P<id>[\da-f]{32})'
_TESTS = [{
# 2 subfolders, no videos in root
'url': 'https://www.loom.com/share/folder/997db4db046f43e5912f10dc5f817b5c',
'playlist_mincount': 16,
'info_dict': {
'id': '997db4db046f43e5912f10dc5f817b5c',
'title': 'Blending Lessons',
},
}, {
# only videos, no subfolders
'url': 'https://www.loom.com/share/folder/9a8a87f6b6f546d9a400c8e7575ff7f2',
'playlist_mincount': 12,
'info_dict': {
'id': '9a8a87f6b6f546d9a400c8e7575ff7f2',
'title': 'List A- a, i, o',
},
}, {
# videos in root and empty subfolder
'url': 'https://www.loom.com/share/folder/886e534218c24fd292e97e9563078cc4',
'playlist_mincount': 21,
'info_dict': {
'id': '886e534218c24fd292e97e9563078cc4',
'title': 'Medicare Agent Training videos',
},
}, {
# videos in root and videos in subfolders
'url': 'https://www.loom.com/share/folder/b72c4ecdf04745da9403926d80a40c38',
'playlist_mincount': 21,
'info_dict': {
'id': 'b72c4ecdf04745da9403926d80a40c38',
'title': 'Quick Altos Q & A Tutorials',
},
}, {
# recursive folder extraction
'url': 'https://www.loom.com/share/folder/8b458a94e0e4449b8df9ea7a68fafc4e',
'playlist_count': 23,
'info_dict': {
'id': '8b458a94e0e4449b8df9ea7a68fafc4e',
'title': 'Sezer Texting Guide',
},
}, {
# more than 50 videos in 1 folder
'url': 'https://www.loom.com/share/folder/e056a91d290d47ca9b00c9d1df56c463',
'playlist_mincount': 61,
'info_dict': {
'id': 'e056a91d290d47ca9b00c9d1df56c463',
'title': 'User Videos',
},
}, {
# many subfolders
'url': 'https://www.loom.com/share/folder/c2dde8cc67454f0e99031677279d8954',
'playlist_mincount': 75,
'info_dict': {
'id': 'c2dde8cc67454f0e99031677279d8954',
'title': 'Honors 1',
},
}, {
'url': 'https://www.loom.com/share/folder/bae17109a68146c7803454f2893c8cf8/Edpuzzle',
'only_matching': True,
}]
def _extract_folder_data(self, folder_id):
return self._download_json(
f'https://www.loom.com/v1/folders/{folder_id}', folder_id,
'Downloading folder info JSON', query={'limit': '10000'})
def _extract_folder_entries(self, folder_id, initial_folder_data=None):
folder_data = initial_folder_data or self._extract_folder_data(folder_id)
for video in traverse_obj(folder_data, ('videos', lambda _, v: v['id'])):
video_id = video['id']
yield self.url_result(
f'https://www.loom.com/share/{video_id}', LoomIE, video_id, video.get('name'))
# Recurse into subfolders
for subfolder_id in traverse_obj(folder_data, (
'folders', lambda _, v: v['id'] != folder_id, 'id', {str})):
yield from self._extract_folder_entries(subfolder_id)
def _real_extract(self, url):
playlist_id = self._match_id(url)
playlist_data = self._extract_folder_data(playlist_id)
return self.playlist_result(
self._extract_folder_entries(playlist_id, playlist_data), playlist_id,
traverse_obj(playlist_data, ('folder', 'name', {str.strip})))
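
A self-contained model of the recursive folder walk above, with a fake tree standing in for the /v1/folders API:

FOLDERS = {
    'root': {'videos': [{'id': 'v1'}], 'folders': [{'id': 'sub'}]},
    'sub': {'videos': [{'id': 'v2'}], 'folders': []},
}

def walk(folder_id):
    data = FOLDERS[folder_id]
    yield from (video['id'] for video in data['videos'])
    for sub in data['folders']:
        if sub['id'] != folder_id:  # same self-reference guard as the diff
            yield from walk(sub['id'])

print(list(walk('root')))  # ['v1', 'v2']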

View file

@ -1,4 +1,3 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
traverse_obj,

View file

@ -1,67 +1,153 @@
import urllib.parse
from .common import InfoExtractor
from ..utils import (
unified_strdate,
update_url_query,
urlencode_postdata,
filter_dict,
parse_iso8601,
traverse_obj,
try_call,
url_or_none,
)
class MediciIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?medici\.tv/#!/(?P<id>[^?#&]+)'
_TEST = {
'url': 'http://www.medici.tv/#!/daniel-harding-frans-helmerson-verbier-festival-music-camp',
'md5': '004c21bb0a57248085b6ff3fec72719d',
_VALID_URL = r'https?://(?:(?P<sub>www|edu)\.)?medici\.tv/[a-z]{2}/[\w.-]+/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.medici.tv/en/operas/thomas-ades-the-exterminating-angel-calixto-bieito-opera-bastille-paris',
'md5': 'd483f74e7a7a9eac0dbe152ab189050d',
'info_dict': {
'id': '3059',
'ext': 'flv',
'title': 'Daniel Harding conducts the Verbier Festival Music Camp \u2013 With Frans Helmerson',
'description': 'md5:322a1e952bafb725174fd8c1a8212f58',
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20170408',
'id': '8032',
'ext': 'mp4',
'title': 'Thomas Adès\'s The Exterminating Angel',
'description': 'md5:708ae6350dadc604225b4a6e32482bab',
'thumbnail': r're:https://.+/.+\.jpg',
'upload_date': '20240304',
'timestamp': 1709561766,
'display_id': 'thomas-ades-the-exterminating-angel-calixto-bieito-opera-bastille-paris',
},
}
'expected_warnings': [r'preview'],
}, {
'url': 'https://edu.medici.tv/en/operas/wagner-lohengrin-paris-opera-kirill-serebrennikov-piotr-beczala-kwangchul-youn-johanni-van-oostrum',
'md5': '4ef3f4079a6e1c617584463a9eb84f99',
'info_dict': {
'id': '7900',
'ext': 'mp4',
'title': 'Wagner\'s Lohengrin',
'description': 'md5:a384a62937866101f86902f21752cd89',
'thumbnail': r're:https://.+/.+\.jpg',
'upload_date': '20231017',
'timestamp': 1697554771,
'display_id': 'wagner-lohengrin-paris-opera-kirill-serebrennikov-piotr-beczala-kwangchul-youn-johanni-van-oostrum',
},
'expected_warnings': [r'preview'],
}, {
'url': 'https://www.medici.tv/en/concerts/sergey-smbatyan-conducts-mansurian-chouchane-siranossian-mario-brunello',
'md5': '9dd757e53b22b2511e85ea9ea60e4815',
'info_dict': {
'id': '5712',
'ext': 'mp4',
'title': 'Sergey Smbatyan conducts Tigran Mansurian — With Chouchane Siranossian and Mario Brunello',
'thumbnail': r're:https://.+/.+\.jpg',
'description': 'md5:9411fe44c874bb10e9af288c65816e41',
'upload_date': '20200323',
'timestamp': 1584975600,
'display_id': 'sergey-smbatyan-conducts-mansurian-chouchane-siranossian-mario-brunello',
},
'expected_warnings': [r'preview'],
}, {
'url': 'https://www.medici.tv/en/ballets/carmen-ballet-choregraphie-de-jiri-bubenicek-teatro-dellopera-di-roma',
'md5': '40f5e76cb701a97a6d7ba23b62c49990',
'info_dict': {
'id': '7857',
'ext': 'mp4',
'title': 'Carmen by Jiří Bubeníček after Roland Petit, music by Bizet, de Falla, Castelnuovo-Tedesco, and Bonolis',
'thumbnail': r're:https://.+/.+\.jpg',
'description': 'md5:0f15a15611ed748020c769873e10a8bb',
'upload_date': '20240223',
'timestamp': 1708707600,
'display_id': 'carmen-ballet-choregraphie-de-jiri-bubenicek-teatro-dellopera-di-roma',
},
'expected_warnings': [r'preview'],
}, {
'url': 'https://www.medici.tv/en/documentaries/la-sonnambula-liege-2023-documentaire',
'md5': '87ff198018ce79a34757ab0dd6f21080',
'info_dict': {
'id': '7513',
'ext': 'mp4',
'title': 'La Sonnambula',
'thumbnail': r're:https://.+/.+\.jpg',
'description': 'md5:0caf9109a860fd50cd018df062a67f34',
'upload_date': '20231103',
'timestamp': 1699010830,
'display_id': 'la-sonnambula-liege-2023-documentaire',
},
'expected_warnings': [r'preview'],
}, {
'url': 'https://edu.medici.tv/en/masterclasses/yvonne-loriod-olivier-messiaen',
'md5': 'fb5dcec46d76ad20fbdbaabb01da191d',
'info_dict': {
'id': '3024',
'ext': 'mp4',
'title': 'Olivier Messiaen and Yvonne Loriod, pianists and teachers',
'thumbnail': r're:https://.+/.+\.jpg',
'description': 'md5:aab948e2f7690214b5c28896c83f1fc1',
'upload_date': '20150223',
'timestamp': 1424706608,
'display_id': 'yvonne-loriod-olivier-messiaen',
},
'skip': 'Requires authentication; preview starts in the middle',
}, {
'url': 'https://www.medici.tv/en/jazz/makaya-mccraven-la-rochelle',
'md5': '4cc279a8b06609782747c8f50beea2b3',
'info_dict': {
'id': '7922',
'ext': 'mp4',
'title': 'NEW: Makaya McCraven in La Rochelle',
'thumbnail': r're:https://.+/.+\.jpg',
'description': 'md5:b5a8aaeb6993d8ccb18bde8abb8aa8d2',
'upload_date': '20231228',
'timestamp': 1703754863,
'display_id': 'makaya-mccraven-la-rochelle',
},
'expected_warnings': [r'preview'],
}]
def _real_extract(self, url):
video_id = self._match_id(url)
display_id, subdomain = self._match_valid_url(url).group('id', 'sub')
self._request_webpage(url, display_id, 'Requesting CSRF token cookie')
# Sets csrftoken cookie
self._download_webpage(url, video_id)
MEDICI_URL = 'http://www.medici.tv/'
subdomain = 'edu-' if subdomain == 'edu' else ''
origin = f'https://{urllib.parse.urlparse(url).hostname}'
data = self._download_json(
MEDICI_URL, video_id,
data=urlencode_postdata({
'json': 'true',
'page': '/%s' % video_id,
'timezone_offset': -420,
}), headers={
'X-CSRFToken': self._get_cookies(url)['csrftoken'].value,
'X-Requested-With': 'XMLHttpRequest',
'Referer': MEDICI_URL,
'Content-Type': 'application/x-www-form-urlencoded',
})
f'https://api.medici.tv/{subdomain}satie/edito/movie-file/{display_id}/', display_id,
headers=filter_dict({
'Authorization': try_call(
lambda: urllib.parse.unquote(self._get_cookies(url)['auth._token.mAuth'].value)),
'Device-Type': 'web',
'Origin': origin,
'Referer': f'{origin}/',
'Accept': 'application/json, text/plain, */*',
}))
video = data['video']['videos']['video1']
if not traverse_obj(data, ('video', 'is_full_video')) and traverse_obj(
data, ('video', 'is_limited_by_user_access')):
self.report_warning(
'The full video is for subscribers only. Only previews will be downloaded. If you '
'have used the --cookies-from-browser option, try using the --cookies option instead')
title = video.get('nom') or data['title']
video_id = video.get('id') or video_id
formats = self._extract_f4m_formats(
update_url_query(video['url_akamai'], {
'hdcore': '3.1.0',
'plugin=aasp': '3.1.0.43.124',
}), video_id, f4m_id='hds')
description = data.get('meta_description')
thumbnail = video.get('url_thumbnail') or data.get('main_image')
upload_date = unified_strdate(data['video'].get('date'))
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
data['video']['video_url'], display_id, 'mp4')
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'id': str(data['id']),
'display_id': display_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(data, {
'title': ('title', {str}),
'description': ('subtitle', {str}),
'thumbnail': ('picture', {url_or_none}),
'timestamp': ('date_publish', {parse_iso8601}),
}),
}
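
The rewritten Medici extractor reduces to one authenticated JSON call; a sketch with `requests` (URL slug illustrative; cookie and header names as in the diff):

import urllib.parse
import requests

session = requests.Session()
session.get('https://www.medici.tv/en/concerts/some-concert')  # may set the auth cookie
headers = {'Device-Type': 'web', 'Origin': 'https://www.medici.tv',
           'Referer': 'https://www.medici.tv/'}
if token := session.cookies.get('auth._token.mAuth'):
    headers['Authorization'] = urllib.parse.unquote(token)
data = session.get(
    'https://api.medici.tv/satie/edito/movie-file/some-concert/', headers=headers).json()
# data['video']['video_url'] is an HLS playlist; without auth, only a preview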

View file

@ -1,4 +1,4 @@
from base64 import b64decode
import base64
from .common import InfoExtractor
from ..utils import (
@ -81,7 +81,7 @@ class MicrosoftStreamIE(InfoExtractor):
'url': thumbnail_url,
}
thumb_name = url_basename(thumbnail_url)
thumb_name = str(b64decode(thumb_name + '=' * (-len(thumb_name) % 4)))
thumb_name = str(base64.b64decode(thumb_name + '=' * (-len(thumb_name) % 4)))
thumb.update(parse_resolution(thumb_name))
thumbnails.append(thumb)

View file

@ -1,5 +1,13 @@
from .common import InfoExtractor
from ..utils import UserNotLive, traverse_obj
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
UserNotLive,
int_or_none,
str_or_none,
url_or_none,
)
from ..utils.traversal import traverse_obj
class MixchIE(InfoExtractor):
@ -7,17 +15,20 @@ class MixchIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?mixch\.tv/u/(?P<id>\d+)'
_TESTS = [{
'url': 'https://mixch.tv/u/16236849/live',
'url': 'https://mixch.tv/u/16943797/live',
'skip': 'don\'t know if this live persists',
'info_dict': {
'id': '16236849',
'title': '24配信シェア⭕投票🙏💦',
'comment_count': 13145,
'view_count': 28348,
'timestamp': 1636189377,
'uploader': '🦥伊咲👶🏻#フレアワ',
'uploader_id': '16236849',
}
'id': '16943797',
'ext': 'mp4',
'title': '#EntView #カリナ #セブチ 2024-05-05 06:58',
'comment_count': int,
'view_count': int,
'timestamp': 1714726805,
'uploader': 'Ent.View K-news🎶💕',
'uploader_id': '16943797',
'live_status': 'is_live',
'upload_date': '20240503',
},
}, {
'url': 'https://mixch.tv/u/16137876/live',
'only_matching': True,
@ -25,31 +36,41 @@ class MixchIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(f'https://mixch.tv/u/{video_id}/live', video_id)
initial_js_state = self._parse_json(self._search_regex(
r'(?m)^\s*window\.__INITIAL_JS_STATE__\s*=\s*(\{.+?\});\s*$', webpage, 'initial JS state'), video_id)
if not initial_js_state.get('liveInfo'):
data = self._download_json(f'https://mixch.tv/api-web/users/{video_id}/live', video_id)
if not traverse_obj(data, ('liveInfo', {dict})):
raise UserNotLive(video_id=video_id)
return {
'id': video_id,
'title': traverse_obj(initial_js_state, ('liveInfo', 'title')),
'comment_count': traverse_obj(initial_js_state, ('liveInfo', 'comments')),
'view_count': traverse_obj(initial_js_state, ('liveInfo', 'visitor')),
'timestamp': traverse_obj(initial_js_state, ('liveInfo', 'created')),
'uploader': traverse_obj(initial_js_state, ('broadcasterInfo', 'name')),
'uploader_id': video_id,
**traverse_obj(data, {
'title': ('liveInfo', 'title', {str}),
'comment_count': ('liveInfo', 'comments', {int_or_none}),
'view_count': ('liveInfo', 'visitor', {int_or_none}),
'timestamp': ('liveInfo', 'created', {int_or_none}),
'uploader': ('broadcasterInfo', 'name', {str}),
}),
'formats': [{
'format_id': 'hls',
'url': (traverse_obj(initial_js_state, ('liveInfo', 'hls'))
or f'https://d1hd0ww6piyb43.cloudfront.net/hls/torte_{video_id}.m3u8'),
'url': data['liveInfo']['hls'],
'ext': 'mp4',
'protocol': 'm3u8',
}],
'is_live': True,
'__post_extractor': self.extract_comments(video_id),
}
def _get_comments(self, video_id):
yield from traverse_obj(self._download_json(
f'https://mixch.tv/api-web/lives/{video_id}/messages', video_id,
note='Downloading comments', errnote='Failed to download comments'), (..., {
'author': ('name', {str}),
'author_id': ('user_id', {str_or_none}),
'id': ('message_id', {str}, {lambda x: x or None}),
'text': ('body', {str}),
'timestamp': ('created', {int}),
}))
class MixchArchiveIE(InfoExtractor):
IE_NAME = 'mixch:archive'
@ -60,22 +81,38 @@ class MixchArchiveIE(InfoExtractor):
'skip': 'paid video, no DRM. expires at Jan 23',
'info_dict': {
'id': '421',
'ext': 'mp4',
'title': '96NEKO SHOW TIME',
}
}, {
'url': 'https://mixch.tv/archive/1213',
'skip': 'paid video, no DRM. expires at Dec 31, 2023',
'info_dict': {
'id': '1213',
'ext': 'mp4',
'title': '【特別トーク番組アーカイブス】Merm4id×燐舞曲 2nd LIVE「VERSUS」',
'release_date': '20231201',
'thumbnail': str,
}
}, {
'url': 'https://mixch.tv/archive/1214',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
html5_videos = self._parse_html5_media_entries(
url, webpage.replace('video-js', 'video'), video_id, 'hls')
if not html5_videos:
self.raise_login_required(method='cookies')
infodict = html5_videos[0]
infodict.update({
try:
info_json = self._download_json(
f'https://mixch.tv/api-web/archive/{video_id}', video_id)['archive']
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 401:
self.raise_login_required()
raise
return {
'id': video_id,
'title': self._html_search_regex(r'class="archive-title">(.+?)</', webpage, 'title')
})
return infodict
'title': traverse_obj(info_json, ('title', {str})),
'formats': self._extract_m3u8_formats(info_json['archiveURL'], video_id),
'thumbnail': traverse_obj(info_json, ('thumbnailURL', {url_or_none})),
}
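
The shape of the new comments mapping, with a stubbed API response (field names as in the diff):

api_response = [{'name': 'viewer', 'user_id': 123, 'message_id': 'm1',
                 'body': 'hello', 'created': 1714726805}]
comments = [{
    'author': msg.get('name'),
    'author_id': str(msg['user_id']) if msg.get('user_id') is not None else None,
    'id': msg.get('message_id') or None,
    'text': msg.get('body'),
    'timestamp': msg.get('created'),
} for msg in api_response]
print(comments[0]['author_id'], comments[0]['text'])  # 123 hello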

View file

@ -1,4 +1,4 @@
import datetime
import datetime as dt
import re
import urllib.parse
@ -151,7 +151,7 @@ class MotherlessIE(InfoExtractor):
'd': 'days',
}
kwargs = {_AGO_UNITS.get(uploaded_ago[-1]): delta}
upload_date = (datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(**kwargs)).strftime('%Y%m%d')
upload_date = (dt.datetime.now(dt.timezone.utc) - dt.timedelta(**kwargs)).strftime('%Y%m%d')
comment_count = len(re.findall(r'''class\s*=\s*['"]media-comment-contents\b''', webpage))
uploader_id = self._html_search_regex(
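
The relative-date math from this hunk, runnable standalone (the 'ago' string is an illustrative example; `delta` is parsed from it here since the hunk elides that part):

import datetime as dt
import re

_AGO_UNITS = {'h': 'hours', 'd': 'days'}
uploaded_ago = '3d'
delta = int(re.search(r'\d+', uploaded_ago).group())
kwargs = {_AGO_UNITS.get(uploaded_ago[-1]): delta}
upload_date = (dt.datetime.now(dt.timezone.utc) - dt.timedelta(**kwargs)).strftime('%Y%m%d')
print(upload_date)  # today minus three days, e.g. 20240509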

View file

@ -4,8 +4,8 @@ import hmac
import itertools
import json
import re
import urllib.parse
import time
from urllib.parse import parse_qs, urlparse
from .common import InfoExtractor
from ..utils import (
@ -388,7 +388,7 @@ class NaverNowIE(NaverBaseIE):
def _real_extract(self, url):
show_id = self._match_id(url)
qs = parse_qs(urlparse(url).query)
qs = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
if not self._yes_playlist(show_id, qs.get('shareHightlight')):
return self._extract_highlight(show_id, qs['shareHightlight'][0])
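
Behavior is unchanged by the import rework; for reference ('shareHightlight' is the site's own spelling, kept verbatim; URL illustrative):

import urllib.parse

url = 'https://now.naver.com/s/show.123?shareHightlight=456'
qs = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
print(qs['shareHightlight'][0])  # '456'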

View file

@ -1,9 +1,9 @@
import hashlib
import itertools
import json
import random
import re
import time
from hashlib import md5
from random import randint
from .common import InfoExtractor
from ..aes import aes_ecb_encrypt, pkcs7_padding
@ -34,7 +34,7 @@ class NetEaseMusicBaseIE(InfoExtractor):
request_text = json.dumps({**query_body, 'header': cookies}, separators=(',', ':'))
message = f'nobody{api_path}use{request_text}md5forencrypt'.encode('latin1')
msg_digest = md5(message).hexdigest()
msg_digest = hashlib.md5(message).hexdigest()
data = pkcs7_padding(list(str.encode(
f'{api_path}-36cd479b6b5-{request_text}-36cd479b6b5-{msg_digest}')))
@ -53,7 +53,7 @@ class NetEaseMusicBaseIE(InfoExtractor):
'__csrf': '',
'os': 'pc',
'channel': 'undefined',
'requestId': f'{int(time.time() * 1000)}_{randint(0, 1000):04}',
'requestId': f'{int(time.time() * 1000)}_{random.randint(0, 1000):04}',
**traverse_obj(self._get_cookies(self._API_BASE), {
'MUSIC_U': ('MUSIC_U', {lambda i: i.value}),
})
@ -561,7 +561,8 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
'timestamp': ('createTime', {self.kilo_or_none}),
})
if not self._yes_playlist(info['songs'] and program_id, info['mainSong']['id']):
if not self._yes_playlist(
info['songs'] and program_id, info['mainSong']['id'], playlist_label='program', video_label='song'):
formats = self.extract_formats(info['mainSong'])
return {


@ -5,7 +5,6 @@ from ..utils import (
merge_dicts,
parse_count,
url_or_none,
urljoin,
)
from ..utils.traversal import traverse_obj
@ -16,8 +15,7 @@ class NFBBaseIE(InfoExtractor):
def _extract_ep_data(self, webpage, video_id, fatal=False):
return self._search_json(
r'const\s+episodesData\s*=', webpage, 'episode data', video_id,
contains_pattern=r'\[\s*{(?s:.+)}\s*\]', fatal=fatal) or []
r'episodesData\s*:', webpage, 'episode data', video_id, fatal=fatal) or {}
def _extract_ep_info(self, data, video_id, slug=None):
info = traverse_obj(data, (lambda _, v: video_id in v['embed_url'], {
@ -224,18 +222,14 @@ class NFBIE(NFBBaseIE):
# type_ can change from film to serie(s) after redirect; new slug may have episode number
type_, slug = self._match_valid_url(urlh.url).group('type', 'id')
embed_url = urljoin(f'https://www.{site}.ca', self._html_search_regex(
r'<[^>]+\bid=["\']player-iframe["\'][^>]*\bsrc=["\']([^"\']+)', webpage, 'embed url'))
video_id = self._match_id(embed_url) # embed url has unique slug
player = self._download_webpage(embed_url, video_id, 'Downloading player page')
if 'MESSAGE_GEOBLOCKED' in player:
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
player_data = self._search_json(
r'window\.PLAYER_OPTIONS\[[^\]]+\]\s*=', webpage, 'player data', slug)
video_id = self._match_id(player_data['overlay']['url']) # overlay url always has unique slug
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
self._html_search_regex(r'source:\s*\'([^\']+)', player, 'm3u8 url'),
video_id, 'mp4', m3u8_id='hls')
player_data['source'], video_id, 'mp4', m3u8_id='hls')
if dv_source := self._html_search_regex(r'dvSource:\s*\'([^\']+)', player, 'dv', default=None):
if dv_source := url_or_none(player_data.get('dvSource')):
fmts, subs = self._extract_m3u8_formats_and_subtitles(
dv_source, video_id, 'mp4', m3u8_id='dv', preference=-2, fatal=False)
for fmt in fmts:
@ -246,17 +240,16 @@ class NFBIE(NFBBaseIE):
info = {
'id': video_id,
'title': self._html_search_regex(
r'<[^>]+\bid=["\']titleHeader["\'][^>]*>\s*<h1[^>]*>\s*([^<]+?)\s*</h1>',
r'["\']nfb_version_title["\']\s*:\s*["\']([^"\']+)',
webpage, 'title', default=None),
'description': self._html_search_regex(
r'<[^>]+\bid=["\']tabSynopsis["\'][^>]*>\s*<p[^>]*>\s*([^<]+)',
webpage, 'description', default=None),
'thumbnail': self._html_search_regex(
r'poster:\s*\'([^\']+)', player, 'thumbnail', default=None),
'thumbnail': url_or_none(player_data.get('poster')),
'uploader': self._html_search_regex(
r'<[^>]+\bitemprop=["\']name["\'][^>]*>([^<]+)', webpage, 'uploader', default=None),
r'<[^>]+\bitemprop=["\']director["\'][^>]*>([^<]+)', webpage, 'uploader', default=None),
'release_year': int_or_none(self._html_search_regex(
r'<[^>]+\bitemprop=["\']datePublished["\'][^>]*>([^<]+)',
r'["\']nfb_version_year["\']\s*:\s*["\']([^"\']+)',
webpage, 'release_year', default=None)),
} if type_ == 'film' else self._extract_ep_info(self._extract_ep_data(webpage, video_id, slug), video_id)
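
A small pattern worth calling out from this rewrite: gating an optional field on validation with the walrus operator, so the block runs only when the value exists and looks like a URL. A minimal sketch with invented sample data:

from yt_dlp.utils import url_or_none

player_data = {'dvSource': 'https://example.com/dv/master.m3u8'}
if dv_source := url_or_none(player_data.get('dvSource')):
    # reached only for a present, well-formed URL; a missing key or junk value yields None
    print(dv_source)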


@ -8,6 +8,7 @@ from ..utils import (
int_or_none,
join_nonempty,
parse_duration,
remove_end,
traverse_obj,
try_call,
unescapeHTML,
@ -19,8 +20,7 @@ from ..utils import (
class NhkBaseIE(InfoExtractor):
_API_URL_TEMPLATE = 'https://nwapi.nhk.jp/nhkworld/%sod%slist/v7b/%s/%s/%s/all%s.json'
_BASE_URL_REGEX = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand'
_TYPE_REGEX = r'/(?P<type>video|audio)/'
_BASE_URL_REGEX = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/'
def _call_api(self, m_id, lang, is_video, is_episode, is_clip):
return self._download_json(
@ -83,7 +83,7 @@ class NhkBaseIE(InfoExtractor):
def _extract_episode_info(self, url, episode=None):
fetch_episode = episode is None
lang, m_type, episode_id = NhkVodIE._match_valid_url(url).group('lang', 'type', 'id')
is_video = m_type == 'video'
is_video = m_type != 'audio'
if is_video:
episode_id = episode_id[:4] + '-' + episode_id[4:]
@ -138,9 +138,10 @@ class NhkBaseIE(InfoExtractor):
else:
if fetch_episode:
audio_path = episode['audio']['audio']
# From https://www3.nhk.or.jp/nhkworld/common/player/radio/inline/rod.html
audio_path = remove_end(episode['audio']['audio'], '.m4a')
info['formats'] = self._extract_m3u8_formats(
'https://nhkworld-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
f'{urljoin("https://vod-stream.nhk.jp", audio_path)}/index.m3u8',
episode_id, 'm4a', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)
for f in info['formats']:
@ -155,9 +156,11 @@ class NhkBaseIE(InfoExtractor):
class NhkVodIE(NhkBaseIE):
# the 7-character IDs can have alphabetic chars too: assume [a-z] rather than just [a-f], eg
_VALID_URL = [rf'{NhkBaseIE._BASE_URL_REGEX}/(?P<type>video)/(?P<id>[0-9a-z]+)',
rf'{NhkBaseIE._BASE_URL_REGEX}/(?P<type>audio)/(?P<id>[^/?#]+?-\d{{8}}-[0-9a-z]+)']
_VALID_URL = [
rf'{NhkBaseIE._BASE_URL_REGEX}shows/(?:(?P<type>video)/)?(?P<id>\d{{4}}[\da-z]\d+)/?(?:$|[?#])',
rf'{NhkBaseIE._BASE_URL_REGEX}(?:ondemand|shows)/(?P<type>audio)/(?P<id>[^/?#]+?-\d{{8}}-[\da-z]+)',
rf'{NhkBaseIE._BASE_URL_REGEX}ondemand/(?P<type>video)/(?P<id>\d{{4}}[\da-z]\d+)', # deprecated
]
# Content available only for a limited period of time. Visit
# https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
_TESTS = [{
@ -167,17 +170,16 @@ class NhkVodIE(NhkBaseIE):
'ext': 'mp4',
'title': 'Japan Railway Journal - The Tohoku Shinkansen: Full Speed Ahead',
'description': 'md5:49f7c5b206e03868a2fdf0d0814b92f6',
'thumbnail': 'md5:51bcef4a21936e7fea1ff4e06353f463',
'thumbnail': r're:https://.+/.+\.jpg',
'episode': 'The Tohoku Shinkansen: Full Speed Ahead',
'series': 'Japan Railway Journal',
'modified_timestamp': 1694243656,
'modified_timestamp': 1707217907,
'timestamp': 1681428600,
'release_timestamp': 1693883728,
'duration': 1679,
'upload_date': '20230413',
'modified_date': '20230909',
'modified_date': '20240206',
'release_date': '20230905',
},
}, {
# video clip
@ -188,15 +190,15 @@ class NhkVodIE(NhkBaseIE):
'ext': 'mp4',
'title': 'Dining with the Chef - Chef Saito\'s Family recipe: MENCHI-KATSU',
'description': 'md5:5aee4a9f9d81c26281862382103b0ea5',
'thumbnail': 'md5:d6a4d9b6e9be90aaadda0bcce89631ed',
'thumbnail': r're:https://.+/.+\.jpg',
'series': 'Dining with the Chef',
'episode': 'Chef Saito\'s Family recipe: MENCHI-KATSU',
'duration': 148,
'upload_date': '20190816',
'release_date': '20230902',
'release_timestamp': 1693619292,
'modified_timestamp': 1694168033,
'modified_date': '20230908',
'modified_timestamp': 1707217907,
'modified_date': '20240206',
'timestamp': 1565997540,
},
}, {
@ -208,7 +210,7 @@ class NhkVodIE(NhkBaseIE):
'title': 'Living in Japan - Tips for Travelers to Japan / Ramen Vending Machines',
'series': 'Living in Japan',
'description': 'md5:0a0e2077d8f07a03071e990a6f51bfab',
'thumbnail': 'md5:960622fb6e06054a4a1a0c97ea752545',
'thumbnail': r're:https://.+/.+\.jpg',
'episode': 'Tips for Travelers to Japan / Ramen Vending Machines'
},
}, {
@ -245,7 +247,7 @@ class NhkVodIE(NhkBaseIE):
'title': 'おはよう日本7時台 - 10月8日放送',
'series': 'おはよう日本7時台',
'episode': '10月8日放送',
'thumbnail': 'md5:d733b1c8e965ab68fb02b2d347d0e9b4',
'thumbnail': r're:https://.+/.+\.jpg',
'description': 'md5:9c1d6cbeadb827b955b20e99ab920ff0',
},
'skip': 'expires 2023-10-15',
@ -255,17 +257,100 @@ class NhkVodIE(NhkBaseIE):
'info_dict': {
'id': 'nw_vod_v_en_3004_952_20230723091000_01_1690074552',
'ext': 'mp4',
'title': 'Barakan Discovers AMAMI OSHIMA: Isson\'s Treasure Island',
'title': 'Barakan Discovers - AMAMI OSHIMA: Isson\'s Treasure Isla',
'description': 'md5:5db620c46a0698451cc59add8816b797',
'thumbnail': 'md5:67d9ff28009ba379bfa85ad1aaa0e2bd',
'thumbnail': r're:https://.+/.+\.jpg',
'release_date': '20230905',
'timestamp': 1690103400,
'duration': 2939,
'release_timestamp': 1693898699,
'modified_timestamp': 1698057495,
'modified_date': '20231023',
'upload_date': '20230723',
'modified_timestamp': 1707217907,
'modified_date': '20240206',
'episode': 'AMAMI OSHIMA: Isson\'s Treasure Isla',
'series': 'Barakan Discovers',
},
}, {
# /ondemand/video/ url with alphabetical character in 5th position of id
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/9999a07/',
'info_dict': {
'id': 'nw_c_en_9999-a07',
'ext': 'mp4',
'episode': 'Mini-Dramas on SDGs: Ep 1 Close the Gender Gap [Director\'s Cut]',
'series': 'Mini-Dramas on SDGs',
'modified_date': '20240206',
'title': 'Mini-Dramas on SDGs - Mini-Dramas on SDGs: Ep 1 Close the Gender Gap [Director\'s Cut]',
'description': 'md5:3f9dcb4db22fceb675d90448a040d3f6',
'timestamp': 1621962360,
'duration': 189,
'release_date': '20230903',
'modified_timestamp': 1707217907,
'upload_date': '20210525',
'thumbnail': r're:https://.+/.+\.jpg',
'release_timestamp': 1693713487,
},
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/9999d17/',
'info_dict': {
'id': 'nw_c_en_9999-d17',
'ext': 'mp4',
'title': 'Flowers of snow blossom - The 72 Pentads of Yamato',
'description': 'Todays focus: Snow',
'release_timestamp': 1693792402,
'release_date': '20230904',
'upload_date': '20220128',
'timestamp': 1643370960,
'thumbnail': r're:https://.+/.+\.jpg',
'duration': 136,
'series': '',
'modified_date': '20240206',
'modified_timestamp': 1707217907,
},
}, {
# new /shows/ url format
'url': 'https://www3.nhk.or.jp/nhkworld/en/shows/2032307/',
'info_dict': {
'id': 'nw_vod_v_en_2032_307_20240321113000_01_1710990282',
'ext': 'mp4',
'title': 'Japanology Plus - 20th Anniversary Special Part 1',
'description': 'md5:817d41fc8e54339ad2a916161ea24faf',
'episode': '20th Anniversary Special Part 1',
'series': 'Japanology Plus',
'thumbnail': r're:https://.+/.+\.jpg',
'duration': 1680,
'timestamp': 1711020600,
'upload_date': '20240321',
'release_timestamp': 1711022683,
'release_date': '20240321',
'modified_timestamp': 1711031012,
'modified_date': '20240321',
},
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/shows/3020025/',
'info_dict': {
'id': 'nw_vod_v_en_3020_025_20230325144000_01_1679723944',
'ext': 'mp4',
'title': '100 Ideas to Save the World - Working Styles Evolve',
'description': 'md5:9e6c7778eaaf4f7b4af83569649f84d9',
'episode': 'Working Styles Evolve',
'series': '100 Ideas to Save the World',
'thumbnail': r're:https://.+/.+\.jpg',
'duration': 899,
'upload_date': '20230325',
'timestamp': 1679755200,
'release_date': '20230905',
'release_timestamp': 1693880540,
'modified_date': '20240206',
'modified_timestamp': 1707217907,
},
}, {
# new /shows/audio/ url format
'url': 'https://www3.nhk.or.jp/nhkworld/en/shows/audio/livinginjapan-20231001-1/',
'only_matching': True,
}, {
# valid url even if can't be found in wild; support needed for clip entries extraction
'url': 'https://www3.nhk.or.jp/nhkworld/en/shows/9999o80/',
'only_matching': True,
}]
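
For illustration only, the new shows/ pattern can be exercised directly with re (sample URL taken from the tests above; real extraction goes through _match_valid_url):

import re

BASE = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/'
SHOWS = rf'{BASE}shows/(?:(?P<type>video)/)?(?P<id>\d{{4}}[\da-z]\d+)/?(?:$|[?#])'
m = re.match(SHOWS, 'https://www3.nhk.or.jp/nhkworld/en/shows/2032307/')
# m.group('lang', 'type', 'id') == ('en', None, '2032307')
# the absent video/ segment leaves type as None, which is_video = m_type != 'audio' treats as video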
def _real_extract(self, url):
@ -273,18 +358,21 @@ class NhkVodIE(NhkBaseIE):
class NhkVodProgramIE(NhkBaseIE):
_VALID_URL = rf'{NhkBaseIE._BASE_URL_REGEX}/program{NhkBaseIE._TYPE_REGEX}(?P<id>\w+)(?:.+?\btype=(?P<episode_type>clip|(?:radio|tv)Episode))?'
_VALID_URL = rf'''(?x)
{NhkBaseIE._BASE_URL_REGEX}(?:shows|tv)/
(?:(?P<type>audio)/programs/)?(?P<id>\w+)/?
(?:\?(?:[^#]+&)?type=(?P<episode_type>clip|(?:radio|tv)Episode))?'''
_TESTS = [{
# video program episodes
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/program/video/sumo',
'url': 'https://www3.nhk.or.jp/nhkworld/en/shows/sumo/',
'info_dict': {
'id': 'sumo',
'title': 'GRAND SUMO Highlights',
'description': 'md5:fc20d02dc6ce85e4b72e0273aa52fdbf',
},
'playlist_mincount': 0,
'playlist_mincount': 1,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/program/video/japanrailway',
'url': 'https://www3.nhk.or.jp/nhkworld/en/shows/japanrailway/',
'info_dict': {
'id': 'japanrailway',
'title': 'Japan Railway Journal',
@ -293,40 +381,68 @@ class NhkVodProgramIE(NhkBaseIE):
'playlist_mincount': 12,
}, {
# video program clips
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/program/video/japanrailway/?type=clip',
'url': 'https://www3.nhk.or.jp/nhkworld/en/shows/japanrailway/?type=clip',
'info_dict': {
'id': 'japanrailway',
'title': 'Japan Railway Journal',
'description': 'md5:ea39d93af7d05835baadf10d1aae0e3f',
},
'playlist_mincount': 5,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/program/video/10yearshayaomiyazaki/',
'only_matching': True,
'playlist_mincount': 12,
}, {
# audio program
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/program/audio/listener/',
'url': 'https://www3.nhk.or.jp/nhkworld/en/shows/audio/programs/livinginjapan/',
'info_dict': {
'id': 'livinginjapan',
'title': 'Living in Japan',
'description': 'md5:665bb36ec2a12c5a7f598ee713fc2b54',
},
'playlist_mincount': 12,
}, {
# /tv/ program url
'url': 'https://www3.nhk.or.jp/nhkworld/en/tv/designtalksplus/',
'info_dict': {
'id': 'designtalksplus',
'title': 'DESIGN TALKS plus',
'description': 'md5:47b3b3a9f10d4ac7b33b53b70a7d2837',
},
'playlist_mincount': 20,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/shows/10yearshayaomiyazaki/',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if NhkVodIE.suitable(url) else super().suitable(url)
def _extract_meta_from_class_elements(self, class_values, html):
for class_value in class_values:
if value := clean_html(get_element_by_class(class_value, html)):
return value
def _real_extract(self, url):
lang, m_type, program_id, episode_type = self._match_valid_url(url).group('lang', 'type', 'id', 'episode_type')
episodes = self._call_api(
program_id, lang, m_type == 'video', False, episode_type == 'clip')
program_id, lang, m_type != 'audio', False, episode_type == 'clip')
entries = []
for episode in episodes:
episode_path = episode.get('url')
if not episode_path:
continue
entries.append(self._extract_episode_info(
urljoin(url, episode_path), episode))
def entries():
for episode in episodes:
if episode_path := episode.get('url'):
yield self._extract_episode_info(urljoin(url, episode_path), episode)
html = self._download_webpage(url, program_id)
program_title = clean_html(get_element_by_class('p-programDetail__title', html))
program_description = clean_html(get_element_by_class('p-programDetail__text', html))
program_title = self._extract_meta_from_class_elements([
'p-programDetail__title', # /ondemand/program/
'pProgramHero__logoText', # /shows/
'tAudioProgramMain__title', # /shows/audio/programs/
'p-program-name'], html) # /tv/
program_description = self._extract_meta_from_class_elements([
'p-programDetail__text', # /ondemand/program/
'pProgramHero__description', # /shows/
'tAudioProgramMain__info', # /shows/audio/programs/
'p-program-description'], html) # /tv/
return self.playlist_result(entries, program_id, program_title, program_description)
return self.playlist_result(entries(), program_id, program_title, program_description)
class NhkForSchoolBangumiIE(InfoExtractor):


@ -1,11 +1,10 @@
import datetime
import datetime as dt
import functools
import itertools
import json
import re
import time
from urllib.parse import urlparse
import urllib.parse
from .common import InfoExtractor, SearchInfoExtractor
from ..networking import Request
@ -820,12 +819,12 @@ class NicovideoSearchDateIE(NicovideoSearchBaseIE, SearchInfoExtractor):
'playlist_mincount': 1610,
}]
_START_DATE = datetime.date(2007, 1, 1)
_START_DATE = dt.date(2007, 1, 1)
_RESULTS_PER_PAGE = 32
_MAX_PAGES = 50
def _entries(self, url, item_id, start_date=None, end_date=None):
start_date, end_date = start_date or self._START_DATE, end_date or datetime.datetime.now().date()
start_date, end_date = start_date or self._START_DATE, end_date or dt.datetime.now().date()
# If the last page has a full page of videos, we need to break down the query interval further
last_page_len = len(list(self._get_entries_for_date(
@ -957,7 +956,7 @@ class NiconicoLiveIE(InfoExtractor):
'frontend_id': traverse_obj(embedded_data, ('site', 'frontendId')) or '9',
})
hostname = remove_start(urlparse(urlh.url).hostname, 'sp.')
hostname = remove_start(urllib.parse.urlparse(urlh.url).hostname, 'sp.')
latency = try_get(self._configuration_arg('latency'), lambda x: x[0])
if latency not in self._KNOWN_LATENCY:
latency = 'high'


@ -1,8 +1,8 @@
import calendar
import json
import datetime as dt
import functools
from datetime import datetime, timezone
from random import random
import json
import random
from .common import InfoExtractor
from ..compat import (
@ -243,7 +243,7 @@ class PanoptoIE(PanoptoBaseIE):
invocation_id = delivery_info.get('InvocationId')
stream_id = traverse_obj(delivery_info, ('Delivery', 'Streams', ..., 'PublicID'), get_all=False, expected_type=str)
if invocation_id and stream_id and duration:
timestamp_str = f'/Date({calendar.timegm(datetime.now(timezone.utc).timetuple())}000)/'
timestamp_str = f'/Date({calendar.timegm(dt.datetime.now(dt.timezone.utc).timetuple())}000)/'
data = {
'streamRequests': [
{
@ -415,7 +415,7 @@ class PanoptoIE(PanoptoBaseIE):
'cast': traverse_obj(delivery, ('Contributors', ..., 'DisplayName'), expected_type=lambda x: x or None),
'timestamp': session_start_time - 11640000000 if session_start_time else None,
'duration': delivery.get('Duration'),
'thumbnail': base_url + f'/Services/FrameGrabber.svc/FrameRedirect?objectId={video_id}&mode=Delivery&random={random()}',
'thumbnail': base_url + f'/Services/FrameGrabber.svc/FrameRedirect?objectId={video_id}&mode=Delivery&random={random.random()}',
'average_rating': delivery.get('AverageRating'),
'chapters': self._extract_chapters(timestamps),
'uploader': delivery.get('OwnerDisplayName') or None,
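
A quick note on the timestamp_str built above: /Date(<ms>)/ is the legacy ASP.NET JSON date format, i.e. milliseconds since the Unix epoch wrapped in /Date()/. Standalone:

import calendar
import datetime as dt

epoch_seconds = calendar.timegm(dt.datetime.now(dt.timezone.utc).timetuple())
timestamp_str = f'/Date({epoch_seconds}000)/'  # appending '000' scales seconds to milliseconds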


@ -1,8 +1,8 @@
import itertools
import urllib.parse
from .common import InfoExtractor
from .vimeo import VimeoIE
from ..compat import compat_urllib_parse_unquote
from ..networking.exceptions import HTTPError
from ..utils import (
KNOWN_EXTENSIONS,
@ -14,7 +14,6 @@ from ..utils import (
parse_iso8601,
str_or_none,
traverse_obj,
try_get,
url_or_none,
urljoin,
)
@ -92,7 +91,7 @@ class PatreonIE(PatreonBaseIE):
'thumbnail': 're:^https?://.*$',
'upload_date': '20150211',
'description': 'md5:8af6425f50bd46fbf29f3db0fc3a8364',
'uploader_id': 'TraciJHines',
'uploader_id': '@TraciHinesMusic',
'categories': ['Entertainment'],
'duration': 282,
'view_count': int,
@ -106,8 +105,10 @@ class PatreonIE(PatreonBaseIE):
'availability': 'public',
'channel_follower_count': int,
'playable_in_embed': True,
'uploader_url': 'http://www.youtube.com/user/TraciJHines',
'uploader_url': 'https://www.youtube.com/@TraciHinesMusic',
'comment_count': int,
'channel_is_verified': True,
'chapters': 'count:4',
},
'params': {
'noplaylist': True,
@ -176,7 +177,71 @@ class PatreonIE(PatreonBaseIE):
'uploader_url': 'https://www.patreon.com/thenormies',
},
'skip': 'Patron-only content',
}, {
# dead vimeo and embed URLs, need to extract post_file
'url': 'https://www.patreon.com/posts/hunter-x-hunter-34007913',
'info_dict': {
'id': '34007913',
'ext': 'mp4',
'title': 'Hunter x Hunter | Kurapika DESTROYS Uvogin!!!',
'like_count': int,
'uploader': 'YaBoyRoshi',
'timestamp': 1581636833,
'channel_url': 'https://www.patreon.com/yaboyroshi',
'thumbnail': r're:^https?://.*$',
'tags': ['Hunter x Hunter'],
'uploader_id': '14264111',
'comment_count': int,
'channel_follower_count': int,
'description': 'Kurapika is a walking cheat code!',
'upload_date': '20200213',
'channel_id': '2147162',
'uploader_url': 'https://www.patreon.com/yaboyroshi',
},
}, {
# NSFW vimeo embed URL
'url': 'https://www.patreon.com/posts/4k-spiderman-4k-96414599',
'info_dict': {
'id': '902250943',
'ext': 'mp4',
'title': '❤️(4K) Spiderman Girl Yeonhwas Gift ❤️(4K) 스파이더맨걸 연화의 선물',
'description': '❤️(4K) Spiderman Girl Yeonhwas Gift \n❤️(4K) 스파이더맨걸 연화의 선물',
'uploader': 'Npickyeonhwa',
'uploader_id': '90574422',
'uploader_url': 'https://www.patreon.com/Yeonhwa726',
'channel_id': '10237902',
'channel_url': 'https://www.patreon.com/Yeonhwa726',
'duration': 70,
'timestamp': 1705150153,
'upload_date': '20240113',
'comment_count': int,
'like_count': int,
'thumbnail': r're:^https?://.+',
},
'params': {'skip_download': 'm3u8'},
}, {
# multiple attachments/embeds
'url': 'https://www.patreon.com/posts/holy-wars-solos-100601977',
'playlist_count': 3,
'info_dict': {
'id': '100601977',
'title': '"Holy Wars" (Megadeth) Solos Transcription & Lesson/Analysis',
'description': 'md5:d099ab976edfce6de2a65c2b169a88d3',
'uploader': 'Bradley Hall',
'uploader_id': '24401883',
'uploader_url': 'https://www.patreon.com/bradleyhallguitar',
'channel_id': '3193932',
'channel_url': 'https://www.patreon.com/bradleyhallguitar',
'channel_follower_count': int,
'timestamp': 1710777855,
'upload_date': '20240318',
'like_count': int,
'comment_count': int,
'thumbnail': r're:^https?://.+',
},
'skip': 'Patron-only content',
}]
_RETURN_TYPE = 'video'
def _real_extract(self, url):
video_id = self._match_id(url)
@ -191,102 +256,108 @@ class PatreonIE(PatreonBaseIE):
'include': 'audio,user,user_defined_tags,campaign,attachments_media',
})
attributes = post['data']['attributes']
title = attributes['title'].strip()
image = attributes.get('image') or {}
info = {
'id': video_id,
'title': title,
'description': clean_html(attributes.get('content')),
'thumbnail': image.get('large_url') or image.get('url'),
'timestamp': parse_iso8601(attributes.get('published_at')),
'like_count': int_or_none(attributes.get('like_count')),
'comment_count': int_or_none(attributes.get('comment_count')),
}
can_view_post = traverse_obj(attributes, 'current_user_can_view')
if can_view_post and info['comment_count']:
info['__post_extractor'] = self.extract_comments(video_id)
info = traverse_obj(attributes, {
'title': ('title', {str.strip}),
'description': ('content', {clean_html}),
'thumbnail': ('image', ('large_url', 'url'), {url_or_none}, any),
'timestamp': ('published_at', {parse_iso8601}),
'like_count': ('like_count', {int_or_none}),
'comment_count': ('comment_count', {int_or_none}),
})
for i in post.get('included', []):
i_type = i.get('type')
if i_type == 'media':
media_attributes = i.get('attributes') or {}
download_url = media_attributes.get('download_url')
entries = []
idx = 0
for include in traverse_obj(post, ('included', lambda _, v: v['type'])):
include_type = include['type']
if include_type == 'media':
media_attributes = traverse_obj(include, ('attributes', {dict})) or {}
download_url = url_or_none(media_attributes.get('download_url'))
ext = mimetype2ext(media_attributes.get('mimetype'))
# if size_bytes is None, this media file is likely unavailable
# See: https://github.com/yt-dlp/yt-dlp/issues/4608
size_bytes = int_or_none(media_attributes.get('size_bytes'))
if download_url and ext in KNOWN_EXTENSIONS and size_bytes is not None:
# XXX: what happens if there are multiple attachments?
return {
**info,
idx += 1
entries.append({
'id': f'{video_id}-{idx}',
'ext': ext,
'filesize': size_bytes,
'url': download_url,
}
elif i_type == 'user':
user_attributes = i.get('attributes')
if user_attributes:
info.update({
'uploader': user_attributes.get('full_name'),
'uploader_id': str_or_none(i.get('id')),
'uploader_url': user_attributes.get('url'),
})
elif i_type == 'post_tag':
info.setdefault('tags', []).append(traverse_obj(i, ('attributes', 'value')))
elif include_type == 'user':
info.update(traverse_obj(include, {
'uploader': ('attributes', 'full_name', {str}),
'uploader_id': ('id', {str_or_none}),
'uploader_url': ('attributes', 'url', {url_or_none}),
}))
elif i_type == 'campaign':
info.update({
'channel': traverse_obj(i, ('attributes', 'title')),
'channel_id': str_or_none(i.get('id')),
'channel_url': traverse_obj(i, ('attributes', 'url')),
'channel_follower_count': int_or_none(traverse_obj(i, ('attributes', 'patron_count'))),
})
elif include_type == 'post_tag':
if post_tag := traverse_obj(include, ('attributes', 'value', {str})):
info.setdefault('tags', []).append(post_tag)
elif include_type == 'campaign':
info.update(traverse_obj(include, {
'channel': ('attributes', 'title', {str}),
'channel_id': ('id', {str_or_none}),
'channel_url': ('attributes', 'url', {url_or_none}),
'channel_follower_count': ('attributes', 'patron_count', {int_or_none}),
}))
# handle Vimeo embeds
if try_get(attributes, lambda x: x['embed']['provider']) == 'Vimeo':
embed_html = try_get(attributes, lambda x: x['embed']['html'])
v_url = url_or_none(compat_urllib_parse_unquote(
self._search_regex(r'(https(?:%3A%2F%2F|://)player\.vimeo\.com.+app_id(?:=|%3D)+\d+)', embed_html, 'vimeo url', fatal=False)))
if v_url:
return {
**info,
'_type': 'url_transparent',
'url': VimeoIE._smuggle_referrer(v_url, 'https://patreon.com'),
'ie_key': 'Vimeo',
}
if traverse_obj(attributes, ('embed', 'provider')) == 'Vimeo':
v_url = urllib.parse.unquote(self._html_search_regex(
r'(https(?:%3A%2F%2F|://)player\.vimeo\.com.+app_id(?:=|%3D)+\d+)',
traverse_obj(attributes, ('embed', 'html', {str})), 'vimeo url', fatal=False) or '')
if url_or_none(v_url) and self._request_webpage(
v_url, video_id, 'Checking Vimeo embed URL',
headers={'Referer': 'https://patreon.com/'},
fatal=False, errnote=False):
entries.append(self.url_result(
VimeoIE._smuggle_referrer(v_url, 'https://patreon.com/'),
VimeoIE, url_transparent=True))
embed_url = try_get(attributes, lambda x: x['embed']['url'])
if embed_url:
return {
**info,
'_type': 'url',
'url': embed_url,
}
embed_url = traverse_obj(attributes, ('embed', 'url', {url_or_none}))
if embed_url and self._request_webpage(embed_url, video_id, 'Checking embed URL', fatal=False, errnote=False):
entries.append(self.url_result(embed_url))
post_file = traverse_obj(attributes, 'post_file')
post_file = traverse_obj(attributes, ('post_file', {dict}))
if post_file:
name = post_file.get('name')
ext = determine_ext(name)
if ext in KNOWN_EXTENSIONS:
return {
**info,
entries.append({
'id': video_id,
'ext': ext,
'url': post_file['url'],
}
})
elif name == 'video' or determine_ext(post_file.get('url')) == 'm3u8':
formats, subtitles = self._extract_m3u8_formats_and_subtitles(post_file['url'], video_id)
return {
**info,
entries.append({
'id': video_id,
'formats': formats,
'subtitles': subtitles,
}
})
if can_view_post is False:
can_view_post = traverse_obj(attributes, 'current_user_can_view')
comments = None
if can_view_post and info.get('comment_count'):
comments = self.extract_comments(video_id)
if not entries and can_view_post is False:
self.raise_no_formats('You do not have access to this post', video_id=video_id, expected=True)
else:
elif not entries:
self.raise_no_formats('No supported media found in this post', video_id=video_id, expected=True)
elif len(entries) == 1:
info.update(entries[0])
else:
for entry in entries:
entry.update(info)
return self.playlist_result(entries, video_id, **info, __post_extractor=comments)
info['id'] = video_id
info['__post_extractor'] = comments
return info
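
The post_file branch above keys on the attachment name's extension; both helpers come from yt_dlp.utils. For reference:

from yt_dlp.utils import KNOWN_EXTENSIONS, determine_ext

determine_ext('clip.mp4')   # -> 'mp4'
'mp4' in KNOWN_EXTENSIONS   # -> True
determine_ext('video')      # -> 'unknown_video' (no extension), hence the explicit name == 'video' check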
def _get_comments(self, post_id):


@ -1,5 +1,5 @@
from uuid import uuid4
import json
import uuid
from .common import InfoExtractor
from ..utils import (
@ -51,7 +51,7 @@ class PolsatGoIE(InfoExtractor):
}
def _call_api(self, endpoint, media_id, method, params):
rand_uuid = str(uuid4())
rand_uuid = str(uuid.uuid4())
res = self._download_json(
f'https://b2c-mobile.redefine.pl/rpc/{endpoint}/', media_id,
note=f'Downloading {method} JSON metadata',


@ -1,5 +1,6 @@
import datetime as dt
import json
from urllib.parse import unquote
import urllib.parse
from .common import InfoExtractor
from ..compat import functools
@ -114,7 +115,7 @@ class Pr0grammIE(InfoExtractor):
cookies = self._get_cookies(self.BASE_URL)
if 'me' not in cookies:
self._download_webpage(self.BASE_URL, None, 'Refreshing verification information')
if traverse_obj(cookies, ('me', {lambda x: x.value}, {unquote}, {json.loads}, 'verified')):
if traverse_obj(cookies, ('me', {lambda x: x.value}, {urllib.parse.unquote}, {json.loads}, 'verified')):
flags |= 0b00110
return flags
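
The verification probe above unpacks a URL-encoded JSON cookie. Roughly, with a hypothetical cookie value:

import json
import urllib.parse

me_cookie = urllib.parse.quote('{"verified": true}')     # as it would sit in the cookie jar
json.loads(urllib.parse.unquote(me_cookie))['verified']  # -> True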
@ -196,6 +197,7 @@ class Pr0grammIE(InfoExtractor):
'like_count': ('up', {int}),
'dislike_count': ('down', {int}),
'timestamp': ('created', {int}),
'upload_date': ('created', {int}, {dt.date.fromtimestamp}, {lambda x: x.strftime('%Y%m%d')}),
'thumbnail': ('thumb', {lambda x: urljoin('https://thumb.pr0gramm.com', x)})
}),
}
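
The new upload_date path chains two transforms over the same 'created' field; as a standalone illustration with a sample timestamp:

import datetime as dt

created = 1700000000  # sample Unix timestamp
dt.date.fromtimestamp(created).strftime('%Y%m%d')  # -> e.g. '20231114' (local time)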


@ -1,6 +1,6 @@
import hashlib
import re
from hashlib import sha1
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
@ -42,7 +42,7 @@ class ProSiebenSat1BaseIE(InfoExtractor):
'Downloading protocols JSON',
headers=self.geo_verification_headers(), query={
'access_id': self._ACCESS_ID,
'client_token': sha1((raw_ct).encode()).hexdigest(),
'client_token': hashlib.sha1((raw_ct).encode()).hexdigest(),
'video_id': clip_id,
}, fatal=False, expected_status=(403,)) or {}
error = protocols.get('error') or {}
@ -53,7 +53,7 @@ class ProSiebenSat1BaseIE(InfoExtractor):
urls = (self._download_json(
self._V4_BASE_URL + 'urls', clip_id, 'Downloading urls JSON', query={
'access_id': self._ACCESS_ID,
'client_token': sha1((raw_ct + server_token + self._SUPPORTED_PROTOCOLS).encode()).hexdigest(),
'client_token': hashlib.sha1((raw_ct + server_token + self._SUPPORTED_PROTOCOLS).encode()).hexdigest(),
'protocols': self._SUPPORTED_PROTOCOLS,
'server_token': server_token,
'video_id': clip_id,
@ -77,7 +77,7 @@ class ProSiebenSat1BaseIE(InfoExtractor):
if not formats:
source_ids = [compat_str(source['id']) for source in video['sources']]
client_id = self._SALT[:2] + sha1(''.join([clip_id, self._SALT, self._TOKEN, client_location, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
client_id = self._SALT[:2] + hashlib.sha1(''.join([clip_id, self._SALT, self._TOKEN, client_location, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
sources = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources' % clip_id,
@ -96,7 +96,7 @@ class ProSiebenSat1BaseIE(InfoExtractor):
return (bitrate // 1000) if bitrate % 1000 == 0 else bitrate
for source_id in source_ids:
client_id = self._SALT[:2] + sha1(''.join([self._SALT, clip_id, self._TOKEN, server_id, client_location, source_id, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
client_id = self._SALT[:2] + hashlib.sha1(''.join([self._SALT, clip_id, self._TOKEN, server_id, client_location, source_id, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
urls = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url' % clip_id,
clip_id, 'Downloading urls JSON', fatal=False, query={


@ -1,18 +1,14 @@
from .common import InfoExtractor
from ..utils import (
clean_html,
traverse_obj,
unescapeHTML,
)
import itertools
from urllib.parse import urlencode
import urllib.parse
from .common import InfoExtractor
from ..utils import clean_html, traverse_obj, unescapeHTML
class RadioKapitalBaseIE(InfoExtractor):
def _call_api(self, resource, video_id, note='Downloading JSON metadata', qs={}):
return self._download_json(
f'https://www.radiokapital.pl/wp-json/kapital/v1/{resource}?{urlencode(qs)}',
f'https://www.radiokapital.pl/wp-json/kapital/v1/{resource}?{urllib.parse.urlencode(qs)}',
video_id, note=note)
def _parse_episode(self, data):


@ -1,8 +1,8 @@
import datetime as dt
import itertools
import json
import re
import urllib.parse
from datetime import datetime
from .common import InfoExtractor, SearchInfoExtractor
from ..utils import (
@ -156,7 +156,7 @@ class RokfinIE(InfoExtractor):
self.raise_login_required('This video is only available to premium users', True, method='cookies')
elif scheduled:
self.raise_no_formats(
f'Stream is offline; scheduled for {datetime.fromtimestamp(scheduled).strftime("%Y-%m-%d %H:%M:%S")}',
f'Stream is offline; scheduled for {dt.datetime.fromtimestamp(scheduled).strftime("%Y-%m-%d %H:%M:%S")}',
video_id=video_id, expected=True)
uploader = traverse_obj(metadata, ('createdBy', 'username'), ('creator', 'username'))


@ -1,4 +1,4 @@
import datetime
import datetime as dt
from .common import InfoExtractor
from .redge import RedCDNLivxIE
@ -13,16 +13,16 @@ from ..utils.traversal import traverse_obj
def is_dst(date):
last_march = datetime.datetime(date.year, 3, 31)
last_october = datetime.datetime(date.year, 10, 31)
last_sunday_march = last_march - datetime.timedelta(days=last_march.isoweekday() % 7)
last_sunday_october = last_october - datetime.timedelta(days=last_october.isoweekday() % 7)
last_march = dt.datetime(date.year, 3, 31)
last_october = dt.datetime(date.year, 10, 31)
last_sunday_march = last_march - dt.timedelta(days=last_march.isoweekday() % 7)
last_sunday_october = last_october - dt.timedelta(days=last_october.isoweekday() % 7)
return last_sunday_march.replace(hour=2) <= date <= last_sunday_october.replace(hour=3)
def rfc3339_to_atende(date):
date = datetime.datetime.fromisoformat(date)
date = date + datetime.timedelta(hours=1 if is_dst(date) else 0)
date = dt.datetime.fromisoformat(date)
date = date + dt.timedelta(hours=1 if is_dst(date) else 0)
return int((date.timestamp() - 978307200) * 1000)
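
Two facts make the helpers above tick: stepping back isoweekday() % 7 days lands on the preceding (or same) Sunday, and 978307200 is the Unix timestamp of 2001-01-01T00:00:00Z, so the "atende" clock counts milliseconds since 2001. A worked check:

import datetime as dt

last_october = dt.datetime(2024, 10, 31)  # a Thursday, isoweekday() == 4
last_october - dt.timedelta(days=last_october.isoweekday() % 7)
# -> datetime.datetime(2024, 10, 27, 0, 0), the last Sunday of October 2024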


@ -0,0 +1,112 @@
import json
import urllib.parse
from .common import InfoExtractor
from ..utils import determine_ext, int_or_none, url_or_none
from ..utils.traversal import traverse_obj
class SharePointIE(InfoExtractor):
_BASE_URL_RE = r'https?://[\w-]+\.sharepoint\.com/'
_VALID_URL = [
rf'{_BASE_URL_RE}:v:/[a-z]/(?:[^/?#]+/)*(?P<id>[^/?#]{{46}})/?(?:$|[?#])',
rf'{_BASE_URL_RE}(?!:v:)(?:[^/?#]+/)*stream\.aspx\?(?:[^#]+&)?id=(?P<id>[^&#]+)',
]
_TESTS = [{
'url': 'https://lut-my.sharepoint.com/:v:/g/personal/juha_eerola_student_lab_fi/EUrAmrktb4ZMhUcY9J2PqMEBD_9x_l0DyYWVgAvp-TTOMw?e=ZpQOOw',
'md5': '2950821d0d4937a0a76373782093b435',
'info_dict': {
'id': '01EQRS7EKKYCNLSLLPQZGIKRYY6SOY7KGB',
'display_id': 'EUrAmrktb4ZMhUcY9J2PqMEBD_9x_l0DyYWVgAvp-TTOMw',
'ext': 'mp4',
'title': 'CmvpJST',
'duration': 54.567,
'thumbnail': r're:https://.+/thumbnail',
'uploader_id': '8dcec565-a956-4b91-95e5-bacfb8bc015f',
},
}, {
'url': 'https://greaternyace.sharepoint.com/:v:/s/acementornydrive/ETski5eAfNVEoPRZUAyy1wEBpLgVFYWso5bjbZjfBLlPUg?e=PQUfVb',
'md5': 'c496a01644223273bff12e93e501afd1',
'info_dict': {
'id': '01QI4AVTZ3ESFZPAD42VCKB5CZKAGLFVYB',
'display_id': 'ETski5eAfNVEoPRZUAyy1wEBpLgVFYWso5bjbZjfBLlPUg',
'ext': 'mp4',
'title': '930103681233985536',
'duration': 3797.326,
'thumbnail': r're:https://.+/thumbnail',
},
}, {
'url': 'https://lut-my.sharepoint.com/personal/juha_eerola_student_lab_fi/_layouts/15/stream.aspx?id=%2Fpersonal%2Fjuha_eerola_student_lab_fi%2FDocuments%2FM-DL%2FCmvpJST.mp4&ga=1&referrer=StreamWebApp.Web&referrerScenario=AddressBarCopied.view',
'info_dict': {
'id': '01EQRS7EKKYCNLSLLPQZGIKRYY6SOY7KGB',
'display_id': '/personal/juha_eerola_student_lab_fi/Documents/M-DL/CmvpJST.mp4',
'ext': 'mp4',
'title': 'CmvpJST',
'duration': 54.567,
'thumbnail': r're:https://.+/thumbnail',
'uploader_id': '8dcec565-a956-4b91-95e5-bacfb8bc015f',
},
'skip': 'Session cookies needed',
}, {
'url': 'https://izoobasisschool.sharepoint.com/:v:/g/Eaqleq8COVBIvIPvod0U27oBypC6aWOkk8ptuDpmJ6arHw',
'only_matching': True,
}, {
'url': 'https://uskudaredutr-my.sharepoint.com/:v:/g/personal/songul_turkaydin_uskudar_edu_tr/EbTf-VRUIbtGuIN73tx1MuwBCHBOmNcWNqSLw61Fd2_o0g?e=n5Vkof',
'only_matching': True,
}, {
'url': 'https://epam-my.sharepoint.com/:v:/p/dzmitry_tamashevich/Ec4ZOs-rATZHjFYZWVxjczEB649FCoYFKDV_x3RxZiWAGA?e=4hswgA',
'only_matching': True,
}, {
'url': 'https://microsoft.sharepoint.com/:v:/t/MicrosoftSPARKRecordings-MSFTInternal/EWCyeqByVWBAt8wDvNZdV-UB0BvU5YVbKm0UHgdrUlI6dg?e=QbPck6',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = urllib.parse.unquote(self._match_id(url))
webpage, urlh = self._download_webpage_handle(url, display_id)
if urllib.parse.urlparse(urlh.url).hostname == 'login.microsoftonline.com':
self.raise_login_required(
'Session cookies are required for this URL and can be passed '
'with the --cookies option. The --cookies-from-browser option will not work', method=None)
video_data = self._search_json(r'g_fileInfo\s*=', webpage, 'player config', display_id)
video_id = video_data['VroomItemId']
parsed_url = urllib.parse.urlparse(video_data['.transformUrl'])
base_media_url = urllib.parse.urlunparse(parsed_url._replace(
path=urllib.parse.urljoin(f'{parsed_url.path}/', '../videomanifest'),
query=urllib.parse.urlencode({
**urllib.parse.parse_qs(parsed_url.query),
'cTag': video_data['.ctag'],
'action': 'Access',
'part': 'index',
}, doseq=True)))
# Web player adds more params to the format URLs but we still get all formats without them
formats = self._extract_mpd_formats(
base_media_url, video_id, mpd_id='dash', query={'format': 'dash'}, fatal=False)
for hls_type in ('hls', 'hls-vnext'):
formats.extend(self._extract_m3u8_formats(
base_media_url, video_id, 'mp4', m3u8_id=hls_type,
query={'format': hls_type}, fatal=False, quality=-2))
if video_url := traverse_obj(video_data, ('downloadUrl', {url_or_none})):
formats.append({
'url': video_url,
'ext': determine_ext(video_data.get('extension') or video_data.get('name')),
'quality': 1,
'format_id': 'source',
'filesize': int_or_none(video_data.get('size')),
'vcodec': 'none' if video_data.get('isAudio') is True else None,
})
return {
'id': video_id,
'formats': formats,
'title': video_data.get('title') or video_data.get('displayName'),
'display_id': display_id,
'uploader_id': video_data.get('authorId'),
'duration': traverse_obj(video_data, (
'MediaServiceFastMetadata', {json.loads}, 'media', 'duration', {lambda x: x / 10000000})),
'thumbnail': url_or_none(video_data.get('thumbnailUrl')),
}
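
The manifest URL above is derived purely with stdlib urllib.parse: swap the last path segment for a sibling and merge extra query parameters. A compact sketch with a hypothetical input URL:

import urllib.parse

url = 'https://host.example/transform/videotranscode.ashx?a=1'
parsed = urllib.parse.urlparse(url)
manifest = urllib.parse.urlunparse(parsed._replace(
    path=urllib.parse.urljoin(f'{parsed.path}/', '../videomanifest'),
    query=urllib.parse.urlencode({**urllib.parse.parse_qs(parsed.query), 'part': 'index'}, doseq=True)))
# -> 'https://host.example/transform/videomanifest?a=1&part=index'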


@ -1,4 +1,5 @@
import datetime
import datetime as dt
import itertools
import json
import math
import random
@ -12,8 +13,8 @@ from ..utils import (
int_or_none,
jwt_decode_hs256,
try_call,
try_get,
)
from ..utils.traversal import traverse_obj
class SonyLIVIE(InfoExtractor):
@ -93,7 +94,7 @@ class SonyLIVIE(InfoExtractor):
'mobileNumber': username,
'channelPartnerID': 'MSMIND',
'country': 'IN',
'timestamp': datetime.datetime.now().strftime('%Y-%m-%dT%H:%M:%S.%MZ'),
'timestamp': dt.datetime.now().strftime('%Y-%m-%dT%H:%M:%S.%MZ'),
'otpSize': 6,
'loginType': 'REGISTERORSIGNIN',
'isMobileMandatory': True,
@ -110,7 +111,7 @@ class SonyLIVIE(InfoExtractor):
'otp': self._get_tfa_info('OTP'),
'dmaId': 'IN',
'ageConfirmation': True,
'timestamp': datetime.datetime.now().strftime('%Y-%m-%dT%H:%M:%S.%MZ'),
'timestamp': dt.datetime.now().strftime('%Y-%m-%dT%H:%M:%S.%MZ'),
'isMobileMandatory': True,
}).encode())
if otp_verify_json['resultCode'] == 'KO':
@ -183,17 +184,21 @@ class SonyLIVIE(InfoExtractor):
class SonyLIVSeriesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?sonyliv\.com/shows/[^/?#&]+-(?P<id>\d{10})$'
_VALID_URL = r'https?://(?:www\.)?sonyliv\.com/shows/[^/?#&]+-(?P<id>\d{10})/?(?:$|[?#])'
_TESTS = [{
'url': 'https://www.sonyliv.com/shows/adaalat-1700000091',
'playlist_mincount': 456,
'playlist_mincount': 452,
'info_dict': {
'id': '1700000091',
},
}, {
'url': 'https://www.sonyliv.com/shows/beyhadh-1700000007/',
'playlist_mincount': 358,
'info_dict': {
'id': '1700000007',
},
}]
_API_SHOW_URL = "https://apiv2.sonyliv.com/AGL/1.9/R/ENG/WEB/IN/DL/DETAIL/{}?kids_safe=false&from=0&to=49"
_API_EPISODES_URL = "https://apiv2.sonyliv.com/AGL/1.4/R/ENG/WEB/IN/CONTENT/DETAIL/BUNDLE/{}?from=0&to=1000&orderBy=episodeNumber&sortOrder=asc"
_API_SECURITY_URL = 'https://apiv2.sonyliv.com/AGL/1.4/A/ENG/WEB/ALL/GETTOKEN'
_API_BASE = 'https://apiv2.sonyliv.com/AGL'
def _entries(self, show_id):
headers = {
@ -201,19 +206,34 @@ class SonyLIVSeriesIE(InfoExtractor):
'Referer': 'https://www.sonyliv.com',
}
headers['security_token'] = self._download_json(
self._API_SECURITY_URL, video_id=show_id, headers=headers,
note='Downloading security token')['resultObj']
seasons = try_get(
self._download_json(self._API_SHOW_URL.format(show_id), video_id=show_id, headers=headers),
lambda x: x['resultObj']['containers'][0]['containers'], list)
for season in seasons or []:
season_id = season['id']
episodes = try_get(
self._download_json(self._API_EPISODES_URL.format(season_id), video_id=season_id, headers=headers),
lambda x: x['resultObj']['containers'][0]['containers'], list)
for episode in episodes or []:
video_id = episode.get('id')
yield self.url_result('sonyliv:%s' % video_id, ie=SonyLIVIE.ie_key(), video_id=video_id)
f'{self._API_BASE}/1.4/A/ENG/WEB/ALL/GETTOKEN', show_id,
'Downloading security token', headers=headers)['resultObj']
seasons = traverse_obj(self._download_json(
f'{self._API_BASE}/1.9/R/ENG/WEB/IN/DL/DETAIL/{show_id}', show_id,
'Downloading series JSON', headers=headers, query={
'kids_safe': 'false',
'from': '0',
'to': '49',
}), ('resultObj', 'containers', 0, 'containers', lambda _, v: int_or_none(v['id'])))
for season in seasons:
season_id = str(season['id'])
note = traverse_obj(season, ('metadata', 'title', {str})) or 'season'
cursor = 0
for page_num in itertools.count(1):
episodes = traverse_obj(self._download_json(
f'{self._API_BASE}/1.4/R/ENG/WEB/IN/CONTENT/DETAIL/BUNDLE/{season_id}',
season_id, f'Downloading {note} page {page_num} JSON', headers=headers, query={
'from': str(cursor),
'to': str(cursor + 99),
'orderBy': 'episodeNumber',
'sortOrder': 'asc',
}), ('resultObj', 'containers', 0, 'containers', lambda _, v: int_or_none(v['id'])))
if not episodes:
break
for episode in episodes:
video_id = str(episode['id'])
yield self.url_result(f'sonyliv:{video_id}', SonyLIVIE, video_id)
cursor += 100
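
Stripped of the SonyLIV specifics, the loop above is plain cursor pagination over an inclusive from/to window. A generic sketch (fetch_page is a stand-in for the API call):

import itertools

def paged(fetch_page, page_size=100):
    cursor = 0
    for page_num in itertools.count(1):  # page_num only feeds progress notes in the real extractor
        items = fetch_page(cursor, cursor + page_size - 1)  # inclusive window, e.g. 0..99
        if not items:
            return
        yield from items
        cursor += page_size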
def _real_extract(self, url):
show_id = self._match_id(url)


@ -1,30 +1,27 @@
import itertools
import re
import json
# import random
import re
from .common import (
InfoExtractor,
SearchInfoExtractor
)
from .common import InfoExtractor, SearchInfoExtractor
from ..compat import compat_str
from ..networking import HEADRequest, Request
from ..networking import HEADRequest
from ..networking.exceptions import HTTPError
from ..utils import (
error_to_compat_str,
KNOWN_EXTENSIONS,
ExtractorError,
error_to_compat_str,
float_or_none,
int_or_none,
KNOWN_EXTENSIONS,
mimetype2ext,
parse_qs,
str_or_none,
try_get,
try_call,
unified_timestamp,
update_url_query,
url_or_none,
urlhandle_detect_ext,
)
from ..utils.traversal import traverse_obj
class SoundcloudEmbedIE(InfoExtractor):
@ -54,7 +51,6 @@ class SoundcloudBaseIE(InfoExtractor):
_API_AUTH_QUERY_TEMPLATE = '?client_id=%s'
_API_AUTH_URL_PW = 'https://api-auth.soundcloud.com/web-auth/sign-in/password%s'
_API_VERIFY_AUTH_TOKEN = 'https://api-auth.soundcloud.com/connect/session%s'
_access_token = None
_HEADERS = {}
_IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
@ -112,21 +108,31 @@ class SoundcloudBaseIE(InfoExtractor):
def _initialize_pre_login(self):
self._CLIENT_ID = self.cache.load('soundcloud', 'client_id') or 'a3e059563d7fd3372b49b37f00a00bcf'
def _perform_login(self, username, password):
if username != 'oauth':
self.report_warning(
'Login using username and password is not currently supported. '
'Use "--username oauth --password <oauth_token>" to login using an oauth token')
self._access_token = password
query = self._API_AUTH_QUERY_TEMPLATE % self._CLIENT_ID
payload = {'session': {'access_token': self._access_token}}
token_verification = Request(self._API_VERIFY_AUTH_TOKEN % query, json.dumps(payload).encode('utf-8'))
response = self._download_json(token_verification, None, note='Verifying login token...', fatal=False)
if response is not False:
self._HEADERS = {'Authorization': 'OAuth ' + self._access_token}
def _verify_oauth_token(self, token):
if self._request_webpage(
self._API_VERIFY_AUTH_TOKEN % (self._API_AUTH_QUERY_TEMPLATE % self._CLIENT_ID),
None, note='Verifying login token...', fatal=False,
data=json.dumps({'session': {'access_token': token}}).encode()):
self._HEADERS['Authorization'] = f'OAuth {token}'
self.report_login()
else:
self.report_warning('Provided authorization token seems to be invalid. Continue as guest')
self.report_warning('Provided authorization token is invalid. Continuing as guest')
def _real_initialize(self):
if self._HEADERS:
return
if token := try_call(lambda: self._get_cookies(self._BASE_URL)['oauth_token'].value):
self._verify_oauth_token(token)
def _perform_login(self, username, password):
if username != 'oauth':
raise ExtractorError(
'Login using username and password is not currently supported. '
'Use "--username oauth --password <oauth_token>" to login using an oauth token, '
f'or else {self._login_hint(method="cookies")}', expected=True)
if self._HEADERS:
return
self._verify_oauth_token(password)
r'''
def genDevId():
@ -147,14 +153,17 @@ class SoundcloudBaseIE(InfoExtractor):
'user_agent': self._USER_AGENT
}
query = self._API_AUTH_QUERY_TEMPLATE % self._CLIENT_ID
login = sanitized_Request(self._API_AUTH_URL_PW % query, json.dumps(payload).encode('utf-8'))
response = self._download_json(login, None)
self._access_token = response.get('session').get('access_token')
if not self._access_token:
self.report_warning('Unable to get access token, login may has failed')
else:
self._HEADERS = {'Authorization': 'OAuth ' + self._access_token}
response = self._download_json(
self._API_AUTH_URL_PW % (self._API_AUTH_QUERY_TEMPLATE % self._CLIENT_ID),
None, note='Verifying login token...', fatal=False,
data=json.dumps(payload).encode())
if token := traverse_obj(response, ('session', 'access_token', {str})):
self._HEADERS['Authorization'] = f'OAuth {token}'
self.report_login()
return
raise ExtractorError('Unable to get access token, login may have failed', expected=True)
'''
# signature generation
@ -217,6 +226,7 @@ class SoundcloudBaseIE(InfoExtractor):
'filesize': int_or_none(urlh.headers.get('Content-Length')),
'url': format_url,
'quality': 10,
'format_note': 'Original',
})
def invalid_url(url):
@ -233,9 +243,13 @@ class SoundcloudBaseIE(InfoExtractor):
format_id_list.append(protocol)
ext = f.get('ext')
if ext == 'aac':
f['abr'] = '256'
f.update({
'abr': 256,
'quality': 5,
'format_note': 'Premium',
})
for k in ('ext', 'abr'):
v = f.get(k)
v = str_or_none(f.get(k))
if v:
format_id_list.append(v)
preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
@ -256,16 +270,25 @@ class SoundcloudBaseIE(InfoExtractor):
formats.append(f)
# New API
transcodings = try_get(
info, lambda x: x['media']['transcodings'], list) or []
for t in transcodings:
if not isinstance(t, dict):
continue
format_url = url_or_none(t.get('url'))
if not format_url:
continue
stream = None if extract_flat else self._download_json(
format_url, track_id, query=query, fatal=False, headers=self._HEADERS)
for t in traverse_obj(info, ('media', 'transcodings', lambda _, v: url_or_none(v['url']))):
if extract_flat:
break
format_url = t['url']
stream = None
for retry in self.RetryManager(fatal=False):
try:
stream = self._download_json(format_url, track_id, query=query, headers=self._HEADERS)
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 429:
self.report_warning(
'You have reached the API rate limit, which is ~600 requests per '
'10 minutes. Use the --extractor-retries and --retry-sleep options '
'to configure an appropriate retry count and wait time', only_once=True)
retry.error = e.cause
else:
self.report_warning(e.msg)
if not isinstance(stream, dict):
continue
stream_url = url_or_none(stream.get('url'))
@ -338,7 +361,7 @@ class SoundcloudBaseIE(InfoExtractor):
'like_count': extract_count('favoritings') or extract_count('likes'),
'comment_count': extract_count('comment'),
'repost_count': extract_count('reposts'),
'genre': info.get('genre'),
'genres': traverse_obj(info, ('genre', {str}, {lambda x: x or None}, all)),
'formats': formats if not extract_flat else None
}
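
One detail worth unpacking: the new genres path ends in the all marker, which collects the surviving values into a list, so a missing or empty genre becomes [] rather than None (matching the 'genres': [] seen in the updated tests). Sketch:

from yt_dlp.utils import traverse_obj

traverse_obj({'genre': 'Trance'}, ('genre', {str}, {lambda x: x or None}, all))  # -> ['Trance']
traverse_obj({'genre': ''}, ('genre', {str}, {lambda x: x or None}, all))        # -> []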
@ -372,10 +395,10 @@ class SoundcloudIE(SoundcloudBaseIE):
_TESTS = [
{
'url': 'http://soundcloud.com/ethmusic/lostin-powers-she-so-heavy',
'md5': 'ebef0a451b909710ed1d7787dddbf0d7',
'md5': 'de9bac153e7427a7333b4b0c1b6a18d2',
'info_dict': {
'id': '62986583',
'ext': 'mp3',
'ext': 'opus',
'title': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
'description': 'No Downloads untill we record the finished version this weekend, i was too pumped n i had to post it , earl is prolly gonna b hella p.o\'d',
'uploader': 'E.T. ExTerrestrial Music',
@ -388,6 +411,9 @@ class SoundcloudIE(SoundcloudBaseIE):
'like_count': int,
'comment_count': int,
'repost_count': int,
'thumbnail': 'https://i1.sndcdn.com/artworks-000031955188-rwb18x-original.jpg',
'uploader_url': 'https://soundcloud.com/ethmusic',
'genres': [],
}
},
# geo-restricted
@ -395,7 +421,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'url': 'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep',
'info_dict': {
'id': '47127627',
'ext': 'mp3',
'ext': 'opus',
'title': 'Goldrushed',
'description': 'From Stockholm Sweden\r\nPovel / Magnus / Filip / David\r\nwww.theroyalconcept.com',
'uploader': 'The Royal Concept',
@ -408,6 +434,9 @@ class SoundcloudIE(SoundcloudBaseIE):
'like_count': int,
'comment_count': int,
'repost_count': int,
'uploader_url': 'https://soundcloud.com/the-concept-band',
'thumbnail': 'https://i1.sndcdn.com/artworks-v8bFHhXm7Au6-0-original.jpg',
'genres': ['Alternative'],
},
},
# private link
@ -429,6 +458,9 @@ class SoundcloudIE(SoundcloudBaseIE):
'like_count': int,
'comment_count': int,
'repost_count': int,
'uploader_url': 'https://soundcloud.com/jaimemf',
'thumbnail': 'https://a1.sndcdn.com/images/default_avatar_large.png',
'genres': ['youtubedl'],
},
},
# private link (alt format)
@ -450,6 +482,9 @@ class SoundcloudIE(SoundcloudBaseIE):
'like_count': int,
'comment_count': int,
'repost_count': int,
'uploader_url': 'https://soundcloud.com/jaimemf',
'thumbnail': 'https://a1.sndcdn.com/images/default_avatar_large.png',
'genres': ['youtubedl'],
},
},
# downloadable song
@ -459,6 +494,21 @@ class SoundcloudIE(SoundcloudBaseIE):
'info_dict': {
'id': '343609555',
'ext': 'wav',
'title': 'The Following',
'description': '',
'uploader': '80M',
'uploader_id': '312384765',
'uploader_url': 'https://soundcloud.com/the80m',
'upload_date': '20170922',
'timestamp': 1506120436,
'duration': 397.228,
'thumbnail': 'https://i1.sndcdn.com/artworks-000243916348-ktoo7d-original.jpg',
'license': 'all-rights-reserved',
'like_count': int,
'comment_count': int,
'repost_count': int,
'view_count': int,
'genres': ['Dance & EDM'],
},
},
# private link, downloadable format
@ -480,6 +530,9 @@ class SoundcloudIE(SoundcloudBaseIE):
'like_count': int,
'comment_count': int,
'repost_count': int,
'thumbnail': 'https://i1.sndcdn.com/artworks-000240712245-kedn4p-original.jpg',
'uploader_url': 'https://soundcloud.com/oriuplift',
'genres': ['Trance'],
},
},
# no album art, use avatar pic for thumbnail
@ -502,6 +555,8 @@ class SoundcloudIE(SoundcloudBaseIE):
'like_count': int,
'comment_count': int,
'repost_count': int,
'uploader_url': 'https://soundcloud.com/garyvee',
'genres': [],
},
'params': {
'skip_download': True,
@ -509,13 +564,13 @@ class SoundcloudIE(SoundcloudBaseIE):
},
{
'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
'md5': '8227c3473a4264df6b02ad7e5b7527ac',
'info_dict': {
'id': '583011102',
'ext': 'mp3',
'ext': 'opus',
'title': 'Mezzo Valzer',
'description': 'md5:4138d582f81866a530317bae316e8b61',
'uploader': 'Micronie',
'description': 'md5:f4d5f39d52e0ccc2b4f665326428901a',
'uploader': 'Giovanni Sarani',
'uploader_id': '3352531',
'timestamp': 1551394171,
'upload_date': '20190228',
@ -526,6 +581,8 @@ class SoundcloudIE(SoundcloudBaseIE):
'like_count': int,
'comment_count': int,
'repost_count': int,
'genres': ['Piano'],
'uploader_url': 'https://soundcloud.com/giovannisarani',
},
},
{


@ -174,7 +174,7 @@ class TheaterComplexTownBaseIE(StacommuBaseIE):
class TheaterComplexTownVODIE(TheaterComplexTownBaseIE):
_VALID_URL = r'https?://(?:www\.)?theater-complex\.town/(?:en/)?videos/episodes/(?P<id>\w+)'
_VALID_URL = r'https?://(?:www\.)?theater-complex\.town/(?:(?:en|ja)/)?videos/episodes/(?P<id>\w+)'
IE_NAME = 'theatercomplextown:vod'
_TESTS = [{
'url': 'https://www.theater-complex.town/videos/episodes/hoxqidYNoAn7bP92DN6p78',
@ -195,6 +195,9 @@ class TheaterComplexTownVODIE(TheaterComplexTownBaseIE):
}, {
'url': 'https://www.theater-complex.town/en/videos/episodes/6QT7XYwM9dJz5Gf9VB6K5y',
'only_matching': True,
}, {
'url': 'https://www.theater-complex.town/ja/videos/episodes/hoxqidYNoAn7bP92DN6p78',
'only_matching': True,
}]
_API_PATH = 'videoEpisodes'
@ -204,7 +207,7 @@ class TheaterComplexTownVODIE(TheaterComplexTownBaseIE):
class TheaterComplexTownPPVIE(TheaterComplexTownBaseIE):
_VALID_URL = r'https?://(?:www\.)?theater-complex\.town/(?:en/)?ppv/(?P<id>\w+)'
_VALID_URL = r'https?://(?:www\.)?theater-complex\.town/(?:(?:en|ja)/)?ppv/(?P<id>\w+)'
IE_NAME = 'theatercomplextown:ppv'
_TESTS = [{
'url': 'https://www.theater-complex.town/ppv/wytW3X7khrjJBUpKuV3jen',
@ -223,6 +226,9 @@ class TheaterComplexTownPPVIE(TheaterComplexTownBaseIE):
}, {
'url': 'https://www.theater-complex.town/en/ppv/wytW3X7khrjJBUpKuV3jen',
'only_matching': True,
}, {
'url': 'https://www.theater-complex.town/ja/ppv/qwUVmLmGEiZ3ZW6it9uGys',
'only_matching': True,
}]
_API_PATH = 'events'


@ -41,7 +41,7 @@ class STVPlayerIE(InfoExtractor):
ptype, video_id = self._match_valid_url(url).groups()
webpage = self._download_webpage(url, video_id, fatal=False) or ''
props = self._search_nextjs_data(webpage, video_id, default='{}').get('props') or {}
props = self._search_nextjs_data(webpage, video_id, default={}).get('props') or {}
player_api_cache = try_get(
props, lambda x: x['initialReduxState']['playerApiCache']) or {}


@ -1,8 +1,7 @@
from __future__ import annotations
import functools
import json
from functools import partial
from textwrap import dedent
import textwrap
from .common import InfoExtractor
from ..utils import ExtractorError, format_field, int_or_none, parse_iso8601
@ -10,7 +9,7 @@ from ..utils.traversal import traverse_obj
def _fmt_url(url):
return partial(format_field, template=url, default=None)
return functools.partial(format_field, template=url, default=None)
class TelewebionIE(InfoExtractor):
@ -88,7 +87,7 @@ class TelewebionIE(InfoExtractor):
if not video_id.startswith('0x'):
video_id = hex(int(video_id))
episode_data = self._call_graphql_api('getEpisodeDetail', video_id, dedent('''
episode_data = self._call_graphql_api('getEpisodeDetail', video_id, textwrap.dedent('''
queryEpisode(filter: {EpisodeID: $EpisodeId}, first: 1) {
title
program {
@ -127,7 +126,7 @@ class TelewebionIE(InfoExtractor):
'formats': (
'channel', 'descriptor', {str},
{_fmt_url(f'https://cdna.telewebion.com/%s/episode/{video_id}/playlist.m3u8')},
{partial(self._extract_m3u8_formats, video_id=video_id, ext='mp4', m3u8_id='hls')}),
{functools.partial(self._extract_m3u8_formats, video_id=video_id, ext='mp4', m3u8_id='hls')}),
}))
info_dict['id'] = video_id
return info_dict


@ -1,7 +1,7 @@
import base64
import datetime as dt
import functools
import itertools
from datetime import datetime
from .common import InfoExtractor
from ..networking import HEADRequest
@ -70,7 +70,7 @@ class TenPlayIE(InfoExtractor):
username, password = self._get_login_info()
if username is None or password is None:
self.raise_login_required('Your 10play account\'s details must be provided with --username and --password.')
_timestamp = datetime.now().strftime('%Y%m%d000000')
_timestamp = dt.datetime.now().strftime('%Y%m%d000000')
_auth_header = base64.b64encode(_timestamp.encode('ascii')).decode('ascii')
data = self._download_json('https://10play.com.au/api/user/auth', video_id, 'Getting bearer token', headers={
'X-Network-Ten-Auth': _auth_header,
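
For context, the X-Network-Ten-Auth header built above is just the ASCII timestamp in base64 (sample value shown; the scheme is read off the code above):

import base64

base64.b64encode(b'20240512000000').decode('ascii')  # -> 'MjAyNDA1MTIwMDAwMDA='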

Some files were not shown because too many files have changed in this diff.