Merge branch 'yt-dlp:master' into biliSearchPageIE

N/Ame 2024-07-02 17:31:58 +12:00 committed by GitHub
commit a9ac7d7f99
GPG Key ID: B5690EEEBB952194
61 changed files with 1492 additions and 642 deletions


@ -525,6 +525,10 @@ jobs:
# make sure SHA sums are also printed to stdout
sha256sum -- * | tee ../SHA2-256SUMS
sha512sum -- * | tee ../SHA2-512SUMS
# also print as permanent annotations to the summary page
while read -r shasum; do
echo "::notice title=${shasum##* }::sha256: ${shasum% *}"
done < ../SHA2-256SUMS
- name: Make Update spec
run: |


@ -127,7 +127,7 @@ ### Are you willing to share account details if needed?
### Is the website primarily used for piracy?
We follow [youtube-dl's policy](https://github.com/ytdl-org/youtube-dl#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free) to not support services that are primarily used for infringing copyright. Additionally, it has been decided not to support porn sites that specialize in fakes. We also cannot support any service that serves only [DRM protected content](https://en.wikipedia.org/wiki/Digital_rights_management).
@ -215,8 +215,8 @@ ## Adding support for a new site
```python
from .common import InfoExtractor
class YourExtractorIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
_TESTS = [{
@ -244,7 +244,7 @@ ## Adding support for a new site
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
# TODO more code goes here, for example ...
title = self._html_search_regex(r'<h1>(.+?)</h1>', webpage, 'title')
@ -320,7 +320,7 @@ #### Example
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
@ -750,7 +750,7 @@ ### Use convenience conversion and parsing functions
Use `traverse_obj` and `try_call` (supersedes `dict_get` and `try_get`) for safe metadata extraction from parsed JSON.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Explore [`yt_dlp/utils/`](yt_dlp/utils/) for more useful convenience functions.
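
To illustrate what "safe metadata extraction" means here, the following is a minimal standard-library sketch. `safe_get` is a hypothetical stand-in written for this example; it is not `traverse_obj` and supports none of its extra features. The `meta` layout is likewise invented.

```python
from typing import Any


def safe_get(obj: Any, *path: Any, default: Any = None) -> Any:
    """Follow `path` through nested dicts/lists; return `default` on any miss.

    Hypothetical helper for illustration only (not part of yt-dlp).
    """
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or a non-container value
            return default
    return obj


# Invented metadata layout, as might come back from a JSON API
meta = {'video': {'title': 'Example', 'formats': [{'url': 'https://example.com/v.mp4'}]}}

title = safe_get(meta, 'video', 'title')
first_url = safe_get(meta, 'video', 'formats', 0, 'url')
uploader = safe_get(meta, 'video', 'uploader', 'name')  # path does not exist
```

The point is that a missing or malformed field yields `None` instead of raising, so optional metadata never aborts an extraction.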


@ -631,3 +631,16 @@ voidful
vtexier
WyohKnott
trueauracoral
ASertacAkkaya
axpauls
chilinux
hafeoz
JSubelj
jucor
megumintyan
mgedmin
Niluge-KiWi
peisenwang
TheZ3ro
tippfehlr
varunchopra


@ -4,6 +4,87 @@ # Changelog
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2024.07.01
#### Important changes
- Security: [[CVE-2024-38519](https://nvd.nist.gov/vuln/detail/CVE-2024-38519)] [Properly sanitize file-extension to prevent file system modification and RCE](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j)
- Unsafe extensions are now blocked from being downloaded
#### Core changes
- [Add `playlist_channel` and `playlist_channel_id` fields](https://github.com/yt-dlp/yt-dlp/commit/55e3e6fd21e741ec5ae3d8624de5e5ea345810eb) ([#10266](https://github.com/yt-dlp/yt-dlp/issues/10266)) by [bashonly](https://github.com/bashonly)
- [Disallow unsafe extensions (CVE-2024-38519)](https://github.com/yt-dlp/yt-dlp/commit/5ce582448ececb8d9c30c8c31f58330090ced03a) by [Grub4K](https://github.com/Grub4K)
- **cookies**: [Fix `--cookies-from-browser` DE detection on Linux](https://github.com/yt-dlp/yt-dlp/commit/a8520244b8642880e4d35925e9e49eff94d548de) ([#10237](https://github.com/yt-dlp/yt-dlp/issues/10237)) by [peisenwang](https://github.com/peisenwang)
#### Extractor changes
- **afreecatv**
- [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/e8352ad6599de7b5371dc39a1a1edc7890aaedb4) ([#10174](https://github.com/yt-dlp/yt-dlp/issues/10174)) by [hui1601](https://github.com/hui1601)
- catchstory: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/054a3ba7d1293f9fbe21800d62d1e5ddcbded238) ([#10235](https://github.com/yt-dlp/yt-dlp/issues/10235)) by [hui1601](https://github.com/hui1601)
- **bilibili**: [Support legacy formats](https://github.com/yt-dlp/yt-dlp/commit/1d6ab17d0752ee9cf19e3e63c7dec7b600d3f228) ([#9117](https://github.com/yt-dlp/yt-dlp/issues/9117)) by [c-basalt](https://github.com/c-basalt), [GD-Slime](https://github.com/GD-Slime)
- **bitchute**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/5b1a2aa978d0074cee278e7659f32f52ecc4ab53) ([#10301](https://github.com/yt-dlp/yt-dlp/issues/10301)) by [seproDev](https://github.com/seproDev)
- **brightcove**: [Upgrade requests to HTTPS](https://github.com/yt-dlp/yt-dlp/commit/90c3721a322756bb7f4ca10ceb73744500bee37e) ([#10202](https://github.com/yt-dlp/yt-dlp/issues/10202)) by [bashonly](https://github.com/bashonly)
- **cloudflarestream**: [Fix `_VALID_URL` and embed extraction](https://github.com/yt-dlp/yt-dlp/commit/7aa322c02cec54eb77154a89da7e400194f0bd03) ([#10215](https://github.com/yt-dlp/yt-dlp/issues/10215)) by [bashonly](https://github.com/bashonly)
- **cloudycdn**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/b758877afa225747fba81c8a580e27583a231734) ([#10271](https://github.com/yt-dlp/yt-dlp/issues/10271)) by [Caesim404](https://github.com/Caesim404)
- **digitalconcerthall**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/2a4f2e82dbeeb0c9130883c83dac689d5260c871) ([#10152](https://github.com/yt-dlp/yt-dlp/issues/10152)) by [seproDev](https://github.com/seproDev), [tippfehlr](https://github.com/tippfehlr)
- **facebook**: reel: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/8ca1d57ed08d00efa117820a5a82f763b20e2d1d) ([#10232](https://github.com/yt-dlp/yt-dlp/issues/10232)) by [bashonly](https://github.com/bashonly)
- **francetv**
- [Detect and raise errors for DRM](https://github.com/yt-dlp/yt-dlp/commit/3690c2f59827c79a1bbe388a7c1ae75db7477db2) ([#10165](https://github.com/yt-dlp/yt-dlp/issues/10165)) by [bashonly](https://github.com/bashonly)
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/081708d6074dfbb907e25af61ba530bba0d4b31d) ([#10177](https://github.com/yt-dlp/yt-dlp/issues/10177)) by [bashonly](https://github.com/bashonly)
- **generic**: [Add `key_query` extractor-arg](https://github.com/yt-dlp/yt-dlp/commit/5dbac313ae4e3e8521dfe2e1a6a048a98ff4b4fe) by [bashonly](https://github.com/bashonly)
- **graspop**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/1d369b4096d79233e0ac2c93762746a64d7a69c8) ([#10268](https://github.com/yt-dlp/yt-dlp/issues/10268)) by [Niluge-KiWi](https://github.com/Niluge-KiWi)
- **jiocinema**: series: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/61714f46956f61612032bba857aed7ad1387eccd) ([#10139](https://github.com/yt-dlp/yt-dlp/issues/10139)) by [varunchopra](https://github.com/varunchopra)
- **khanacademy**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/4093eb1fcc29a0e2aea9adfcba479787d9ae0c0c) ([#9136](https://github.com/yt-dlp/yt-dlp/issues/9136)) by [c-basalt](https://github.com/c-basalt)
- **laracasts**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/b8da8a98f897599095d4ef1644b8c5fd39921118) ([#10055](https://github.com/yt-dlp/yt-dlp/issues/10055)) by [ASertacAkkaya](https://github.com/ASertacAkkaya), [seproDev](https://github.com/seproDev)
- **matchtv**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/f3411af12e209bc5624e1ac31271b8aabe2d3c90) ([#10190](https://github.com/yt-dlp/yt-dlp/issues/10190)) by [megumintyan](https://github.com/megumintyan)
- **mediasite**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/0953209a857c51648aee89d205c086b0e1dd3864) ([#10273](https://github.com/yt-dlp/yt-dlp/issues/10273)) by [bashonly](https://github.com/bashonly)
- **microsoftembed**: [Add extractors for dev materials](https://github.com/yt-dlp/yt-dlp/commit/9200bc70c94546b2191bb6fbfc9cea98a919cc56) ([#9177](https://github.com/yt-dlp/yt-dlp/issues/9177)) by [c-basalt](https://github.com/c-basalt)
- **mlbtv**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/61edf57f8f13f6dfd81154174e647eb5fdd26089) ([#10296](https://github.com/yt-dlp/yt-dlp/issues/10296)) by [bashonly](https://github.com/bashonly)
- **neteasemusic**: [Extract more formats from new API](https://github.com/yt-dlp/yt-dlp/commit/7a03f88c40b80d3cf54f68edd9d4bdd6aa527570) ([#10258](https://github.com/yt-dlp/yt-dlp/issues/10258)) by [hafeoz](https://github.com/hafeoz)
- **nhkradiru**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/b8e2a5e0e1030076f833917906e19bb6c7b318f6) ([#10106](https://github.com/yt-dlp/yt-dlp/issues/10106)) by [garret1317](https://github.com/garret1317)
- **nuum**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/aefede25561a06cba398d4f593eee2fbe942693b) ([#10316](https://github.com/yt-dlp/yt-dlp/issues/10316)) by [DmitryScaletta](https://github.com/DmitryScaletta)
- **orf**
- on
- [Add `prefer_segments_playlist` extractor-arg](https://github.com/yt-dlp/yt-dlp/commit/e6a22834df1776ec4e486526f6df2bf53cb7e06f) ([#10314](https://github.com/yt-dlp/yt-dlp/issues/10314)) by [seproDev](https://github.com/seproDev)
- [Support segmented episodes](https://github.com/yt-dlp/yt-dlp/commit/8b46ad4d8b8ee8c5472af0cde863baa89ca3f425) ([#10053](https://github.com/yt-dlp/yt-dlp/issues/10053)) by [seproDev](https://github.com/seproDev)
- **patreoncampaign**: [Fix `campaign_id` extraction](https://github.com/yt-dlp/yt-dlp/commit/2e5a47da400b645aadbda6afd1156bd89c744f48) ([#10070](https://github.com/yt-dlp/yt-dlp/issues/10070)) by [bashonly](https://github.com/bashonly)
- **podbayfm**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/d4b52ce3fcb8d9578ed12365648eaba8718c603e) ([#10195](https://github.com/yt-dlp/yt-dlp/issues/10195)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
- **pokergo**: [Make metadata extraction non-fatal](https://github.com/yt-dlp/yt-dlp/commit/36e8dd832579b5375a0f6626af4268b86b4eb21a) ([#10319](https://github.com/yt-dlp/yt-dlp/issues/10319)) by [axpauls](https://github.com/axpauls)
- **qqmusic**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/4f5d7be3c5590bb257d8ff521572aee9839ab754) ([#9768](https://github.com/yt-dlp/yt-dlp/issues/9768)) by [c-basalt](https://github.com/c-basalt)
- **rtvslo.si**: show: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/92a1c4abaeeba9a69d611c57b73555cb1a1f00ad) ([#8418](https://github.com/yt-dlp/yt-dlp/issues/8418)) by [JSubelj](https://github.com/JSubelj), [seproDev](https://github.com/seproDev)
- **soundcloud**: [Fix `download` format extraction](https://github.com/yt-dlp/yt-dlp/commit/e53e56b73543799638fa6abb0c78f8b091aa84e1) ([#10125](https://github.com/yt-dlp/yt-dlp/issues/10125)) by [bashonly](https://github.com/bashonly)
- **sproutvideo**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/d6c2c2bc84f1434255be5c73baeb17d893d2c0d4) ([#10098](https://github.com/yt-dlp/yt-dlp/issues/10098)) by [bashonly](https://github.com/bashonly), [TheZ3ro](https://github.com/TheZ3ro)
- **tiktok**
- [Detect and raise when login is required](https://github.com/yt-dlp/yt-dlp/commit/ea88129784fcbb6987161df9ba05909325d8e2e9) ([#10124](https://github.com/yt-dlp/yt-dlp/issues/10124)) by [bashonly](https://github.com/bashonly)
- [Fix API extraction](https://github.com/yt-dlp/yt-dlp/commit/96472d72f29550c25c5dcedcde02c38c192b0011) ([#10216](https://github.com/yt-dlp/yt-dlp/issues/10216)) by [bashonly](https://github.com/bashonly)
- **tubitv**
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/bef9a9e5361fd7a72e21d0f1a8c8afb70d89e8c5) ([#9975](https://github.com/yt-dlp/yt-dlp/issues/9975)) by [chilinux](https://github.com/chilinux)
- series: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/d7d861811c15585a4f7ec9d5ae68d2ac28de28a0) ([#10116](https://github.com/yt-dlp/yt-dlp/issues/10116)) by [bashonly](https://github.com/bashonly)
- **vimeo**: [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/d4b99a233314bf31f9c842035ea9884673d5313a) ([#10327](https://github.com/yt-dlp/yt-dlp/issues/10327)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Extract all formats from multi-language m3u8s](https://github.com/yt-dlp/yt-dlp/commit/9bd85019931927a99b0fe0dc58ac51acca9fbe72) ([#9875](https://github.com/yt-dlp/yt-dlp/issues/9875)) by [bashonly](https://github.com/bashonly), [clienthax](https://github.com/clienthax)
- [Skip formats if nsig decoding fails](https://github.com/yt-dlp/yt-dlp/commit/800ec085ccf98420584d8bb38c20a2c079669b09) ([#10223](https://github.com/yt-dlp/yt-dlp/issues/10223)) by [bashonly](https://github.com/bashonly)
- [Suppress "Unavailable videos are hidden" warning](https://github.com/yt-dlp/yt-dlp/commit/24f3097ea9a470a984d0454dc013cafa2325f5f8) ([#10159](https://github.com/yt-dlp/yt-dlp/issues/10159)) by [mgedmin](https://github.com/mgedmin)
- tab: [Fix channel metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/a0d9967f6822fc279e86bce33464194985148727) ([#10071](https://github.com/yt-dlp/yt-dlp/issues/10071)) by [bashonly](https://github.com/bashonly), [shoxie007](https://github.com/shoxie007)
#### Downloader changes
- **hls**: [Apply `extra_param_to_key_url` from info dict](https://github.com/yt-dlp/yt-dlp/commit/ca8885edd93bdf8912af6c22ee335b6222cb9ba9) by [bashonly](https://github.com/bashonly)
#### Postprocessor changes
- **embedthumbnail**: [Fix postprocessor](https://github.com/yt-dlp/yt-dlp/commit/f2a4ea1794718e4dc0148bc172cb877f1080903b) ([#10248](https://github.com/yt-dlp/yt-dlp/issues/10248)) by [Grub4K](https://github.com/Grub4K)
#### Networking changes
- **Request Handler**: requests: [Bump minimum `requests` version to 2.32.2](https://github.com/yt-dlp/yt-dlp/commit/db50f19d76c6870a5a13d0cab9287d684fd7449a) ([#10079](https://github.com/yt-dlp/yt-dlp/issues/10079)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **build**
- [Bump Pyinstaller to `>=6.7.0` for all builds](https://github.com/yt-dlp/yt-dlp/commit/5fdd13006a1c5d78642c8d3c4c7df0448273c2ae) ([#10069](https://github.com/yt-dlp/yt-dlp/issues/10069)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
- [Cache dependencies for `macos` job](https://github.com/yt-dlp/yt-dlp/commit/46c1b7cfec1d0e6155083ca7e6948674c64ecb97) ([#10088](https://github.com/yt-dlp/yt-dlp/issues/10088)) by [bashonly](https://github.com/bashonly)
- [Use `macos-12` image for `yt-dlp_macos`](https://github.com/yt-dlp/yt-dlp/commit/03334d639d5282cd4107edb32c623ba400262fc4) ([#10063](https://github.com/yt-dlp/yt-dlp/issues/10063)) by [bashonly](https://github.com/bashonly)
- **cleanup**
- [Add more ruff rules](https://github.com/yt-dlp/yt-dlp/commit/add96eb9f84cfffe85682bf2fb85135746994ee8) ([#10149](https://github.com/yt-dlp/yt-dlp/issues/10149)) by [seproDev](https://github.com/seproDev)
- [Bump ruff to 0.5.x](https://github.com/yt-dlp/yt-dlp/commit/7814c50948a2b9a4c746441ecbc509ae563d5d1f) ([#10282](https://github.com/yt-dlp/yt-dlp/issues/10282)) by [seproDev](https://github.com/seproDev)
- Miscellaneous: [6aaf96a](https://github.com/yt-dlp/yt-dlp/commit/6aaf96a3d6e7d0d426e97e11a2fcf52fda00e733) by [bashonly](https://github.com/bashonly), [c-basalt](https://github.com/c-basalt), [jucor](https://github.com/jucor), [seproDev](https://github.com/seproDev)
- **test**: download: [Raise on network errors](https://github.com/yt-dlp/yt-dlp/commit/54a63e80af82791d2f0985bd0176bb182963fd5f) ([#10283](https://github.com/yt-dlp/yt-dlp/issues/10283)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
### 2024.05.27
#### Extractor changes


@ -61,3 +61,10 @@ ## [Grub4K](https://github.com/Grub4K)
* Reworked internals like `traverse_obj`, various core refactors and bug fixes
* Implemented proper progress reporting for parallel downloads
* Improved/fixed/added Bundestag, crunchyroll, pr0gramm, Twitter, WrestleUniverse etc
## [sepro](https://github.com/seproDev)
* UX improvements: Warn when ffmpeg is missing, warn when double-clicking exe
* Code cleanup: Remove dead extractors, mark extractors as broken, enable/apply ruff rules
* Improved/fixed/added ArdMediathek, DRTV, Floatplane, MagentaMusik, Naver, Nebula, OnDemandKorea, Vbox7 etc


@ -141,7 +141,7 @@ ## UPDATE
If you [installed with pip](https://github.com/yt-dlp/yt-dlp/wiki/Installation#with-pip), simply re-run the same command that was used to install the program
For other third-party package managers, see [the wiki](https://github.com/yt-dlp/yt-dlp/wiki/Installation#third-party-package-managers) or refer their documentation
For other third-party package managers, see [the wiki](https://github.com/yt-dlp/yt-dlp/wiki/Installation#third-party-package-managers) or refer to their documentation
<a id="update-channels"></a>
@ -184,10 +184,10 @@ ## DEPENDENCIES
### Strongly recommended
* [**ffmpeg** and **ffprobe**](https://www.ffmpeg.org) - Required for [merging separate video and audio files](#format-selection) as well as for various [post-processing](#post-processing-options) tasks. License [depends on the build](https://www.ffmpeg.org/legal.html)
* [**ffmpeg** and **ffprobe**](https://www.ffmpeg.org) - Required for [merging separate video and audio files](#format-selection), as well as for various [post-processing](#post-processing-options) tasks. License [depends on the build](https://www.ffmpeg.org/legal.html)
There are bugs in ffmpeg that cause various issues when used alongside yt-dlp. Since ffmpeg is such an important dependency, we provide [custom builds](https://github.com/yt-dlp/FFmpeg-Builds#ffmpeg-static-auto-builds) with patches for some of these issues at [yt-dlp/FFmpeg-Builds](https://github.com/yt-dlp/FFmpeg-Builds). See [the readme](https://github.com/yt-dlp/FFmpeg-Builds#patches-applied) for details on the specific issues solved by these builds
**Important**: What you need is ffmpeg *binary*, **NOT** [the Python package of the same name](https://pypi.org/project/ffmpeg)
### Networking
@ -198,7 +198,7 @@ ### Networking
#### Impersonation
The following provide support for impersonating browser requests. This may be required for some sites that employ TLS fingerprinting.
* [**curl_cffi**](https://github.com/yifeikong/curl_cffi) (recommended) - Python binding for [curl-impersonate](https://github.com/lwthiker/curl-impersonate). Provides impersonation targets for Chrome, Edge and Safari. Licensed under [MIT](https://github.com/yifeikong/curl_cffi/blob/main/LICENSE)
* Can be installed with the `curl-cffi` group, e.g. `pip install "yt-dlp[default,curl-cffi]"`
@ -275,7 +275,7 @@ ### Standalone Py2Exe Builds (Windows)
### Related scripts
* **`devscripts/install_deps.py`** - Install dependencies for yt-dlp.
* **`devscripts/update-version.py`** - Update the version number based on current date.
* **`devscripts/update-version.py`** - Update the version number based on the current date.
* **`devscripts/set-variant.py`** - Set the build variant of the executable.
* **`devscripts/make_changelog.py`** - Create a markdown changelog using short commit messages and update `CONTRIBUTORS` file.
* **`devscripts/make_lazy_extractors.py`** - Create lazy extractors. Running this before building the binaries (any variant) will improve their startup performance. Set the environment variable `YTDLP_NO_LAZY_EXTRACTORS=1` if you wish to forcefully disable lazy extractor loading.
@ -456,8 +456,8 @@ ## Video Selection:
is not present, and "&" to check multiple
conditions. Use a "\" to escape "&" or
quotes if needed. If used multiple times,
the filter matches if atleast one of the
conditions are met. E.g. --match-filter
the filter matches if at least one of the
conditions is met. E.g. --match-filter
!is_live --match-filter "like_count>?100 &
description~='(?i)\bcats \& dogs\b'" matches
only videos that are not live OR those that
@ -674,7 +674,7 @@ ## Filesystem Options:
PROFILE to load cookies from, and the
CONTAINER name (if Firefox) ("none" for no
container) can be given with their
respective seperators. By default, all
respective separators. By default, all
containers of the most recently accessed
profile are used. Currently supported
keyrings are: basictext, gnomekeyring,
@ -1036,7 +1036,7 @@ ## Post-Processing Options:
--print/--output), "before_dl" (before each
video download), "post_process" (after each
video download; default), "after_move"
(after moving video file to it's final
(after moving video file to its final
locations), "after_video" (after downloading
and processing all formats of a video), or
"playlist" (at end of playlist). This option
@ -1125,7 +1125,7 @@ # CONFIGURATION
* `/etc/yt-dlp/config`
* `/etc/yt-dlp/config.txt`
E.g. with the following configuration file yt-dlp will always extract the audio, not copy the mtime, use a proxy and save all videos under `YouTube` directory in your home directory:
E.g. with the following configuration file, yt-dlp will always extract the audio, not copy the mtime, use a proxy and save all videos under `YouTube` directory in your home directory:
```
# Lines starting with # are comments
@ -1142,7 +1142,7 @@ # Save all videos under YouTube directory in your home directory
-o ~/YouTube/%(title)s.%(ext)s
```
**Note**: Options in configuration file are just the same options aka switches used in regular command line calls; thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`. They must also be quoted when necessary as-if it were a UNIX shell.
**Note**: Options in configuration file are just the same options aka switches used in regular command line calls; thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`. They must also be quoted when necessary, as if it were a UNIX shell.
You can use `--ignore-config` if you want to disable all configuration files for a particular yt-dlp run. If `--ignore-config` is found inside any configuration file, no further configuration will be loaded. For example, having the option in the portable configuration file prevents loading of home, user, and system configurations. Additionally, (for backward compatibility) if `--ignore-config` is found inside the system configuration file, the user configuration is not loaded.
@ -1154,12 +1154,12 @@ ### Configuration file encoding
### Authentication with netrc
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every yt-dlp execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](https://stackoverflow.com/tags/.netrc/info) on a per-extractor basis. For that you will need to create a `.netrc` file in `--netrc-location` and restrict permissions to read/write by only you:
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every yt-dlp execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](https://stackoverflow.com/tags/.netrc/info) on a per-extractor basis. For that, you will need to create a `.netrc` file in `--netrc-location` and restrict permissions to read/write by only you:
```
touch ${HOME}/.netrc
chmod a-rwx,u+rw ${HOME}/.netrc
```
After that you can add credentials for an extractor in the following format, where *extractor* is the name of the extractor in lowercase:
After that, you can add credentials for an extractor in the following format, where *extractor* is the name of the extractor in lowercase:
```
machine <extractor> login <username> password <password>
```
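
As a rough check that an entry in this format is well-formed, it can be parsed with Python's standard `netrc` module. This is only a sketch: the extractor name `youtube` and the credentials are placeholders, and the code does not show how yt-dlp itself looks up the file.

```python
import netrc
import os
import tempfile

# Placeholder entry for a hypothetical extractor named "youtube"
entry = 'machine youtube login myaccount@example.com password my_password\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, '.netrc')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(entry)
    os.chmod(path, 0o600)  # same permissions as `chmod a-rwx,u+rw`

    # authenticators() returns a (login, account, password) tuple
    login, _account, password = netrc.netrc(path).authenticators('youtube')
```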
@ -1201,7 +1201,7 @@ # OUTPUT TEMPLATE
The field names themselves (the part inside the parenthesis) can also have some special formatting:
1. **Object traversal**: The dictionaries and lists available in metadata can be traversed by using a dot `.` separator; e.g. `%(tags.0)s`, `%(subtitles.en.-1.ext)s`. You can do Python slicing with colon `:`; E.g. `%(id.3:7:-1)s`, `%(formats.:.format_id)s`. Curly braces `{}` can be used to build dictionaries with only specific keys; e.g. `%(formats.:.{format_id,height})#j`. An empty field name `%()s` refers to the entire infodict; e.g. `%(.{id,title})s`. Note that all the fields that become available using this method are not listed below. Use `-j` to see such fields
1. **Object traversal**: The dictionaries and lists available in metadata can be traversed by using a dot `.` separator; e.g. `%(tags.0)s`, `%(subtitles.en.-1.ext)s`. You can do Python slicing with colon `:`; E.g. `%(id.3:7)s`, `%(id.6:2:-1)s`, `%(formats.:.format_id)s`. Curly braces `{}` can be used to build dictionaries with only specific keys; e.g. `%(formats.:.{format_id,height})#j`. An empty field name `%()s` refers to the entire infodict; e.g. `%(.{id,title})s`. Note that all the fields that become available using this method are not listed below. Use `-j` to see such fields
1. **Arithmetic**: Simple arithmetic can be done on numeric fields using `+`, `-` and `*`. E.g. `%(playlist_index+10)03d`, `%(n_entries+1-playlist_index)d`
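
The slicing and arithmetic examples can be reproduced in plain Python, assuming the template syntax follows Python's own slice semantics as the text states. The id `BaW_jenozKc` and the index value `5` are sample inputs chosen for this sketch.

```python
video_id = 'BaW_jenozKc'

forward = video_id[3:7]      # corresponds to %(id.3:7)s
backward = video_id[6:2:-1]  # corresponds to %(id.6:2:-1)s
# With a negative step the start must be greater than the stop,
# so a slice like 3:7:-1 selects nothing.
empty = video_id[3:7:-1]

playlist_index = 5
padded = '%03d' % (playlist_index + 10)  # corresponds to %(playlist_index+10)03d
```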
@ -1222,7 +1222,7 @@ # OUTPUT TEMPLATE
%(name[.keys][addition][>strf][,alternate][&replacement][|default])[flags][width][.precision][length]type
```
Additionally, you can set different output templates for the various metadata files separately from the general output template by specifying the type of file followed by the template separated by a colon `:`. The different file types supported are `subtitle`, `thumbnail`, `description`, `annotation` (deprecated), `infojson`, `link`, `pl_thumbnail`, `pl_description`, `pl_infojson`, `chapter`, `pl_video`. E.g. `-o "%(title)s.%(ext)s" -o "thumbnail:%(title)s\%(title)s.%(ext)s"` will put the thumbnails in a folder with the same name as the video. If any of the templates is empty, that type of file will not be written. E.g. `--write-thumbnail -o "thumbnail:"` will write thumbnails only for playlists and not for video.
<a id="outtmpl-postprocess-note"></a>
@ -1282,13 +1282,15 @@ # OUTPUT TEMPLATE
- `n_entries` (numeric): Total number of extracted items in the playlist
- `playlist_id` (string): Identifier of the playlist that contains the video
- `playlist_title` (string): Name of the playlist that contains the video
- `playlist` (string): `playlist_id` or `playlist_title`
- `playlist` (string): `playlist_title` if available or else `playlist_id`
- `playlist_count` (numeric): Total number of items in the playlist. May not be known if entire playlist is not extracted
- `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the final index
- `playlist_autonumber` (numeric): Position of the video in the playlist download queue padded with leading zeros according to the total length of the playlist
- `playlist_uploader` (string): Full name of the playlist uploader
- `playlist_uploader_id` (string): Nickname or id of the playlist uploader
- `webpage_url` (string): A URL to the video webpage which if given to yt-dlp should allow to get the same result again
- `playlist_channel` (string): Display name of the channel that uploaded the playlist
- `playlist_channel_id` (string): Identifier of the channel that uploaded the playlist
- `webpage_url` (string): A URL to the video webpage which, if given to yt-dlp, should yield the same result again
- `webpage_url_basename` (string): The basename of the webpage URL
- `webpage_url_domain` (string): The domain of the webpage URL
- `original_url` (string): The URL given by the user (or same as `webpage_url` for playlist entries)
@ -1304,10 +1306,10 @@ # OUTPUT TEMPLATE
- `chapter_number` (numeric): Number of the chapter the video belongs to
- `chapter_id` (string): Id of the chapter the video belongs to
Available for the video that is an episode of some series or programme:
Available for the video that is an episode of some series or program:
- `series` (string): Title of the series or programme the video episode belongs to
- `series_id` (string): Id of the series or programme the video episode belongs to
- `series` (string): Title of the series or program the video episode belongs to
- `series_id` (string): Id of the series or program the video episode belongs to
- `season` (string): Title of the season the video episode belongs to
- `season_number` (numeric): Number of the season the video episode belongs to
- `season_id` (string): Id of the season the video episode belongs to
@ -1347,9 +1349,9 @@ # OUTPUT TEMPLATE
- `thumbnails_table` (table): The thumbnail format table as printed by `--list-thumbnails`
- `subtitles_table` (table): The subtitle format table as printed by `--list-subs`
- `automatic_captions_table` (table): The automatic subtitle format table as printed by `--list-subs`
Available only after the video is downloaded (`post_process`/`after_move`):
- `filepath`: Actual path of downloaded video file
Available only in `--sponsorblock-chapter-title`:
@ -1364,7 +1366,7 @@ # OUTPUT TEMPLATE
Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. E.g. for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `yt-dlp test video` and id `BaW_jenozKc`, this will result in a `yt-dlp test video-BaW_jenozKc.mp4` file created in the current directory.
**Note**: Some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with placeholder value provided with `--output-na-placeholder` (`NA` by default).
**Note**: Some of the sequences are not guaranteed to be present, since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with placeholder value provided with `--output-na-placeholder` (`NA` by default).
**Tip**: Look at the `-j` output to identify which fields are available for the particular URL
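As a rough illustration of the substitution described above, the basic template syntax mirrors Python's `%`-style formatting against a dictionary of fields. The sketch below is plain Python rather than yt-dlp itself; the `info` dictionary is a hand-written stand-in for real extractor metadata, and yt-dlp's extra template features and `--output-na-placeholder` handling are not modelled.

```python
# Hand-written stand-in for the metadata an extractor would provide (assumption)
info = {'title': 'yt-dlp test video', 'id': 'BaW_jenozKc', 'ext': 'mp4'}

# Same template as in `-o "%(title)s-%(id)s.%(ext)s"`
template = '%(title)s-%(id)s.%(ext)s'

# Each %(name)s sequence is replaced by the value stored under that name
filename = template % info
print(filename)
```

Running `yt-dlp -j URL` and inspecting the JSON keys shows which names are actually available for a given URL.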
@ -1442,7 +1444,7 @@ # FORMAT SELECTION
- `all`: Select **all formats** separately
- `mergeall`: Select and **merge all formats** (Must be used with `--audio-multistreams`, `--video-multistreams` or both)
- `b*`, `best*`: Select the best quality format that **contains either** a video or an audio or both (ie; `vcodec!=none or acodec!=none`)
- `b*`, `best*`: Select the best quality format that **contains either** a video or an audio or both (i.e.; `vcodec!=none or acodec!=none`)
- `b`, `best`: Select the best quality format that **contains both** video and audio. Equivalent to `best*[vcodec!=none][acodec!=none]`
- `bv`, `bestvideo`: Select the best quality **video-only** format. Equivalent to `best*[acodec=none]`
- `bv*`, `bestvideo*`: Select the best quality format that **contains video**. It may also contain audio. Equivalent to `best*[vcodec!=none]`
@ -1455,7 +1457,7 @@ # FORMAT SELECTION
- `wa`, `worstaudio`: Select the worst quality audio-only format. Equivalent to `worst*[vcodec=none]`
- `wa*`, `worstaudio*`: Select the worst quality format that contains audio. It may also contain video. Equivalent to `worst*[acodec!=none]`
For example, to download the worst quality video-only format you can use `-f worstvideo`. It is however recommended not to use `worst` and related options. When your format selector is `worst`, the format which is worst in all respects is selected. Most of the time, what you actually want is the video with the smallest filesize instead. So it is generally better to use `-S +size` or more rigorously, `-S +size,+br,+res,+fps` instead of `-f worst`. See [Sorting Formats](#sorting-formats) for more details.
For example, to download the worst quality video-only format you can use `-f worstvideo`. It is, however, recommended not to use `worst` and related options. When your format selector is `worst`, the format which is worst in all respects is selected. Most of the time, what you actually want is the video with the smallest filesize instead. So it is generally better to use `-S +size` or more rigorously, `-S +size,+br,+res,+fps` instead of `-f worst`. See [Sorting Formats](#sorting-formats) for more details.
You can select the n'th best format of a type by using `best<type>.<n>`. For example, `best.2` will select the 2nd best combined format. Similarly, `bv*.3` will select the 3rd best format that contains a video stream.
@ -1505,7 +1507,7 @@ ## Filtering Formats
Any string comparison may be prefixed with negation `!` in order to produce an opposite comparison, e.g. `!*=` (does not contain). The comparand of a string comparison needs to be quoted with either double or single quotes if it contains spaces or special characters other than `._-`.
**Note**: None of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the website. Any other field made available by the extractor can also be used for filtering.
**Note**: None of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by the particular extractor, i.e. the metadata offered by the website. Any other field made available by the extractor can also be used for filtering.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "bv[height<=?720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 kbps. You can also use the filters with `all` to download all formats that satisfy the filter, e.g. `-f "all[vcodec=none]"` selects all audio-only formats.
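The effect of the `?` suffix and of combining filters can be sketched in plain Python. This is only an illustration of the semantics described above, not yt-dlp's implementation; the `formats` list and the `numeric_filter` helper are hypothetical.

```python
import operator

# Illustrative format entries; real ones come from the extractor (assumption)
formats = [
    {'format_id': 'a', 'height': 1080, 'tbr': 2500},
    {'format_id': 'b', 'height': 720, 'tbr': 1200},
    {'format_id': 'c', 'height': None, 'tbr': 800},  # height not known
    {'format_id': 'd', 'height': 480, 'tbr': 400},
    {'format_id': 'e', 'height': 360, 'tbr': None},  # bitrate not known
]


def numeric_filter(field, op, value, none_inclusive=False):
    """Build a predicate; `none_inclusive` plays the role of the `?` suffix."""
    def predicate(fmt):
        actual = fmt.get(field)
        if actual is None:
            # Unknown values are excluded unless `?` was given
            return none_inclusive
        return op(actual, value)
    return predicate


# Rough equivalent of `[height<=?720][tbr>500]`: every filter must match
filters = [
    numeric_filter('height', operator.le, 720, none_inclusive=True),
    numeric_filter('tbr', operator.gt, 500),
]

selected = [f['format_id'] for f in formats if all(p(f) for p in filters)]
print(selected)
```

Format `c` survives because its unknown height is allowed by `?`, while `e` is dropped because its unknown bitrate is filtered without `?`.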
@ -1544,14 +1546,14 @@ ## Sorting Formats
- `abr`: Average audio bitrate in [kbps](## "1000 bits/sec")
- `br`: Average bitrate in [kbps](## "1000 bits/sec"), `tbr`/`vbr`/`abr`
- `asr`: Audio sample rate in Hz
**Deprecation warning**: Many of these fields have (currently undocumented) aliases, that may be removed in a future version. It is recommended to use only the documented field names.
All fields, unless specified otherwise, are sorted in descending order. To reverse this, prefix the field with a `+`. E.g. `+res` prefers format with the smallest resolution. Additionally, you can suffix a preferred value for the fields, separated by a `:`. E.g. `res:720` prefers larger videos, but no larger than 720p and the smallest video if there are no videos less than 720p. For `codec` and `ext`, you can provide two preferred values, the first for video and the second for audio. E.g. `+codec:avc:m4a` (equivalent to `+vcodec:avc,+acodec:m4a`) sets the video codec preference to `h264` > `h265` > `vp9` > `vp9.2` > `av01` > `vp8` > `h263` > `theora` and audio codec preference to `mp4a` > `aac` > `vorbis` > `opus` > `mp3` > `ac3` > `dts`. You can also make the sorting prefer the nearest values to the provided by using `~` as the delimiter. E.g. `filesize~1G` prefers the format with filesize closest to 1 GiB.
The fields `hasvid` and `ie_pref` are always given highest priority in sorting, irrespective of the user-defined order. This behaviour can be changed by using `--format-sort-force`. Apart from these, the default order used is: `lang,quality,res,fps,hdr:12,vcodec:vp9.2,channels,acodec,size,br,asr,proto,ext,hasaud,source,id`. The extractors may override this default order, but they cannot override the user-provided order.
The fields `hasvid` and `ie_pref` are always given highest priority in sorting, irrespective of the user-defined order. This behavior can be changed by using `--format-sort-force`. Apart from these, the default order used is: `lang,quality,res,fps,hdr:12,vcodec:vp9.2,channels,acodec,size,br,asr,proto,ext,hasaud,source,id`. The extractors may override this default order, but they cannot override the user-provided order.
Note that the default has `vcodec:vp9.2`; i.e. `av1` is not preferred. Similarly, the default for hdr is `hdr:12`; i.e. dolby vision is not preferred. These choices are made since DV and AV1 formats are not yet fully compatible with most devices. This may be changed in the future as more devices become capable of smoothly playing back these formats.
Note that the default has `vcodec:vp9.2`; i.e. `av1` is not preferred. Similarly, the default for hdr is `hdr:12`; i.e. Dolby Vision is not preferred. These choices are made since DV and AV1 formats are not yet fully compatible with most devices. This may be changed in the future as more devices become capable of smoothly playing back these formats.
If your format selector is `worst`, the last item is selected after sorting. This means it will select the format that is worst in all respects. Most of the time, what you actually want is the video with the smallest filesize instead. So it is generally better to use `-f best -S +size,+br,+res,+fps`.
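The `field:value` preference described above (e.g. `res:720`) can be pictured as a two-part sort key. The following is a minimal sketch under that reading, not the sorting code yt-dlp actually uses; `limit_key` is a hypothetical helper and only a single numeric field is considered.

```python
def limit_key(value, limit):
    """Sort key for a `field:limit` style preference (illustrative only)."""
    if value <= limit:
        # Within the limit: larger is better
        return (0, -value)
    # Over the limit: ranked after everything within it, smallest first
    return (1, value)


heights = [1080, 480, 1440, 720, 360]

# Most preferred first, as with `-S res:720`
ordered = sorted(heights, key=lambda h: limit_key(h, 720))
print(ordered)
```

If no entry is at or below the limit, every key falls into the second group, so the smallest value wins, which matches the fallback described for `res:720`.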
@ -1763,7 +1765,7 @@ # EXTRACTOR ARGUMENTS
#### youtube
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (e.g. `web_embedded`); and `mweb`, `mweb_embedscreen` and `tv_embedded` (agegate bypass) with no variants. By default, `ios,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. The `android` clients will always be given lowest priority since their formats are broken. You can use `all` to use all the clients, and `default` for the default clients.
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (e.g. `web_embedded`); and `mediaconnect`, `mweb`, `mweb_embedscreen` and `tv_embedded` (agegate bypass) with no variants. By default, `ios,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. The `android` clients will always be given lowest priority since their formats are broken. You can use `all` to use all the clients, and `default` for the default clients.
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
@ -1849,7 +1851,13 @@ #### afreecatvlive
* `cdn`: One or more CDN IDs to use with the API call for stream URLs, e.g. `gcp_cdn`, `gs_cdn_pc_app`, `gs_cdn_mobile_web`, `gs_cdn_pc_web`
#### soundcloud
* `formats`: Formats to request from the API. Requested values should be in the format of `{protocol}_{extension}` (omitting the bitrate), e.g. `hls_opus,http_aac`. The `*` character functions as a wildcard, e.g. `*_mp3`, and can passed by itself to request all formats. Known protocols include `http`, `hls` and `hls-aes`; known extensions include `aac`, `opus` and `mp3`. Original `download` formats are always extracted. Default is `http_aac,hls_aac,http_opus,hls_opus,http_mp3,hls_mp3`
* `formats`: Formats to request from the API. Requested values should be in the format of `{protocol}_{extension}` (omitting the bitrate), e.g. `hls_opus,http_aac`. The `*` character functions as a wildcard, e.g. `*_mp3`, and can be passed by itself to request all formats. Known protocols include `http`, `hls` and `hls-aes`; known extensions include `aac`, `opus` and `mp3`. Original `download` formats are always extracted. Default is `http_aac,hls_aac,http_opus,hls_opus,http_mp3,hls_mp3`
#### orfon (orf:on)
* `prefer_segments_playlist`: Prefer a playlist of program segments instead of a single complete video when available. If individual segments are desired, use `--concat-playlist never --extractor-args "orfon:prefer_segments_playlist"`
#### bilibili
* `prefer_multi_flv`: Prefer extracting flv formats over mp4 for older videos that still provide legacy formats
**Note**: These options may be changed/removed in the future without concern for backward compatibility
@ -1860,16 +1868,16 @@ # PLUGINS
Note that **all** plugins are imported even if not invoked, and that **there are no checks** performed on plugin code. **Use plugins at your own risk and only if you trust the code!**
Plugins can be of `<type>`s `extractor` or `postprocessor`.
- Extractor plugins do not need to be enabled from the CLI and are automatically invoked when the input URL is suitable for it.
- Extractor plugins take priority over builtin extractors.
Plugins can be of `<type>`s `extractor` or `postprocessor`.
- Extractor plugins do not need to be enabled from the CLI and are automatically invoked when the input URL is suitable for it.
- Extractor plugins take priority over built-in extractors.
- Postprocessor plugins can be invoked using `--use-postprocessor NAME`.
Plugins are loaded from the namespace packages `yt_dlp_plugins.extractor` and `yt_dlp_plugins.postprocessor`.
In other words, the file structure on the disk looks something like:
yt_dlp_plugins/
extractor/
myplugin.py
@ -1917,7 +1925,7 @@ ## Developing Plugins
See the [yt-dlp-sample-plugins](https://github.com/yt-dlp/yt-dlp-sample-plugins) repo for a template plugin package and the [Plugin Development](https://github.com/yt-dlp/yt-dlp/wiki/Plugin-Development) section of the wiki for a plugin development guide.
All public classes with a name ending in `IE`/`PP` are imported from each file for extractors and postprocessors repectively. This respects underscore prefix (e.g. `_MyBasePluginIE` is private) and `__all__`. Modules can similarly be excluded by prefixing the module name with an underscore (e.g. `_myplugin.py`).
All public classes with a name ending in `IE`/`PP` are imported from each file for extractors and postprocessors respectively. This respects underscore prefix (e.g. `_MyBasePluginIE` is private) and `__all__`. Modules can similarly be excluded by prefixing the module name with an underscore (e.g. `_myplugin.py`).
To replace an existing extractor with a subclass of one, set the `plugin_name` class keyword argument (e.g. `class MyPluginIE(ABuiltInIE, plugin_name='myplugin')` will replace `ABuiltInIE` with `MyPluginIE`). Since the extractor replaces the parent, you should exclude the subclass extractor from being imported separately by making it private using one of the methods described above.
@ -1929,7 +1937,7 @@ # EMBEDDING YT-DLP
yt-dlp makes the best effort to be a good command-line program, and thus should be callable from any programming language.
Your program should avoid parsing the normal stdout since they may change in future versions. Instead they should use options such as `-J`, `--print`, `--progress-template`, `--exec` etc to create console output that you can reliably reproduce and parse.
Your program should avoid parsing the normal stdout since they may change in future versions. Instead, they should use options such as `-J`, `--print`, `--progress-template`, `--exec` etc to create console output that you can reliably reproduce and parse.
From a Python program, you can embed yt-dlp in a more powerful fashion, like this:
@ -2221,6 +2229,14 @@ ### Differences in default behavior
* `--compat-options 2022`: Same as `--compat-options 2023,playlist-match-filter,no-external-downloader-progress,prefer-legacy-http-handler,manifest-filesize-approx`
* `--compat-options 2023`: Currently does nothing. Use this to enable all future compat options
The following compat options restore vulnerable behavior from before security patches:
* `--compat-options allow-unsafe-ext`: Allow files with any extension (including unsafe ones) to be downloaded ([GHSA-79w7-vh3h-8g4j](<https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j>))
> :warning: Only use if a valid file download is rejected because its extension is detected as uncommon
>
> **This option can enable remote code execution! Consider [opening an issue](<https://github.com/yt-dlp/yt-dlp/issues/new/choose>) instead!**
### Deprecated options
These are all the deprecated options and the current alternative to achieve the same effect

View File

@ -169,5 +169,16 @@
"when": "5c019f6328ad40d66561eac3c4de0b3cd070d0f6",
"short": "[cleanup] Misc (#9765)",
"authors": ["bashonly", "Grub4K", "seproDev"]
},
{
"action": "change",
"when": "e6a22834df1776ec4e486526f6df2bf53cb7e06f",
"short": "[ie/orf:on] Add `prefer_segments_playlist` extractor-arg (#10314)",
"authors": ["seproDev"]
},
{
"action": "add",
"when": "6aaf96a3d6e7d0d426e97e11a2fcf52fda00e733",
"short": "[priority] Security: [[CVE-2024-38519](https://nvd.nist.gov/vuln/detail/CVE-2024-38519)] [Properly sanitize file-extension to prevent file system modification and RCE](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j)\n - Unsafe extensions are now blocked from being downloaded"
}
]

2
devscripts/cli_to_api.py Normal file → Executable file
View File

@ -1,3 +1,5 @@
#!/usr/bin/env python3
# Allow direct execution
import os
import sys

View File

@ -299,7 +299,7 @@ banned-from = [
"string",
"sys",
"time",
"urllib",
"urllib.parse",
"uuid",
"xml",
]

View File

@ -46,6 +46,7 @@ # Supported sites
- **aenetworks:show**
- **AeonCo**
- **afreecatv**: [*afreecatv*](## "netrc machine") afreecatv.com
- **afreecatv:catchstory**: [*afreecatv*](## "netrc machine") afreecatv.com catch story
- **afreecatv:live**: [*afreecatv*](## "netrc machine") afreecatv.com livestreams
- **afreecatv:user**
- **AirTV**
@ -544,6 +545,7 @@ # Supported sites
- **Goshgay**
- **GoToStage**
- **GPUTechConf**
- **Graspop**
- **Gronkh**
- **gronkh:feed**
- **gronkh:vods**
@ -680,6 +682,8 @@ # Supported sites
- **la7.it**
- **la7.it:pod:episode**
- **la7.it:podcast**
- **laracasts**
- **laracasts:series**
- **LastFM**
- **LastFMPlaylist**
- **LastFMUser**
@ -777,7 +781,12 @@ # Supported sites
- **MelonVOD**
- **Metacritic**
- **mewatch**
- **MicrosoftBuild**
- **MicrosoftEmbed**
- **MicrosoftLearnEpisode**
- **MicrosoftLearnPlaylist**
- **MicrosoftLearnSession**
- **MicrosoftMedius**
- **microsoftstream**: Microsoft Stream
- **mildom**: Record ongoing live by specific user in Mildom
- **mildom:clip**: Clip in Mildom
@ -840,8 +849,6 @@ # Supported sites
- **MusicdexArtist**
- **MusicdexPlaylist**
- **MusicdexSong**
- **mva**: Microsoft Virtual Academy videos
- **mva:course**: Microsoft Virtual Academy courses
- **Mx3**
- **Mx3Neo**
- **Mx3Volksmusik**
@ -1133,6 +1140,7 @@ # Supported sites
- **QingTing**
- **qqmusic**: QQ音乐
- **qqmusic:album**: QQ音乐 - 专辑
- **qqmusic:mv**: QQ音乐 - MV
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
@ -1239,6 +1247,7 @@ # Supported sites
- **rtve.es:television**
- **RTVS**
- **rtvslo.si**
- **rtvslo.si:show**
- **RudoVideo**
- **Rule34Video**
- **Rumble**
@ -1362,6 +1371,7 @@ # Supported sites
- **SpreakerShowPage**
- **SpringboardPlatform**
- **Sprout**
- **SproutVideo**
- **sr:mediathek**: Saarländischer Rundfunk (**Currently broken**)
- **SRGSSR**
- **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
@ -1496,8 +1506,8 @@ # Supported sites
- **Tube8**: (**Currently broken**)
- **TubeTuGraz**: [*tubetugraz*](## "netrc machine") tube.tugraz.at
- **TubeTuGrazSeries**: [*tubetugraz*](## "netrc machine")
- **TubiTv**: [*tubitv*](## "netrc machine")
- **TubiTvShow**
- **tubitv**: [*tubitv*](## "netrc machine")
- **tubitv:series**
- **Tumblr**: [*tumblr*](## "netrc machine")
- **TuneInPodcast**
- **TuneInPodcastEpisode**
@ -1609,6 +1619,7 @@ # Supported sites
- **VidioPremier**: [*vidio*](## "netrc machine")
- **VidLii**
- **Vidly**
- **vids.io**
- **viewlift**
- **viewlift:embed**
- **Viidea**

View File

@ -92,6 +92,7 @@ def test_operators(self):
self._test('function f(){return 0 && 1 || 2;}', 2)
self._test('function f(){return 0 ?? 42;}', 0)
self._test('function f(){return "life, the universe and everything" < 42;}', False)
self._test('function f(){return 0 - 7 * - 6;}', 42)
def test_array_access(self):
self._test('function f(){var x = [1,2,3]; x[0] = 4; x[0] = 5; x[2.0] = 7; return x;}', [5, 2, 7])

View File

@ -130,6 +130,7 @@
xpath_text,
xpath_with_ns,
)
from yt_dlp.utils._utils import _UnsafeExtensionError
from yt_dlp.utils.networking import (
HTTPHeaderDict,
escape_rfc3986,
@ -281,6 +282,13 @@ def env(var):
finally:
os.environ['HOME'] = old_home or ''
_uncommon_extensions = [
('exe', 'abc.exe.ext'),
('de', 'abc.de.ext'),
('../.mp4', None),
('..\\.mp4', None),
]
def test_prepend_extension(self):
self.assertEqual(prepend_extension('abc.ext', 'temp'), 'abc.temp.ext')
self.assertEqual(prepend_extension('abc.ext', 'temp', 'ext'), 'abc.temp.ext')
@ -289,6 +297,19 @@ def test_prepend_extension(self):
self.assertEqual(prepend_extension('.abc', 'temp'), '.abc.temp')
self.assertEqual(prepend_extension('.abc.ext', 'temp'), '.abc.temp.ext')
# Test uncommon extensions
self.assertEqual(prepend_extension('abc.ext', 'bin'), 'abc.bin.ext')
for ext, result in self._uncommon_extensions:
with self.assertRaises(_UnsafeExtensionError):
prepend_extension('abc', ext)
if result:
self.assertEqual(prepend_extension('abc.ext', ext, 'ext'), result)
else:
with self.assertRaises(_UnsafeExtensionError):
prepend_extension('abc.ext', ext, 'ext')
with self.assertRaises(_UnsafeExtensionError):
prepend_extension('abc.unexpected_ext', ext, 'ext')
def test_replace_extension(self):
self.assertEqual(replace_extension('abc.ext', 'temp'), 'abc.temp')
self.assertEqual(replace_extension('abc.ext', 'temp', 'ext'), 'abc.temp')
@ -297,6 +318,16 @@ def test_replace_extension(self):
self.assertEqual(replace_extension('.abc', 'temp'), '.abc.temp')
self.assertEqual(replace_extension('.abc.ext', 'temp'), '.abc.temp')
# Test uncommon extensions
self.assertEqual(replace_extension('abc.ext', 'bin'), 'abc.unknown_video')
for ext, _ in self._uncommon_extensions:
with self.assertRaises(_UnsafeExtensionError):
replace_extension('abc', ext)
with self.assertRaises(_UnsafeExtensionError):
replace_extension('abc.ext', ext, 'ext')
with self.assertRaises(_UnsafeExtensionError):
replace_extension('abc.unexpected_ext', ext, 'ext')
def test_subtitles_filename(self):
self.assertEqual(subtitles_filename('abc.ext', 'en', 'vtt'), 'abc.en.vtt')
self.assertEqual(subtitles_filename('abc.ext', 'en', 'vtt', 'ext'), 'abc.en.vtt')

View File

@ -163,6 +163,10 @@
'https://www.youtube.com/s/player/b7910ca8/player_ias.vflset/en_US/base.js',
'_hXMCwMt9qE310D', 'LoZMgkkofRMCZQ',
),
(
'https://www.youtube.com/s/player/590f65a6/player_ias.vflset/en_US/base.js',
'1tm7-g_A9zsI8_Lay_', 'xI4Vem4Put_rOg',
),
]

View File

@ -4,6 +4,7 @@
import datetime as dt
import errno
import fileinput
import functools
import http.cookiejar
import io
import itertools
@ -24,7 +25,7 @@
import unicodedata
from .cache import Cache
from .compat import functools, urllib # isort: split
from .compat import urllib # isort: split
from .compat import compat_os_name, urllib_req_to_req
from .cookies import LenientSimpleCookie, load_cookies
from .downloader import FFmpegFD, get_suitable_downloader, shorten_protocol_name
@ -158,7 +159,7 @@
write_json_file,
write_string,
)
from .utils._utils import _YDLLogger
from .utils._utils import _UnsafeExtensionError, _YDLLogger
from .utils.networking import (
HTTPHeaderDict,
clean_headers,
@ -171,6 +172,20 @@
import ctypes
def _catch_unsafe_extension_error(func):
@functools.wraps(func)
def wrapper(self, *args, **kwargs):
try:
return func(self, *args, **kwargs)
except _UnsafeExtensionError as error:
self.report_error(
f'The extracted extension ({error.extension!r}) is unusual '
'and will be skipped for safety reasons. '
f'If you believe this is an error{bug_reports_message(",")}')
return wrapper
class YoutubeDL:
"""YoutubeDL class.
@ -453,8 +468,9 @@ class YoutubeDL:
Set the value to 'native' to use the native downloader
compat_opts: Compatibility options. See "Differences in default behavior".
The following options do not work when used through the API:
filename, abort-on-error, multistreams, no-live-chat, format-sort
no-clean-infojson, no-playlist-metafiles, no-keep-subs, no-attach-info-json.
filename, abort-on-error, multistreams, no-live-chat,
format-sort, no-clean-infojson, no-playlist-metafiles,
no-keep-subs, no-attach-info-json, allow-unsafe-ext.
Refer __init__.py for their implementation
progress_template: Dictionary of templates for progress outputs.
Allowed keys are 'download', 'postprocess',
@ -1399,6 +1415,7 @@ def evaluate_outtmpl(self, outtmpl, info_dict, *args, **kwargs):
outtmpl, info_dict = self.prepare_outtmpl(outtmpl, info_dict, *args, **kwargs)
return self.escape_outtmpl(outtmpl) % info_dict
@_catch_unsafe_extension_error
def _prepare_filename(self, info_dict, *, outtmpl=None, tmpl_type=None):
assert None in (outtmpl, tmpl_type), 'outtmpl and tmpl_type are mutually exclusive'
if outtmpl is None:
@ -1926,6 +1943,8 @@ def _playlist_infodict(ie_result, strict=False, **kwargs):
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_channel': ie_result.get('channel'),
'playlist_channel_id': ie_result.get('channel_id'),
**kwargs,
}
if strict:
@ -3189,6 +3208,7 @@ def existing_file(self, filepaths, *, default_overwrite=True):
os.remove(file)
return None
@_catch_unsafe_extension_error
def process_info(self, info_dict):
"""Process a single resolved IE result. (Modifies it in-place)"""

View File

@ -64,6 +64,7 @@
write_string,
)
from .utils.networking import std_headers
from .utils._utils import _UnsafeExtensionError
from .YoutubeDL import YoutubeDL
_IN_CLI = False
@ -593,6 +594,13 @@ def report_deprecation(val, old, new=None):
if opts.ap_username is not None and opts.ap_password is None:
opts.ap_password = getpass.getpass('Type TV provider account password and press [Return]: ')
# compat option changes global state destructively; only allow from cli
if 'allow-unsafe-ext' in opts.compat_opts:
warnings.append(
'Using allow-unsafe-ext opens you up to potential attacks. '
'Use with great care!')
_UnsafeExtensionError.sanitize_extension = lambda x: x
return warnings, deprecation_warnings

View File

@ -2,7 +2,9 @@
import collections
import contextlib
import datetime as dt
import functools
import glob
import hashlib
import http.cookiejar
import http.cookies
import io
@ -17,14 +19,12 @@
import time
import urllib.request
from enum import Enum, auto
from hashlib import pbkdf2_hmac
from .aes import (
aes_cbc_decrypt_bytes,
aes_gcm_decrypt_and_verify_bytes,
unpad_pkcs7,
)
from .compat import functools # isort: split
from .compat import compat_os_name
from .dependencies import (
_SECRETSTORAGE_UNAVAILABLE_REASON,
@ -999,7 +999,7 @@ def _get_windows_v10_key(browser_root, logger):
def pbkdf2_sha1(password, salt, iterations, key_length):
return pbkdf2_hmac('sha1', password, salt, iterations, key_length)
return hashlib.pbkdf2_hmac('sha1', password, salt, iterations, key_length)
def _decrypt_aes_cbc_multi(ciphertext, keys, logger, initialization_vector=b' ' * 16):

View File

@ -1,4 +1,5 @@
import enum
import functools
import json
import os
import re
@ -9,7 +10,6 @@
import uuid
from .fragment import FragmentFD
from ..compat import functools
from ..networking import Request
from ..postprocessor.ffmpeg import EXT_TO_OUT_FORMATS, FFmpegPostProcessor
from ..utils import (

View File

@ -782,6 +782,7 @@
from .goshgay import GoshgayIE
from .gotostage import GoToStageIE
from .gputechconf import GPUTechConfIE
from .graspop import GraspopIE
from .gronkh import (
GronkhFeedIE,
GronkhIE,
@ -972,6 +973,10 @@
LA7PodcastEpisodeIE,
LA7PodcastIE,
)
from .laracasts import (
LaracastsIE,
LaracastsPlaylistIE,
)
from .lastfm import (
LastFMIE,
LastFMPlaylistIE,
@ -1116,12 +1121,15 @@
from .melonvod import MelonVODIE
from .metacritic import MetacriticIE
from .mgtv import MGTVIE
from .microsoftembed import MicrosoftEmbedIE
from .microsoftstream import MicrosoftStreamIE
from .microsoftvirtualacademy import (
MicrosoftVirtualAcademyCourseIE,
MicrosoftVirtualAcademyIE,
from .microsoftembed import (
MicrosoftBuildIE,
MicrosoftEmbedIE,
MicrosoftLearnEpisodeIE,
MicrosoftLearnPlaylistIE,
MicrosoftLearnSessionIE,
MicrosoftMediusIE,
)
from .microsoftstream import MicrosoftStreamIE
from .mildom import (
MildomClipIE,
MildomIE,
@ -1606,6 +1614,7 @@
QQMusicPlaylistIE,
QQMusicSingerIE,
QQMusicToplistIE,
QQMusicVideoIE,
)
from .r7 import (
R7IE,

View File

@ -4,6 +4,7 @@
from ..utils import (
extract_attributes,
int_or_none,
join_nonempty,
parse_iso8601,
try_get,
)
@ -136,7 +137,7 @@ def _real_extract(self, url):
else:
vbr = int_or_none(s.get('bitrate'))
formats.append({
'format_id': f'{stream_type}-{vbr}' if vbr else stream_type,
'format_id': join_nonempty(stream_type, vbr),
'vbr': vbr,
'width': int_or_none(s.get('width')),
'height': int_or_none(s.get('height')),

View File

@ -131,8 +131,8 @@ def _real_extract(self, url):
formats.extend(self._extract_f4m_formats(
href, video_id, f4m_id='hds', fatal=False))
elif mime_type == 'application/dash+xml':
formats.extend(self._extract_f4m_formats(
href, video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_mpd_formats(
href, video_id, mpd_id='dash', fatal=False))
elif mime_type == 'application/vnd.ms-sstr+xml':
formats.extend(self._extract_ism_formats(
href, video_id, ism_id='mss', fatal=False))

View File

@ -41,7 +41,7 @@ class BandcampIE(InfoExtractor):
'uploader_id': 'youtube-dl',
'thumbnail': 'https://f4.bcbits.com/img/a3216802731_5.jpg',
},
'_skip': 'There is a limit of 200 free downloads / month for the test song',
'skip': 'There is a limit of 200 free downloads / month for the test song',
}, {
# free download
'url': 'http://benprunty.bandcamp.com/track/lanius-battle',

View File

@ -31,12 +31,12 @@
mimetype2ext,
parse_count,
parse_qs,
parse_resolution,
qualities,
smuggle_url,
srt_subtitles_timecode,
str_or_none,
traverse_obj,
try_call,
unified_timestamp,
unsmuggle_url,
url_or_none,
@ -47,6 +47,23 @@
class BilibiliBaseIE(InfoExtractor):
_FORMAT_ID_RE = re.compile(r'-(\d+)\.m4s\?')
_WBI_KEY_CACHE_TIMEOUT = 30 # exact expire timeout is unclear, use 30s for one session
_wbi_key_cache = {}
@property
def is_logged_in(self):
return bool(self._get_cookies('https://api.bilibili.com').get('SESSDATA'))
def _check_missing_formats(self, play_info, formats):
parsed_qualities = set(traverse_obj(formats, (..., 'quality')))
missing_formats = join_nonempty(*[
traverse_obj(fmt, 'new_description', 'display_desc', 'quality')
for fmt in traverse_obj(play_info, (
'support_formats', lambda _, v: v['quality'] not in parsed_qualities))], delim=', ')
if missing_formats:
self.to_screen(
f'Format(s) {missing_formats} are missing; you have to login or '
f'become a premium member to download them. {self._login_hint()}')
def extract_formats(self, play_info):
format_names = {
@ -86,18 +103,75 @@ def extract_formats(self, play_info):
'format': format_names.get(video.get('id')),
} for video in traverse_obj(play_info, ('dash', 'video', ...)))
missing_formats = format_names.keys() - set(traverse_obj(formats, (..., 'quality')))
if missing_formats:
self.to_screen(f'Format(s) {", ".join(format_names[i] for i in missing_formats)} are missing; '
f'you have to login or become premium member to download them. {self._login_hint()}')
if formats:
self._check_missing_formats(play_info, formats)
fragments = traverse_obj(play_info, ('durl', lambda _, v: url_or_none(v['url']), {
'url': ('url', {url_or_none}),
'duration': ('length', {functools.partial(float_or_none, scale=1000)}),
'filesize': ('size', {int_or_none}),
}))
if fragments:
formats.append({
'url': fragments[0]['url'],
'filesize': sum(traverse_obj(fragments, (..., 'filesize'))),
**({
'fragments': fragments,
'protocol': 'http_dash_segments',
} if len(fragments) > 1 else {}),
**traverse_obj(play_info, {
'quality': ('quality', {int_or_none}),
'format_id': ('quality', {str_or_none}),
'format_note': ('quality', {lambda x: format_names.get(x)}),
'duration': ('timelength', {functools.partial(float_or_none, scale=1000)}),
}),
**parse_resolution(format_names.get(play_info.get('quality'))),
})
return formats
def _download_playinfo(self, video_id, cid, headers=None):
def _get_wbi_key(self, video_id):
if time.time() < self._wbi_key_cache.get('ts', 0) + self._WBI_KEY_CACHE_TIMEOUT:
return self._wbi_key_cache['key']
session_data = self._download_json(
'https://api.bilibili.com/x/web-interface/nav', video_id, note='Downloading wbi sign')
lookup = ''.join(traverse_obj(session_data, (
'data', 'wbi_img', ('img_url', 'sub_url'),
{lambda x: x.rpartition('/')[2].partition('.')[0]})))
# from getMixinKey() in the vendor js
mixin_key_enc_tab = [
46, 47, 18, 2, 53, 8, 23, 32, 15, 50, 10, 31, 58, 3, 45, 35, 27, 43, 5, 49,
33, 9, 42, 19, 29, 28, 14, 39, 12, 38, 41, 13, 37, 48, 7, 16, 24, 55, 40,
61, 26, 17, 0, 1, 60, 51, 30, 4, 22, 25, 54, 21, 56, 59, 6, 63, 57, 62, 11,
36, 20, 34, 44, 52,
]
self._wbi_key_cache.update({
'key': ''.join(lookup[i] for i in mixin_key_enc_tab)[:32],
'ts': time.time(),
})
return self._wbi_key_cache['key']
def _sign_wbi(self, params, video_id):
params['wts'] = round(time.time())
params = {
k: ''.join(filter(lambda char: char not in "!'()*", str(v)))
for k, v in sorted(params.items())
}
query = urllib.parse.urlencode(params)
params['w_rid'] = hashlib.md5(f'{query}{self._get_wbi_key(video_id)}'.encode()).hexdigest()
return params
def _download_playinfo(self, bvid, cid, headers=None, qn=None):
params = {'bvid': bvid, 'cid': cid, 'fnval': 4048}
if qn:
params['qn'] = qn
return self._download_json(
'https://api.bilibili.com/x/player/playurl', video_id,
query={'bvid': video_id, 'cid': cid, 'fnval': 4048},
note=f'Downloading video formats for cid {cid}', headers=headers)['data']
'https://api.bilibili.com/x/player/wbi/playurl', bvid,
query=self._sign_wbi(params, bvid), headers=headers,
note=f'Downloading video formats for cid {cid} {qn or ""}')['data']
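Reviewer note: the WBI signing added above can be read in isolation as the following standalone sketch. It mirrors `_get_wbi_key` and `_sign_wbi` from the diff, but the function names `derive_mixin_key` and `sign_wbi`, the `now` parameter, and the `example.invalid` URLs in the usage note are assumptions made for illustration. Unlike the diff's version, it copies `params` instead of mutating the caller's dict.

```python
import hashlib
import time
import urllib.parse

# Permutation table copied from the diff (getMixinKey() in the vendor JS)
MIXIN_KEY_ENC_TAB = [
    46, 47, 18, 2, 53, 8, 23, 32, 15, 50, 10, 31, 58, 3, 45, 35, 27, 43, 5, 49,
    33, 9, 42, 19, 29, 28, 14, 39, 12, 38, 41, 13, 37, 48, 7, 16, 24, 55, 40,
    61, 26, 17, 0, 1, 60, 51, 30, 4, 22, 25, 54, 21, 56, 59, 6, 63, 57, 62, 11,
    36, 20, 34, 44, 52,
]


def _stem(url):
    # Basename of the URL without its extension
    return url.rpartition('/')[2].partition('.')[0]


def derive_mixin_key(img_url, sub_url):
    """Concatenate both URL stems, permute the characters, keep the first 32."""
    lookup = _stem(img_url) + _stem(sub_url)
    return ''.join(lookup[i] for i in MIXIN_KEY_ENC_TAB)[:32]


def sign_wbi(params, mixin_key, now=None):
    """Return a sorted, sanitised copy of params with `wts` and `w_rid` added."""
    signed = dict(params)  # hypothetical choice: do not mutate the caller's dict
    signed['wts'] = round(time.time() if now is None else now)
    signed = {
        k: ''.join(ch for ch in str(v) if ch not in "!'()*")
        for k, v in sorted(signed.items())
    }
    query = urllib.parse.urlencode(signed)
    signed['w_rid'] = hashlib.md5(f'{query}{mixin_key}'.encode()).hexdigest()
    return signed
```

A call such as `sign_wbi({'bvid': bvid, 'cid': cid, 'fnval': 4048}, derive_mixin_key(img_url, sub_url))` would produce the query that the diff passes to the `wbi/playurl` endpoint. Both URL stems need to total at least 64 characters for the permutation table to index safely.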
def json2srt(self, json_data):
srt_data = ''
@ -115,15 +189,15 @@ def _get_subtitles(self, video_id, cid, aid=None):
}],
}
subtitle_info = traverse_obj(self._download_json(
video_info = self._download_json(
'https://api.bilibili.com/x/player/v2', video_id,
query={'aid': aid, 'cid': cid} if aid else {'bvid': video_id, 'cid': cid},
note=f'Extracting subtitle info {cid}'), ('data', 'subtitle'))
subs_list = traverse_obj(subtitle_info, ('subtitles', lambda _, v: v['subtitle_url'] and v['lan']))
if not subs_list and traverse_obj(subtitle_info, 'allow_submit'):
if not self._get_cookies('https://api.bilibili.com').get('SESSDATA'): # no login session cookie
self.report_warning(f'CC subtitles (if any) are only visible when logged in. {self._login_hint()}', only_once=True)
for s in subs_list:
note=f'Extracting subtitle info {cid}')
if traverse_obj(video_info, ('data', 'need_login_subtitle')):
self.report_warning(
f'Subtitles are only available when logged in. {self._login_hint()}', only_once=True)
for s in traverse_obj(video_info, (
'data', 'subtitle', 'subtitles', lambda _, v: v['subtitle_url'] and v['lan'])):
subtitles.setdefault(s['lan'], []).append({
'ext': 'srt',
'data': self.json2srt(self._download_json(s['subtitle_url'], video_id)),
@ -203,15 +277,15 @@ def _get_divisions(self, video_id, graph_version, edges, edge_id, cid_edges=None
self._get_divisions(video_id, graph_version, edges, choice['edge_id'], cid_edges=cid_edges)
return cid_edges
def _get_interactive_entries(self, video_id, cid, metainfo):
def _get_interactive_entries(self, video_id, cid, metainfo, headers=None):
graph_version = traverse_obj(
self._download_json(
'https://api.bilibili.com/x/player/wbi/v2', video_id,
'Extracting graph version', query={'bvid': video_id, 'cid': cid}),
'Extracting graph version', query={'bvid': video_id, 'cid': cid}, headers=headers),
('data', 'interaction', 'graph_version', {int_or_none}))
cid_edges = self._get_divisions(video_id, graph_version, {1: {'cid': cid}}, 1)
for cid, edges in cid_edges.items():
play_info = self._download_playinfo(video_id, cid)
play_info = self._download_playinfo(video_id, cid, headers=headers)
yield {
**metainfo,
'id': f'{video_id}_{cid}',
@ -243,17 +317,17 @@ class BiliBiliIE(BilibiliBaseIE):
'timestamp': 1488353834,
'like_count': int,
'view_count': int,
'_old_archive_ids': ['bilibili 8903802_part1'],
},
}, {
'note': 'old av URL version',
'url': 'http://www.bilibili.com/video/av1074402/',
'info_dict': {
'thumbnail': r're:^https?://.*\.(jpg|jpeg)$',
'id': 'BV11x411K7CN',
'ext': 'mp4',
'title': '【金坷垃】金泡沫',
'uploader': '菊子桑',
'uploader_id': '156160',
'id': 'BV11x411K7CN',
'title': '【金坷垃】金泡沫',
'duration': 308.36,
'upload_date': '20140420',
'timestamp': 1397983878,
@ -262,6 +336,8 @@ class BiliBiliIE(BilibiliBaseIE):
'comment_count': int,
'view_count': int,
'tags': list,
'thumbnail': r're:^https?://.*\.(jpg|jpeg)$',
'_old_archive_ids': ['bilibili 1074402_part1'],
},
'params': {'skip_download': True},
}, {
@ -288,6 +364,7 @@ class BiliBiliIE(BilibiliBaseIE):
'view_count': int,
'description': 'md5:e3c401cf7bc363118d1783dd74068a68',
'duration': 90.314,
'_old_archive_ids': ['bilibili 498159642_part1'],
},
}],
}, {
@ -308,28 +385,8 @@ class BiliBiliIE(BilibiliBaseIE):
'view_count': int,
'description': 'md5:e3c401cf7bc363118d1783dd74068a68',
'duration': 90.314,
'_old_archive_ids': ['bilibili 498159642_part1'],
},
}, {
'note': 'video has subtitles',
'url': 'https://www.bilibili.com/video/BV12N4y1M7rh',
'info_dict': {
'id': 'BV12N4y1M7rh',
'ext': 'mp4',
'title': 'md5:96e8bb42c2b432c0d4ce3434a61479c1',
'tags': list,
'description': 'md5:afde2b7ba9025c01d9e3dde10de221e4',
'duration': 313.557,
'upload_date': '20220709',
'uploader': '小夫太渴',
'timestamp': 1657347907,
'uploader_id': '1326814124',
'comment_count': int,
'view_count': int,
'like_count': int,
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
'subtitles': 'count:2',
},
'params': {'listsubtitles': True},
}, {
'url': 'https://www.bilibili.com/video/av8903802/',
'info_dict': {
@ -347,6 +404,7 @@ class BiliBiliIE(BilibiliBaseIE):
'comment_count': int,
'view_count': int,
'like_count': int,
'_old_archive_ids': ['bilibili 8903802_part1'],
},
'params': {
'skip_download': True,
@ -370,6 +428,7 @@ class BiliBiliIE(BilibiliBaseIE):
'view_count': int,
'like_count': int,
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
'_old_archive_ids': ['bilibili 463665680_part1'],
},
'params': {'skip_download': True},
}, {
@ -388,8 +447,8 @@ class BiliBiliIE(BilibiliBaseIE):
'view_count': int,
'like_count': int,
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
'_old_archive_ids': ['bilibili 893839363_part1'],
},
'params': {'skip_download': True},
}, {
'note': 'newer festival video',
'url': 'https://www.bilibili.com/festival/2023honkaiimpact3gala?bvid=BV1ay4y1d77f',
@ -406,8 +465,57 @@ class BiliBiliIE(BilibiliBaseIE):
'view_count': int,
'like_count': int,
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
'_old_archive_ids': ['bilibili 778246196_part1'],
},
}, {
'note': 'legacy flv/mp4 video',
'url': 'https://www.bilibili.com/video/BV1ms411Q7vw/?p=4',
'info_dict': {
'id': 'BV1ms411Q7vw_p4',
'title': '[搞笑]【动画】云南方言快乐生产线出品 p04 新烧包谷之漫游桃花岛',
'timestamp': 1458222815,
'upload_date': '20160317',
'description': '云南方言快乐生产线出品',
'duration': float,
'uploader': '一笑颠天',
'uploader_id': '3916081',
'view_count': int,
'comment_count': int,
'like_count': int,
'tags': list,
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
'_old_archive_ids': ['bilibili 4120229_part4'],
},
'params': {'extractor_args': {'bilibili': {'prefer_multi_flv': ['32']}}},
'playlist_count': 19,
'playlist': [{
'info_dict': {
'id': 'BV1ms411Q7vw_p4_0',
'ext': 'flv',
'title': '[搞笑]【动画】云南方言快乐生产线出品 p04 新烧包谷之漫游桃花岛',
'duration': 399.102,
},
}],
}, {
'note': 'legacy mp4-only video',
'url': 'https://www.bilibili.com/video/BV1nx411u79K',
'info_dict': {
'id': 'BV1nx411u79K',
'ext': 'mp4',
'title': '【练习室】201603声乐练习《No Air》with VigoVan',
'timestamp': 1508893551,
'upload_date': '20171025',
'description': '@ZERO-G伯远\n声乐练习 《No Air》with Vigo Van',
'duration': 80.384,
'uploader': '伯远',
'uploader_id': '10584494',
'comment_count': int,
'view_count': int,
'like_count': int,
'tags': list,
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
'_old_archive_ids': ['bilibili 15700301_part1'],
},
'params': {'skip_download': True},
}, {
'note': 'interactive/split-path video',
'url': 'https://www.bilibili.com/video/BV1af4y1H7ga/',
@ -425,6 +533,7 @@ class BiliBiliIE(BilibiliBaseIE):
'view_count': int,
'like_count': int,
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
'_old_archive_ids': ['bilibili 292734508_part1'],
},
'playlist_count': 33,
'playlist': [{
@ -443,6 +552,7 @@ class BiliBiliIE(BilibiliBaseIE):
'view_count': int,
'like_count': int,
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
'_old_archive_ids': ['bilibili 292734508_part1'],
},
}],
}, {
@ -465,6 +575,29 @@ class BiliBiliIE(BilibiliBaseIE):
'upload_date': '20191021',
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
},
}, {
'note': 'video has subtitles, which requires login',
'url': 'https://www.bilibili.com/video/BV12N4y1M7rh',
'info_dict': {
'id': 'BV12N4y1M7rh',
'ext': 'mp4',
'title': 'md5:96e8bb42c2b432c0d4ce3434a61479c1',
'tags': list,
'description': 'md5:afde2b7ba9025c01d9e3dde10de221e4',
'duration': 313.557,
'upload_date': '20220709',
'uploader': '小夫太渴',
'timestamp': 1657347907,
'uploader_id': '1326814124',
'comment_count': int,
'view_count': int,
'like_count': int,
'thumbnail': r're:^https?://.*\.(jpg|jpeg|png)$',
'subtitles': 'count:2', # login required for CC subtitle
'_old_archive_ids': ['bilibili 898179753_part1'],
},
'params': {'listsubtitles': True},
'skip': 'login required for subtitle',
}, {
'url': 'https://www.bilibili.com/video/BV1jL41167ZG/',
'info_dict': {
@ -498,8 +631,9 @@ def _real_extract(self, url):
if not self._match_valid_url(urlh.url):
return self.url_result(urlh.url)
initial_state = self._search_json(r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', video_id)
headers['Referer'] = url
initial_state = self._search_json(r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', video_id)
is_festival = 'videoData' not in initial_state
if is_festival:
video_data = initial_state['videoInfo']
@ -548,7 +682,6 @@ def _real_extract(self, url):
aid = video_data.get('aid')
old_video_id = format_field(aid, None, f'%s_part{part_id or 1}')
cid = traverse_obj(video_data, ('pages', part_id - 1, 'cid')) if part_id else video_data.get('cid')
festival_info = {}
@ -586,18 +719,65 @@ def _real_extract(self, url):
is_interactive = traverse_obj(video_data, ('rights', 'is_stein_gate'))
if is_interactive:
return self.playlist_result(
self._get_interactive_entries(video_id, cid, metainfo), **metainfo,
self._get_interactive_entries(video_id, cid, metainfo, headers=headers), **metainfo,
duration=traverse_obj(initial_state, ('videoData', 'duration', {int_or_none})),
__post_extractor=self.extract_comments(aid))
else:
return {
**metainfo,
'duration': float_or_none(play_info.get('timelength'), scale=1000),
'chapters': self._get_chapters(aid, cid),
'subtitles': self.extract_subtitles(video_id, cid),
'formats': self.extract_formats(play_info),
'__post_extractor': self.extract_comments(aid),
}
formats = self.extract_formats(play_info)
if not traverse_obj(play_info, ('dash')):
# we only have legacy formats and need additional work
has_qn = lambda x: x in traverse_obj(formats, (..., 'quality'))
for qn in traverse_obj(play_info, ('accept_quality', lambda _, v: not has_qn(v), {int})):
formats.extend(traverse_obj(
self.extract_formats(self._download_playinfo(video_id, cid, headers=headers, qn=qn)),
lambda _, v: not has_qn(v['quality'])))
self._check_missing_formats(play_info, formats)
flv_formats = traverse_obj(formats, lambda _, v: v['fragments'])
if flv_formats and len(flv_formats) < len(formats):
# Flv and mp4 are incompatible due to `multi_video` workaround, so drop one
if not self._configuration_arg('prefer_multi_flv'):
dropped_fmts = ', '.join(
f'{f.get("format_note")} ({f.get("format_id")})' for f in flv_formats)
formats = traverse_obj(formats, lambda _, v: not v.get('fragments'))
if dropped_fmts:
self.to_screen(
f'Dropping incompatible flv format(s) {dropped_fmts} since mp4 is available. '
'To extract flv, pass --extractor-args "bilibili:prefer_multi_flv"')
else:
formats = traverse_obj(
# XXX: Filtering by extractor-arg is for testing purposes
formats, lambda _, v: v['quality'] == int(self._configuration_arg('prefer_multi_flv')[0]),
) or [max(flv_formats, key=lambda x: x['quality'])]
if traverse_obj(formats, (0, 'fragments')):
# We have flv formats, which are individual short videos with their own timestamps and metainfo
# Binary concatenation corrupts their timestamps, so we need a `multi_video` workaround
return {
**metainfo,
'_type': 'multi_video',
'entries': [{
'id': f'{metainfo["id"]}_{idx}',
'title': metainfo['title'],
'http_headers': metainfo['http_headers'],
'formats': [{
**fragment,
'format_id': formats[0].get('format_id'),
}],
'subtitles': self.extract_subtitles(video_id, cid) if idx == 0 else None,
'__post_extractor': self.extract_comments(aid) if idx == 0 else None,
} for idx, fragment in enumerate(formats[0]['fragments'])],
'duration': float_or_none(play_info.get('timelength'), scale=1000),
}
else:
return {
**metainfo,
'formats': formats,
'duration': float_or_none(play_info.get('timelength'), scale=1000),
'chapters': self._get_chapters(aid, cid),
'subtitles': self.extract_subtitles(video_id, cid),
'__post_extractor': self.extract_comments(aid),
}
class BiliBiliBangumiIE(BilibiliBaseIE):
@ -968,7 +1148,7 @@ def _real_extract(self, url):
}))
class BilibiliSpaceBaseIE(InfoExtractor):
class BilibiliSpaceBaseIE(BilibiliBaseIE):
def _extract_playlist(self, fetch_page, get_metadata, get_entries):
first_page = fetch_page(0)
metadata = get_metadata(first_page)
@ -988,73 +1168,53 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
'id': '3985676',
},
'playlist_mincount': 178,
'skip': 'login required',
}, {
'url': 'https://space.bilibili.com/313580179/video',
'info_dict': {
'id': '313580179',
},
'playlist_mincount': 92,
'skip': 'login required',
}]
def _extract_signature(self, playlist_id):
session_data = self._download_json('https://api.bilibili.com/x/web-interface/nav', playlist_id, fatal=False)
key_from_url = lambda x: x[x.rfind('/') + 1:].split('.')[0]
img_key = traverse_obj(
session_data, ('data', 'wbi_img', 'img_url', {key_from_url})) or '34478ba821254d9d93542680e3b86100'
sub_key = traverse_obj(
session_data, ('data', 'wbi_img', 'sub_url', {key_from_url})) or '7e16a90d190a4355a78fd00b32a38de6'
session_key = img_key + sub_key
signature_values = []
for position in (
46, 47, 18, 2, 53, 8, 23, 32, 15, 50, 10, 31, 58, 3, 45, 35, 27, 43, 5, 49, 33, 9, 42, 19, 29, 28, 14, 39,
12, 38, 41, 13, 37, 48, 7, 16, 24, 55, 40, 61, 26, 17, 0, 1, 60, 51, 30, 4, 22, 25, 54, 21, 56, 59, 6, 63,
57, 62, 11, 36, 20, 34, 44, 52,
):
char_at_position = try_call(lambda: session_key[position])
if char_at_position:
signature_values.append(char_at_position)
return ''.join(signature_values)[:32]
def _real_extract(self, url):
playlist_id, is_video_url = self._match_valid_url(url).group('id', 'video')
if not is_video_url:
self.to_screen('A channel URL was given. Only the channel\'s videos will be downloaded. '
'To download audios, add a "/audio" to the URL')
signature = self._extract_signature(playlist_id)
def fetch_page(page_idx):
query = {
'keyword': '',
'mid': playlist_id,
'order': 'pubdate',
'order': traverse_obj(parse_qs(url), ('order', 0)) or 'pubdate',
'order_avoided': 'true',
'platform': 'web',
'pn': page_idx + 1,
'ps': 30,
'tid': 0,
'web_location': 1550101,
'wts': int(time.time()),
}
query['w_rid'] = hashlib.md5(f'{urllib.parse.urlencode(query)}{signature}'.encode()).hexdigest()
try:
response = self._download_json('https://api.bilibili.com/x/space/wbi/arc/search',
playlist_id, note=f'Downloading page {page_idx}', query=query,
headers={'referer': url})
response = self._download_json(
'https://api.bilibili.com/x/space/wbi/arc/search', playlist_id,
query=self._sign_wbi(query, playlist_id),
note=f'Downloading space page {page_idx}', headers={'Referer': url})
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 412:
raise ExtractorError(
'Request is blocked by server (412), please add cookies, wait and try later.', expected=True)
raise
if response['code'] in (-352, -401):
status_code = response['code']
if status_code == -401:
raise ExtractorError(
f'Request is blocked by server ({-response["code"]}), '
'please add cookies, wait and try later.', expected=True)
'Request is blocked by server (401), please add cookies, wait and try later.', expected=True)
elif status_code == -352 and not self.is_logged_in:
self.raise_login_required('Request is rejected, you need to login to access playlist')
elif status_code != 0:
raise ExtractorError(f'Request failed ({status_code}): {response.get("message") or "Unknown error"}')
return response['data']
def get_metadata(page_data):
@ -1280,7 +1440,10 @@ class BilibiliWatchlaterIE(BilibiliSpaceListBaseIE):
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/watchlater/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://www.bilibili.com/watchlater/#/list',
'info_dict': {'id': 'watchlater'},
'info_dict': {
'id': r're:\d+',
'title': '稍后再看',
},
'playlist_mincount': 0,
'skip': 'login required',
}]
@ -1356,14 +1519,19 @@ class BilibiliPlaylistIE(BilibiliSpaceListBaseIE):
'skip': 'redirect url',
}, {
'url': 'https://www.bilibili.com/list/watchlater',
'info_dict': {'id': 'watchlater'},
'info_dict': {
'id': r're:2_\d+',
'title': '稍后再看',
'uploader': str,
'uploader_id': str,
},
'playlist_mincount': 0,
'skip': 'login required',
}, {
'url': 'https://www.bilibili.com/medialist/play/watchlater',
'info_dict': {'id': 'watchlater'},
'playlist_mincount': 0,
'skip': 'login required',
'skip': 'redirect url & login required',
}]
def _extract_medialist(self, query, list_id):
@ -1414,7 +1582,7 @@ def _real_extract(self, url):
'title': ('title', {str}),
'uploader': ('upper', 'name', {str}),
'uploader_id': ('upper', 'mid', {str_or_none}),
'timestamp': ('ctime', {int_or_none}),
'timestamp': ('ctime', {int_or_none}, {lambda x: x or None}),
'thumbnail': ('cover', {url_or_none}),
})),
}
@ -1890,7 +2058,8 @@ def _perform_login(self, username, password):
public_key = Cryptodome.RSA.importKey(key_data['key'])
password_hash = Cryptodome.PKCS1_v1_5.new(public_key).encrypt((key_data['hash'] + password).encode())
login_post = self._download_json(
'https://passport.bilibili.tv/x/intl/passport-login/web/login/password?lang=en-US', None, data=urlencode_postdata({
'https://passport.bilibili.tv/x/intl/passport-login/web/login/password?lang=en-US', None,
data=urlencode_postdata({
'username': username,
'password': base64.b64encode(password_hash).decode('ascii'),
'keep_me': 'true',
@ -2222,7 +2391,8 @@ def _entries(self, series_id):
def _real_extract(self, url):
series_id = self._match_id(url)
series_info = self._call_api(f'/web/v2/ogv/play/season_info?season_id={series_id}&platform=web', series_id).get('season') or {}
series_info = self._call_api(
f'/web/v2/ogv/play/season_info?season_id={series_id}&platform=web', series_id).get('season') or {}
return self.playlist_result(
self._entries(series_id), series_id, series_info.get('title'), series_info.get('description'),
categories=traverse_obj(series_info, ('styles', ..., 'title'), expected_type=str_or_none),

View File

@ -18,6 +18,7 @@
fix_xml_ampersands,
float_or_none,
int_or_none,
join_nonempty,
js_to_json,
mimetype2ext,
parse_iso8601,
@ -538,12 +539,7 @@ def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
})
def build_format_id(kind):
format_id = kind
if tbr:
format_id += f'-{int(tbr)}k'
if height:
format_id += f'-{height}p'
return format_id
return join_nonempty(kind, tbr and f'{int(tbr)}k', height and f'{height}p')
if src or streaming_src:
f.update({
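Reviewer note: the `join_nonempty` rewrite in this hunk relies on `x and f'...'` evaluating to the falsy `x` itself when a value is missing, so that part is dropped from the joined id. The sketch below illustrates the pattern with a simplified stand-in for the helper. The real one lives in `yt_dlp.utils` and may accept further options, and the explicit `tbr`/`height` parameters are an assumption, since the diff's `build_format_id` closes over them.

```python
def join_nonempty(*values, delim='-'):
    """Simplified stand-in: join only the truthy values, converted to str."""
    return delim.join(str(v) for v in values if v)


def build_format_id(kind, tbr=None, height=None):
    # A missing (None or 0) tbr/height short-circuits to a falsy value and is skipped
    return join_nonempty(kind, tbr and f'{int(tbr)}k', height and f'{height}p')


print(build_format_id('http', 1500.7, 720))
print(build_format_id('hls'))
```

Compared with the removed `if tbr:` / `if height:` chain, the one-liner keeps the same output while making the optional parts explicit in a single expression.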

View File

@ -1,6 +1,5 @@
import base64
import re
import urllib.error
import urllib.parse
import zlib

View File

@ -2,6 +2,7 @@
from ..utils import (
determine_protocol,
int_or_none,
join_nonempty,
try_get,
unescapeHTML,
)
@ -52,7 +53,7 @@ def _real_extract(self, url):
is_hls = container == 'M2TS'
protocol = 'm3u8_native' if is_hls else determine_protocol({'url': rendition_url})
formats.append({
'format_id': ('hls' if is_hls else protocol) + (f'-{tbr}' if tbr else ''),
'format_id': join_nonempty('hls' if is_hls else protocol, tbr),
'url': rendition_url,
'width': int_or_none(rendition.get('frameWidth')),
'height': int_or_none(rendition.get('frameHeight')),

View File

@ -1,6 +1,11 @@
from .common import InfoExtractor
from ..networking import Request
from ..utils import float_or_none, int_or_none, parse_iso8601
from ..utils import (
float_or_none,
int_or_none,
join_nonempty,
parse_iso8601,
)
class EitbIE(InfoExtractor):
@ -37,12 +42,9 @@ def _real_extract(self, url):
if not video_url:
continue
tbr = float_or_none(rendition.get('ENCODING_RATE'), 1000)
format_id = 'http'
if tbr:
format_id += f'-{int(tbr)}'
formats.append({
'url': rendition['PMD_URL'],
'format_id': format_id,
'format_id': join_nonempty('http', int_or_none(tbr)),
'width': int_or_none(rendition.get('FRAME_WIDTH')),
'height': int_or_none(rendition.get('FRAME_HEIGHT')),
'tbr': tbr,

View File

@ -29,9 +29,6 @@ class EpornerIE(InfoExtractor):
'view_count': int,
'age_limit': 18,
},
'params': {
'proxy': '127.0.0.1:8118',
},
}, {
# New (May 2016) URL layout
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/',

View File

@ -5,6 +5,7 @@
ExtractorError,
determine_ext,
int_or_none,
join_nonempty,
parse_age_limit,
remove_end,
remove_start,
@ -287,7 +288,7 @@ def _real_extract(self, url):
if mobj:
height = int(mobj.group(2))
f.update({
'format_id': (f'{format_id}-' if format_id else '') + f'{height}P',
'format_id': join_nonempty(format_id, f'{height}P'),
'width': int(mobj.group(1)),
'height': height,
})

View File

@ -0,0 +1,32 @@
from .common import InfoExtractor
from ..utils import update_url, url_or_none
from ..utils.traversal import traverse_obj
class GraspopIE(InfoExtractor):
_VALID_URL = r'https?://vod\.graspop\.be/[a-z]{2}/(?P<id>\d+)/'
_TESTS = [{
'url': 'https://vod.graspop.be/fr/101556/thy-art-is-murder-concert/',
'info_dict': {
'id': '101556',
'ext': 'mp4',
'title': 'Thy Art Is Murder',
'thumbnail': r're:https://cdn-mds\.pickx\.be/festivals/v3/global/original/.+\.jpg',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
metadata = self._download_json(
f'https://tv.proximus.be/MWC/videocenter/festivals/{video_id}/stream', video_id)
return {
'id': video_id,
'formats': self._extract_m3u8_formats(
# Downgrade manifest request to avoid incomplete certificate chain error
update_url(metadata['source']['assetUri'], scheme='http'), video_id, 'mp4'),
**traverse_obj(metadata, {
'title': ('name', {str}),
'thumbnail': ('source', 'poster', {url_or_none}),
}),
}

View File

@ -3,6 +3,7 @@
from .common import InfoExtractor
from ..utils import (
int_or_none,
join_nonempty,
parse_duration,
urljoin,
xpath_element,
@ -69,7 +70,7 @@ def _extract_info(self, url, display_id):
height = format_info.get('height')
fmt = {
'url': path,
'format_id': 'http{}'.format(f'-{height}p' if height else ''),
'format_id': join_nonempty('http', height and f'{height}p'),
'width': format_info.get('width'),
'height': height,
}

View File

@ -44,9 +44,6 @@ class HKETVIE(InfoExtractor):
'duration': 907,
'subtitles': {},
},
'params': {
'geo_verification_proxy': '<HK proxy here>',
},
'skip': 'Geo restricted to HK',
}]

View File

@ -453,7 +453,7 @@ def _real_extract(self, url):
else:
self.report_warning('Main webpage is locked behind the login page. Retrying with embed webpage (some metadata might be missing).')
webpage = self._download_webpage(
f'{url}/embed/', video_id, note='Downloading embed webpage', fatal=False)
f'{url}/embed/', video_id, note='Downloading embed webpage', fatal=False) or ''
additional_data = self._search_json(
r'window\.__additionalDataLoaded\s*\(\s*[^,]+,', webpage, 'additional data', video_id, fatal=False)
if not additional_data and not media:

View File

@ -2,7 +2,6 @@
import hashlib
import json
import time
import urllib.error
import urllib.parse
from .common import InfoExtractor

View File

@ -0,0 +1,114 @@
import json
from .common import InfoExtractor
from .vimeo import VimeoIE
from ..utils import (
clean_html,
extract_attributes,
get_element_html_by_id,
int_or_none,
parse_duration,
str_or_none,
unified_strdate,
url_or_none,
urljoin,
)
from ..utils.traversal import traverse_obj
class LaracastsBaseIE(InfoExtractor):
def _get_prop_data(self, url, display_id):
webpage = self._download_webpage(url, display_id)
return traverse_obj(
get_element_html_by_id('app', webpage),
({extract_attributes}, 'data-page', {json.loads}, 'props'))
def _parse_episode(self, episode):
if not traverse_obj(episode, 'vimeoId'):
self.raise_login_required('This video is only available for subscribers.')
return self.url_result(
VimeoIE._smuggle_referrer(
f'https://player.vimeo.com/video/{episode["vimeoId"]}', 'https://laracasts.com/'),
VimeoIE, url_transparent=True,
**traverse_obj(episode, {
'id': ('id', {int}, {str_or_none}),
'webpage_url': ('path', {lambda x: urljoin('https://laracasts.com', x)}),
'title': ('title', {clean_html}),
'season_number': ('chapter', {int_or_none}),
'episode_number': ('position', {int_or_none}),
'description': ('body', {clean_html}),
'thumbnail': ('largeThumbnail', {url_or_none}),
'duration': ('length', {int_or_none}),
'date': ('dateSegments', 'published', {unified_strdate}),
}))
class LaracastsIE(LaracastsBaseIE):
IE_NAME = 'laracasts'
_VALID_URL = r'https?://(?:www\.)?laracasts\.com/series/(?P<id>[\w-]+/episodes/\d+)/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://laracasts.com/series/30-days-to-learn-laravel-11/episodes/1',
'md5': 'c8f5e7b02ad0e438ef9280a08c8493dc',
'info_dict': {
'id': '922040563',
'title': 'Hello, Laravel',
'ext': 'mp4',
'duration': 519,
'date': '20240312',
'thumbnail': 'https://laracasts.s3.amazonaws.com/videos/thumbnails/youtube/30-days-to-learn-laravel-11-1.png',
'description': 'md5:ddd658bb241975871d236555657e1dd1',
'season_number': 1,
'season': 'Season 1',
'episode_number': 1,
'episode': 'Episode 1',
'uploader': 'Laracasts',
'uploader_id': 'user20182673',
'uploader_url': 'https://vimeo.com/user20182673',
},
'expected_warnings': ['Failed to parse XML'], # TODO: Remove when vimeo extractor is fixed
}]
def _real_extract(self, url):
display_id = self._match_id(url)
return self._parse_episode(self._get_prop_data(url, display_id)['lesson'])
class LaracastsPlaylistIE(LaracastsBaseIE):
IE_NAME = 'laracasts:series'
_VALID_URL = r'https?://(?:www\.)?laracasts\.com/series/(?P<id>[\w-]+)/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://laracasts.com/series/30-days-to-learn-laravel-11',
'info_dict': {
'title': '30 Days to Learn Laravel',
'id': '210',
'thumbnail': 'https://laracasts.s3.amazonaws.com/series/thumbnails/social-cards/30-days-to-learn-laravel-11.png?v=2',
'duration': 30600.0,
'modified_date': '20240511',
'description': 'md5:27c260a1668a450984e8f901579912dd',
'categories': ['Frameworks'],
'tags': ['Laravel'],
'display_id': '30-days-to-learn-laravel-11',
},
'playlist_count': 30,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
series = self._get_prop_data(url, display_id)['series']
metadata = {
'display_id': display_id,
**traverse_obj(series, {
'title': ('title', {str}),
'id': ('id', {int}, {str_or_none}),
'description': ('body', {clean_html}),
'thumbnail': (('large_thumbnail', 'thumbnail'), {url_or_none}, any),
'duration': ('runTime', {parse_duration}),
'categories': ('taxonomy', 'name', {str}, {lambda x: x and [x]}),
'tags': ('topics', ..., 'name', {str}),
'modified_date': ('lastUpdated', {unified_strdate}),
}),
}
return self.playlist_result(traverse_obj(
series, ('chapters', ..., 'episodes', lambda _, v: v['vimeoId'], {self._parse_episode})), **metadata)

View File

@ -1,5 +1,14 @@
import re
from .common import InfoExtractor
from ..utils import int_or_none, traverse_obj, unified_timestamp
from ..utils import (
int_or_none,
parse_iso8601,
traverse_obj,
unified_timestamp,
url_basename,
url_or_none,
)
class MicrosoftEmbedIE(InfoExtractor):
@ -63,3 +72,250 @@ def _real_extract(self, url):
'subtitles': subtitles,
'thumbnails': thumbnails,
}
class MicrosoftMediusBaseIE(InfoExtractor):
@staticmethod
def _sub_to_dict(subtitle_list):
subtitles = {}
for sub in subtitle_list:
subtitles.setdefault(sub.pop('tag', 'und'), []).append(sub)
return subtitles
def _extract_ism(self, ism_url, video_id):
formats = self._extract_ism_formats(ism_url, video_id)
for fmt in formats:
if fmt['language'] != 'eng' and 'English' not in fmt['format_id']:
fmt['language_preference'] = -10
return formats
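Reviewer note: `_sub_to_dict` above groups a flat caption list by language tag, falling back to `'und'` when no tag is present. A minimal standalone sketch of the same grouping follows. The function name `sub_to_dict` and the sample data are illustrative, and it copies each entry first, whereas the diff's version pops `tag` from the caller's dicts in place.

```python
def sub_to_dict(subtitle_list):
    """Group subtitle entries by their 'tag' key; untagged entries go under 'und'."""
    subtitles = {}
    for sub in subtitle_list:
        sub = dict(sub)  # copy so the input entries keep their 'tag' key
        subtitles.setdefault(sub.pop('tag', 'und'), []).append(sub)
    return subtitles


captions = [
    {'url': 'https://example.invalid/a.vtt', 'tag': 'en'},
    {'url': 'https://example.invalid/b.vtt', 'tag': 'en'},
    {'url': 'https://example.invalid/c.vtt'},
]
print(sub_to_dict(captions))
```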
class MicrosoftMediusIE(MicrosoftMediusBaseIE):
_VALID_URL = r'https?://medius\.microsoft\.com/Embed/(?:Video\?id=|video-nc/|VideoDetails/)(?P<id>[\da-f-]+)'
_TESTS = [{
'url': 'https://medius.microsoft.com/Embed/video-nc/9640d86c-f513-4889-959e-5dace86e7d2b',
'info_dict': {
'id': '9640d86c-f513-4889-959e-5dace86e7d2b',
'ext': 'ismv',
'title': 'Rapidly code, test and ship from secure cloud developer environments',
'description': 'md5:33c8e4facadc438613476eea24165f71',
'thumbnail': r're:https://mediusimg\.event\.microsoft\.com/video-\d+/thumbnail\.jpg.*',
'subtitles': 'count:30',
},
}, {
'url': 'https://medius.microsoft.com/Embed/video-nc/81215af5-c813-4dcd-aede-94f4e1a7daa3',
'info_dict': {
'id': '81215af5-c813-4dcd-aede-94f4e1a7daa3',
'ext': 'ismv',
'title': 'Microsoft Build opening',
'description': 'md5:43455096141077a1f23144cab8cec1cb',
'thumbnail': r're:https://mediusimg\.event\.microsoft\.com/video-\d+/thumbnail\.jpg.*',
'subtitles': 'count:31',
},
}, {
'url': 'https://medius.microsoft.com/Embed/VideoDetails/78493569-9b3b-4a85-a409-ee76e789e25c',
'info_dict': {
'id': '78493569-9b3b-4a85-a409-ee76e789e25c',
'ext': 'ismv',
'title': ' Anomaly Detection & Root cause at Edge',
'description': 'md5:f8f1ad93d7918649bfb97fa081b03b83',
'thumbnail': r're:https://mediusdownload.event.microsoft.com/asset.*\.jpg.*',
'subtitles': 'count:17',
},
}, {
'url': 'https://medius.microsoft.com/Embed/Video?id=0dc69bda-079b-4070-a7db-a8da1a06a9c7',
'only_matching': True,
}, {
'url': 'https://medius.microsoft.com/Embed/video-nc/fe823a91-959c-465b-96d4-8f4db624f72c',
'only_matching': True,
}]
def _extract_subtitle(self, webpage, video_id):
captions = traverse_obj(
self._search_json(r'const\s+captionsConfiguration\s*=', webpage, 'captions', video_id, default=None),
('languageList', lambda _, v: url_or_none(v['src']), {
'url': 'src',
'tag': ('srclang', {str}),
'name': ('kind', {str}),
})) or [{'url': url, 'tag': url_basename(url).split('.vtt')[0].split('_')[-1]}
for url in re.findall(r'var\s+file\s+=\s+\{[^}]+\'(https://[^\']+\.vtt\?[^\']+)', webpage)]
return self._sub_to_dict(captions)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(f'https://medius.microsoft.com/Embed/video-nc/{video_id}', video_id)
return {
'id': video_id,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'formats': self._extract_ism(
self._search_regex(r'StreamUrl\s*=\s*"([^"]+manifest)"', webpage, 'ism url'), video_id),
'thumbnail': self._og_search_thumbnail(webpage),
'subtitles': self._extract_subtitle(webpage, video_id),
}
class MicrosoftLearnPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://learn\.microsoft\.com/(?:[\w-]+/)?(?P<type>shows|events)/(?P<id>[\w-]+)/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://learn.microsoft.com/en-us/shows/bash-for-beginners',
'info_dict': {
'id': 'bash-for-beginners',
'title': 'Bash for Beginners',
'description': 'md5:16a91c07222117d1e00912f0dbc02c2c',
},
'playlist_count': 20,
}, {
'url': 'https://learn.microsoft.com/en-us/events/build-2022',
'info_dict': {
'id': 'build-2022',
'title': 'Microsoft Build 2022 - Events',
'description': 'md5:c16b43848027df837b22c6fbac7648d3',
},
'playlist_count': 201,
}]
def _entries(self, url_base, video_id):
skip = 0
while True:
playlist_info = self._download_json(url_base, video_id, f'Downloading entries {skip}', query={
'locale': 'en-us',
'$skip': skip,
})
url_paths = traverse_obj(playlist_info, ('results', ..., 'url', {str}))
for url_path in url_paths:
yield self.url_result(f'https://learn.microsoft.com/en-us{url_path}')
skip += len(url_paths)
if skip >= playlist_info.get('count', 0) or not url_paths:
break
def _real_extract(self, url):
playlist_id, playlist_type = self._match_valid_url(url).group('id', 'type')
webpage = self._download_webpage(url, playlist_id)
metainfo = {
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
}
sub_type = 'episodes' if playlist_type == 'shows' else 'sessions'
url_base = f'https://learn.microsoft.com/api/contentbrowser/search/{playlist_type}/{playlist_id}/{sub_type}'
return self.playlist_result(self._entries(url_base, playlist_id), playlist_id, **metainfo)
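Reviewer note: the `$skip` loop in `_entries` advances the offset by the number of URLs actually returned and stops once the reported `count` is reached or a page comes back empty. The sketch below shows that termination logic against an in-memory stand-in. `iter_paged` and `fake_fetch` are hypothetical names, and the page size of 2 is an arbitrary assumption rather than anything the API is known to use.

```python
def iter_paged(fetch_page):
    """Yield result URLs, advancing the offset by the size of each page."""
    skip = 0
    while True:
        page = fetch_page(skip)
        urls = [r['url'] for r in page.get('results', []) if isinstance(r.get('url'), str)]
        yield from urls
        skip += len(urls)
        # Stop when everything was seen, or when an empty page would loop forever
        if skip >= page.get('count', 0) or not urls:
            break


def fake_fetch(skip, _data=tuple(f'/shows/demo/ep-{i}' for i in range(5)), page_size=2):
    # Hypothetical stand-in for the contentbrowser search endpoint
    return {'count': len(_data), 'results': [{'url': u} for u in _data[skip:skip + page_size]]}


print(list(iter_paged(fake_fetch)))
```

The `not urls` guard matters when the server reports a `count` larger than what it actually serves. Without it the loop would keep requesting the same offset.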
class MicrosoftLearnEpisodeIE(MicrosoftMediusBaseIE):
_VALID_URL = r'https?://learn\.microsoft\.com/(?:[\w-]+/)?shows/[\w-]+/(?P<id>[^?#/]+)'
_TESTS = [{
'url': 'https://learn.microsoft.com/en-us/shows/bash-for-beginners/what-is-the-difference-between-a-terminal-and-a-shell-2-of-20-bash-for-beginners/',
'info_dict': {
'id': 'd44e1a03-a0e5-45c2-9496-5c9fa08dc94c',
'ext': 'ismv',
'title': 'What is the Difference Between a Terminal and a Shell? (Part 2 of 20)',
'description': 'md5:7bbbfb593d21c2cf2babc3715ade6b88',
'timestamp': 1676339547,
'upload_date': '20230214',
'thumbnail': r're:https://learn\.microsoft\.com/video/media/.*\.png',
'subtitles': 'count:14',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
entry_id = self._html_search_meta('entryId', webpage, 'entryId', fatal=True)
video_info = self._download_json(
f'https://learn.microsoft.com/api/video/public/v1/entries/{entry_id}', video_id)
return {
'id': entry_id,
'formats': self._extract_ism(video_info['publicVideo']['adaptiveVideoUrl'], video_id),
'subtitles': self._sub_to_dict(traverse_obj(video_info, (
'publicVideo', 'captions', lambda _, v: url_or_none(v['url']), {
'tag': ('language', {str}),
'url': 'url',
}))),
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
**traverse_obj(video_info, {
'timestamp': ('createTime', {parse_iso8601}),
'thumbnails': ('publicVideo', 'thumbnailOtherSizes', ..., {'url': {url_or_none}}),
}),
}
class MicrosoftLearnSessionIE(InfoExtractor):
_VALID_URL = r'https?://learn\.microsoft\.com/(?:[\w-]+/)?events/[\w-]+/(?P<id>[^?#/]+)'
_TESTS = [{
'url': 'https://learn.microsoft.com/en-us/events/build-2022/ts01-rapidly-code-test-ship-from-secure-cloud-developer-environments',
'info_dict': {
'id': '9640d86c-f513-4889-959e-5dace86e7d2b',
'ext': 'ismv',
'title': 'Rapidly code, test and ship from secure cloud developer environments - Events',
'description': 'md5:f26c1a85d41c1cffd27a0279254a25c3',
'timestamp': 1653408600,
'upload_date': '20220524',
'thumbnail': r're:https://mediusimg\.event\.microsoft\.com/video-\d+/thumbnail\.jpg.*',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
metainfo = {
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'timestamp': parse_iso8601(self._html_search_meta('startDate', webpage, 'startDate')),
}
return self.url_result(
self._html_search_meta('externalVideoUrl', webpage, 'videoUrl', fatal=True),
url_transparent=True, ie=MicrosoftMediusIE, **metainfo)
class MicrosoftBuildIE(InfoExtractor):
_VALID_URL = [
r'https?://build\.microsoft\.com/[\w-]+/sessions/(?P<id>[\da-f-]+)',
r'https?://build\.microsoft\.com/[\w-]+/(?P<id>sessions)/?(?:[?#]|$)',
]
_TESTS = [{
'url': 'https://build.microsoft.com/en-US/sessions/b49feb31-afcd-4217-a538-d3ca1d171198?source=sessions',
'info_dict': {
'id': 'aee55fb5-fcf9-4b38-b764-a3527cb57554',
'ext': 'ismv',
'title': 'Microsoft Build opening keynote',
'description': 'md5:d38338f336ef4b6ef9ad2a7466a76655',
'timestamp': 1716307200,
'upload_date': '20240521',
'thumbnail': r're:https://mediusimg\.event\.microsoft\.com/video-\d+/thumbnail\.jpg.*',
},
}, {
'url': 'https://build.microsoft.com/en-US/sessions',
'info_dict': {
'id': 'sessions',
},
'playlist_mincount': 418,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
entries = [
self.url_result(
video_info['onDemand'], ie=MicrosoftMediusIE, url_transparent=True, **traverse_obj(video_info, {
'id': ('sessionId', {str}),
'title': ('title', {str}),
'description': ('description', {str}),
'timestamp': ('startDateTime', {parse_iso8601}),
}))
for video_info in self._download_json(
'https://api-v2.build.microsoft.com/api/session/all/en-US', video_id, 'Downloading video info')
]
if video_id == 'sessions':
return self.playlist_result(entries, video_id)
else:
return traverse_obj(entries, (lambda _, v: v['id'] == video_id), get_all=False)

View File

@ -1,188 +0,0 @@
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
smuggle_url,
unsmuggle_url,
xpath_text,
)
class MicrosoftVirtualAcademyBaseIE(InfoExtractor):
def _extract_base_url(self, course_id, display_id):
return self._download_json(
f'https://api-mlxprod.microsoft.com/services/products/anonymous/{course_id}',
display_id, 'Downloading course base URL')
def _extract_chapter_and_title(self, title):
if not title:
return None, None
m = re.search(r'(?P<chapter>\d+)\s*\|\s*(?P<title>.+)', title)
return (int(m.group('chapter')), m.group('title')) if m else (None, title)
class MicrosoftVirtualAcademyIE(MicrosoftVirtualAcademyBaseIE):
IE_NAME = 'mva'
IE_DESC = 'Microsoft Virtual Academy videos'
_VALID_URL = rf'(?:{IE_NAME}:|https?://(?:mva\.microsoft|(?:www\.)?microsoftvirtualacademy)\.com/[^/]+/training-courses/[^/?#&]+-)(?P<course_id>\d+)(?::|\?l=)(?P<id>[\da-zA-Z]+_\d+)'
_TESTS = [{
'url': 'https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788?l=gfVXISmEB_6804984382',
'md5': '7826c44fc31678b12ad8db11f6b5abb9',
'info_dict': {
'id': 'gfVXISmEB_6804984382',
'ext': 'mp4',
'title': 'Course Introduction',
'formats': 'mincount:3',
'subtitles': {
'en': [{
'ext': 'ttml',
}],
},
},
}, {
'url': 'mva:11788:gfVXISmEB_6804984382',
'only_matching': True,
}]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
mobj = self._match_valid_url(url)
course_id = mobj.group('course_id')
video_id = mobj.group('id')
base_url = smuggled_data.get('base_url') or self._extract_base_url(course_id, video_id)
settings = self._download_xml(
f'{base_url}/content/content_{video_id}/videosettings.xml?v=1',
video_id, 'Downloading video settings XML')
_, title = self._extract_chapter_and_title(xpath_text(
settings, './/Title', 'title', fatal=True))
formats = []
for sources in settings.findall('.//MediaSources'):
sources_type = sources.get('videoType')
for source in sources.findall('./MediaSource'):
video_url = source.text
if not video_url or not video_url.startswith('http'):
continue
if sources_type == 'smoothstreaming':
formats.extend(self._extract_ism_formats(
video_url, video_id, 'mss', fatal=False))
continue
video_mode = source.get('videoMode')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', video_mode or '', 'height', default=None))
codec = source.get('codec')
acodec, vcodec = [None] * 2
if codec:
codecs = codec.split(',')
if len(codecs) == 2:
acodec, vcodec = codecs
elif len(codecs) == 1:
vcodec = codecs[0]
formats.append({
'url': video_url,
'format_id': video_mode,
'height': height,
'acodec': acodec,
'vcodec': vcodec,
})
subtitles = {}
for source in settings.findall('.//MarkerResourceSource'):
subtitle_url = source.text
if not subtitle_url:
continue
subtitles.setdefault('en', []).append({
'url': f'{base_url}/{subtitle_url}',
'ext': source.get('type'),
})
return {
'id': video_id,
'title': title,
'subtitles': subtitles,
'formats': formats,
}
class MicrosoftVirtualAcademyCourseIE(MicrosoftVirtualAcademyBaseIE):
IE_NAME = 'mva:course'
IE_DESC = 'Microsoft Virtual Academy courses'
_VALID_URL = rf'(?:{IE_NAME}:|https?://(?:mva\.microsoft|(?:www\.)?microsoftvirtualacademy)\.com/[^/]+/training-courses/(?P<display_id>[^/?#&]+)-)(?P<id>\d+)'
_TESTS = [{
'url': 'https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788',
'info_dict': {
'id': '11788',
'title': 'Microsoft Azure Fundamentals: Virtual Machines',
},
'playlist_count': 36,
}, {
# with emphasized chapters
'url': 'https://mva.microsoft.com/en-US/training-courses/developing-windows-10-games-with-construct-2-16335',
'info_dict': {
'id': '16335',
'title': 'Developing Windows 10 Games with Construct 2',
},
'playlist_count': 10,
}, {
'url': 'https://www.microsoftvirtualacademy.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788',
'only_matching': True,
}, {
'url': 'mva:course:11788',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if MicrosoftVirtualAcademyIE.suitable(url) else super().suitable(url)
def _real_extract(self, url):
mobj = self._match_valid_url(url)
course_id = mobj.group('id')
display_id = mobj.group('display_id')
base_url = self._extract_base_url(course_id, display_id)
manifest = self._download_json(
f'{base_url}/imsmanifestlite.json',
display_id, 'Downloading course manifest JSON')['manifest']
organization = manifest['organizations']['organization'][0]
entries = []
for chapter in organization['item']:
chapter_number, chapter_title = self._extract_chapter_and_title(chapter.get('title'))
chapter_id = chapter.get('@identifier')
for item in chapter.get('item', []):
item_id = item.get('@identifier')
if not item_id:
continue
metadata = item.get('resource', {}).get('metadata') or {}
if metadata.get('learningresourcetype') != 'Video':
continue
_, title = self._extract_chapter_and_title(item.get('title'))
duration = parse_duration(metadata.get('duration'))
description = metadata.get('description')
entries.append({
'_type': 'url_transparent',
'url': smuggle_url(
f'mva:{course_id}:{item_id}', {'base_url': base_url}),
'title': title,
'description': description,
'duration': duration,
'chapter': chapter_title,
'chapter_number': chapter_number,
'chapter_id': chapter_id,
})
title = organization.get('title') or manifest.get('metadata', {}).get('title')
return self.playlist_result(entries, course_id, title)
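
Note on the removed `_extract_chapter_and_title` helper above: it splits titles of the form `<number> | <title>` into a chapter number and a title, and falls back to `(None, title)` when there is no such prefix. The sketch below reproduces that logic as a free function so it can be tried in isolation. The function name and the sample titles are illustrative only.

```python
import re


def extract_chapter_and_title(title):
    """Split '3 | Some title' into (3, 'Some title'); pass other titles through."""
    if not title:
        return None, None
    m = re.search(r'(?P<chapter>\d+)\s*\|\s*(?P<title>.+)', title)
    return (int(m.group('chapter')), m.group('title')) if m else (None, title)


if __name__ == '__main__':
    for sample in ('3 | Course Introduction', 'Course Introduction', ''):
        print(repr(sample), '->', extract_chapter_and_title(sample))
```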

View File

@ -16,6 +16,7 @@
determine_ext,
float_or_none,
int_or_none,
join_nonempty,
mimetype2ext,
parse_age_limit,
parse_duration,
@ -498,10 +499,8 @@ def _real_extract(self, url):
m3u8_id=format_id, fatal=False))
continue
tbr = int_or_none(va.get('bitrate'), 1000)
if tbr:
format_id += f'-{tbr}'
formats.append({
'format_id': format_id,
'format_id': join_nonempty(format_id, tbr),
'url': public_url,
'width': int_or_none(va.get('width')),
'height': int_or_none(va.get('height')),
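
Note on the `join_nonempty` replacements in this and the following hunks: the old code appended `-{tbr}` only when `tbr` was truthy, and the new calls express the same thing in one line. The sketch below is a simplified stand-in written for illustration, not the helper from `yt_dlp.utils`. It assumes that falsy values are dropped and that the default delimiter is `-`, which is what the replaced lines imply.

```python
def join_nonempty_sketch(*values, delim='-'):
    """Join the truthy values with `delim`, converting each to str."""
    return delim.join(str(v) for v in values if v)


def format_id_old(tbr):
    """The pattern the diff removes: conditional string concatenation."""
    format_id = 'http'
    if tbr:
        format_id += f'-{tbr}'
    return format_id


if __name__ == '__main__':
    for tbr in (1500, None, 0):
        print(tbr, format_id_old(tbr), join_nonempty_sketch('http', tbr))
    height = 720
    # Mirrors the `height and f'{height}p'` idiom from the hunk
    print(join_nonempty_sketch('http', height and f'{height}p'))
```

Under these assumptions both forms agree for a missing or zero bitrate, so the change should not alter any resulting `format_id`.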

View File

@ -2,6 +2,7 @@
from ..utils import (
determine_ext,
int_or_none,
join_nonempty,
parse_duration,
parse_iso8601,
)
@ -41,7 +42,7 @@ def _real_extract(self, url):
else:
height = int_or_none(playback.get('height'))
formats.append({
'format_id': playback.get('name', 'http' + (f'-{height}p' if height else '')),
'format_id': playback.get('name') or join_nonempty('http', height and f'{height}p'),
'url': playback_url,
'width': int_or_none(playback.get('width')),
'height': height,

View File

@ -43,15 +43,17 @@ def _parse_video_data(self, container, extract_formats=True):
is_live = media.get('media_status') == 'RUNNING'
formats, subtitles = None, None
headers = {'Referer': 'https://nuum.ru/'}
if extract_formats:
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
media_url, video_id, 'mp4', live=is_live)
media_url, video_id, 'mp4', live=is_live, headers=headers)
return filter_dict({
'id': video_id,
'is_live': is_live,
'formats': formats,
'subtitles': subtitles,
'http_headers': headers,
**traverse_obj(container, {
'title': ('media_container_name', {str}),
'description': ('media_container_description', {str}),
@ -78,7 +80,7 @@ class NuumMediaIE(NuumBaseIE):
'only_matching': True,
}, {
'url': 'https://nuum.ru/videos/1567547-toxi-hurtz',
'md5': 'f1d9118a30403e32b702a204eb03aca3',
'md5': 'ce28837a5bbffe6952d7bfd3d39811b0',
'info_dict': {
'id': '1567547',
'ext': 'mp4',

View File

@ -550,7 +550,8 @@ def _real_extract(self, url):
return self._extract_video_info(segment_id, selected_segment)
# Even some segmented videos have an unsegmented version available in API response root
if not traverse_obj(api_json, ('sources', ..., ..., 'src', {url_or_none})):
if (self._configuration_arg('prefer_segments_playlist')
or not traverse_obj(api_json, ('sources', ..., ..., 'src', {url_or_none}))):
return self.playlist_result(
(self._extract_video_info(str(segment['id']), segment) for segment in segments),
video_id, **self._parse_metadata(api_json), multi_video=True)

View File

@ -316,7 +316,8 @@ def _real_extract(self, url):
r'(https(?:%3A%2F%2F|://)player\.vimeo\.com.+app_id(?:=|%3D)+\d+)',
traverse_obj(attributes, ('embed', 'html', {str})), 'vimeo url', fatal=False) or '')
if url_or_none(v_url) and self._request_webpage(
v_url, video_id, 'Checking Vimeo embed URL', headers=headers, fatal=False, errnote=False):
v_url, video_id, 'Checking Vimeo embed URL', headers=headers,
fatal=False, errnote=False, expected_status=429): # 429 is TLS fingerprint rejection
entries.append(self.url_result(
VimeoIE._smuggle_referrer(v_url, 'https://patreon.com/'),
VimeoIE, url_transparent=True))

View File

@ -41,7 +41,7 @@ class PelotonIE(InfoExtractor):
}, 'params': {
'skip_download': 'm3u8',
},
'_skip': 'Account needed',
'skip': 'Account needed',
}, {
'url': 'https://members.onepeloton.com/classes/player/26603d53d6bb4de1b340514864a6a6a8',
'info_dict': {
@ -61,7 +61,7 @@ class PelotonIE(InfoExtractor):
}, 'params': {
'skip_download': 'm3u8',
},
'_skip': 'Account needed',
'skip': 'Account needed',
}]
_MANIFEST_URL_TEMPLATE = '%s?hdnea=%s'
@ -199,7 +199,7 @@ class PelotonLiveIE(InfoExtractor):
'params': {
'skip_download': 'm3u8',
},
'_skip': 'Account needed',
'skip': 'Account needed',
}
def _real_extract(self, url):

View File

@ -1,5 +1,5 @@
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import int_or_none, join_nonempty
class PerformGroupIE(InfoExtractor):
@ -50,11 +50,8 @@ def _real_extract(self, url):
if not c_url:
continue
tbr = int_or_none(c.get('bitrate'), 1000)
format_id = 'http'
if tbr:
format_id += f'-{tbr}'
formats.append({
'format_id': format_id,
'format_id': join_nonempty('http', tbr),
'url': c_url,
'tbr': tbr,
'width': int_or_none(c.get('width')),

View File

@ -5,6 +5,7 @@
ExtractorError,
try_get,
)
from ..utils.traversal import traverse_obj
class PokerGoBaseIE(InfoExtractor):
@ -65,7 +66,7 @@ def _real_extract(self, url):
'width': image.get('width'),
'height': image.get('height'),
} for image in data_json.get('images') or [] if image.get('url')]
series_json = next(dct for dct in data_json.get('show_tags') or [] if dct.get('video_id') == video_id) or {}
series_json = traverse_obj(data_json, ('show_tags', lambda _, v: v['video_id'] == video_id, any)) or {}
return {
'_type': 'url_transparent',
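
Note on the `series_json` change above: `next(generator) or {}` does not fall back to `{}` when nothing matches, because `next()` without a default raises `StopIteration` before the `or` is evaluated. The commit switches to `traverse_obj(..., any)`. The sketch below shows the same failure and a fix in plain Python, using `next()` with a default instead of `traverse_obj`. Both function names and the sample data are hypothetical.

```python
def first_match_unsafe(show_tags, video_id):
    """Old pattern: raises StopIteration if no tag matches."""
    return next(d for d in show_tags or [] if d.get('video_id') == video_id) or {}


def first_match_safe(show_tags, video_id):
    """Plain-Python equivalent of the intent: fall back to an empty dict."""
    return next((d for d in show_tags or [] if d.get('video_id') == video_id), None) or {}


if __name__ == '__main__':
    tags = [{'video_id': 'abc', 'name': 'Series A'}]
    print(first_match_safe(tags, 'abc'))
    print(first_match_safe(tags, 'missing'))
    try:
        first_match_unsafe(tags, 'missing')
    except StopIteration:
        print('unsafe variant raised StopIteration')
```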

View File

@ -1,9 +1,9 @@
import datetime as dt
import functools
import json
import urllib.parse
from .common import InfoExtractor
from ..compat import functools
from ..utils import (
ExtractorError,
float_or_none,

View File

@ -7,6 +7,7 @@
determine_ext,
float_or_none,
int_or_none,
join_nonempty,
merge_dicts,
unified_strdate,
)
@ -147,13 +148,13 @@ def fix_bitrate(bitrate):
'page_url': 'http://www.prosieben.de',
'tbr': tbr,
'ext': 'flv',
'format_id': 'rtmp{}'.format(f'-{tbr}' if tbr else ''),
'format_id': join_nonempty('rtmp', tbr),
})
else:
formats.append({
'url': source_url,
'tbr': tbr,
'format_id': 'http{}'.format(f'-{tbr}' if tbr else ''),
'format_id': join_nonempty('http', tbr),
})
return {

View File

@ -1,48 +1,125 @@
import base64
import functools
import json
import random
import re
import time
from .common import InfoExtractor
from ..utils import (
ExtractorError,
OnDemandPagedList,
clean_html,
int_or_none,
join_nonempty,
js_to_json,
str_or_none,
strip_jsonp,
traverse_obj,
unescapeHTML,
url_or_none,
urljoin,
)
class QQMusicIE(InfoExtractor):
class QQMusicBaseIE(InfoExtractor):
def _get_cookie(self, key, default=None):
return getattr(self._get_cookies('https://y.qq.com').get(key), 'value', default)
def _get_g_tk(self):
n = 5381
for c in self._get_cookie('qqmusic_key', ''):
n += (n << 5) + ord(c)
return n & 2147483647
def _get_uin(self):
return int_or_none(self._get_cookie('uin')) or 0
@property
def is_logged_in(self):
return bool(self._get_uin() and self._get_cookie('fqm_pvqid'))
# Reference: m_r_GetRUin() in top_player.js
# http://imgcache.gtimg.cn/music/portal_v3/y/top_player.js
@staticmethod
def _m_r_get_ruin():
cur_ms = int(time.time() * 1000) % 1000
return int(round(random.random() * 2147483647) * cur_ms % 1E10)
def _download_init_data(self, url, mid, fatal=True):
webpage = self._download_webpage(url, mid, fatal=fatal)
return self._search_json(r'window\.__INITIAL_DATA__\s*=', webpage,
'init data', mid, transform_source=js_to_json, fatal=fatal)
def _make_fcu_req(self, req_dict, mid, headers={}, **kwargs):
return self._download_json(
'https://u.y.qq.com/cgi-bin/musicu.fcg', mid, data=json.dumps({
'comm': {
'cv': 0,
'ct': 24,
'format': 'json',
'uin': self._get_uin(),
},
**req_dict,
}, separators=(',', ':')).encode(), headers=headers, **kwargs)
class QQMusicIE(QQMusicBaseIE):
IE_NAME = 'qqmusic'
IE_DESC = 'QQ音乐'
_VALID_URL = r'https?://y\.qq\.com/n/yqq/song/(?P<id>[0-9A-Za-z]+)\.html'
_VALID_URL = r'https?://y\.qq\.com/n/ryqq/songDetail/(?P<id>[0-9A-Za-z]+)'
_TESTS = [{
'url': 'https://y.qq.com/n/yqq/song/004295Et37taLD.html',
'url': 'https://y.qq.com/n/ryqq/songDetail/004Ti8rT003TaZ',
'md5': 'd7adc5c438d12e2cb648cca81593fd47',
'info_dict': {
'id': '004Ti8rT003TaZ',
'ext': 'mp3',
'title': '永夜のパレード (永夜的游行)',
'album': '幻想遊園郷 -Fantastic Park-',
'release_date': '20111230',
'duration': 281,
'creators': ['ケーキ姫', 'JUMA'],
'genres': ['Pop'],
'description': 'md5:b5261f3d595657ae561e9e6aee7eb7d9',
'size': 4501244,
'thumbnail': r're:^https?://.*\.jpg(?:$|[#?])',
'subtitles': 'count:1',
},
}, {
'url': 'https://y.qq.com/n/ryqq/songDetail/004295Et37taLD',
'md5': '5f1e6cea39e182857da7ffc5ef5e6bb8',
'info_dict': {
'id': '004295Et37taLD',
'ext': 'mp3',
'title': '可惜没如果',
'release_date': '20141227',
'creator': '林俊杰',
'description': 'md5:d85afb3051952ecc50a1ee8a286d1eac',
'thumbnail': r're:^https?://.*\.jpg$',
'album': '新地球 - 人 (Special Edition)',
'release_date': '20150129',
'duration': 298,
'creators': ['林俊杰'],
'genres': ['Pop'],
'description': 'md5:f568421ff618d2066e74b65a04149c4e',
'thumbnail': r're:^https?://.*\.jpg(?:$|[#?])',
},
'skip': 'premium member only',
}, {
'note': 'There is no mp3-320 version of this song.',
'url': 'https://y.qq.com/n/yqq/song/004MsGEo3DdNxV.html',
'md5': 'fa3926f0c585cda0af8fa4f796482e3e',
'url': 'https://y.qq.com/n/ryqq/songDetail/004MsGEo3DdNxV',
'md5': '028aaef1ae13d8a9f4861a92614887f9',
'info_dict': {
'id': '004MsGEo3DdNxV',
'ext': 'mp3',
'title': '如果',
'album': '新传媒电视连续剧金曲系列II',
'release_date': '20050626',
'creator': '李季美',
'description': 'md5:46857d5ed62bc4ba84607a805dccf437',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 220,
'creators': ['李季美'],
'genres': [],
'description': 'md5:fc711212aa623b28534954dc4bd67385',
'size': 3535730,
'thumbnail': r're:^https?://.*\.jpg(?:$|[#?])',
},
}, {
'note': 'lyrics not in .lrc format',
'url': 'https://y.qq.com/n/yqq/song/001JyApY11tIp6.html',
'url': 'https://y.qq.com/n/ryqq/songDetail/001JyApY11tIp6',
'info_dict': {
'id': '001JyApY11tIp6',
'ext': 'mp3',
@ -50,185 +127,193 @@ class QQMusicIE(InfoExtractor):
'release_date': '19970225',
'creator': 'Dark Funeral',
'description': 'md5:c9b20210587cbcd6836a1c597bab4525',
'thumbnail': r're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True,
'thumbnail': r're:^https?://.*\.jpg(?:$|[#?])',
},
'params': {'skip_download': True},
'skip': 'no longer available',
}]
_FORMATS = {
'mp3-320': {'prefix': 'M800', 'ext': 'mp3', 'preference': 40, 'abr': 320},
'mp3-128': {'prefix': 'M500', 'ext': 'mp3', 'preference': 30, 'abr': 128},
'm4a': {'prefix': 'C200', 'ext': 'm4a', 'preference': 10},
'F000': {'name': 'flac', 'prefix': 'F000', 'ext': 'flac', 'preference': 60},
'A000': {'name': 'ape', 'prefix': 'A000', 'ext': 'ape', 'preference': 50},
'M800': {'name': '320mp3', 'prefix': 'M800', 'ext': 'mp3', 'preference': 40, 'abr': 320},
'M500': {'name': '128mp3', 'prefix': 'M500', 'ext': 'mp3', 'preference': 30, 'abr': 128},
'C400': {'name': '96aac', 'prefix': 'C400', 'ext': 'm4a', 'preference': 20, 'abr': 96},
'C200': {'name': '48aac', 'prefix': 'C200', 'ext': 'm4a', 'preference': 20, 'abr': 48},
}
# Reference: m_r_GetRUin() in top_player.js
# http://imgcache.gtimg.cn/music/portal_v3/y/top_player.js
@staticmethod
def m_r_get_ruin():
cur_ms = int(time.time() * 1000) % 1000
return int(round(random.random() * 2147483647) * cur_ms % 1E10)
def _real_extract(self, url):
mid = self._match_id(url)
detail_info_page = self._download_webpage(
f'http://s.plcloud.music.qq.com/fcgi-bin/fcg_yqq_song_detail_info.fcg?songmid={mid}&play=0',
mid, note='Download song detail info',
errnote='Unable to get song detail info', encoding='gbk')
init_data = self._download_init_data(url, mid, fatal=False)
info_data = self._make_fcu_req({'info': {
'module': 'music.pf_song_detail_svr',
'method': 'get_song_detail_yqq',
'param': {
'song_mid': mid,
'song_type': 0,
},
}}, mid, note='Downloading song info')['info']['data']['track_info']
song_name = self._html_search_regex(
r"songname:\s*'([^']+)'", detail_info_page, 'song name')
media_mid = info_data['file']['media_mid']
publish_time = self._html_search_regex(
r'发行时间:(\d{4}-\d{2}-\d{2})', detail_info_page,
'publish time', default=None)
if publish_time:
publish_time = publish_time.replace('-', '')
singer = self._html_search_regex(
r"singer:\s*'([^']+)", detail_info_page, 'singer', default=None)
lrc_content = self._html_search_regex(
r'<div class="content" id="lrc_content"[^<>]*>([^<>]+)</div>',
detail_info_page, 'LRC lyrics', default=None)
if lrc_content:
lrc_content = lrc_content.replace('\\n', '\n')
thumbnail_url = None
albummid = self._search_regex(
[r'albummid:\'([0-9a-zA-Z]+)\'', r'"albummid":"([0-9a-zA-Z]+)"'],
detail_info_page, 'album mid', default=None)
if albummid:
thumbnail_url = f'http://i.gtimg.cn/music/photo/mid_album_500/{albummid[-2:-1]}/{albummid[-1]}/{albummid}.jpg'
guid = self.m_r_get_ruin()
vkey = self._download_json(
f'http://base.music.qq.com/fcgi-bin/fcg_musicexpress.fcg?json=3&guid={guid}',
mid, note='Retrieve vkey', errnote='Unable to get vkey',
transform_source=strip_jsonp)['key']
data = self._make_fcu_req({
'req_1': {
'module': 'vkey.GetVkeyServer',
'method': 'CgiGetVkey',
'param': {
'guid': str(self._m_r_get_ruin()),
'songmid': [mid] * len(self._FORMATS),
'songtype': [0] * len(self._FORMATS),
'uin': str(self._get_uin()),
'loginflag': 1,
'platform': '20',
'filename': [f'{f["prefix"]}{media_mid}.{f["ext"]}' for f in self._FORMATS.values()],
},
},
'req_2': {
'module': 'music.musichallSong.PlayLyricInfo',
'method': 'GetPlayLyricInfo',
'param': {'songMID': mid},
},
}, mid, note='Downloading formats and lyric', headers=self.geo_verification_headers())
code = traverse_obj(data, ('req_1', 'code', {int}))
if code != 0:
raise ExtractorError(f'Failed to download format info, error code {code or "unknown"}')
formats = []
for format_id, details in self._FORMATS.items():
for media_info in traverse_obj(data, (
'req_1', 'data', 'midurlinfo', lambda _, v: v['songmid'] == mid and v['purl']),
):
format_key = traverse_obj(media_info, ('filename', {str}, {lambda x: x[:4]}))
format_info = self._FORMATS.get(format_key) or {}
format_id = format_info.get('name')
formats.append({
'url': 'http://cc.stream.qqmusic.qq.com/{}{}.{}?vkey={}&guid={}&fromtag=0'.format(
details['prefix'], mid, details['ext'], vkey, guid),
'url': urljoin('https://dl.stream.qqmusic.qq.com', media_info['purl']),
'format': format_id,
'format_id': format_id,
'quality': details['preference'],
'abr': details.get('abr'),
'size': traverse_obj(info_data, ('file', f'size_{format_id}', {int_or_none})),
'quality': format_info.get('preference'),
'abr': format_info.get('abr'),
'ext': format_info.get('ext'),
'vcodec': 'none',
})
self._check_formats(formats, mid)
actual_lrc_lyrics = ''.join(
line + '\n' for line in re.findall(
r'(?m)^(\[[0-9]{2}:[0-9]{2}(?:\.[0-9]{2,})?\][^\n]*|\[[^\]]*\])', lrc_content))
if not formats and not self.is_logged_in:
self.raise_login_required()
if traverse_obj(data, ('req_2', 'code')):
self.report_warning(f'Failed to download lyric, error {data["req_2"]["code"]!r}')
lrc_content = traverse_obj(data, ('req_2', 'data', 'lyric', {lambda x: base64.b64decode(x).decode('utf-8')}))
info_dict = {
'id': mid,
'formats': formats,
'title': song_name,
'release_date': publish_time,
'creator': singer,
'description': lrc_content,
'thumbnail': thumbnail_url,
**traverse_obj(info_data, {
'title': ('title', {str}),
'album': ('album', 'title', {str}, {lambda x: x or None}),
'release_date': ('time_public', {lambda x: x.replace('-', '') or None}),
'creators': ('singer', ..., 'name', {str}),
'alt_title': ('subtitle', {str}, {lambda x: x or None}),
'duration': ('interval', {int_or_none}),
}),
**traverse_obj(init_data, ('detail', {
'thumbnail': ('picurl', {url_or_none}),
'description': ('info', 'intro', 'content', ..., 'value', {str}),
'genres': ('info', 'genre', 'content', ..., 'value', {str}, all),
}), get_all=False),
}
if actual_lrc_lyrics:
info_dict['subtitles'] = {
'origin': [{
'ext': 'lrc',
'data': actual_lrc_lyrics,
}],
}
if lrc_content:
info_dict['subtitles'] = {'origin': [{'ext': 'lrc', 'data': lrc_content}]}
info_dict['description'] = join_nonempty(info_dict.get('description'), lrc_content, delim='\n')
return info_dict
class QQPlaylistBaseIE(InfoExtractor):
@staticmethod
def qq_static_url(category, mid):
return f'http://y.qq.com/y/static/{category}/{mid[-2]}/{mid[-1]}/{mid}.html'
def get_singer_all_songs(self, singmid, num):
return self._download_webpage(
r'https://c.y.qq.com/v8/fcg-bin/fcg_v8_singer_track_cp.fcg', singmid,
query={
'format': 'json',
'inCharset': 'utf8',
'outCharset': 'utf-8',
'platform': 'yqq',
'needNewCode': 0,
'singermid': singmid,
'order': 'listen',
'begin': 0,
'num': num,
'songstatus': 1,
})
def get_entries_from_page(self, singmid):
entries = []
default_num = 1
json_text = self.get_singer_all_songs(singmid, default_num)
json_obj_all_songs = self._parse_json(json_text, singmid)
if json_obj_all_songs['code'] == 0:
total = json_obj_all_songs['data']['total']
json_text = self.get_singer_all_songs(singmid, total)
json_obj_all_songs = self._parse_json(json_text, singmid)
for item in json_obj_all_songs['data']['list']:
if item['musicData'].get('songmid') is not None:
songmid = item['musicData']['songmid']
entries.append(self.url_result(
rf'https://y.qq.com/n/yqq/song/{songmid}.html', 'QQMusic', songmid))
return entries
class QQMusicSingerIE(QQPlaylistBaseIE):
class QQMusicSingerIE(QQMusicBaseIE):
IE_NAME = 'qqmusic:singer'
IE_DESC = 'QQ音乐 - 歌手'
_VALID_URL = r'https?://y\.qq\.com/n/yqq/singer/(?P<id>[0-9A-Za-z]+)\.html'
_TEST = {
'url': 'https://y.qq.com/n/yqq/singer/001BLpXF2DyJe2.html',
_VALID_URL = r'https?://y\.qq\.com/n/ryqq/singer/(?P<id>[0-9A-Za-z]+)'
_TESTS = [{
'url': 'https://y.qq.com/n/ryqq/singer/001BLpXF2DyJe2',
'info_dict': {
'id': '001BLpXF2DyJe2',
'title': '林俊杰',
'description': 'md5:870ec08f7d8547c29c93010899103751',
'description': 'md5:10624ce73b06fa400bc846f59b0305fa',
'thumbnail': r're:^https?://.*\.jpg(?:$|[#?])',
},
'playlist_mincount': 12,
}
'playlist_mincount': 100,
}, {
'url': 'https://y.qq.com/n/ryqq/singer/000Q00f213YzNV',
'info_dict': {
'id': '000Q00f213YzNV',
'title': '桃几OvO',
'description': '小破站小唱见~希望大家喜欢听我唱歌~',
'thumbnail': r're:^https?://.*\.jpg(?:$|[#?])',
},
'playlist_count': 12,
'playlist': [{
'info_dict': {
'id': '0016cvsy02mmCl',
'ext': 'mp3',
'title': '群青',
'album': '桃几2021年翻唱集',
'release_date': '20210913',
'duration': 248,
'creators': ['桃几OvO'],
'genres': ['Pop'],
'description': 'md5:4296005a04edcb5cdbe0889d5055a7ae',
'size': 3970822,
'thumbnail': r're:^https?://.*\.jpg(?:$|[#?])',
},
}],
}]
_PAGE_SIZE = 50
def _fetch_page(self, mid, page_size, page_num):
data = self._make_fcu_req({'req_1': {
'module': 'music.web_singer_info_svr',
'method': 'get_singer_detail_info',
'param': {
'sort': 5,
'singermid': mid,
'sin': page_num * page_size,
'num': page_size,
}}}, mid, note=f'Downloading page {page_num}')
yield from traverse_obj(data, ('req_1', 'data', 'songlist', ..., {lambda x: self.url_result(
f'https://y.qq.com/n/ryqq/songDetail/{x["mid"]}', QQMusicIE, x['mid'], x.get('title'))}))
def _real_extract(self, url):
mid = self._match_id(url)
init_data = self._download_init_data(url, mid, fatal=False)
entries = self.get_entries_from_page(mid)
singer_page = self._download_webpage(url, mid, 'Download singer page')
singer_name = self._html_search_regex(
r"singername\s*:\s*'(.*?)'", singer_page, 'singer name', default=None)
singer_desc = None
return self.playlist_result(
OnDemandPagedList(functools.partial(self._fetch_page, mid, self._PAGE_SIZE), self._PAGE_SIZE),
mid, **traverse_obj(init_data, ('singerDetail', {
'title': ('basic_info', 'name', {str}),
'description': ('ex_info', 'desc', {str}),
'thumbnail': ('pic', 'pic', {url_or_none}),
})))
if mid:
singer_desc_page = self._download_xml(
'http://s.plcloud.music.qq.com/fcgi-bin/fcg_get_singer_desc.fcg', mid,
'Donwload singer description XML',
query={'utf8': 1, 'outCharset': 'utf-8', 'format': 'xml', 'singermid': mid},
headers={'Referer': 'https://y.qq.com/n/yqq/singer/'})
singer_desc = singer_desc_page.find('./data/info/desc').text
return self.playlist_result(entries, mid, singer_name, singer_desc)
class QQPlaylistBaseIE(InfoExtractor):
def _extract_entries(self, info_json, path):
for song in traverse_obj(info_json, path):
song_mid = song.get('songmid')
if not song_mid:
continue
yield self.url_result(
f'https://y.qq.com/n/ryqq/songDetail/{song_mid}',
QQMusicIE, song_mid, song.get('songname'))
class QQMusicAlbumIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:album'
IE_DESC = 'QQ音乐 - 专辑'
_VALID_URL = r'https?://y\.qq\.com/n/yqq/album/(?P<id>[0-9A-Za-z]+)\.html'
_VALID_URL = r'https?://y\.qq\.com/n/ryqq/albumDetail/(?P<id>[0-9A-Za-z]+)'
_TESTS = [{
'url': 'https://y.qq.com/n/yqq/album/000gXCTb2AhRR1.html',
'url': 'https://y.qq.com/n/ryqq/albumDetail/000gXCTb2AhRR1',
'info_dict': {
'id': '000gXCTb2AhRR1',
'title': '我们都是这样长大的',
@ -236,10 +321,10 @@ class QQMusicAlbumIE(QQPlaylistBaseIE):
},
'playlist_count': 4,
}, {
'url': 'https://y.qq.com/n/yqq/album/002Y5a3b3AlCu3.html',
'url': 'https://y.qq.com/n/ryqq/albumDetail/002Y5a3b3AlCu3',
'info_dict': {
'id': '002Y5a3b3AlCu3',
'title': '그리고...',
'title': '그리고',
'description': 'md5:a48823755615508a95080e81b51ba729',
},
'playlist_count': 8,
@ -248,49 +333,45 @@ class QQMusicAlbumIE(QQPlaylistBaseIE):
def _real_extract(self, url):
mid = self._match_id(url)
album = self._download_json(
f'http://i.y.qq.com/v8/fcg-bin/fcg_v8_album_info_cp.fcg?albummid={mid}&format=json',
mid, 'Download album page')['data']
album_json = self._download_json(
'http://i.y.qq.com/v8/fcg-bin/fcg_v8_album_info_cp.fcg',
mid, 'Download album page',
query={'albummid': mid, 'format': 'json'})['data']
entries = [
self.url_result(
'https://y.qq.com/n/yqq/song/' + song['songmid'] + '.html', 'QQMusic', song['songmid'],
) for song in album['list']
]
album_name = album.get('name')
album_detail = album.get('desc')
if album_detail is not None:
album_detail = album_detail.strip()
entries = self._extract_entries(album_json, ('list', ...))
return self.playlist_result(entries, mid, album_name, album_detail)
return self.playlist_result(entries, mid, **traverse_obj(album_json, {
'title': ('name', {str}),
'description': ('desc', {str.strip}),
}))
class QQMusicToplistIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:toplist'
IE_DESC = 'QQ音乐 - 排行榜'
_VALID_URL = r'https?://y\.qq\.com/n/yqq/toplist/(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://y\.qq\.com/n/ryqq/toplist/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'https://y.qq.com/n/yqq/toplist/123.html',
'url': 'https://y.qq.com/n/ryqq/toplist/123',
'info_dict': {
'id': '123',
'title': '美国iTunes榜',
'description': 'md5:89db2335fdbb10678dee2d43fe9aba08',
'title': r're:美国热门音乐榜 \d{4}-\d{2}-\d{2}',
'description': '美国热门音乐榜,每周一更新。',
},
'playlist_count': 100,
'playlist_count': 95,
}, {
'url': 'https://y.qq.com/n/yqq/toplist/3.html',
'url': 'https://y.qq.com/n/ryqq/toplist/3',
'info_dict': {
'id': '3',
'title': '巅峰榜·欧美',
'description': 'md5:5a600d42c01696b26b71f8c4d43407da',
'title': r're:巅峰榜·欧美 \d{4}-\d{2}-\d{2}',
'description': 'md5:4def03b60d3644be4c9a36f21fd33857',
},
'playlist_count': 100,
}, {
'url': 'https://y.qq.com/n/yqq/toplist/106.html',
'url': 'https://y.qq.com/n/ryqq/toplist/106',
'info_dict': {
'id': '106',
'title': '韩国Mnet榜',
'title': r're:韩国Mnet榜 \d{4}-\d{2}-\d{2}',
'description': 'md5:cb84b325215e1d21708c615cac82a6e7',
},
'playlist_count': 50,
@ -304,33 +385,20 @@ def _real_extract(self, url):
note='Download toplist page',
query={'type': 'toplist', 'topid': list_id, 'format': 'json'})
entries = [self.url_result(
'https://y.qq.com/n/yqq/song/' + song['data']['songmid'] + '.html', 'QQMusic',
song['data']['songmid'])
for song in toplist_json['songlist']]
topinfo = toplist_json.get('topinfo', {})
list_name = topinfo.get('ListName')
list_description = topinfo.get('info')
return self.playlist_result(entries, list_id, list_name, list_description)
return self.playlist_result(
self._extract_entries(toplist_json, ('songlist', ..., 'data')), list_id,
playlist_title=join_nonempty(*traverse_obj(
toplist_json, ((('topinfo', 'ListName'), 'update_time'), None)), delim=' '),
playlist_description=traverse_obj(toplist_json, ('topinfo', 'info')))
class QQMusicPlaylistIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:playlist'
IE_DESC = 'QQ音乐 - 歌单'
_VALID_URL = r'https?://y\.qq\.com/n/yqq/playlist/(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://y\.qq\.com/n/ryqq/playlist/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://y.qq.com/n/yqq/playlist/3462654915.html',
'info_dict': {
'id': '3462654915',
'title': '韩国5月新歌精选下旬',
'description': 'md5:d2c9d758a96b9888cf4fe82f603121d4',
},
'playlist_count': 40,
'skip': 'playlist gone',
}, {
'url': 'https://y.qq.com/n/yqq/playlist/1374105607.html',
'url': 'https://y.qq.com/n/ryqq/playlist/1374105607',
'info_dict': {
'id': '1374105607',
'title': '易入人心的华语民谣',
@ -346,19 +414,83 @@ def _real_extract(self, url):
'http://i.y.qq.com/qzone-music/fcg-bin/fcg_ucc_getcdinfo_byids_cp.fcg',
list_id, 'Download list page',
query={'type': 1, 'json': 1, 'utf8': 1, 'onlysong': 0, 'disstid': list_id},
transform_source=strip_jsonp)
transform_source=strip_jsonp, headers={'Referer': url})
if not len(list_json.get('cdlist', [])):
if list_json.get('code'):
raise ExtractorError(
'QQ Music said: error %d in fetching playlist info' % list_json['code'],
expected=True)
raise ExtractorError('Unable to get playlist info')
raise ExtractorError(join_nonempty(
'Unable to get playlist info',
join_nonempty('code', 'subcode', from_dict=list_json),
list_json.get('msg'), delim=': '))
cdlist = list_json['cdlist'][0]
entries = [self.url_result(
'https://y.qq.com/n/yqq/song/' + song['songmid'] + '.html', 'QQMusic', song['songmid'])
for song in cdlist['songlist']]
entries = self._extract_entries(list_json, ('cdlist', 0, 'songlist', ...))
list_name = cdlist.get('dissname')
list_description = clean_html(unescapeHTML(cdlist.get('desc')))
return self.playlist_result(entries, list_id, list_name, list_description)
return self.playlist_result(entries, list_id, **traverse_obj(list_json, ('cdlist', 0, {
'title': ('dissname', {str}),
'description': ('desc', {unescapeHTML}, {clean_html}),
})))
class QQMusicVideoIE(QQMusicBaseIE):
IE_NAME = 'qqmusic:mv'
IE_DESC = 'QQ音乐 - MV'
_VALID_URL = r'https?://y\.qq\.com/n/ryqq/mv/(?P<id>[0-9A-Za-z]+)'
_TESTS = [{
'url': 'https://y.qq.com/n/ryqq/mv/002Vsarh3SVU8K',
'info_dict': {
'id': '002Vsarh3SVU8K',
'ext': 'mp4',
'title': 'The Chant (Extended Mix / Audio)',
'description': '',
'thumbnail': r're:^https?://.*\.jpg(?:$|[#?])',
'release_timestamp': 1688918400,
'release_date': '20230709',
'duration': 313,
'creators': ['Duke Dumont'],
'view_count': int,
},
}]
def _parse_url_formats(self, url_data):
return traverse_obj(url_data, ('mp4', lambda _, v: v['freeflow_url'], {
'url': ('freeflow_url', 0, {url_or_none}),
'filesize': ('fileSize', {int_or_none}),
'format_id': ('newFileType', {str_or_none}),
}))
def _real_extract(self, url):
video_id = self._match_id(url)
video_info = self._make_fcu_req({
'mvInfo': {
'module': 'music.video.VideoData',
'method': 'get_video_info_batch',
'param': {
'vidlist': [video_id],
'required': [
'vid', 'type', 'sid', 'cover_pic', 'duration', 'singers',
'video_pay', 'hint', 'code', 'msg', 'name', 'desc',
'playcnt', 'pubdate', 'play_forbid_reason'],
},
},
'mvUrl': {
'module': 'music.stream.MvUrlProxy',
'method': 'GetMvUrls',
'param': {'vids': [video_id]},
},
}, video_id, headers=self.geo_verification_headers())
if traverse_obj(video_info, ('mvInfo', 'data', video_id, 'play_forbid_reason')) == 3:
self.raise_geo_restricted()
return {
'id': video_id,
'formats': self._parse_url_formats(traverse_obj(video_info, ('mvUrl', 'data', video_id))),
**traverse_obj(video_info, ('mvInfo', 'data', video_id, {
'title': ('name', {str}),
'description': ('desc', {str}),
'thumbnail': ('cover_pic', {url_or_none}),
'release_timestamp': ('pubdate', {int_or_none}),
'duration': ('duration', {int_or_none}),
'creators': ('singers', ..., 'name', {str}),
'view_count': ('playcnt', {int_or_none}),
})),
}


@ -21,7 +21,7 @@ def _perform_login(self, username, password):
if not urlh:
return
content, urlh = self._download_webpage_handle(
response = self._download_webpage_handle(
urlh.url, None, fatal=False, headers={'referer': urlh.url},
note='logging in', errnote='unable to log in',
data=urlencode_postdata({
@ -30,7 +30,11 @@ def _perform_login(self, username, password):
'j_username': username,
'j_password': password,
}))
if not urlh or urlh.url == 'https://tube.tugraz.at/paella/ui/index.html':
if not response:
return
content, urlh = response
if urlh.url == 'https://tube.tugraz.at/paella/ui/index.html':
return
if not self._html_search_regex(
@ -39,7 +43,7 @@ def _perform_login(self, username, password):
self.report_warning('unable to login: incorrect password')
return
content, urlh = self._download_webpage_handle(
urlh = self._request_webpage(
urlh.url, None, fatal=False, headers={'referer': urlh.url},
note='logging in with TFA', errnote='unable to log in with TFA',
data=urlencode_postdata({


@ -14,6 +14,7 @@
float_or_none,
format_field,
int_or_none,
join_nonempty,
make_archive_id,
remove_end,
str_or_none,
@ -107,7 +108,7 @@ def _extract_variant_formats(self, variant, video_id):
tbr = int_or_none(dict_get(variant, ('bitrate', 'bit_rate')), 1000) or None
f = {
'url': variant_url,
'format_id': 'http' + (f'-{tbr}' if tbr else ''),
'format_id': join_nonempty('http', tbr),
'tbr': tbr,
}
self._search_dimensions_in_video_url(f, variant_url)
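
Note on the `join_nonempty('http', tbr)` replacement used in this hunk and in several later ones: the sketch below is a simplified stand-in, not yt-dlp's actual implementation. `join_nonempty_sketch` and `old_format_id` are hypothetical names, and the `from_dict` handling is reduced to a plain `dict.get` lookup (an assumption). It only illustrates that dropping falsy values before joining gives the same result as the older `'http' + (f'-{tbr}' if tbr else '')` expression.

```python
def join_nonempty_sketch(*values, delim='-', from_dict=None):
    """Join the truthy values with `delim`, converting each one to str."""
    if from_dict is not None:
        # Assumption: plain key lookup; treat `values` as keys into the dict
        values = (from_dict.get(key) for key in values)
    return delim.join(str(value) for value in values if value)


def old_format_id(tbr):
    """The expression that the diff replaces."""
    return 'http' + (f'-{tbr}' if tbr else '')


# Both spellings agree for a missing, zero, and real bitrate
for tbr in (None, 0, 1500):
    assert join_nonempty_sketch('http', tbr) == old_format_id(tbr)

print(join_nonempty_sketch('http', 1500))  # → http-1500
print(join_nonempty_sketch('hls', None))   # → hls
```

The same filtering is what makes `join_nonempty('hls', height and f'{height}p')` safe: when `height` is `None`, the second argument is `None` and is simply left out.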


@ -5,6 +5,7 @@
from ..utils import (
ExtractorError,
int_or_none,
join_nonempty,
parse_age_limit,
traverse_obj,
)
@ -120,7 +121,7 @@ def _real_extract(self, url):
'height', default=None))
formats.append({
'url': video_asset_url,
'format_id': 'http{}'.format(f'-{bitrate}' if bitrate else ''),
'format_id': join_nonempty('http', bitrate),
'tbr': bitrate,
'height': height,
'vcodec': video_asset.get('codec'),


@ -829,21 +829,33 @@ def _real_extract(self, url):
url = 'https://vimeo.com/' + video_id
self._try_album_password(url)
is_secure = urllib.parse.urlparse(url).scheme == 'https'
try:
# Retrieve video webpage to extract further information
webpage, urlh = self._download_webpage_handle(
url, video_id, headers=headers)
url, video_id, headers=headers, impersonate=is_secure)
redirect_url = urlh.url
except ExtractorError as ee:
if isinstance(ee.cause, HTTPError) and ee.cause.status == 403:
errmsg = ee.cause.response.read()
if b'Because of its privacy settings, this video cannot be played here' in errmsg:
raise ExtractorError(
'Cannot download embed-only video without embedding '
'URL. Please call yt-dlp with the URL of the page '
'that embeds this video.',
expected=True)
raise
except ExtractorError as error:
if not isinstance(error.cause, HTTPError) or error.cause.status not in (403, 429):
raise
errmsg = error.cause.response.read()
if b'Because of its privacy settings, this video cannot be played here' in errmsg:
raise ExtractorError(
'Cannot download embed-only video without embedding URL. Please call yt-dlp '
'with the URL of the page that embeds this video.', expected=True)
# 403 == vimeo.com TLS fingerprint or DC IP block; 429 == player.vimeo.com TLS FP block
status = error.cause.status
dcip_msg = 'If you are using a data center IP or VPN/proxy, your IP may be blocked'
if target := error.cause.response.extensions.get('impersonate'):
raise ExtractorError(
f'Got HTTP Error {status} when using impersonate target "{target}". {dcip_msg}')
elif not is_secure:
raise ExtractorError(f'Got HTTP Error {status}. {dcip_msg}', expected=True)
raise ExtractorError(
'This request has been blocked due to its TLS fingerprint. Install a '
'required impersonation dependency if possible, or else if you are okay with '
f'{self._downloader._format_err("compromising your security/cookies", "light red")}, '
f'try replacing "https:" with "http:" in the input URL. {dcip_msg}.', expected=True)
if '://player.vimeo.com/video/' in url:
config = self._search_json(
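
Note on the new `except ExtractorError` branch above: its order of checks can be read as a small decision table. The sketch below restates that order as a pure function; `classify_vimeo_failure` and the returned labels are hypothetical names used only for illustration, and the real code raises `ExtractorError` with the messages shown in the hunk instead of returning labels.

```python
PRIVACY_MARKER = b'Because of its privacy settings, this video cannot be played here'


def classify_vimeo_failure(status, body, impersonate_target, is_secure):
    """Return a label for the branch the handler in the hunk would take."""
    if status not in (403, 429):
        return 'reraise'  # unrelated HTTP error: propagate unchanged
    if PRIVACY_MARKER in body:
        return 'embed_only'  # needs the URL of the embedding page
    if impersonate_target:
        return 'blocked_despite_impersonation'  # likely an IP-level block
    if not is_secure:
        return 'blocked_over_http'  # no TLS fingerprint involved
    return 'tls_fingerprint_block'  # suggest an impersonation dependency


print(classify_vimeo_failure(403, b'', None, True))  # → tls_fingerprint_block
```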


@ -52,6 +52,7 @@ def _update_visitor_cookies(self, visitor_url, video_id):
})
def _weibo_download_json(self, url, video_id, *args, fatal=True, note='Downloading JSON metadata', **kwargs):
# XXX: Always fatal; _download_webpage_handle only returns False (not a tuple) on error
webpage, urlh = self._download_webpage_handle(url, video_id, *args, fatal=fatal, note=note, **kwargs)
if urllib.parse.urlparse(urlh.url).netloc == 'passport.weibo.com':
self._update_visitor_cookies(urlh.url, video_id)
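
Note on the `XXX` comment above: it points out that a non-fatal failure yields `False` and not a tuple, so unpacking the result directly cannot work. The sketch below reproduces the problem and the guard with a hypothetical stand-in; `fake_download_webpage_handle` is not a yt-dlp function, and its failure condition is invented for the demonstration.

```python
def fake_download_webpage_handle(url, fatal=True):
    """Hypothetical stand-in: (content, final_url) on success, False on a non-fatal failure."""
    if url.endswith('/broken'):
        if fatal:
            raise RuntimeError(f'unable to download {url}')
        return False
    return '<html></html>', url


def fetch_unguarded(url):
    # Unpacking False raises TypeError, so fatal=False has no useful effect here
    content, final_url = fake_download_webpage_handle(url, fatal=False)
    return final_url


def fetch_guarded(url):
    # Check the return value first, then unpack
    response = fake_download_webpage_handle(url, fatal=False)
    if not response:
        return None
    content, final_url = response
    return final_url


print(fetch_guarded('https://example.com/broken'))  # → None
```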


@ -2,6 +2,7 @@
from ..utils import (
float_or_none,
int_or_none,
join_nonempty,
unified_strdate,
)
@ -76,7 +77,7 @@ def _real_extract(self, url):
tbr = int_or_none(v.get('bitrate'))
formats.append({
'url': mp4_url,
'format_id': 'http' + (f'-{tbr}' if tbr else ''),
'format_id': join_nonempty('http', tbr),
'tbr': tbr,
'width': int_or_none(v.get('width')),
'height': int_or_none(v.get('height')),


@ -8,6 +8,7 @@
ExtractorError,
clean_html,
int_or_none,
join_nonempty,
mimetype2ext,
parse_iso8601,
traverse_obj,
@ -213,7 +214,7 @@ def _extract_yahoo_video(self, video_id, country):
tbr = int_or_none(s.get('bitrate'))
formats.append({
'url': s_url,
'format_id': fmt + (f'-{tbr}' if tbr else ''),
'format_id': join_nonempty(fmt, tbr),
'width': int_or_none(s.get('width')),
'height': int_or_none(s.get('height')),
'tbr': tbr,
@ -371,12 +372,13 @@ def _extract_formats(self, json_data, content_id):
url, content_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
bitrate = int_or_none(vid.get('bitrate'))
formats.append({
'url': url,
'format_id': f'http-{vid.get("bitrate")}',
'format_id': join_nonempty('http', bitrate),
'height': int_or_none(vid.get('height')),
'width': int_or_none(vid.get('width')),
'tbr': int_or_none(vid.get('bitrate')),
'tbr': bitrate,
})
self._remove_duplicate_formats(formats)

View File

@ -5,6 +5,7 @@
determine_ext,
float_or_none,
int_or_none,
join_nonempty,
mimetype2ext,
try_get,
urljoin,
@ -116,12 +117,9 @@ def call_api(action):
else:
size = video.get('size') or {}
height = int_or_none(size.get('height'))
format_id = 'hls'
if height:
format_id += f'-{height}p'
formats.append({
'ext': 'mp4',
'format_id': format_id,
'format_id': join_nonempty('hls', height and f'{height}p'),
'height': height,
'protocol': 'm3u8_native',
'url': format_url,

View File

@ -4,6 +4,7 @@
import copy
import datetime as dt
import enum
import functools
import hashlib
import itertools
import json
@ -20,7 +21,6 @@
from .common import InfoExtractor, SearchInfoExtractor
from .openload import PhantomJSwrapper
from ..compat import functools
from ..jsinterp import JSInterpreter
from ..networking.exceptions import HTTPError, network_exceptions
from ..utils import (

View File

@ -1,6 +1,7 @@
from __future__ import annotations
import contextlib
import functools
import io
import logging
import ssl
@ -22,7 +23,6 @@
TransportError,
)
from .websocket import WebSocketRequestHandler, WebSocketResponse
from ..compat import functools
from ..dependencies import websockets
from ..socks import ProxyError as SocksProxyError
from ..utils import int_or_none

View File

@ -474,7 +474,7 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
'no-attach-info-json', 'embed-thumbnail-atomicparsley', 'no-external-downloader-progress',
'embed-metadata', 'seperate-video-versions', 'no-clean-infojson', 'no-keep-subs', 'no-certifi',
'no-youtube-channel-redirect', 'no-youtube-unavailable-videos', 'no-youtube-prefer-utc-upload-date',
'prefer-legacy-http-handler', 'manifest-filesize-approx',
'prefer-legacy-http-handler', 'manifest-filesize-approx', 'allow-unsafe-ext',
}, 'aliases': {
'youtube-dl': ['all', '-multistreams', '-playlist-match-filter', '-manifest-filesize-approx'],
'youtube-dlc': ['all', '-no-youtube-channel-redirect', '-no-live-chat', '-playlist-match-filter', '-manifest-filesize-approx'],
@ -646,7 +646,7 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
'You can also simply specify a field to match if the field is present, '
'use "!field" to check if the field is not present, and "&" to check multiple conditions. '
'Use a "\\" to escape "&" or quotes if needed. If used multiple times, '
'the filter matches if atleast one of the conditions are met. E.g. --match-filter '
'the filter matches if at least one of the conditions is met. E.g. --match-filter '
'!is_live --match-filter "like_count>?100 & description~=\'(?i)\\bcats \\& dogs\\b\'" '
'matches only videos that are not live OR those that have a like count more than 100 '
'(or the like field is not available) and also has a description '
@ -1479,7 +1479,7 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
'Optionally, the KEYRING used for decrypting Chromium cookies on Linux, '
'the name/path of the PROFILE to load cookies from, '
'and the CONTAINER name (if Firefox) ("none" for no container) '
'can be given with their respective seperators. '
'can be given with their respective separators. '
'By default, all containers of the most recently accessed profile are used. '
f'Currently supported keyrings are: {", ".join(map(str.lower, sorted(SUPPORTED_KEYRINGS)))}'))
filesystem.add_option(
@ -1781,7 +1781,7 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
'It can be one of "pre_process" (after video extraction), "after_filter" (after video passes filter), '
'"video" (after --format; before --print/--output), "before_dl" (before each video download), '
'"post_process" (after each video download; default), '
'"after_move" (after moving video file to it\'s final locations), '
'"after_move" (after moving video file to its final locations), '
'"after_video" (after downloading and processing all formats of a video), '
'or "playlist" (at end of playlist). '
'This option can be used multiple times to add different postprocessors'))

View File

@ -1,5 +1,6 @@
import collections
import contextvars
import functools
import itertools
import json
import os
@ -8,7 +9,7 @@
import time
from .common import PostProcessor
from ..compat import functools, imghdr
from ..compat import imghdr
from ..utils import (
MEDIA_EXTENSIONS,
ISO639Utils,

View File

@ -2085,17 +2085,20 @@ def parse_duration(s):
(days, 86400), (hours, 3600), (mins, 60), (secs, 1), (ms, 1)))
def prepend_extension(filename, ext, expected_real_ext=None):
def _change_extension(prepend, filename, ext, expected_real_ext=None):
name, real_ext = os.path.splitext(filename)
return (
f'{name}.{ext}{real_ext}'
if not expected_real_ext or real_ext[1:] == expected_real_ext
else f'{filename}.{ext}')
if not expected_real_ext or real_ext[1:] == expected_real_ext:
filename = name
if prepend and real_ext:
_UnsafeExtensionError.sanitize_extension(ext, prepend=True)
return f'{filename}.{ext}{real_ext}'
return f'{filename}.{_UnsafeExtensionError.sanitize_extension(ext)}'
def replace_extension(filename, ext, expected_real_ext=None):
name, real_ext = os.path.splitext(filename)
return f'{name if not expected_real_ext or real_ext[1:] == expected_real_ext else filename}.{ext}'
prepend_extension = functools.partial(_change_extension, True)
replace_extension = functools.partial(_change_extension, False)
def check_executable(exe, args=[]):
@ -5035,6 +5038,101 @@ def items_(self):
KNOWN_EXTENSIONS = (*MEDIA_EXTENSIONS.video, *MEDIA_EXTENSIONS.audio, *MEDIA_EXTENSIONS.manifests)
class _UnsafeExtensionError(Exception):
"""
Mitigation exception for uncommon/malicious file extensions
This should be caught in YoutubeDL.py alongside a warning
Ref: https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j
"""
ALLOWED_EXTENSIONS = frozenset([
# internal
'description',
'json',
'meta',
'orig',
'part',
'temp',
'uncut',
'unknown_video',
'ytdl',
# video
*MEDIA_EXTENSIONS.video,
'avif',
'ismv',
'm2ts',
'm4s',
'mng',
'mpeg',
'qt',
'swf',
'ts',
'vp9',
'wvm',
# audio
*MEDIA_EXTENSIONS.audio,
'isma',
'mid',
'mpga',
'ra',
# image
*MEDIA_EXTENSIONS.thumbnails,
'bmp',
'gif',
'heic',
'ico',
'jng',
'jpeg',
'jxl',
'svg',
'tif',
'wbmp',
# subtitle
*MEDIA_EXTENSIONS.subtitles,
'dfxp',
'fs',
'ismt',
'sami',
'scc',
'ssa',
'tt',
'ttml',
# others
*MEDIA_EXTENSIONS.manifests,
*MEDIA_EXTENSIONS.storyboards,
'desktop',
'ism',
'm3u',
'sbv',
'url',
'webloc',
'xml',
])
def __init__(self, extension, /):
super().__init__(f'unsafe file extension: {extension!r}')
self.extension = extension
@classmethod
def sanitize_extension(cls, extension, /, *, prepend=False):
if '/' in extension or '\\' in extension:
raise cls(extension)
if not prepend:
_, _, last = extension.rpartition('.')
if last == 'bin':
extension = last = 'unknown_video'
if last.lower() not in cls.ALLOWED_EXTENSIONS:
raise cls(extension)
return extension
class RetryManager:
"""Usage:
for retry in RetryManager(...):
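
Note on the `_change_extension` and `_UnsafeExtensionError` hunks above: the sketch below is a reduced, self-contained version of the same idea, meant to make the control flow easier to follow. It rests on several assumptions: `UnsafeExtensionSketchError` and its tiny `ALLOWED` set are hypothetical stand-ins for the full allow-list in the diff, and the nesting of the `if prepend and real_ext` check inside the first `if` is a reading of the flattened hunk, since this rendering has lost the original indentation.

```python
import functools
import os


class UnsafeExtensionSketchError(Exception):
    # Assumption: a tiny illustrative allow-list, far shorter than the real one
    ALLOWED = frozenset({'mp4', 'mkv', 'webm', 'json', 'part', 'temp', 'unknown_video'})

    def __init__(self, extension):
        super().__init__(f'unsafe file extension: {extension!r}')
        self.extension = extension

    @classmethod
    def sanitize(cls, extension, *, prepend=False):
        # Path separators are never acceptable inside an extension
        if '/' in extension or '\\' in extension:
            raise cls(extension)
        if not prepend:
            # Only the last dotted component decides what the file "is"
            _, _, last = extension.rpartition('.')
            if last == 'bin':
                extension = last = 'unknown_video'
            if last.lower() not in cls.ALLOWED:
                raise cls(extension)
        return extension


def change_extension(prepend, filename, ext, expected_real_ext=None):
    name, real_ext = os.path.splitext(filename)
    if not expected_real_ext or real_ext[1:] == expected_real_ext:
        filename = name
        if prepend and real_ext:
            # The real extension stays last, so only separators are checked
            UnsafeExtensionSketchError.sanitize(ext, prepend=True)
            return f'{filename}.{ext}{real_ext}'
    return f'{filename}.{UnsafeExtensionSketchError.sanitize(ext)}'


prepend_extension = functools.partial(change_extension, True)
replace_extension = functools.partial(change_extension, False)

print(replace_extension('video.mp4', 'mkv'))   # → video.mkv
print(prepend_extension('video.mp4', 'temp'))  # → video.temp.mp4
print(replace_extension('video.mp4', 'bin'))   # → video.unknown_video
```

In this sketch, `replace_extension('video.mp4', 'exe')` raises `UnsafeExtensionSketchError`, since `exe` is not on the allow-list.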

View File

@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py
__version__ = '2024.05.27'
__version__ = '2024.07.01'
RELEASE_GIT_HEAD = '12b248ce60be1aa1362edd839d915bba70dbee4b'
RELEASE_GIT_HEAD = '5ce582448ececb8d9c30c8c31f58330090ced03a'
VARIANT = None
@ -12,4 +12,4 @@
ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2024.05.27'
_pkg_version = '2024.07.01'