-
Notifications
You must be signed in to change notification settings - Fork 17
YouTube Music provider #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ec8cbc6
to
a50561b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for tackling this! Before I get into any implementation details, I wanted to share a few general thoughts.
There were two things I was worried about regarding the YouTube.js library:
- Code bloat:
- I ran
deno info lookup.ts | head
and temporarily commented out theYouTubeMusicProvider
import inproviders/mod.ts
to get a feeling for the impact. - without YouTube Music provider: 2.12MB, 225 unique dependencies
- with YouTube Music provider: 4.13MB, 982 unique dependencies
- This looks like a lot, but if I include the whole web app with
deno info server/main.ts | head
, it shouldn't matter too much: 14.92MB, 1398 unique dependencies (YouTube included). - I've also tried to replicate the relevant requests by hand with
fetch
and failed to make them work, the API expects a couple of headers and a ton of cruft inside the POST body to even consider the request. So we should really use the library and spare us the extra hassle.
- I ran
- Permalinks and caching won't work: Lookups for all other providers are cached and used for 24 hours, older snapshots can be accessed using permalinks which contain a timestamp.
- We can't use the usual provider caching mechanism (
SnapStorage
) because we don't have control over individual requests when we use the library. - While it is possible to pass a custom
fetch
function to Innertube, we can't proxy the requests tofetchSnapshot
/fetchJSON
(which use Harmony's cache) because we would have to pass the timestamp well. - Additionally, as opposed to any proper public API, the URL is always
https://www.youtube.com/youtubei/v1/browse?prettyPrint=false&alt=json
for every lookup, because the relevant lookup parameters are only contained in the body of the POST request. This is not currently supported by Harmony's cache which uses the URL as key. - Even if the cache would support distinguishing POST requests by body data and if we hacked Innertube to pass the timestamp along, the API response contains a lot of unnecessary clutter (e.g. describing the player UI), so I wouldn't want to cache the whole thing anyway.
- I guess our best way forward is to omit YouTube caching for now or to implement separate caching logic which caches the results of the Innertube calls. If you want to rather focus on the lookup implementation, I can help you with this task.
- We can't use the usual provider caching mechanism (
P.S. I know that there are still some open questions and I guess I also have more feedback, but this should be enough to get you started (and it's getting too hot to think here 😅).
P.P.S. One last thing: Please follow https://www.conventionalcommits.org/en/v1.0.0/ (see the existing commits, the scope would be "YouTube" for all follow-ups). Eventually I will use this to generate changelogs for the website 😇
As a hobby YTM (=YouTube Music) internals researcher, I wouldn't recommend you to automatically scrapes YTM, because...
|
a50561b
to
ebcf9e4
Compare
First of all, thanks a lot for the review @kellnerd! @rinsuki, thanks for your thoughts! I will definitely try to improve my code to make as few assumptions as possible, as some of the ones it's currently making are clearly wrong (e.g. the "free streaming" thing). I get that these things are really hard to be sure about (e.g. to decide whether a track is paid or unavailable), but in cases where I can be sure about the data, I feel like it is still more valuable to return some correct data than none at all. Something I am pretty unsure about is the region problem, as I don't think there is a good way to work around this besides using proxies (which would probably cost money to use, and are probably already blocked by YouTube anyways). Innertube does provide a location parameter when constructing the instance, with the description "Geolocation setting", but I doubt that that actually does anything useful (haven't tested it yet). And the barcode thing also seems really messy.. haven't encountered it myself yet but I trust you that it's a thing. This might definitely mean that a YouTube Music provider is a lot less feasible than I thought, but I'd still love to experiment with it for a while before giving up on this^^ |
There are some examples (NOTE, all of them are checked in Japan IP address (where I live), it might behave differently in another regions):
|
6f48f52
to
670eb49
Compare
YouTube Music sometimes returns incorrect, alternate versions of a release when looking up a GTIN. This warns if a release with multiple versions is returned, as there is no way of knowing if YouTube returned the correct one.
670eb49
to
f84c352
Compare
Okay, a small (or not so small) update: The code is (in my opinion) basically feature complete now. I ended up rewriting this whole thing to not depend on YouTube.js after all, which turned out to be a lot easier than expected (even if the resulting code looks a bit less clean right now). This code should be fully compatible with the caching and permalink system now (I was able to work around the URL-based cache by simply appending a unique hash to the URLs which gets ignored by YouTube anyways), and all API requests now happen through Regarding the following concern:
I am currently returning (and thus caching) the whole API response from Besides that, I'm generally happy with the code now (though it could maybe do with a bit more clean up). Sadly there are still some pain points I haven't been able to work around (and likely won't be able to):
|
hl: 'en', | ||
gl: 'US', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might be want to decide those values based of region parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting the gl
parameter based on the region would probably be a good idea (though it doesn't seem to affect YouTube's behaviour), but I'm not sure how to select one region to use if multiple regions are provided.
As far as I can tell, other providers solve this by using queryAllRegions
to query each region separately, but this doesn't make sense in this case since the region parameter doesn't affect anything anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know harmony can set multiple regions...
by the way, my main focus is hl parameter, since YTM (sometimes) have a localized tracklists and will be changed by user's language settings. i didn't check it is depending to this parameters, but I think it will.
(sidenote: but I also think change language parameter by region parameter is good or not? i don't know. Apple Music behave like that, but actually some Apple Music region supports multiple language in one regions and shows based of your device's language settings. e.g. JP region have a japanese and english tracklist)
e.g. https://music.youtube.com/playlist?list=OLAK5uy_nXbgxu51sJwZQz69UwVWVvu-j22_IYvA8 this album have a Japanese tracklist and English tracklist, and if your language settings aren't japanese, it should show english tracklist
I think different tracklist for en-US and en-GB are possibly exists, but I don't know about it actually exists or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup okay, setting hl
to ja
seems to work in order to get localized track lists.
I could try to set this dynamically, but I'm still not sure which language to use when.
I think I've read that the MusicBrainz policy is that release and recording titles should always be submitted in their original localisation (so, japanese localisation for Japanese releases), but that would require knowing what the "original" localisation is.
The problem with this is that for example, I've noticed that some releases and tracks tend to have Japanese localisations even if they aren't originally Japanese (like this album by the German band Rammstein)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the user specifies multiple preferred regions, they will only be used by providers which support the queryAllRegions
fallback. But if YouTube doesn't "fail" to return data for a certain unavailable region (in some way), this concept doesn't make sense and the first "supported" region of the preferred regions should be used.
So unless there is a meaningful list of supported regions for YouTube, I would simply use the first region.
Regarding the language / hl
parameter, I wouldn't try to derive this from the region as these aren't always one-to-one mappings. If we arrive at the conclusion that the language is a useful parameter (for YouTube and potentially other providers), Harmony should allow the user the set a preferred language as well. We can even extract the language from the release URL for some providers (namely Deezer, Spotify, Beatport, maybe Qobuz in the future), which would override the preference input just like a region from a release URL does (for Apple).
providers/YouTubeMusic/constants.ts
Outdated
browserName: 'Edge Chromium', | ||
browserVersion: '109.0.1518.61', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(note: if you replaced user agent, you might be want to also replace those values, or remove it since server don't care in many cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like I'm actually able to remove all body values besides clientName
and clientVersion
and everything still works as expected.
I'll probably do this, as this seems a lot cleaner (the current body was mostly just copied from YouTube.js requests), but I'm a bit afraid that YouTube might get.. suspicious, if the requests don't contain these values?
I have no idea what actually gets detected and logged by YouTube, but I'm afraid that this might cause YouTube to flag and block harmony quicker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we wouldn't able to know about they cares or not :P
well yeah, if you want to reduce chance of getting banned from YT, I think you need to put realistic values, or even scrape HTML (YT or YTM web server is calling API when rendering HTML, and it is included in the returned HTML, for takeover to client JS. at this case you still need to call API manually for some cases, e.g. paging or album search)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the copied code, I think you should add a copyright line to all affected files, probably constants.ts
and api_types.ts
? I am not a lawyer, but I think something like the following should be sufficient for MIT licensed code:
// Adapted from / Taken from / Inspired by https://github.com/LuanRT/YouTube.js/tree/v14.0.0
// Copyright (c) 2021 LuanRT. MIT license.
Not sure about the wording of the first line, you should know better what is the best description.
Okay, after a bit more testing, there are two other things which would probably improve the quality of the data. For one, some tracks provide a more comprehensive overview of the artist credits with the "View song credits" button in their respective kebab menu (the 3 vertical dots in the song list of the web UI). Another issue I encountered is that sometimes tracks and releases display a title with featuring artists in the name, even if YouTube has the correct title without the featuring artist. |
Great work @feathecutie! Looking good at first glance and the tests are passing as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry that it took me so long to get back to review this PR, I had significantly less time available than expected this summer.
I agree with your summarized TODO items in the updated PR description and only have a few remarks to add (from my weeks-old notes).
Ideally we should avoid fetching the HTML player just to extract the corresponding browse ID for a playlist.
I still have to compare the two data representations side by side and in depth to see if there is any way to work with just one of them even. The annoyingly UI-centric format of the representation certainly doesn't help with that, that's one of the reasons why I've been putting this off for so long...
I don't know about your available time and plans for this PR, but I would like to see it merged eventually, even if it is not fully ready to be released, since it is a lot of code to review already.
We can always merge an early version of the provider and temporarily disable it by removing the changes to the provider registry for now.
const creditsEndpoint = item | ||
.musicResponsiveListItemRenderer | ||
.menu | ||
.menuRenderer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we end up removing this code, we have to ensure that the menu exists:
.menuRenderer | |
?.menuRenderer |
It is missing for unavailable tracks and currently breaks the conversion of such a release.
If you go to artist's YouTube channel (whether it's their "Official Artist Channel" or a Topic channel) on main YouTube website and pick a release from there, it would still replace Art Tracks with corresponding music videos. Sometimes it can get worse, like in this single that has both original and "edit" version of a song but they both were replaced by the same music video. Not sure if it's anyhow different for Premium users, since I'm not one myself. |
(Mostly) fixes #31
TODO
www.youtube.com
playlist datamusic.youtube.com
playlist browse request, but not correct track namescontext.client.hl
body parameter)Original message
This PR provides an initial draft of a YouTube Music provider for harmony. I intend to continue working on this, but I wanted to try to get some eyes on this already, both for potentially some feedback and help with things I'm unsure about, and also as a general motivator for me to keep working on this.
What works well (enough) already:
Things I'm not yet happy with / still unsure about:
One could in theory try to reimplement the parts of YouTube.js we are using, but the internal API that YouTube.js is using is hardly documented and most likely very sensitive to slightly incorrect requests that YouTube.js already correctly handles, not to mention that keeping up with API changes seems unnecessarily hard.
YoutubeMusic/mod.ts
) currently has to use some weird workarounds, but I'm still trying to find a better and more direct way to get the data we need here, and for now, while less elegant it still worksThis is also still missing tests, I'll try to add some soon
..also, I really hope this paragraph doesn't sound too LLM-generated, this definitely came out a lot more "formal" than I planned^^