Skip to content

Refactor to expose parseTranscript method to resolve CORS error from Obsidian youtube-summarizer plugin #34

@rshmhrj

Description

@rshmhrj

Hi all, I have helped make a few minor updates in an Obsidian plugin named youtube-summarizer. It has been broken for a while due to #19, and I have finally gotten a chance to look into it.

I started by updating to the latest version of youtube-transcript (from 1.0.6 to 1.2.1), and the TypeError issue went away, but it is replaced with some CORS errors, and I have no idea why.

image

Looking into the src, it seems like youtube-transcript is using fetch. Within the plugin, we are using the request module from the obsidian package and those requests go through without any CORS issues:

image

Instead of enabling proxy support like #25, I'm proposing a tiny refactor to modularize the fetching and parsing.

Instead of calling fetchTranscript(url) and get back the transcriptList like this:

const transcriptList = await YoutubeTranscript.fetchTranscript(url);
const transcript = transcriptList.map((transcript) => transcript.text).join(" ");

I'd like to do the fetch myself and call something like YoutubeTranscript.parseTranscript(transcriptText):

import { request } from "obsidian";
...
let response = await request({ url: url, method: "GET" });
const transcriptList = await YoutubeTranscript.parseTranscript(response);
const transcript = transcriptList.map((transcript) => transcript.text).join(" ");

Code-wise, YoutubeTranscript would have two public functions:

  public static async fetchTranscript( videoId: string, config?: TranscriptConfig ): Promise<TranscriptResponse[]>
  public static async parseTranscript( transcriptText: string ): Promise<TranscriptResponse[]>

fetchTranscript could be refactored/shortened to:

  public static async fetchTranscript( 
    videoId: string,
    config?: TranscriptConfig
  ): Promise<TranscriptResponse[]> {
    const identifier = this.retrieveVideoId(videoId);
    const videoPageResponse = await fetch(
      `https://www.youtube.com/watch?v=${identifier}`,
      {
        headers: {
          ...(config?.lang && { 'Accept-Language': config.lang }),
          'User-Agent': USER_AGENT,
        },
      }
    );
    const videoPageBody = await videoPageResponse.text();
    return this.parseTranscript(videoPageBody);

and parseTranscript would contain the rest of what is currently in fetchTranscript

Because the fetchTranscript signature and return type are the same, it should continue to work as is for existing users, and I think the modularity can help. Maybe not all CORS issues would be able to be resolved with this change, so a future proxy configuration option might still be needed. Idk.

I can draft a PR with these changes and test them out to at least validate that these minor changes would work for us and enable the youtube-summarizer plugin to finally start working again 😄

Please let me know your thoughts: @Kakulukian @mefengl and others

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions