
fallback username with getUsernameFromProfileUrl#176

Merged
sweetmantech merged 3 commits into main from
sweetmantech/myc-3465-apisocialscrape-fallback-username-from-profile_url
Nov 15, 2025

Conversation

Collaborator

@sweetmantech sweetmantech commented Nov 15, 2025

Summary by CodeRabbit

  • New Features
    • Profile scraper now extracts usernames from profile URLs when a username isn't provided, enabling scraping without explicit username input.
    • Added a utility to derive and normalize usernames from profile URLs, improving handling of varied URL formats and missing or invalid inputs.


coderabbitai bot commented Nov 15, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds a new exported utility that extracts a username from a profile URL and integrates it into the profile scraper; when the incoming username is falsy, the scraper derives a finalUsername from the profile URL and uses that for scraping.

Changes

Cohort / File(s) Change Summary
New utility
lib/socials/getUsernameFromProfileUrl.ts
Added exported function `getUsernameFromProfileUrl(profileUrl: string
Scraper integration
lib/apify/scrapeProfileUrl.ts
Imports getUsernameFromProfileUrl; computes finalUsername by using the provided username or deriving it from the profile URL when username is falsy; passes finalUsername to the scraper invocation instead of the raw username.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant scrapeProfileUrl
    participant getUsernameFromProfileUrl
    participant Scraper

    Caller->>scrapeProfileUrl: invoke(profileUrl, username?)
    alt username provided (truthy)
        scrapeProfileUrl->>Scraper: call(scrape with finalUsername = username)
    else username missing/falsy
        scrapeProfileUrl->>getUsernameFromProfileUrl: call(profileUrl)
        getUsernameFromProfileUrl-->>scrapeProfileUrl: return extractedUsername or ""
        scrapeProfileUrl->>Scraper: call(scrape with finalUsername = extractedUsername)
    end
    Scraper-->>Caller: return result
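
The fallback in the diagram can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the merged code: resolveFinalUsername is a hypothetical helper name, the utility body is a stand-in built around the .com/.net regex quoted in the review comments, and the nullable parameter type is assumed.

```typescript
// Stand-in for lib/socials/getUsernameFromProfileUrl.ts (assumed shape, not the merged code).
function getUsernameFromProfileUrl(profileUrl: string | null | undefined): string {
  if (!profileUrl) return ""; // early return for falsy input
  // First path segment after a .com or .net host, as quoted in the review.
  return profileUrl.match(/(?:\.com|\.net)\/([^/?]+)/)?.[1] ?? "";
}

// Hypothetical helper mirroring the finalUsername computation described above.
function resolveFinalUsername(profileUrl: string, username?: string | null): string {
  // A truthy username wins; any falsy value ("", null, undefined) triggers the fallback.
  return username || getUsernameFromProfileUrl(profileUrl);
}

console.log(resolveFinalUsername("https://instagram.com/someartist", "given")); // "given"
console.log(resolveFinalUsername("https://instagram.com/someartist", ""));      // "someartist"
```

Because the check is on truthiness, an empty-string username is treated the same as a missing one.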

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Review the regex in lib/socials/getUsernameFromProfileUrl.ts for edge cases (subdomains, query strings, trailing slashes).
  • Verify scrapeProfileUrl behavior when finalUsername is an empty string and error/logging paths.
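
The two review points above can be exercised with a small table of inputs. This is a sketch only: it applies the regex quoted in the review directly, without the normalization step (which is not shown in this conversation), and assertUsername is a hypothetical guard, not something the PR is known to contain.

```typescript
// The pattern quoted in this review; normalization of the URL is not reproduced here.
const PROFILE_PATTERN = /(?:\.com|\.net)\/([^/?]+)/;

function extractFirstSegment(url: string): string {
  return url.match(PROFILE_PATTERN)?.[1] ?? "";
}

// [input URL, value the pattern yields]
const edgeCases: Array<[string, string]> = [
  ["https://www.instagram.com/artist/", "artist"],  // subdomain and trailing slash
  ["https://instagram.com/artist?hl=en", "artist"], // query string is excluded
  ["https://instagram.com/", ""],                   // no path segment to capture
  ["https://example.org/artist", ""],               // TLD outside .com/.net
];

for (const [url, expected] of edgeCases) {
  console.log(url, "->", JSON.stringify(extractFirstSegment(url)), "expected", JSON.stringify(expected));
}

// Hypothetical guard: fail fast instead of invoking a scraper with an empty username.
function assertUsername(finalUsername: string): string {
  if (finalUsername === "") {
    throw new Error("Could not determine a username to scrape");
  }
  return finalUsername;
}
```

Whether an empty finalUsername should throw, log, or be passed through is a design decision for scrapeProfileUrl; the guard only illustrates one option.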

Possibly related PRs

Poem

🐰 I hop through links and snatch a name,
From .com/ paths I play my game.
When usernames hide, I give a clue,
A tiny regex — voilà, it's true!
Thump-thump, I nibble bugs anew.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly summarizes the main change: adding a fallback mechanism to derive username from profile URL using the new getUsernameFromProfileUrl utility function.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bdd703f and ff4111d.

📒 Files selected for processing (1)
  • lib/socials/getUsernameFromProfileUrl.ts (1 hunks)



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
lib/socials/getUsernameFromProfileUrl.ts (1)

15-15: Consider supporting additional TLDs for extensibility.

The regex is limited to .com and .net domains. While this covers all currently supported platforms, consider making it more flexible for future extensibility:

-const match = normalizedUrl.match(/(?:\.com|\.net)\/([^/?]+)/);
+const match = normalizedUrl.match(/\/\/[^/]+\/([^/?]+)/);

This pattern would match any domain by capturing the first path segment after the hostname.
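
One thing worth checking with the generic pattern is that it anchors on the // that follows the scheme, so scheme-less input would not match. A possible alternative is to let the standard URL parser find the path. The sketch below is an illustration, not the reviewer's or author's code: firstPathSegment is a hypothetical name, and defaulting to https:// for scheme-less input is an assumption.

```typescript
// The generic pattern suggested above.
const GENERIC_PATTERN = /\/\/[^/]+\/([^/?]+)/;

// Hypothetical alternative built on the WHATWG URL parser.
function firstPathSegment(profileUrl: string): string {
  // Assumption: treat scheme-less input as https so that URL can parse it.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(profileUrl);
  const withScheme = hasScheme ? profileUrl : `https://${profileUrl}`;
  try {
    // pathname excludes the query string and fragment.
    const segments = new URL(withScheme).pathname.split("/").filter(Boolean);
    return segments[0] ?? "";
  } catch {
    return ""; // unparseable input
  }
}

console.log("tiktok.com/@user".match(GENERIC_PATTERN));          // null: no "//" to anchor on
console.log(firstPathSegment("tiktok.com/@user/"));              // "@user"
console.log(firstPathSegment("https://example.org/name?x=1"));   // "name"
```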

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edd6404 and 4e5e5a5.

📒 Files selected for processing (2)
  • lib/apify/scrapeProfileUrl.ts (2 hunks)
  • lib/socials/getUsernameFromProfileUrl.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lib/apify/scrapeProfileUrl.ts (1)
lib/socials/getUsernameFromProfileUrl.ts (1)
  • getUsernameFromProfileUrl (6-21)
🔇 Additional comments (2)
lib/socials/getUsernameFromProfileUrl.ts (1)

1-11: LGTM: Clean function signature and early return.

The documentation clearly describes the extraction logic, and the function signature appropriately handles nullable input with an early return for falsy values.

lib/apify/scrapeProfileUrl.ts (1)

7-7: LGTM: Import statement is correct.

The import path and function name are appropriate.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
lib/socials/getUsernameFromProfileUrl.ts (1)

15-15: Critical: Regex still fails for multi-segment YouTube URLs.

As flagged in the previous review, the regex pattern /(?:\.com|\.net)\/([^/?]+)/ captures only the first path segment, so it produces incorrect results for YouTube URLs with multi-segment paths: youtube.com/c/channelname extracts "c", youtube.com/user/username extracts "user", and youtube.com/channel/channelid extracts "channel".

This issue remains unresolved and must be fixed before merging.

Apply the previously suggested fix:

-    const match = normalizedUrl.match(/(?:\.com|\.net)\/([^/?]+)/);
+    const match = normalizedUrl.match(/(?:\.com|\.net)\/(?:@|c\/|user\/|channel\/)?([^/?]+)/);
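
The difference between the two patterns in the diff can be checked side by side. This is a small sketch that applies both regexes directly to sample URLs; the capture helper and the sample URLs are illustrative, and the normalization step is not reproduced.

```typescript
const ORIGINAL_PATTERN = /(?:\.com|\.net)\/([^/?]+)/;
const PREFIX_AWARE_PATTERN = /(?:\.com|\.net)\/(?:@|c\/|user\/|channel\/)?([^/?]+)/;

// Illustrative helper: return the captured group, or "" when there is no match.
function capture(pattern: RegExp, url: string): string {
  return url.match(pattern)?.[1] ?? "";
}

const sampleUrls = [
  "https://youtube.com/c/channelname",     // original: "c",       prefix-aware: "channelname"
  "https://youtube.com/user/username",     // original: "user",    prefix-aware: "username"
  "https://youtube.com/channel/channelid", // original: "channel", prefix-aware: "channelid"
  "https://youtube.com/@handle",           // original: "@handle", prefix-aware: "handle"
  "https://instagram.com/artist",          // both: "artist"
];

for (const url of sampleUrls) {
  console.log(url, capture(ORIGINAL_PATTERN, url), capture(PREFIX_AWARE_PATTERN, url));
}
```

Two side effects are worth noting. The optional group also strips a leading @, and on any .com or .net site a first segment that is literally c, user, or channel followed by another segment is skipped (for example, twitter.com/user/status yields "status"). Whether either matters depends on the platforms the scraper supports, which this conversation does not list.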
🧹 Nitpick comments (1)
lib/socials/getUsernameFromProfileUrl.ts (1)

13-16: Consider broader TLD support.

The regex currently only matches .com and .net domains. If you anticipate supporting profile URLs from platforms with other TLDs (e.g., .org, .io, .co, or country-specific TLDs), consider expanding the pattern.

For example:

// Support common TLDs
const match = normalizedUrl.match(/\.(?:com|net|org|io|co)\/(?:@|c\/|user\/|channel\/)?([^/?]+)/);

Or use a more generic pattern that matches any TLD:

// Match any TLD
const match = normalizedUrl.match(/\.[a-z]{2,}\/(?:@|c\/|user\/|channel\/)?([^/?]+)/);
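
A quick comparison of the two TLD variants above, as a sketch with made-up sample URLs. It assumes the input has already been lowercased; neither pattern carries the i flag, and the normalization step is not shown in this conversation.

```typescript
const FIXED_TLD_LIST = /\.(?:com|net|org|io|co)\/(?:@|c\/|user\/|channel\/)?([^/?]+)/;
const ANY_TLD = /\.[a-z]{2,}\/(?:@|c\/|user\/|channel\/)?([^/?]+)/;

// Illustrative helper: return the captured group, or "" when there is no match.
function usernameFrom(pattern: RegExp, url: string): string {
  return url.match(pattern)?.[1] ?? "";
}

// Both patterns handle a listed TLD.
console.log(usernameFrom(FIXED_TLD_LIST, "https://example.org/artist")); // "artist"
console.log(usernameFrom(ANY_TLD, "https://example.org/artist"));        // "artist"

// A country-code suffix is missed by the fixed list but matched by the generic pattern.
console.log(usernameFrom(FIXED_TLD_LIST, "https://example.co.uk/artist")); // ""
console.log(usernameFrom(ANY_TLD, "https://example.co.uk/artist"));        // "artist"

// Uppercase hosts match neither pattern unless the URL is lowercased first.
console.log(usernameFrom(ANY_TLD, "https://EXAMPLE.ORG/artist")); // ""
```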
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e5e5a5 and bdd703f.

📒 Files selected for processing (2)
  • lib/apify/scrapeProfileUrl.ts (2 hunks)
  • lib/socials/getUsernameFromProfileUrl.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/apify/scrapeProfileUrl.ts

@sweetmantech sweetmantech merged commit 0f380cd into main Nov 15, 2025