vibe-coded JSON downloaded: stackexchange-cli [options] <url1> [url2]... #8
drzraf wants to merge 3 commits into jagreetdg:main
Pull request overview
Adds a new command-line interface for downloading Stack Exchange question/answer JSON (including related answers/comments) to disk, leveraging the existing API client.
Changes:
- Introduces the `stackexchange-cli` entrypoint that parses URLs, fetches data, and writes `data.json` under an output directory.
- Adds `getQuestionAnswers()` to the API client to retrieve answers for a specific question.
- Registers the new CLI binary in `package.json`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/cli.ts | New CLI implementation: argument parsing, URL parsing, fetch orchestration, skip/resume heuristics, file output, and help text. |
| src/api/stackexchange.ts | Adds an API client helper for fetching answers for a given question ID. |
| package.json | Exposes stackexchange-cli as an additional package binary. |
```typescript
// Load environment variables from .env file (if it exists)
// This will not override existing environment variables
const envPath = join(dirname(__dirname), '.env');
dotenv.config({ path: envPath });
```
The CLI loads .env from join(dirname(__dirname), '.env'), which points at the package directory once installed (e.g. node_modules/<pkg>/.env) rather than the user’s current working directory. This will make local .env credentials in the invoking project invisible. Prefer dotenv.config() (default: process.cwd()) or try process.cwd()/.env first and only fall back to the package root if needed.
```diff
-// Load environment variables from .env file (if it exists)
-// This will not override existing environment variables
-const envPath = join(dirname(__dirname), '.env');
-dotenv.config({ path: envPath });
+// Load environment variables from the invoking directory first.
+// Fall back to the package root .env for local development if needed.
+// This will not override existing environment variables.
+const fallbackEnvPath = join(dirname(__dirname), '.env');
+const envResult = dotenv.config();
+if (!envResult.parsed) {
+  dotenv.config({ path: fallbackEnvPath });
+}
```
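The lookup order described in the comment above can also be expressed as "pick the first `.env` that exists". This is only a sketch: `pickEnvPath` is a hypothetical helper, not part of the PR or of dotenv, and the existence check is injected so the function stays pure and easy to test.

```typescript
// Hypothetical helper, not part of the PR: return the first candidate
// .env path that exists, or undefined when none does.
function pickEnvPath(
  candidates: readonly string[],
  fileExists: (path: string) => boolean,
): string | undefined {
  return candidates.find((candidate) => fileExists(candidate));
}

// Intended wiring in the CLI (assumes the PR's existing `join`, `dirname`,
// and `dotenv` imports, plus `existsSync` from node:fs):
//
//   const envPath = pickEnvPath(
//     [join(process.cwd(), '.env'), join(dirname(__dirname), '.env')],
//     existsSync,
//   );
//   if (envPath) dotenv.config({ path: envPath });
```

Listing the working directory first keeps the invoking project's credentials in charge, while the package-root file still works during local development.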
```typescript
interface FetchedData {
  url: string;
  site: string;
  type: 'question' | 'answer';
  id: number;
  data: QuestionResponse | AnswerResponse | null;
  answers?: AnswerResponse[];
  comments?: any[];
  error?: string;
  timestamp: string;
```
comments?: any[] loses the strong typing already defined in src/api/interfaces.ts (CommentResponse). Using CommentResponse[] here (and updating imports) will prevent accidental shape mismatches when reading/writing data.json.
```typescript
const hostname = urlObj.hostname;

// Extract site from hostname (e.g., stackoverflow.com -> stackoverflow)
const siteParts = hostname.split('.');
let site = siteParts[0];

// Handle special cases
if (site === 'www') {
  site = siteParts[1];
```
Site extraction via hostname.split('.') and taking the first segment breaks for valid Stack Exchange hosts like meta.stackoverflow.com (API site should be meta.stackoverflow) and localized Stack Overflow sites like ru.stackoverflow.com (API site ru.stackoverflow). Consider mapping *.stackoverflow.com to <subdomain>.stackoverflow and handling meta.<site>.com as meta.<site> (and still stripping leading www).
```diff
-const hostname = urlObj.hostname;
-// Extract site from hostname (e.g., stackoverflow.com -> stackoverflow)
-const siteParts = hostname.split('.');
-let site = siteParts[0];
-// Handle special cases
-if (site === 'www') {
-  site = siteParts[1];
+const hostname = urlObj.hostname.replace(/^www\./, '');
+const siteParts = hostname.split('.');
+let site: string;
+// Extract Stack Exchange API site from hostname
+// Examples:
+// - stackoverflow.com -> stackoverflow
+// - ru.stackoverflow.com -> ru.stackoverflow
+// - meta.stackoverflow.com -> meta.stackoverflow
+// - meta.serverfault.com -> meta.serverfault
+if (siteParts.length >= 2 && siteParts[siteParts.length - 2] === 'stackoverflow' && siteParts[siteParts.length - 1] === 'com') {
+  site = siteParts.length === 2 ? 'stackoverflow' : `${siteParts.slice(0, -1).join('.')}`;
+} else if (siteParts.length >= 3 && siteParts[0] === 'meta' && siteParts[siteParts.length - 1] === 'com') {
+  site = `meta.${siteParts[1]}`;
+} else {
+  site = siteParts[0];
```
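For testing the hostname mapping in isolation, the same rules could be pulled into a small pure function. This is a sketch under assumptions: `extractSite` is a hypothetical name, it covers only the cases listed in the review comment, and the treatment of `meta.stackexchange.com` (falling back to the first label, `meta`) should be checked against the API's own list of sites before relying on it. Hosts such as `unix.meta.stackexchange.com` are not handled.

```typescript
// Sketch only: map a Stack Exchange hostname to an API `site` parameter.
// Covers the cases named in the review; other hosts fall back to the
// first hostname label. `extractSite` is a hypothetical helper name.
function extractSite(hostname: string): string {
  const labels = hostname.toLowerCase().replace(/^www\./, '').split('.');
  const tld = labels[labels.length - 1];
  const domain = labels[labels.length - 2];

  // stackoverflow.com, ru.stackoverflow.com, meta.stackoverflow.com:
  // keep everything except the trailing "com".
  if (tld === 'com' && domain === 'stackoverflow') {
    return labels.slice(0, -1).join('.');
  }

  // meta.<site>.com, e.g. meta.serverfault.com -> meta.serverfault.
  // Assumption: meta.stackexchange.com is excluded and maps to "meta".
  if (labels.length === 3 && labels[0] === 'meta' && tld === 'com' && domain !== 'stackexchange') {
    return `meta.${domain}`;
  }

  // Default: first label (e.g. serverfault.com -> serverfault).
  return labels[0] ?? hostname;
}
```

Keeping the mapping free of `URL` parsing and I/O means each host in the review comment can be covered by a one-line unit test.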
```typescript
// For questions, check if we have the question body and at least one answer OR one comment
if (urlInfo.type === 'question') {
  const hasQuestionBody = !!(data.data && 'body' in data.data && data.data.body);
  const hasAnswers = !!(data.answers && data.answers.length > 0);
  const hasComments = !!(data.comments && data.comments.length > 0);
  return hasQuestionBody && (hasAnswers || hasComments);
}
```
isDataComplete treats a question as incomplete unless it has at least one answer OR one comment. This will cause the CLI to re-fetch forever for legitimate questions that simply have no answers/comments (or for questions where comments are disabled), even if the question body was fetched successfully. Consider treating “complete” as “question fetched successfully and the answers/comments fetch was attempted (even if it returned an empty array)”, or at least allow 0 answers/comments as complete when the API indicates answer_count is 0.
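One way to read "the fetch was attempted" is to distinguish a missing field from an empty array. The sketch below assumes the CLI writes `answers: []` and `comments: []` when a fetch succeeds with no results, and leaves the fields undefined when the fetch never ran; `FetchedQuestion` and `isQuestionComplete` are hypothetical names that mirror only the fields of `FetchedData` needed here.

```typescript
// Minimal stand-in for the PR's FetchedData (hypothetical; only the
// fields this check needs).
interface FetchedQuestion {
  data: { body?: string } | null;
  answers?: unknown[]; // undefined = never fetched, [] = fetched, none found
  comments?: unknown[]; // same convention as `answers`
  error?: string;
}

// A question counts as complete when its body was fetched without error
// and both follow-up fetches ran, even if they returned nothing.
function isQuestionComplete(entry: FetchedQuestion): boolean {
  if (entry.error) return false;
  const hasBody = Boolean(entry.data?.body);
  const answersAttempted = Array.isArray(entry.answers);
  const commentsAttempted = Array.isArray(entry.comments);
  return hasBody && answersAttempted && commentsAttempted;
}
```

With this rule, a question that has no answers and no comments is skipped on the next run instead of being fetched again.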
```typescript
/**
 * Setup authentication
 * Authentication is automatic if credentials are found in environment variables
 */
private async setupAuth(useAuth: boolean): Promise<void> {
  const apiKey = process.env.STACKEXCHANGE_API_KEY;
  const accessToken = process.env.STACKEXCHANGE_ACCESS_TOKEN;

  // If no credentials found, skip authentication
  if (!apiKey && !accessToken) {
    if (useAuth) {
      console.log('⚠️ No authentication credentials found. Using unauthenticated API (limited to 300 requests/day)');
    }
    return;
  }

  // Set auth on API client
  if (accessToken) {
    this.apiClient.setAuth({ access_token: accessToken, key: apiKey });
    console.log('🔑 Using access token for authentication');
  } else if (apiKey) {
    this.apiClient.setAuth({ key: apiKey });
    console.log('🔑 Using API key for authentication');
  }
}
```
The -a/--auth flag is documented as enabling authenticated requests, but setupAuth() applies credentials whenever they exist, regardless of useAuth (the flag only changes whether a warning is printed when credentials are missing). Either (1) make auth truly opt-in (only call setAuth when useAuth is true), or (2) update the help text/flag name to reflect the current behavior (e.g. “--auth-warn” / “--no-auth”).
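Option (1), truly opt-in auth, might look like the sketch below. `selectAuth` and `AuthConfig` are hypothetical names introduced for illustration; the environment variable names come from the PR, and the returned object is assumed to match what `setAuth` already accepts.

```typescript
// Hypothetical shape of the credentials object passed to setAuth().
interface AuthConfig {
  key?: string;
  access_token?: string;
}

// Return credentials only when the user opted in AND something is set;
// otherwise return undefined so the caller stays unauthenticated.
function selectAuth(
  useAuth: boolean,
  env: Record<string, string | undefined>,
): AuthConfig | undefined {
  if (!useAuth) return undefined;

  const key = env['STACKEXCHANGE_API_KEY'];
  const accessToken = env['STACKEXCHANGE_ACCESS_TOKEN'];
  if (!key && !accessToken) return undefined;

  const auth: AuthConfig = {};
  if (key) auth.key = key;
  if (accessToken) auth.access_token = accessToken;
  return auth;
}
```

`setupAuth()` could then call `selectAuth(useAuth, process.env)` and invoke `this.apiClient.setAuth(...)` only when the result is defined, which keeps the flag's documented meaning and its behavior in sync.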
```
Resume:
  The CLI automatically skips URLs that have already been fetched successfully.
  A URL is considered complete if the data file exists with valid content:
  - For questions: must have question body and at least one answer
```
Help text says a URL is complete when “For questions: must have question body and at least one answer”, but isDataComplete() currently accepts “at least one answer OR one comment”. Update the help text (or the completeness logic) so users aren’t surprised by what gets skipped/re-fetched.
```diff
-  - For questions: must have question body and at least one answer
+  - For questions: must have question body and at least one answer or comment
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
It's handy to be able to quickly download the JSON for a list of questions (and their answers/comments). Since the API client is already in place, this only requires adding a CLI.