

@brendan-kellam (Contributor) commented Nov 20, 2025

This PR adds support for streaming code search results. Here's an example:

[Screen recording (2025-11-21) omitted]

Additionally, this PR changes the default search behaviour to use substrings instead of regular expressions.


A couple of details on how this was implemented:

  • zoekt exposes a gRPC endpoint for search, both as a unary (blocking) call and as a stream. This PR switches the existing /search endpoint to the unary gRPC call and introduces a new /stream_search endpoint that uses the streaming gRPC call.
  • The /stream_search endpoint progressively streams results to the client as server-sent events (SSE) as it receives them from zoekt.
  • Unlike the zoekt REST endpoint, the gRPC endpoints do not accept the query as a string.
  • Instead, they accept a tree data structure for queries (akin to an AST), defined in query.proto. For the purposes of Sourcebot, I'm calling this data structure the "Query Intermediate Representation" (QueryIR).
  • To convert a query into this intermediate representation, I've added @sourcebot/query-language, a query parser built with the Lezer parser generator. The query language's grammar is defined in query.grammar.
  • The parser takes a query and generates a syntax tree, which we then walk and convert into a QueryIR object.
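For illustration, the SSE framing in the second bullet could look roughly like the sketch below. It assumes each transformed zoekt chunk is a JSON-serializable object and that a data: [DONE] event marks the end of the stream; encodeSseEvent, SSE_DONE and toSseStream are hypothetical names, not the PR's actual code.

```typescript
// Hypothetical SSE framing helpers for a /stream_search-style route.
// Assumptions: each chunk is JSON-serializable, and a "data: [DONE]"
// event marks the end of the stream.

// One SSE event: a single "data:" line followed by a blank line.
// JSON.stringify (without indentation) never emits raw newlines,
// so the payload always fits on one line.
function encodeSseEvent(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

const SSE_DONE = "data: [DONE]\n\n";

// Wraps an async iterable of chunks in a ReadableStream of UTF-8 SSE events.
// Errors from the source are forwarded to the consumer via controller.error().
function toSseStream(chunks: AsyncIterable<unknown>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const chunk of chunks) {
          controller.enqueue(encoder.encode(encodeSseEvent(chunk)));
        }
        controller.enqueue(encoder.encode(SSE_DONE));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    },
  });
}
```

In a route handler the stream would be returned as new Response(toSseStream(chunks), { headers: { "Content-Type": "text/event-stream" } }). This sketch drains the source eagerly inside start() and does not apply backpressure.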

@coderabbitai

coderabbitai bot commented Nov 20, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces a Lezer-based query language parser, converts the search API from string-based queries to intermediate representation (IR), implements streaming search via SSE, adds Zoekt gRPC proto definitions, updates search UI for regex and case-sensitivity flags, and refactors Prisma permission filtering to account-based scoping.

Changes

Cohort / File(s) Summary
Query Language Package
packages/queryLanguage/*
New @sourcebot/query-language workspace package with Lezer grammar (query.grammar), tokenizer (tokens.ts), generated parser (parser.ts, parser.terms.ts), index exports, and comprehensive test suites covering basic, grouping, negation, operators, precedence, prefixes, and quoted expressions.
Search API & IR Foundation
packages/web/src/features/search/index.ts, packages/web/src/features/search/types.ts, packages/web/src/features/search/ir.ts, packages/web/src/features/search/parser.ts, packages/web/src/features/search/searchApi.ts
New/refactored search feature exports, unified type definitions (SearchRequest, SearchResponse, SearchStats, etc.), QueryIR visitor/traversal utilities, parseQuerySyntaxIntoIR for converting Lezer trees to IR, and refactored search/streamSearch functions using IR and permission filters.
Zoekt gRPC Integration
packages/web/src/features/search/zoektSearcher.ts, packages/web/src/proto/zoekt/webserver/v1/*, packages/web/src/proto/webserver.ts, packages/web/src/proto/google/protobuf/*
New zoektSearcher with createZoektSearchRequest, zoektSearch, zoektStreamSearch functions, and comprehensive proto type definitions for Zoekt gRPC (And, Boost, Branch, FileMatch, Q, SearchResponse, WebserverService, etc.) and Google Protobuf types.
Search UI Components
packages/web/src/app/[domain]/components/searchBar/*, packages/web/src/app/[domain]/browse/layout.tsx
Updated SearchBar to accept defaults object with query, isRegexEnabled, isCaseSensitivityEnabled; added case-sensitivity and regex toggles; refactored suggestion mode from "case"/"public" to "visibility"; updated search prefix constants and logic.
Streaming Search & Server Routes
packages/web/src/app/[domain]/search/useStreamedSearch.ts, packages/web/src/app/api/(server)/stream_search/route.ts, packages/web/src/app/[domain]/search/page.tsx, packages/web/src/app/[domain]/search/components/searchResultsPage.tsx
New useStreamedSearch hook for SSE-based search with caching and cancellation; POST /stream_search route handler; updated SearchResultsPage to use streaming data (isStreaming, files, stats, timeToFirstSearchResultMs, timeToSearchCompletionMs); added isRegexEnabled/isCaseSensitivityEnabled to search params.
Search Results UI
packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx, packages/web/src/app/[domain]/search/components/filterPanel/*
Converted SearchResultsPanel to forwardRef with resetScroll handle; refactored "show all matches" state to per-file-match map; added isStreaming and onFilterChange to FilterPanel; updated skeleton rendering during stream.
Code Navigation
packages/web/src/features/codeNav/api.ts, packages/web/src/features/codeNav/types.ts
Refactored findSearchBasedSymbolReferences/findSearchBasedSymbolDefinitions to use IR queries (queryType: 'ir') instead of strings; updated language filter builder to return QueryIR; adjusted parse logic and file structure mapping.
File Source API & Chat Tools
packages/web/src/features/search/fileSourceApi.ts, packages/web/src/features/chat/tools.ts, packages/web/src/app/api/(client)/client.ts
Updated fileSourceApi to construct IR queries; refactored search tool in chat/tools.ts to use new searchOptions structure; adjusted client API imports.
Database & Auth
packages/db/src/index.ts, packages/web/src/prisma.ts
Added UserWithAccounts type alias (User & { accounts: Account[] }); refactored userScopedPrismaClientExtension to accept user parameter and use new getRepoPermissionFilterForUser helper for account-based permission filtering.
Proto & Type Utilities
packages/web/src/lib/utils.ts, packages/web/src/lib/errorCodes.ts, packages/web/src/lib/types.ts, packages/web/src/lib/posthogEvents.ts
Added getCodeHostBrowseFileAtBranchUrl URL builder; added FAILED_TO_PARSE_QUERY error code; added SearchQueryParams enum members (isRegexEnabled, isCaseSensitivityEnabled); updated search_finished PosthogEventMap with new timing fields and flushReason type.
MCP Package
packages/mcp/src/index.ts, packages/mcp/src/schemas.ts, packages/mcp/CHANGELOG.md
Updated search call to use isRegexEnabled and isCaseSensitivityEnabled flags instead of case: query suffix; refactored searchRequestSchema with searchOptionsSchema composition; updated changelog.
Web Package & Build
package.json, packages/web/package.json, Dockerfile
Added @sourcebot/query-language workspace and build dependencies; added gRPC dependencies (@grpc/grpc-js, @grpc/proto-loader); added generate:protos script; updated Docker build stages for queryLanguage; removed SRC_TENANT_ENFORCEMENT_MODE=strict; added @lezer/common resolution pin.
Infrastructure & Config
.env.development, CHANGELOG.md, packages/queryLanguage/tsconfig.json, packages/queryLanguage/vitest.config.ts, packages/queryLanguage/.gitignore, packages/web/.eslintignore, packages/web/src/features/search/README.md
Removed strict tenant enforcement mode; added MCP changelog entry; added TypeScript and Vitest configs for queryLanguage; added ESLint ignores for proto output; added search feature README documentation.
Removed/Deleted Files
packages/web/src/features/search/schemas.ts, packages/web/src/features/search/zoektClient.ts, packages/web/src/features/search/zoektSchema.ts
Removed centralized Zod schemas file, zoektClient fetch wrapper, and Zoekt-specific schema definitions (moved to types.ts and new IR-based approach).
Type/Import Path Updates
packages/web/src/app/[domain]/components/lightweightCodeHighlighter.tsx, packages/web/src/app/[domain]/search/components/codePreviewPanel/*, packages/web/src/app/[domain]/search/components/searchResultsPanel/*, packages/web/src/features/codeNav/types.ts, packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts, packages/web/src/lib/extensions/searchResultHighlightExtension.ts, packages/web/src/ee/features/codeNav/components/*
Consolidated import paths for search types (SourceRange, SearchResultFile, RepositoryInfo, etc.) from @/features/search/types to @/features/search; updated fileSourceRequestSchema imports to types subpath.
Backend
packages/backend/src/index.ts
Removed debug log line for repeated shutdown signals.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant SearchPage as SearchPage
    participant useStreamedSearch as useStreamedSearch
    participant SearchAPI as /api/stream_search
    participant zoektSearcher as zoektSearcher
    participant ZoektGRPC as Zoekt gRPC

    Client->>SearchPage: Enter query + flags
    SearchPage->>useStreamedSearch: query, isRegexEnabled, isCaseSensitivityEnabled
    
    alt Cache Hit
        useStreamedSearch-->>SearchPage: Return cached results
    else Cache Miss
        useStreamedSearch->>SearchAPI: POST (SSE)
        SearchAPI->>SearchAPI: Parse query → IR (parseQuerySyntaxIntoIR)
        SearchAPI->>zoektSearcher: createZoektSearchRequest(IR)
        zoektSearcher->>ZoektGRPC: StreamSearch RPC
        
        ZoektGRPC-->>zoektSearcher: Stream chunk
        zoektSearcher->>zoektSearcher: Transform chunk + enrich repos
        zoektSearcher-->>SearchAPI: data: {files, stats, ...}
        SearchAPI-->>useStreamedSearch: SSE chunk
        useStreamedSearch->>useStreamedSearch: Update state
        useStreamedSearch-->>SearchPage: Partial results
        
        ZoektGRPC-->>zoektSearcher: Final/Error
        zoektSearcher-->>SearchAPI: Final/Error response
        SearchAPI-->>useStreamedSearch: SSE final
        useStreamedSearch->>useStreamedSearch: Cache results
        useStreamedSearch-->>SearchPage: Final results
    end
    
    SearchPage->>SearchPage: Render results
sequenceDiagram
    participant Input as Query Input
    participant Lezer as Lezer Parser
    participant transformTreeToIR as transformTreeToIR
    participant expandContext as expandSearchContext
    participant IROutput as QueryIR Output

    Input->>Lezer: "file:test.ts content:bug -archived:yes"
    Lezer-->>transformTreeToIR: Parse tree (PrefixExpr, Term, NegateExpr, etc.)
    
    transformTreeToIR->>transformTreeToIR: Walk tree recursively
    
    alt PrefixExpr
        transformTreeToIR->>transformTreeToIR: Extract prefix type (FileExpr, ContentExpr, etc.)
        transformTreeToIR->>transformTreeToIR: Build IR node (regexp, substring, raw_config, etc.)
    else Term
        transformTreeToIR->>transformTreeToIR: Check isRegexEnabled / isCaseSensitivityEnabled
        transformTreeToIR->>transformTreeToIR: Create regexp or substring query
    else ContextExpr
        transformTreeToIR->>expandContext: Resolve context name
        expandContext-->>transformTreeToIR: Repository IDs
        transformTreeToIR->>transformTreeToIR: Build repo_set query
    end
    
    transformTreeToIR-->>IROutput: QueryIR (And / Or / Not / PrefixExpr nodes)
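A toy version of the recursive walk in the second diagram is sketched below. It uses a simplified parse tree (Term, PrefixExpr, NegateExpr, plus hypothetical AndExpr/OrExpr nodes) rather than real Lezer nodes, and omits context expansion. The IR field names (substring, regexp, and, or, not, case_sensitive, file_name, content) are modeled loosely on zoekt's query.proto and should be treated as assumptions. toIR and patternIR are hypothetical names.

```typescript
// Toy model of walking a parse tree into a QueryIR.
// Node kinds are simplified and hypothetical; IR field names are loosely
// modeled on zoekt's query.proto and should be treated as assumptions.

type ParseNode =
  | { kind: "Term"; value: string }
  | { kind: "PrefixExpr"; prefix: "file" | "content"; value: string }
  | { kind: "NegateExpr"; child: ParseNode }
  | { kind: "AndExpr"; children: ParseNode[] }
  | { kind: "OrExpr"; children: ParseNode[] };

interface PatternFields {
  case_sensitive: boolean;
  file_name: boolean; // restrict the match to file names
  content: boolean;   // restrict the match to file contents
}

type QueryIR =
  | { substring: { pattern: string } & PatternFields }
  | { regexp: { regexp: string } & PatternFields }
  | { and: { children: QueryIR[] } }
  | { or: { children: QueryIR[] } }
  | { not: { child: QueryIR } };

interface SearchFlags {
  isRegexEnabled: boolean;
  isCaseSensitivityEnabled: boolean;
}

// Leaf node: regexp when the regex toggle is on, substring otherwise.
function patternIR(value: string, flags: SearchFlags, fileName: boolean, content: boolean): QueryIR {
  const fields: PatternFields = {
    case_sensitive: flags.isCaseSensitivityEnabled,
    file_name: fileName,
    content,
  };
  return flags.isRegexEnabled
    ? { regexp: { regexp: value, ...fields } }
    : { substring: { pattern: value, ...fields } };
}

function toIR(node: ParseNode, flags: SearchFlags): QueryIR {
  switch (node.kind) {
    case "Term":
      // Neither restriction set (assumed to mean "match names or contents").
      return patternIR(node.value, flags, false, false);
    case "PrefixExpr":
      return patternIR(node.value, flags, node.prefix === "file", node.prefix === "content");
    case "NegateExpr":
      return { not: { child: toIR(node.child, flags) } };
    case "AndExpr":
      return { and: { children: node.children.map((child) => toIR(child, flags)) } };
    case "OrExpr":
      return { or: { children: node.children.map((child) => toIR(child, flags)) } };
    default: {
      const unreachable: never = node;
      throw new Error(`Unhandled node: ${JSON.stringify(unreachable)}`);
    }
  }
}
```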

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Query language grammar and parser (packages/queryLanguage/src/query.grammar, packages/queryLanguage/src/parser.ts, packages/queryLanguage/src/tokens.ts): Complex Lezer-based grammar with tokenization logic; verify precedence, operator handling, and edge cases in negation/prefix parsing.
  • parseQuerySyntaxIntoIR implementation (packages/web/src/features/search/parser.ts): Dense transformation logic with context expansion, case/regex sensitivity handling, and error path complexity; review all prefix transformations and error codes.
  • Zoekt gRPC integration (packages/web/src/features/search/zoektSearcher.ts): Streaming error handling, chunk accumulation, repository enrichment, and gRPC client lifecycle; verify SSE encoding and chunk/final/error response ordering.
  • Streaming search hook (packages/web/src/app/[domain]/search/useStreamedSearch.ts): SSE parsing, cache TTL validation, AbortController cancellation, and error handling paths; review cache key construction and error wrapping.
  • SearchResultsPanel refactoring (packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx): Conversion to forwardRef, per-file-match state map persistence, and debounce logic changes; verify resetScroll behavior and history restoration.
  • Prisma permission filtering (packages/web/src/prisma.ts): Shift from account-ID arrays to account-aware user scoping; verify permission edge cases and fallback to public repos logic.
  • Proto definitions (packages/web/src/proto/zoekt/webserver/v1/*, packages/web/src/proto/webserver.ts): Large number of generated type definitions; spot-check a few for correctness of field types and nested structures.

Possibly related PRs

Suggested labels

sourcebot-team

Suggested reviewers

  • msukkari

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00%, below the required threshold of 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'feat(web): Streamed code search' clearly describes the main feature being introduced - streaming functionality for code search in the web package.


@brendan-kellam brendan-kellam force-pushed the bkellam/streamed_search branch from c137be2 to aad3507 on November 20, 2025 06:15
@brendan-kellam brendan-kellam changed the title [wip] Streamed search feat(web): Streamed code search Nov 21, 2025
@brendan-kellam brendan-kellam marked this pull request as ready for review November 21, 2025 02:45
@brendan-kellam
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Nov 21, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (28)
packages/web/src/app/[domain]/components/pathHeader.tsx (1)

236-236: LGTM! Clean branch name display.

The regex transformation correctly strips Git's symbolic reference prefixes (refs/heads/ or refs/tags/) to display user-friendly branch names. The implementation is safe and follows standard Git UI conventions.

Recommend extracting to a utility function, since this pattern is already duplicated.

Verification found the same transformation in packages/web/src/app/[domain]/repos/components/repoBranchesTable.tsx (line 39). Consider extracting to a shared utility to reduce duplication:

// In lib/utils.ts or similar
export const formatBranchDisplayName = (branchName: string): string => {
  return branchName.replace(/^refs\/(heads|tags)\//, '');
};

Update both locations:

  • packages/web/src/app/[domain]/components/pathHeader.tsx (line 236)
  • packages/web/src/app/[domain]/repos/components/repoBranchesTable.tsx (line 39)
packages/mcp/src/schemas.ts (1)

32-35: Consider validating the query field.

The query field accepts any string, including the empty string. Depending on how queries are processed, you may want to ensure non-empty queries or add appropriate validation.

If empty queries should be rejected, apply this diff:

 export const searchRequestSchema = z.object({
-    query: z.string(),                                // The zoekt query to execute.
+    query: z.string().min(1),                         // The zoekt query to execute.
     ...searchOptionsSchema.shape,
 });
packages/queryLanguage/tsconfig.json (1)

21-21: Consider broadening the include pattern.

The current configuration only includes src/index.ts, which means TypeScript will only compile files transitively imported from the index. Generated parser files in src/parser/ may not be fully type-checked if they contain exports that aren't used.

Consider changing to:

-    "include": ["src/index.ts"],
+    "include": ["src/**/*.ts"],

This ensures all TypeScript files, including generated parser code, are properly type-checked during compilation.

packages/queryLanguage/src/query.grammar (1)

23-30: Negation currently cannot target bare terms (only prefixes/paren groups)

NegateExpr only accepts negate before PrefixExpr or ParenExpr, so something like -foo won’t ever become a NegateExpr node; it’ll only work if you always express exclusions as -file:..., -repo:..., or -(...). If the intended UX is that -foo excludes results matching a plain term, consider broadening this rule:

-NegateExpr { !negate negate (PrefixExpr | ParenExpr) }
+NegateExpr { !negate negate expr }

This would allow negation of any expr (including bare Term) while still supporting prefixes and grouped subexpressions.

packages/queryLanguage/test/prefixes.txt (1)

1-335: Prefix test coverage is strong; consider a few more operator combinations

This suite hits a wide range of prefix/value shapes (short forms, wildcards, regexy patterns, invalid values), which is great. If you expect users to mix prefixes heavily with or and negation, it could be worth adding a couple of cases like:

  • file:test.js or repo:myproject
  • -file:test.js lang:typescript
  • (file:*.ts or file:*.tsx) lang:typescript

to lock in parse shapes for those combinations.

packages/queryLanguage/test/basic.txt (1)

1-71: Solid basic coverage; a couple of extra lexical edge cases might help

These cases do a nice job exercising plain terms, adjacency, and regex-like patterns. If you want to harden the lexer further, you could optionally add inputs like:

  • order or or1 (ensure only standalone or becomes the operator)
  • -foo bar (whatever behavior you intend around leading dashes)
  • hello_or_world

to document how the grammar treats borderline operator/word collisions.

packages/queryLanguage/test.ts (1)

1-46: Turn this into an automated test and use the public parser entrypoint

The pretty‑print/reconstruction logic is fine for a quick sanity check, but as a committed file it’s more useful if:

  • It’s wired into your actual test runner (e.g., moved under test/ with an assertion instead of console logging).
  • It imports the parser from the package’s public surface (e.g., ./src/index or the published module name) rather than the generated ./src/parser, so it exercises what consumers will use.

Doing that would turn this from a one‑off debug script into a stable regression test for the grammar.

packages/queryLanguage/src/tokens.ts (1)

27-58: Remove duplicate comment.

Lines 28 and 32 contain identical comments. Remove the duplicate to improve code clarity.

Apply this diff:

 // Check if followed by a prefix keyword (by checking for keyword followed by colon)
-// Look ahead until we hit a delimiter or colon
 const checkPos = input.pos;
 let foundColon = false;
 
 // Look ahead until we hit a delimiter or colon
 while (ch >= 0) {

Consider clarifying EOF handling.

The lookahead loop condition ch >= 0 (line 33) correctly handles EOF (Lezer represents the end of input as -1), but this behavior isn't documented. Consider adding a comment to make the EOF handling explicit.

Example:

 // Look ahead until we hit a delimiter or colon
-while (ch >= 0) {
+while (ch >= 0) { // ch < 0 indicates EOF
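To make the EOF convention concrete, here is a self-contained model of the lookahead that does not depend on Lezer. It assumes, as with Lezer's InputStream.peek(), that reading past the end of the input returns -1, so ch >= 0 doubles as the end-of-input check. prefixKeywordAt, isDelimiter and the delimiter set are hypothetical.

```typescript
// Self-contained model of the prefix-keyword lookahead (no Lezer dependency).
// Assumption: as with Lezer's InputStream.peek(), reading past the end of the
// input yields -1, so `ch >= 0` also serves as the end-of-input check.

const COLON = 58; // ":"

// Hypothetical delimiter set: space, tab, newline, "(" and ")".
function isDelimiter(ch: number): boolean {
  return ch === 32 || ch === 9 || ch === 10 || ch === 40 || ch === 41;
}

// Returns the keyword starting at `pos` if it is directly followed by ":",
// otherwise null.
function prefixKeywordAt(text: string, pos: number, keywords: ReadonlySet<string>): string | null {
  const peek = (offset: number): number =>
    pos + offset < text.length ? text.charCodeAt(pos + offset) : -1;

  let offset = 0;
  let ch = peek(offset);
  // Look ahead until we hit a delimiter or colon; ch < 0 indicates EOF.
  while (ch >= 0 && !isDelimiter(ch) && ch !== COLON) {
    offset++;
    ch = peek(offset);
  }
  if (ch !== COLON) return null; // reached a delimiter or EOF first
  const word = text.slice(pos, pos + offset);
  return keywords.has(word) ? word : null;
}
```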
packages/web/src/features/search/zoektSearcher.ts (6)

171-318: Streaming lifecycle mostly robust; consider explicit guards around controller usage

The SSE streaming implementation correctly:

  • Tracks in-flight chunks (pendingChunks) and defers closing until both isStreamActive is false and pendingChunks === 0.
  • Pauses/resumes the gRPC stream to avoid completing before async processing is done.
  • Converts processing errors into typed error chunks rather than tearing down the stream abruptly.

Two small robustness considerations:

  1. Post-error gRPC stream lifetime
    In the per-chunk catch, isStreamActive is set to false and an error chunk is enqueued, but the underlying gRPC stream is neither cancelled nor closed explicitly. You do close the client via tryCloseController, so this is likely fine, but explicitly cancelling the stream can free resources sooner:

    } catch (error) {
      // ...
      isStreamActive = false;
+     grpcStream?.cancel();
    }

  2. Cancel callback and finalization
    The cancel() handler sets isStreamActive = false and calls grpcStream.cancel()/client.close(), but doesn't invoke tryCloseController. That's acceptable (the consumer has cancelled), but if you want symmetric behavior (e.g., always emitting a final “done” marker on server-side aborts), consider calling tryCloseController() once in-flight chunks drain.

Both are optional refinements; the current implementation should behave correctly for the common happy-path and error-path cases.
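The close-once-drained rule described above can be modeled independently of gRPC. This is a minimal sketch under the assumption that every chunk handler calls chunkStarted/chunkFinished and that the gRPC end, error, and per-chunk catch paths all call streamEnded. DrainThenClose is a hypothetical class whose fields mirror pendingChunks and isStreamActive; its close callback plays the role of tryCloseController. Pause/resume of the gRPC stream is not modeled.

```typescript
// Hypothetical model of "close only after the source has ended AND every
// in-flight chunk has been processed". The close callback runs at most once.
class DrainThenClose {
  private pendingChunks = 0;
  private isStreamActive = true;
  private closed = false;
  private readonly close: () => void;

  constructor(close: () => void) {
    this.close = close;
  }

  // Call before starting async work for a chunk.
  chunkStarted(): void {
    this.pendingChunks++;
  }

  // Call when that work settles (success or handled error).
  chunkFinished(): void {
    this.pendingChunks--;
    this.tryClose();
  }

  // Call from the source's end/error handlers, or after a fatal chunk error.
  streamEnded(): void {
    this.isStreamActive = false;
    this.tryClose();
  }

  private tryClose(): void {
    if (!this.closed && !this.isStreamActive && this.pendingChunks === 0) {
      this.closed = true;
      this.close();
    }
  }
}
```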


329-368: Per-chunk repo lookup and caching are solid; consider minor ergonomics

The repo resolution logic per chunk behaves well:

  • Per-request cache (_reposMapCache) avoids redundant DB hits across chunks.
  • Per-chunk reposMap limits lookups to just the repositories actually present in that response.
  • Numeric vs. string ids are handled via getRepoIdForFile.

Two optional improvements:

  • You could avoid the extra reposMap Map and just reuse the shared cache for lookups, since you already filter by presence before transforming files, but the current separation is clear and not problematic.
  • If Prisma throws (e.g., transient DB issues), that error will bubble into zoektStreamSearch's per-chunk catch, which is good; combined with the fix suggested for zoektSearch, error propagation is consistent across both search modes.

No required changes here.


370-446: Strict “repo not found” behavior can hard-fail searches

In transformZoektSearchResponse, if a file's repository can't be found in the DB/cache, you throw:

if (!repo) {
  throw new Error(`Repository not found for file: ${file.file_name}`);
}

Given out-of-sync states between Zoekt indices and your Repo table are possible (e.g., repo deleted in DB but still indexed, temporary ingestion lag), this will hard-fail the whole search.

Consider a more forgiving behavior, e.g.:

  • Skip such files and continue, or
  • Emit them with a sentinel repo name/id and surface a warning elsewhere.

Example adjustment:

-        if (!repo) {
-            throw new Error(`Repository not found for file: ${file.file_name}`);
-        }
+        if (!repo) {
+            // Skip files whose repositories are missing in the DB to avoid failing the entire search.
+            return undefined;
+        }
...
-    }).filter(file => file !== undefined);
+    }).filter((file): file is SearchResultFile => file !== undefined);

This keeps the search resilient to transient data inconsistencies.


448-480: Double-check units and semantics for duration/wait and flushReason

The stats mapping looks coherent, but relies on external Zoekt semantics:

  • duration, wait, matchTree* fields are populated from the .nanos part only. If Zoekt ever returns non-zero seconds, these will under-report timings.
  • flushReason is set via response.stats?.flush_reason?.toString() and then used as a string; this matches the z.string() type, but you might want to standardize on the enum values directly (no .toString()) for easier downstream handling.

Both are minor and can be left as-is, but worth verifying against Zoekt’s current SearchStats definition and the behavior of your gRPC/proto loader.
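For the first point, a duration-like value has to combine both fields. A minimal sketch, assuming the stats arrive as google.protobuf.Duration-shaped objects and that seconds is decoded as a number or a decimal string (this depends on how the proto loader is configured to represent int64 values; a Long object would need converting first). durationToMs is a hypothetical helper, and milliseconds are just one possible target unit.

```typescript
// Hypothetical helper: combine both fields of a google.protobuf.Duration-shaped
// value. Assumption: `seconds` may be decoded as a number or a decimal string,
// depending on how the proto loader is configured to represent int64 values.
interface DurationLike {
  seconds?: number | string | null;
  nanos?: number | null;
}

function durationToMs(duration: DurationLike | null | undefined): number {
  if (!duration) return 0;
  const seconds = Number(duration.seconds ?? 0);
  const nanos = duration.nanos ?? 0;
  return seconds * 1_000 + nanos / 1_000_000;
}
```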


544-572: Stats accumulation across stream chunks may overcount depending on Zoekt semantics

accumulateStats currently adds all numeric fields (including totalMatchCount, fileCount, duration fields, etc.) across chunks, and keeps the first non-unknown flushReason.

This is correct only if Zoekt’s streamed SearchStats are per-chunk deltas. If Zoekt instead sends cumulative stats (“so far”) for each flush, summing will overcount and could:

  • Inflate totalMatchCount, fileCount, etc.
  • Distort isSearchExhaustive in the streaming final response, which compares accumulatedStats.totalMatchCount to accumulatedStats.actualMatchCount.

If Zoekt’s stats are cumulative, the accumulator should instead take the last stats object (or max per field) rather than summing.

Please confirm from the Zoekt docs or implementation whether SearchStats in streamed responses are deltas or cumulative, and adjust accumulateStats accordingly.
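The two possibilities lead to different fold functions. Below is a hedged sketch over numeric fields only (non-numeric fields such as flushReason need separate handling); the function names are hypothetical. The cumulative variant keeps the per-field maximum, which equals "take the latest" only if running totals never decrease.

```typescript
// Two hypothetical ways to fold per-chunk numeric stats. Which one is right
// depends on whether streamed SearchStats are per-chunk deltas or running totals.
type NumericStats = Record<string, number>;

// Deltas: each chunk reports only its own work, so sum field by field.
function accumulateDeltas(acc: NumericStats, next: NumericStats): NumericStats {
  const out: NumericStats = { ...acc };
  for (const [key, value] of Object.entries(next)) {
    out[key] = (out[key] ?? 0) + value;
  }
  return out;
}

// Running totals: each chunk reports everything so far, so keep the
// per-field maximum (the same as "take the latest" if totals never decrease).
function accumulateCumulative(acc: NumericStats, next: NumericStats): NumericStats {
  const out: NumericStats = { ...acc };
  for (const [key, value] of Object.entries(next)) {
    out[key] = Math.max(out[key] ?? 0, value);
  }
  return out;
}
```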


513-541: gRPC client construction: URL parsing and path robustness

Overall the client setup is clear and matches the generated types. Two small robustness points:

  1. Port extraction from ZOEKT_WEBSERVER_URL

    If the URL omits an explicit port, zoektUrl.port will be '' (the same happens when the port equals the scheme's default), leading to an address like hostname:. You might want a fallback:

-   const grpcAddress = `${zoektUrl.hostname}:${zoektUrl.port}`;
+   const port = zoektUrl.port || (zoektUrl.protocol === 'https:' ? '443' : '80');
+   const grpcAddress = `${zoektUrl.hostname}:${port}`;

  2. Proto path based on process.cwd()

    Using process.cwd() with '../../vendor/zoekt/grpc/protos' assumes a specific working-directory layout at runtime. If this ever runs from a different cwd (e.g., the Next.js production server root), proto loading will fail. Consider anchoring paths via a config/env var, or document the assumption clearly.

Both are defensive improvements; the current code is fine if your deployment environment guarantees cwd and URL shape.
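Both suggestions could look roughly like the sketch below. ZOEKT_PROTO_DIR is a hypothetical environment variable, not something the PR defines. The relative fallback reproduces the current process.cwd()-based path.

```typescript
import * as path from "node:path";

// Port fallback: URL.port is "" both when the port is omitted and when it
// equals the scheme's default (e.g. "http://zoekt:80"), so both cases fall back.
function resolveGrpcAddress(zoektUrl: URL): string {
  const port = zoektUrl.port || (zoektUrl.protocol === "https:" ? "443" : "80");
  return `${zoektUrl.hostname}:${port}`;
}

// Proto directory: prefer an explicit setting, otherwise keep the current
// cwd-relative default. ZOEKT_PROTO_DIR is a hypothetical variable name.
function resolveProtoDir(env: Record<string, string | undefined>, cwd: string): string {
  return env.ZOEKT_PROTO_DIR ?? path.resolve(cwd, "../../vendor/zoekt/grpc/protos");
}
```

At startup this would be called as resolveProtoDir(process.env, process.cwd()).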

packages/web/src/features/search/types.ts (1)

33-56: SearchStats shape matches Zoekt stats mapping

The searchStatsSchema fields and comments mirror the Zoekt SearchStats fields you populate in transformZoektSearchResponse. That alignment is good.

Optional improvement: if you want stronger validation, you could constrain these to non-negative integers, e.g.:

actualMatchCount: z.number().int().nonnegative(),

for the count fields and similar for the byte counters, while keeping simple z.number() for timing values if they might be fractional in the future.

packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (1)

72-78: Cosmetic dependency array reordering.

The dependency array for branchDisplayName was reordered from [isBranchFilteringEnabled, branches] to [branches, isBranchFilteringEnabled]. While this doesn't affect React's memoization behavior, it may be for consistency with a linting rule or code style.

packages/web/src/features/search/fileSourceApi.ts (1)

8-38: Validate IR shape and regex escaping for repository / fileName

The QueryIR shape here looks reasonable, but both repo.regexp and the file regexp.regexp are being populated directly from repository and fileName. If these values can contain regex metacharacters (e.g. +, ?, [], etc.), this will change the match semantics compared to a literal path/repo match and could even produce invalid patterns, depending on Zoekt’s expectations.

If the intent is “exact repo” and “exact file path”:

  • Consider normalizing both to anchored, escaped patterns, e.g. ^<escapedRepo>$ / ^<escapedPath>$, or
  • Use an IR helper that treats these as literal names rather than raw regexes (if such an abstraction exists in your new IR layer).

This is especially important for getFileSource, which is effectively a “fetch exact file” path rather than a fuzzy search.

Also applies to: 41-47
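Here is a minimal sketch of the "escaped and anchored" option, assuming the IR's repo and file-name regexps use RE2-style syntax (zoekt is written in Go). escapeRegExp escapes the same metacharacters as Go's regexp.QuoteMeta. Both helper names are hypothetical.

```typescript
// Hypothetical helpers: escape regex metacharacters so a repository name or
// file path is matched literally, then anchor it so only the exact value matches.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function exactMatchRegexp(value: string): string {
  return `^${escapeRegExp(value)}$`;
}
```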

packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (1)

15-20: onFilterChange / isStreaming integration looks solid

The new props are plumbed cleanly: onFilterChange is only invoked when query params actually change, and isStreaming is consistently passed down to both Filter components. This should make it straightforward for the parent to react to filter changes during streaming without introducing extra renders.

If you find the repo/language onEntryClicked blocks diverging over time, consider extracting the “toggle and sync URL + fire onFilterChange” logic into a small shared helper, but that’s optional.

Also applies to: 36-38, 42-44, 137-187

packages/web/src/prisma.ts (1)

3-3: Repo permission filter extraction looks correct; consider safer where composition

The refactor to accept user?: UserWithAccounts and delegate repo scoping to getRepoPermissionFilterForUser keeps the semantics clear: repo queries under the feature flag are constrained to repositories that are either permitted to one of the user’s accounts or public, with unauthenticated/no-account users seeing only public repos.

One nuance to be aware of: in repo.$allOperations, the new filter is merged via:

argsWithWhere.where = {
  ...(argsWithWhere.where || {}),
  ...getRepoPermissionFilterForUser(user),
};

If callers ever provide their own OR in argsWithWhere.where, this spread will overwrite it with the permission filter’s OR. Today that may be fine (or pre-existing), but if you expect more complex repo predicates in the future, it might be safer to wrap these in an explicit AND: [originalWhere, permissionFilter] instead.

Not a blocker, but worth keeping in mind as you expand repo query usage.

Also applies to: 35-52, 61-85
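The difference is easy to see with plain objects. In this sketch the repo field names (name, isPublic, id) are hypothetical and no Prisma import is needed. Prisma's where inputs do accept an AND array, which is what mergeWithAnd relies on.

```typescript
// Plain-object illustration; field names (name, isPublic, id) are hypothetical.
type Where = Record<string, unknown>;

// Current approach: later keys win, so a caller's OR is silently replaced.
function mergeBySpread(original: Where | undefined, permission: Where): Where {
  return { ...(original ?? {}), ...permission };
}

// Safer approach: both predicates must hold, and neither can clobber the other.
function mergeWithAnd(original: Where | undefined, permission: Where): Where {
  return original ? { AND: [original, permission] } : permission;
}

const callerWhere: Where = { OR: [{ name: "web" }, { name: "backend" }] };
const permissionFilter: Where = { OR: [{ isPublic: true }, { id: { in: [1, 2] } }] };
```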

packages/web/src/app/[domain]/components/searchBar/searchBar.tsx (1)

95-104: Consider syncing regex/case flags from updated defaults like query

defaultIsRegexEnabled / defaultIsCaseSensitivityEnabled are only used as initial state; if the parent updates defaults (e.g. when navigating back/forward and rebuilding search params), the toggles won’t follow, while query is explicitly resynced via useEffect. If the intent is for the UI to always reflect the URL/props, consider adding similar effects for the flags:

useEffect(() => {
    setIsRegexEnabled(defaultIsRegexEnabled);
}, [defaultIsRegexEnabled]);

useEffect(() => {
    setIsCaseSensitivityEnabled(defaultIsCaseSensitivityEnabled);
}, [defaultIsCaseSensitivityEnabled]);

Or confirm that flags are intentionally “sticky” across navigation and don’t need to follow defaults.

Also applies to: 113-115, 119-133

packages/web/src/app/[domain]/search/useStreamedSearch.ts (2)

9-35: Cache key & entry shape are tightly coupled to current SearchRequest and omit stats

Two small things to consider:

  • createCacheKey manually lists a subset of SearchRequest fields. If SearchRequest later grows with result-affecting fields (e.g. repo scope, domain, filters), the cache will start cross-contaminating queries unless this function is updated in lockstep. You might want to either:

    • Base the key on the entire SearchRequest object (e.g. JSON.stringify(params)), or
    • Add a unit test that fails when SearchRequest changes without updating createCacheKey.
  • CacheEntry doesn’t store stats, and the cached setState path also omits stats, so repeated identical queries served from cache will never have stats populated even if the original streamed run did. If the UI relies on stats, consider adding it to CacheEntry and wiring it through in both the cache write and read paths.

Also applies to: 91-106, 233-245
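One way to address both points is sketched below with hypothetical names. The key is derived from the whole request via a key-sorted serialization, and stats are kept in the cache entry next to files. The TTL value and CacheEntry shape are assumptions, not the hook's actual values.

```typescript
// Hypothetical cache helpers for a streamed-search hook.

// Serializes with object keys sorted (and undefined values dropped), so two
// requests with the same fields in a different order share a key.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    const body = entries.map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`);
    return `{${body.join(",")}}`;
  }
  // Primitives; JSON.stringify returns undefined for undefined/functions.
  return JSON.stringify(value) ?? "null";
}

// Keys on every field of the request, so new result-affecting fields are covered.
function createCacheKey(request: object): string {
  return stableStringify(request);
}

// Cache entries keep stats alongside files so cache hits can restore both.
interface CacheEntry<TFile, TStats> {
  files: TFile[];
  stats: TStats | undefined;
  timestamp: number;
}

const CACHE_TTL_MS = 5 * 60 * 1000; // hypothetical TTL

function isFresh(entry: { timestamp: number }, now: number = Date.now()): boolean {
  return now - entry.timestamp < CACHE_TTL_MS;
}
```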


152-231: SSE parsing assumes single-line data: events and only breaks inner loop on [DONE]

The SSE loop works for the current format, where each event is a single “data: <JSON>” line followed by a blank line, but it makes a few assumptions:

  • message.match(/^data: (.+)$/) has no m flag, so it only matches an event that consists of exactly one data: line. An event with several data: lines, or with other fields such as event: or id:, fails to match and is dropped. If the server ever emits such events, this parser will start losing information.
  • When encountering [DONE], you break only the inner for (const message of messages) loop; the outer while keeps reading until the stream ends on its own. If the server keeps the SSE connection open after sending [DONE], this hook would not finalize the search or write the results to the cache until the connection closes.

If you expect to rely on a [DONE] sentinel, consider:

  • Tracking a doneStreaming flag that breaks the outer loop when [DONE] is seen, and
  • Optionally making the parsing a bit more tolerant to additional data: lines if the backend ever evolves.
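One way to address both points is sketched below. The sketch assumes the server separates events with a blank line and uses "\n" line endings; the SSE specification also permits "\r\n" and "\r", which this sketch does not handle. parseSseBuffer and readSse are hypothetical names. The reader is typed structurally, so the sketch does not depend on DOM typings:

```typescript
// Structural type covering the parts of ReadableStreamDefaultReader used here.
interface ByteReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
  cancel(): Promise<void>;
}

// Splits a buffer into complete events. Multiple `data:` lines within one
// event are joined with "\n"; other fields (event:, id:, retry:) and comment
// lines are ignored in this sketch.
function parseSseBuffer(buffer: string): { events: string[]; rest: string } {
  const blocks = buffer.split("\n\n");
  // The last block may be incomplete; keep it for the next read.
  const rest = blocks.pop() ?? "";
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (!line.startsWith("data:")) continue;
      const value = line.slice("data:".length);
      // A single leading space after the colon is not part of the payload.
      dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
    }
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}

// Reads events until the stream ends or a [DONE] sentinel arrives. The
// `finished` flag also ends the outer loop, and the stream is cancelled so
// an open connection does not keep the caller waiting.
async function readSse(reader: ByteReader, onEvent: (data: string) => void): Promise<void> {
  const decoder = new TextDecoder();
  let buffer = "";
  let finished = false;
  while (!finished) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseBuffer(buffer);
    buffer = parsed.rest;
    for (const data of parsed.events) {
      if (data === "[DONE]") {
        finished = true;
        break;
      }
      onEvent(data);
    }
  }
  if (finished) await reader.cancel();
}
```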
packages/web/src/features/codeNav/api.ts (3)

23-45: Consider escaping symbolName in the IR regexp to avoid unintended regex behavior

Both reference and definition queries build a regexp around symbolName with \\b${symbolName}\\b, but symbolName is interpolated raw. If it can contain regex metacharacters (e.g., ., *, +, ?, []), this will change the meaning of the pattern and could either over‑match or fail to match the intended symbol.

If you want literal symbol lookup, consider escaping the name before interpolation, e.g.:

+const escapeRegex = (value: string) =>
+  value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

- regexp: `\\b${symbolName}\\b`,
+ regexp: `\\b${escapeRegex(symbolName)}\\b`,

If, instead, you intentionally rely on regex semantics here, it would be good to document that assumption and add tests for symbols containing regex‑special characters.

Also applies to: 74-88
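The following sketch makes the effect concrete. symbolPattern is a hypothetical helper that mirrors the pattern built in api.ts. The escaped character set is the same one Go's regexp.QuoteMeta escapes, so the output should also be valid RE2 syntax for Zoekt. The checks below use JavaScript's RegExp only to illustrate the difference:

```typescript
// Escapes regex metacharacters: . * + ? ^ $ { } ( ) | [ ] \
function escapeRegex(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: word-bounded pattern for a symbol name.
function symbolPattern(symbolName: string, escape = true): string {
  const name = escape ? escapeRegex(symbolName) : symbolName;
  return `\\b${name}\\b`;
}

// Unescaped, "." matches any character, so "a.b" also finds "axb".
console.log(new RegExp(symbolPattern("a.b", false)).test("const axb = 1")); // true
// Escaped, only the literal "a.b" matches.
console.log(new RegExp(symbolPattern("a.b")).test("const axb = 1")); // false
console.log(new RegExp(symbolPattern("a.b")).test("a.b()")); // true
```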


116-140: Simplified response parsing is good; consider map instead of flatMap on files

The new parseRelatedSymbolsSearchResponse that directly maps SearchResponseFindRelatedSymbolsResponse is clear, and the inner chunks.flatMap(...matchRanges...) with a final .filter(file => file.matches.length > 0) makes sense.

Minor nit: searchResult.files.flatMap((file) => { return { ... } }) always returns a single object per file, so flatMap behaves like map here and is a bit surprising to readers. Switching to map would better express intent without behavior change:

- files: searchResult.files.flatMap((file) => {
+ files: searchResult.files.map((file) => {
   // ...
   }).filter((file) => file.matches.length > 0),
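Here is a quick illustration of why the two are interchangeable in this case. Array.prototype.flatMap flattens only callback results that are arrays, and only one level deep. A callback that returns a single object therefore produces the same array as map. The file shape below is a made-up stand-in:

```typescript
// Made-up stand-in for the search result file shape.
const files = [
  { fileName: "a.ts", ranges: [1, 2] },
  { fileName: "b.ts", ranges: [] as number[] },
];

// The callback returns one object per file, so flatMap has nothing to flatten.
const viaFlatMap = files
  .flatMap((file) => ({ fileName: file.fileName, matches: file.ranges }))
  .filter((file) => file.matches.length > 0);

const viaMap = files
  .map((file) => ({ fileName: file.fileName, matches: file.ranges }))
  .filter((file) => file.matches.length > 0);

console.log(JSON.stringify(viaFlatMap) === JSON.stringify(viaMap)); // true
```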

143-181: Language IR filter expansion looks good; just confirm language values match Zoekt’s language names

The getExpandedLanguageFilter helper that builds an or over TypeScript/JavaScript/JSX/TSX (and otherwise a single language node) is a nice improvement over string‑based filters. Please just ensure the language strings here exactly match the language identifiers emitted by your indexer/Zoekt configuration so the filters don’t silently miss matches.
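For reference, here is a sketch of the expansion under assumed IR shapes. The `query` discriminator follows the { const: true, query: "const" } node mentioned for parser.ts, but the language and or node layouts below are hypothetical. The language list is taken from the comment above; whether those exact strings exist in the index is exactly what needs confirming:

```typescript
// Hypothetical IR node shapes; `query` names the active variant.
type LanguageNode = { query: "language"; language: { language: string } };
type OrNode = { query: "or"; or: { children: QueryIR[] } };
type QueryIR = LanguageNode | OrNode;

// Assumption: these must equal the language names recorded by the indexer.
// A string that never occurs in the index silently matches zero files.
const TS_JS_FAMILY = ["TypeScript", "JavaScript", "JSX", "TSX"];

function languageNode(language: string): LanguageNode {
  return { query: "language", language: { language } };
}

function getExpandedLanguageFilter(language: string): QueryIR {
  if (TS_JS_FAMILY.includes(language)) {
    // Any member of the family expands to an OR over the whole family.
    return { query: "or", or: { children: TS_JS_FAMILY.map(languageNode) } };
  }
  return languageNode(language);
}
```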

packages/web/src/features/search/parser.ts (1)

126-203: Unknown AST node types defaulting to const: true can silently broaden queries

In transformTreeToIR’s default branch, unknown node types log a warning and return { const: true, query: "const" }, effectively “match all”. That’s safe in the sense of not throwing, but if the grammar evolves and new nodes appear, this will silently broaden user queries instead of failing fast.

Consider failing closed by throwing (or converting to a parse error as per the previous comment), so unsupported syntax doesn’t produce unexpectedly wide result sets.
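Here is a sketch of failing closed, using a deliberately simplified node type. The real transformTreeToIR walks a Lezer syntax tree; SyntaxNodeLike, the substring IR shape, and the single Term case below are hypothetical:

```typescript
// Hypothetical, simplified node shape for illustration.
interface SyntaxNodeLike {
  type: string;
  text: string;
}

// Hypothetical subset of the IR.
type QueryIR = { query: "substring"; substring: { pattern: string } };

function transformNodeToIR(node: SyntaxNodeLike): QueryIR {
  switch (node.type) {
    case "Term":
      return { query: "substring", substring: { pattern: node.text } };
    // ...other supported node types...
    default:
      // Fail closed: an unrecognized node means the grammar and the
      // transformer have drifted apart, so reject the query instead of
      // silently turning it into match-all.
      throw new Error(`Unsupported query syntax: node type "${node.type}"`);
  }
}
```

If this error is surfaced as a parse error in the UI, the user learns the query was not understood. That is preferable to returning a much wider result set than they asked for.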

packages/web/src/app/[domain]/search/components/searchResultsPage.tsx (2)

80-86: Factor out error message construction to keep UI and toast in sync

Both the toast and inline error message compute the same ternary expression for ServiceErrorException vs generic Error. To keep things DRY and ensure they never diverge, you could precompute errorMessage once:

-    useEffect(() => {
-        if (error) {
-            toast({
-                description: `❌ Search failed. Reason: ${error instanceof ServiceErrorException ? error.serviceError.message : error.message}`,
-            });
-        }
-    }, [error, toast]);
+    const errorMessage =
+        error instanceof ServiceErrorException
+            ? error.serviceError.message
+            : error?.message;
+
+    useEffect(() => {
+        if (errorMessage) {
+            toast({
+                description: `❌ Search failed. Reason: ${errorMessage}`,
+            });
+        }
+    }, [errorMessage, toast]);
@@
-                    <p className="text-sm text-center">{error instanceof ServiceErrorException ? error.serviceError.message : error.message}</p>
+                    <p className="text-sm text-center">{errorMessage}</p>

This keeps the error handling consistent and slightly simplifies the JSX.

Also applies to: 184-189
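If the same message is needed in more places, the ternary could also live in a small pure helper that is easy to unit-test. Below is a sketch with a minimal stand-in for ServiceErrorException. Its shape, a serviceError object with a message, is assumed from the snippet above, and getSearchErrorMessage is a hypothetical name:

```typescript
// Minimal stand-in; the real ServiceErrorException lives in the web package.
class ServiceErrorException extends Error {
  readonly serviceError: { message: string };

  constructor(serviceError: { message: string }) {
    super(serviceError.message);
    this.serviceError = serviceError;
    // Keeps instanceof working when compiling to ES5 targets.
    Object.setPrototypeOf(this, ServiceErrorException.prototype);
  }
}

// Returns the user-facing message for a search error, or undefined when
// there is no error, so the toast and the inline text share one source.
function getSearchErrorMessage(error: Error | null | undefined): string | undefined {
  if (!error) return undefined;
  return error instanceof ServiceErrorException ? error.serviceError.message : error.message;
}
```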


105-114: Consider gating console.debug timing logs to non‑production environments

The console.debug calls for timeToFirstSearchResultMs and timeToSearchCompletionMs are useful during development, but may add noise in production consoles.

If you don’t rely on them in prod, consider wrapping them in an environment check or removing them once metrics are fully wired into analytics:

-        console.debug('timeToFirstSearchResultMs:', timeToFirstSearchResultMs);
-        console.debug('timeToSearchCompletionMs:', timeToSearchCompletionMs);
+        if (process.env.NODE_ENV !== 'production') {
+            console.debug('timeToFirstSearchResultMs:', timeToFirstSearchResultMs);
+            console.debug('timeToSearchCompletionMs:', timeToSearchCompletionMs);
+        }
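Alternatively, the check can be wrapped once in a helper, so every timing log follows the same rule. Here is a sketch with createDebugLogger as a hypothetical name. The production flag is passed in explicitly, which keeps the helper easy to test:

```typescript
// Returns a console.debug wrapper that does nothing in production.
function createDebugLogger(isProduction: boolean): (...args: unknown[]) => void {
  return (...args: unknown[]) => {
    if (!isProduction) {
      console.debug(...args);
    }
  };
}

// In the component, something like:
// const debugLog = createDebugLogger(process.env.NODE_ENV === "production");
// debugLog("timeToFirstSearchResultMs:", timeToFirstSearchResultMs);
```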
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09507d3 and 2f6fc04.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (107)
  • .env.development (0 hunks)
  • CHANGELOG.md (1 hunks)
  • Dockerfile (5 hunks)
  • package.json (2 hunks)
  • packages/backend/src/index.ts (0 hunks)
  • packages/db/src/index.ts (1 hunks)
  • packages/mcp/CHANGELOG.md (1 hunks)
  • packages/mcp/src/index.ts (1 hunks)
  • packages/mcp/src/schemas.ts (2 hunks)
  • packages/queryLanguage/.gitignore (1 hunks)
  • packages/queryLanguage/package.json (1 hunks)
  • packages/queryLanguage/src/index.ts (1 hunks)
  • packages/queryLanguage/src/parser.terms.ts (1 hunks)
  • packages/queryLanguage/src/parser.ts (1 hunks)
  • packages/queryLanguage/src/query.grammar (1 hunks)
  • packages/queryLanguage/src/tokens.ts (1 hunks)
  • packages/queryLanguage/test.ts (1 hunks)
  • packages/queryLanguage/test/basic.txt (1 hunks)
  • packages/queryLanguage/test/grammar.test.ts (1 hunks)
  • packages/queryLanguage/test/grouping.txt (1 hunks)
  • packages/queryLanguage/test/negation.txt (1 hunks)
  • packages/queryLanguage/test/operators.txt (1 hunks)
  • packages/queryLanguage/test/precedence.txt (1 hunks)
  • packages/queryLanguage/test/prefixes.txt (1 hunks)
  • packages/queryLanguage/test/quoted.txt (1 hunks)
  • packages/queryLanguage/tsconfig.json (1 hunks)
  • packages/queryLanguage/vitest.config.ts (1 hunks)
  • packages/web/.eslintignore (1 hunks)
  • packages/web/package.json (3 hunks)
  • packages/web/src/actions.ts (1 hunks)
  • packages/web/src/app/[domain]/browse/layout.tsx (1 hunks)
  • packages/web/src/app/[domain]/components/lightweightCodeHighlighter.tsx (1 hunks)
  • packages/web/src/app/[domain]/components/pathHeader.tsx (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/constants.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/searchBar.tsx (5 hunks)
  • packages/web/src/app/[domain]/components/searchBar/searchSuggestionsBox.tsx (4 hunks)
  • packages/web/src/app/[domain]/components/searchBar/useRefineModeSuggestions.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/useSuggestionModeMappings.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/useSuggestionsData.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/zoektLanguageExtension.ts (1 hunks)
  • packages/web/src/app/[domain]/search/components/codePreviewPanel/codePreview.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/filter.tsx (3 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (5 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/useFilterMatches.ts (1 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPage.tsx (9 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatch.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (2 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx (7 hunks)
  • packages/web/src/app/[domain]/search/page.tsx (2 hunks)
  • packages/web/src/app/[domain]/search/useStreamedSearch.ts (1 hunks)
  • packages/web/src/app/api/(client)/client.ts (1 hunks)
  • packages/web/src/app/api/(server)/search/route.ts (2 hunks)
  • packages/web/src/app/api/(server)/source/route.ts (1 hunks)
  • packages/web/src/app/api/(server)/stream_search/route.ts (1 hunks)
  • packages/web/src/ee/features/codeNav/components/exploreMenu/referenceList.tsx (1 hunks)
  • packages/web/src/ee/features/codeNav/components/symbolHoverPopup/symbolDefinitionPreview.tsx (1 hunks)
  • packages/web/src/ee/features/codeNav/components/symbolHoverPopup/useHoveredOverSymbolInfo.ts (1 hunks)
  • packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts (1 hunks)
  • packages/web/src/features/chat/tools.ts (5 hunks)
  • packages/web/src/features/codeNav/api.ts (4 hunks)
  • packages/web/src/features/codeNav/types.ts (1 hunks)
  • packages/web/src/features/search/README.md (1 hunks)
  • packages/web/src/features/search/fileSourceApi.ts (1 hunks)
  • packages/web/src/features/search/index.ts (1 hunks)
  • packages/web/src/features/search/ir.ts (1 hunks)
  • packages/web/src/features/search/parser.ts (1 hunks)
  • packages/web/src/features/search/schemas.ts (0 hunks)
  • packages/web/src/features/search/searchApi.ts (1 hunks)
  • packages/web/src/features/search/types.ts (1 hunks)
  • packages/web/src/features/search/zoektClient.ts (0 hunks)
  • packages/web/src/features/search/zoektSchema.ts (0 hunks)
  • packages/web/src/features/search/zoektSearcher.ts (1 hunks)
  • packages/web/src/lib/errorCodes.ts (1 hunks)
  • packages/web/src/lib/extensions/searchResultHighlightExtension.ts (1 hunks)
  • packages/web/src/lib/posthogEvents.ts (2 hunks)
  • packages/web/src/lib/types.ts (1 hunks)
  • packages/web/src/lib/utils.ts (1 hunks)
  • packages/web/src/prisma.ts (4 hunks)
  • packages/web/src/proto/google/protobuf/Duration.ts (1 hunks)
  • packages/web/src/proto/google/protobuf/Timestamp.ts (1 hunks)
  • packages/web/src/proto/query.ts (1 hunks)
  • packages/web/src/proto/webserver.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/And.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Boost.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Branch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/BranchRepos.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/BranchesRepos.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ChunkMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/FileMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/FileNameSet.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/FlushReason.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/IndexMetadata.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Language.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/LineFragmentMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/LineMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ListOptions.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ListRequest.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ListResponse.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Location.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/MinimalRepoListEntry.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Not.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Or.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Progress.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Q.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Range.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/RawConfig.ts (1 hunks)
⛔ Files not processed due to max files limit (21)
  • packages/web/src/proto/zoekt/webserver/v1/Regexp.ts
  • packages/web/src/proto/zoekt/webserver/v1/Repo.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoIds.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoListEntry.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoRegexp.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoSet.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoStats.ts
  • packages/web/src/proto/zoekt/webserver/v1/Repository.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepositoryBranch.ts
  • packages/web/src/proto/zoekt/webserver/v1/SearchOptions.ts
  • packages/web/src/proto/zoekt/webserver/v1/SearchRequest.ts
  • packages/web/src/proto/zoekt/webserver/v1/SearchResponse.ts
  • packages/web/src/proto/zoekt/webserver/v1/Stats.ts
  • packages/web/src/proto/zoekt/webserver/v1/StreamSearchRequest.ts
  • packages/web/src/proto/zoekt/webserver/v1/StreamSearchResponse.ts
  • packages/web/src/proto/zoekt/webserver/v1/Substring.ts
  • packages/web/src/proto/zoekt/webserver/v1/Symbol.ts
  • packages/web/src/proto/zoekt/webserver/v1/SymbolInfo.ts
  • packages/web/src/proto/zoekt/webserver/v1/Type.ts
  • packages/web/src/proto/zoekt/webserver/v1/WebserverService.ts
  • packages/web/src/withAuthV2.ts
💤 Files with no reviewable changes (5)
  • packages/backend/src/index.ts
  • .env.development
  • packages/web/src/features/search/zoektClient.ts
  • packages/web/src/features/search/schemas.ts
  • packages/web/src/features/search/zoektSchema.ts
🧰 Additional context used
🪛 Biome (2.1.2)
packages/web/src/features/search/ir.ts

[error] 4-4: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

🪛 LanguageTool
packages/queryLanguage/test/negation.txt

[grammar] ~193-~193: Use a hyphen to join words.
Context: ...PrefixExpr(FileExpr))))) # Negate short form prefix -f:test.js ==> Program(Ne...

(QB_NEW_EN_HYPHEN)


[grammar] ~209-~209: Use a hyphen to join words.
Context: ...r(PrefixExpr(RepoExpr))) # Negate short form content -c:console ==> Program(N...

(QB_NEW_EN_HYPHEN)

packages/queryLanguage/test/basic.txt

[typographical] ~10-~10: This greeting should probably end with a comma.
Context: ... ==> Program(Term) # Multiple terms hello world ==> Program(AndExpr(Term,Term)) # Mu...

(EN_GREETING_WITHOUT_COMMA)

packages/queryLanguage/test/quoted.txt

[grammar] ~465-~465: Use a hyphen to join words.
Context: ...m,PrefixExpr(FileExpr))) # Quoted short form prefix "f:test" ==> Program(Term...

(QB_NEW_EN_HYPHEN)

packages/queryLanguage/test/operators.txt

[grammar] ~97-~97: Use a hyphen to join words.
Context: ...ixExpr(FileExpr),Term)) # OR with short form prefixes f:test.js or r:myrepo ==...

(QB_NEW_EN_HYPHEN)
