Andrey - Get All Files #250

Merged
KashGiannis34 merged 9 commits into main from andrey/247-get-all-files-from-cloud
Apr 12, 2026

Conversation

@nyccreator
Member

Adds an endpoint that retrieves the files in every S3 and Azure bucket registered for a config in the database. Includes E2E tests for both file-service and api-gateway.

Demo Video:
https://github.com/user-attachments/assets/d5cbcb8d-a2f9-4737-8d9c-c83b43248e2c


greptile-apps Bot commented Mar 31, 2026

Greptile Summary

This PR adds a GET /file/all/:configId endpoint that retrieves all files across every S3 and Azure bucket associated with a given config, fanning out to provider APIs in parallel and grouping results by bucket name. The implementation spans the proto definition, file-service gRPC handler, and API gateway, and is accompanied by E2E tests for both layers.

Issues found:

  • Silent error swallowing (P1): The inner catch { return null; } block in getAllFiles (file-service) discards every per-bucket error without logging or surfacing it. Callers receive a 200 with the failed bucket simply absent — indistinguishable from "bucket is empty." At minimum errors should be logged; ideally an error field should be included in the response.
  • Hardcoded S3 region (P2): listFiles hard-codes 'us-east-005', continuing a pre-existing pattern in the S3 handler. This silently fails for any non-Backblaze S3 bucket, and because errors are swallowed, failures would be invisible to callers.
  • Fragile test assertion (P2): The "nonexistent config" test asserts response.files is undefined rather than [], relying on proto-js serialization behaviour rather than the feature's semantic contract.
  • Test leaks a DB entry (P2): The happy-path getAllFiles test calls registerBucket (which writes to both S3 and DB) but only deletes the S3 resource on cleanup, leaving a stale DB row across test runs.
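
One way to address the P1 above is to fan out with Promise.allSettled and collect per-bucket failures instead of dropping them. This is a minimal sketch, not the PR's code: the bucket shape, list callback, and return type here are illustrative assumptions.

```typescript
// Hypothetical per-bucket fan-out that surfaces failures instead of
// silently returning null for a failed bucket.
type BucketFiles = { bucketName: string; files: string[] };
type BucketError = { bucketName: string; error: string };

async function getAllFiles(
  buckets: { name: string; list: () => Promise<string[]> }[],
): Promise<{ files: BucketFiles[]; errors: BucketError[] }> {
  // allSettled never short-circuits, so one failing bucket cannot
  // hide the results of the others.
  const settled = await Promise.allSettled(
    buckets.map(async (b) => ({ bucketName: b.name, files: await b.list() })),
  );
  const files: BucketFiles[] = [];
  const errors: BucketError[] = [];
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') files.push(result.value);
    // Record (and log) the failure so a partial response is
    // distinguishable from an empty bucket.
    else errors.push({ bucketName: buckets[i].name, error: String(result.reason) });
  });
  return { files, errors };
}
```

With this shape, the gateway can still return 200 for partial success while the `errors` field tells the caller which buckets failed.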

Checklist areas needing attention (score: 72/100):

  • Error handling: getAllFiles does not surface per-bucket failures to the caller — errors are swallowed silently.
  • Operations optimized / handles edge cases: No indication to the user when some (or all) buckets fail to list files.
  • Schemas/tests: Test cleanup is incomplete (DB entry leak), and one assertion is semantically incorrect.

Confidence Score: 4/5

Safe to merge after addressing the silent error-swallowing in the service layer; the rest of the surface area is clean.

One P1 issue exists: per-bucket errors are silently swallowed, causing getAllFiles to return HTTP 200 with missing data and no signal to the caller when buckets fail. This is a correctness issue on the changed path that should be resolved before merging. The remaining findings are P2 (hardcoded region, fragile test assertion, DB cleanup gap) and do not block merge on their own.

packages/file-service/src/modules/file_bucket/file_bucket.service.ts — the inner catch block at line 182 needs to at minimum log errors before discarding them.

Important Files Changed

Filename Overview
packages/file-service/src/modules/file_bucket/file_bucket.service.ts Adds getAllFiles: fetches buckets from DB then fans out to S3/Azure handlers in parallel; inner catch silently swallows per-bucket errors, returning a partial/empty 200 with no indication of failure — P1 issue.
packages/file-service/src/modules/file_bucket/s3_handler.ts Adds listFiles with proper pagination via ContinuationToken; region is hardcoded to 'us-east-005' matching existing methods but limits use to Backblaze B2.
packages/file-service/src/modules/file_bucket/azure_handler.ts Adds listFiles using listBlobsFlat with async iteration; implementation is clean and consistent with the existing Azure handler pattern.
packages/api-gateway/src/modules/file_bucket/file_bucket.controller.ts Adds GET /file/all/:configId with input validation, auth guard, and correct gRPC forwarding; implementation mirrors existing getBucketsByConfigIdAndEnv pattern.
packages/api-gateway/src/modules/file_bucket/file_bucket.module.ts Extends ApiKeyMiddleware to cover /file/all/* route; consistent with existing protection patterns.
packages/api-gateway/src/models/file_bucket.dto.ts Adds BucketWithFiles DTO using an index signature to produce {bucketName: files[]} objects; unusual shape (array of single-key maps) but functional; no Swagger ApiProperty decorators on the dynamic key.
packages/file-service/src/modules/file_bucket/file_bucket.controller.ts Delegates getAllFiles to the service; clean pass-through matching existing controller pattern.
packages/proto/definitions/file_bucket.proto Adds GetAllFilesRequest, Files, and GetAllFilesResponse messages, and the getAllFiles RPC to BucketFileService; proto definition looks correct.
packages/file-service/test/file_bucket.e2e-spec.ts Adds two E2E tests for getAllFiles; happy-path test leaks a DB entry after cleanup, and the nonexistent-config assertion relies on protobuf runtime behavior (undefined vs []) rather than the semantic contract.
packages/api-gateway/test/file_bucket.e2e-spec.ts Adds three gateway-level E2E tests covering success, invalid configId (400), and missing auth (401); coverage is appropriate.
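
The ContinuationToken pagination the s3_handler row describes can be sketched generically. This is a hedged illustration: `listPage` stands in for the real `client.send(new ListObjectsV2Command(...))` round-trip, and the `Page` shape is an assumption, not the SDK's response type.

```typescript
// One page of results, modeled on S3's truncated-listing pattern:
// a page carries keys plus an optional token for the next page.
type Page = { keys: string[]; nextToken?: string };

async function listAllKeys(
  listPage: (token?: string) => Promise<Page>,
): Promise<string[]> {
  const keys: string[] = [];
  let token: string | undefined;
  do {
    // First iteration passes undefined (no token), mirroring the
    // initial ListObjectsV2 request without a ContinuationToken.
    const page = await listPage(token);
    keys.push(...page.keys);
    token = page.nextToken;
  } while (token); // Stop once the provider returns no continuation token.
  return keys;
}
```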

Sequence Diagram

sequenceDiagram
    participant Client
    participant APIGateway as API Gateway<br/>/file/all/:configId
    participant FileService as File Service<br/>getAllFiles RPC
    participant DBService as DB Service<br/>getBucketsByConfigIdAndEnv
    participant S3 as S3 / Azure

    Client->>APIGateway: GET /file/all/:configId<br/>(Bearer token)
    APIGateway->>APIGateway: Validate configId (parseInt, ≥0)
    APIGateway->>FileService: gRPC getAllFiles(configId, configEnv)
    FileService->>DBService: getBucketsByConfigIdAndEnv(configId, configEnv)
    DBService-->>FileService: Buckets[]
    loop For each bucket (parallel Promise.all)
        FileService->>FileService: getProvider(bucket.fileProviderName)
        alt S3 provider
            FileService->>S3: ListObjectsV2Command (paginated)
            S3-->>FileService: file keys[]
        else Azure provider
            FileService->>S3: listBlobsFlat()
            S3-->>FileService: blob names[]
        else Error
            FileService-->>FileService: catch → return null (silent)
        end
    end
    FileService->>FileService: filter(result !== null)
    FileService-->>APIGateway: GetAllFilesResponse { files: Files[] }
    APIGateway-->>Client: 200 BucketWithFiles[]

Reviews (1): Last reviewed commit: "test: add e2e tests for getAllFiles"

Comment thread packages/file-service/src/modules/file_bucket/file_bucket.service.ts Outdated
Comment thread packages/file-service/src/modules/file_bucket/s3_handler.ts
Comment thread packages/file-service/test/file_bucket.e2e-spec.ts
Comment on lines +272 to +323
it('Successfully gets all files for a config', async () => {
  const metadata = {
    endpoint: baseURL,
    region: region,
    credentials: {
      accessKeyId: accessKeyId as string,
      secretAccessKey: secretAccessKey as string,
    },
  };

  const client = new S3Client(metadata);
  const command = new DeleteBucketCommand({
    Bucket: `get-all-files-bucket-${configId}-${configEnv}`,
  });

  try {
    await client.send(command);
  } catch {}

  await new Promise((resolve, reject) => {
    bucketClient.registerBucket(
      {
        name: 'get-all-files-bucket',
        configId,
        configEnv,
        fileProviderName: providerName,
      },
      (err: any) => {
        if (err) return reject(err);
        resolve(0);
      },
    );
  });

  const getAllFilesPromise = new Promise((resolve, reject) => {
    bucketClient.getAllFiles(
      {
        configId,
        configEnv,
      },
      (err: any, response: any) => {
        if (err) return reject(err);
        expect(err).toBeNull();
        expect(response).toBeDefined();
        expect(response.files).toBeDefined();
        resolve(response);
      },
    );
  });
  await getAllFilesPromise;

  await client.send(command);
Contributor


P2 Test leaks a DB entry after the getAllFiles happy-path test

registerBucket writes both to S3 and to the Juno DB. The cleanup at the end of the test calls client.send(command) which only deletes the S3 bucket; the corresponding DB row for get-all-files-bucket is never removed. The resetDb in beforeAll only runs once per test suite, so subsequent test runs (without a full reset) could see a stale row and get unexpected results.

Either call the service's removeBucket RPC after the assertions, or add a try/finally block so cleanup always runs even on test failure.
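
The try/finally suggestion could look like the following sketch. The helper and its parameters are hypothetical, not the repo's API; `register` and `cleanup` stand in for the test's registerBucket call and whatever teardown (removeBucket RPC plus DeleteBucketCommand) the fix settles on.

```typescript
// Hypothetical test helper: register a resource, run the test body,
// and always run cleanup, even when an assertion in the body throws.
async function withBucket<T>(
  register: () => Promise<void>,
  cleanup: () => Promise<void>,
  body: () => Promise<T>,
): Promise<T> {
  await register();
  try {
    return await body();
  } finally {
    // Runs on success and on failure, so neither the S3 bucket nor
    // the DB row is left behind for the next test run.
    await cleanup();
  }
}
```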

@llam36 linked an issue Apr 12, 2026 that may be closed by this pull request
Collaborator

@KashGiannis34 left a comment


LGTM!

@KashGiannis34 merged commit c2ef7ec into main Apr 12, 2026
8 checks passed


Development

Successfully merging this pull request may close these issues.

[Spring 2026] Get all files from remote file providers
