Skip to content

feat: Built-in s3 key sanitizer (when creating files) #337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

AdrianCurtin
Copy link

@AdrianCurtin AdrianCurtin commented Aug 13, 2025

Adds an automatic cleaning of S3 object keys to ensure they are safe and compliant with S3 guidelines. With optional addition of a cleanKey flag, which, when enabled, sanitizes filenames (or generated keys) before storing files in S3.

(Requires parse-community/parse-server#9557 due to requirement that the revised s3 filepath is returned and saved)

Key changes:

S3 Key Cleaning Feature:

  • Added a cleanS3Key function that sanitizes S3 object keys by replacing or removing problematic characters, whitespace, and non-printable characters, ensuring compliance with S3 key guidelines.
  • Updated the S3Adapter class to accept a cleanKey option and apply the cleaning logic to filenames or generated keys before uploading files to S3. The cleaning is applied after any custom key generation. [1] [2]

Testing Enhancements:

  • Added new tests to verify that S3 keys are properly cleaned when the cleanKey option is enabled, including cases for both direct filenames and generated keys. Also added a test to ensure that keys are not cleaned when the option is disabled.

Summary by CodeRabbit

  • New Features
    • Added an optional “clean key” setting for S3 uploads to automatically sanitize object names. This removes or replaces problematic characters, collapses whitespace to single hyphens, trims leading/trailing hyphens or periods, and ensures a non-empty fallback. The cleaning also applies to keys produced by custom key generators. Disabled by default; enable via configuration.

optionally enabled, should have no effect on valid filenames
Copy link

I will reformat the title to use the proper commit message syntax.

@parse-github-assistant parse-github-assistant bot changed the title feat: built-in s3 key sanitizer (when creating files) feat: Built-in s3 key sanitizer (when creating files) Aug 13, 2025
Copy link

🚀 Thanks for opening this pull request!

Copy link

coderabbitai bot commented Aug 13, 2025

📝 Walkthrough

Walkthrough

Added an S3 key sanitization helper and a new cleanKey option to S3Adapter. createFile now derives a finalKey via optional generateKey, optionally sanitizes it, and uses it for PutObject. Tests verify cleaning behavior with/without generateKey and when cleanKey is disabled.

Changes

Cohort / File(s) Summary
S3Adapter Core
index.js
Introduced cleanS3Key helper; added constructor option cleanKey; updated createFile to compute finalKey (generateKey → optional clean) and use it in PutObject Key; retained other S3 params unchanged.
Tests
spec/test.spec.js
Added tests covering key cleaning: basic filename cleaning, cleaning after generateKey, and no cleaning when disabled; assert PutObject Key and call sequence.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant S3Adapter
  participant S3Client as AWS S3Client

  Caller->>S3Adapter: createFile(filename, data, contentType, options)
  alt generateKey provided
    S3Adapter->>S3Adapter: finalKey = generateKey(...)
  else
    S3Adapter->>S3Adapter: finalKey = filename
  end
  opt cleanKey enabled
    S3Adapter->>S3Adapter: finalKey = cleanS3Key(finalKey)
  end
  S3Adapter->>S3Client: PutObject(Key = bucketPrefix + finalKey, Body, ContentType, ACL, ...)
  S3Client-->>S3Adapter: PutObjectResult
  S3Adapter-->>Caller: result
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@parseplatformorg
Copy link
Contributor

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
index.js (1)

209-256: Confirm downstream usage stores/returns the sanitized key

Cleaning occurs only at write time; delete/get still take the caller-provided filename. Ensure the caller (Parse Server) persists and passes the sanitized finalKey thereafter, or you’ll get 404s on read/delete. You already noted the dependency on parse-server#9557; please document the minimum Parse Server version in the README and consider feature-flagging behavior based on an adapter capability handshake if needed.

spec/test.spec.js (2)

871-892: Good coverage for filename cleaning with cleanKey=true

This verifies the PutObject Key is sanitized as expected. Consider also adding a case with a bucketPrefix to ensure the Key is prefix + cleaned name.

Apply this additional test:

it('should include bucketPrefix when cleaning is enabled', async () => {
  const options = { bucket: 'bucket-1', bucketPrefix: 'pref/', cleanKey: true };
  const s3 = new S3Adapter(options);
  s3ClientMock.send.and.returnValue(Promise.resolve({}));
  s3._s3Client = s3ClientMock;

  const problematicFilename = ' bad name?.txt ';
  await s3.createFile(problematicFilename, 'hello world', 'text/utf8', {});

  const commandArg = s3ClientMock.send.calls.all()[1].args[0];
  expect(commandArg).toBeInstanceOf(PutObjectCommand);
  expect(commandArg.input.Key.startsWith('pref/')).toBeTrue();
});

894-915: Cleaning applies to generated keys — add async generateKey variance

Great to assert cleaning runs after generateKey. Add a variant with async generateKey to lock in behavior if we adopt awaiting generateKey in index.js.

it('should await async generateKey and then clean', async () => {
  const options = {
    bucket: 'bucket-1',
    cleanKey: true,
    generateKey: async (filename) => `generated/${filename.normalize('NFKD')}`
  };
  const s3 = new S3Adapter(options);
  s3ClientMock.send.and.returnValue(Promise.resolve({}));
  s3._s3Client = s3ClientMock;

  await s3.createFile('café.txt', 'hello world', 'text/utf8', {});
  const cmd = s3ClientMock.send.calls.all()[1].args[0];
  expect(cmd).toBeInstanceOf(PutObjectCommand);
  expect(cmd.input.Key).toMatch(/^generated\//);
});
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e634835 and 035b407.

📒 Files selected for processing (2)
  • index.js (3 hunks)
  • spec/test.spec.js (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
spec/test.spec.js (2)
spec/mocks/s3adapter.js (2)
  • s3 (11-15)
  • S3Adapter (2-2)
spec/support/server.js (1)
  • S3Adapter (8-8)
🔇 Additional comments (2)
index.js (2)

142-142: Constructor wiring for cleanKey looks good

Storing the option as a boolean flag is clear and keeps the feature opt-in.


221-224: Guard against undefined bucketPrefix when building the Key

If bucketPrefix is not provided and optionsFromArguments doesn’t default it, you’ll end up with "undefined" on the wire. Safe-guard by falling back to ''.

-      Key: this._bucketPrefix + finalKey,
+      Key: (this._bucketPrefix || '') + finalKey,

Comment on lines +50 to +112
function cleanS3Key(key) {
// Single-pass character replacement using a lookup map for better performance
// See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html#object-key-guidelines

const charMap = {
'\\': '-',
'{': '-', '}': '-',
'^': '-', '`': '-', '[': '-', ']': '-', '"': '-', '<': '-', '>': '-',
'~': '-', '#': '-', '|': '-', '%': '-',
'&': '-and-',
'$': '-dollar-',
'@': '-at-',
'=': '-equals-',
';': '-', ':': '-', '+': '-plus-', ',': '-', '?': '-'
};

let result = '';
let lastWasHyphen = false;

for (let i = 0; i < key.length; i++) {
const char = key[i];
const charCode = char.charCodeAt(0);

// Skip non-printable ASCII and extended ASCII
if ((charCode >= 0 && charCode <= 31) || charCode === 127 || charCode >= 128) {
continue;
}

// Handle whitespace - convert to single hyphen
if (/\s/.test(char)) {
if (!lastWasHyphen) {
result += '-';
lastWasHyphen = true;
}
continue;
}

// Replace problematic characters
if (charMap[char]) {
const replacement = charMap[char];
if (replacement === '-') {
if (!lastWasHyphen) {
result += '-';
lastWasHyphen = true;
}
} else {
result += replacement;
lastWasHyphen = false;
}
continue;
}

// Keep safe characters
result += char;
lastWasHyphen = false;
}

// Remove leading/trailing hyphens and periods
result = result.replace(/^[-\.]+|[-\.]+$/g, '');

// Ensure we don't end up with an empty string
return result || `file-${Date.now()}-${Math.random().toString(36).substring(2, 11)}`;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

cleanS3Key: solid baseline, but consider preserving Unicode and tightening post-processing

The implementation is efficient and readable. A few improvements will make it safer and more broadly compatible with S3:

  • Dropping all non-ASCII (Line 74) is aggressive; S3 supports UTF‑8 object keys. Consider percent-encoding (or configurable transliteration) instead of dropping.
  • Expansions like "-and-" can introduce double hyphens between adjacent replacements. Collapse consecutive hyphens in a final pass.
  • Guard against the 1024-byte S3 key length limit.

Suggested diffs:

  1. Preserve non-ASCII via encoding instead of dropping:
-    // Skip non-printable ASCII and extended ASCII
-    if ((charCode >= 0 && charCode <= 31) || charCode === 127 || charCode >= 128) {
-      continue;
-    }
+    // Skip non-printable ASCII; preserve Unicode by percent-encoding
+    if ((charCode >= 0 && charCode <= 31) || charCode === 127) {
+      continue;
+    }
+    if (charCode >= 128) {
+      result += encodeURIComponent(char);
+      lastWasHyphen = false;
+      continue;
+    }
  1. Collapse multiple hyphens and enforce S3 key length (1024 bytes) after trimming:
   // Remove leading/trailing hyphens and periods
   result = result.replace(/^[-\.]+|[-\.]+$/g, '');
 
+  // Collapse multiple hyphens
+  result = result.replace(/-{2,}/g, '-');
+
+  // Enforce S3 key length (bytes)
+  if (Buffer.byteLength(result, 'utf8') > 1024) {
+    // Truncate conservatively at the byte boundary
+    let trimmed = '';
+    for (let i = 0, bytes = 0; i < result.length; i++) {
+      const b = Buffer.byteLength(result[i], 'utf8');
+      if (bytes + b > 1024) break;
+      trimmed += result[i];
+      bytes += b;
+    }
+    result = trimmed;
+  }
+
   // Ensure we don't end up with an empty string
   return result || `file-${Date.now()}-${Math.random().toString(36).substring(2, 11)}`;
  1. Minor nit: avoid re-creating the charMap per invocation by hoisting it to module scope.

Example (outside the function):

const CLEAN_S3_KEY_CHARMAP = {
  '\\': '-',
  '{': '-', '}': '-',
  '^': '-', '`': '-', '[': '-', ']': '-', '"': '-', '<': '-', '>': '-',
  '~': '-', '#': '-', '|': '-', '%': '-',
  '&': '-and-',
  '$': '-dollar-',
  '@': '-at-',
  '=': '-equals-',
  ';': '-', ':': '-', '+': '-plus-', ',': '-', '?': '-'
};

and inside the function:

-  const charMap = {
-    ...
-  };
+  const charMap = CLEAN_S3_KEY_CHARMAP;
🤖 Prompt for AI Agents
In index.js around lines 50 to 112, the cleanS3Key function currently drops
non-ASCII characters, can produce consecutive hyphens from multi-character
replacements, recreates the charMap on every call, and doesn't enforce S3's
1024-byte key limit; fix by hoisting the charMap to module scope as a constant,
change the non-ASCII handling to percent-encode (or URL-encode) UTF-8 bytes
instead of skipping, after building the result collapse multiple consecutive
hyphens into a single hyphen and trim leading/trailing hyphens/periods, then
enforce the 1024-byte limit by truncating the UTF-8 encoded bytes (not
characters) and re-trimming if truncation leaves trailing hyphens/periods, and
ensure lastWasHyphen remains consistent when inserting multi-character
replacements.

Comment on lines +209 to +220
let finalKey = filename;

// Apply generateKey if provided
if (this._generateKey instanceof Function) {
finalKey = this._generateKey(filename, contentType, options);
}

// Apply key cleaning if enabled (after generateKey)
if (this._cleanKey) {
finalKey = cleanS3Key(finalKey);
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Support async generateKey to avoid surprising breakage

If a consumer provides an async generateKey, the current code will pass a Promise to cleanS3Key and to S3 as the Key. Since createFile is already async, await generateKey to support both sync and async implementations.

-    if (this._generateKey instanceof Function) {
-      finalKey = this._generateKey(filename, contentType, options);
-    }
+    if (this._generateKey instanceof Function) {
+      finalKey = await this._generateKey(filename, contentType, options);
+    }

🏁 Script executed:

#!/bin/bash
# Locate createFile definition and check if it's async
rg -n "createFile" -C3 index.js | sed -n '1,30p'

# Specifically search for 'async createFile' in index.js
rg -n "async createFile" -n index.js

# Find where cleanS3Key is defined
rg -R "cleanS3Key" -n .

Length of output: 879


Await async generateKey in createFile

Since createFile is already async, we should await any user-provided generateKey to avoid passing a Promise into cleanS3Key or to S3.

Locations to update:

  • File: index.js
  • Method: async createFile(...)
  • Around lines 211–214

Apply this diff:

-    if (this._generateKey instanceof Function) {
-      finalKey = this._generateKey(filename, contentType, options);
-    }
+    if (this._generateKey instanceof Function) {
+      finalKey = await this._generateKey(filename, contentType, options);
+    }

This ensures both sync and async generateKey implementations are handled correctly.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let finalKey = filename;
// Apply generateKey if provided
if (this._generateKey instanceof Function) {
finalKey = this._generateKey(filename, contentType, options);
}
// Apply key cleaning if enabled (after generateKey)
if (this._cleanKey) {
finalKey = cleanS3Key(finalKey);
}
let finalKey = filename;
// Apply generateKey if provided
if (this._generateKey instanceof Function) {
finalKey = await this._generateKey(filename, contentType, options);
}
// Apply key cleaning if enabled (after generateKey)
if (this._cleanKey) {
finalKey = cleanS3Key(finalKey);
}
🤖 Prompt for AI Agents
In index.js around lines 209 to 220, the code calls this._generateKey
synchronously and may pass a Promise into cleanS3Key or S3; change the block
that applies generateKey so that if this._generateKey is a function you await
its result (e.g. finalKey = await this._generateKey(filename, contentType,
options)), then continue to apply cleanS3Key and use finalKey; ensure createFile
remains async and handle the awaited value as the string key.

Comment on lines +917 to +936
it('should not clean keys when cleanKey option is disabled', async () => {
const options = {
bucket: 'bucket-1',
cleanKey: false // explicitly disabled
};
const s3 = new S3Adapter(options);
s3ClientMock.send.and.returnValue(Promise.resolve({}));
s3._s3Client = s3ClientMock;

const problematicFilename = 'test file{with}[bad]chars.txt';
await s3.createFile(problematicFilename, 'hello world', 'text/utf8', {});

expect(s3ClientMock.send).toHaveBeenCalledTimes(2);
const commands = s3ClientMock.send.calls.all();
const commandArg = commands[1].args[0];
expect(commandArg).toBeInstanceOf(PutObjectCommand);

// Should preserve original filename
expect(commandArg.input.Key).toBe(problematicFilename);
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Non-cleaning path validated; please add Unicode and empty-result edge cases

Two additional edge-cases worth locking down:

  • Unicode characters (e.g., 'café/Δ.txt') don’t get dropped unexpectedly (or do, if that’s desired), and behavior is documented.
  • An all-removed input results in the fallback name.

I can contribute tests for these scenarios if helpful.

🤖 Prompt for AI Agents
In spec/test.spec.js around lines 917 to 936 add two small tests extending the
existing "cleanKey disabled" spec: (1) add a test that uses a filename
containing Unicode characters (e.g., 'café/Δ.txt') and asserts that the
PutObjectCommand Key equals the original Unicode string when cleanKey is false;
(2) add a test for the all-removed-input edge case where the original filename
would be sanitized to an empty string (simulate by using a filename composed
only of characters your cleaner would remove) and assert that the adapter falls
back to the configured fallback name (or documented default) as the
PutObjectCommand Key. Ensure both tests set cleanKey: false for the Unicode case
and the relevant cleanKey behavior for the fallback case, reuse s3ClientMock
setup and assertions style (call counts and instance checks) consistent with
neighboring tests.

@AdrianCurtin
Copy link
Author

Actually this can't be done without #242

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants