feat: Built-in s3 key sanitizer (when creating files) #337

Closed
83 changes: 78 additions & 5 deletions index.js
@@ -47,6 +47,70 @@ async function buildDirectAccessUrl(baseUrl, baseUrlFileKey, presignedUrl, confi
  return directAccessUrl;
}

function cleanS3Key(key) {
  // Single-pass character replacement using a lookup map for better performance
  // See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html#object-key-guidelines

  const charMap = {
    '\\': '-',
    '{': '-', '}': '-',
    '^': '-', '`': '-', '[': '-', ']': '-', '"': '-', '<': '-', '>': '-',
    '~': '-', '#': '-', '|': '-', '%': '-',
    '&': '-and-',
    '$': '-dollar-',
    '@': '-at-',
    '=': '-equals-',
    ';': '-', ':': '-', '+': '-plus-', ',': '-', '?': '-'
  };

  let result = '';
  let lastWasHyphen = false;

  for (let i = 0; i < key.length; i++) {
    const char = key[i];
    const charCode = char.charCodeAt(0);

    // Skip non-printable ASCII and extended ASCII
    if ((charCode >= 0 && charCode <= 31) || charCode === 127 || charCode >= 128) {
      continue;
    }

    // Handle whitespace - convert to single hyphen
    if (/\s/.test(char)) {
      if (!lastWasHyphen) {
        result += '-';
        lastWasHyphen = true;
      }
      continue;
    }

    // Replace problematic characters
    if (charMap[char]) {
      const replacement = charMap[char];
      if (replacement === '-') {
        if (!lastWasHyphen) {
          result += '-';
          lastWasHyphen = true;
        }
      } else {
        result += replacement;
        lastWasHyphen = false;
      }
      continue;
    }

    // Keep safe characters
    result += char;
    lastWasHyphen = false;
  }

  // Remove leading/trailing hyphens and periods
  result = result.replace(/^[-\.]+|[-\.]+$/g, '');

  // Ensure we don't end up with an empty string
  return result || `file-${Date.now()}-${Math.random().toString(36).substring(2, 11)}`;
}
Comment on lines +50 to +112

🛠️ Refactor suggestion

cleanS3Key: solid baseline, but consider preserving Unicode and tightening post-processing

The implementation is efficient and readable. A few improvements will make it safer and more broadly compatible with S3:

  • Dropping all non-ASCII (Line 74) is aggressive; S3 supports UTF‑8 object keys. Consider percent-encoding (or configurable transliteration) instead of dropping.
  • Expansions like "-and-" can introduce double hyphens between adjacent replacements. Collapse consecutive hyphens in a final pass.
  • Guard against the 1024-byte S3 key length limit.

Suggested diffs:

  1. Preserve non-ASCII via encoding instead of dropping:
-    // Skip non-printable ASCII and extended ASCII
-    if ((charCode >= 0 && charCode <= 31) || charCode === 127 || charCode >= 128) {
-      continue;
-    }
+    // Skip non-printable ASCII; preserve Unicode by percent-encoding
+    if ((charCode >= 0 && charCode <= 31) || charCode === 127) {
+      continue;
+    }
+    if (charCode >= 128) {
+      result += encodeURIComponent(char);
+      lastWasHyphen = false;
+      continue;
+    }
  2. Collapse multiple hyphens and enforce S3 key length (1024 bytes) after trimming:
   // Remove leading/trailing hyphens and periods
   result = result.replace(/^[-\.]+|[-\.]+$/g, '');
 
+  // Collapse multiple hyphens
+  result = result.replace(/-{2,}/g, '-');
+
+  // Enforce S3 key length (bytes)
+  if (Buffer.byteLength(result, 'utf8') > 1024) {
+    // Truncate conservatively at the byte boundary
+    let trimmed = '';
+    for (let i = 0, bytes = 0; i < result.length; i++) {
+      const b = Buffer.byteLength(result[i], 'utf8');
+      if (bytes + b > 1024) break;
+      trimmed += result[i];
+      bytes += b;
+    }
+    result = trimmed;
+  }
+
   // Ensure we don't end up with an empty string
   return result || `file-${Date.now()}-${Math.random().toString(36).substring(2, 11)}`;
  3. Minor nit: avoid re-creating the charMap per invocation by hoisting it to module scope.

Example (outside the function):

const CLEAN_S3_KEY_CHARMAP = {
  '\\': '-',
  '{': '-', '}': '-',
  '^': '-', '`': '-', '[': '-', ']': '-', '"': '-', '<': '-', '>': '-',
  '~': '-', '#': '-', '|': '-', '%': '-',
  '&': '-and-',
  '$': '-dollar-',
  '@': '-at-',
  '=': '-equals-',
  ';': '-', ':': '-', '+': '-plus-', ',': '-', '?': '-'
};

and inside the function:

-  const charMap = {
-    ...
-  };
+  const charMap = CLEAN_S3_KEY_CHARMAP;
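
Taken together, the first and third suggestions plus the hyphen collapse might look roughly like the sketch below. This is an illustration under assumptions, not the PR's code: `cleanS3KeySketch` is a hypothetical name, it walks the key by code point rather than by UTF-16 unit (calling `encodeURIComponent` on half of a surrogate pair throws a `URIError`), and it collapses hyphens in a final pass instead of tracking `lastWasHyphen`. One trade-off to weigh: percent-encoding reintroduces `%`, which the PR's own map treats as a problematic character.

```javascript
// Hoisted to module scope so the map is built once, not on every call.
const CLEAN_S3_KEY_CHARMAP = {
  '\\': '-', '{': '-', '}': '-', '^': '-', '`': '-', '[': '-', ']': '-',
  '"': '-', '<': '-', '>': '-', '~': '-', '#': '-', '|': '-', '%': '-',
  '&': '-and-', '$': '-dollar-', '@': '-at-', '=': '-equals-',
  ';': '-', ':': '-', '+': '-plus-', ',': '-', '?': '-',
};

// Hypothetical variant of cleanS3Key; the name and structure are assumptions.
function cleanS3KeySketch(key) {
  let result = '';

  // for...of iterates code points, so surrogate pairs (e.g. emoji) stay intact.
  for (const char of key) {
    const code = char.codePointAt(0);

    // Drop ASCII control characters.
    if (code <= 31 || code === 127) continue;

    if (code >= 128) {
      // A lone surrogate cannot be percent-encoded, so it is dropped.
      if (code >= 0xd800 && code <= 0xdfff) continue;
      // Assumption: keep non-ASCII text by percent-encoding its UTF-8 bytes.
      result += encodeURIComponent(char);
      continue;
    }

    // Whitespace becomes a hyphen; mapped characters use their replacement.
    if (/\s/.test(char)) {
      result += '-';
      continue;
    }
    result += CLEAN_S3_KEY_CHARMAP[char] ?? char;
  }

  // Collapse hyphen runs first, then trim leading/trailing hyphens and periods.
  result = result.replace(/-{2,}/g, '-').replace(/^[-.]+|[-.]+$/g, '');

  // Same fallback as the PR when nothing survives.
  return result || `file-${Date.now()}-${Math.random().toString(36).substring(2, 11)}`;
}

console.log(cleanS3KeySketch('test file{with}[bad]chars<>&%?.txt'));
console.log(cleanS3KeySketch('caf\u00e9/\u0394.txt'));
```

Because the collapse happens after all replacements, expansions such as `-and-` can no longer leave doubled hyphens behind, whichever characters surround them.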
🤖 Prompt for AI Agents
In index.js around lines 50 to 112, the cleanS3Key function currently drops
non-ASCII characters, can produce consecutive hyphens from multi-character
replacements, recreates the charMap on every call, and doesn't enforce S3's
1024-byte key limit; fix by hoisting the charMap to module scope as a constant,
change the non-ASCII handling to percent-encode (or URL-encode) UTF-8 bytes
instead of skipping, after building the result collapse multiple consecutive
hyphens into a single hyphen and trim leading/trailing hyphens/periods, then
enforce the 1024-byte limit by truncating the UTF-8 encoded bytes (not
characters) and re-trimming if truncation leaves trailing hyphens/periods, and
ensure lastWasHyphen remains consistent when inserting multi-character
replacements.
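
The byte-limit step described in the prompt above could be isolated in a small helper. The sketch below is one possible shape, with `truncateToUtf8Bytes` as a hypothetical name; it iterates by code point so a multi-byte character is never split, and re-trims trailing hyphens and periods afterwards. Since `createFile` prepends `this._bucketPrefix` to the key, a real implementation would presumably subtract the prefix's byte length from the budget.

```javascript
// Hypothetical helper: cut a string to at most maxBytes of UTF-8
// without splitting a character, then re-trim trailing '-' and '.'.
function truncateToUtf8Bytes(str, maxBytes = 1024) {
  let bytes = 0;
  let out = '';

  // for...of yields whole code points, not UTF-16 code units.
  for (const char of str) {
    const size = Buffer.byteLength(char, 'utf8');
    if (bytes + size > maxBytes) break;
    out += char;
    bytes += size;
  }

  // Truncation can expose a trailing hyphen or period, so trim again.
  return out.replace(/[-.]+$/, '');
}

console.log(truncateToUtf8Bytes('a'.repeat(2000)).length);
console.log(truncateToUtf8Bytes('abc-def', 4));
```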


function responseToBuffer(response) {
  return new Promise((resolve, reject) => {
    const chunks = [];
@@ -75,6 +139,7 @@ class S3Adapter {
    this._presignedUrlExpires = parseInt(options.presignedUrlExpires, 10);
    this._encryption = options.ServerSideEncryption;
    this._generateKey = options.generateKey;
    this._cleanKey = options.cleanKey;
    this._endpoint = options.s3overrides?.endpoint;
    // Optional FilesAdaptor method
    this.validateFilename = options.validateFilename;
@@ -141,15 +206,23 @@ class S3Adapter {
  // For a given config object, filename, and data, store a file in S3
  // Returns a promise containing the S3 object creation response
  async createFile(filename, data, contentType, options = {}) {
    let finalKey = filename;

    // Apply generateKey if provided
    if (this._generateKey instanceof Function) {
      finalKey = this._generateKey(filename, contentType, options);
    }

    // Apply key cleaning if enabled (after generateKey)
    if (this._cleanKey) {
      finalKey = cleanS3Key(finalKey);
    }

Comment on lines +209 to +220

💡 Verification agent

🧩 Analysis chain

Support async generateKey to avoid surprising breakage

If a consumer provides an async generateKey, the current code will pass a Promise to cleanS3Key and to S3 as the Key. Since createFile is already async, await generateKey to support both sync and async implementations.

-    if (this._generateKey instanceof Function) {
-      finalKey = this._generateKey(filename, contentType, options);
-    }
+    if (this._generateKey instanceof Function) {
+      finalKey = await this._generateKey(filename, contentType, options);
+    }

🏁 Script executed:

#!/bin/bash
# Locate createFile definition and check if it's async
rg -n "createFile" -C3 index.js | sed -n '1,30p'

# Specifically search for 'async createFile' in index.js
rg -n "async createFile" -n index.js

# Find where cleanS3Key is defined
rg -R "cleanS3Key" -n .

Length of output: 879


Await async generateKey in createFile

Since createFile is already async, we should await any user-provided generateKey to avoid passing a Promise into cleanS3Key or to S3.

Locations to update:

  • File: index.js
  • Method: async createFile(...)
  • Around lines 211–214

Apply this diff:

-    if (this._generateKey instanceof Function) {
-      finalKey = this._generateKey(filename, contentType, options);
-    }
+    if (this._generateKey instanceof Function) {
+      finalKey = await this._generateKey(filename, contentType, options);
+    }

This ensures both sync and async generateKey implementations are handled correctly.
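
To make the failure mode concrete, here is a small stand-alone sketch. `resolveKey` is a hypothetical stand-in for the key-resolution step of `createFile`, not the adapter itself, and the `awaitResult` flag exists only to compare the two behaviours side by side. With `cleanKey` enabled the failure would probably be quieter still: a Promise has no `length`, so the loop in `cleanS3Key` never runs and the random fallback name is returned.

```javascript
// Hypothetical stand-in for the key-resolution step of createFile.
async function resolveKey(filename, generateKey, { awaitResult }) {
  let finalKey = filename;

  if (generateKey instanceof Function) {
    const generated = generateKey(filename);
    // With awaitResult false this mirrors the PR as submitted.
    finalKey = awaitResult ? await generated : generated;
  }

  // Mirrors `this._bucketPrefix + finalKey` with an empty prefix.
  return '' + finalKey;
}

// An async generateKey, as a consumer might reasonably provide.
const asyncGenerateKey = async (filename) => `uploads/${filename}`;

async function main() {
  // Without await, the Promise is stringified into the key.
  console.log(await resolveKey('a.txt', asyncGenerateKey, { awaitResult: false }));
  // With await, async and sync generators both yield a plain string.
  console.log(await resolveKey('a.txt', asyncGenerateKey, { awaitResult: true }));
  console.log(await resolveKey('a.txt', (f) => `sync/${f}`, { awaitResult: true }));
}

main();
```

Awaiting a non-Promise value simply yields that value, which is why the one-word change stays compatible with existing synchronous `generateKey` functions.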

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     let finalKey = filename;
     // Apply generateKey if provided
     if (this._generateKey instanceof Function) {
-      finalKey = this._generateKey(filename, contentType, options);
+      finalKey = await this._generateKey(filename, contentType, options);
     }
     // Apply key cleaning if enabled (after generateKey)
     if (this._cleanKey) {
       finalKey = cleanS3Key(finalKey);
     }
🤖 Prompt for AI Agents
In index.js around lines 209 to 220, the code calls this._generateKey
synchronously and may pass a Promise into cleanS3Key or S3; change the block
that applies generateKey so that if this._generateKey is a function you await
its result (e.g. finalKey = await this._generateKey(filename, contentType,
options)), then continue to apply cleanS3Key and use finalKey; ensure createFile
remains async and handle the awaited value as the string key.

     const params = {
       Bucket: this._bucket,
-      Key: this._bucketPrefix + filename,
+      Key: this._bucketPrefix + finalKey,
       Body: data,
     };

-    if (this._generateKey instanceof Function) {
-      params.Key = this._bucketPrefix + this._generateKey(filename);
-    }
     if (this._fileAcl) {
       if (this._fileAcl === 'none') {
         delete params.ACL;
67 changes: 67 additions & 0 deletions spec/test.spec.js
@@ -867,6 +867,73 @@ describe('S3Adapter tests', () => {
      expect(commandArg).toBeInstanceOf(PutObjectCommand);
      expect(commandArg.input.ACL).toBeUndefined();
    });

    it('should clean S3 keys when cleanKey option is enabled', async () => {
      const options = {
        bucket: 'bucket-1',
        cleanKey: true
      };
      const s3 = new S3Adapter(options);
      s3ClientMock.send.and.returnValue(Promise.resolve({}));
      s3._s3Client = s3ClientMock;

      // Test filename with problematic characters
      const problematicFilename = 'test file{with}[bad]chars<>&%?.txt';
      await s3.createFile(problematicFilename, 'hello world', 'text/utf8', {});

      expect(s3ClientMock.send).toHaveBeenCalledTimes(2);
      const commands = s3ClientMock.send.calls.all();
      const commandArg = commands[1].args[0];
      expect(commandArg).toBeInstanceOf(PutObjectCommand);

      // Should have cleaned the filename. With cleanS3Key as written, '&' expands
      // to '-and-' and the hyphens adjacent to it are not collapsed.
      const expectedCleanKey = 'test-file-with-bad-chars--and--.txt';
      expect(commandArg.input.Key).toBe(expectedCleanKey);
    });

    it('should clean generated keys when cleanKey option is enabled', async () => {
      const options = {
        bucket: 'bucket-1',
        cleanKey: true,
        generateKey: (filename) => `generated/${filename.toUpperCase()}`
      };
      const s3 = new S3Adapter(options);
      s3ClientMock.send.and.returnValue(Promise.resolve({}));
      s3._s3Client = s3ClientMock;

      const problematicFilename = 'test{bad}.txt';
      await s3.createFile(problematicFilename, 'hello world', 'text/utf8', {});

      expect(s3ClientMock.send).toHaveBeenCalledTimes(2);
      const commands = s3ClientMock.send.calls.all();
      const commandArg = commands[1].args[0];
      expect(commandArg).toBeInstanceOf(PutObjectCommand);

      // Should have cleaned the generated key (not the original filename)
      const expectedCleanKey = 'generated/TEST-BAD-.TXT';
      expect(commandArg.input.Key).toBe(expectedCleanKey);
    });

    it('should not clean keys when cleanKey option is disabled', async () => {
      const options = {
        bucket: 'bucket-1',
        cleanKey: false // explicitly disabled
      };
      const s3 = new S3Adapter(options);
      s3ClientMock.send.and.returnValue(Promise.resolve({}));
      s3._s3Client = s3ClientMock;

      const problematicFilename = 'test file{with}[bad]chars.txt';
      await s3.createFile(problematicFilename, 'hello world', 'text/utf8', {});

      expect(s3ClientMock.send).toHaveBeenCalledTimes(2);
      const commands = s3ClientMock.send.calls.all();
      const commandArg = commands[1].args[0];
      expect(commandArg).toBeInstanceOf(PutObjectCommand);

      // Should preserve original filename
      expect(commandArg.input.Key).toBe(problematicFilename);
    });
Comment on lines +917 to +936

🛠️ Refactor suggestion

Non-cleaning path validated; please add Unicode and empty-result edge cases

Two additional edge-cases worth locking down:

  • Unicode characters (e.g., 'café/Δ.txt') don’t get dropped unexpectedly (or do, if that’s desired), and behavior is documented.
  • An all-removed input results in the fallback name.

I can contribute tests for these scenarios if helpful.

🤖 Prompt for AI Agents
In spec/test.spec.js around lines 917 to 936, add two small tests next to the
existing "cleanKey disabled" spec: (1) a test that uses a filename containing
Unicode characters (e.g., 'café/Δ.txt') with cleanKey: false and asserts that
the PutObjectCommand Key equals the original Unicode string; (2) a test for the
all-removed-input edge case, with cleanKey: true and a filename composed only
of characters the cleaner removes, asserting that the PutObjectCommand Key is
the generated fallback name (file-<timestamp>-<random suffix>). Reuse the
s3ClientMock setup and assertion style (call counts and instance checks) of the
neighboring tests.

  });

  describe('handleFileStream', () => {