Phase 6 and Phase 7: Song Library API, and Step Functions #4

theadityamittal wants to merge 3 commits into main from
Conversation
…sets
Add CRUD endpoints for the user's song library:
- GET /songs — list songs with optional ?status= filter
- GET /songs/{songId} — song detail with presigned S3 URLs for stems/lyrics (COMPLETED only)
- DELETE /songs/{songId} — delete song record + S3 objects (output + upload)
- GET /presets — 5 mixing presets (Balanced, Karaoke, Drum Practice, Bass Practice, Vocals Only)
Infrastructure:
- 4 new Lambda functions in template.yaml with appropriate IAM policies
- 4 new API Gateway routes in api.yaml with Cognito authorization
- MIXING_PRESETS constant in shared/constants.py
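For context, a sketch of the kind of data `MIXING_PRESETS` could hold — the five preset names come from this PR, but the per-stem gain structure below is purely an illustrative assumption, not the actual contents of `shared/constants.py`:

```python
# Hypothetical shape for MIXING_PRESETS in shared/constants.py.
# Preset names are from the PR; the per-stem gain values are illustrative only.
MIXING_PRESETS = [
    {"id": "balanced", "name": "Balanced",
     "gains": {"vocals": 1.0, "drums": 1.0, "bass": 1.0, "other": 1.0}},
    {"id": "karaoke", "name": "Karaoke",
     "gains": {"vocals": 0.0, "drums": 1.0, "bass": 1.0, "other": 1.0}},
    {"id": "drum_practice", "name": "Drum Practice",
     "gains": {"vocals": 1.0, "drums": 0.0, "bass": 1.0, "other": 1.0}},
    {"id": "bass_practice", "name": "Bass Practice",
     "gains": {"vocals": 1.0, "drums": 1.0, "bass": 0.0, "other": 1.0}},
    {"id": "vocals_only", "name": "Vocals Only",
     "gains": {"vocals": 1.0, "drums": 0.0, "bass": 0.0, "other": 0.0}},
]
```

A constant like this lets the `/presets` handler stay a trivial read-only function with no AWS dependencies.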
Tests (TDD):
- 16 new unit tests across 4 files (108 total)
- test_list_songs.py (5): empty list, multi-song, status filter, invalid status, user isolation
- test_get_song.py (5): completed with URLs, non-completed, 404, cross-user, missing params
- test_delete_song.py (4): happy path, S3 cleanup verification, 404, cross-user
- test_get_presets.py (2): preset list, structure validation
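As an illustration of the structure-validation style named above, a self-contained sketch — the real handler's import path and response helpers are not shown in this summary, so a local stand-in is used here:

```python
# Illustrative structure-validation test in the spirit of test_get_presets.py.
# get_presets_handler below is a hypothetical stand-in, not the project's code.
import json


def get_presets_handler(event, context):
    presets = [{"id": "balanced", "name": "Balanced"},
               {"id": "karaoke", "name": "Karaoke"},
               {"id": "drum_practice", "name": "Drum Practice"},
               {"id": "bass_practice", "name": "Bass Practice"},
               {"id": "vocals_only", "name": "Vocals Only"}]
    return {"statusCode": 200, "body": json.dumps({"presets": presets})}


def test_presets_structure():
    resp = get_presets_handler({}, None)
    assert resp["statusCode"] == 200
    presets = json.loads(resp["body"])["presets"]
    assert len(presets) == 5
    # Every preset exposes at least an id and a display name.
    assert all({"id", "name"} <= set(p) for p in presets)


test_presets_structure()
```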
Wire Demucs and Whisper Fargate tasks into an AWS Step Functions state
machine with full error handling, progress notifications, and cleanup.
Add importable Postman collection and environment for API testing.
Step Functions orchestration:
- Add processing.asl.json with 9 states: ValidateInput, RunDemucs,
RunWhisper, MarkCompleted, SendNotification, CleanupUpload, Done,
MarkFailed, Failed
- ECS RunTask .sync integration (SFN waits natively for task completion)
- FARGATE_SPOT capacity provider with retry (5 attempts, 60s backoff)
- Timeouts: Demucs 30min, Whisper 15min
- Both ECS tasks catch States.ALL → failure handler path
- 10 DefinitionSubstitutions for cluster, task defs, subnets, SG, Lambdas
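The state-machine shape described above can be checked structurally, in the spirit of `test_processing_asl.py` (whose exact assertions are not shown here). The trimmed-down definition below is an assumed stand-in for the real `processing.asl.json`, keeping only the state graph:

```python
# Trimmed-down copy of the 9-state graph (Task parameters omitted) so the
# structural check is self-contained. The real file carries full ECS/Lambda
# parameters and DefinitionSubstitutions.
ASL = {
    "StartAt": "ValidateInput",
    "States": {
        "ValidateInput": {"Type": "Pass", "Next": "RunDemucs"},
        "RunDemucs": {"Type": "Task", "TimeoutSeconds": 1800,
                      "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "MarkFailed"}],
                      "Next": "RunWhisper"},
        "RunWhisper": {"Type": "Task", "TimeoutSeconds": 900,
                       "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "MarkFailed"}],
                       "Next": "MarkCompleted"},
        "MarkCompleted": {"Type": "Task", "Next": "SendNotification"},
        "SendNotification": {"Type": "Task", "Next": "CleanupUpload"},
        "CleanupUpload": {"Type": "Task", "Next": "Done"},
        "Done": {"Type": "Succeed"},
        "MarkFailed": {"Type": "Task", "Next": "Failed"},
        "Failed": {"Type": "Fail"},
    },
}

# All 9 states exist, and both ECS tasks route States.ALL to the failure path.
assert len(ASL["States"]) == 9
for task in ("RunDemucs", "RunWhisper"):
    assert ASL["States"][task]["Catch"][0]["Next"] == "MarkFailed"
```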
4 new Lambda handlers (all with aws-lambda-powertools structured logging):
- completion: marks song COMPLETED in DynamoDB
- cleanup: deletes original upload from S3
- notify: invokes SendProgress Lambda with status notification
- failure_handler: marks FAILED, cleans partial S3 outputs, notifies user
(notification failure is swallowed — must not raise)
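The "notification failure is swallowed" behaviour can be sketched as follows — `invoke_send_progress` is a hypothetical stand-in for the project's Lambda-invoke helper, not its actual API:

```python
# Best-effort notification pattern: the failure handler must finish its DB/S3
# cleanup even when the notify call raises, so the exception is logged, never
# re-raised.
def notify_best_effort(invoke_send_progress, user_id, song_id, logged_warnings):
    try:
        invoke_send_progress({"userId": user_id,
                              "message": {"type": "FAILED", "songId": song_id}})
    except Exception as exc:  # deliberately broad: never propagate from the failure path
        logged_warnings.append(f"notify failed for {song_id}: {exc}")


def always_throttled(payload):
    raise RuntimeError("Lambda throttled")


warnings: list[str] = []
notify_best_effort(always_throttled, "user-1", "song-1", warnings)  # does not raise
```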
Infrastructure:
- templates/orchestration.yaml: nested stack with IAM role + state machine
- templates/ecs.yaml: export 3 IAM role ARNs for SFN iam:PassRole
- template.yaml: 4 Lambdas, OrchestrationStack, wire STATE_MACHINE_ARN,
add states:StartExecution policy to ProcessUploadFunction
- layers/common/requirements.txt: add aws-lambda-powertools>=2.0
- process_upload: swap to powertools Logger
Postman collection (tests/postman/):
- unplugd-rest.postman_collection.json: 6 folders, 19 requests
- Setup folder with auto-populate from CloudFormation stack outputs
- Upload Flow, Song Library, AWS Verification (DynamoDB/S3 via Sig v4),
Step Functions monitoring
- Collection-level Cognito async/await pre-request auth script
- unplugd-dev.postman_environment.json: 26 variables (7 manual,
15 auto-populated from stack, 4 auto-populated by scripts)
- All hardcoded sensitive values (user subs, account IDs, subnet IDs,
security groups) replaced with environment variable placeholders
Tests (15 new, 123 total):
- test_completion.py (2), test_cleanup.py (2), test_notify.py (3),
test_failure_handler.py (3), test_processing_asl.py (4),
test_process_upload.py (+1 SFN invocation test)
- FakeLambdaContext fixture for powertools @inject_lambda_context
- All 123 tests passing, ruff clean, SAM validates
📝 Walkthrough

Implements Step Functions orchestration (Phase 6) and a Song Library API (Phase 7): adds a processing ASL, an orchestration SAM template, new Lambda handlers for the song lifecycle and for notifications/cleanup/failure, API/stack wiring, mixing presets, the powertools dependency, many unit tests, and a Postman collection.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client/API
    participant Gateway as API Gateway
    participant Upload as process_upload<br/>Lambda
    participant SM as Step Functions<br/>State Machine
    participant ECS as ECS Task<br/>(Demucs/Whisper)
    participant Complete as completion<br/>Lambda
    participant Notify as notify<br/>Lambda
    participant Cleanup as cleanup<br/>Lambda
    participant DDB as DynamoDB
    participant S3 as S3
    Client->>Gateway: POST /songs (start upload)
    Gateway->>Upload: Invoke process_upload
    Upload->>DDB: Create song (PROCESSING)
    Upload->>SM: StartExecution(userId, songId, key)
    SM->>ECS: RunDemucs (FARGATE_SPOT)
    ECS->>S3: Write stems
    SM->>ECS: RunWhisper
    ECS->>S3: Write lyrics.json
    SM->>Complete: Invoke completion Lambda
    Complete->>DDB: Update status=COMPLETED
    SM->>Notify: Invoke notify Lambda
    Notify->>Notify: Async SendProgress invocation
    SM->>Cleanup: Invoke cleanup Lambda
    Cleanup->>S3: Delete uploads/{user}/{song}/
    SM-->>Client: Execution complete (succeed)
```

```mermaid
sequenceDiagram
    participant Client as Client/Browser
    participant Auth as Cognito/Auth
    participant Gateway as API Gateway
    participant ListFn as list_songs<br/>Lambda
    participant GetFn as get_song<br/>Lambda
    participant DeleteFn as delete_song<br/>Lambda
    participant DDB as DynamoDB
    participant S3 as S3
    Client->>Auth: Login
    Auth-->>Client: idToken, userSub
    Client->>Gateway: GET /songs
    Gateway->>ListFn: Invoke (userId from token)
    ListFn->>DDB: Query songs by userId
    DDB-->>ListFn: [songs]
    Client->>Gateway: GET /songs/{songId}
    Gateway->>GetFn: Invoke
    GetFn->>DDB: Get song item
    DDB-->>GetFn: song (if COMPLETED)
    GetFn->>S3: Generate presigned stem/lyrics URLs
    GetFn-->>Client: song data + URLs
    Client->>Gateway: DELETE /songs/{songId}
    Gateway->>DeleteFn: Invoke
    DeleteFn->>S3: Delete output and upload prefixes
    DeleteFn->>DDB: Delete item
    DeleteFn-->>Client: {"deleted": true}
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/PROJECT_PLAN.md (1)
453-465: ⚠️ Potential issue | 🟡 Minor — Duplicate Phase 7 section; remove the stale placeholder.
Lines 371–404 contain the detailed, completed Phase 7 section. This block (lines 453–465) is the original placeholder that was never removed. It duplicates content and will confuse readers.
🤖 Fix all issues with AI agents
In `@functions/delete_song/handler.py`:
- Around line 23-25: The delete handler currently fetches the song via
get_song(user_id, song_id) then deletes it; add a guard after fetching: if
song.status == "PROCESSING" (or the enum/value used for in-flight jobs) raise a
409 Conflict (e.g., raise ConflictError(f"Song '{song_id}' is currently
processing") or return an HTTP 409) to prevent removing DB/S3 while Step
Functions are running; update any tests that assume deletion succeeds to expect
a 409 for PROCESSING status.
In `@layers/common/requirements.txt`:
- Line 6: Update the aws-lambda-powertools requirement to a v3 minimum to match
dev dependency and the v3 APIs used by handlers: change the entry for
aws-lambda-powertools in the requirements file from "aws-lambda-powertools>=2.0"
to "aws-lambda-powertools>=3.24.0" so features like
inject_lambda_context(correlation_id_path=...) are available at runtime.
In `@pyproject.toml`:
- Line 23: The pyproject.toml currently requires "aws-lambda-powertools>=3.24.0"
while the Lambda layer file layers/common/requirements.txt still pins
aws-lambda-powertools to ">=2.0", causing a runtime/test mismatch; update the
constraint in layers/common/requirements.txt to match (preferably
"aws-lambda-powertools>=3.24.0" or at minimum ">=3.0") so tests and deployed
Lambdas use the same major version and v3 APIs like Logger and
inject_lambda_context.
In `@statemachines/processing.asl.json`:
- Around line 78-136: In the RunWhisper Task definition remove the conflicting
"LaunchType": "FARGATE" entry from Parameters because a CapacityProviderStrategy
is being used (keep the existing "CapacityProviderStrategy" block); update the
RunWhisper Parameters to match RunDemucs by deleting the LaunchType key so ECS
will use the specified capacity provider, and verify ContainerOverrides and
NetworkConfiguration remain unchanged.
- Around line 186-199: The MarkFailed Task state currently has no Retry, so
transient Lambda errors can abort the workflow; update the MarkFailed state (the
Task that invokes FailureHandlerFunctionArn) to add a Retry block with at least
one or two attempts and exponential backoff (e.g., set ErrorEquals to
["States.ALL"], MaxAttempts to 2-3, IntervalSeconds to a small base like 2, and
BackoffRate to ~2) so throttles/timeouts are retried before failing; keep
ResultPath and Next unchanged.
- Around line 138-180: The three post-processing Task states (MarkCompleted,
SendNotification, CleanupUpload) lack retry and catch handling; add a Retry
block to each to handle transient Lambda errors (e.g., Lambda.ServiceException,
Lambda.AWSLambdaException, throttling, and timeouts) with exponential backoff
and limited attempts, and add a Catch on MarkCompleted that routes to the
existing MarkFailed state (using a ResultPath to capture error info) so failures
surface to failure handling; for SendNotification and CleanupUpload decide
best-effort behaviour—either add Retry + a Catch that swallows/logs the error
and continues to Done (so pipeline proceeds) or route to MarkFailed if you want
failures to abort—implement the chosen policy by adding appropriate Catch
handlers referencing MarkFailed or Done.
- Around line 18-76: The RunDemucs Task includes both "LaunchType": "FARGATE"
and a "CapacityProviderStrategy", which are mutually exclusive and cause ECS
RunTask API errors; remove the "LaunchType" property from the Parameters of the
"RunDemucs" state (the ecs:runTask.sync Task using DemucsTaskDefinitionArn /
Cluster) so the CapacityProviderStrategy is the sole launch configuration and
the call succeeds.
In `@tests/postman/unplugd-rest.postman_collection.json`:
- Around line 62-68: The JWT decoding uses atob(idToken.split('.')[1]) which
fails for base64url-encoded tokens; update the extraction around idToken (where
payload is computed and pm.environment.set('userSub') is set) to first normalize
the base64url segment (replace '-'->'+', '_'->'/', and add '=' padding to
length%4) then call atob on the normalized string and JSON.parse the result;
apply the identical normalization fix in the other occurrence in the "Get Token
(debug)" request as well.
🧹 Nitpick comments (10)
functions/get_song/handler.py (1)
5-14: Inconsistent logging approach — uses stdlib `logging` instead of Powertools `Logger`.

Other handlers in this PR (`completion`, `process_upload`, `cleanup`, `notify`) use `aws_lambda_powertools.Logger` for structured logging with correlation IDs. This handler and `delete_song` use stdlib `logging`, losing structured context and consistency. Consider switching to Powertools `Logger` for uniform observability across all handlers.

Suggested change

```diff
-import logging
+from aws_lambda_powertools import Logger
 from typing import Any

 from shared.constants import STATUS_COMPLETED
@@
-logger = logging.getLogger(__name__)
+logger = Logger(service="get_song")
```

functions/delete_song/handler.py (2)
5-14: Same inconsistency — stdlib `logging` instead of Powertools `Logger`.

Same recommendation as `get_song/handler.py` — switch to `Logger(service="delete_song")` for consistent structured logging.
27-38: Consider deleting the DynamoDB record before S3 cleanup.

If the DynamoDB `delete_song` call on line 38 fails after S3 objects have already been deleted (lines 29, 34), the user is left with a song record pointing to non-existent files — a worse UX than orphaned S3 objects (which lifecycle policies can clean up). Reversing the order makes the failure mode safer: if S3 cleanup fails after the record is gone, the user no longer sees the song and orphaned objects are eventually reaped.

Suggested reordering

```diff
+    # Delete DynamoDB record first — orphaned S3 objects are safer than dangling DB records
+    delete_song(user_id, song_id)
+    logger.info("Deleted song record: userId=%s songId=%s", user_id, song_id)
+
     # Delete output objects (stems + lyrics)
     output_prefix = f"output/{user_id}/{song_id}/"
     deleted_output = delete_objects_by_prefix(OUTPUT_BUCKET_NAME, output_prefix)
     logger.info("Deleted %d output objects for songId=%s", deleted_output, song_id)

     # Delete upload objects (may already be cleaned up by lifecycle policy)
     upload_prefix = f"uploads/{user_id}/{song_id}/"
     deleted_upload = delete_objects_by_prefix(UPLOAD_BUCKET_NAME, upload_prefix)
     logger.info("Deleted %d upload objects for songId=%s", deleted_upload, song_id)

-    # Delete DynamoDB record
-    delete_song(user_id, song_id)
-
-    logger.info("Deleted song: userId=%s songId=%s", user_id, song_id)
     return success({"deleted": True})
```

docs/PROJECT_PLAN.md (1)
426-429: Add a language specifier to the fenced code block.

The flow diagram code block lacks a language identifier, which triggers a markdownlint warning (MD040). Use a `text` info string for plain-text diagrams.

Proposed fix

````diff
-```
+```text
 ValidateInput → RunDemucs (30min timeout) → RunWhisper (15min timeout)
   → MarkCompleted → SendNotification → CleanupUpload → Done
 On error: MarkFailed (DDB + S3 cleanup + notify) → Failed
````

docs/PROJECT.md (1)

122-134: Add a language specifier to the fenced code block.

Same markdownlint MD040 issue — use a `text` info string for the workflow diagram, which starts at `ValidateInput (Pass — computes outputPrefix + vocalsKey)`.

functions/failure_handler/handler.py (1)
48-64: Notification payload construction is duplicated with `notify/handler.py`.

The `failure_handler` builds the invoke payload inline (lines 50-57) with the same structure as `notify/handler.py` (lines 26-34). Both construct `{"userId": ..., "message": {"type": ..., "songId": ..., "message": ...}}` and invoke `SEND_PROGRESS_FUNCTION_ARN` with `InvocationType: "Event"`. This is understandable given the different error-handling semantics (best-effort here vs. propagating in the notify handler), but worth noting for future maintainability — if the payload schema changes, both locations need updating.

tests/unit/test_completion.py (2)
11-26: `_seed_song` helper is duplicated across test files.

This exact helper appears in both `test_completion.py` and `test_failure_handler.py`. Consider extracting it into `tests/conftest.py` or a shared test utility module to reduce duplication.

46-53: Ruff false positive — the `dynamodb_tables` fixture is required for side effects.

The `dynamodb_tables` parameter is needed to activate the `mock_aws()` context and create the mocked DynamoDB tables, even though it's not directly referenced in the test body. Consider suppressing the warning:

```diff
-def test_song_not_found_raises(dynamodb_tables: dict[str, Any], lambda_context: Any) -> None:
+def test_song_not_found_raises(dynamodb_tables: dict[str, Any], lambda_context: Any) -> None:  # noqa: ARG001
```

tests/unit/test_failure_handler.py (1)
13-28: Consider extracting `_seed_song` into a shared fixture or helper in `conftest.py`.

This helper is duplicated (with minor variations in defaults) across `test_failure_handler.py`, `test_completion.py`, `test_delete_song.py`, `test_get_song.py`, and `test_list_songs.py`. A shared helper in `conftest.py` would reduce duplication and ensure a consistent test data shape.

template.yaml (1)
288-312: Consider `DynamoDBReadPolicy` instead of `DynamoDBCrudPolicy` for read-only functions.

`ListSongsFunction` and `GetSongFunction` only need to query/read from DynamoDB. `DynamoDBCrudPolicy` grants full CRUD (including `PutItem`, `DeleteItem`, `UpdateItem`), which violates least-privilege.

Proposed fix

```diff
   ListSongsFunction:
     Type: AWS::Serverless::Function
     Properties:
       ...
       Policies:
-        - DynamoDBCrudPolicy:
+        - DynamoDBReadPolicy:
             TableName: !GetAtt StorageStack.Outputs.SongsTableName

   GetSongFunction:
     Type: AWS::Serverless::Function
     Properties:
       ...
       Policies:
-        - DynamoDBCrudPolicy:
+        - DynamoDBReadPolicy:
             TableName: !GetAtt StorageStack.Outputs.SongsTableName
         - S3ReadPolicy:
             BucketName: !GetAtt StorageStack.Outputs.OutputBucketName
```
```python
song = get_song(user_id, song_id)
if song is None:
    raise NotFoundError(f"Song '{song_id}' not found")
```
No guard against deleting a song that is currently being processed.
If a user calls DELETE /songs/{songId} while the Step Functions pipeline is actively processing (status PROCESSING), this will remove the DB record and S3 objects, causing the in-flight execution to fail unpredictably. Consider rejecting deletion for songs in PROCESSING status with a 409 Conflict.
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 25-25: Avoid specifying long messages outside the exception class (TRY003)
🤖 Prompt for AI Agents
In `@functions/delete_song/handler.py` around lines 23 - 25, The delete handler
currently fetches the song via get_song(user_id, song_id) then deletes it; add a
guard after fetching: if song.status == "PROCESSING" (or the enum/value used for
in-flight jobs) raise a 409 Conflict (e.g., raise ConflictError(f"Song
'{song_id}' is currently processing") or return an HTTP 409) to prevent removing
DB/S3 while Step Functions are running; update any tests that assume deletion
succeeds to expect a 409 for PROCESSING status.
```text
typing_extensions>=4.0
PyJWT>=2.8
cryptography>=42.0
aws-lambda-powertools>=2.0
```
Tighten the version floor to match the dev dependency and the v3 APIs used in handlers.
Handlers use v3 features (e.g., correlation_id_path kwarg on inject_lambda_context). With >=2.0, pip may resolve to a 2.x release in the Lambda layer, breaking at runtime. Bump to at least >=3.24.0 to match pyproject.toml.
Proposed fix

```diff
-aws-lambda-powertools>=2.0
+aws-lambda-powertools>=3.24.0
```
🤖 Prompt for AI Agents
In `@layers/common/requirements.txt` at line 6, Update the aws-lambda-powertools
requirement to a v3 minimum to match dev dependency and the v3 APIs used by
handlers: change the entry for aws-lambda-powertools in the requirements file
from "aws-lambda-powertools>=2.0" to "aws-lambda-powertools>=3.24.0" so features
like inject_lambda_context(correlation_id_path=...) are available at runtime.
| "pre-commit>=4.0", | ||
| "pyjwt>=2.11.0", | ||
| "cryptography>=46.0.4", | ||
| "aws-lambda-powertools>=3.24.0", |
Version mismatch with runtime dependency in layers/common/requirements.txt.
Dev pins aws-lambda-powertools>=3.24.0 here, but the Lambda layer (layers/common/requirements.txt) pins >=2.0. Tests will run against v3 APIs (e.g., Logger behavior, inject_lambda_context kwargs) while the deployed Lambda could resolve to a v2.x release. This can cause silent runtime failures that tests won't catch.
Align the two constraints — ideally both should require >=3.24.0 (or at least >=3.0).
🤖 Prompt for AI Agents
In `@pyproject.toml` at line 23, The pyproject.toml currently requires
"aws-lambda-powertools>=3.24.0" while the Lambda layer file
layers/common/requirements.txt still pins aws-lambda-powertools to ">=2.0",
causing a runtime/test mismatch; update the constraint in
layers/common/requirements.txt to match (preferably
"aws-lambda-powertools>=3.24.0" or at minimum ">=3.0") so tests and deployed
Lambdas use the same major version and v3 APIs like Logger and
inject_lambda_context.
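One way to keep the two version floors from drifting again is a small consistency check in CI. The sketch below inlines the file contents instead of reading `layers/common/requirements.txt` and `pyproject.toml` from disk, so it is a pattern sketch rather than a drop-in script:

```python
# CI-style guard: the powertools floor in the Lambda layer must match the
# dev dependency. File contents are inlined here for the sketch; a real
# check would read the two files.
import re

layer_requirements = "aws-lambda-powertools>=3.24.0\n"       # layers/common/requirements.txt
pyproject_dep = '"aws-lambda-powertools>=3.24.0"'            # from pyproject.toml


def floor(spec: str) -> str:
    """Extract the minimum version from an aws-lambda-powertools constraint."""
    m = re.search(r"aws-lambda-powertools>=([\d.]+)", spec)
    return m.group(1) if m else ""


assert floor(layer_requirements) == floor(pyproject_dep) == "3.24.0"
```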
| "MarkCompleted": { | ||
| "Type": "Task", | ||
| "Resource": "arn:aws:states:::lambda:invoke", | ||
| "Parameters": { | ||
| "FunctionName": "${CompletionFunctionArn}", | ||
| "Payload": { | ||
| "userId.$": "$.userId", | ||
| "songId.$": "$.songId" | ||
| } | ||
| }, | ||
| "ResultPath": "$.completionResult", | ||
| "Next": "SendNotification" | ||
| }, | ||
|
|
||
| "SendNotification": { | ||
| "Type": "Task", | ||
| "Resource": "arn:aws:states:::lambda:invoke", | ||
| "Parameters": { | ||
| "FunctionName": "${NotifyFunctionArn}", | ||
| "Payload": { | ||
| "userId.$": "$.userId", | ||
| "songId.$": "$.songId", | ||
| "status": "COMPLETED" | ||
| } | ||
| }, | ||
| "ResultPath": "$.notifyResult", | ||
| "Next": "CleanupUpload" | ||
| }, | ||
|
|
||
| "CleanupUpload": { | ||
| "Type": "Task", | ||
| "Resource": "arn:aws:states:::lambda:invoke", | ||
| "Parameters": { | ||
| "FunctionName": "${CleanupFunctionArn}", | ||
| "Payload": { | ||
| "userId.$": "$.userId", | ||
| "songId.$": "$.songId", | ||
| "key.$": "$.key" | ||
| } | ||
| }, | ||
| "ResultPath": "$.cleanupResult", | ||
| "Next": "Done" | ||
| }, |
Post-processing Lambda states lack Retry and Catch blocks.
MarkCompleted, SendNotification, and CleanupUpload have no error handling. If any of these Lambdas fails (transient error, throttle, timeout), the entire execution fails without routing through MarkFailed, so the song stays in a limbo status with no user notification.
At minimum, add retries for transient Lambda errors and a Catch to MarkFailed on MarkCompleted. For SendNotification and CleanupUpload, consider whether a failure there should block the pipeline or be best-effort.
Example: adding Retry + Catch to MarkCompleted

```diff
 "MarkCompleted": {
   "Type": "Task",
   "Resource": "arn:aws:states:::lambda:invoke",
   "Parameters": {
     "FunctionName": "${CompletionFunctionArn}",
     "Payload": {
       "userId.$": "$.userId",
       "songId.$": "$.songId"
     }
   },
   "ResultPath": "$.completionResult",
+  "Retry": [
+    {
+      "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException", "Lambda.AWSLambdaException"],
+      "IntervalSeconds": 5,
+      "MaxAttempts": 3,
+      "BackoffRate": 2
+    }
+  ],
+  "Catch": [
+    {
+      "ErrorEquals": ["States.ALL"],
+      "Next": "MarkFailed",
+      "ResultPath": "$.error"
+    }
+  ],
   "Next": "SendNotification"
 },
```
🤖 Prompt for AI Agents
In `@statemachines/processing.asl.json` around lines 138 - 180, The three
post-processing Task states (MarkCompleted, SendNotification, CleanupUpload)
lack retry and catch handling; add a Retry block to each to handle transient
Lambda errors (e.g., Lambda.ServiceException, Lambda.AWSLambdaException,
throttling, and timeouts) with exponential backoff and limited attempts, and add
a Catch on MarkCompleted that routes to the existing MarkFailed state (using a
ResultPath to capture error info) so failures surface to failure handling; for
SendNotification and CleanupUpload decide best-effort behaviour—either add Retry
+ a Catch that swallows/logs the error and continues to Done (so pipeline
proceeds) or route to MarkFailed if you want failures to abort—implement the
chosen policy by adding appropriate Catch handlers referencing MarkFailed or
Done.
| "MarkFailed": { | ||
| "Type": "Task", | ||
| "Resource": "arn:aws:states:::lambda:invoke", | ||
| "Parameters": { | ||
| "FunctionName": "${FailureHandlerFunctionArn}", | ||
| "Payload": { | ||
| "userId.$": "$.userId", | ||
| "songId.$": "$.songId", | ||
| "error.$": "$.error" | ||
| } | ||
| }, | ||
| "ResultPath": "$.failureResult", | ||
| "Next": "Failed" | ||
| }, |
🛠️ Refactor suggestion | 🟠 Major
MarkFailed itself has no Retry — a transient failure here silently loses error context.
If the failure-handler Lambda is throttled or times out, the execution terminates with an unhandled error and the song record is never marked as failed. Adding a small retry with backoff here is a low-cost safety net.
Proposed fix

```diff
 "MarkFailed": {
   "Type": "Task",
   "Resource": "arn:aws:states:::lambda:invoke",
   "Parameters": {
     "FunctionName": "${FailureHandlerFunctionArn}",
     "Payload": {
       "userId.$": "$.userId",
       "songId.$": "$.songId",
       "error.$": "$.error"
     }
   },
   "ResultPath": "$.failureResult",
+  "Retry": [
+    {
+      "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException", "Lambda.AWSLambdaException"],
+      "IntervalSeconds": 5,
+      "MaxAttempts": 3,
+      "BackoffRate": 2
+    }
+  ],
   "Next": "Failed"
 },
```

🤖 Prompt for AI Agents
In `@statemachines/processing.asl.json` around lines 186 - 199, The MarkFailed
Task state currently has no Retry, so transient Lambda errors can abort the
workflow; update the MarkFailed state (the Task that invokes
FailureHandlerFunctionArn) to add a Retry block with at least one or two
attempts and exponential backoff (e.g., set ErrorEquals to ["States.ALL"],
MaxAttempts to 2-3, IntervalSeconds to a small base like 2, and BackoffRate to
~2) so throttles/timeouts are retried before failing; keep ResultPath and Next
unchanged.
| "if (json.AuthenticationResult) {", | ||
| " const idToken = json.AuthenticationResult.IdToken;", | ||
| " pm.environment.set('idToken', idToken);", | ||
| "", | ||
| " // Decode JWT to extract user sub", | ||
| " const payload = JSON.parse(atob(idToken.split('.')[1]));", | ||
| " pm.environment.set('userSub', payload.sub);", |
atob() does not handle base64url encoding — JWT decoding may fail intermittently.
JWT payloads use base64url encoding (- and _ instead of + and /, no padding). atob() expects standard base64 and will throw on tokens containing those characters. While many Cognito tokens happen to work, this is fragile.
The same issue exists in the debug "Get Token" request at Line 195.
🔧 Suggested fix: normalize base64url → base64 (with padding) before decoding

```diff
-"  const payload = JSON.parse(atob(idToken.split('.')[1]));",
+"  const base64Url = idToken.split('.')[1];",
+"  const base64 = base64Url.replace(/-/g, '+').replace(/_/g, '/');",
+"  const padded = base64 + '='.repeat((4 - base64.length % 4) % 4);",
+"  const payload = JSON.parse(atob(padded));",
```

Apply the same fix at the Get Token (debug) request (line 195).
🤖 Prompt for AI Agents
In `@tests/postman/unplugd-rest.postman_collection.json` around lines 62 - 68, The
JWT decoding uses atob(idToken.split('.')[1]) which fails for base64url-encoded
tokens; update the extraction around idToken (where payload is computed and
pm.environment.set('userSub') is set) to first normalize the base64url segment
(replace '-'->'+', '_'->'/', and add '=' padding to length%4) then call atob on
the normalized string and JSON.parse the result; apply the identical
normalization fix in the other occurrence in the "Get Token (debug)" request as
well.
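For reference, the same base64url normalization expressed in Python — `base64.urlsafe_b64decode` already handles the `-`/`_` characters but still requires `=` padding, which mirrors the padding step the prompt above calls for:

```python
# base64url -> JSON payload extraction, the Python analogue of the Postman fix.
import base64
import json


def jwt_payload(token: str) -> dict:
    seg = token.split(".")[1]
    seg += "=" * (-len(seg) % 4)  # pad to a multiple of 4
    return json.loads(base64.urlsafe_b64decode(seg))


# Round-trip check: build an unpadded base64url segment like a real JWT's.
payload = {"sub": "abc-123", "iss": "https://example"}
seg = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode().rstrip("=")
token = f"header.{seg}.signature"
assert jwt_payload(token)["sub"] == "abc-123"
```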
Remove duplicate LaunchType from RunDemucs/RunWhisper ASL states that conflicted with CapacityProviderStrategy (FARGATE_SPOT). Add Phase 7.5 section to PROJECT_PLAN with planned work: SQS backpressure queue, stale errorMessage fix, retry endpoint, and powertools logging migration.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/PROJECT_PLAN.md`:
- Around line 376-383: Docs list of mixing presets is missing one item; update
the `/presets` endpoint description to enumerate all five presets separately.
Edit the text that currently reads "Balanced, Karaoke, Drum/Bass Practice,
Vocals Only" and replace it with "Balanced, Karaoke, Drum Practice, Bass
Practice, Vocals Only" so it matches the distinct presets implemented by the
get_presets/handler.py and the templates/api.yaml route definition. Ensure the
parenthetical list now contains five comma-separated items and stays consistent
with the codebase presets.
🧹 Nitpick comments (1)
docs/PROJECT_PLAN.md (1)
426-430: Add a language identifier to the fenced code block.

The code block on line 427 lacks a language specifier, flagged by markdownlint (MD040).

Proposed fix

````diff
-```
+```text
 ValidateInput → RunDemucs (30min timeout) → RunWhisper (15min timeout)
   → MarkCompleted → SendNotification → CleanupUpload → Done
 On error: MarkFailed (DDB + S3 cleanup + notify) → Failed
````
**Endpoints** (4 new routes in `templates/api.yaml`, all Cognito-authorized):

| Method | Path | Handler | Description |
|--------|------|---------|-------------|
| `GET` | `/songs` | `list_songs/handler.py` | List user's songs (optional `?status=` filter) |
| `GET` | `/songs/{songId}` | `get_song/handler.py` | Song details + presigned stem/lyrics URLs (if COMPLETED) |
| `DELETE` | `/songs/{songId}` | `delete_song/handler.py` | Delete song record + S3 objects (any status) |
| `GET` | `/presets` | `get_presets/handler.py` | 5 mixing presets (Balanced, Karaoke, Drum/Bass Practice, Vocals Only) |
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Search for MIXING_PRESETS definition across the codebase
echo "=== Searching for MIXING_PRESETS definition ==="
rg -B 2 -A 20 "MIXING_PRESETS" --type py
```

Repository: theadityamittal/unplugd
Length of output: 2325
🏁 Script executed:
```shell
#!/bin/bash
# Read the full MIXING_PRESETS definition from constants.py
cat -n functions/shared/constants.py | grep -A 60 "MIXING_PRESETS:"
```

Repository: theadityamittal/unplugd
Length of output: 1367
Fix incomplete preset list in description: "Drum/Bass Practice" conflates two separate presets.
The docs correctly state "5 mixing presets", but the parenthetical list is incomplete. It says "Balanced, Karaoke, Drum/Bass Practice, Vocals Only" (4 items), when it should list all 5 separately: "Balanced, Karaoke, Drum Practice, Bass Practice, Vocals Only". The codebase has distinct presets for Drum Practice and Bass Practice.
🤖 Prompt for AI Agents
In `@docs/PROJECT_PLAN.md` around lines 376 - 383, Docs list of mixing presets is
missing one item; update the `/presets` endpoint description to enumerate all
five presets separately. Edit the text that currently reads "Balanced, Karaoke,
Drum/Bass Practice, Vocals Only" and replace it with "Balanced, Karaoke, Drum
Practice, Bass Practice, Vocals Only" so it matches the distinct presets
implemented by the get_presets/handler.py and the templates/api.yaml route
definition. Ensure the parenthetical list now contains five comma-separated
items and stays consistent with the codebase presets.