
Phase 6 and Phase 7: Song Library API and Step Functions #4

Open
theadityamittal wants to merge 3 commits into main from phase-6-7

Conversation

theadityamittal (Owner) commented Feb 8, 2026

Summary by CodeRabbit

  • New Features

    • Song Library API: list, get, delete songs (filter by status) and presigned URLs for stems/lyrics
    • GET /presets returning five mixing presets (Balanced, Karaoke, Drum Practice, Bass Practice, Vocals Only)
    • Orchestrated audio pipeline: Step Functions workflow with completion notifications and automatic upload cleanup
  • Documentation

    • Updated project docs and quick-start to reflect Phase 6/7 progress and new test counts
  • Tests

    • +31 unit tests (now 123 total) and a Postman collection for API/E2E validation

…sets

Add CRUD endpoints for the user's song library:
- GET /songs — list songs with optional ?status= filter
- GET /songs/{songId} — song detail with presigned S3 URLs for stems/lyrics (COMPLETED only)
- DELETE /songs/{songId} — delete song record + S3 objects (output + upload)
- GET /presets — 5 mixing presets (Balanced, Karaoke, Drum/Bass Practice, Vocals Only)
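The presigned-URL behaviour of GET /songs/{songId} can be sketched as follows. This is a hypothetical illustration, not the PR's actual handler: the stem/lyrics file names, the injectable client, and the helper name are assumptions; only the `output/{userId}/{songId}/` key layout and the boto3 `generate_presigned_url` call follow what the PR describes.

```python
# Hypothetical sketch of how get_song might build presigned stem/lyrics URLs.
# The output/{userId}/{songId}/ prefix matches the layout described in this PR;
# the stem file names and injectable client are illustrative assumptions.
from typing import Any

STEM_FILES = ["vocals.mp3", "drums.mp3", "bass.mp3", "other.mp3", "lyrics.json"]


def build_presigned_urls(
    s3_client: Any, bucket: str, user_id: str, song_id: str, expires: int = 3600
) -> dict[str, str]:
    """Return one presigned GET URL per stem/lyrics object of a COMPLETED song."""
    prefix = f"output/{user_id}/{song_id}/"
    return {
        name: s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": prefix + name},
            ExpiresIn=expires,
        )
        for name in STEM_FILES
    }
```

With boto3, `s3_client` would be `boto3.client("s3")`; passing it in keeps the helper testable against a stub.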

Infrastructure:
- 4 new Lambda functions in template.yaml with appropriate IAM policies
- 4 new API Gateway routes in api.yaml with Cognito authorization
- MIXING_PRESETS constant in shared/constants.py
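A plausible shape for the MIXING_PRESETS constant is sketched below. Only the five preset names come from the PR; the per-stem gain structure (0.0 = muted, 1.0 = full volume) and the id/name/gains keys are illustrative assumptions.

```python
# Hypothetical shape of MIXING_PRESETS in shared/constants.py. The five names
# are from the PR; the gain structure and key names are assumptions.
MIXING_PRESETS = [
    {"id": "balanced", "name": "Balanced",
     "gains": {"vocals": 1.0, "drums": 1.0, "bass": 1.0, "other": 1.0}},
    {"id": "karaoke", "name": "Karaoke",
     "gains": {"vocals": 0.0, "drums": 1.0, "bass": 1.0, "other": 1.0}},
    {"id": "drum-practice", "name": "Drum Practice",
     "gains": {"vocals": 1.0, "drums": 0.0, "bass": 1.0, "other": 1.0}},
    {"id": "bass-practice", "name": "Bass Practice",
     "gains": {"vocals": 1.0, "drums": 1.0, "bass": 0.0, "other": 1.0}},
    {"id": "vocals-only", "name": "Vocals Only",
     "gains": {"vocals": 1.0, "drums": 0.0, "bass": 0.0, "other": 0.0}},
]
```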

Tests (TDD):
- 16 new unit tests across 4 files (108 total)
- test_list_songs.py (5): empty list, multi-song, status filter, invalid status, user isolation
- test_get_song.py (5): completed with URLs, non-completed, 404, cross-user, missing params
- test_delete_song.py (4): happy path, S3 cleanup verification, 404, cross-user
- test_get_presets.py (2): preset list, structure validation

Wire Demucs and Whisper Fargate tasks into an AWS Step Functions state
machine with full error handling, progress notifications, and cleanup.

Add importable Postman collection and environment for API testing.

Step Functions orchestration:
- Add processing.asl.json with 9 states: ValidateInput, RunDemucs,
  RunWhisper, MarkCompleted, SendNotification, CleanupUpload, Done,
  MarkFailed, Failed
- ECS RunTask .sync integration (SFN waits natively for task completion)
- FARGATE_SPOT capacity provider with retry (5 attempts, 60s backoff)
- Timeouts: Demucs 30min, Whisper 15min
- Both ECS tasks catch States.ALL → failure handler path
- 10 DefinitionSubstitutions for cluster, task defs, subnets, SG, Lambdas

4 new Lambda handlers (all with aws-lambda-powertools structured logging):
- completion: marks song COMPLETED in DynamoDB
- cleanup: deletes original upload from S3
- notify: invokes SendProgress Lambda with status notification
- failure_handler: marks FAILED, cleans partial S3 outputs, notifies user
  (notification failure is swallowed — must not raise)
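The "swallowed" notification in failure_handler can be sketched like this. The PR only states the requirement (notification failure must not raise); the function name, payload shape, and injectable client below are illustrative assumptions.

```python
# Sketch of best-effort notification in the failure handler: the PR states the
# notification failure is swallowed and must not raise. Names and payload
# structure here are assumptions, not the PR's actual code.
import json
from typing import Any


def notify_best_effort(
    lambda_client: Any, function_arn: str, user_id: str, song_id: str, error: str
) -> bool:
    """Invoke the progress Lambda asynchronously; never propagate failures."""
    payload = {
        "userId": user_id,
        "message": {"type": "error", "songId": song_id, "message": error},
    }
    try:
        lambda_client.invoke(
            FunctionName=function_arn,
            InvocationType="Event",  # fire-and-forget async invocation
            Payload=json.dumps(payload).encode(),
        )
        return True
    except Exception:
        # Swallowed by design: marking the song FAILED must still succeed
        # even when the notification path is down.
        return False
```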

Infrastructure:
- templates/orchestration.yaml: nested stack with IAM role + state machine
- templates/ecs.yaml: export 3 IAM role ARNs for SFN iam:PassRole
- template.yaml: 4 Lambdas, OrchestrationStack, wire STATE_MACHINE_ARN,
  add states:StartExecution policy to ProcessUploadFunction
- layers/common/requirements.txt: add aws-lambda-powertools>=2.0
- process_upload: swap to powertools Logger
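The StartExecution wiring in process_upload might look roughly like the sketch below, given the StartExecution(userId, songId, key) input the PR describes. The injectable client, helper name, and per-song execution-name scheme are assumptions.

```python
# Sketch of process_upload starting the Step Functions execution. Input fields
# (userId, songId, key) follow the PR; the execution naming is an assumption.
import json
from typing import Any


def start_processing(
    sfn_client: Any, state_machine_arn: str, user_id: str, song_id: str, key: str
) -> str:
    """Kick off the processing state machine and return the execution ARN."""
    resp = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=f"song-{song_id}",  # assumed: one unique execution per song
        input=json.dumps({"userId": user_id, "songId": song_id, "key": key}),
    )
    return resp["executionArn"]
```

With boto3 this would be called with `boto3.client("stepfunctions")` and the STATE_MACHINE_ARN environment variable that the template wires in.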

Postman collection (tests/postman/):
- unplugd-rest.postman_collection.json: 6 folders, 19 requests
  - Setup folder with auto-populate from CloudFormation stack outputs
  - Upload Flow, Song Library, AWS Verification (DynamoDB/S3 via Sig v4),
    Step Functions monitoring
  - Collection-level Cognito async/await pre-request auth script
- unplugd-dev.postman_environment.json: 26 variables (7 manual,
  15 auto-populated from stack, 4 auto-populated by scripts)
- All hardcoded sensitive values (user subs, account IDs, subnet IDs,
  security groups) replaced with environment variable placeholders

Tests (15 new, 123 total):
- test_completion.py (2), test_cleanup.py (2), test_notify.py (3),
  test_failure_handler.py (3), test_processing_asl.py (4),
  test_process_upload.py (+1 SFN invocation test)
- FakeLambdaContext fixture for powertools @inject_lambda_context
- All 123 tests passing, ruff clean, SAM validates
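A FakeLambdaContext of the kind this PR adds typically just mimics the attributes that Powertools' @inject_lambda_context reads off the real Lambda context object. The exact field values below are illustrative assumptions.

```python
# Minimal fake Lambda context for tests of handlers decorated with
# @inject_lambda_context. Attribute names match the real Lambda context;
# the concrete values are illustrative assumptions.
class FakeLambdaContext:
    function_name: str = "test-function"
    function_version: str = "$LATEST"
    invoked_function_arn: str = (
        "arn:aws:lambda:us-east-1:123456789012:function:test-function"
    )
    memory_limit_in_mb: int = 128
    aws_request_id: str = "00000000-0000-0000-0000-000000000000"

    def get_remaining_time_in_millis(self) -> int:
        return 30_000
```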

coderabbitai Bot commented Feb 8, 2026

📝 Walkthrough

Walkthrough

Implements Step Functions orchestration (Phase 6) and a Song Library API (Phase 7): adds a processing ASL, orchestration SAM template, new Lambda handlers for song lifecycle and notifications/cleanup/failure, API/stack wiring, mixing presets, powertools dependency, many unit tests and a Postman collection.

Changes

Cohort / File(s) Summary
State Machine & Orchestration Template
statemachines/processing.asl.json, templates/orchestration.yaml
New Step Functions definition and SAM orchestration template wiring ECS tasks, Lambda callbacks, IAM role and DefinitionSubstitutions for Demucs/Whisper, completion/notify/cleanup/failure flows and retries/catches.
API Templates & Stack Wiring
template.yaml, templates/api.yaml, templates/ecs.yaml
Added Song Library API routes and Lambda permission parameters, exposed StateMachineArn to ProcessUploadFunction and granted states:StartExecution, and added ECS task role ARN outputs.
Song Library & Orchestration Lambdas
functions/list_songs/handler.py, functions/get_song/handler.py, functions/delete_song/handler.py, functions/get_presets/handler.py, functions/completion/handler.py, functions/cleanup/handler.py, functions/notify/handler.py, functions/failure_handler/handler.py
New Lambda handlers: list/get/delete songs and get presets; orchestration helpers to mark completion, cleanup uploads, send notifications, and handle failures (DynamoDB updates, S3 cleanup, best-effort notification).
Shared Constants & Logging
functions/shared/constants.py, functions/process_upload/handler.py
Added MIXING_PRESETS constant (5 presets). Switched process_upload to use aws-lambda-powertools Logger.
Dependencies & Packaging
layers/common/requirements.txt, pyproject.toml
Added aws-lambda-powertools to layer requirements and dev dependencies.
Unit & Integration Tests
tests/unit/*, tests/conftest.py, tests/postman/*
Many new unit tests for all handlers and ASL validation, added FakeLambdaContext fixture, Postman environment and collection for end-to-end flows; test count increased to ~123.
Documentation & Project Files
README.md, docs/PROJECT.md, docs/PROJECT_PLAN.md
Updated docs and status tables to mark Phase 6 and Phase 7 DONE, add Phase 7.5 in progress, and update test counts and workflow descriptions.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/API
    participant Gateway as API Gateway
    participant Upload as process_upload<br/>Lambda
    participant SM as Step Functions<br/>State Machine
    participant ECS as ECS Task<br/>(Demucs/Whisper)
    participant Complete as completion<br/>Lambda
    participant Notify as notify<br/>Lambda
    participant Cleanup as cleanup<br/>Lambda
    participant DDB as DynamoDB
    participant S3 as S3

    Client->>Gateway: POST /songs (start upload)
    Gateway->>Upload: Invoke process_upload
    Upload->>DDB: Create song (PROCESSING)
    Upload->>SM: StartExecution(userId,songId,key)
    SM->>ECS: RunDemucs (FARGATE_SPOT)
    ECS->>S3: Write stems
    SM->>ECS: RunWhisper
    ECS->>S3: Write lyrics.json
    SM->>Complete: Invoke completion Lambda
    Complete->>DDB: Update status=COMPLETED
    SM->>Notify: Invoke notify Lambda
    Notify->>Notify: Async SendProgress invocation
    SM->>Cleanup: Invoke cleanup Lambda
    Cleanup->>S3: Delete uploads/{user}/{song}/
    SM-->>Client: Execution complete (succeed)
sequenceDiagram
    participant Client as Client/Browser
    participant Auth as Cognito/Auth
    participant Gateway as API Gateway
    participant ListFn as list_songs<br/>Lambda
    participant GetFn as get_song<br/>Lambda
    participant DeleteFn as delete_song<br/>Lambda
    participant DDB as DynamoDB
    participant S3 as S3

    Client->>Auth: Login
    Auth-->>Client: idToken, userSub
    Client->>Gateway: GET /songs
    Gateway->>ListFn: Invoke (userId from token)
    ListFn->>DDB: Query songs by userId
    DDB-->>ListFn: [songs]
    Client->>Gateway: GET /songs/{songId}
    Gateway->>GetFn: Invoke
    GetFn->>DDB: Get song item
    DDB-->>GetFn: song (if COMPLETED)
    GetFn->>S3: Generate presigned stem/lyrics URLs
    GetFn-->>Client: song data + URLs
    Client->>Gateway: DELETE /songs/{songId}
    Gateway->>DeleteFn: Invoke
    DeleteFn->>S3: Delete output and upload prefixes
    DeleteFn->>DDB: Delete item
    DeleteFn-->>Client: {"deleted": true}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 Seven handlers hop and dance,
Step Functions lead the songs to chance,
From demucs drums to whispered lines,
We clean, we mark, we send the signs;
The rabbit cheers — orchestration shines!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
- Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
- Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
- Title Check: ✅ Passed. The PR title accurately summarizes the main changes: Phase 6 Step Functions orchestration and Phase 7 Song Library API endpoints with associated infrastructure.




coderabbitai Bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/PROJECT_PLAN.md (1)

453-465: ⚠️ Potential issue | 🟡 Minor

Duplicate Phase 7 section — remove the stale placeholder.

Lines 371–404 contain the detailed, completed Phase 7 section. This block (lines 453–465) is the original placeholder that was never removed. It duplicates content and will confuse readers.

🤖 Fix all issues with AI agents
In `@functions/delete_song/handler.py`:
- Around line 23-25: The delete handler currently fetches the song via
get_song(user_id, song_id) then deletes it; add a guard after fetching: if
song.status == "PROCESSING" (or the enum/value used for in-flight jobs) raise a
409 Conflict (e.g., raise ConflictError(f"Song '{song_id}' is currently
processing") or return an HTTP 409) to prevent removing DB/S3 while Step
Functions are running; update any tests that assume deletion succeeds to expect
a 409 for PROCESSING status.

In `@layers/common/requirements.txt`:
- Line 6: Update the aws-lambda-powertools requirement to a v3 minimum to match
dev dependency and the v3 APIs used by handlers: change the entry for
aws-lambda-powertools in the requirements file from "aws-lambda-powertools>=2.0"
to "aws-lambda-powertools>=3.24.0" so features like
inject_lambda_context(correlation_id_path=...) are available at runtime.

In `@pyproject.toml`:
- Line 23: The pyproject.toml currently requires "aws-lambda-powertools>=3.24.0"
while the Lambda layer file layers/common/requirements.txt still pins
aws-lambda-powertools to ">=2.0", causing a runtime/test mismatch; update the
constraint in layers/common/requirements.txt to match (preferably
"aws-lambda-powertools>=3.24.0" or at minimum ">=3.0") so tests and deployed
Lambdas use the same major version and v3 APIs like Logger and
inject_lambda_context.

In `@statemachines/processing.asl.json`:
- Around line 78-136: In the RunWhisper Task definition remove the conflicting
"LaunchType": "FARGATE" entry from Parameters because a CapacityProviderStrategy
is being used (keep the existing "CapacityProviderStrategy" block); update the
RunWhisper Parameters to match RunDemucs by deleting the LaunchType key so ECS
will use the specified capacity provider, and verify ContainerOverrides and
NetworkConfiguration remain unchanged.
- Around line 186-199: The MarkFailed Task state currently has no Retry, so
transient Lambda errors can abort the workflow; update the MarkFailed state (the
Task that invokes FailureHandlerFunctionArn) to add a Retry block with at least
one or two attempts and exponential backoff (e.g., set ErrorEquals to
["States.ALL"], MaxAttempts to 2-3, IntervalSeconds to a small base like 2, and
BackoffRate to ~2) so throttles/timeouts are retried before failing; keep
ResultPath and Next unchanged.
- Around line 138-180: The three post-processing Task states (MarkCompleted,
SendNotification, CleanupUpload) lack retry and catch handling; add a Retry
block to each to handle transient Lambda errors (e.g., Lambda.ServiceException,
Lambda.AWSLambdaException, throttling, and timeouts) with exponential backoff
and limited attempts, and add a Catch on MarkCompleted that routes to the
existing MarkFailed state (using a ResultPath to capture error info) so failures
surface to failure handling; for SendNotification and CleanupUpload decide
best-effort behaviour—either add Retry + a Catch that swallows/logs the error
and continues to Done (so pipeline proceeds) or route to MarkFailed if you want
failures to abort—implement the chosen policy by adding appropriate Catch
handlers referencing MarkFailed or Done.
- Around line 18-76: The RunDemucs Task includes both "LaunchType": "FARGATE"
and a "CapacityProviderStrategy", which are mutually exclusive and cause ECS
RunTask API errors; remove the "LaunchType" property from the Parameters of the
"RunDemucs" state (the ecs:runTask.sync Task using DemucsTaskDefinitionArn /
Cluster) so the CapacityProviderStrategy is the sole launch configuration and
the call succeeds.

In `@tests/postman/unplugd-rest.postman_collection.json`:
- Around line 62-68: The JWT decoding uses atob(idToken.split('.')[1]) which
fails for base64url-encoded tokens; update the extraction around idToken (where
payload is computed and pm.environment.set('userSub') is set) to first normalize
the base64url segment (replace '-'->'+', '_'->'/', and add '=' padding to
length%4) then call atob on the normalized string and JSON.parse the result;
apply the identical normalization fix in the other occurrence in the "Get Token
(debug)" request as well.
🧹 Nitpick comments (10)
functions/get_song/handler.py (1)

5-14: Inconsistent logging approach — uses stdlib logging instead of Powertools Logger.

Other handlers in this PR (completion, process_upload, cleanup, notify) use aws_lambda_powertools.Logger for structured logging with correlation IDs. This handler and delete_song use stdlib logging, losing structured context and consistency.

Consider switching to Powertools Logger for uniform observability across all handlers.

Suggested change
-import logging
+from aws_lambda_powertools import Logger
 from typing import Any

 from shared.constants import STATUS_COMPLETED
@@
-logger = logging.getLogger(__name__)
+logger = Logger(service="get_song")
functions/delete_song/handler.py (2)

5-14: Same inconsistency — stdlib logging instead of Powertools Logger.

Same recommendation as get_song/handler.py — switch to Logger(service="delete_song") for consistent structured logging.


27-38: Consider deleting the DynamoDB record before S3 cleanup.

If the DynamoDB delete_song call on line 38 fails after S3 objects have already been deleted (lines 29, 34), the user is left with a song record pointing to non-existent files — a worse UX than orphaned S3 objects (which lifecycle policies can clean up).

Reversing the order makes the failure mode safer: if S3 cleanup fails after the record is gone, the user no longer sees the song and orphaned objects are eventually reaped.

Suggested reordering
+    # Delete DynamoDB record first — orphaned S3 objects are safer than dangling DB records
+    delete_song(user_id, song_id)
+    logger.info("Deleted song record: userId=%s songId=%s", user_id, song_id)
+
     # Delete output objects (stems + lyrics)
     output_prefix = f"output/{user_id}/{song_id}/"
     deleted_output = delete_objects_by_prefix(OUTPUT_BUCKET_NAME, output_prefix)
     logger.info("Deleted %d output objects for songId=%s", deleted_output, song_id)

     # Delete upload objects (may already be cleaned up by lifecycle policy)
     upload_prefix = f"uploads/{user_id}/{song_id}/"
     deleted_upload = delete_objects_by_prefix(UPLOAD_BUCKET_NAME, upload_prefix)
     logger.info("Deleted %d upload objects for songId=%s", deleted_upload, song_id)

-    # Delete DynamoDB record
-    delete_song(user_id, song_id)
-
-    logger.info("Deleted song: userId=%s songId=%s", user_id, song_id)
     return success({"deleted": True})
docs/PROJECT_PLAN.md (1)

426-429: Add a language specifier to the fenced code block.

The flow diagram code block lacks a language identifier, which triggers a markdownlint warning (MD040). Use ```text for plain-text diagrams.

Proposed fix
-```
+```text
 ValidateInput → RunDemucs (30min timeout) → RunWhisper (15min timeout) → MarkCompleted → SendNotification → CleanupUpload → Done
 On error: MarkFailed (DDB + S3 cleanup + notify) → Failed
docs/PROJECT.md (1)

122-134: Add a language specifier to the fenced code block.

Same markdownlint MD040 issue: use ```text for the workflow diagram.

Proposed fix
-```
+```text
 ValidateInput (Pass — computes outputPrefix + vocalsKey)
functions/failure_handler/handler.py (1)

48-64: Notification payload construction is duplicated with notify/handler.py.

The failure_handler builds the invoke payload inline (lines 50-57) with the same structure as notify/handler.py (lines 26-34). Both construct {"userId": ..., "message": {"type": ..., "songId": ..., "message": ...}} and invoke SEND_PROGRESS_FUNCTION_ARN with InvocationType: "Event".

This is understandable given the different error-handling semantics (best-effort here vs. propagating in the notify handler), but worth noting for future maintainability — if the payload schema changes, both locations need updating.

tests/unit/test_completion.py (2)

11-26: _seed_song helper is duplicated across test files.

This exact helper appears in both test_completion.py and test_failure_handler.py. Consider extracting it into tests/conftest.py or a shared test utility module to reduce duplication.


46-53: Ruff false positive — dynamodb_tables fixture is required for side effects.

The dynamodb_tables parameter is needed to activate the mock_aws() context and create the mocked DynamoDB tables, even though it's not directly referenced in the test body. Consider suppressing the warning:

-def test_song_not_found_raises(dynamodb_tables: dict[str, Any], lambda_context: Any) -> None:
+def test_song_not_found_raises(dynamodb_tables: dict[str, Any], lambda_context: Any) -> None:  # noqa: ARG001
tests/unit/test_failure_handler.py (1)

13-28: Consider extracting _seed_song into a shared fixture or helper in conftest.py.

This helper is duplicated (with minor variations in defaults) across test_failure_handler.py, test_completion.py, test_delete_song.py, test_get_song.py, and test_list_songs.py. A shared helper in conftest.py would reduce duplication and ensure consistent test data shape.
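The suggested shared helper could be sketched as below. The item shape (userId/songId/status keys) mirrors the data model used throughout this PR; the default values and keyword-argument escape hatch are illustrative assumptions.

```python
# Sketch of a shared seed_song helper for tests/conftest.py, replacing the
# per-file _seed_song duplicates. Defaults are illustrative assumptions.
from typing import Any


def seed_song(
    table: Any,
    user_id: str = "user-1",
    song_id: str = "song-1",
    status: str = "PROCESSING",
    **extra: Any,
) -> dict[str, Any]:
    """Put one song item into the (mocked) DynamoDB table and return it."""
    item = {"userId": user_id, "songId": song_id, "status": status, **extra}
    table.put_item(Item=item)
    return item
```

Each test file can then override only the fields it cares about, keeping the item shape consistent across test_completion.py, test_failure_handler.py, and the song-library tests.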

template.yaml (1)

288-312: Consider DynamoDBReadPolicy instead of DynamoDBCrudPolicy for read-only functions.

ListSongsFunction and GetSongFunction only need to query/read from DynamoDB. DynamoDBCrudPolicy grants full CRUD (including PutItem, DeleteItem, UpdateItem), which violates least-privilege.

Proposed fix
  ListSongsFunction:
    Type: AWS::Serverless::Function
    Properties:
      ...
      Policies:
-       - DynamoDBCrudPolicy:
+       - DynamoDBReadPolicy:
            TableName: !GetAtt StorageStack.Outputs.SongsTableName

  GetSongFunction:
    Type: AWS::Serverless::Function
    Properties:
      ...
      Policies:
-       - DynamoDBCrudPolicy:
+       - DynamoDBReadPolicy:
            TableName: !GetAtt StorageStack.Outputs.SongsTableName
        - S3ReadPolicy:
            BucketName: !GetAtt StorageStack.Outputs.OutputBucketName

Comment on lines +23 to +25
song = get_song(user_id, song_id)
if song is None:
raise NotFoundError(f"Song '{song_id}' not found")

⚠️ Potential issue | 🟡 Minor

No guard against deleting a song that is currently being processed.

If a user calls DELETE /songs/{songId} while the Step Functions pipeline is actively processing (status PROCESSING), this will remove the DB record and S3 objects, causing the in-flight execution to fail unpredictably. Consider rejecting deletion for songs in PROCESSING status with a 409 Conflict.
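The suggested guard could be sketched as follows, assuming a ConflictError that the shared API error handling maps to HTTP 409; the exception class and status constant names are assumptions, not the repository's actual code.

```python
# Sketch of the suggested PROCESSING guard for delete_song. The ConflictError
# class and STATUS_PROCESSING constant names are illustrative assumptions.
STATUS_PROCESSING = "PROCESSING"


class ConflictError(Exception):
    """Assumed to map to an HTTP 409 Conflict response."""


def ensure_deletable(song: dict) -> None:
    """Reject deletion while the Step Functions pipeline is still running."""
    if song.get("status") == STATUS_PROCESSING:
        raise ConflictError(f"Song '{song['songId']}' is currently processing")
```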

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 25-25: Avoid specifying long messages outside the exception class

(TRY003)


typing_extensions>=4.0
PyJWT>=2.8
cryptography>=42.0
aws-lambda-powertools>=2.0

⚠️ Potential issue | 🟠 Major

Tighten the version floor to match the dev dependency and the v3 APIs used in handlers.

Handlers use v3 features (e.g., correlation_id_path kwarg on inject_lambda_context). With >=2.0, pip may resolve to a 2.x release in the Lambda layer, breaking at runtime. Bump to at least >=3.24.0 to match pyproject.toml.

Proposed fix
-aws-lambda-powertools>=2.0
+aws-lambda-powertools>=3.24.0
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
aws-lambda-powertools>=2.0
aws-lambda-powertools>=3.24.0

Comment thread pyproject.toml
"pre-commit>=4.0",
"pyjwt>=2.11.0",
"cryptography>=46.0.4",
"aws-lambda-powertools>=3.24.0",

⚠️ Potential issue | 🟠 Major

Version mismatch with runtime dependency in layers/common/requirements.txt.

Dev pins aws-lambda-powertools>=3.24.0 here, but the Lambda layer (layers/common/requirements.txt) pins >=2.0. Tests will run against v3 APIs (e.g., Logger behavior, inject_lambda_context kwargs) while the deployed Lambda could resolve to a v2.x release. This can cause silent runtime failures that tests won't catch.

Align the two constraints — ideally both should require >=3.24.0 (or at least >=3.0).


Comment thread statemachines/processing.asl.json
Comment thread statemachines/processing.asl.json
Comment on lines +138 to +180
"MarkCompleted": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${CompletionFunctionArn}",
"Payload": {
"userId.$": "$.userId",
"songId.$": "$.songId"
}
},
"ResultPath": "$.completionResult",
"Next": "SendNotification"
},

"SendNotification": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${NotifyFunctionArn}",
"Payload": {
"userId.$": "$.userId",
"songId.$": "$.songId",
"status": "COMPLETED"
}
},
"ResultPath": "$.notifyResult",
"Next": "CleanupUpload"
},

"CleanupUpload": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${CleanupFunctionArn}",
"Payload": {
"userId.$": "$.userId",
"songId.$": "$.songId",
"key.$": "$.key"
}
},
"ResultPath": "$.cleanupResult",
"Next": "Done"
},

⚠️ Potential issue | 🟠 Major

Post-processing Lambda states lack Retry and Catch blocks.

MarkCompleted, SendNotification, and CleanupUpload have no error handling. If any of these Lambdas fails (transient error, throttle, timeout), the entire execution fails without routing through MarkFailed, so the song stays in a limbo status with no user notification.

At minimum, add retries for transient Lambda errors and a Catch to MarkFailed on MarkCompleted. For SendNotification and CleanupUpload, consider whether a failure there should block the pipeline or be best-effort.

Example: adding Retry + Catch to MarkCompleted
     "MarkCompleted": {
       "Type": "Task",
       "Resource": "arn:aws:states:::lambda:invoke",
       "Parameters": {
         "FunctionName": "${CompletionFunctionArn}",
         "Payload": {
           "userId.$": "$.userId",
           "songId.$": "$.songId"
         }
       },
       "ResultPath": "$.completionResult",
+      "Retry": [
+        {
+          "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException", "Lambda.AWSLambdaException"],
+          "IntervalSeconds": 5,
+          "MaxAttempts": 3,
+          "BackoffRate": 2
+        }
+      ],
+      "Catch": [
+        {
+          "ErrorEquals": ["States.ALL"],
+          "Next": "MarkFailed",
+          "ResultPath": "$.error"
+        }
+      ],
       "Next": "SendNotification"
     },
📝 Committable suggestion


Suggested change
"MarkCompleted": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${CompletionFunctionArn}",
"Payload": {
"userId.$": "$.userId",
"songId.$": "$.songId"
}
},
"ResultPath": "$.completionResult",
"Next": "SendNotification"
},
"SendNotification": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${NotifyFunctionArn}",
"Payload": {
"userId.$": "$.userId",
"songId.$": "$.songId",
"status": "COMPLETED"
}
},
"ResultPath": "$.notifyResult",
"Next": "CleanupUpload"
},
"CleanupUpload": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${CleanupFunctionArn}",
"Payload": {
"userId.$": "$.userId",
"songId.$": "$.songId",
"key.$": "$.key"
}
},
"ResultPath": "$.cleanupResult",
"Next": "Done"
},
"MarkCompleted": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${CompletionFunctionArn}",
"Payload": {
"userId.$": "$.userId",
"songId.$": "$.songId"
}
},
"ResultPath": "$.completionResult",
"Retry": [
{
"ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException", "Lambda.AWSLambdaException"],
"IntervalSeconds": 5,
"MaxAttempts": 3,
"BackoffRate": 2
}
],
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"Next": "MarkFailed",
"ResultPath": "$.error"
}
],
"Next": "SendNotification"
},
"SendNotification": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${NotifyFunctionArn}",
"Payload": {
"userId.$": "$.userId",
"songId.$": "$.songId",
"status": "COMPLETED"
}
},
"ResultPath": "$.notifyResult",
"Next": "CleanupUpload"
},
"CleanupUpload": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${CleanupFunctionArn}",
"Payload": {
"userId.$": "$.userId",
"songId.$": "$.songId",
"key.$": "$.key"
}
},
"ResultPath": "$.cleanupResult",
"Next": "Done"
},

Comment on lines +186 to +199
"MarkFailed": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${FailureHandlerFunctionArn}",
"Payload": {
"userId.$": "$.userId",
"songId.$": "$.songId",
"error.$": "$.error"
}
},
"ResultPath": "$.failureResult",
"Next": "Failed"
},

🛠️ Refactor suggestion | 🟠 Major

MarkFailed itself has no Retry — a transient failure here silently loses error context.

If the failure-handler Lambda is throttled or times out, the execution terminates with an unhandled error and the song record is never marked as failed. Adding a small retry with backoff here is a low-cost safety net.

Proposed fix
     "MarkFailed": {
       "Type": "Task",
       "Resource": "arn:aws:states:::lambda:invoke",
       "Parameters": {
         "FunctionName": "${FailureHandlerFunctionArn}",
         "Payload": {
           "userId.$": "$.userId",
           "songId.$": "$.songId",
           "error.$": "$.error"
         }
       },
       "ResultPath": "$.failureResult",
+      "Retry": [
+        {
+          "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException", "Lambda.AWSLambdaException"],
+          "IntervalSeconds": 5,
+          "MaxAttempts": 3,
+          "BackoffRate": 2
+        }
+      ],
       "Next": "Failed"
     },
🤖 Prompt for AI Agents
In `@statemachines/processing.asl.json` around lines 186 - 199, The MarkFailed
Task state currently has no Retry, so transient Lambda errors can abort the
workflow; update the MarkFailed state (the Task that invokes
FailureHandlerFunctionArn) to add a Retry block with at least one or two
attempts and exponential backoff (e.g., set ErrorEquals to ["States.ALL"],
MaxAttempts to 2-3, IntervalSeconds to a small base like 2, and BackoffRate to
~2) so throttles/timeouts are retried before failing; keep ResultPath and Next
unchanged.

Comment on lines +62 to +68
"if (json.AuthenticationResult) {",
" const idToken = json.AuthenticationResult.IdToken;",
" pm.environment.set('idToken', idToken);",
"",
" // Decode JWT to extract user sub",
" const payload = JSON.parse(atob(idToken.split('.')[1]));",
" pm.environment.set('userSub', payload.sub);",
⚠️ Potential issue | 🟡 Minor

atob() does not handle base64url encoding — JWT decoding may fail intermittently.

JWT payloads use base64url encoding (- and _ instead of + and /, no padding). atob() expects standard base64 and will throw on tokens containing those characters. While many Cognito tokens happen to work, this is fragile.

The same issue exists in the debug "Get Token" request at Line 195.

🔧 Suggested fix: normalize base64url → base64 before decoding
-          "    const payload = JSON.parse(atob(idToken.split('.')[1]));",
+          "    const base64Url = idToken.split('.')[1];",
+          "    const base64 = base64Url.replace(/-/g, '+').replace(/_/g, '/');",
+          "    const payload = JSON.parse(atob(base64));",

Apply the same fix at the Get Token (debug) request (Line 195).

🤖 Prompt for AI Agents
In `@tests/postman/unplugd-rest.postman_collection.json` around lines 62 - 68, The
JWT decoding uses atob(idToken.split('.')[1]) which fails for base64url-encoded
tokens; update the extraction around idToken (where payload is computed and
pm.environment.set('userSub') is set) to first normalize the base64url segment
(replace '-'->'+', '_'->'/', and add '=' padding to length%4) then call atob on
the normalized string and JSON.parse the result; apply the identical
normalization fix in the other occurrence in the "Get Token (debug)" request as
well.

  Remove duplicate LaunchType from RunDemucs/RunWhisper ASL states that
  conflicted with CapacityProviderStrategy (FARGATE_SPOT). Add Phase 7.5
  section to PROJECT_PLAN with planned work: SQS backpressure queue,
  stale errorMessage fix, retry endpoint, and powertools logging migration.
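
The LaunchType/CapacityProviderStrategy conflict the commit fixes comes from the ECS RunTask API, which rejects requests specifying both fields. A minimal sketch of a corrected RunDemucs state (the ARN placeholders, sync integration, and 1800-second timeout matching the "30min timeout" in the plan are assumptions; only the removal of LaunchType alongside the FARGATE_SPOT strategy is taken from the commit):

```json
"RunDemucs": {
  "Type": "Task",
  "Resource": "arn:aws:states:::ecs:runTask.sync",
  "Parameters": {
    "Cluster": "${ClusterArn}",
    "TaskDefinition": "${DemucsTaskDefinitionArn}",
    "CapacityProviderStrategy": [
      { "CapacityProvider": "FARGATE_SPOT", "Weight": 1 }
    ]
  },
  "TimeoutSeconds": 1800,
  "Next": "RunWhisper"
},
```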
@coderabbitai coderabbitai Bot left a comment
Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/PROJECT_PLAN.md`:
- Around line 376-383: Docs list of mixing presets is missing one item; update
the `/presets` endpoint description to enumerate all five presets separately.
Edit the text that currently reads "Balanced, Karaoke, Drum/Bass Practice,
Vocals Only" and replace it with "Balanced, Karaoke, Drum Practice, Bass
Practice, Vocals Only" so it matches the distinct presets implemented by the
get_presets/handler.py and the templates/api.yaml route definition. Ensure the
parenthetical list now contains five comma-separated items and stays consistent
with the codebase presets.
🧹 Nitpick comments (1)
docs/PROJECT_PLAN.md (1)

426-430: Add a language identifier to the fenced code block.

The code block on line 427 lacks a language specifier, flagged by markdownlint (MD040).

Proposed fix
-```
+```text
 ValidateInput → RunDemucs (30min timeout) → RunWhisper (15min timeout) → MarkCompleted → SendNotification → CleanupUpload → Done
 On error: MarkFailed (DDB + S3 cleanup + notify) → Failed

Comment thread docs/PROJECT_PLAN.md
Comment on lines +376 to +383
**Endpoints** (4 new routes in `templates/api.yaml`, all Cognito-authorized):

| Method | Path | Handler | Description |
|--------|------|---------|-------------|
| `GET` | `/songs` | `list_songs/handler.py` | List user's songs (optional `?status=` filter) |
| `GET` | `/songs/{songId}` | `get_song/handler.py` | Song details + presigned stem/lyrics URLs (if COMPLETED) |
| `DELETE` | `/songs/{songId}` | `delete_song/handler.py` | Delete song record + S3 objects (any status) |
| `GET` | `/presets` | `get_presets/handler.py` | 5 mixing presets (Balanced, Karaoke, Drum/Bass Practice, Vocals Only) |
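
The MIXING_PRESETS payload itself is not shown in this thread. Purely as an illustration, a GET /presets response might look like the following; the ids, field names, and gain values are hypothetical, and only the five preset names come from the codebase:

```json
{
  "presets": [
    { "id": "balanced",      "name": "Balanced",      "gains": { "vocals": 1.0, "drums": 1.0, "bass": 1.0, "other": 1.0 } },
    { "id": "karaoke",       "name": "Karaoke",       "gains": { "vocals": 0.0, "drums": 1.0, "bass": 1.0, "other": 1.0 } },
    { "id": "drum_practice", "name": "Drum Practice", "gains": { "vocals": 1.0, "drums": 0.0, "bass": 1.0, "other": 1.0 } },
    { "id": "bass_practice", "name": "Bass Practice", "gains": { "vocals": 1.0, "drums": 1.0, "bass": 0.0, "other": 1.0 } },
    { "id": "vocals_only",   "name": "Vocals Only",   "gains": { "vocals": 1.0, "drums": 0.0, "bass": 0.0, "other": 0.0 } }
  ]
}
```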

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for MIXING_PRESETS definition across the codebase
echo "=== Searching for MIXING_PRESETS definition ==="
rg -B 2 -A 20 "MIXING_PRESETS" --type py

Repository: theadityamittal/unplugd

Length of output: 2325


🏁 Script executed:

#!/bin/bash
# Read the full MIXING_PRESETS definition from constants.py
cat -n functions/shared/constants.py | grep -A 60 "MIXING_PRESETS:"

Repository: theadityamittal/unplugd

Length of output: 1367


Fix incomplete preset list in description: "Drum/Bass Practice" conflates two separate presets.

The docs correctly state "5 mixing presets", but the parenthetical list is incomplete. It says "Balanced, Karaoke, Drum/Bass Practice, Vocals Only" (4 items), when it should list all 5 separately: "Balanced, Karaoke, Drum Practice, Bass Practice, Vocals Only". The codebase has distinct presets for Drum Practice and Bass Practice.

🤖 Prompt for AI Agents
In `@docs/PROJECT_PLAN.md` around lines 376 - 383, Docs list of mixing presets is
missing one item; update the `/presets` endpoint description to enumerate all
five presets separately. Edit the text that currently reads "Balanced, Karaoke,
Drum/Bass Practice, Vocals Only" and replace it with "Balanced, Karaoke, Drum
Practice, Bass Practice, Vocals Only" so it matches the distinct presets
implemented by the get_presets/handler.py and the templates/api.yaml route
definition. Ensure the parenthetical list now contains five comma-separated
items and stays consistent with the codebase presets.
