feat: add LTX-2 19B audio-to-video model (ltx-2-a2v)#193

Merged
SecurityQQ merged 1 commit into main from feature/ltx-2-a2v
Apr 2, 2026

Conversation

@SecurityQQ
Contributor

Summary

Add LTX-2 19B audio-to-video as a new lipsync model (ltx-2-a2v). Takes audio + prompt as required inputs, optional image as first frame, and generates a synced video.

  • Fal endpoint: fal-ai/ltx-2-19b/audio-to-video
  • Model ID: ltx-2-a2v
  • Category: lipsync (same routing as omnihuman/veed/wan)
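
The routing described above can be sketched as a single new entry in the provider's lipsync model map (the LIPSYNC_MODELS name comes from this PR's description; the surrounding entries are omitted here):

```typescript
// The new entry added to the fal provider's lipsync routing map.
// Existing entries (omnihuman/veed/wan) are omitted in this sketch.
const LIPSYNC_MODELS: Record<string, string> = {
  "ltx-2-a2v": "fal-ai/ltx-2-19b/audio-to-video",
};

// Resolving a model id to its fal endpoint:
const endpoint = LIPSYNC_MODELS["ltx-2-a2v"];
// endpoint === "fal-ai/ltx-2-19b/audio-to-video"
```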

Changes

  • New model schema (definitions/models/ltx-a2v.ts): Full Zod input/output schema covering all fal API params (camera LoRA, multiscale, audio strength, etc.)
  • Registered in model index (definitions/models/index.ts)
  • ai-sdk provider (ai-sdk/providers/fal.ts): Added to LIPSYNC_MODELS map + prompt passthrough (like omnihuman)
  • Old provider (providers/fal.ts): New ltx2AudioToVideo() method with all relevant params
  • Sync action (definitions/actions/sync.ts): Added ltx-2-a2v to model enum, made image optional (only ltx-2-a2v doesn't require it), added validation for non-ltx models
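
The validation in the last bullet can be sketched as a small guard (the helper name is assumed, not the repo's; the model behavior matches the PR description):

```typescript
// Only ltx-2-a2v may omit the input image; every other lipsync model
// still requires one. Hypothetical helper name, mirroring the guard
// described for definitions/actions/sync.ts.
function assertImagePresent(model: string, image?: string): void {
  if (model !== "ltx-2-a2v" && image === undefined) {
    throw new Error(`model "${model}" requires an input image`);
  }
}

assertImagePresent("ltx-2-a2v");          // ok: image is optional here
assertImagePresent("wan-25", "face.png"); // ok: image supplied
```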

Key design decisions

  • Registered as a lipsync model (not a VIDEO_MODEL) since the primary input is audio, matching the existing pattern for omnihuman/veed/wan
  • Uses the non-distilled endpoint (fal-ai/ltx-2-19b/audio-to-video) since there's no distilled variant for a2v
  • Prompt is required (unlike other lipsync models that skip prompt) — handled via model-specific passthrough
  • match_audio_length: true by default so video auto-matches audio duration
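
A hedged sketch of the resulting request payload, using the fal parameter names mentioned in this PR (audio_url, image_url, match_audio_length); the builder function itself is illustrative, not the repo's code:

```typescript
interface A2vInput {
  prompt: string;
  audio_url: string;
  image_url?: string;
  match_audio_length: boolean;
}

// Build the audio-to-video input: prompt and audio are required,
// the first-frame image is optional, and match_audio_length defaults
// to true so the video duration follows the audio.
function buildA2vInput(
  prompt: string,
  audioUrl: string,
  imageUrl?: string,
): A2vInput {
  const input: A2vInput = {
    prompt,
    audio_url: audioUrl,
    match_audio_length: true,
  };
  if (imageUrl !== undefined) {
    input.image_url = imageUrl; // optional first-frame conditioning
  }
  return input;
}
```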

Companion PR

  • Gateway: vargHQ/gateway — feature/ltx-2-a2v branch (model routing, pricing, prompt passthrough, OpenAPI spec)

Add LTX-2 audio-to-video as a lipsync model — audio in, video out,
optional image as first frame. Registered across model schema, ai-sdk
provider, old provider, and sync action definition.

- New per-model Zod schema in definitions/models/ltx-a2v.ts
- Registered in LIPSYNC_MODELS with prompt passthrough
- Added ltx2AudioToVideo method to FalProvider
- Added ltx-2-a2v to sync action model enum (image now optional)
@coderabbitai
Contributor

coderabbitai bot commented Apr 2, 2026

📝 Walkthrough

This PR adds support for the LTX-2 19B audio-to-video model across the SDK. It includes a new model definition with a comprehensive schema, fal provider integration via the ltx2AudioToVideo method, and extended lipsync action support with optional image conditioning for audio-to-video workflows.

Changes

  • Model definition & exports (src/definitions/models/ltx-a2v.ts, src/definitions/models/index.ts): added a comprehensive ltx-2-a2v model definition with a Zod schema for audio-to-video inputs (audio URL, optional image, frame/alignment options, motion controls, safety settings) and registered it in the model exports.
  • Fal provider integration (src/ai-sdk/providers/fal.ts, src/providers/fal.ts): added the ltx-2-a2v model mapping to its fal endpoint; updated FalVideoModel.doGenerate to include the prompt for ltx-2-a2v; implemented the new ltx2AudioToVideo method with file upload, input mapping, and defaults for multiscale/prompt expansion; exported a convenience function.
  • Lipsync action schema & dispatch (src/definitions/actions/sync.ts): extended the model union to include ltx-2-a2v; made the image field optional with model-specific descriptions; added a runtime guard requiring an image for non-ltx-2-a2v models; routed ltx-2-a2v calls through the new falProvider.ltx2AudioToVideo path with a conditional imageUrl parameter.

sequence diagram

sequenceDiagram
    participant user as User
    participant sync as LipsyncAction
    participant falprov as FalProvider
    participant falapi as Fal API

    user->>sync: call lipsync(audioUrl, model: ltx-2-a2v, image?)
    sync->>sync: validate inputs & model guard
    alt ltx-2-a2v model
        sync->>falprov: ltx2AudioToVideo({audioUrl, imageUrl?, ...options})
        falprov->>falprov: upload audioUrl & imageUrl (if present)
        falprov->>falapi: fal.subscribe('fal-ai/ltx-2-19b/audio-to-video', input)
        falapi-->>falprov: video generation result
        falprov-->>sync: {video, seed, prompt}
    else other models (omnihuman-v1.5, etc)
        sync->>falprov: call model-specific method (required image)
        falprov-->>sync: result
    end
    sync-->>user: LipsyncResult

    rect rgba(100, 200, 150, 0.5)
    note right of falprov: conditional imageUrl included only when image provided for ltx-2-a2v
    end

Estimated code review effort

🎯 3 (moderate) | ⏱️ ~20 minutes

Poem

🎙️ audio dances with the video stream,
optional images paint the dream,
ltx-2 brings the lips to sync,
no image? no problem for this link,
fal routes the call, the model flies—
audio-to-video magic in the skies ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check — ✅ Passed: the title directly and specifically describes the main change (adding a new LTX-2 audio-to-video model to the lipsync category), which aligns with all the file changes in the PR.
  • Description check — ✅ Passed: the description is well detailed and clearly related to the changeset, covering the new model schema, provider integration, sync action updates, and key design decisions.
  • Docstring coverage — ✅ Passed: docstring coverage is 100.00%, above the required threshold of 80.00%.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/providers/fal.ts (1)

432-447: Document this new public API and name the args type.

ltx2AudioToVideo is now part of the public provider surface, but its parameter shape is anonymous and the new entry point is undocumented. A named interface plus JSDoc would make it much easier to consume.

suggested refactor
+export interface Ltx2AudioToVideoArgs {
+  prompt: string;
+  audioUrl: string;
+  imageUrl?: string;
+  matchAudioLength?: boolean;
+  numFrames?: number;
+  videoSize?: string;
+  useMultiscale?: boolean;
+  fps?: number;
+  guidanceScale?: number;
+  numInferenceSteps?: number;
+  seed?: number;
+  enablePromptExpansion?: boolean;
+  audioStrength?: number;
+  imageStrength?: number;
+}
+
+/**
+ * generate video from audio and prompt, with an optional first-frame image.
+ */
-  async ltx2AudioToVideo(args: {
-    prompt: string;
-    audioUrl: string;
-    imageUrl?: string;
-    matchAudioLength?: boolean;
-    numFrames?: number;
-    videoSize?: string;
-    useMultiscale?: boolean;
-    fps?: number;
-    guidanceScale?: number;
-    numInferenceSteps?: number;
-    seed?: number;
-    enablePromptExpansion?: boolean;
-    audioStrength?: number;
-    imageStrength?: number;
-  }) {
+  async ltx2AudioToVideo(args: Ltx2AudioToVideoArgs) {

as per coding guidelines, "use interfaces for object type definitions in typescript" and "ensure all public functions and classes have jsdoc comments".

Also applies to: 755-757

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/providers/fal.ts` around lines 432 - 447, Create a named interface (e.g.,
Ltx2AudioToVideoArgs) for the anonymous parameter object used by
ltx2AudioToVideo and replace the inline type with that interface; add JSDoc
above the interface describing each field (prompt, audioUrl, imageUrl,
matchAudioLength, numFrames, videoSize, useMultiscale, fps, guidanceScale,
numInferenceSteps, seed, enablePromptExpansion, audioStrength, imageStrength)
and a JSDoc comment for the ltx2AudioToVideo method explaining the function,
parameter type, and return value; also apply the same interface/documentation
pattern to the other two occurrences referenced (around the code at the other
spots noted, e.g., the lines mentioned) so all public entry points use a named,
documented interface.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e0bac096-140d-47f6-978c-2b74d4fbb1ae

📥 Commits

Reviewing files that changed from the base of the PR and between 494f37d and 37d4f9a.

📒 Files selected for processing (5)
  • src/ai-sdk/providers/fal.ts
  • src/definitions/actions/sync.ts
  • src/definitions/models/index.ts
  • src/definitions/models/ltx-a2v.ts
  • src/providers/fal.ts

Comment on lines +179 to 180
"ltx-2-a2v": "fal-ai/ltx-2-19b/audio-to-video",
};

⚠️ Potential issue | 🟠 Major

Give ltx-2-a2v its own path here.

Adding this model to the generic lip-sync bucket makes it inherit behavior that does not match src/definitions/models/ltx-a2v.ts: the branch still prefers video_url over image_url, silently drops prompt when it is missing or empty, and later keeps seed/fps behind the old isLtx2 gate. That means valid-looking SDK calls can either submit an invalid payload or lose supported controls.

suggested fix
-    const isLtx2 = this.modelId === "ltx-2-19b-distilled";
+    const isLtx2 = this.modelId === "ltx-2-19b-distilled";
+    const isLtxA2v = this.modelId === "ltx-2-a2v";
@@
-      if (videoFile) {
-        input.video_url = await fileToUrl(videoFile);
-      } else if (imageFile) {
+      if (isLtxA2v && videoFile) {
+        throw new Error(
+          "ltx-2-a2v accepts audio with an optional image, not video input",
+        );
+      }
+      if (imageFile) {
         input.image_url = await fileToUrl(imageFile);
+      } else if (videoFile) {
+        input.video_url = await fileToUrl(videoFile);
       }
       if (audioFile) {
         input.audio_url = await fileToUrl(audioFile);
       }
 
-      if (
-        prompt &&
-        (this.modelId === "omnihuman-v1.5" || this.modelId === "ltx-2-a2v")
-      ) {
+      if (isLtxA2v && !prompt) {
+        throw new Error("ltx-2-a2v requires a prompt");
+      }
+      if (prompt && (this.modelId === "omnihuman-v1.5" || isLtxA2v)) {
         input.prompt = prompt;
       }
@@
-      if (isLtx2) {
+      if (isLtx2 || isLtxA2v) {
         input.seed = options.seed;
       }
@@
-      if (isLtx2) {
+      if (isLtx2 || isLtxA2v) {
         input.fps = options.fps;
       }

Also applies to: 565-570

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai-sdk/providers/fal.ts` around lines 179 - 180, The "ltx-2-a2v" entry in
the provider map should get its own API path instead of reusing the generic
lip-sync mapping: change the mapping for the "ltx-2-a2v" key in the provider map
(found in src/ai-sdk/providers/fal.ts) to a dedicated path and handler so it
does not inherit the generic lip-sync behavior; update the request creation
logic for that key to match src/definitions/models/ltx-a2v.ts semantics (prefer
image_url over video_url, do not drop or omit prompt when it is missing/empty,
and ensure seed and fps are passed through without being gated by the old isLtx2
check). Also apply the same separate-path fix for the duplicate mappings around
the other occurrence mentioned (lines referenced 565-570) so both places use the
dedicated "ltx-2-a2v" path and payload rules.

Comment on lines +19 to +25
.enum(["wan-25", "omnihuman-v1.5", "veed-fabric-1.0", "ltx-2-a2v"])
.optional()
.default("wan-25")
.describe("Lip sync / avatar backend model"),
-image: filePathSchema.describe("Input image"),
+image: filePathSchema
+  .optional()
+  .describe("Input image (optional for ltx-2-a2v)"),

⚠️ Potential issue | 🟠 Major

Keep the image requirement in the contract.

syncInputSchema and LipsyncOptions now allow image to be missing for every model, then lines 90-92 throw for three of them. That widens the public action contract and pushes a validation error into execution, which will confuse any CLI/docs/UI that reads the definition. Please encode the image requirement in the schema/type instead of only re-checking it inside lipsync.

Also applies to: 57-58, 90-92

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/definitions/actions/sync.ts` around lines 19 - 25, The schema currently
makes image optional unconditionally which mismatches runtime checks in lipsync;
update syncInputSchema so image is required for all models except "ltx-2-a2v"
(use a conditional/when on the model enum to make filePathSchema.required() for
the other values), and update the LipsyncOptions type to reflect the same
conditional (e.g., a discriminated union or narrower type where image is
required unless model === "ltx-2-a2v"); ensure the image field name and the
lipsync function's runtime checks remain consistent with the schema so
validation fails early instead of during execution.

Comment on lines +47 to +63
match_audio_length: z
.boolean()
.optional()
.default(true)
.describe(
"When enabled, num_frames is calculated from audio duration and FPS",
),
num_frames: z
.number()
.int()
.min(9)
.max(481)
.optional()
.default(121)
.describe(
"Number of frames to generate (used when match_audio_length is false)",
),

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

(condensed) The bot ran a series of repository scripts to locate src/definitions/models/ltx-a2v.ts, confirm the zod version in package.json, and trace how validated inputs flow from the executor's validateAndPrepare step into FalProvider. The scripts confirmed that zod applies defaults during validation and that the validated object is passed to fal.subscribe without any downstream filtering of num_frames, so the default value always reaches the fal API.

Keep num_frames out of the default path.

match_audio_length defaults to true, but num_frames also defaults to 121, so the validated payload always carries both knobs. That makes the schema contradict the intended "match audio length by default" behavior. When inputs flow through the executor, they are validated with defaults applied and sent directly to fal without any filtering, so fal receives both knobs set, which can pin callers to 121 frames.

suggested fix
-const ltxA2vInputSchema = z.object({
+const ltxA2vInputSchema = z
+  .object({
   prompt: z.string().describe("The prompt to generate the video from"),
   audio_url: urlSchema.describe(
     "The URL of the audio to generate the video from",
   ),
@@
   num_frames: z
     .number()
     .int()
     .min(9)
     .max(481)
     .optional()
-    .default(121)
     .describe(
       "Number of frames to generate (used when match_audio_length is false)",
     ),
@@
   video_quality: videoQualitySchema
     .optional()
     .default("high")
     .describe("Output video quality"),
-});
+  })
+  .refine(
+    ({ match_audio_length, num_frames }) =>
+      match_audio_length !== false || num_frames !== undefined,
+    {
+      path: ["num_frames"],
+      message: "num_frames is required when match_audio_length is false",
+    },
+  );
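
The behavior behind this comment can be reproduced with a minimal stand-in for zod's .default() (not zod itself): defaults are filled in at validation time, so the validated object always carries num_frames.

```typescript
interface LtxA2vKnobs {
  match_audio_length?: boolean;
  num_frames?: number;
}

// Mimics what .optional().default(...) does during parsing:
// any missing key is filled with its default value.
function applyDefaults(input: LtxA2vKnobs) {
  return {
    match_audio_length: input.match_audio_length ?? true,
    num_frames: input.num_frames ?? 121,
  };
}

const validated = applyDefaults({});
// Both knobs are now set, so fal receives num_frames: 121 even though
// the caller asked for nothing but the audio-length default.
```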
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/definitions/models/ltx-a2v.ts` around lines 47 - 63, The schema currently
gives num_frames a default (121) which forces it into validated payloads despite
match_audio_length defaulting to true; remove the .default(121) on the
num_frames Zod definition (leave it .optional()/.number()/.int()/.min()/.max())
so that when match_audio_length is true the validated object does not include
num_frames by default; update any tests or callers expecting a default to
instead handle the absence of num_frames or set it explicitly when
match_audio_length is false (refer to the match_audio_length and num_frames
properties in the ltx-a2v schema).

@SecurityQQ SecurityQQ merged commit 089da73 into main Apr 2, 2026
2 checks passed