feat: add LTX-2 19B audio-to-video model (ltx-2-a2v) #193
Conversation
Add LTX-2 audio-to-video as a lipsync model — audio in, video out, optional image as first frame. Registered across the model schema, ai-sdk provider, old provider, and sync action definition.

- New per-model Zod schema in `definitions/models/ltx-a2v.ts`
- Registered in `LIPSYNC_MODELS` with prompt passthrough
- Added `ltx2AudioToVideo` method to `FalProvider`
- Added `ltx-2-a2v` to the sync action model enum (image now optional)
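The "optional image as first frame" behavior can be sketched as a tiny payload builder (a hypothetical sketch — `buildA2vInput` and the camelCase argument names are illustrative; the snake_case keys mirror the fal payload fields mentioned in this PR, not actual SDK exports):

```typescript
// Hypothetical sketch of how an ltx-2-a2v payload could be assembled:
// audio is required, image is an optional first frame.
interface A2vArgs {
  prompt: string;
  audioUrl: string;
  imageUrl?: string;
}

function buildA2vInput(args: A2vArgs): Record<string, string> {
  const input: Record<string, string> = {
    prompt: args.prompt,
    audio_url: args.audioUrl,
  };
  // image_url is included only when an image was actually provided
  if (args.imageUrl !== undefined) {
    input.image_url = args.imageUrl;
  }
  return input;
}
```

The point of the conditional is that the payload omits `image_url` entirely rather than sending an empty value.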
📝 Walkthrough

This PR adds support for the LTX-2 19B audio-to-video model across the SDK: a new model definition with a comprehensive schema, fal provider integration, and the related sync action changes.
Sequence diagram

sequenceDiagram
participant user as User
participant sync as LipsyncAction
participant falprov as FalProvider
participant falapi as Fal API
user->>sync: call lipsync(audioUrl, model: ltx-2-a2v, image?)
sync->>sync: validate inputs & model guard
alt ltx-2-a2v model
sync->>falprov: ltx2AudioToVideo({audioUrl, imageUrl?, ...options})
falprov->>falprov: upload audioUrl & imageUrl (if present)
falprov->>falapi: fal.subscribe('fal-ai/ltx-2-19b/audio-to-video', input)
falapi-->>falprov: video generation result
falprov-->>sync: {video, seed, prompt}
else other models (omnihuman-v1.5, etc)
sync->>falprov: call model-specific method (required image)
falprov-->>sync: result
end
sync-->>user: LipsyncResult
rect rgba(100, 200, 150, 0.5)
note right of falprov: conditional imageUrl included only when image provided for ltx-2-a2v
end
Estimated code review effort: 🎯 3 (moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/providers/fal.ts (1)
432-447: document this new public api and name the args type.
`ltx2AudioToVideo` is now part of the public provider surface, but its parameter shape is anonymous and the new entry point is undocumented. a named interface + jsdoc would make this a lot easier to consume, meow.

suggested refactor
+export interface Ltx2AudioToVideoArgs {
+  prompt: string;
+  audioUrl: string;
+  imageUrl?: string;
+  matchAudioLength?: boolean;
+  numFrames?: number;
+  videoSize?: string;
+  useMultiscale?: boolean;
+  fps?: number;
+  guidanceScale?: number;
+  numInferenceSteps?: number;
+  seed?: number;
+  enablePromptExpansion?: boolean;
+  audioStrength?: number;
+  imageStrength?: number;
+}
+
+/**
+ * generate video from audio and prompt, with an optional first-frame image.
+ */
-  async ltx2AudioToVideo(args: {
-    prompt: string;
-    audioUrl: string;
-    imageUrl?: string;
-    matchAudioLength?: boolean;
-    numFrames?: number;
-    videoSize?: string;
-    useMultiscale?: boolean;
-    fps?: number;
-    guidanceScale?: number;
-    numInferenceSteps?: number;
-    seed?: number;
-    enablePromptExpansion?: boolean;
-    audioStrength?: number;
-    imageStrength?: number;
-  }) {
+  async ltx2AudioToVideo(args: Ltx2AudioToVideoArgs) {

as per coding guidelines, "use interfaces for object type definitions in typescript" and "ensure all public functions and classes have jsdoc comments".
Also applies to: 755-757
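Beyond documentation, a named interface also lets call sites build and reuse typed argument objects. A minimal sketch under the field names proposed in the refactor above (the values here are illustrative):

```typescript
// Sketch: with a named args interface, callers can share a base
// object and override fields with a typed spread instead of
// rebuilding anonymous object literals.
interface Ltx2AudioToVideoArgs {
  prompt: string;
  audioUrl: string;
  imageUrl?: string;
  matchAudioLength?: boolean;
  numFrames?: number;
  fps?: number;
  seed?: number;
}

const baseArgs: Ltx2AudioToVideoArgs = {
  prompt: "a person speaking to camera",
  audioUrl: "https://example.com/voice.mp3",
};

// Typed spread-and-override; baseArgs itself is left untouched.
const seededArgs: Ltx2AudioToVideoArgs = { ...baseArgs, seed: 42 };
```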
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/providers/fal.ts` around lines 432 - 447, Create a named interface (e.g., Ltx2AudioToVideoArgs) for the anonymous parameter object used by ltx2AudioToVideo and replace the inline type with that interface; add JSDoc above the interface describing each field (prompt, audioUrl, imageUrl, matchAudioLength, numFrames, videoSize, useMultiscale, fps, guidanceScale, numInferenceSteps, seed, enablePromptExpansion, audioStrength, imageStrength) and a JSDoc comment for the ltx2AudioToVideo method explaining the function, parameter type, and return value; also apply the same interface/documentation pattern to the other two occurrences referenced (around the code at the other spots noted, e.g., the lines mentioned) so all public entry points use a named, documented interface.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai-sdk/providers/fal.ts`:
- Around line 179-180: The "ltx-2-a2v" entry in the provider map should get its
own API path instead of reusing the generic lip-sync mapping: change the mapping
for the "ltx-2-a2v" key in the provider map (found in
src/ai-sdk/providers/fal.ts) to a dedicated path and handler so it does not
inherit the generic lip-sync behavior; update the request creation logic for
that key to match src/definitions/models/ltx-a2v.ts semantics (prefer image_url
over video_url, do not drop or omit prompt when it is missing/empty, and ensure
seed and fps are passed through without being gated by the old isLtx2 check).
Also apply the same separate-path fix for the duplicate mappings around the
other occurrence mentioned (lines referenced 565-570) so both places use the
dedicated "ltx-2-a2v" path and payload rules.
In `@src/definitions/actions/sync.ts`:
- Around line 19-25: The schema currently makes image optional unconditionally
which mismatches runtime checks in lipsync; update syncInputSchema so image is
required for all models except "ltx-2-a2v" (use a conditional/when on the model
enum to make filePathSchema.required() for the other values), and update the
LipsyncOptions type to reflect the same conditional (e.g., a discriminated union
or narrower type where image is required unless model === "ltx-2-a2v"); ensure
the image field name and the lipsync function's runtime checks remain consistent
with the schema so validation fails early instead of during execution.
In `@src/definitions/models/ltx-a2v.ts`:
- Around line 47-63: The schema currently gives num_frames a default (121) which
forces it into validated payloads despite match_audio_length defaulting to true;
remove the .default(121) on the num_frames Zod definition (leave it
.optional()/.number()/.int()/.min()/.max()) so that when match_audio_length is
true the validated object does not include num_frames by default; update any
tests or callers expecting a default to instead handle the absence of num_frames
or set it explicitly when match_audio_length is false (refer to the
match_audio_length and num_frames properties in the ltx-a2v schema).
---
Nitpick comments:
In `@src/providers/fal.ts`:
- Around line 432-447: Create a named interface (e.g., Ltx2AudioToVideoArgs) for
the anonymous parameter object used by ltx2AudioToVideo and replace the inline
type with that interface; add JSDoc above the interface describing each field
(prompt, audioUrl, imageUrl, matchAudioLength, numFrames, videoSize,
useMultiscale, fps, guidanceScale, numInferenceSteps, seed,
enablePromptExpansion, audioStrength, imageStrength) and a JSDoc comment for the
ltx2AudioToVideo method explaining the function, parameter type, and return
value; also apply the same interface/documentation pattern to the other two
occurrences referenced (around the code at the other spots noted, e.g., the
lines mentioned) so all public entry points use a named, documented interface.
📒 Files selected for processing (5)
- src/ai-sdk/providers/fal.ts
- src/definitions/actions/sync.ts
- src/definitions/models/index.ts
- src/definitions/models/ltx-a2v.ts
- src/providers/fal.ts
| "ltx-2-a2v": "fal-ai/ltx-2-19b/audio-to-video", | ||
| }; |
give `ltx-2-a2v` its own path here.
adding this model to the generic lip-sync bucket makes it inherit behavior that does not match `src/definitions/models/ltx-a2v.ts`: the branch still prefers `video_url` over `image_url`, silently drops `prompt` when it is missing or empty, and later keeps `seed`/`fps` behind the old `isLtx2` gate. that means valid-looking sdk calls can either submit an invalid payload or lose supported controls.
suggested fix
- const isLtx2 = this.modelId === "ltx-2-19b-distilled";
+ const isLtx2 = this.modelId === "ltx-2-19b-distilled";
+ const isLtxA2v = this.modelId === "ltx-2-a2v";
@@
- if (videoFile) {
- input.video_url = await fileToUrl(videoFile);
- } else if (imageFile) {
+ if (isLtxA2v && videoFile) {
+ throw new Error(
+ "ltx-2-a2v accepts audio with an optional image, not video input",
+ );
+ }
+ if (imageFile) {
input.image_url = await fileToUrl(imageFile);
+ } else if (videoFile) {
+ input.video_url = await fileToUrl(videoFile);
}
if (audioFile) {
input.audio_url = await fileToUrl(audioFile);
}
- if (
- prompt &&
- (this.modelId === "omnihuman-v1.5" || this.modelId === "ltx-2-a2v")
- ) {
+ if (isLtxA2v && !prompt) {
+ throw new Error("ltx-2-a2v requires a prompt");
+ }
+ if (prompt && (this.modelId === "omnihuman-v1.5" || isLtxA2v)) {
input.prompt = prompt;
}
@@
- if (isLtx2) {
+ if (isLtx2 || isLtxA2v) {
input.seed = options.seed;
}
@@
- if (isLtx2) {
+ if (isLtx2 || isLtxA2v) {
input.fps = options.fps;
   }

Also applies to: 565-570
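The routing rules in the suggested fix can be condensed into a pure helper (an illustrative sketch, not the provider's actual code — `routeLipsyncInput` and its argument names are hypothetical):

```typescript
// Sketch of the corrected input routing: ltx-2-a2v rejects video
// input and requires a prompt; an image takes precedence over a
// video for models that accept either.
interface MediaArgs {
  modelId: string;
  imageUrl?: string;
  videoUrl?: string;
  prompt?: string;
}

function routeLipsyncInput(args: MediaArgs): Record<string, string> {
  const isLtxA2v = args.modelId === "ltx-2-a2v";
  if (isLtxA2v && args.videoUrl) {
    throw new Error(
      "ltx-2-a2v accepts audio with an optional image, not video input",
    );
  }
  if (isLtxA2v && !args.prompt) {
    throw new Error("ltx-2-a2v requires a prompt");
  }
  const input: Record<string, string> = {};
  if (args.imageUrl) {
    input.image_url = args.imageUrl; // image wins over video
  } else if (args.videoUrl) {
    input.video_url = args.videoUrl;
  }
  if (args.prompt) {
    input.prompt = args.prompt;
  }
  return input;
}
```

Failing fast at routing time keeps invalid payloads from ever reaching the fal endpoint.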
| .enum(["wan-25", "omnihuman-v1.5", "veed-fabric-1.0", "ltx-2-a2v"]) | ||
| .optional() | ||
| .default("wan-25") | ||
| .describe("Lip sync / avatar backend model"), | ||
| image: filePathSchema.describe("Input image"), | ||
| image: filePathSchema | ||
| .optional() | ||
| .describe("Input image (optional for ltx-2-a2v)"), |
keep the image requirement in the contract.
`syncInputSchema` and `LipsyncOptions` now allow `image` to be missing for every model, then lines 90-92 throw for three of them. that widens the public action contract and pushes a validation error into execution, which will confuse any cli/docs/ui that read the definition. please encode the image requirement in the schema/type instead of only re-checking it inside `lipsync`.
Also applies to: 57-58, 90-92
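One way to encode that requirement at the type level is a discriminated union on the model field (a sketch under assumed field names — the real `LipsyncOptions` would carry more fields):

```typescript
// Sketch: image is required for every model except ltx-2-a2v,
// expressed as a discriminated union on "model".
type LipsyncOptions =
  | { model: "ltx-2-a2v"; audio: string; image?: string }
  | {
      model: "wan-25" | "omnihuman-v1.5" | "veed-fabric-1.0";
      audio: string;
      image: string;
    };

// Runtime guard mirroring the type, usable inside schema validation:
function requiresImage(model: LipsyncOptions["model"]): boolean {
  return model !== "ltx-2-a2v";
}
```

With this shape, omitting `image` for `wan-25` is a compile-time error rather than a runtime throw inside `lipsync`.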
  match_audio_length: z
    .boolean()
    .optional()
    .default(true)
    .describe(
      "When enabled, num_frames is calculated from audio duration and FPS",
    ),
  num_frames: z
    .number()
    .int()
    .min(9)
    .max(481)
    .optional()
    .default(121)
    .describe(
      "Number of frames to generate (used when match_audio_length is false)",
    ),
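For reference, the `match_audio_length` behavior the schema describes (frames derived from audio duration and FPS, kept within the 9–481 bounds) can be sketched as a small helper (name and rounding are illustrative, not SDK code):

```typescript
// Derive a frame count from audio duration and fps, clamped to the
// 9..481 range the schema enforces (sketch only).
function framesFromAudio(durationSeconds: number, fps: number): number {
  const raw = Math.round(durationSeconds * fps);
  return Math.min(481, Math.max(9, raw));
}
```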
keep `num_frames` out of the default path. meow
`match_audio_length` defaults to true, but `num_frames` also defaults to 121, so the validated payload always carries both knobs. that makes the schema contradict the intended "match audio length by default" behavior. when inputs flow through the executor, they get validated with defaults applied and sent directly to fal without any filtering—so fal receives both knobs set, which can pin callers to 121 frames.
suggested fix
-const ltxA2vInputSchema = z.object({
+const ltxA2vInputSchema = z
+ .object({
prompt: z.string().describe("The prompt to generate the video from"),
audio_url: urlSchema.describe(
"The URL of the audio to generate the video from",
),
@@
num_frames: z
.number()
.int()
.min(9)
.max(481)
.optional()
- .default(121)
.describe(
"Number of frames to generate (used when match_audio_length is false)",
),
@@
video_quality: videoQualitySchema
.optional()
.default("high")
.describe("Output video quality"),
-});
+ })
+ .refine(
+ ({ match_audio_length, num_frames }) =>
+ match_audio_length !== false || num_frames !== undefined,
+ {
+ path: ["num_frames"],
+ message: "num_frames is required when match_audio_length is false",
+ },
+ );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/definitions/models/ltx-a2v.ts` around lines 47 - 63, The schema currently
gives num_frames a default (121) which forces it into validated payloads despite
match_audio_length defaulting to true; remove the .default(121) on the
num_frames Zod definition (leave it .optional()/.number()/.int()/.min()/.max())
so that when match_audio_length is true the validated object does not include
num_frames by default; update any tests or callers expecting a default to
instead handle the absence of num_frames or set it explicitly when
match_audio_length is false (refer to the match_audio_length and num_frames
properties in the ltx-a2v schema).
Summary
Add LTX-2 19B audio-to-video as a new lipsync model (`ltx-2-a2v`). Takes audio + prompt as required inputs, optional image as first frame, and generates a synced video via `fal-ai/ltx-2-19b/audio-to-video`.

Changes
- New model definition (`definitions/models/ltx-a2v.ts`): full Zod input/output schema covering all fal API params (camera LoRA, multiscale, audio strength, etc.)
- Registered in the model index (`definitions/models/index.ts`)
- AI SDK provider (`ai-sdk/providers/fal.ts`): added to `LIPSYNC_MODELS` map + prompt passthrough (like omnihuman)
- Provider (`providers/fal.ts`): new `ltx2AudioToVideo()` method with all relevant params
- Sync action (`definitions/actions/sync.ts`): added `ltx-2-a2v` to the model enum, made `image` optional (only ltx-2-a2v doesn't require it), added validation for non-ltx models

Key design decisions
- Uses the full `fal-ai/ltx-2-19b/audio-to-video` endpoint since there's no distilled variant for a2v
- `match_audio_length: true` by default so the video auto-matches audio duration

Companion PR
`feature/ltx-2-a2v` branch (model routing, pricing, prompt passthrough, OpenAPI spec)