9 changes: 9 additions & 0 deletions .changeset/quiet-plums-speak.md
@@ -0,0 +1,9 @@
---
"@workflow/sveltekit": patch
"@workflow/builders": patch
"@workflow/errors": patch
"@workflow/core": patch
"@workflow/next": patch
---

Increase flow route limit to max fluid duration and fail run if a single replay exceeds 300s
Review comment (Member):

Blocking: The changeset says "exceeds 300s" but REPLAY_TIMEOUT_MS is 240_000 (240s). Should be:

Increase flow route limit to max fluid duration and fail run if a single replay exceeds 240s

2 changes: 1 addition & 1 deletion packages/builders/src/vercel-build-output-api.ts
@@ -110,7 +110,7 @@ export class VercelBuildOutputAPIBuilder extends BaseBuilder {
// Create package.json and .vc-config.json for workflows function
await this.createPackageJson(workflowsFuncDir, 'commonjs');
await this.createVcConfig(workflowsFuncDir, {
- maxDuration: 60,
+ maxDuration: 'max',
experimentalTriggers: [WORKFLOW_QUEUE_TRIGGER],
runtime: this.config.runtime,
});
38 changes: 37 additions & 1 deletion packages/core/src/runtime.ts
@@ -5,7 +5,10 @@ import {
WorkflowRuntimeError,
} from '@workflow/errors';
import { classifyRunError } from './classify-error.js';
- import { MAX_QUEUE_DELIVERIES } from './runtime/constants.js';
+ import {
+ MAX_QUEUE_DELIVERIES,
+ REPLAY_TIMEOUT_MS,
+ } from './runtime/constants.js';
import { parseWorkflowName } from '@workflow/utils/parse-name';
import {
type Event,
@@ -161,6 +164,37 @@ export function workflowEntrypoint(

const spanLinks = await linkToCurrentContext();

// --- Replay timeout guard ---
// If the replay takes longer than the timeout, fail the run and exit.
// This must be lower than the function's maxDuration (180s) to ensure
Review comment (Member):

Blocking: This comment says maxDuration (180s) but the flow route is now maxDuration: 'max' (this very PR changes it). The comment should be updated to reflect reality — something like:

// This must be lower than the function's maxDuration to ensure
// the failure is recorded before the platform kills the function.
// With maxDuration: 'max', the platform limit depends on the plan
// (e.g. 300s on Pro). 240s leaves at least 60s of headroom on Pro.
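
The headroom arithmetic in the suggested comment can be sketched as follows. This is an illustration only: `replayHeadroomMs` is a hypothetical helper that does not exist in the PR, and the 300s figure is the reviewer's example for Pro rather than a value verified here.

```typescript
// Value copied from the constant added in this PR (packages/core/src/runtime/constants.ts).
const REPLAY_TIMEOUT_MS = 240_000;

// Hypothetical helper: time left between the replay guard firing and the
// platform limit. A negative result means the platform would kill the
// function before the guard gets a chance to fire.
function replayHeadroomMs(platformLimitSeconds: number): number {
  return platformLimitSeconds * 1000 - REPLAY_TIMEOUT_MS;
}
```

With the reviewer's example limit of 300s, `replayHeadroomMs(300)` evaluates to `60000`, which matches the "at least 60s of headroom" wording.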

// the failure is recorded before the platform kills the function.
const replayTimeout = setTimeout(async () => {
runtimeLogger.error('Workflow replay exceeded timeout', {
workflowRunId: runId,
timeoutMs: REPLAY_TIMEOUT_MS,
});
try {
const world = getWorld();
await world.events.create(
runId,
{
eventType: 'run_failed',
specVersion: SPEC_VERSION_CURRENT,
eventData: {
error: {
message: `Workflow replay exceeded maximum duration (${REPLAY_TIMEOUT_MS / 1000}s)`,
},
errorCode: RUN_ERROR_CODES.REPLAY_TIMEOUT,
},
},
{ requestId }
);
} catch {
// Best effort — process exits regardless
}
process.exit(1);
}, REPLAY_TIMEOUT_MS);
Review comment (Member):

Non-blocking: This is the first use of process.exit(1) in the runtime (as opposed to CLI code). It's acceptable here since the timeout is a last resort, but consider adding replayTimeout.unref() after this line (or on line 196 itself). An unref'd timer won't keep the Node.js event loop alive, so if the normal workflow path completes but the .finally(() => clearTimeout(...)) somehow doesn't execute (e.g., an uncaught exception in the promise chain), the process can still exit naturally instead of hanging for 240s.

This is purely defensive — the .finally() should always run in practice — but it's cheap and eliminates a class of potential hangs.
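
A minimal sketch of the `unref()` suggestion, assuming a simplified guard. `withReplayGuard` and `onTimeout` are hypothetical names used for illustration and are not part of the runtime. The sketch shows the timer being unref'd so it cannot keep the Node.js event loop alive on its own, and being cleared on both the success and failure paths.

```typescript
// Hypothetical helper for illustration; the PR inlines this logic in workflowEntrypoint.
async function withReplayGuard<T>(
  work: () => Promise<T>,
  timeoutMs: number,
  onTimeout: () => void
): Promise<T> {
  const timer = setTimeout(onTimeout, timeoutMs);
  // In Node.js, setTimeout returns a Timeout object that has unref().
  // The optional call is a no-op in runtimes where the handle is a number.
  (timer as unknown as { unref?: () => void }).unref?.();
  try {
    return await work();
  } finally {
    // Runs whether work() resolves or rejects, so the guard never fires late.
    clearTimeout(timer);
  }
}
```

In the PR, the role of `onTimeout` would be played by the handler that records `run_failed` and then calls `process.exit(1)`; that mapping is an assumption of this sketch, not something the diff is structured around.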


// Invoke user workflow within the propagated trace context and baggage
return await withTraceContext(traceContext, async () => {
// Set workflow context as baggage for automatic propagation
@@ -525,6 +559,8 @@ export function workflowEntrypoint(
); // End trace
}
); // End withWorkflowBaggage
}).finally(() => {
clearTimeout(replayTimeout);
}); // End withTraceContext
}
);
7 changes: 7 additions & 0 deletions packages/core/src/runtime/constants.ts
@@ -11,3 +11,10 @@
// At 48 attempts the total elapsed time is approximately 20 hours, which is
// safely under the 24-hour message visibility limit.
export const MAX_QUEUE_DELIVERIES = 48;

// Maximum time allowed for a single workflow replay execution (in ms).
// If a replay exceeds this duration, the run is failed and the process exits.
// This must be lower than the function's maxDuration to ensure the
// timeout handler has time to post the run_failed event before the platform
// kills the function.
export const REPLAY_TIMEOUT_MS = 240_000;
Review comment (Member):

Non-blocking: Worth noting that on the Hobby plan, maxDuration: 'max' resolves to 60s, so the 240s timeout will never fire — the platform hard-kills the function first, leaving no run_failed event. The run will eventually fail via MAX_QUEUE_DELIVERIES (48 retries), but there won't be a clear REPLAY_TIMEOUT error code.

This is probably acceptable — Hobby plan workflows are expected to be short-lived — but a short comment here explaining the plan-dependent behavior would help future readers. For example:

// Note: On plans where maxDuration < REPLAY_TIMEOUT_MS (e.g., Hobby at 60s),
// the platform will kill the function before this fires. In that case, VQS
// retries handle the failure via MAX_QUEUE_DELIVERIES.
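
The plan-dependent behavior described above can be sketched as a small decision function. `expectedFailureCode` is a hypothetical name, and the platform limits passed to it are the reviewer's stated figures, not values confirmed against platform documentation. The two error codes are the ones defined in `RUN_ERROR_CODES`.

```typescript
// Value copied from the constant added in this PR.
const REPLAY_TIMEOUT_MS = 240_000;

type ExpectedFailure = 'REPLAY_TIMEOUT' | 'MAX_DELIVERIES_EXCEEDED';

// Hypothetical helper: which error code a stuck replay is expected to end with.
// The guard can only record REPLAY_TIMEOUT if the platform limit is strictly
// longer than the guard's own timeout; otherwise the function is killed first
// and the run eventually fails through queue redelivery.
function expectedFailureCode(platformLimitMs: number): ExpectedFailure {
  return platformLimitMs > REPLAY_TIMEOUT_MS
    ? 'REPLAY_TIMEOUT'
    : 'MAX_DELIVERIES_EXCEEDED';
}
```

Under the reviewer's figures, a 60s limit yields `MAX_DELIVERIES_EXCEEDED` and a 300s limit yields `REPLAY_TIMEOUT`.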

2 changes: 2 additions & 0 deletions packages/errors/src/error-codes.ts
@@ -10,6 +10,8 @@ export const RUN_ERROR_CODES = {
RUNTIME_ERROR: 'RUNTIME_ERROR',
/** Run exceeded the maximum number of queue deliveries */
MAX_DELIVERIES_EXCEEDED: 'MAX_DELIVERIES_EXCEEDED',
/** Workflow replay exceeded the maximum allowed duration */
REPLAY_TIMEOUT: 'REPLAY_TIMEOUT',
} as const;

export type RunErrorCode =
2 changes: 1 addition & 1 deletion packages/next/src/builder-deferred.ts
@@ -1109,7 +1109,7 @@ export async function getNextBuilderDeferred() {
experimentalTriggers: [STEP_QUEUE_TRIGGER],
},
workflows: {
- maxDuration: 60,
+ maxDuration: 'max',
experimentalTriggers: [WORKFLOW_QUEUE_TRIGGER],
},
};
2 changes: 1 addition & 1 deletion packages/next/src/builder-eager.ts
@@ -420,7 +420,7 @@ export async function getNextBuilderEager() {
experimentalTriggers: [STEP_QUEUE_TRIGGER],
},
workflows: {
- maxDuration: 60,
+ maxDuration: 'max',
experimentalTriggers: [WORKFLOW_QUEUE_TRIGGER],
},
};
2 changes: 1 addition & 1 deletion packages/sveltekit/src/index.ts
@@ -19,7 +19,7 @@ process.on('beforeExit', () => {
{
file: '.vercel/output/functions/.well-known/workflow/v1/flow.func/.vc-config.json',
config: {
- maxDuration: 60,
+ maxDuration: 'max',
experimentalTriggers: [
{
type: 'queue/v2beta',