Retries & Error Handling

Every schedule and job has two retry-related fields:

| Field | Default | Description |
| --- | --- | --- |
| max_retries | 3 | Number of retry attempts after the first failure. 0 disables retries. |
| timeout | 30 | Seconds before an execution is considered timed out. |

Set these per schedule or per job at creation time, or update them on an existing schedule via PATCH.

| Event | Retries? | Why |
| --- | --- | --- |
| Push: endpoint returns 5xx | Yes | Server error, likely transient |
| Push: endpoint times out | Yes | May be temporary overload |
| Push: network error (DNS, connection refused) | Yes | Infrastructure issue, likely transient |
| Push: endpoint returns 4xx | No | Client error. Won’t fix itself on retry |
| Pull: handler throws an error | Yes | Reported as failure, retry scheduled |
| Pull: lease expires (no result reported) | Yes | Worker may have crashed |
| Retries exhausted | No | Job transitions to failed permanently |
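
The push rows of the table reduce to a small decision function. The sketch below is illustrative only: `shouldRetryPush` and the shape of its `outcome` argument are hypothetical names, not part of the Chronos API, and the code models the documented behavior rather than the server's actual implementation.

```javascript
// Hypothetical helper mirroring the push rows of the retry table.
// `outcome` is an assumed shape:
//   { kind: 'response', status } | { kind: 'timeout' } | { kind: 'network_error' }
function shouldRetryPush(outcome, retriesRemaining) {
  // Once retries are exhausted the job fails permanently.
  if (retriesRemaining <= 0) return false;

  switch (outcome.kind) {
    case 'timeout':       // may be temporary overload
    case 'network_error': // DNS failure, connection refused, ...
      return true;
    case 'response':
      // 5xx is treated as transient; 4xx will not fix itself on retry.
      return outcome.status >= 500 && outcome.status <= 599;
    default:
      return false;
  }
}

console.log(shouldRetryPush({ kind: 'response', status: 503 }, 2)); // true
console.log(shouldRetryPush({ kind: 'response', status: 404 }, 2)); // false
console.log(shouldRetryPush({ kind: 'timeout' }, 0));               // false
```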

Retries use exponential backoff with jitter, capped at 1 hour:

baseDelay = min(1000ms × 2^(attempt - 1), 3,600,000ms)
jitter = random(0, baseDelay)
delay = min(baseDelay + jitter, 3,600,000ms)

| Attempt | Base delay | Actual range |
| --- | --- | --- |
| 1 | 1s | 1-2s |
| 2 | 2s | 2-4s |
| 3 | 4s | 4-8s |
| 4 | 8s | 8-16s |
| 5 | 16s | 16-32s |
| 10 | ~8.5 min | 8.5-17 min |
| 13+ | 1 hour | exactly 1 hour (capped) |

The jitter prevents thundering-herd when many jobs fail simultaneously.
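
As a rough illustration, the backoff formula can be written out directly. This is a minimal sketch of the documented formula, not Chronos's own code; the function names and the injectable `random` parameter are assumptions made for the example.

```javascript
const BASE_MS = 1000;
const CAP_MS = 3600000; // 1 hour

// Base delay for a 1-indexed attempt, before jitter.
function baseDelayMs(attempt) {
  return Math.min(BASE_MS * 2 ** (attempt - 1), CAP_MS);
}

// Delay with jitter. `random` is injectable so the result can be made
// deterministic in tests; it defaults to Math.random.
function retryDelayMs(attempt, random = Math.random) {
  const base = baseDelayMs(attempt);
  const jitter = random() * base; // uniform in [0, base)
  return Math.min(base + jitter, CAP_MS);
}

// Smallest and largest delay the formula can produce for an attempt.
function delayRangeMs(attempt) {
  const base = baseDelayMs(attempt);
  return { min: base, max: Math.min(2 * base, CAP_MS) };
}

console.log(delayRangeMs(1));  // { min: 1000, max: 2000 }
console.log(delayRangeMs(5));  // { min: 16000, max: 32000 }
console.log(delayRangeMs(13)); // { min: 3600000, max: 3600000 }
```

From attempt 13 onward the base delay alone reaches the cap, so jitter no longer has any effect.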

Execution fails
├── Retries remaining? ──▶ Yes: schedule new execution after backoff delay
│                          Job status → retrying
│                          New execution created with trigger: system_retry
└── Retries exhausted? ──▶ Job status → failed (terminal)

Each retry creates a new execution. ctx.attempt tells your handler which attempt this is (1-indexed).
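
The flow above can be modeled as a single transition function. The sketch below is a hypothetical model for reasoning about attempt counts: `afterFailure` and the `nextExecution` field are invented for illustration, while the job statuses and the `system_retry` trigger come from the description above.

```javascript
// attempt is 1-indexed: attempt 1 is the original run, so with
// maxRetries = 3 the retries are attempts 2, 3 and 4.
function afterFailure(attempt, maxRetries) {
  if (attempt <= maxRetries) {
    // Retries remain: the job waits for a new execution.
    return {
      jobStatus: 'retrying',
      nextExecution: { trigger: 'system_retry', attempt: attempt + 1 },
    };
  }
  // Retries exhausted: terminal state.
  return { jobStatus: 'failed', nextExecution: null };
}

console.log(afterFailure(1, 3).jobStatus); // retrying
console.log(afterFailure(4, 3).jobStatus); // failed
console.log(afterFailure(1, 0).jobStatus); // failed
```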

{
  "name": "Fire-and-forget notification",
  "handler": "notify",
  "cron": "0 * * * *",
  "max_retries": 0
}

{
  "name": "Monthly billing",
  "handler": "charge-customer",
  "cron": "0 0 1 * *",
  "max_retries": 8,
  "timeout": 60
}

With 8 retries, the final attempt happens roughly 4 to 8.5 minutes after the first failure: the eight backoff delays (1s, 2s, 4s, ..., 128s) sum to 255 seconds before jitter and to at most 510 seconds with it.
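
To estimate how long a given max_retries value keeps a job alive, sum the backoff delays. The sketch below applies the backoff formula from this page; `retryWindowMs` is a hypothetical helper, and it ignores the time each execution itself takes.

```javascript
const CAP_MS = 3600000; // 1 hour cap on any single delay

// Total waiting time between the first failure and the final attempt,
// as a { min, max } range in milliseconds (jitter adds up to 1x the base).
function retryWindowMs(maxRetries) {
  let min = 0;
  let max = 0;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const base = Math.min(1000 * 2 ** (attempt - 1), CAP_MS);
    min += base;
    max += Math.min(2 * base, CAP_MS);
  }
  return { min, max };
}

console.log(retryWindowMs(8));  // { min: 255000, max: 510000 }
console.log(retryWindowMs(10)); // { min: 1023000, max: 2046000 }
```

Under these assumptions, 10 retries span roughly 17 to 34 minutes of backoff.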

One-off jobs can have different retry settings than their schedule pattern:

curl -X POST https://api.chronos.sh/v1/jobs \
  -H "Authorization: Bearer chrns_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Critical charge",
    "handler": "charge-customer",
    "max_retries": 10,
    "timeout": 120,
    "payload": { "invoiceId": "inv_123" }
  }'

With push delivery, if your endpoint doesn’t respond within timeout seconds, Chronos aborts the request and marks the execution as timeout. A retry is scheduled if attempts remain.

With pull delivery, the SDK does not enforce timeouts: your handler runs as long as it needs. However, Chronos tracks a lease on the server:

  1. When a job is claimed, a lease is set: lease_expires_at = now + timeout
  2. A background sweep checks for expired leases every 30 seconds
  3. If the lease expired without a result, the execution is marked timeout
  4. A retry is scheduled if attempts remain

If your handler routinely takes longer than the configured timeout (30 seconds by default), increase timeout so the lease doesn’t expire while you’re still working.
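
The lease steps can be pictured with a small in-memory model. This is a hypothetical sketch, not the server's implementation: the `claim` and `sweep` functions, the `running` status, and the `leaseExpiresAt` field name are assumptions. Only the `now + timeout` rule and the `timeout` status come from the steps above.

```javascript
// Step 1: claiming sets the lease to now + timeout (timeout is in seconds).
function claim(execution, timeoutSeconds, now) {
  return {
    ...execution,
    status: 'running',
    leaseExpiresAt: now + timeoutSeconds * 1000,
  };
}

// Steps 2-3: one sweep pass marks every running execution whose lease
// has expired as 'timeout'. Executions with live leases are untouched.
function sweep(executions, now) {
  return executions.map((e) =>
    e.status === 'running' && e.leaseExpiresAt <= now
      ? { ...e, status: 'timeout' }
      : e
  );
}

const t0 = 0;
const claimed = [claim({ id: 'a' }, 30, t0), claim({ id: 'b' }, 120, t0)];
const swept = sweep(claimed, t0 + 60000); // sweep runs 60s after the claim
console.log(swept.map((e) => `${e.id}:${e.status}`)); // [ 'a:timeout', 'b:running' ]
```

Because the sweep runs periodically rather than continuously, an expired lease is presumably noticed only at the next pass, so detection can lag the expiry itself.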

Your handler’s behavior determines the execution outcome:

chronos.worker.handle('process-payment', async (ctx) => {
  // Throw to trigger a retry (if attempts remain)
  const result = await chargeCustomer(ctx.payload.customerId);
  if (!result.success) {
    throw new Error(`Charge failed: ${result.error}`);
  }
  // Return to mark as completed
  return { chargeId: result.id };
});

  • Return a value (or void) → execution completed
  • Throw an error → execution failed, error message captured (truncated to 4KB), retry scheduled if attempts remain

Distinguish retryable vs terminal failures

If you know a failure is permanent (bad data, invalid state), retrying only wastes attempts. Because Chronos retries every handler failure until retries are exhausted, return a result instead of throwing when you detect a terminal case:

chronos.worker.handle('send-email', async (ctx) => {
  const user = await db.users.findById(ctx.payload.userId);
  // Terminal: user deleted, no point retrying
  if (!user) {
    console.warn(`User ${ctx.payload.userId} not found, skipping`);
    return { skipped: true, reason: 'user_not_found' };
  }
  // This might throw on transient network issues → retry is appropriate
  await emailService.send(user.email, ctx.payload.template);
  return { sent: true };
});
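
If several handlers need the same distinction, one option is to centralize it in a wrapper. The sketch below is one possible pattern, not an SDK feature: `TerminalError` and `skipOnTerminal` are hypothetical names defined here. It relies only on the rule that returning completes the execution and throwing fails it.

```javascript
// Hypothetical marker class for failures that retrying cannot fix.
class TerminalError extends Error {}

// Wraps a handler so that a TerminalError becomes a normal return value
// (execution completed), while any other error still propagates
// (execution failed, retry scheduled if attempts remain).
function skipOnTerminal(handler) {
  return async (ctx) => {
    try {
      return await handler(ctx);
    } catch (err) {
      if (err instanceof TerminalError) {
        return { skipped: true, reason: err.message };
      }
      throw err;
    }
  };
}

// The wrapped function keeps the (ctx) => result shape used by handlers.
const handler = skipOnTerminal(async (ctx) => {
  if (!ctx.payload.userId) throw new TerminalError('missing_user_id');
  return { sent: true };
});

handler({ payload: {} }).then((result) => console.log(result));
// { skipped: true, reason: 'missing_user_id' }
```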

When building around the SDK, these error types help you handle different failure modes:

| Error | When | Your action |
| --- | --- | --- |
| ChronosConfigError | Invalid options at construction | Fix your config. Thrown at startup |
| ChronosApiError | API returned an error (non-2xx or success: false) | Check .status and .code |
| ChronosNetworkError | Fetch failed (DNS, TCP, TLS) | Transient. SDK retries poll automatically |
| ChronosHandlerError | Your handler threw | Logged internally, failure reported to API |

The SDK handles poll-loop errors internally (logs + retries after retryDelayMs). You don’t need to catch these. The worker keeps running.
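
For code that calls the API directly, a small classifier can turn these error types into log messages. The sketch below assumes each SDK error's `name` property equals its class name, which this page does not confirm; `describeSdkError` is a hypothetical helper. Only the `.status` and `.code` fields on ChronosApiError are taken from the table.

```javascript
// Assumption: err.name matches the class name from the table above.
function describeSdkError(err) {
  switch (err.name) {
    case 'ChronosConfigError':
      return 'fatal: fix the configuration and restart';
    case 'ChronosApiError':
      return `api error: status=${err.status} code=${err.code}`;
    case 'ChronosNetworkError':
      return 'transient: the SDK retries polling automatically';
    case 'ChronosHandlerError':
      return 'handler threw: failure was reported to the API';
    default:
      return `unexpected: ${err.message}`;
  }
}

// Stand-in error object for illustration; the values are made up.
const apiError = Object.assign(new Error('not found'), {
  name: 'ChronosApiError',
  status: 404,
  code: 'not_found',
});
console.log(describeSdkError(apiError)); // api error: status=404 code=not_found
```

If the SDK exports these classes, `instanceof` checks against them would be a more robust alternative to comparing names.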

Use the executions list endpoint to find failed jobs:

curl "https://api.chronos.sh/v1/executions?status=failed" \
  -H "Authorization: Bearer chrns_your_api_key"

Each failed execution includes:

  • error: the error message (from your handler’s thrown error or the HTTP response)
  • response_code: HTTP status code (push delivery only)
  • duration_ms: how long the execution ran before failing
  • trigger: system (first attempt) or system_retry (retry)
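
These fields are enough to build a quick failure summary. The sketch below assumes you have already extracted an array of execution objects from the response; the response envelope is not described here, so `summarizeFailures` and the sample records are hypothetical.

```javascript
// Counts failures by trigger and, for push deliveries, by HTTP status.
function summarizeFailures(executions) {
  const summary = { firstAttempts: 0, retries: 0, byResponseCode: {} };
  for (const e of executions) {
    // 'system_retry' marks a retry; anything else is counted as a first attempt.
    if (e.trigger === 'system_retry') summary.retries += 1;
    else summary.firstAttempts += 1;

    // response_code is only present for push delivery.
    if (e.response_code != null) {
      const key = String(e.response_code);
      summary.byResponseCode[key] = (summary.byResponseCode[key] || 0) + 1;
    }
  }
  return summary;
}

// Made-up sample records, for illustration only.
const sample = [
  { error: 'HTTP 503', response_code: 503, duration_ms: 120, trigger: 'system' },
  { error: 'HTTP 503', response_code: 503, duration_ms: 95, trigger: 'system_retry' },
  { error: 'Charge failed', duration_ms: 40, trigger: 'system_retry' },
];
console.log(summarizeFailures(sample));
// { firstAttempts: 1, retries: 2, byResponseCode: { '503': 2 } }
```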