Job Manager Configuration
This page details the configuration options for the AsyncEndpoints Job Manager, which handles job lifecycle management, retry logic, and claim operations.
Overview
The Job Manager configuration controls how jobs are managed throughout their lifecycle, including retry mechanisms, timeout handling, and job claiming behavior. These settings are crucial for reliability and performance.
Job Manager Configuration Properties
DefaultMaxRetries
- Type: int
- Default: 3
- Description: Maximum number of retries for failed jobs before marking as failed permanently
- Impact: Affects system resilience and job completion rates
// No retries - fail immediately
options.JobManagerConfiguration.DefaultMaxRetries = 0;
// Standard 3 retries (default)
options.JobManagerConfiguration.DefaultMaxRetries = 3;
// More retries for critical operations
options.JobManagerConfiguration.DefaultMaxRetries = 10;
RetryDelayBaseSeconds
- Type: double
- Default: 2.0
- Description: Base delay in seconds for exponential backoff retry mechanism
- Impact: Affects retry timing and system load during retries
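The retry delay follows an exponential backoff pattern: delay = (2 ^ retryCount) * baseDelay, where ^ denotes exponentiation and retryCount is 0 for the first retry, consistent with the sequences in the comments below.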
// Fast retries for transient issues
options.JobManagerConfiguration.RetryDelayBaseSeconds = 0.5; // 0.5s, 1s, 2s, 4s...
// Standard retry timing (default)
options.JobManagerConfiguration.RetryDelayBaseSeconds = 2.0; // 2s, 4s, 8s, 16s...
// Slow retries for expensive operations
options.JobManagerConfiguration.RetryDelayBaseSeconds = 10.0; // 10s, 20s, 40s, 80s...
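To make the schedule concrete, the following sketch evaluates the formula for the default settings. RetryDelay is a hypothetical helper written for this page, not an AsyncEndpoints API, and it assumes retryCount is 0 for the first retry. Because ^ is the XOR operator in C#, the exponent is computed with Math.Pow.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of AsyncEndpoints): delay before a given retry,
// assuming retryCount is 0 for the first retry.
static TimeSpan RetryDelay(int retryCount, double baseDelaySeconds) =>
    TimeSpan.FromSeconds(Math.Pow(2, retryCount) * baseDelaySeconds);

int maxRetries = 3;            // documented default for DefaultMaxRetries
double baseDelaySeconds = 2.0; // documented default for RetryDelayBaseSeconds

// One delay per retry: (2 ^ 0) * 2.0, (2 ^ 1) * 2.0, (2 ^ 2) * 2.0
double[] delays = Enumerable.Range(0, maxRetries)
    .Select(retryCount => RetryDelay(retryCount, baseDelaySeconds).TotalSeconds)
    .ToArray();

Console.WriteLine(string.Join(", ", delays.Select(d => $"{d}s"))); // 2s, 4s, 8s
```

Changing baseDelaySeconds scales every delay by the same factor, while each extra retry doubles the final delay.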
JobClaimTimeout
- Type: TimeSpan
- Default: TimeSpan.FromMinutes(5)
- Description: Timeout for job claim operations to prevent indefinite claiming
- Impact: Affects distributed job recovery timing
// Short timeout for fast recovery
options.JobManagerConfiguration.JobClaimTimeout = TimeSpan.FromMinutes(1);
// Standard timeout (default)
options.JobManagerConfiguration.JobClaimTimeout = TimeSpan.FromMinutes(5);
// Long timeout for complex operations
options.JobManagerConfiguration.JobClaimTimeout = TimeSpan.FromMinutes(15);
MaxConcurrentJobs
- Type: int
- Default: 10
- Description: Maximum number of concurrent jobs that can be processed
- Impact: Controls resource usage and system load
// Conservative processing
options.JobManagerConfiguration.MaxConcurrentJobs = 5;
// Standard processing (default)
options.JobManagerConfiguration.MaxConcurrentJobs = 10;
// High throughput processing
options.JobManagerConfiguration.MaxConcurrentJobs = 50;
JobPollingIntervalMs
- Type: int
- Default: 1000 (1 second)
- Description: Polling interval in milliseconds for job status checks
- Impact: Affects responsiveness vs. system load
// Fast polling for responsive systems
options.JobManagerConfiguration.JobPollingIntervalMs = 100; // 100ms
// Standard polling (default)
options.JobManagerConfiguration.JobPollingIntervalMs = 1000; // 1s
// Conservative polling to reduce load
options.JobManagerConfiguration.JobPollingIntervalMs = 5000; // 5s
MaxClaimBatchSize
- Type: int
- Default: 10
- Description: Maximum number of jobs to claim in a single batch operation
- Impact: Affects batch processing efficiency
// Small batches for responsiveness
options.JobManagerConfiguration.MaxClaimBatchSize = 1;
// Standard batch size (default)
options.JobManagerConfiguration.MaxClaimBatchSize = 10;
// Large batches for throughput
options.JobManagerConfiguration.MaxClaimBatchSize = 50;
StaleJobClaimCheckInterval
- Type: TimeSpan
- Default: TimeSpan.FromMinutes(1)
- Description: Interval for checking for stale job claims during distributed recovery
- Impact: Affects recovery frequency for stuck jobs
// Frequent checks for fast recovery
options.JobManagerConfiguration.StaleJobClaimCheckInterval = TimeSpan.FromSeconds(30);
// Standard check interval (default)
options.JobManagerConfiguration.StaleJobClaimCheckInterval = TimeSpan.FromMinutes(1);
// Less frequent checks to reduce load
options.JobManagerConfiguration.StaleJobClaimCheckInterval = TimeSpan.FromMinutes(5);
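JobClaimTimeout and StaleJobClaimCheckInterval together bound how long a stuck job can go unnoticed. The sketch below is a rough estimate only. It assumes that a claim counts as stale once JobClaimTimeout has elapsed and that it is picked up on the next periodic check. This page does not confirm that behavior.

```csharp
using System;

TimeSpan jobClaimTimeout = TimeSpan.FromMinutes(5);            // documented default
TimeSpan staleJobClaimCheckInterval = TimeSpan.FromMinutes(1); // documented default

// Assumption: the claim must first time out, and in the worst case the timeout
// lands just after a check, so a full check interval passes before detection.
TimeSpan worstCaseDetection = jobClaimTimeout + staleJobClaimCheckInterval;

Console.WriteLine($"Worst-case detection delay: {worstCaseDetection.TotalMinutes} minutes");
```

Under these assumptions, lowering either value shortens recovery, at the cost of more frequent checks or a higher chance of reclaiming jobs that are still running.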
Retry Configuration Examples
Fast Retry Strategy
For operations that fail due to transient issues:
options.JobManagerConfiguration.DefaultMaxRetries = 5;
options.JobManagerConfiguration.RetryDelayBaseSeconds = 1.0; // Quick retries: 1s, 2s, 4s, 8s, 16s
Conservative Retry Strategy
For operations where failures might be persistent:
options.JobManagerConfiguration.DefaultMaxRetries = 2;
options.JobManagerConfiguration.RetryDelayBaseSeconds = 5.0; // Slower retries: 5s, 10s
Aggressive Retry Strategy
For critical operations that must succeed:
options.JobManagerConfiguration.DefaultMaxRetries = 10;
options.JobManagerConfiguration.RetryDelayBaseSeconds = 2.0; // Standard timing with many retries
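When choosing between these strategies, it helps to know how long a job can spend waiting if every retry is used. The sketch below sums the delays for each strategy. TotalBackoffSeconds is a hypothetical helper, not a library API. It assumes the delay formula (2 ^ retryCount) * baseDelay with retryCount starting at 0, and it ignores the time spent running the job itself.

```csharp
using System;

// Hypothetical helper: total seconds spent waiting between attempts
// when all retries are exhausted.
static double TotalBackoffSeconds(int maxRetries, double baseDelaySeconds)
{
    double total = 0;
    for (int retryCount = 0; retryCount < maxRetries; retryCount++)
    {
        total += Math.Pow(2, retryCount) * baseDelaySeconds;
    }
    return total;
}

double fast = TotalBackoffSeconds(5, 1.0);         // 1 + 2 + 4 + 8 + 16 = 31
double conservative = TotalBackoffSeconds(2, 5.0); // 5 + 10 = 15
double aggressive = TotalBackoffSeconds(10, 2.0);  // 2 * (2^10 - 1) = 2046

Console.WriteLine($"Fast: {fast}s, Conservative: {conservative}s, Aggressive: {aggressive}s");
```

Under these assumptions the aggressive strategy can wait about 34 minutes in total before the job is marked as permanently failed, which is worth weighing against how quickly callers need a final result.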
Configuration for Different Scenarios
Development Configuration
options.JobManagerConfiguration.DefaultMaxRetries = 1; // Fewer retries for faster debugging
options.JobManagerConfiguration.RetryDelayBaseSeconds = 1.0; // Faster retries during development
options.JobManagerConfiguration.JobClaimTimeout = TimeSpan.FromMinutes(1); // Faster timeouts
options.JobManagerConfiguration.MaxConcurrentJobs = 5; // Lower limits for development
options.JobManagerConfiguration.JobPollingIntervalMs = 500; // More frequent polling
options.JobManagerConfiguration.MaxClaimBatchSize = 5; // Smaller batches
options.JobManagerConfiguration.StaleJobClaimCheckInterval = TimeSpan.FromMinutes(1); // Standard checks
Production Configuration for Critical Operations
options.JobManagerConfiguration.DefaultMaxRetries = 5; // More retries for resilience
options.JobManagerConfiguration.RetryDelayBaseSeconds = 2.0; // Standard exponential backoff
options.JobManagerConfiguration.JobClaimTimeout = TimeSpan.FromMinutes(10); // Longer for complex ops
options.JobManagerConfiguration.MaxConcurrentJobs = 50; // Higher concurrency for throughput
options.JobManagerConfiguration.JobPollingIntervalMs = 2000; // Balanced polling
options.JobManagerConfiguration.MaxClaimBatchSize = 20; // Larger batches for efficiency
options.JobManagerConfiguration.StaleJobClaimCheckInterval = TimeSpan.FromMinutes(2); // Regular checks
High-Throughput Configuration
options.JobManagerConfiguration.DefaultMaxRetries = 3; // Standard retries
options.JobManagerConfiguration.RetryDelayBaseSeconds = 1.5; // Faster initial retries
options.JobManagerConfiguration.JobClaimTimeout = TimeSpan.FromMinutes(5); // Standard timeout
options.JobManagerConfiguration.MaxConcurrentJobs = 100; // High concurrency
options.JobManagerConfiguration.JobPollingIntervalMs = 500; // Fast polling
options.JobManagerConfiguration.MaxClaimBatchSize = 50; // Large batches
options.JobManagerConfiguration.StaleJobClaimCheckInterval = TimeSpan.FromMinutes(1); // Regular checks
Performance Considerations
Retry Settings Impact
- Higher DefaultMaxRetries: Better success rate but longer processing time
- Lower RetryDelayBaseSeconds: Faster retries but more system load during failures
- Higher RetryDelayBaseSeconds: Less load during failures but longer total processing time
Concurrency Settings Impact
- Higher MaxConcurrentJobs: Better throughput but higher resource usage
- Lower MaxConcurrentJobs: Lower resource usage but potential queue buildup
Polling Settings Impact
- Higher JobPollingIntervalMs: Less system load but less responsive
- Lower JobPollingIntervalMs: More responsive but higher system load
Distributed System Considerations
For multi-instance deployments:
// In larger clusters, you might reduce some settings to avoid contention
options.JobManagerConfiguration.MaxClaimBatchSize = 5; // Smaller batches reduce contention
options.JobManagerConfiguration.JobPollingIntervalMs = 2000; // Reduce check frequency
options.JobManagerConfiguration.StaleJobClaimCheckInterval = TimeSpan.FromMinutes(2); // Balance recovery time
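A quick estimate shows why a longer polling interval matters as the cluster grows. The sketch below assumes every instance polls the job store independently at JobPollingIntervalMs. PollsPerSecond and the cluster size of 10 are illustrative and do not come from the library.

```csharp
using System;

// Hypothetical helper: aggregate polls per second across the cluster,
// assuming each instance polls independently at the configured interval.
static double PollsPerSecond(int instanceCount, int pollingIntervalMs) =>
    instanceCount * 1000.0 / pollingIntervalMs;

int instanceCount = 10; // illustrative cluster size

double atDefault = PollsPerSecond(instanceCount, 1000); // 10 polls per second
double atReduced = PollsPerSecond(instanceCount, 2000); // 5 polls per second

Console.WriteLine($"Default interval: {atDefault} polls/s, reduced interval: {atReduced} polls/s");
```

Doubling the interval halves the aggregate polling load, while each job waits at most one extra interval before being picked up.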
Monitoring and Adjustment
Configure based on observed behavior:
// Example: Adjust based on failure patterns.
// failureRate is a hypothetical value between 0.0 and 1.0 that you compute
// from your own job metrics; it is not defined on this page.
if (failureRate > 0.1) // If more than 10% of jobs fail
{
    options.JobManagerConfiguration.DefaultMaxRetries = 5; // Increase retries
    options.JobManagerConfiguration.RetryDelayBaseSeconds = 3.0; // Slower retries
}
else
{
    options.JobManagerConfiguration.DefaultMaxRetries = 2; // Reduce retries for healthy system
    options.JobManagerConfiguration.RetryDelayBaseSeconds = 1.0; // Faster retries
}
Error Handling Configuration
The job manager configuration also impacts error handling patterns:
// For operations where some failures are acceptable
options.JobManagerConfiguration.DefaultMaxRetries = 1;
options.JobManagerConfiguration.RetryDelayBaseSeconds = 1.0;
// For operations where success is critical
options.JobManagerConfiguration.DefaultMaxRetries = 7;
options.JobManagerConfiguration.RetryDelayBaseSeconds = 2.0;
Integration with Worker Configuration
Job manager settings should align with worker configuration:
options.WorkerConfigurations.MaximumConcurrency = 10;
options.JobManagerConfiguration.MaxConcurrentJobs = 15; // Slightly higher to allow for queuing
options.WorkerConfigurations.JobTimeoutMinutes = 30;
options.JobManagerConfiguration.JobClaimTimeout = TimeSpan.FromMinutes(15); // Shorter than job timeout
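The two relationships in this example can be checked at startup. The sketch below is one possible sanity check, not a library feature. It uses plain local variables in place of the real configuration objects, and the warning messages are illustrative.

```csharp
using System;
using System.Collections.Generic;

// Values copied from the example above; in a real application they would be
// read from your own configuration objects.
int workerMaximumConcurrency = 10;
int maxConcurrentJobs = 15;
TimeSpan jobTimeout = TimeSpan.FromMinutes(30);
TimeSpan jobClaimTimeout = TimeSpan.FromMinutes(15);

var warnings = new List<string>();

// The job manager limit should leave headroom above worker concurrency.
if (maxConcurrentJobs < workerMaximumConcurrency)
    warnings.Add("MaxConcurrentJobs is lower than the worker MaximumConcurrency.");

// The claim timeout is expected to be shorter than the worker job timeout.
if (jobClaimTimeout >= jobTimeout)
    warnings.Add("JobClaimTimeout is not shorter than the worker job timeout.");

Console.WriteLine(warnings.Count == 0
    ? "Job manager and worker settings are consistent."
    : string.Join(Environment.NewLine, warnings));
```

Collecting warnings instead of throwing lets the application log every mismatch in one pass.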
Troubleshooting Common Issues
Too Many Retries
- Reduce DefaultMaxRetries if failures are persistent
- Increase RetryDelayBaseSeconds to reduce system load during retries
Slow Recovery from Failures
- Decrease RetryDelayBaseSeconds for faster initial retries
- Decrease StaleJobClaimCheckInterval so stale claims are detected sooner; increase it only if recovery is too aggressive
Resource Exhaustion
- Decrease MaxConcurrentJobs to limit concurrent processing
- Increase JobPollingIntervalMs to reduce system load
Job Stuck in Processing
- Verify JobClaimTimeout is appropriate for your processing times
- Adjust StaleJobClaimCheckInterval for faster detection of stuck jobs
Proper job manager configuration is essential for achieving the right balance between system resilience, performance, and resource utilization.