Auto-Scaling Policies

Auto-scaling policies let your environment automatically adjust the number of running instances based on real-time performance metrics. This helps your application stay responsive during traffic spikes while reducing costs during low-demand periods.

Quant Cloud supports auto-scaling based on three key metrics:

CPU Utilization: Scales based on average CPU usage across all instances in the environment. Ideal for CPU-intensive applications or when processing load correlates with CPU demand.

Memory Utilization: Scales based on average memory consumption across instances. Useful for applications with variable memory requirements or data processing workloads.

Requests Per Second (RPS): Scales based on incoming request volume to your application. Well suited to web applications where traffic patterns directly indicate scaling needs.

Setting Up Auto-Scaling

Access Scaling Configuration

Navigate to your environment’s details page
Click “Edit Config” to modify scaling settings
Locate the scaling configuration section

Enable Auto-Scaling

If the environment currently uses a fixed instance count, switch to auto-scaling mode
Set Min Instances: The minimum number of instances that will always run
Set Max Instances: The maximum number of instances during peak demand
Your application will scale between these boundaries
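
The effect of the two boundaries can be sketched as a simple clamp. This is an illustration only: `clamp_instances` and its parameter names are hypothetical and are not part of any Quant Cloud API.

```python
def clamp_instances(desired: int, min_instances: int, max_instances: int) -> int:
    """Keep a desired instance count within the configured boundaries."""
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")
    return max(min_instances, min(desired, max_instances))


# Assuming Min Instances = 2 and Max Instances = 10:
print(clamp_instances(1, 2, 10))   # 2  (never below the minimum)
print(clamp_instances(6, 2, 10))   # 6  (inside the range, unchanged)
print(clamp_instances(25, 2, 10))  # 10 (never above the maximum)
```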

Configure Scaling Policies

For each scaling metric you want to use:

Select Metric: Choose from CPU Utilization, Memory Utilization, or Requests Per Second
Set Target Value: Define the target threshold that triggers scaling:
- CPU Utilization: Percentage (e.g., 75 for 75% average CPU usage)
- Memory Utilization: Percentage (e.g., 70 for 70% average memory usage)
- Requests Per Second: Number per instance (e.g., 10 for 10 requests/sec per instance)
Save Policy: Click “Save” to activate the scaling policy
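
If you keep a record of your policies outside the dashboard (for review or documentation), the value ranges above can be checked with a small sketch like the following. The metric identifiers and the `ScalingPolicy` class are assumptions made for illustration; they do not reflect a Quant Cloud API or configuration format.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the three metrics offered in the dashboard.
PERCENT_METRICS = {"cpu_utilization", "memory_utilization"}
RATE_METRICS = {"requests_per_second"}


@dataclass(frozen=True)
class ScalingPolicy:
    metric: str
    target: float  # percent for CPU/memory, requests/sec per instance for RPS

    def __post_init__(self) -> None:
        if self.metric in PERCENT_METRICS:
            if not 0 < self.target <= 100:
                raise ValueError(f"{self.metric} target must be in (0, 100]")
        elif self.metric in RATE_METRICS:
            if self.target <= 0:
                raise ValueError(f"{self.metric} target must be positive")
        else:
            raise ValueError(f"unknown metric: {self.metric}")


# The example targets from the list above.
policies = [
    ScalingPolicy("cpu_utilization", 75),
    ScalingPolicy("memory_utilization", 70),
    ScalingPolicy("requests_per_second", 10),
]
```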

Multiple Scaling Policies

You can configure multiple policies simultaneously:

Scale-out occurs when any policy threshold is exceeded
Scale-in happens when all metrics are below their targets
Each policy operates independently to provide comprehensive scaling coverage
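
The any/all rule above can be expressed as a short sketch. The function and metric names are hypothetical, and the treatment of a metric sitting exactly at its target (no action) is an assumption rather than documented platform behavior.

```python
def scaling_direction(metrics: dict[str, float], targets: dict[str, float]) -> str:
    """Return "out", "in", or "hold" for the current metric readings."""
    if not targets:
        return "hold"  # no policies configured, nothing to decide
    # Scale out as soon as ANY policy is above its target.
    if any(metrics[name] > target for name, target in targets.items()):
        return "out"
    # Scale in only when ALL policies are below their targets.
    if all(metrics[name] < target for name, target in targets.items()):
        return "in"
    return "hold"


targets = {"cpu": 75, "memory": 70, "rps": 10}
print(scaling_direction({"cpu": 80, "memory": 40, "rps": 4}, targets))  # out
print(scaling_direction({"cpu": 50, "memory": 40, "rps": 4}, targets))  # in
print(scaling_direction({"cpu": 75, "memory": 40, "rps": 4}, targets))  # hold
```

Because a single high metric is enough to scale out, adding a policy can only make scale-out more likely and scale-in less likely.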

How Auto-Scaling Works

Scale-Out Behavior
When any configured metric exceeds its target value for a sustained period, Quant Cloud automatically launches additional instances up to your defined maximum. This ensures your application can handle increased demand without performance degradation.

Scale-In Behavior
When metrics fall below target values and remain stable, the platform removes instances down to your defined minimum. This optimizes costs during low-demand periods while maintaining service availability.
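
This page does not state how many instances are added or removed per scaling event. For rough capacity planning, one common rule (the one Kubernetes' Horizontal Pod Autoscaler documents) sizes the fleet in proportion to how far the metric is from its target. The sketch below uses that rule as an assumption; it is not a description of Quant Cloud's internal algorithm, and all names are hypothetical.

```python
import math


def proportional_estimate(
    current_instances: int,
    observed: float,
    target: float,
    min_instances: int,
    max_instances: int,
) -> int:
    """Estimate an instance count with desired = ceil(current * observed / target)."""
    if target <= 0:
        raise ValueError("target must be positive")
    desired = math.ceil(current_instances * observed / target)
    # The result always stays inside the configured Min/Max boundaries.
    return max(min_instances, min(desired, max_instances))


# 4 instances at 90% average CPU against a 75% target: ceil(4.8) = 5
print(proportional_estimate(4, 90, 75, 2, 10))   # 5
# 4 instances at 30% average CPU: ceil(1.6) = 2
print(proportional_estimate(4, 30, 75, 2, 10))   # 2
# A large spike is capped at the maximum
print(proportional_estimate(4, 300, 75, 2, 10))  # 10
```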

Scaling Logic

Scaling decisions are based on average metrics across all running instances
The platform includes built-in cooldown periods to prevent rapid scaling fluctuations
Multiple policies work together to provide comprehensive coverage of different load patterns
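
To make the first two points concrete, here is a minimal sketch of averaging a metric across instances and gating actions behind a cooldown. The cooldown length used below (300 seconds) is an assumed value for illustration; the platform's actual cooldown periods are built in and are not specified on this page. `CooldownGate` is a hypothetical name.

```python
from statistics import mean
from typing import Optional


class CooldownGate:
    """Allow a scaling action only if enough time has passed since the last one."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_action_at: Optional[float] = None

    def try_act(self, now: float) -> bool:
        """Return True and record the action if the cooldown has elapsed."""
        if (
            self._last_action_at is not None
            and now - self._last_action_at < self.cooldown_seconds
        ):
            return False
        self._last_action_at = now
        return True


# Decisions use the average across instances, not any single instance.
cpu_per_instance = [90, 60, 75]
print(mean(cpu_per_instance))  # 75

gate = CooldownGate(cooldown_seconds=300)  # assumed value
print(gate.try_act(now=0))    # True:  first action is allowed
print(gate.try_act(now=120))  # False: still inside the cooldown window
print(gate.try_act(now=300))  # True:  cooldown has elapsed
```

Note that one hot instance (90% here) does not trigger scaling on its own if the fleet average stays at or below the target.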

Monitoring Auto-Scaling Performance

Environment Details

Track current scaling status through your environment’s Details tab:

Current running instance count
Real-time CPU, memory, and RPS metrics
Scaling status and recent scaling events

Detailed Metrics Analysis

For comprehensive performance monitoring and historical scaling data, use the metrics and logging tools covered in the Monitoring Application Health section. These tools provide:

Historical performance trends
Scaling event correlation with metrics
Detailed resource utilization analysis

Best Practices

Start Conservative
Begin with moderate target values and observe your application’s behavior. You can always adjust thresholds based on real performance data.

Monitor and Adjust
Regularly review scaling events and metric trends to ensure your policies are effective. Look for patterns of excessive scaling or insufficient responsiveness.

Consider Application Characteristics

Startup Time: Applications with longer startup times may need more aggressive scale-out policies (for example, lower target values so that scale-out begins earlier)
Resource Usage: Match scaling metrics to your application’s primary resource constraints
Traffic Patterns: Configure policies that align with your typical usage patterns

Test Your Configuration
Use load testing tools to validate your scaling policies under controlled conditions before relying on them in production.
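
A stepped ramp is one simple way to structure such a test, since it gives the platform time to react at each load level. The sketch below only generates the stages; feeding them to a load testing tool is left to you. The function name and the example numbers are hypothetical, and the hold time should be chosen to exceed your application's startup time.

```python
def ramp_schedule(
    start_rps: int, peak_rps: int, steps: int, hold_seconds: int
) -> list[tuple[int, int]]:
    """Return (target_rps, duration_seconds) stages rising evenly from start to peak."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    stride = (peak_rps - start_rps) / (steps - 1)
    return [(round(start_rps + i * stride), hold_seconds) for i in range(steps)]


# Five stages from 10 to 50 requests/sec, each held for two minutes.
for rps, seconds in ramp_schedule(10, 50, steps=5, hold_seconds=120):
    print(f"{rps} rps for {seconds}s")
```

While the test runs, compare the instance count in the Details tab against each stage to see when scale-out starts and how long new instances take to absorb load.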

Auto-scaling provides an effective way to balance performance and cost efficiency, automatically adapting your infrastructure to match demand patterns.