Auto-Scaling Policies

Auto-scaling policies let your environment automatically adjust the number of running instances based on real-time performance metrics. This helps your application maintain performance during traffic spikes while reducing costs during low-demand periods.

Quant Cloud supports auto-scaling based on three key metrics:

CPU Utilization: Scales based on average CPU usage across all instances in the environment. Ideal for CPU-intensive applications or when processing load correlates with CPU demand.

Memory Utilization: Scales based on average memory consumption across instances. Useful for applications with variable memory requirements or data processing workloads.

Requests Per Second (RPS): Scales based on incoming request volume to your application. Perfect for web applications where traffic patterns directly indicate scaling needs.

Setting Up Auto-Scaling

Access Scaling Configuration

  1. Navigate to your environment’s details page
  2. Click “Edit Config” to modify scaling settings
  3. Locate the scaling configuration section

Enable Auto-Scaling

  1. If the environment currently uses a fixed instance count, switch to auto-scaling mode
  2. Set Min Instances: The minimum number of instances that will always run
  3. Set Max Instances: The maximum number of instances during peak demand
  4. Your application will scale between these boundaries
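
The effect of the two boundaries can be pictured as a simple clamp. The following Python sketch is illustrative only and is not Quant Cloud's implementation; the function name `clamp_instance_count` and its parameters are hypothetical.

```python
def clamp_instance_count(desired: int, min_instances: int, max_instances: int) -> int:
    """Keep a desired instance count within the configured boundaries."""
    if min_instances < 0 or max_instances < min_instances:
        raise ValueError("expected 0 <= min_instances <= max_instances")
    return max(min_instances, min(desired, max_instances))


# Example with Min Instances = 2 and Max Instances = 10 (illustrative values).
print(clamp_instance_count(1, 2, 10))   # 2
print(clamp_instance_count(6, 2, 10))   # 6
print(clamp_instance_count(25, 2, 10))  # 10
```

Whatever the metrics suggest, the running count never drops below the minimum or rises above the maximum.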

Configure Scaling Policies

For each scaling metric you want to use:

  1. Select Metric: Choose from CPU Utilization, Memory Utilization, or Requests Per Second

  2. Set Target Value: Define the target threshold that triggers scaling:

    • CPU Utilization: Percentage (e.g., 75 for 75% average CPU usage)
    • Memory Utilization: Percentage (e.g., 70 for 70% average memory usage)
    • Requests Per Second: Number per instance (e.g., 10 for 10 requests/sec per instance)
  3. Save Policy: Click “Save” to activate the scaling policy
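
To make the units above concrete, the following Python sketch models a policy as a metric name plus a target value and checks the value against the expected range. The `ScalingPolicy` class and the metric identifiers are hypothetical names chosen for illustration; policies in Quant Cloud are configured through the dashboard steps above, and this is not its API.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the three metrics described above.
PERCENT_METRICS = {"cpu_utilization", "memory_utilization"}
RATE_METRICS = {"requests_per_second"}


@dataclass(frozen=True)
class ScalingPolicy:
    metric: str
    target: float

    def __post_init__(self) -> None:
        if self.metric in PERCENT_METRICS:
            # CPU and memory targets are percentages.
            if not 0 < self.target <= 100:
                raise ValueError(f"{self.metric} target must be in (0, 100]")
        elif self.metric in RATE_METRICS:
            # RPS targets are a per-instance request rate.
            if self.target <= 0:
                raise ValueError("requests_per_second target must be positive")
        else:
            raise ValueError(f"unknown metric: {self.metric}")


# The example target values from the list above.
policies = [
    ScalingPolicy("cpu_utilization", 75),
    ScalingPolicy("memory_utilization", 70),
    ScalingPolicy("requests_per_second", 10),
]
```

A value such as `ScalingPolicy("cpu_utilization", 150)` raises `ValueError` in this sketch, since a percentage target above 100 could never be reached.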

Multiple Scaling Policies

You can configure multiple policies simultaneously:

  • Scale-out occurs when any policy threshold is exceeded
  • Scale-in happens when all metrics are below their targets
  • Each policy operates independently to provide comprehensive scaling coverage
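
The any/all rule above can be written as a small decision function. This is a minimal Python sketch of the stated rule, not the platform's actual code; the function name, the metric keys, and the "hold" outcome for a metric sitting exactly at its target are assumptions.

```python
from typing import Mapping


def scaling_decision(metrics: Mapping[str, float], targets: Mapping[str, float]) -> str:
    """Return "scale_out", "scale_in", or "hold" for the given readings."""
    if not targets:
        return "hold"
    # Scale out as soon as any metric exceeds its target.
    if any(metrics[name] > target for name, target in targets.items()):
        return "scale_out"
    # Scale in only when every metric is below its target.
    if all(metrics[name] < target for name, target in targets.items()):
        return "scale_in"
    # A metric exactly at its target triggers neither rule (assumption).
    return "hold"


targets = {"cpu_utilization": 75, "requests_per_second": 10}
print(scaling_decision({"cpu_utilization": 60, "requests_per_second": 14}, targets))  # scale_out
print(scaling_decision({"cpu_utilization": 40, "requests_per_second": 3}, targets))   # scale_in
```

In the first call CPU is well under its target, yet the RPS policy alone is enough to trigger a scale-out.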

How Auto-Scaling Works

Scale-Out Behavior

When any configured metric exceeds its target value for a sustained period, Quant Cloud automatically launches additional instances up to your defined maximum. This ensures your application can handle increased demand without performance degradation.

Scale-In Behavior

When metrics fall below target values and remain stable, the platform removes instances down to your defined minimum. This optimizes costs during low-demand periods while maintaining service availability.
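
One way to picture a "sustained" breach is to require several consecutive samples above the target before acting. The Python sketch below assumes this interpretation; the length of the evaluation window is not stated here, so the `required` parameter and the sample values are purely illustrative.

```python
from typing import Sequence


def sustained_breach(samples: Sequence[float], target: float, required: int) -> bool:
    """True if the most recent `required` samples all exceed `target`."""
    if required <= 0:
        raise ValueError("required must be positive")
    if len(samples) < required:
        return False
    return all(value > target for value in samples[-required:])


# CPU samples (percent) against a target of 75, needing 3 consecutive breaches.
print(sustained_breach([60, 80, 82, 79], target=75, required=3))  # True
print(sustained_breach([80, 82, 70, 90], target=75, required=3))  # False
```

The second series contains a dip to 70, so a brief spike on its own does not count as sustained.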

Scaling Logic

  • Scaling decisions are based on average metrics across all running instances
  • The platform includes built-in cooldown periods to prevent rapid scaling fluctuations
  • Multiple policies work together to provide comprehensive coverage of different load patterns
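
The first two points can be sketched together: average the per-instance readings, then check whether enough time has passed since the last scaling action. This Python sketch uses hypothetical function names, and the 300-second cooldown is an arbitrary assumption, since the platform's actual cooldown length is not documented here.

```python
from statistics import mean
from typing import Optional, Sequence


def average_metric(per_instance: Sequence[float]) -> float:
    """Average a metric across all running instances."""
    return mean(per_instance)


def cooldown_elapsed(now: float, last_scaling_at: Optional[float], cooldown_seconds: float) -> bool:
    """True if no scaling has happened yet or the cooldown has passed."""
    if last_scaling_at is None:
        return True
    return now - last_scaling_at >= cooldown_seconds


cpu = average_metric([90.0, 70.0, 80.0])
print(cpu)  # 80.0

# Timestamps in seconds; 300 is an assumed cooldown, not a documented value.
print(cooldown_elapsed(now=400, last_scaling_at=200, cooldown_seconds=300))  # False
print(cooldown_elapsed(now=600, last_scaling_at=200, cooldown_seconds=300))  # True
```

Because the decision uses the average, a single hot instance at 90% does not by itself push the environment over a higher target; the cooldown check then keeps back-to-back scaling actions apart.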

Monitoring Auto-Scaling Performance

Environment Details

Track current scaling status through your environment’s Details tab:

  • Current running instance count
  • Real-time CPU, memory, and RPS metrics
  • Scaling status and recent scaling events

Detailed Metrics Analysis

For comprehensive performance monitoring and historical scaling data, use the metrics and logging tools covered in the Monitoring Application Health section. These tools provide:

  • Historical performance trends
  • Scaling event correlation with metrics
  • Detailed resource utilization analysis

Best Practices

Start Conservative

Begin with moderate target values and observe your application’s behavior. You can always adjust thresholds based on real performance data.

Monitor and Adjust

Regularly review scaling events and metric trends to ensure your policies are effective. Look for patterns of excessive scaling or insufficient responsiveness.

Consider Application Characteristics

  • Startup Time: Applications with longer startup times may need more aggressive scale-out policies (for example, lower target values) so that new instances are ready before demand peaks
  • Resource Usage: Match scaling metrics to your application’s primary resource constraints
  • Traffic Patterns: Configure policies that align with your typical usage patterns

Test Your Configuration

Use load testing tools to validate your scaling policies under controlled conditions before relying on them in production.

Auto-scaling provides an effective way to balance performance and cost efficiency, automatically adapting your infrastructure to match demand patterns.