# Warm Pool

C3's warm pool is the key to achieving a **"mounted-like" GPU development experience**. Instead of waiting minutes for a VM to provision, your code starts running in seconds.

## The Problem with Cold Provisioning

Traditional cloud GPU workflows require spinning up a fresh VM for every job:

1. **Request VM** from cloud provider
2. **Wait for allocation** (1-10 min)
3. **Boot the VM** (1-5 min)
4. **Initialize GPU drivers** (1-10 min)
5. **Download your code** (seconds to minutes)
6. **Finally run your job**

This adds up to **5-45 minutes of waiting** before your code even starts. For iterative development—tuning hyperparameters, debugging training loops, testing model changes—this latency kills productivity.

## How the Warm Pool Works

C3 maintains a fleet of **pre-provisioned GPU VMs**. Each warm VM is kept in a ready state:

* The VM is already booted and initialized
* GPU drivers are loaded and ready
* The C3 agent is running and polling for work
* The network is configured with fast access to our control plane

When you submit a job, the scheduler finds an idle warm GPU and assigns it inline—**allocation takes under a second**, with total startup of \~20 seconds (agent heartbeat + code download):

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           C3 WARM POOL ARCHITECTURE                         │
└─────────────────────────────────────────────────────────────────────────────┘

                                    ┌─────────────┐
                                    │   Your Job  │
                                    │  c3 deploy  │
                                    └──────┬──────┘
                                           │
                                           ▼
┌──────────┐                      ┌─────────────────┐
│          │    Job submitted     │                 │
│   You    │ ──────────────────▶  │  C3 Control     │
│          │   allocation < 1s    │  Plane          │
└──────────┘                      └────────┬────────┘
                                           │
                                           ▼
                              ┌──────────────────────┐
                              │   GPU PROVIDER(S)    │
                              │  ┌────┐ ┌────┐       │
                              │  │GPU │ │GPU │  ...  │
                              │  │ ✓  │ │ ✓  │       │
                              │  └────┘ └────┘       │
                              │    Warm Pool         │
                              └──────────┬───────────┘
                                         │
                                         ▼
                              ┌───────────────────┐
                              │  Job runs on      │
                              │  first available  │
                              │  warm GPU         │
                              └───────────────────┘
```

### Warm Path vs Cold Path

| Metric              | Warm Pool    | Cold Provision                  |
| ------------------- | ------------ | ------------------------------- |
| **Allocation time** | < 1 second   | 5-45 minutes                    |
| **Total startup**   | \~20 seconds | 5-45 minutes                    |
| **When used**       | Most jobs    | Pool exhausted / rare GPU types |

## Sub-Second Allocation

When a warm GPU is available, job allocation is **nearly instantaneous**:

1. **You submit** → Job hits the C3 control plane
2. **Sub-second allocation** → Scheduler finds an idle warm GPU and assigns your job
3. **Agent pickup (\~5s)** → Agent picks up the job on its next heartbeat
4. **Code download** → Bundle is fetched and extracted
5. **Your code runs** → Execution begins, logs stream back immediately

This transforms GPU development from a **batch workflow** into an **interactive one**. It feels like having a GPU mounted to your local machine.
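
The assignment and pickup steps above can be modeled with a small in-memory sketch. This is an illustration only, not C3's implementation: the `WarmGpu` class, the `assign` function, and the GPU and job IDs are hypothetical names, and a real agent would call `heartbeat` on a timer (roughly every 5 seconds, per the steps above) rather than on demand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WarmGpu:
    """Hypothetical model of one warm GPU and the agent running on it."""
    gpu_id: str
    assigned_job: Optional[str] = None  # set by the scheduler at allocation
    running_job: Optional[str] = None   # set by the agent once it picks the job up

    def heartbeat(self) -> Optional[str]:
        """Agent side: on a heartbeat, pick up a newly assigned job, if any."""
        if self.assigned_job is not None and self.running_job is None:
            self.running_job = self.assigned_job
            return self.running_job
        return None


def assign(pool: List[WarmGpu], job_id: str) -> Optional[WarmGpu]:
    """Scheduler side: give the job to the first idle warm GPU, if there is one."""
    for gpu in pool:
        if gpu.assigned_job is None:
            gpu.assigned_job = job_id
            return gpu
    return None  # no idle warm GPU in this pool


pool = [WarmGpu("gpu-a"), WarmGpu("gpu-b")]
gpu = assign(pool, "job-1")  # allocation: an in-memory lookup
picked = gpu.heartbeat()     # pickup: happens on the agent's next heartbeat
```

In this model, allocation is only a lookup over the pool, which is why it can be fast. The wait before the code starts comes from the heartbeat interval and the bundle download.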

## Pool Scaling

The warm pool scales with demand. C3 uses a baseline target per GPU profile plus demand-based scaling:

* **Low demand**: Pool holds a baseline number of warm GPUs
* **High demand**: Pool scales up to meet job volume (with a configurable cap)
* **Burst load**: Overflow goes to cold provisioning for currently available profiles (still works, just slower)
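
One way to read the three cases above is as a clamp on demand. The sketch below assumes demand is measured as the number of GPUs that jobs currently need for a profile. The function name `plan_pool` and the example numbers are hypothetical, and C3's actual scaling policy may differ.

```python
from typing import Tuple


def plan_pool(baseline: int, demand: int, cap: int) -> Tuple[int, int]:
    """Return (target_warm_gpus, overflow_to_cold) for one GPU profile.

    Assumed policy: never drop below the baseline, follow demand upward,
    and stop at the cap. Anything beyond the cap overflows to cold provisioning.
    """
    target = min(cap, max(baseline, demand))
    overflow = max(0, demand - cap)
    return target, overflow


low = plan_pool(baseline=4, demand=1, cap=16)     # pool stays at the baseline
high = plan_pool(baseline=4, demand=10, cap=16)   # pool grows to match demand
burst = plan_pool(baseline=4, demand=25, cap=16)  # pool hits the cap, rest goes cold
```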

## When Cold Provisioning Happens

Sometimes jobs use cold provisioning instead:

* **Pool exhausted**: All warm GPUs are busy during high demand
* **Rare GPU type**: Specialized hardware not kept warm
* **Unusual demand patterns**: Spikes that outpace the pool's scaling or exceed its cap

For currently available GPU profiles, cold provisioning still works—your job will run, it just takes longer to start. C3 automatically falls back to cold provisioning when needed.
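
The fallback decision can be pictured as a lookup on the number of idle warm GPUs per profile. This is a minimal sketch under that assumption: `route_job`, the `idle_warm` mapping, and the `"rare-profile"` name are hypothetical and do not describe C3's scheduler internals.

```python
from typing import Dict


def route_job(profile: str, idle_warm: Dict[str, int]) -> str:
    """Return "warm" if an idle warm GPU exists for the profile, else "cold".

    A profile that is not kept warm has no entry in `idle_warm`, so it
    takes the cold path just like an exhausted pool does.
    """
    if idle_warm.get(profile, 0) > 0:
        idle_warm[profile] -= 1  # reserve one warm GPU for this job
        return "warm"
    return "cold"


idle_warm = {"L40": 1}
first = route_job("L40", idle_warm)          # a warm GPU is available
second = route_job("L40", idle_warm)         # pool exhausted
rare = route_job("rare-profile", idle_warm)  # profile not kept warm
```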

## Best Practices

### Maximize Warm Pool Benefits

1. **Match your plan to your GPU** — Pro and Max keep L40 warm; Free uses cold provisioning only.
2. **Keep jobs small** — Finish faster, return GPU to pool for others
3. **Use `--follow`** — See real-time logs as your job runs
4. **Submit during active hours** — Pool is warmest when others are using it too

### Understand the Timing

* **Allocation time**: How long from submission until a GPU is assigned (under a second for warm, minutes for cold)
* **Total startup**: Time from submission until your code starts running (\~20 seconds for warm, minutes for cold)
* **Runtime**: Your actual code execution
* **Total time**: Everything from submit to completion
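
The four metrics above are differences between four moments in a job's life. The sketch below makes that explicit. The `JobTimestamps` fields and the example values are assumptions for illustration, not fields that C3 is known to expose.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class JobTimestamps:
    """Hypothetical timestamps for one job, in seconds since submission."""
    submitted: float
    gpu_assigned: float
    code_started: float
    completed: float


def timing_breakdown(t: JobTimestamps) -> Dict[str, float]:
    """Derive the four timing metrics from the raw timestamps."""
    return {
        "allocation_time": t.gpu_assigned - t.submitted,
        "total_startup": t.code_started - t.submitted,
        "runtime": t.completed - t.code_started,
        "total_time": t.completed - t.submitted,
    }


# Example values chosen to resemble a warm-path job with a two-minute runtime.
warm_job = JobTimestamps(submitted=0.0, gpu_assigned=0.5,
                         code_started=20.0, completed=140.0)
breakdown = timing_breakdown(warm_job)
```

Note that total startup includes allocation time, and total time is startup plus runtime.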

## The Development Experience

With the warm pool, your workflow becomes:

```
┌─────────────────────────────────────────────────────────────────┐
│                    ITERATIVE GPU DEVELOPMENT                    │
└─────────────────────────────────────────────────────────────────┘

    ┌──────────┐      ┌──────────┐      ┌──────────┐
    │  Edit    │      │  Submit  │      │   See    │
    │  Code    │ ───▶ │   Job    │ ───▶ │ Results  │
    │ Locally  │      │ c3 deploy│      │  quickly │
    └──────────┘      └──────────┘      └──────────┘
          ▲                                   │
          │                                   │
          └───────────────────────────────────┘
                    Iterate rapidly
```

This is especially powerful for:

* **ML experimentation** — Try different hyperparameters quickly
* **Debugging** — Add print statements, see output fast
* **Prototyping** — Test ideas without provisioning overhead
* **Education** — Learn GPU programming interactively

The warm pool makes cloud GPUs feel local.
