Modal
Serverless Python for GPU workloads.
Freemium 16–80 GB VRAM
- Min VRAM
- 16 GB
- GPU class
- Datacenter GPU
- Quant
- —
Actually FreeWatermark-FreeHobbyist-FriendlyAPI
Run any open-source model with one API call.
Replicate is the serverless GPU platform optimised around 'I want to call an open-source model as an API right now'. Thousands of community-deployed models behind a single REST/Python SDK, pay-per-second pricing, and Cog (their open containerisation tool) for shipping your own.
Free credits on signup; pay-per-second after.
Serverless Python for GPU workloads.
On-demand GPU pods for ComfyUI, vLLM, training.