gpt-load is a transparent proxy gateway that routes API requests to multiple AI providers—including OpenAI, Google Gemini, and Anthropic Claude—through a single endpoint while preserving each provider's native format and authentication. It acts as a centralized routing layer, allowing applications to switch between AI services by changing only the base URL without modifying any client code or business logic.
The proxy distinguishes itself through intelligent traffic management across pools of API keys, offering automatic key rotation, weighted or round-robin load balancing, and failover that detects unhealthy keys and reroutes requests to healthy ones within the same group. Configuration is managed dynamically through a web interface or REST API with hot-reload, applying changes immediately without service restarts, and settings are organized across three tiers—environment, system, and group—for flexible deployment. API keys are stored encrypted in a MySQL database with support for enabling, disabling, or rotating the encryption key, and separate authentication tokens secure the management interface and proxy requests.
For production deployments, gpt-load supports high-availability cluster setups with a master-slave architecture, horizontal scaling, configuration synchronization across nodes, and graceful shutdown protocols. Real-time monitoring provides health checks, request logs, and usage statistics through a web dashboard, while in-memory caching and rate-limiting use fast key-value stores for low-latency access. The project includes a web-based management interface for viewing and modifying settings, managing API keys and provider groups, and monitoring cluster health without editing files or restarting the service.