Modelplane: Open-source control plane for AI inference

Organizations that run open-weight models on hardware they own operate GPU fleets spread across clouds, neoclouds, and on-premise data centers. Each fleet needs model placement, replica scaling, infrastructure provisioning, weight distribution, and traffic routing. Teams have built this coordination layer by hand, one operator at a time.


Upbound, the company behind the Crossplane project, released Modelplane, an open-source control plane that manages fleet-wide coordination for AI inference. The software installs in a user’s own environment and orchestrates models, the serving stack, and the infrastructure beneath them. It runs across cloud, neocloud, and on-premise systems, from a single GPU to multi-node deployments. The first public version, v0.1.0, carries the Apache 2.0 license.

Built on Crossplane

Modelplane builds on Crossplane, a Cloud Native Computing Foundation graduated project used to run internal platforms at organizations including Apple, Nike, SAP, IBM, and Akamai. Modelplane runs as a control plane on its own cluster, above the inference clusters that serve models. The system continuously reconciles a fleet toward a state the operator declares, provisioning clusters, scheduling deployments onto compatible clusters, scaling replicas, caching weights, and routing traffic through one gateway.
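The reconciliation idea described above can be illustrated with a minimal Python sketch. This is not Modelplane's or Crossplane's code: `FleetState` and `reconcile` are hypothetical names, and the fleet is reduced to a replica count per model purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class FleetState:
    """Hypothetical, simplified fleet view: replica count per model name."""
    replicas: dict[str, int]


def reconcile(desired: FleetState, observed: FleetState) -> list[str]:
    """Return the actions needed to move the observed state toward the declared one."""
    actions = []
    for model, want in sorted(desired.replicas.items()):
        have = observed.replicas.get(model, 0)
        if have < want:
            actions.append(f"scale up {model}: {have} -> {want}")
        elif have > want:
            actions.append(f"scale down {model}: {have} -> {want}")
    # Models that are running but no longer declared get removed.
    for model in sorted(observed.replicas.keys() - desired.replicas.keys()):
        actions.append(f"remove {model}")
    return actions
```

A real control plane would run a comparison like this continuously and act on the result; when the observed state already matches the declared state, the list of actions is empty.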

“Kubernetes became the standard control plane for compute. Crossplane extended that model to cloud infrastructure,” said Bassam Tabbara, CEO and founder of Upbound. He said AI inference “needs the same layer.”

Two roles and one API

The software divides work into two roles. Platform teams create resources that define the GPU fleet and the hardware classes available on it, fronted by an inference gateway. Developers declare a model, its engine, and a replica count, and receive a single OpenAI-compatible endpoint. The endpoint supports weighted canary and A/B rollouts across the replicas it selects.
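A weighted canary rollout of the kind mentioned above amounts to sending each request to a variant with probability proportional to its weight. The following Python sketch shows that general technique under stated assumptions; `pick_variant` and the variant names are hypothetical and do not reflect how Modelplane's gateway is implemented.

```python
import random


def pick_variant(weights: dict[str, float], rng: random.Random) -> str:
    """Pick one variant with probability proportional to its weight."""
    if not weights or any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("weights must be non-negative and sum to more than zero")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical split: roughly 90% of requests to the stable variant, 10% to the canary.
rng = random.Random(0)
sample = [pick_variant({"stable": 90, "canary": 10}, rng) for _ in range(10_000)]
canary_share = sample.count("canary") / len(sample)
```

Raising the canary weight step by step shifts more traffic to the new variant; setting it to zero routes everything back to the stable one.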

A developer deploys a model with a declarative manifest that names the model, the serving engine image, and the GPU memory a node must offer. Modelplane then schedules the replica onto a cluster with free, compatible GPUs and composes the resources between the two roles on the operator’s behalf. The system stays neutral about the serving engine, so one API can serve any container-based engine and any deployment topology. Settings such as parallelism, quantization, and KV transfer are expressed in the engine flags the developer writes.
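The scheduling step can be pictured as a filter followed by a choice. The Python sketch below assumes a hypothetical `Cluster` record and a tightest-fit tie-break; the article does not say how Modelplane ranks compatible clusters, so that policy is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """Hypothetical cluster record, not a Modelplane resource."""
    name: str
    gpu_memory_gib: int  # memory per GPU on this cluster's nodes
    free_gpus: int


def schedule(clusters: list[Cluster], min_gpu_memory_gib: int, gpus_needed: int = 1) -> Optional[str]:
    """Place one replica on a cluster with enough free, compatible GPUs, or return None."""
    candidates = [
        c for c in clusters
        if c.gpu_memory_gib >= min_gpu_memory_gib and c.free_gpus >= gpus_needed
    ]
    if not candidates:
        return None
    # Assumed policy: tightest fit first, so larger GPUs stay free for larger models.
    best = min(candidates, key=lambda c: (c.gpu_memory_gib, c.name))
    best.free_gpus -= gpus_needed
    return best.name
```

With clusters offering 24 GiB and 80 GiB GPUs, a request for at least 16 GiB lands on the 24 GiB cluster while it has capacity, and a request for at least 40 GiB can only land on the 80 GiB one.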

Security and policy controls

The gateway routes inference requests and applies cost, compliance, and sovereignty policies, with fallback to managed providers. That control point carries weight for regulated and sovereign enterprises, where inference runs inside infrastructure a company governs directly for security, sovereignty, or compliance reasons. The repository ships with a published security policy, a code of conduct, and contribution guidelines.
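One way to picture policy-aware routing with fallback is shown in the Python sketch below. It assumes a hypothetical `Backend` record, a region allow-list as the sovereignty rule, and a cost ceiling as the cost rule; the article does not describe how the gateway evaluates policies, so every name and rule here is illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    """Hypothetical serving backend, either self-hosted or a managed provider."""
    name: str
    region: str
    cost_per_million_tokens: float
    managed: bool = False
    healthy: bool = True


def route(backends: list[Backend], allowed_regions: set[str], max_cost: float) -> Optional[str]:
    """Return the backend that satisfies the policies, preferring self-hosted capacity."""
    eligible = [
        b for b in backends
        if b.healthy
        and b.region in allowed_regions
        and b.cost_per_million_tokens <= max_cost
    ]
    if not eligible:
        return None
    # Self-hosted backends sort first (managed=False); managed providers act as fallback.
    best = min(eligible, key=lambda b: (b.managed, b.cost_per_million_tokens, b.name))
    return best.name
```

In this sketch a request is rejected (returns `None`) when no backend satisfies both rules, rather than being sent somewhere that violates policy.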

Upbound names three groups of users for the project: neoclouds and AI factories that build managed inference services on their own hardware, regulated and sovereign enterprises, and AI-native companies with large inference spend moving to open-weight models.

Modelplane is available for free on GitHub.
