F5 and NVIDIA expand integration to improve AI inference efficiency
F5, a networking and application security company, and NVIDIA have expanded their technical integration, combining F5's BIG-IP Next for Kubernetes with NVIDIA's BlueField-3 data processing units (DPUs) to improve how enterprises run AI inference workloads at scale.
The joint solution is designed to address a growing problem in enterprise AI: GPU resources are often underutilized because networking, encryption, and traffic management tasks compete with inference workloads for the same compute capacity. By offloading those tasks to BlueField-3 DPUs, the setup frees GPU capacity for running AI models rather than housekeeping.
BIG-IP Next for Kubernetes uses telemetry from NVIDIA NIM microservices and the Dynamo inference runtime to route inference requests to the most suitable accelerators in real time. The system also supports multi-tenant GPU environments through network-level isolation, allowing organizations to share infrastructure across teams or customers without performance interference.
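To make the idea of telemetry-driven routing concrete, here is a minimal sketch of how a load balancer might pick an accelerator from fleet telemetry. This is an illustrative assumption, not F5's or NVIDIA's actual logic; the class and field names (`AcceleratorTelemetry`, `queue_depth`, `gpu_utilization`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AcceleratorTelemetry:
    """Hypothetical per-accelerator telemetry snapshot."""
    name: str
    queue_depth: int        # pending inference requests
    gpu_utilization: float  # 0.0 (idle) to 1.0 (saturated)

def route_request(fleet: list[AcceleratorTelemetry]) -> str:
    """Route to the accelerator with the most headroom:
    lowest queue depth, breaking ties on utilization."""
    best = min(fleet, key=lambda a: (a.queue_depth, a.gpu_utilization))
    return best.name

fleet = [
    AcceleratorTelemetry("gpu-node-a", queue_depth=12, gpu_utilization=0.91),
    AcceleratorTelemetry("gpu-node-b", queue_depth=3,  gpu_utilization=0.42),
    AcceleratorTelemetry("gpu-node-c", queue_depth=3,  gpu_utilization=0.77),
]
print(route_request(fleet))  # → gpu-node-b
```

A production system would refresh telemetry continuously and weigh additional signals (model placement, KV-cache locality, tenant quotas), but the core decision is the same: steer each request to the accelerator best positioned to serve it.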
The integration also supports NVIDIA's DOCA Platform Framework, which simplifies the deployment and management of BlueField DPUs in Kubernetes environments.
Kunal Anand, Chief Product Officer, F5, said, “Together with NVIDIA, we are enabling AI factories to treat token production as a measurable business metric. BIG-IP Next for Kubernetes provides the intelligence and governance required to increase GPU yield, reduce cost per token, and scale shared AI platforms confidently.”
“NVIDIA’s accelerated computing infrastructure, coupled with F5’s AI-aware Application Delivery and Security Platform, unlocks superior AI factory tokenomics delivering scalable and cost-effective inference without making any changes to the models,” said Kevin Deierling, SVP, Networking, NVIDIA.

