Building High-Performance Networks for Secure and Scalable Kubernetes Clusters with Azure CNI powered by Cilium

Unleashing the Power of AI and HPC with High-Speed Networking

The convergence of Generative AI and cloud computing is revolutionizing how organizations design and manage their infrastructure. This surge in data-intensive workloads, particularly in high-performance computing (HPC) and AI environments, places immense demands on networking infrastructure. Kubernetes, with its scalability and flexibility, shines as the preferred platform for managing complex workloads. However, it also introduces unique networking challenges that require innovative solutions.

This blog series dives into practical strategies for building secure and scalable Kubernetes clusters on Azure infrastructure, specifically focusing on high-performance networking.

Demystifying the Needs of High-Performance Workloads

HPC and AI workloads, like training large language models (LLMs), necessitate robust networking platforms with exceptional input/output (I/O) capabilities. Low latency and high bandwidth are crucial for efficient data handling and processing. As datasets grow in size and complexity, the networking infrastructure must adapt seamlessly to maintain performance and reliability.

Azure Kubernetes Service (AKS): Powering AI Innovation

Leveraging AKS empowers developers to effortlessly deploy and manage containerized AI models, ensuring consistent performance and rapid iteration. AKS integrates seamlessly with Azure’s high-performance storage, networking, and security features, optimizing AI workload processing. Additionally, AKS supports advanced GPU scheduling, enabling the utilization of specialized hardware for training and inference, accelerating the development of sophisticated Generative AI applications.
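In practice, GPU scheduling on AKS comes down to a standard Kubernetes resource request. As a minimal sketch (the pod name and container image below are illustrative, and it assumes a node pool with GPU-enabled VMs and the NVIDIA device plugin installed):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference                                  # illustrative name
spec:
  containers:
  - name: model-server
    image: myregistry.azurecr.io/llm-server:latest     # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1    # requests one GPU; the scheduler places the pod on a GPU node
```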

Introducing Azure CNI powered by Cilium: The Foundation for High-Performance Networking

Let’s explore the latest cluster networking features designed to deliver a high-performance network datapath architecture. Azure CNI powered by Cilium offers the perfect foundation to address these requirements, with comprehensive integrations into Azure’s extensive networking capabilities.

Unlocking High-Performance with eBPF

Azure CNI powered by Cilium leverages eBPF (Extended Berkeley Packet Filter), a Linux technology that empowers the execution of sandboxed programs within the kernel with high efficiency and minimal overhead. This makes it ideal for advanced networking tasks, delivering numerous advantages:

Low Latency: eBPF minimizes data transfer delays, ensuring near-native performance.
High Throughput: It facilitates efficient data processing by handling large volumes of traffic seamlessly.
Scalability: eBPF adapts to accommodate growing network demands effortlessly.
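The eBPF-based datapath is selected when the cluster is created. Assuming the Azure CLI is installed and logged in (the resource group, cluster name, and region below are placeholders), a cluster using Azure CNI powered by Cilium can be created as follows:

```shell
# Create an AKS cluster with the eBPF-based Cilium dataplane in overlay mode.
# Resource group, cluster name, and region are placeholders.
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --location eastus \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --generate-ssh-keys
```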

Mastering IP Addressing with Flexibility and Future-Proofing

Planning IP addressing is fundamental for building dynamic data workloads on AKS. Azure CNI powered by Cilium, enabled by default in AKS clusters (version 1.30 onwards), supports both overlay addressing and VNet addressing, the latter assigning pods directly routable VNet IP addresses. Additionally, it offers dual-stack IP addressing, allowing IPv4 and IPv6 protocols to coexist within the same network. This flexibility is crucial for supporting legacy IPv4 applications while enabling the adoption of IPv6. By utilizing dual-stack configurations, organizations can ensure compatibility and smooth interoperability, reduce the burden of maintaining separate network infrastructures, and transition to IPv6 gradually as network demands evolve.
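With a dual-stack cluster (created with `--ip-families ipv4,ipv6`), individual Services can request both IP families declaratively. A minimal sketch, with an illustrative service name and selector:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: genai-backend-svc      # illustrative name
spec:
  ipFamilyPolicy: PreferDualStack   # assign both an IPv4 and an IPv6 cluster IP when available
  ipFamilies:
  - IPv4
  - IPv6
  selector:
    app: genai-backend         # illustrative label
  ports:
  - port: 80
    targetPort: 8080
```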

Enhancing Security and Observability with Advanced Features

Azure CNI powered by Cilium strengthens in-cluster security and observability through several key features:

Granular Network Policies: Define precise access controls for pods and services.
Enhanced Visibility: Gain deeper insights into network traffic patterns.
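As a minimal sketch of a granular policy (the labels and port are illustrative), a standard Kubernetes NetworkPolicy can restrict ingress to a backend so that only frontend pods may reach it:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: genai-backend       # the pods being protected
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: genai-frontend  # only these pods may connect
    ports:
    - protocol: TCP
      port: 8080
```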

Enabling Advanced Network Security with FQDN-Based Policies

Unlock the recently introduced observability and FQDN-based features by enabling Advanced Container Networking Services (ACNS) on your AKS clusters. Let’s explore how to leverage CiliumNetworkPolicy (CNP) and the DNS proxy for FQDN filtering with minimal disruption to DNS resolution. Imagine you have a Kubernetes pod labeled “app: genai_backend” and want to control its outbound traffic. You can restrict egress to all destinations except “myblobstorage.com” while still allowing DNS queries to the “kube-dns” service.
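With ACNS enabled on the cluster (for example via `az aks update --enable-acns`), the scenario above can be sketched as a CiliumNetworkPolicy. The FQDN and pod label come from the example; the kube-dns rule is the standard pattern that lets Cilium’s DNS proxy observe lookups so the FQDN rule can resolve:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-genai-backend-fqdn
spec:
  endpointSelector:
    matchLabels:
      app: genai_backend
  egress:
  # Allow DNS queries to kube-dns so the DNS proxy can inspect lookups.
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
  # Allow outbound traffic only to the approved FQDN.
  - toFQDNs:
    - matchName: "myblobstorage.com"
```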

Beyond the Basics: High-Speed Interfaces for Demanding Applications

Kubernetes-based data applications also demand high-performance networking from the container networking platform. The underlying networks often require high throughput and low latency, translating to high-speed interfaces configured with technologies like InfiniBand. These interfaces can deliver bandwidths exceeding 100 Gbps, significantly reducing data transfer times and boosting application performance.

Seamless Configuration Management of High-Speed Interfaces

Managing configurations for multiple interfaces can be cumbersome, involving network fabric setup, traffic flow management, and ensuring compatibility with existing infrastructure. Azure CNI addresses this challenge by offering the flexibility to securely configure these high-speed interfaces using native Kubernetes constructs like Custom Resource Definitions (CRDs). Additionally, Azure CNI supports SR-IOV (Single Root I/O Virtualization) technologies, which provide dedicated virtual functions on the physical NIC directly to pods, bypassing the host’s software networking stack for near line-rate performance.
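The exact CRDs depend on the environment. As one common pattern in the broader Kubernetes ecosystem (a Multus-style NetworkAttachmentDefinition, shown here as an illustration rather than an Azure CNI-specific API; the names and resource string are placeholders), an SR-IOV interface can be described declaratively and attached to pods by annotation:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-ib                # illustrative name
  annotations:
    k8s.v1.cni.cncf.io/resourceName: example.com/sriov_ib   # placeholder resource name
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "sriov",
      "name": "sriov-ib"
    }
```

A pod then requests the attachment with the annotation `k8s.v1.cni.cncf.io/networks: sriov-ib`, receiving a dedicated virtual function as a secondary interface.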
