Deploying a high-performance Kubernetes platform for 5G workloads is only half the job. You also need to validate that the hardware acceleration actually works: DPDK virtual functions (VFs) are schedulable, SR-IOV resources are allocatable, and MacVLAN pods get the connectivity they need. This post covers the test manifests and validation strategy for the EIB-Customer platform.
What We're Testing
Three distinct network acceleration paths need validation:
- DPDK: userspace packet processing via `vfio-pci` VFs and the `rancher.io/dpdk` resource
- SR-IOV Netdevice: kernel-mode VFs via the `intel.com/sriov_netdevice` resource
- MacVLAN: Layer 2 secondary network via a Multus NetworkAttachmentDefinition (NAD)
Each requires different resource requests, security contexts, and network attachments.
Test 1: DPDK Validation
What to verify
- Pod schedules successfully (DPDK VFs available as resources)
- Hugepages mounted correctly at `/hugepages-1Gi`
- Network annotation `suse-dpdk` attaches a secondary interface
- Required capabilities granted (`NET_ADMIN`, `NET_RAW`, `IPC_LOCK`)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dpdk-test
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dpdk-test
  template:
    metadata:
      labels:
        app: dpdk-test
      annotations:
        k8s.v1.cni.cncf.io/networks: suse-dpdk
    spec:
      containers:
        - name: dpdk
          image: registry.suse.com/bci/bci-busybox:16.0
          command: ["sh", "-c", "echo 'DPDK pod running' && sleep 3600"]
          resources:
            limits:
              rancher.io/dpdk: "1"
              hugepages-1Gi: 2Gi
              memory: 2Gi
            requests:
              rancher.io/dpdk: "1"
              hugepages-1Gi: 2Gi
              memory: 2Gi
          securityContext:
            capabilities:
              add: ["NET_ADMIN", "NET_RAW", "IPC_LOCK"]
          volumeMounts:
            - mountPath: /hugepages-1Gi
              name: hugepages
      volumes:
        - name: hugepages
          emptyDir:
            medium: HugePages-1Gi
```
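Kubernetes rejects a container whose hugepages requests differ from its limits, and extended resources such as `rancher.io/dpdk` cannot be overcommitted, so their requests must equal their limits when both are set. The manifest above satisfies both rules. Here is a minimal Python sketch of a pre-apply lint for them. It assumes the manifest has already been parsed into plain dicts (parsing is not shown). `check_resources` and `DPDK_TEST_RESOURCES` are hypothetical names. The comparison is on literal strings, so `2048Mi` against `2Gi` would be flagged even though the quantities are equal.

```python
"""Lint a container's resources block for request/limit mismatches.

Sketch only: assumes the manifest is already parsed into plain dicts.
"""

# Hypothetical constant mirroring the dpdk-test container above.
DPDK_TEST_RESOURCES = {
    "limits": {"rancher.io/dpdk": "1", "hugepages-1Gi": "2Gi", "memory": "2Gi"},
    "requests": {"rancher.io/dpdk": "1", "hugepages-1Gi": "2Gi", "memory": "2Gi"},
}


def _must_match(name: str) -> bool:
    # Hugepages and domain-prefixed extended resources (e.g. rancher.io/dpdk,
    # intel.com/sriov_netdevice) cannot be overcommitted.
    return name.startswith("hugepages-") or (
        "/" in name and not name.startswith("kubernetes.io/")
    )


def check_resources(resources: dict) -> list[str]:
    """Return human-readable problems; an empty list means no mismatch found."""
    limits = resources.get("limits", {})
    requests = resources.get("requests", {})
    problems = []
    for name in sorted(set(limits) | set(requests)):
        if not _must_match(name):
            continue  # cpu/memory may legitimately have request < limit
        if name in limits and name in requests and str(limits[name]) != str(requests[name]):
            problems.append(f"{name}: request {requests[name]} != limit {limits[name]}")
    return problems


# check_resources(DPDK_TEST_RESOURCES) returns []
```

Plain `memory` is skipped on purpose, because a container may legitimately request less memory than its limit.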
Validation commands
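```bash
# Check pod scheduled and running
kubectl get pods -l app=dpdk-test
# Verify hugepages mounted (replace dpdk-test-xxx with a pod name from the previous command)
kubectl exec dpdk-test-xxx -- mount | grep hugepages
# List interfaces in the pod
kubectl exec dpdk-test-xxx -- ip addr show
# Expect eth0 (Calico). A vfio-pci-bound VF has no kernel netdev, so it is
# not listed as an interface here; check for its VFIO device node instead
kubectl exec dpdk-test-xxx -- ls /dev/vfio
# Check resource allocation on node (hugepages and extended resources sit
# further down the "Allocated resources" table than cpu/memory)
kubectl describe node node1 | grep -A 12 "Allocated resources"
```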
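These checks are easy to eyeball but awkward to assert on in a script. The Python sketch below parses captured output from `kubectl exec ... -- mount` and `kubectl exec ... -- ip addr show`. It assumes the common `<source> on <target> type <fstype> (<options>)` mount format and `N: name[@peer]: <flags>` interface header lines. `hugetlbfs_mounts` and `interface_names` are hypothetical helpers.

```python
import re

# "<source> on <target> type <fstype> (<options>)", as printed by `mount`
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")
# Interface header lines such as "3: eth0@if12: <BROADCAST,...>"
IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:", re.MULTILINE)


def hugetlbfs_mounts(mount_output: str) -> dict[str, str]:
    """Map mount point -> mount options for every hugetlbfs mount."""
    mounts = {}
    for line in mount_output.splitlines():
        m = MOUNT_RE.match(line.strip())
        if m and m.group(3) == "hugetlbfs":
            mounts[m.group(2)] = m.group(4)
    return mounts


def interface_names(ip_addr_output: str) -> list[str]:
    """Interface names from `ip addr show`, with any @ifN peer suffix removed."""
    return IFACE_RE.findall(ip_addr_output)
```

For the DPDK pods, `/hugepages-1Gi` should appear as a key of `hugetlbfs_mounts(...)` and `"eth0"` should be in `interface_names(...)`.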
Test 2: SR-IOV Netdevice Validation
What to verify
- `intel.com/sriov_netdevice` resource available on nodes
- Pod receives a VF with a working kernel network interface
- Connectivity on VLAN 538 reachable from the pod
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sriov-test
  annotations:
    k8s.v1.cni.cncf.io/networks: suse-sriov-netdevice
spec:
  containers:
    - name: sriov
      image: registry.suse.com/bci/bci-busybox:16.0
      command: ["sleep", "3600"]
      resources:
        limits:
          intel.com/sriov_netdevice: "1"
        requests:
          intel.com/sriov_netdevice: "1"
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]
```
Validation commands
```bash
# Verify SR-IOV resources advertised
kubectl get nodes -o json | \
  jq '.items[].status.allocatable | with_entries(select(.key | contains("sriov")))'
# Expected:
# {
#   "intel.com/sriov_dpdk": "8",
#   "intel.com/sriov_netdevice": "8"
# }
# Verify interface in pod
kubectl exec sriov-test -- ip addr show net1
# Should show IP from 192.168.27.128/27 range
```
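Multus records what it actually attached in the pod's `k8s.v1.cni.cncf.io/network-status` annotation (older releases used `networks-status`). That gives a scriptable alternative to `ip addr show`. Here is a minimal Python sketch. It assumes the annotation is the usual JSON list of objects with `interface`, `ips` and `default` fields. `secondary_ips` and `all_in_subnet` are hypothetical names.

```python
import ipaddress
import json

STATUS_KEYS = (
    "k8s.v1.cni.cncf.io/network-status",   # current Multus annotation
    "k8s.v1.cni.cncf.io/networks-status",  # older, deprecated spelling
)


def secondary_ips(pod: dict) -> dict[str, list[str]]:
    """Map interface name -> IPs for every non-default network on the pod."""
    annotations = pod.get("metadata", {}).get("annotations", {})
    raw = next((annotations[k] for k in STATUS_KEYS if k in annotations), "[]")
    result = {}
    for net in json.loads(raw):
        if not net.get("default", False):  # skip the cluster (Calico) network
            result[net.get("interface", "?")] = net.get("ips", [])
    return result


def all_in_subnet(ips: list[str], cidr: str) -> bool:
    """True when the list is non-empty and every address falls inside cidr."""
    network = ipaddress.ip_network(cidr)
    return bool(ips) and all(ipaddress.ip_address(ip) in network for ip in ips)
```

Load `kubectl get pod sriov-test -o json` with `json.loads`. Then `all_in_subnet(secondary_ips(pod).get("net1", []), "192.168.27.128/27")` should be `True`.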
Test 3: MacVLAN Validation
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: macvlan-test
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  containers:
    - name: macvlan
      image: registry.suse.com/bci/bci-busybox:16.0
      command: ["sleep", "3600"]
```
Validation commands
```bash
# Verify Layer 2 connectivity
kubectl exec macvlan-test -- ip addr show net1
# Expected: IP from 192.168.41.48/28 range
kubectl exec macvlan-test -- ping -c 3 192.168.41.1
# Expected: gateway reachable
```
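Parsing ping's summary line turns the gateway check into a pass/fail result. This Python sketch assumes one of two summary formats: the BusyBox format used by the test image (`3 packets transmitted, 3 packets received`) or the iputils format (`3 packets transmitted, 3 received`). The function names are hypothetical.

```python
import re

# Matches both BusyBox ("... 3 packets received") and iputils ("... 3 received")
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def ping_counts(output: str) -> tuple[int, int]:
    """Return (transmitted, received) from a ping summary line."""
    m = SUMMARY_RE.search(output)
    if m is None:
        raise ValueError("no ping summary line found")
    return int(m.group(1)), int(m.group(2))


def gateway_reachable(output: str) -> bool:
    """True when at least one probe was sent and none were lost."""
    sent, received = ping_counts(output)
    return sent > 0 and received == sent
```

Requiring zero loss is a deliberately strict choice for a directly attached gateway. Relax it to `received > 0` if occasional drops are acceptable.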
Cluster Health Validation
Before running network tests, verify the full cluster stack is healthy:
```bash
# All system pods running (--no-headers so the header line is not left behind)
kubectl get pods -A --no-headers | grep -v Running | grep -v Completed
# Expected: empty (no non-running pods)
# All Helm charts deployed
kubectl get helmcharts -n kube-system
# Expected: every chart listed, with its helm-install-<chart> job Completed
kubectl get jobs -n kube-system
# SR-IOV operator ready
kubectl get pods -n sriov-system
kubectl get sriovnetworknodepolicies -n sriov-system
# Longhorn storage healthy
kubectl get pods -n longhorn-system
kubectl get storageclass
# Expected: longhorn (default) StorageClass present
```
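`grep -v Running` lets through pods that are Running but not Ready (for example `READY 0/1`). This Python sketch applies both checks to `kubectl get pods -A --no-headers` output. It assumes the default column order (NAMESPACE, NAME, READY, STATUS, ...). `unhealthy_pods` is a hypothetical name.

```python
def unhealthy_pods(output: str) -> list[tuple[str, str, str, str]]:
    """Return (namespace, name, ready, status) for pods that need a look.

    A pod is flagged when its STATUS is neither Running nor Completed, or
    when it is Running but not every container is ready (e.g. READY 0/1).
    """
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        namespace, name, ready, status = fields[:4]
        if status == "Completed":
            continue  # finished jobs (e.g. helm-install-*) are expected at 0/1
        ready_now, _, total = ready.partition("/")
        if status != "Running" or ready_now != total:
            flagged.append((namespace, name, ready, status))
    return flagged
```

Only the first four columns are read, so RESTARTS values such as `3 (5m ago)` do not affect parsing.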
Troubleshooting: Common Failures
DPDK pod stuck in Pending
```bash
# Check if DPDK resources are advertised
kubectl describe node | grep rancher.io/dpdk
# If missing, the SR-IOV operator may not have bound the VFs
# Check SR-IOV operator logs
kubectl logs -n sriov-system -l app=sriov-network-config-daemon
# Verify VF driver binding (run on the node itself)
cat /sys/class/net/p3p1/device/virtfn8/driver_override
# Expected: vfio-pci
readlink /sys/class/net/p3p1/device/virtfn8/driver
# Expected: a path ending in /vfio-pci (the driver actually bound)
```
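`driver_override` only records which driver the device may bind to on its next probe. The `driver` symlink under each `virtfnN` entry shows what is actually bound. This Python sketch maps every VF of a PF to its bound driver and is meant to run on the host. The `sysfs_net` parameter exists only so the function can be pointed at a test directory. `vf_drivers` is a hypothetical name.

```python
import glob
import os
import re
from typing import Optional


def vf_drivers(pf: str, sysfs_net: str = "/sys/class/net") -> dict[int, Optional[str]]:
    """Map VF index -> bound driver name for physical function `pf`.

    Reads the `driver` symlink under each virtfnN entry; None means the VF
    is currently not bound to any driver.
    """
    drivers: dict[int, Optional[str]] = {}
    for vf_path in glob.glob(os.path.join(sysfs_net, pf, "device", "virtfn*")):
        m = re.search(r"virtfn(\d+)$", vf_path)
        if m is None:
            continue
        link = os.path.join(vf_path, "driver")
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        drivers[int(m.group(1))] = (
            os.path.basename(os.readlink(link)) if os.path.islink(link) else None
        )
    return dict(sorted(drivers.items()))
```

On a healthy node, every VF in the DPDK pool should report `vfio-pci`. VFs in the netdevice pool should report the NIC's kernel VF driver.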
SR-IOV VFs not created
```bash
# Check the custom VF creation service
systemctl status sriov-custom-auto-vfs.service
# Manually apply SR-IOV policies
kubectl apply -f /opt/sriov/
# Check VFs on the physical NIC
ip link show p3p1
# Look for "vf 0 ... vf 7 ..." entries
```
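Counting `vf N` lines by eye is error-prone once a PF carries many VFs. This Python sketch lists which indices are missing from `ip link show p3p1` output. It assumes the iproute2 layout, where each VF appears on an indented line starting with `vf N`. The helper names are hypothetical. `expected=8` would correspond to the `vf 0 ... vf 7` range above.

```python
import re

# Indented per-VF lines, e.g. "    vf 0     link/ether 00:00:... spoof checking on"
VF_LINE_RE = re.compile(r"^[ \t]+vf (\d+)\b", re.MULTILINE)


def vf_indices(ip_link_output: str) -> list[int]:
    """Sorted VF indices listed under a PF in `ip link show <pf>` output."""
    return sorted(int(i) for i in VF_LINE_RE.findall(ip_link_output))


def missing_vfs(ip_link_output: str, expected: int) -> list[int]:
    """VF indices in range(expected) that do not appear in the output."""
    present = set(vf_indices(ip_link_output))
    return [i for i in range(expected) if i not in present]
```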
Node fails to join cluster
```bash
# Check API VIP reachable from node2
ping 192.168.41.30
# Verify token matches on both nodes
grep token /etc/rancher/rke2/config.yaml
# Check firewall allows required ports (-n prints numeric ports instead of service names)
iptables -L INPUT -n | grep -E "6443|9345"
# RKE2 agent logs (use rke2-server if node2 joins as a server)
journalctl -u rke2-agent -f
```
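`ping` only proves that ICMP gets through. A firewall can still drop the RKE2 API (6443) and supervisor (9345) ports. This Python sketch attempts plain TCP connections to both on the VIP. It assumes it runs on node2 with Python 3 available. `tcp_reachable` and `check_join_ports` are hypothetical names.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unreachable hosts
        return False


def check_join_ports(vip: str, ports=(6443, 9345)) -> dict[int, bool]:
    """Probe the RKE2 API (6443) and supervisor (9345) ports on the VIP."""
    return {port: tcp_reachable(vip, port) for port in ports}
```

A `False` for either port from `check_join_ports("192.168.41.30")` points at the firewall, or at nothing listening on the VIP, rather than at a token mismatch.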
Performance Benchmarks
Once functional validation passes, establish baseline performance metrics:
| Test | Tool | Target |
|---|---|---|
| CPU latency jitter | cyclictest | <10μs max on isolated cores |
| DPDK throughput | testpmd | 95+ Gbps line rate |
| SR-IOV latency | iperf3 + sockperf | <50μs RTT |
| Storage I/O | fio | Baseline for NVMe + Longhorn |
| Memory bandwidth | stream | 125 GB/s (NUMA-local) |
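A line-rate target only means something alongside a frame size, because Ethernet spends 20 bytes of preamble and inter-frame gap on every frame. This small Python sketch shows the arithmetic. It assumes a 100 GbE link; the table implies this speed but does not state it. At 64-byte frames the ceiling is about 148.8 Mpps, which carries only about 76.2 Gbps of frame data. With 1518-byte frames the ceiling rises to about 98.7 Gbps. The function names are hypothetical.

```python
# Ethernet adds 8 bytes of preamble/SFD and a 12-byte inter-frame gap to
# every frame on the wire, on top of the frame itself (including FCS).
WIRE_OVERHEAD_BYTES = 20


def max_pps(frame_bytes: int, link_bps: float) -> float:
    """Theoretical maximum frames per second for a given frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def l2_throughput_bps(frame_bytes: int, pps: float) -> float:
    """Frame-level (L2) bits per second carried at a given packet rate."""
    return frame_bytes * 8 * pps
```

In practice, a testpmd run against the 95+ Gbps target should record the frame size alongside the result.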
Litmus Chaos Engineering
The platform ships with Litmus Chaos Operator v3.10.0 pre-installed. Define chaos experiments to validate resilience:
- Node restart — does the cluster recover automatically?
- Network partition — does the API VIP failover correctly?
- Pod kill — do Kubernetes restarts converge correctly?
- Storage disruption — does Longhorn replication heal?
Litmus is installed but no experiments were defined at handover — this is a gap worth addressing before production traffic lands on the platform.
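As a starting point for the pod-kill case, here is a Python sketch that emits a Litmus `ChaosEngine` for the `pod-delete` experiment, targeting the DPDK test Deployment. It assumes that the `pod-delete` ChaosExperiment and a suitably privileged service account are already installed in the target namespace. The service account is called `pod-delete-sa` here, which is a hypothetical name. The field names follow the `litmuschaos.io/v1alpha1` ChaosEngine schema; check them against the v3.10.0 documentation before relying on this.

```python
import json


def pod_delete_engine(app_ns: str, app_label: str, service_account: str,
                      duration_s: int = 30, interval_s: int = 10) -> dict:
    """Build a ChaosEngine manifest that runs the pod-delete experiment."""
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {
            # e.g. "app=dpdk-test" -> "dpdk-test-pod-delete"
            "name": f"{app_label.split('=')[-1]}-pod-delete",
            "namespace": app_ns,
        },
        "spec": {
            "engineState": "active",
            "appinfo": {"appns": app_ns, "applabel": app_label, "appkind": "deployment"},
            "chaosServiceAccount": service_account,
            "experiments": [{
                "name": "pod-delete",
                "spec": {"components": {"env": [
                    # Env values must be strings in the manifest
                    {"name": "TOTAL_CHAOS_DURATION", "value": str(duration_s)},
                    {"name": "CHAOS_INTERVAL", "value": str(interval_s)},
                    {"name": "FORCE", "value": "false"},
                ]}},
            }],
        },
    }
```

Because kubectl accepts JSON manifests, `json.dumps(pod_delete_engine("default", "app=dpdk-test", "pod-delete-sa"))` can be piped into `kubectl apply -f -`.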