Deploying a high-performance Kubernetes platform for 5G workloads is only half the job. You also need to validate that the hardware acceleration actually works: that DPDK virtual functions (VFs) are reachable, SR-IOV resources are allocatable, and MacVLAN pods get the connectivity they need. This post covers the test manifests and validation strategy for the EIB-Customer platform.

What We're Testing

Three distinct network acceleration paths need validation:

  1. DPDK: userspace packet processing via vfio-pci VFs and the rancher.io/dpdk resource
  2. SR-IOV Netdevice: kernel-mode VFs via the intel.com/sriov_netdevice resource
  3. MacVLAN: Layer 2 secondary network via a Multus NAD

Each requires different resource requests, security contexts, and network attachments.

Test 1: DPDK Validation

What to verify

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dpdk-test
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dpdk-test
  template:
    metadata:
      labels:
        app: dpdk-test
      annotations:
        k8s.v1.cni.cncf.io/networks: suse-dpdk
    spec:
      containers:
      - name: dpdk
        image: registry.suse.com/bci/bci-busybox:16.0
        command: ["sh", "-c", "echo 'DPDK pod running' && sleep 3600"]
        resources:
          limits:
            rancher.io/dpdk: "1"
            hugepages-1Gi: 2Gi
            memory: 2Gi
          requests:
            rancher.io/dpdk: "1"
            hugepages-1Gi: 2Gi
            memory: 2Gi
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "NET_RAW", "IPC_LOCK"]
        volumeMounts:
        - mountPath: /hugepages-1Gi
          name: hugepages
      volumes:
      - name: hugepages
        emptyDir:
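          medium: HugePages-1Gi

Requests and limits must be identical for hugepages and for extended resources such as rancher.io/dpdk; the API server rejects pods where they differ. Guaranteed QoS additionally requires matching CPU and memory requests and limits, and this manifest sets no CPU values.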

Validation commands

# Check pod scheduled and running
kubectl get pods -l app=dpdk-test

# Verify hugepages mounted
kubectl exec dpdk-test-xxx -- mount | grep hugepages

# Verify secondary interface attached
kubectl exec dpdk-test-xxx -- ip addr show
# Should show eth0 (Calico) and net1 (DPDK VF)

# Check resource allocation on node
kubectl describe node node1 | grep -A 5 Allocated
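
The dpdk-test-xxx name above is a placeholder for the generated pod name. A small helper can resolve the real names from the app=dpdk-test label and run the same check in every replica. This is a minimal sketch; exec_in_pods is a hypothetical helper name, not part of kubectl.

```shell
# exec_in_pods SELECTOR CMD [ARGS...]
# Runs CMD in every pod matching the label selector (hypothetical helper).
# The body is a subshell so its variables do not leak into the caller.
exec_in_pods() (
  selector=$1
  shift
  # "-o name" prints one "pod/<name>" per line, which kubectl exec accepts.
  for pod in $(kubectl get pods -l "$selector" -o name); do
    echo "== $pod"
    kubectl exec "$pod" -- "$@"
  done
)
```

For example, exec_in_pods app=dpdk-test mount | grep -E '^==|hugepages' shows the hugepages mount for both replicas.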

Test 2: SR-IOV Netdevice Validation

What to verify

apiVersion: v1
kind: Pod
metadata:
  name: sriov-test
  annotations:
    k8s.v1.cni.cncf.io/networks: suse-sriov-netdevice
spec:
  containers:
  - name: sriov
    image: registry.suse.com/bci/bci-busybox:16.0
    command: ["sleep", "3600"]
    resources:
      limits:
        intel.com/sriov_netdevice: "1"
      requests:
        intel.com/sriov_netdevice: "1"
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]

Validation commands

# Verify SR-IOV resources advertised
kubectl get nodes -o json | \
  jq '.items[].status.allocatable | with_entries(select(.key | contains("sriov")))'

# Expected:
# {
#   "intel.com/sriov_dpdk": "8",
#   "intel.com/sriov_netdevice": "8"
# }

# Verify interface in pod
kubectl exec sriov-test -- ip addr show net1
# Should show IP from 192.168.27.128/27 range
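
Checking "IP from the expected range" by eye is error-prone, so the comparison can be scripted. The sketch below assumes the address is read with the one-line ip output format (ip -4 -o addr show net1); first_ipv4 and ip_in_cidr are hypothetical helper names.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
# The subshell body keeps the IFS change local to this function.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# ip_in_cidr ADDRESS NETWORK/BITS
# Succeeds if ADDRESS falls inside the given CIDR range.
ip_in_cidr() (
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
)

# Read "ip -4 -o addr show <dev>" output on stdin and print the first address.
first_ipv4() {
  awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}
```

A typical check would be: addr=$(kubectl exec sriov-test -- ip -4 -o addr show net1 | first_ipv4) followed by ip_in_cidr "$addr" 192.168.27.128/27. The same helper applies to the MacVLAN range 192.168.41.48/28.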

Test 3: MacVLAN Validation

apiVersion: v1
kind: Pod
metadata:
  name: macvlan-test
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  containers:
  - name: macvlan
    image: registry.suse.com/bci/bci-busybox:16.0
    command: ["sleep", "3600"]

Validation commands

# Verify Layer 2 connectivity
kubectl exec macvlan-test -- ip addr show net1
# Expected: IP from 192.168.41.48/28 range

kubectl exec macvlan-test -- ping -c 3 192.168.41.1
# Expected: gateway reachable

Cluster Health Validation

Before running network tests, verify the full cluster stack is healthy:

# All system pods running
kubectl get pods -A --no-headers | grep -v Running | grep -v Completed
# Expected: empty (no non-running pods)

# All Helm charts deployed
kubectl get helmcharts -n kube-system
# Expected: all show DEPLOYED status

# SR-IOV operator ready
kubectl get pods -n sriov-system
kubectl get sriovnetworknodepolicies -n sriov-system

# Longhorn storage healthy
kubectl get pods -n longhorn-system
kubectl get storageclass
# Expected: longhorn (default) StorageClass present
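
Filtering on the word Running misses pods that are Running but not fully ready (for example READY 1/2). A stricter filter can look at the READY and STATUS columns directly. This is a sketch that assumes the default column order of kubectl get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, ...); unhealthy_pods is a hypothetical helper name.

```shell
# Reads "kubectl get pods -A --no-headers" output on stdin and prints
# every pod that is neither Completed nor Running with all containers ready.
unhealthy_pods() {
  awk '{
    split($3, ready, "/")            # READY column, e.g. "1/2"
    if ($4 == "Completed") next      # finished jobs are fine
    if ($4 != "Running" || ready[1] != ready[2])
      print $1 "/" $2, $3, $4
  }'
}
```

Usage: kubectl get pods -A --no-headers | unhealthy_pods. Empty output means every pod is either Completed or Running with all containers ready.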

Troubleshooting: Common Failures

DPDK pod stuck in Pending

# Check if DPDK resources advertised
kubectl describe node | grep rancher.io/dpdk
# If missing: SR-IOV operator may not have bound VFs

# Check SR-IOV operator logs
kubectl logs -n sriov-system -l app=sriov-network-config-daemon

# Verify VF driver binding on node
cat /sys/class/net/p3p1/device/virtfn8/driver_override
# Expected: vfio-pci
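
driver_override only shows the requested driver for a single VF. To see which driver each VF is actually bound to, the driver symlink under every virtfn entry can be listed. This is a sketch to run on the node itself; vf_drivers is a hypothetical helper name, and p3p1 is the physical function name used elsewhere in this post.

```shell
# vf_drivers PF_DEVICE_DIR
# Prints "<virtfnN> <driver>" for each VF under a PF device directory,
# e.g. /sys/class/net/p3p1/device.
vf_drivers() (
  for vf in "$1"/virtfn*; do
    [ -e "$vf" ] || continue         # no VFs: the glob did not match
    if [ -L "$vf/driver" ]; then
      drv=$(basename "$(readlink -f "$vf/driver")")
    else
      drv="(unbound)"                # VF exists but no driver is attached
    fi
    printf '%s %s\n' "$(basename "$vf")" "$drv"
  done
)
```

Usage: vf_drivers /sys/class/net/p3p1/device. VFs intended for DPDK should report vfio-pci; an (unbound) entry means the VF exists but no driver has claimed it.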

SR-IOV VFs not created

# Check the custom VF creation service
systemctl status sriov-custom-auto-vfs.service

# Manually apply SR-IOV policies
kubectl apply -f /opt/sriov/

# Check VFs on the physical NIC
ip link show p3p1
# Look for "vf 0 ... vf 7 ..." entries
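
Rather than scanning the ip link output by eye, the vf entries can be counted and compared with what the kernel reports. This is a small sketch; count_vfs is a hypothetical helper name.

```shell
# Reads "ip link show <pf>" output on stdin and prints the number of
# "vf N" lines it contains.
count_vfs() {
  # grep -c exits 1 when the count is zero but still prints 0,
  # so the exit status is deliberately ignored here.
  grep -cE '^[[:space:]]+vf [0-9]+' || true
}
```

Usage: ip link show p3p1 | count_vfs. The result should match the value in /sys/class/net/p3p1/device/sriov_numvfs.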

Node fails to join cluster

# Check API VIP reachable from node2
ping 192.168.41.30

# Verify token matches on both nodes
grep token /etc/rancher/rke2/config.yaml

# Check firewall allows required ports
iptables -L INPUT -n | grep -E "6443|9345"

# RKE2 agent logs
journalctl -u rke2-agent -f
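
Listing firewall rules shows policy, not reachability. A direct TCP probe from the joining node to the API VIP answers the question more reliably. The sketch below assumes bash and the coreutils timeout command are available on the node; port_open is a hypothetical helper name.

```shell
# port_open HOST PORT
# Succeeds only if a TCP connection to HOST:PORT is established within 3 s.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are required.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

Run from node2, for example: for port in 6443 9345; do port_open 192.168.41.30 "$port" || echo "port $port unreachable"; done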

Performance Benchmarks

Once functional validation passes, establish baseline performance metrics:

Test                  Tool               Target
CPU latency jitter    cyclictest         <10μs max on isolated cores
DPDK throughput       testpmd            95+ Gbps line rate
SR-IOV latency        iperf3 + sockperf  <50μs RTT
Storage I/O           fio                Baseline for NVMe + Longhorn
Memory bandwidth      stream             125 GB/s (NUMA-local)

Document your baselines. Without measured baselines at deployment time, you have no reference point when performance degrades during operation. Run these tests before handing the platform over.
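
To make a baseline comparable over time, reduce each run to a single number and store it. As one example, the worst-case latency can be extracted from saved cyclictest output. This sketch assumes the usual per-thread summary lines that end in "Max: <value>" (microseconds); cyclictest_max_us is a hypothetical helper name.

```shell
# Reads cyclictest output on stdin and prints the largest "Max:" value
# across all threads.
cyclictest_max_us() {
  grep -oE 'Max: *[0-9]+' | grep -oE '[0-9]+' | sort -n | tail -1
}
```

A saved run can then be checked against the <10μs target with: [ "$(cyclictest_max_us < cyclictest.log)" -lt 10 ], where cyclictest.log is a placeholder file name.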

Litmus Chaos Engineering

The platform ships with Litmus Chaos Operator v3.10.0 pre-installed, so chaos experiments can be defined to validate resilience.

Litmus is installed, but no experiments were defined at handover. This is a gap worth addressing before production traffic lands on the platform.
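
As a possible starting point, a pod-delete experiment against the dpdk-test Deployment would show whether a replacement pod gets its VF and hugepages back. The sketch below is an illustration, not something shipped with the platform: it assumes the pod-delete ChaosExperiment is installed in the default namespace and that a suitable service account exists (litmus-admin is a hypothetical name). The engine name, durations, and function name are placeholders as well.

```shell
# Applies a ChaosEngine that deletes dpdk-test pods for 30 s, one every 10 s.
# All names and values below are illustrative assumptions.
apply_pod_delete_chaos() {
  kubectl apply -f - <<'EOF'
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: dpdk-test-pod-delete
  namespace: default
spec:
  engineState: active
  appinfo:
    appns: default
    applabel: app=dpdk-test
    appkind: deployment
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      components:
        env:
        - name: TOTAL_CHAOS_DURATION
          value: "30"
        - name: CHAOS_INTERVAL
          value: "10"
        - name: FORCE
          value: "false"
EOF
}
```

After the run, repeat the DPDK validation commands against the recreated pods to confirm that the secondary interface and hugepages mount are present again.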