5G vRAN workloads running L1 PHY signal processing have timing budgets measured in microseconds. A single OS timer interrupt or RCU callback on the wrong core can cause a missed deadline, resulting in dropped radio frames. This post covers the complete CPU isolation and real-time tuning stack used in the EIB-Customer platform.
The Problem: OS Noise
In a default Linux system, every CPU core is subject to:
- Periodic timer ticks (CONFIG_HZ, typically 250–1000/s)
- RCU (Read-Copy-Update) callbacks: deferred cleanup work for the kernel's lock-free data structures
- Hardware interrupt delivery
- Kernel thread migrations
- Scheduler load balancing
Each of these introduces latency jitter. For a 5G DU, jitter exceeding ~10μs on processing cores is unacceptable. The solution: partition the CPUs into housekeeping and isolated sets.
CPU Partitioning: 64 Cores, Two Roles
NUMA Node 0 (Cores 0–31)      NUMA Node 1 (Cores 32–63)
┌────────────────────┐        ┌────────────────────┐
│ Core 0             │        │ Core 32            │
│ Housekeeping       │        │ Housekeeping       │
│                    │        │                    │
│ Cores 1–30         │        │ Cores 33–62        │
│ ISOLATED           │        │ ISOLATED           │
│ (Workloads)        │        │ (Workloads)        │
│                    │        │                    │
│ Core 31            │        │ Core 63            │
│ Housekeeping       │        │ Housekeeping       │
└────────────────────┘        └────────────────────┘
Housekeeping cores (0, 31, 32, 63): Handle all OS activity — kernel threads, system services, Kubernetes system pods, interrupt handling.
Isolated cores (1–30, 33–62): Reserved exclusively for workload pods — DPDK PMDs, 5G RAN functions, real-time applications. 60 of 64 cores are isolated.
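The split is easy to sanity-check in a few lines of Python. This is a minimal sketch rather than part of the platform tooling: expand_cpu_list is a hypothetical helper that understands only the simple "a-b,c" list form used in this post (no strides or groups).

```python
def expand_cpu_list(spec: str) -> set[int]:
    """Expand a kernel-style CPU list such as "1-30,33-62" into a set of core IDs."""
    cores: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            cores.update(range(lo, hi + 1))
        else:
            cores.add(int(part))
    return cores


ALL_CORES = set(range(64))  # 2 NUMA nodes x 32 cores, as in the diagram
housekeeping = expand_cpu_list("0,31,32,63")
isolated = expand_cpu_list("1-30,33-62")

# The two roles must not overlap and must cover every core exactly once.
assert housekeeping.isdisjoint(isolated)
assert housekeeping | isolated == ALL_CORES

print(f"housekeeping={len(housekeeping)} isolated={len(isolated)}")
# prints: housekeeping=4 isolated=60
```

Running the same check against a different core layout only requires changing the two list strings and the total core count.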
Kernel Boot Parameters
CPU isolation is configured in the kernel command line, injected via edge-cluster.yaml:
isolcpus=domain,nohz,managed_irq,1-30,33-62
nohz_full=1-30,33-62
rcu_nocbs=1-30,33-62
irqaffinity=0,31,32,63
isolcpus
Removes the listed cores from the general scheduler pool. The domain flag takes them out of the scheduler's load-balancing domains, nohz stops the tick on a core while a single task is running there, and managed_irq steers managed interrupts away from isolated cores on a best-effort basis.
nohz_full
Enables adaptive ticks — the timer interrupt is suppressed on isolated cores when only one runnable task is present. Eliminates the most common source of OS noise on application cores.
rcu_nocbs
Offloads RCU callbacks from isolated cores to dedicated RCU threads running on housekeeping cores. Without this, RCU work can interrupt DPDK poll loops at unpredictable intervals.
irqaffinity
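Sets the default interrupt affinity mask to the housekeeping cores, so NIC interrupts and other device IRQs are routed away from isolated cores unless something reassigns them later. Managed interrupts are handled separately by the managed_irq flag of isolcpus.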
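Because the four parameters have to agree with each other, a small consistency check can catch copy-paste mistakes before a node is rebooted. The sketch below is illustrative only: check_isolation and its helpers are hypothetical names, the command line is passed in as a string (on a live node it could be read from /proc/cmdline), and only the plain "a-b,c" CPU list form is handled. It relies on the kernel's documented isolcpus format of an optional flag list followed by the CPU list, all comma-separated.

```python
ISOLCPUS_FLAGS = {"domain", "nohz", "managed_irq"}
WANTED = ("isolcpus", "nohz_full", "rcu_nocbs", "irqaffinity")


def expand_cpu_list(spec: str) -> set[int]:
    """Expand "1-30,33-62" style lists into a set of core IDs."""
    cores: set[int] = set()
    for part in filter(None, (p.strip() for p in spec.split(","))):
        lo, _, hi = part.partition("-")
        cores.update(range(int(lo), int(hi or lo) + 1))
    return cores


def parse_isolation_args(cmdline: str) -> dict[str, set[int]]:
    """Return the CPU set named by each isolation-related boot parameter."""
    result: dict[str, set[int]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep or key not in WANTED:
            continue
        if key == "isolcpus":
            # Drop the flag list; what remains is the CPU list.
            value = ",".join(p for p in value.split(",") if p not in ISOLCPUS_FLAGS)
        result[key] = expand_cpu_list(value)
    return result


def check_isolation(cmdline: str) -> list[str]:
    """Return human-readable problems; an empty list means the parameters agree."""
    args = parse_isolation_args(cmdline)
    missing = [k for k in WANTED if k not in args]
    if missing:
        return [f"missing parameter(s): {', '.join(missing)}"]
    problems = []
    if not (args["isolcpus"] == args["nohz_full"] == args["rcu_nocbs"]):
        problems.append("isolcpus, nohz_full and rcu_nocbs name different CPU sets")
    overlap = args["irqaffinity"] & args["isolcpus"]
    if overlap:
        problems.append(f"irqaffinity includes isolated cores: {sorted(overlap)}")
    return problems


CMDLINE = (
    "isolcpus=domain,nohz,managed_irq,1-30,33-62 nohz_full=1-30,33-62 "
    "rcu_nocbs=1-30,33-62 irqaffinity=0,31,32,63"
)
print(check_isolation(CMDLINE) or "isolation parameters are consistent")
```

The check is deliberately strict about the three isolated-core lists matching exactly; a platform that intentionally isolates different sets would need to relax that rule.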
Kubernetes CPU Management
Kernel isolation alone isn't sufficient — Kubernetes must also respect core boundaries. Two kubelet policies enforce this:
# /etc/kubelet-conf/topologymgr.conf
cpuManagerPolicy: static
topologyManagerPolicy: single-numa-node
Static CPU Manager Policy
Containers in Guaranteed QoS pods that request whole-number CPU counts receive exclusive CPU allocation. The kubelet pins the container's processes to specific physical cores through its cpuset, so they cannot be scheduled anywhere else and no other pod shares those cores.
Single-NUMA-Node Topology Policy
Ensures that CPUs, memory, and devices (SR-IOV devices, hugepages) allocated to a pod all come from the same NUMA node. Cross-NUMA memory access adds roughly 100 ns of latency per access, which is unacceptable for real-time workloads.
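A common mistake is a pod that looks right but silently lands in the shared CPU pool because its CPU request is fractional or its requests and limits differ. The following sketch shows the eligibility rule for a single container. It is a simplification under stated assumptions: eligible_for_exclusive_cpus is a hypothetical helper, the container is a plain dict shaped like a pod spec entry, memory quantities are compared as strings rather than normalised, and the pod-level Guaranteed class additionally requires every container in the pod to have matching requests and limits.

```python
def cpu_millicores(quantity) -> int:
    """Convert a Kubernetes CPU quantity ("2", "0.5", "1500m", 4) to millicores."""
    quantity = str(quantity).strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def eligible_for_exclusive_cpus(container: dict) -> bool:
    """True if the container's resources allow the static policy to pin whole cores."""
    resources = container.get("resources", {})
    limits = resources.get("limits", {})
    # Kubernetes defaults requests to limits when requests are omitted.
    requests = {**limits, **resources.get("requests", {})}

    if "cpu" not in limits or "memory" not in limits:
        return False
    if requests["memory"] != limits["memory"]:
        return False

    cpu_request = cpu_millicores(requests["cpu"])
    cpu_limit = cpu_millicores(limits["cpu"])
    # Requests must equal limits, and the CPU count must be a whole number >= 1.
    return cpu_request == cpu_limit and cpu_limit >= 1000 and cpu_limit % 1000 == 0


# Illustrative containers, not taken from a real deployment.
dpdk = {"resources": {"limits": {"cpu": "4", "memory": "8Gi", "hugepages-1Gi": "2Gi"}}}
sidecar = {
    "resources": {
        "requests": {"cpu": "500m", "memory": "256Mi"},
        "limits": {"cpu": "500m", "memory": "256Mi"},
    }
}

print(eligible_for_exclusive_cpus(dpdk))     # True: 4 whole CPUs, requests == limits
print(eligible_for_exclusive_cpus(sidecar))  # False: fractional CPU stays in the shared pool
```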
TuneD: cpu-partitioning Profile
The cpu-partitioning.service systemd unit applies the TuneD cpu-partitioning profile at boot. This consolidates and reinforces the kernel boot parameters and adds:
- CPU governor set to performance on all cores
- C-states disabled (no CPU idle states)
- P-state tuning for maximum frequency
- Additional kernel parameters for real-time scheduling
# Verify active profile
tuned-adm active
# Expected: Current active profile: cpu-partitioning
# Verify CPU governor
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Expected: all lines read "performance"
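With 64 cores, scanning the governor output by eye is error-prone. The sketch below automates the same check by reading the sysfs files listed above. The function names are hypothetical, and the sysfs root is a parameter so the logic can be exercised against a test directory; on machines without cpufreq support (many VMs, for example) no files exist and the result is empty.

```python
from pathlib import Path


def read_governors(sysfs_root: str = "/sys/devices/system/cpu") -> dict[str, str]:
    """Map each CPU directory name (e.g. "cpu7") to its current scaling governor."""
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    return {
        gov_file.parent.parent.name: gov_file.read_text().strip()
        for gov_file in sorted(Path(sysfs_root).glob(pattern))
    }


def non_performance(governors: dict[str, str]) -> dict[str, str]:
    """Keep only the CPUs whose governor is something other than "performance"."""
    return {cpu: gov for cpu, gov in governors.items() if gov != "performance"}


if __name__ == "__main__":
    governors = read_governors()
    offenders = non_performance(governors)
    if not governors:
        print("no cpufreq entries found")
    elif offenders:
        print(f"unexpected governors: {offenders}")
    else:
        print(f"all {len(governors)} cores use the performance governor")
```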
Huge Pages
DPDK requires huge pages for its memory allocator. Each huge page reduces TLB (Translation Lookaside Buffer) pressure by mapping 1GB instead of 4KB per entry — critical when the DPDK application is constantly accessing large packet buffers.
# In edge-cluster.yaml kernel cmdline:
hugepagesz=1G hugepages=40
# Verify at runtime
grep HugePages /proc/meminfo
# HugePages_Total: 40
# HugePages_Free: 38 (2 allocated to running DPDK pods)
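To put the TLB argument in numbers, the arithmetic below counts how many page mappings are needed to cover the 40 GiB pool with 4 KiB pages versus 1 GiB pages. It is a back-of-the-envelope sketch: it counts mappings only and says nothing about the TLB size or miss rate of any particular CPU.

```python
KIB = 1024
GIB = 1024 ** 3


def mappings_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of page mappings (and so potential TLB entries) covering a region."""
    return -(-region_bytes // page_bytes)  # ceiling division


region = 40 * GIB  # the 40 x 1 GiB pool reserved on the kernel command line
small = mappings_needed(region, 4 * KIB)
huge = mappings_needed(region, 1 * GIB)

print(f"4 KiB pages: {small:,} mappings")  # 4 KiB pages: 10,485,760 mappings
print(f"1 GiB pages: {huge:,} mappings")   # 1 GiB pages: 40 mappings
```

The gap of more than five orders of magnitude is why a poll-mode driver walking large packet buffers benefits so much from 1 GiB pages.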
Real-Time Kernel (PREEMPT_RT)
The base OS is SLE Micro 6.1 RT — built with the PREEMPT_RT patchset. Key differences from a standard kernel:
- Most kernel spinlocks converted to mutexes (preemptible)
- Interrupt handlers run in preemptible thread context
- High-resolution timers for sub-millisecond precision
- Priority inheritance for PI-aware mutexes
This gives worst-case latency bounds rather than just good average-case performance.
Performance Optimisation Summary
| Component | Optimisation | Effect |
|---|---|---|
| Kernel | PREEMPT_RT variant | Deterministic scheduling, <1ms worst-case |
| CPU Governor | performance | Max frequency, no power saving |
| CPU Isolation | 60 cores isolated | Exclusive allocation to workloads |
| NUMA Policy | single-numa-node | Enforce memory locality |
| Huge Pages | 40GB (1GB pages) | Reduce TLB misses by 90%+ |
| nohz_full | Isolated cores tickless | Eliminate timer interrupts |
| rcu_nocbs | RCU offloaded | Remove RCU jitter from app cores |
| irqaffinity | Pinned to cores 0,31,32,63 | Predictable interrupt handling |
Measuring Latency
# cyclictest — measure timer latency on isolated cores
taskset -c 1 cyclictest -p 99 -m -n -i 200 -D 10m
# Expected on a well-tuned system:
# T: 0 ( pid) I:200 C:3000000 Min: 2 Act: 3 Avg: 3 Max: 8
# Max latency < 10μs is the target for DPDK workloads
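For automated acceptance runs, the pass/fail decision can be scripted instead of read off the terminal. The sketch below parses per-thread summary lines of the shape shown above and compares the Max column against the 10 μs budget. It is a sketch under assumptions: the function names are hypothetical, the sample line uses a made-up PID, and the regular expression only looks at lines beginning with "T:".

```python
import re

# Matches per-thread summary lines and captures the thread index and Max value.
THREAD_LINE = re.compile(r"^T:\s*(?P<thread>\d+).*?Max:\s*(?P<max>\d+)", re.MULTILINE)


def max_latencies_us(output: str) -> dict[int, int]:
    """Map thread index -> worst-case latency in microseconds."""
    return {int(m["thread"]): int(m["max"]) for m in THREAD_LINE.finditer(output)}


def within_budget(output: str, budget_us: int = 10) -> bool:
    """True only if at least one thread was found and every Max is under budget."""
    maxima = max_latencies_us(output)
    return bool(maxima) and all(value < budget_us for value in maxima.values())


def expected_cycles(duration_s: int, interval_us: int) -> int:
    """Cycle count a run should reach for a given duration and interval."""
    return duration_s * 1_000_000 // interval_us


SAMPLE = "T: 0 ( 4242) P:99 I:200 C:3000000 Min:      2 Act:    3 Avg:    3 Max:       8"

print(max_latencies_us(SAMPLE))   # {0: 8}
print(within_budget(SAMPLE))      # True
print(expected_cycles(600, 200))  # 3000000, matching C: for a 10-minute run at 200 us
```

Comparing the reported cycle count with expected_cycles is a cheap way to confirm the run actually lasted the full duration before trusting its Max value.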
This tuning stack — kernel RT, CPU isolation, NUMA topology management, and huge pages — is what makes this platform suitable for 5G vRAN rather than just general-purpose workloads.