The NPU Nobody Talks To: Mapping the CIX P1's Hidden AI Engine
The Orange Pi 6 Plus sits in the rack, humming along with the quiet confidence of a board that knows it is being underutilised. Inside the CIX P1 SoC is a dedicated Neural Processing Unit (NPU)—a piece of silicon specifically etched to crunch tensors and accelerate inference. The hardware is real, the chip is powered, and the transistors are awake. But in the world of the Linux kernel, it is a ghost. There is no driver to wake it up, no sysfs entry to query it, and no user-space API to feed it data. It is a high-performance engine idling in a vacuum; the chip is awake, but nobody’s home.
This silent silicon only became a priority thanks to a spectacular own goal: a nearby server node melted itself in an AI-induced CPU death spiral while trying to handle embedding workloads. Load average hit 153 on a 16-core machine — caused by the alertmanager feedback loop I had accidentally built. When the smoke cleared, the OP6+ was already running 24/7 and largely idle. A note on the setup: the production homelab services — Nextcloud, Immich, Mailu, Authentik — live in the Kubernetes cluster, not on this board. The Incus LXC containers I had running on the OP6+ were an earlier experiment in whether LXC could serve as a simpler alternative to Kubernetes for some workloads. They weren’t load-bearing. The board’s main contribution at that point was running a small Ollama embedding service we had just offloaded there. It seemed like a waste of electricity to keep paying x86 server rates for embedding work that a board drawing 15 watts could potentially handle — and it made you wonder what else that NPU silicon might be capable of, if anyone could actually talk to it.
April 2026 — investigating the NPU that shipped before its driver did
What the CIX P1 Actually Is
Let’s talk about the beast itself: the Orange Pi 6 Plus, powered by the CIX P1 CD8160 SoC. If you haven’t heard of it, you’re not alone. CIX Technology is a fabless semiconductor outfit from Nanjing, and they’ve put together a rather interesting package. The CPU side of things is genuinely impressive: a 12-core Armv9.2-A monster, split between eight Cortex-A720s humming along at 2.6 GHz and four Cortex-A520s at 2.2 GHz. What really caught my eye is the feature set: SVE2, i8mm (that’s INT8 matrix-multiply hardware), and bf16 support. These aren’t your grandpa’s ARM cores; these are modern, ML-capable units, with the same class of inferencing muscle found in Apple’s M-series chips. Then there’s the NPU. It’s physically present on the board, with three distinct clusters (CRE0/CRE1/CRE2), and it even registers in ACPI as NPU0/CIXH4000:0. The ARM SMMU v3 (the IOMMU) is active and covers its address space, confirming it’s a first-class citizen in the hardware topology. You can even see its physical address, 0x14260000, peeking out in /sys/class/devfreq/14260000.aipu/. It’s definitely there, it’s powered, it’s real. The problem? No public Linux driver has ever been taught to speak its language.
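Those sysfs breadcrumbs are checkable from userspace without touching anything. Here is a minimal read-only sketch; the 'aipu' substring comes from the devfreq path above, while the helper function itself is mine, not part of any SDK:

```python
import os

def find_aipu_devfreq(base='/sys/class/devfreq'):
    """Return devfreq entries that look like the NPU (AIPU) node.

    Strictly read-only: we only list directory names, never write to sysfs.
    """
    if not os.path.isdir(base):
        return []
    return sorted(name for name in os.listdir(base) if 'aipu' in name.lower())

if __name__ == '__main__':
    nodes = find_aipu_devfreq()
    if nodes:
        print('NPU devfreq node(s):', ', '.join(nodes))  # expect 14260000.aipu on the OP6+
    else:
        print('No aipu devfreq entry; wrong board or missing kernel support?')
```

On any machine that is not an Orange Pi 6 Plus this prints the "No aipu" branch, which is exactly the point: the devfreq node is a hardware fingerprint, not a driver.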
The NPU Question
Embedding generation is the workload that started all this: when the x86 compute node buckled under it, the question of more efficient processing stopped being academic. The Orange Pi 6 Plus was already on 24/7, running an embedding service as a low-power offload target, which made it a natural candidate for further investigation at zero incremental energy cost. The core hypothesis was simple: could this dedicated NPU silicon provide a tangible acceleration for vector embedding workloads? Given the lack of official documentation, the investigation had to stay strictly read-only: introspection, not potentially destructive experimentation.
How You Investigate Hardware With No Documentation
When you’re faced with hardware that has all the documentation of a classified alien artifact, you have to get creative. My methodology here is to start from userspace and work inward, always mindful of the risk hierarchy. Strings and strace inside an Incus container are about as safe as it gets; they can’t crash your host, which is a comforting thought when you’re poking around unknown binaries. The goal is simple: find out what the userspace library expects. What’s the device path it’s trying to open? What IOCTL numbers does it use to communicate with a kernel driver it assumes exists? What error messages does it spit out when that driver is conspicuously absent? Container isolation is your best friend here. Worst case, you bork the container and just spin up a new one; your host kernel remains pristine and your production services — Nextcloud, Immich, Mailu, running in the Kubernetes cluster on entirely separate nodes — are completely oblivious to your shenanigans. The specific tools were straightforward: dpkg-deb to inspect the .deb packages without actually installing them, strings to rip raw text out of the shared library, strace (with a carefully constructed LD_LIBRARY_PATH to ensure the correct libnoe.so.0.6.0 was loaded) to see what system calls it was making, and finally, reading the C header file that comes with the SDK for the API surface. This approach isn’t unique to the CIX P1; it’s a universal playbook for any SBC that ships with a shiny new, utterly undocumented accelerator.
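The strings pass needs nothing beyond the binary itself. As a sketch, here is a pure-Python stand-in for strings(1) that scans a blob for printable runs and keeps the ones that mention /dev/; the library path in the comment is the SDK's, the helper functions are illustrative:

```python
import re

def extract_strings(blob: bytes, min_len: int = 4):
    """Yield printable-ASCII runs of at least min_len bytes, like strings(1)."""
    for match in re.finditer(rb'[\x20-\x7e]{%d,}' % min_len, blob):
        yield match.group().decode('ascii')

def device_paths(blob: bytes):
    """Filter the extracted strings down to anything mentioning a /dev/ path."""
    return sorted({s for s in extract_strings(blob) if '/dev/' in s})

# Against the real library the call would look like:
#   blob = open('/usr/share/cix/lib/libnoe.so.0.6.0', 'rb').read()
#   print(device_paths(blob))
```

The pure-Python version is slower than the real strings(1), but it runs anywhere Python does, including a minimal container with no binutils installed.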
What We Found in the Library
This is where the detective story really picks up. The cix-noe-umd and cix-common-misc packages install cleanly, though their usefulness without a driver is debatable. Crucially, the Python bindings import without complaint, and a simple dir() call on the module exposes over a hundred symbols, including a complete list of IOCTLs and error codes. But the real smoking gun came from strings libnoe.so.0.6.0. There, in plain ASCII, were the critical clues: open /dev/aipu [fail], poll /dev/aipu [fail], and simply /dev/aipu. This confirmed the exact character device path the userspace library was designed to interact with. Further digging showed the physical address 14260000.aipu was indeed visible in the devfreq sysfs path, solidifying the NPU’s hardware presence. The internal C++ namespace aipudrv:: pointed to the ARM AIPU architecture family, indicating this NPU isn’t some entirely novel, custom design but shares lineage with other ARM-based accelerators. The IOCTL list, exposed by the Python bindings, mapped out a clear programming model: DMA-BUF memory management, job submission, status polling, and even tick counters for performance measurement. The header file even proudly declared an Apache 2.0 license. Someone at CIX put months of work into this, and a debug symbol path leak in the binaries even showed an engineer named zhiquan working in /home/zhiquan/noe_repo/component/cix_private/. This wasn’t some half-baked project; it was a complete userspace SDK, just waiting for its kernel counterpart.
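That dir() trick generalises to any opaque Python binding. A sketch of the approach follows; the NOE_IOCTL_ prefix matches the symbols described above, but the stand-in namespace and its constant values are invented for illustration, since libnoe only imports on the board:

```python
from types import SimpleNamespace

def list_ioctls(module, prefix='NOE_IOCTL_'):
    """Return the names a binding exposes that look like ioctl constants."""
    return sorted(name for name in dir(module) if name.startswith(prefix))

# Stand-in for the real `import libnoe`; names and values are hypothetical.
fake_libnoe = SimpleNamespace(
    NOE_IOCTL_GET_VERSION=0x4004A001,  # hypothetical value
    NOE_IOCTL_JOB_SUBMIT=0xC010A002,   # hypothetical value
    noe_create_context=lambda: None,   # non-IOCTL symbols are filtered out
)

print(list_ioctls(fake_libnoe))
# On the board: import libnoe; print(list_ioctls(libnoe))
```

Ten lines of introspection like this is how the full IOCTL inventory fell out of the Python bindings without ever touching the kernel.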
The Kernel Driver Gap
Here’s the rub: a kernel driver isn’t some optional accessory you can bolt on later. It’s the absolutely essential bridge between the NPU’s hardware registers and the /dev/aipu character device that the userspace library so desperately wants to open. Without it, the NPU is just an impressive hunk of silicon drawing power. The driver would need to implement a character device node at /dev/aipu, handle the entire suite of IOCTLs we found (from NOE_IOCTL_GET_VERSION to complex DMA-BUF and job management calls), set up a DMA-BUF allocator for efficient memory transfers, and integrate with the devfreq subsystem for frequency scaling. The clue from the 14260000.aipu address is actually the easy part: the hardware is ACPI-enumerated, meaning the kernel doesn’t need to go on a scavenger hunt to discover it; its static address is handed over on a silver platter. The hard part, and the reason this NPU remains dormant for us, is that someone has to actually write all that kernel-level code and make it publicly available. The orangepi-build repository’s README explicitly warning “DO NOT INSTALL” for the NOE packages speaks volumes. It suggests the driver exists internally, perhaps in a state that’s not ready for public consumption, or tied to a specific internal workflow. Until that code sees the light of day, the CIX P1 NPU will remain a tantalizing promise, a silent powerhouse awaiting its voice.
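One encouraging detail: Linux ioctl numbers are structured, not opaque. On asm-generic platforms the kernel packs a direction, a magic type byte, a command number, and the argument size into the 32-bit value (the _IOC macros in include/uapi/asm-generic/ioctl.h), so the constants the Python bindings expose already reveal the struct sizes a future driver must honour. Here is a sketch of the standard decoder; the example value is a generic _IOW encoding I constructed, not a real libnoe constant:

```python
# Field layout from include/uapi/asm-generic/ioctl.h (asm-generic platforms):
#   bits  0-7  : command number (nr)
#   bits  8-15 : magic/type byte
#   bits 16-29 : argument size in bytes
#   bits 30-31 : direction (0 = none, 1 = write, 2 = read, 3 = read|write)
_DIR = {0: 'none', 1: 'W', 2: 'R', 3: 'RW'}

def decode_ioctl(code: int) -> dict:
    """Unpack a 32-bit Linux ioctl number into its four fields."""
    return {
        'dir': _DIR[(code >> 30) & 0x3],
        'type': chr((code >> 8) & 0xff),
        'nr': code & 0xff,
        'size': (code >> 16) & 0x3fff,
    }

# _IOW('A', 1, <8-byte struct>) encodes to 0x40084101:
print(decode_ioctl(0x40084101))  # {'dir': 'W', 'type': 'A', 'nr': 1, 'size': 8}
```

Run over the real NOE_IOCTL_* constants, a decoder like this would hand a prospective driver author the exact argument-struct sizes the userspace library expects, which is half the ABI for free.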
What It Would Mean
So, what would a working CIX P1 NPU driver actually mean? Let’s be concrete, because “AI acceleration” is a phrase that has been thoroughly emptied of meaning.
At an estimated 6-12 TOPS INT8, the CIX P1 NPU sits in the same performance class as the Rockchip RK3588 NPU (6 TOPS) that already powers Immich’s machine-learning container in my cluster. On that chip, face recognition and CLIP image embeddings run at a fraction of the CPU cost. The CIX P1 NPU could do the same class of work — and potentially more.
For embedding models specifically, a 384-dimensional model like BAAI/bge-small-en-v1.5 (90MB) already runs at ~2ms per inference on a warm A720 core; the NPU would likely land somewhere under 10ms, so it would not win on single-shot latency for a model this small. Its value is throughput and offload: at a batch of 50,000 chunks, those milliseconds add up to minutes of CPU time the A720s could be spending on everything else. More interesting are the larger embedding models — nomic-embed-text at 768 dimensions, or the new 1024-dim models starting to appear — where CPU inference gets expensive and NPU silicon starts to earn its keep.
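To make the batch arithmetic concrete, here is a quick sanity check using the ~2ms warm-CPU figure above and a hypothetical 5ms NPU latency (within the sub-10ms estimate; neither NPU number is a measurement):

```python
def batch_seconds(chunks: int, ms_per_inference: float) -> float:
    """Pure sequential compute time for an embedding batch, in seconds."""
    return chunks * ms_per_inference / 1000.0

# 50,000 chunks at the ~2ms warm-CPU latency:
print(f'{batch_seconds(50_000, 2.0):.0f} s of busy CPU')        # 100 s
# Same batch at a hypothetical 5ms on the NPU: slower wall-clock,
# but the CPU cores stay free for everything else the board is doing.
print(f'{batch_seconds(50_000, 5.0):.0f} s, near-zero CPU cost')  # 250 s
```

For a one-off query the CPU wins. For a backlog reindex running next to live services, the offload column is the one that matters.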
For small language models, 6-12 TOPS puts you in the range of serious INT8 inference on models up to roughly 3-4 billion parameters. Think Phi-3 Mini (3.8B), Qwen2.5 1.5B, or the small Llama variants. On the RK3588 NPU, Qwen1.5-1.8B runs at around 10-15 tokens per second — fast enough for local, private inference of the kind that’s genuinely useful: document summarisation, alert triage, question answering over local context. The CIX P1, with its higher estimated TOPS and newer architecture, could potentially push that envelope further.
For vision tasks, CLIP encoding of images (what Immich uses for semantic photo search), YOLO object detection, and face embedding all fall comfortably within NPU territory. A board that could handle its own ML workloads for photo management would meaningfully reduce the load on the main x86 compute node.
The energy angle is real at €0.40/kWh in the Netherlands. The board draws 15-25W under load versus 65-125W for the x86 server. It is already on. The incremental cost of running NPU inference on a board you are already paying to keep alive is close to zero.
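A back-of-envelope sketch of that claim, using the midpoints of the power ranges above (all inputs are the estimates from this post, not measurements):

```python
HOURS_PER_YEAR = 24 * 365  # 8760

def annual_cost_eur(watts: float, eur_per_kwh: float = 0.40) -> float:
    """Yearly electricity cost for a device drawing a constant load."""
    return watts / 1000.0 * HOURS_PER_YEAR * eur_per_kwh

print(f'ARM board  @ 20 W: €{annual_cost_eur(20):.2f}/yr')   # €70.08/yr
print(f'x86 server @ 95 W: €{annual_cost_eur(95):.2f}/yr')   # €332.88/yr
```

Roughly a factor of five per year, and the board's €70 is already being spent whether the NPU wakes up or not.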
Right now, for my specific workload, the CPU is fast enough — 2ms warm embedding latency is not a problem worth solving. But a working NPU driver opens the door to running meaningful language models locally on a 15W ARM board that fits in a rack. That is a different conversation entirely.
The Probe Script
To keep an eye on this situation, I put together noe_probe.py. It’s a small Python script that uses the existing libnoe library to attempt NPU initialization. Right now, if you run it on an Orange Pi 6 Plus without a kernel driver, it’ll tell you precisely which device file it tried to open (likely /dev/aipu) and why it failed. The moment a working kernel driver finally lands and the NPU can be initialized, this script will switch to printing the firmware version instead. It’s a simple, unambiguous signal. The repository holding this is private, so here it is inline:
#!/usr/bin/env python3
import os, sys
# NB: glibc captures LD_LIBRARY_PATH at process start, so in practice this
# needs to be exported before launching Python for dlopen to honour it.
os.environ['LD_LIBRARY_PATH'] = '/usr/share/cix/lib:' + os.environ.get('LD_LIBRARY_PATH', '')
sys.path.insert(0, '/usr/local/lib/python3.12/dist-packages')
DEVICE_PATH = '/dev/aipu'

def probe_npu():
    if not os.path.exists(DEVICE_PATH):
        print(f'[-] {DEVICE_PATH} does not exist — kernel driver missing')
        print('NPU: NOT READY')
        return False
    try:
        import libnoe
    except ImportError as e:
        print(f'[-] libnoe import failed: {e}')
        print('NPU: NOT READY')
        return False
    print(f'[+] libnoe loaded, {DEVICE_PATH} present')
    try:
        ctx = libnoe.noe_create_context()
        version = libnoe.noe_get_version()
        count = libnoe.noe_get_cluster_count(ctx)
        print(f'[+] Driver version: {version}, clusters: {count}')
        print('NPU: READY')
        return True
    except Exception as e:
        print(f'[-] Init failed: {e}')
        print('NPU: NOT READY')
        return False

if __name__ == '__main__':
    probe_npu()
For the truly impatient, a one-liner checks for the device node:
[ -c /dev/aipu ] && echo NPU_READY || echo NPU_WAITING
A word of warning about what this actually tells you: it checks the kernel you are currently running, nothing more. On the Orange Pi 6 Plus, there is no apt upgrade path to NPU support. The CIX P1 uses a proprietary vendor kernel — 6.6.89-cix — and getting it running at all required compiling the entire kernel and bootable image from scratch using the orangepi-build scripts. There is no package manager shortcut. When the NPU driver eventually lands, the route from NPU_WAITING to NPU_READY will still involve pulling the driver source, rebuilding the kernel with it compiled in, and flashing the result. The one-liner is a useful sentinel, but it is the beginning of a process, not the end of one. I’d love to hear from anyone who gets there first.
The Wait
So, where do things stand today? Honestly, the CIX P1 NPU kernel driver doesn’t exist publicly. The quality and structure of the userspace libraries strongly suggest CIX Technology has it working internally; this isn’t vapourware. The “DO NOT INSTALL” warning on their GitHub is almost certainly a commercial release decision, not an engineering limitation. The Apache 2.0 license on the header files is a positive, open signal – they’re not trying to completely hide the interface. The ARM SBC community has been here before. Rockchip’s RKNPU was once a closed, proprietary affair, then it wasn’t. MediaTek’s APU drivers followed a similar, if often circuitous, path to community support. The CIX P1 is newer and less widely known than those, but the pattern is familiar. This investigation has pushed the limits of what’s feasible from userspace. What comes next will require either CIX to release an official driver, or someone with considerable kernel development expertise and a generous amount of free time to decide this particular board is worth the effort. Both are possible. Neither is guaranteed. That’s the honest state of play.
The door is perfectly framed. The lock is installed. We now know the exact shape of the key; we just don’t happen to have the key in our hands yet. Investigating SBC hardware is always a strange exercise in patience—half detective work, half waiting for a developer in a different timezone to push a specific commit to a mailing list.
If you are running an Orange Pi 6 Plus and want to know the exact second this hardware becomes useful, you don’t need to poll forums. Run the probe script, keep the one-liner in your crontab, and come back when it says READY — bearing in mind that READY requires a kernel rebuild, not just a package update. Until then, we wait.
References
- The noe_probe.py script: https://gitlab.com/djieno/fluxcd/-/raw/main/apps/base/cluster-cortex/fastembed-service/noe_probe.py
- The alert death spiral post: https://djieno.com/blog/alert-death-spiral/
- The orangepi-xunlong/component_cix-next repository: https://github.com/orangepi-xunlong/component_cix-next
- The cluster-shepherd post: https://djieno.com/blog/cluster-shepherd-ai-ops/


