Hello! now
HK In FortuneFree Shipping Over$200
Follow Us:

The Components a Complete AI System Is Built From

6/3/2026 1:43:13 AM

An AI system on a device is built from more than the chip that runs the model. The model has to be fed data, given power, kept cool, and handed clean inputs to work on, and each of those jobs falls to a different part of the bill of materials. The accelerator gets top billing in the block diagram. The parts around it decide whether that accelerator ever reaches the throughput printed on its first page.

Choose the silicon first and treat everything else as support, and the board has a way of coming back to the bench. A converter that sags on a load step, a flash too slow to stream weights, a sealed case that traps heat until the part clocks itself down: any one of them holds the finished system well under what the compute was rated to do, usually at the least glamorous point on the board.

None of that shows up in a TOPS number.

Where the work runs

The first thing to settle is what executes the model. A dedicated accelerator or a piece of edge-AI silicon earns its place once the frame rate or the power budget puts the job out of reach of a general processor. Below that line the extra chip is cost and board area that buy nothing, and where the line falls comes from the model and the duty cycle, not from the peak rating on the front page. The form the accelerator takes matters as much as its rating; a part built around convolution can stall on the attention blocks in a transformer, so the fit between the silicon and the shape of the model is part of the choice.

An edge-AI board carrying the accelerator, memory, power conversion and active cooling on one assembly

A large share of jobs never need a separate part at all. A single MCU carrying a small neural block can run the model itself, a keyword spotter or a vibration-anomaly check, while it still handles the housekeeping the product needs. Many of these MCUs add a small vector or DSP extension that lifts the math throughput without a second chip, and the on-chip RAM, often a few hundred kilobytes, ends up being the wall the model hits first.

A third case turns up when the network has not settled. The flexibility of an FPGA keeps the datapath open to change after the board already exists, which is worth real power draw and design hours when the alternative is committing hardware to an architecture that may not survive the year.

Memory that keeps it fed

A model runs at the speed its weights can arrive, not the speed the arithmetic can be carried out. Every layer reads a block of parameters, multiplies them against the incoming activations, and writes the result for the next layer to pick up. On anything past a tiny network the parameters do not all fit in on-chip SRAM, so they sit in external memory and stream in as each layer runs, and the rate they stream at is fixed by the memory interface. That rate, far more often than the multiply count, is what sets the frame rate measured on the bench. It is the reason two parts with the same headline TOPS can post very different real throughput once a model is loaded: the one with the wider or faster memory bus keeps its arithmetic busy, while the other spends much of its time waiting on the next block of weights. Quantizing a model to eight-bit integers helps here for a reason that has little to do with accuracy. It cuts the bytes moved per inference by roughly half or three-quarters against a floating-point build, and on a memory-bound design that drop in traffic turns straight into a higher frame rate. The same pressure shapes how a model is laid out. Splitting a wide layer into tiles that fit the on-chip buffers, holding activations on-chip instead of writing them back out, loading the next tile while the current one computes: these are the levers that recover throughput no spec sheet ever promised. On the smallest parts the model and its scratch space share the same few hundred kilobytes, so the network gets designed around the memory it will run in. There is a quieter cost at the far end of the run, too. The model has to be copied from non-volatile storage into working memory before the first inference, and on a part booting a large network out of a slow serial flash that copy can add a visible lag to power-on, which matters on a device meant to wake, infer, and sleep to save its battery. Working through how memory bandwidth caps real inference speed is the step that decides whether the throughput on the datasheet holds up once a real model is sitting in the part.

Bandwidth is one question. Where the code and the model actually live is another. A small device boots its firmware from a serial NOR flash. The model and any logged data go in higher-density NAND or eMMC, and the part runs out of SDRAM or LPDDR sized to the working set. Choosing the boot and data storage parts comes down to density, interface, and how many write cycles the data pattern will see across the product's life.

Power that rides through the load

An inference load is bursty. The core sits near idle, pulls a large current in a few microseconds as a layer fires, then falls back. A supply sized for the average current lets the rail droop on that step, and a deep enough droop reads to the chip as a brownout it has to act on.

Holding the rail steady under that step is the job of a converter sized for the transient it will actually see, with a control loop fast enough to catch the edge. On a high-current core that usually means a multiphase design, or a dense power module that puts the inductor in its own package right next to the load. A board with several rails also needs them brought up in the right order, which is where a power-management IC with sequenced outputs takes the place of a handful of discrete regulators.

A buck regulator module with its inductor and bulk capacitors beside the switching device

The converter does not manage it alone. Bulk capacitance covers the first microseconds before the loop catches up, ceramics at the pins absorb the fast edge, and a ferrite bead keeps the switching noise off the analog supply. Getting power integrity right with the passives is what keeps a clean rail under a chip that can swing tens of amps between idle and full load in well under a microsecond.

The figures that drive all of this are the size of the current step and how fast it arrives. Both sit in the accelerator's documentation, in the section few people read until the rail starts misbehaving.

The thermal ceiling

Heat sets a ceiling that moves. A part holding full clock at room temperature pulls that clock back as the junction warms, and the inference rate falls with it, so the number on an open bench and the number after an hour in a closed enclosure are not the same number. The path runs from the die through the package to a spreader or a heatsink, and the thermal interface material in each gap carries as much of the budget as the metal does. Cooling silicon that throttles when it gets hot belongs in the spec from the start, sized from the real power draw and the still-air temperature inside the case, not bolted on after a prototype runs hot and slow.

From a board to something you can ship

The enclosure decides more than its size suggests. A power and thermal ceiling sets whether the box can run without a fan, how large it has to be to shed its heat, and the workload it can hold hour after hour.

A fanless design buys silence and a sealed case and pays for them in sustained compute. The same silicon that turns in a high frame rate on an open bench will throttle inside a closed box once the heat has nowhere to go, so the choice of cooling and the choice of enclosure are really one decision taken twice.

A development board proves the idea. It does not prove the product. Moving from a dev board to a board you can ship brings in the work the eval kit hid: a real power tree, EMC, mechanical fit, a bring-up plan, a layout that respects the high-speed and power rules the reference design followed quietly. A reference layout that passed radiated-emissions testing did so with its exact stackup and placement, and a redrawn board earns that result again from scratch.

One part of that work pays back only later, well after bring-up. A component that is easy to buy in a prototype run can be the one on twelve-month allocation when the order is for ten thousand units. Settling component selection and long-term supply early, with a second source named in the BOM before the first build, is what keeps a later shortage from stopping the line.

This is the point where the sourcing side of the work shows up. A part picked for a ten-year product carries a supply question that has little to do with whether it runs on the bench, and on a long-lived design that question gets answered at selection, before the first order is ever placed.

The data the model sees

A vision model is only as good as the frames it is handed. The image sensor and the optics in front of it set the dynamic range and the low-light behavior. They also decide whether a fast-moving object lands sharp or arrives smeared by a rolling shutter. The quoted frame rate assumes a lane count and a clock the host actually has to provide, and a sensor starved of MIPI lanes runs below its headline. No depth of network recovers detail the sensor never captured.

Motion and robotics work sits between the model and the mechanism. The drive and feedback parts are what let a model command a joint and read back where the joint actually went, and the quality of that feedback bounds how tightly any control loop above it can close.

Voice starts at the microphone. A clean audio front end takes the digital output of a MEMS microphone and hands the model a signal that has not already lost its quiet words to noise. For a far-field wake-word system that is most of the battle.

Between the sensors and the compute is the plain matter of moving the data fast enough. The video output and high-speed lanes carry a camera stream into the accelerator or a processed frame out to a display, over interfaces like MIPI, HDMI, and PCIe that bring their own signal-integrity rules along with them.

Some devices have to play audio back as well as listen for it. A spoken prompt or a stored clip needs the parts that decode and convert audio, a DAC or a codec sized to the speaker it drives and the power it is allowed to spend.

None of these blocks is the AI system on its own. The system is the set of them, weighed against one another, and on a lot of designs the part that ends up setting the ceiling is the power rail, the flash, or the case. The accelerator that got chosen first is rarely the thing holding it back.

Related information

HK In Fortune

Search

HK In Fortune

Products

HK In Fortune

Phone

HK In Fortune

User