SCADABLE

IoT Middleware vs Library: Why Your Protocol Translation Layer Doesn't Scale

Middleware that translates between heterogeneous device protocols grows linearly with every new protocol you onboard. A library shape doesn't. Here is the architectural reason, and when each is the right call.


A founder we interviewed last month said this, unprompted, near the end of the call:

"Each machine has their own protocol. So not only languages, but even like system architecture and so on. And one of the problems that the industry faces. You have all those machines are amazing. They're solving individual problems, but they're at the root, they're different. And you cannot just like, if you want to build a middleware, that middleware is going to be massive because you have to transcribe this protocol and that protocol. You can't update it."

He was describing a middleware layer he had been about to build, then stepped back from. He is right. Most teams writing a protocol-translation layer between Modbus, BLE, MQTT, CoAP, OPC-UA, and whatever proprietary stack the latest vendor shipped do not realize they have built middleware until it is too big to change. This post walks through the architectural reason, the math behind it, and why a library shape collapses the maintenance curve that middleware cannot.

Why middleware grows linearly (and then worse)

Start with the simplest framing. You have N protocols on the device side and K fields in your internal canonical model. A middleware that translates between them is, at minimum, an N times K mapping table. Four protocols and thirty fields give you 120 mapping rules. Add a fifth protocol and you write 30 new rules. Add a 31st field and you write 4 new rules. Linear in each axis, multiplicative across both.

That alone is manageable. The problem is that K grows too. Every time a new device class lands (a new sensor type, a new actuator, a new gateway model), the canonical model gains fields. Every time a vendor revs a protocol minor version, your mapping rules for that protocol fork. So the steady-state curve is not N times K, it is N times K times V, where V is the number of versions you have to support across the fleet at any given time.
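The arithmetic above is worth making concrete. A minimal sketch (the function name and the numbers are illustrative, not from a real fleet):

```go
package main

import "fmt"

// ruleCount is the steady-state size of the mapping table for
// n protocols, k canonical fields, and v concurrently supported
// protocol versions across the fleet.
func ruleCount(n, k, v int) int {
	return n * k * v
}

func main() {
	fmt.Println(ruleCount(4, 30, 1)) // the starting table: 120 rules
	fmt.Println(ruleCount(5, 30, 1)) // onboard a fifth protocol: 150 rules
	fmt.Println(ruleCount(5, 40, 2)) // fields grow, versions fork: 400 rules
}
```

The last line is the one that hurts: nobody plans for V, and V is the axis the vendors control.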

Concretely, here is what a single mapping entry tends to look like once it has lived in production for a year:

# canonical: telemetry.temperature_c (float, IEEE-754, no unit conversion)
# vendor X, protocol modbus, firmware >= 2.4
MAPPINGS["modbus"]["X"]["temperature_c"] = {
    "register": 0x0042,
    "type": "int16",
    "scale": 0.1,           # raw / 10 = degrees C
    "byte_order": "big",
    "valid_range": (-40, 125),
    "stale_after_ms": 5000,
    "fallback": None,
    "fw_min": "2.4.0",
    "fw_max": "3.x",
}

That is one cell in the table. Multiply by N times K times V. Now imagine a vendor changes the register layout in firmware 3.2. You add a new entry. Now you have two entries that both claim authority over telemetry.temperature_c for vendor X, gated by firmware version. The dispatch logic that decides which one to use becomes its own piece of code that nobody owns. This is the linear-growth tax compounding into a quadratic-feeling cliff. Anyone who has been here knows the feeling.
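That dispatch logic tends to look something like this. A minimal sketch, with assumed names (`Rule`, `selectRule`) and firmware versions encoded as integers for brevity; no real codebase is this clean:

```go
package main

import "fmt"

// Rule is one firmware-gated mapping entry for a canonical field.
// Fields mirror the mapping-table cell above (illustrative types).
type Rule struct {
	Register     uint16
	Scale        float64
	FwMin, FwMax int // firmware encoded as major*100+minor, e.g. 2.4 -> 204
}

// selectRule picks the entry whose firmware gate covers the device.
// This is the piece of code "that nobody owns": every firmware fork
// adds a candidate, and the gates must stay non-overlapping by hand.
func selectRule(rules []Rule, fw int) (Rule, error) {
	for _, r := range rules {
		if fw >= r.FwMin && fw <= r.FwMax {
			return r, nil
		}
	}
	return Rule{}, fmt.Errorf("no mapping rule for firmware %d", fw)
}

func main() {
	temperatureC := []Rule{
		{Register: 0x0042, Scale: 0.1, FwMin: 204, FwMax: 301},  // firmware 2.4 .. 3.1
		{Register: 0x0051, Scale: 0.01, FwMin: 302, FwMax: 399}, // 3.2 moved the register
	}
	r, _ := selectRule(temperatureC, 302)
	fmt.Printf("0x%04X\n", r.Register) // a 3.2 device reads the new register
}
```

Note what happens when a device reports a firmware version no gate covers: the error path is a data outage for that field, and it only shows up at runtime.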

It is not bytes. It is system architecture.

The deeper problem (the one the founder named explicitly) is that protocols are not just wire formats. They are full system architectures with their own assumptions about who speaks first, who waits, and what guarantees the wire provides.

Modbus TCP is request/response, polling-driven, no native push, no QoS, ordering is implicit in the request stream. The master decides when to read. The slave answers or times out. There is no concept of "the device pushed a value at 12:01:03.412Z."

MQTT is publish/subscribe over a persistent TCP session, with QoS levels (0, 1, 2), retained messages, and an LWT for disconnection signalling. The device decides when to publish. The broker fans out. Ordering is per-topic per-publisher.

BLE GATT is connection-oriented, with services and characteristics, notifications and indications, ATT MTU negotiation, and an explicit connection-interval contract that the central scheduler has to honor or the link dies.

OPC-UA is session-based with a structured information model, monitored items, subscriptions with publishing intervals, and built-in security at the protocol layer.

A middleware that translates between these has to bridge the architecture, not just the bytes. A Modbus poll loop that wants to feed an MQTT-shaped consumer needs a synthetic timestamp generator, a delta-detection layer (because Modbus does not tell you when a value changed, only what it currently is), and an artificial QoS contract (because the underlying poll has no acknowledgement semantics that map cleanly to MQTT QoS 1). If your canonical model expects "events," and Modbus only gives you "snapshots," you are inventing the events. That invention is code. It lives somewhere. It rots.

Here is a sketch of the bridging logic for one direction:

// Modbus poll snapshot -> canonical event stream
func bridgeModbusToEvents(prev, curr Snapshot, gw GatewayMeta) []Event {
    var out []Event
    for reg, val := range curr.Registers {
        if old, ok := prev.Registers[reg]; ok && old == val {
            continue // no change, no event (a Modbus snapshot is not an event)
        }
        out = append(out, Event{
            Source:    "modbus",
            Gateway:   gw.ID,
            Register:  reg,
            Value:     val,
            Timestamp: time.Now(), // synthetic; Modbus did not give us one
            QoS:       1,          // synthetic; Modbus has no QoS
        })
    }
    return out
}

That short function carries three lies the rest of the system will believe forever (the synthetic timestamp, the synthetic QoS, the assumption that "no change since last poll" means "nothing happened in between"). Every protocol in your middleware ships its own set of lies. The middleware's job is to make them all interchangeable. That job is not small, and it does not get smaller.

The update problem is the real killer

"If you want to build a middleware, that middleware is going to be massive because you have to transcribe this protocol and that protocol. You can't update it."

(Founder we interviewed, unprompted insight)

This is the line that should make every architect pause. The mapping table is large, fine, large is manageable. The problem is that vendors update their protocols asynchronously, the fleet is mixed, and the middleware has to keep working through the transition.

Concretely, when a vendor revs a Modbus register layout for one of your device classes, here is what a clean middleware needs to do:

  1. Locate every translation rule that references the changed registers (across mapping tables, validators, fallback paths, alerting rules).
  2. Author new rules gated on firmware version, without breaking the old rules for devices still on the old firmware.
  3. Re-test every canonical field that depends on the changed registers, including any derived fields downstream.
  4. Coordinate the deploy with the firmware rollout, since the middleware update has to land before any device on the new firmware reports, but cannot break the old fleet.
  5. Track which devices are on which firmware so the dispatch logic picks the right rule.
  6. Roll back cleanly if the new rules turn out to be wrong, without rolling back the rules for unaffected protocols.

Most middleware I have looked at has zero of these as automated workflows. The mapping table is a Python dict, a YAML file, or a SQL row. The dependency graph between rules and fields is in the head of whoever wrote it. Contract tests cover the happy path of one firmware version. Schema versioning is "we'll add a column when we need to." Deployment fences (the rule that says "do not deploy mapping vN+1 until firmware vN+1 has at least Q% adoption") do not exist.
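A deployment fence of the kind just described is not much code; the hard part is having the adoption metric at all. A minimal sketch, assuming you already track device check-ins by firmware version (all names here are hypothetical):

```go
package main

import "fmt"

// fleetAdoption returns the fraction of devices on minFw or newer.
// Here it is fed a static map; in reality this comes from whatever
// system records device check-ins.
func fleetAdoption(byFirmware map[string]int, minFw string) float64 {
	total, onNew := 0, 0
	for fw, count := range byFirmware {
		total += count
		if fw >= minFw { // lexical compare is safe for zero-padded versions
			onNew += count
		}
	}
	if total == 0 {
		return 0
	}
	return float64(onNew) / float64(total)
}

// canDeployMapping gates mapping vN+1 behind q adoption of firmware vN+1.
func canDeployMapping(byFirmware map[string]int, minFw string, q float64) bool {
	return fleetAdoption(byFirmware, minFw) >= q
}

func main() {
	fleet := map[string]int{"02.04": 700, "03.02": 300}
	fmt.Println(canDeployMapping(fleet, "03.02", 0.25)) // 30% adoption: fence open
	fmt.Println(canDeployMapping(fleet, "03.02", 0.50)) // only 30%: fence closed
}
```

Twenty lines. The reason it rarely exists is not the twenty lines; it is that nothing upstream reliably knows which device is on which firmware.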

The reason most teams do not build these workflows is not laziness. It is that building them is a substantial second project, layered on top of the first project (the actual translation logic). It is the kind of thing that, in retrospect, you wish you had built before the maintenance pain forced you to. By the time the pain forces you to, the middleware is too entangled to retrofit cleanly.

Why a library shape is structurally different

The library answer to this is not magic. It is a different distribution of who-knows-what.

In a middleware shape, the central translation layer knows about every protocol and every field. It is the only place where Modbus and MQTT meet. Every change touches it.

In a library shape, each protocol adapter is its own module with its own state machine, its own version contract, its own test surface, and its own documented assumptions. The library exposes them as composable building blocks. The application (yours, the one that actually has opinions about the data model) composes them into a pipeline that fits its semantics.

// Application composes adapters; each adapter owns its protocol fully.
func main() {
    modbus := modbusadapter.New(modbusadapter.Config{
        PollInterval: 500 * time.Millisecond,
        DeltaOnly:    true,
    })
    ble := bleadapter.New(bleadapter.Config{
        Subscriptions: []GATT{tempSensor, batteryService},
    })
    mqtt := mqttadapter.New(mqttadapter.Config{
        QoS:    1,
        Topics: []string{"plant/+/telemetry"},
    })
 
    canonical := pipeline.Merge(
        modbus.Events(),
        ble.Events(),
        mqtt.Events(),
    )
    sink.Consume(canonical)
}

When the Modbus vendor revs a register layout, the change lives inside modbusadapter. The BLE adapter does not know and does not care. The MQTT adapter does not know and does not care. The application's composition does not change. The canonical event type does not change unless the application chooses to change it.

This is not "middleware is bad and libraries are good." It is that, for the specific problem of bridging heterogeneous protocols with their own system architectures, the maintenance unit a library exposes (one adapter, one version, one contract) matches the maintenance unit reality hands you (one vendor, one firmware bump, one protocol change). Middleware mismatches that unit. Every change has to enter through the central layer, where every other change also lives. That is the structural reason it does not scale.

When middleware is fine anyway

I want to be honest here, because the library answer is not universally right.

If you have two or three protocols, all stable, all from vendors who do not aggressively rev firmware, and your canonical model has settled, a middleware shape is genuinely simpler. One mapping table, one dispatch loop, one deploy. You will not feel the linear-growth tax until you scale, and if your scale is bounded, you may never feel it.

The decision is not about the shape, it is about your trajectory. Five honest questions:

  1. How many distinct device protocols will you support 18 months from now? (Be pessimistic.)
  2. Of those, how many are owned by vendors who rev their wire protocol on a schedule you do not control?
  3. Does your canonical model add a field every quarter, or has it stabilized?
  4. Can your team afford to fence deploys behind firmware-rollout adoption metrics today?
  5. When a vendor breaks a register layout, do you find out before or after a customer's data goes weird?

If the answers are "two or three, mostly stable, model is settled, fences exist, we catch it," you are fine. Build the middleware, ship it, move on. If the answers trend the other way (five-plus, several are moving, model is still growing, fences do not exist, you find out after), the library shape is the more durable architecture. Not because libraries are universally better, but because for your trajectory the maintenance economics of middleware are bad enough that the library shape wins on its own merits.

How SCADABLE applies this

SCADABLE was built library-first for exactly this reason. Each protocol adapter (Modbus, MQTT, BLE, OPC-UA, the rest) is a separate module with its own version contract and its own state machine. The gateway runtime composes them. Adding a new protocol does not require touching the existing ones. We did this because we watched our first customer hit the middleware wall and realized that hitting it is the failure mode of the whole category, not a thing that one team got unlucky with.

If your translation layer is getting harder to update

If you are staring at a protocol-translation layer that is getting harder to update, or you are about to build one and you would like an outside perspective on what to keep simple and what to factor out, we run 30-min architecture reviews. Builder to builder. We look at your current setup, name the parts that will rot first, and walk through whether a refactor toward adapter-shaped boundaries is worth doing now or later for your timeline and team size.

No demo, no slides, no sales pitch. Just an architecture conversation with someone who has been on the other side of this exact wall.

Book a slot at https://cal.com/rahbaral/quick-chat.