SCADABLE

Why Your Hand-Rolled BLE Multi-Device Pipeline Stops Scaling Around 10 Devices

If your ESP32 BLE central works fine at 3 peripherals and falls apart at 10, the problem is structural, not a tuning issue. Here is what actually breaks.


A founder we work with described his early BLE pipeline like this: "three devices at once connected sending information over Bluetooth to a central processing unit." At three devices it worked. At ten it limped. By the time he was prototyping a system with a dozen sensor modules, he was rewriting the central firmware every time a new module type showed up.

If you are running 2 to 5 BLE peripherals into one ESP32 or Raspberry Pi central and trying to push that to 10, 20, or 50, this post is for you. I am going to walk through exactly where the wall is, why it is in a different place than most builders expect, and what your real options are when you hit it. None of the four fixes are free. I will be honest about each one.

The wall starts at the BLE central, not the peripheral

Most teams assume the bottleneck is on the sensor side: radio range, battery, advertising interval. It almost never is. The wall lives in the BLE central, and specifically in the connection table.

If you are on an ESP32 with ESP-IDF and NimBLE, look at your sdkconfig:

# Default in ESP-IDF v5.1, NimBLE host
CONFIG_BT_NIMBLE_MAX_CONNECTIONS=3

Three. That is the out-of-the-box ceiling. You can crank it higher, and most teams discover this the first time they try to connect a fourth peripheral and ble_gap_connect hands back BLE_HS_ENOMEM. The Apache NimBLE stack on the original ESP32 (WROOM) supports up to 9 simultaneous connections in the central role, and the Bluedroid stack tops out around 7. Bumping the limit looks like this:

# sdkconfig.defaults
CONFIG_BT_NIMBLE_MAX_CONNECTIONS=9
CONFIG_BT_NIMBLE_ROLE_CENTRAL=y
CONFIG_BT_NIMBLE_HOST_TASK_STACK_SIZE=5120

Done, right? Not quite. Each additional connection allocates a ble_hs_conn struct, a per-connection ATT context, and event queue space. By the time you are at 8 active connections you are eating into the 320 KB of internal SRAM you wanted for your application logic, and you start seeing watchdog resets under sustained notify traffic. A Raspberry Pi with BlueZ has more headroom (the kernel manages the host stack), but the radio and the controller firmware on the CYW43455 still cap practical concurrent links somewhere between 7 and 10 before throughput collapses.

So the first wall is structural: the BLE 4.2 / 5.0 spec sets no hard ceiling on concurrent connections, but real silicon and real host stacks ship with single-digit defaults, and pushing past those defaults costs RAM and stability. This is the part that surprises most people building their first ble central peripheral architecture at scale.

Connection intervals and scheduling collapse

Even after you raise the connection limit, the radio is still a single, shared resource. The BLE link layer time-slices. Every connected peripheral negotiates a connection interval (the minimum is 7.5 ms, the practical default is 15 to 30 ms), and the central has to service each link's connection events on schedule; miss enough events in a row and the link blows through its supervision timeout and drops.

The math is brutal. If you have 10 peripherals all asking for a 15 ms connection interval, the central needs to fit 10 RX/TX event slots into every 15 ms window, which leaves about 1.5 ms of radio time per link. Add some advertising scan windows, add the ATT MTU exchange overhead (the peripheral wants a 247-byte MTU, the default is 23, you renegotiate on connect), and the radio scheduler starts dropping events.
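That budget can be checked with a few lines of arithmetic. This is a back-of-envelope model, not a scheduler simulation, and the ~1 ms airtime figure in the comment is an assumption:

```python
# Back-of-envelope BLE central scheduling budget.
# The per-event airtime figure below is an illustrative assumption.

def radio_budget_ms(n_links: int, conn_interval_ms: float) -> float:
    """Average radio time available per link within one connection interval."""
    return conn_interval_ms / n_links

# 10 peripherals all negotiating a 15 ms connection interval:
per_link = radio_budget_ms(10, 15.0)
print(f"{per_link:.2f} ms of radio time per link per interval")  # 1.50 ms

# A single connection event carrying a 247-byte MTU payload plus ACK can
# eat around 1 ms of airtime on the 1M PHY (assumed ballpark), so the
# margin is nearly gone before scanning and retransmits are counted.
```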

What you see in production:

# idf.py monitor output from a bench rig, 9 connected peripherals at 15 ms interval
I (12483) NIMBLE: GAP procedure initiated: connect
W (12491) NIMBLE: connection event missed, conn_handle=4
W (12603) NIMBLE: connection event missed, conn_handle=7
W (12715) NIMBLE: notify dropped, conn_handle=2 reason=ENOMEM
E (13002) NIMBLE: connection supervision timeout, conn_handle=4

The standard Band-Aid is connection interval staggering. Push half your peripherals to 30 ms intervals, the other half to 50 ms, and force an offset on each one so their TX windows do not collide. This buys you maybe a 1.5x headroom factor. It works at 12 peripherals. It fails at 20, because now the slow peripherals are reporting at 50 ms, your data freshness budget is blown, and the staggering logic itself becomes a piece of code you have to maintain.
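The staggering logic you end up maintaining looks roughly like this. A minimal sketch, assuming a 2 ms event-slot width and the 30/50 ms split from above; in a real central the anchor offsets are negotiated through connection parameter updates in 1.25 ms units:

```python
# Assign staggered connection intervals and anchor offsets so per-link
# connection events do not pile onto the same radio slots.
# Slot width and the 30/50 ms split are illustrative assumptions.

def stagger(conn_handles, fast_ms=30.0, slow_ms=50.0, slot_ms=2.0):
    plan = {}
    half = len(conn_handles) // 2
    for i, handle in enumerate(conn_handles):
        interval = fast_ms if i < half else slow_ms
        # Offset each link by one event slot within its interval window.
        offset = (i * slot_ms) % interval
        plan[handle] = {"interval_ms": interval, "offset_ms": offset}
    return plan

plan = stagger(range(12))  # 12 conn handles: half fast, half slow
```

Note what this code is: one more scheduling policy you own, with edge cases (a reconnecting link needs a fresh offset, a removed link leaves a hole) that the sketch does not handle.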

This is the answer to the very common search query how many ble devices can one gateway handle: roughly 7 to 10 with a single radio and a hand-rolled scheduler, with throughput and latency degrading sharply across that range. Anyone telling you 20+ on a single ESP32 is either using BLE 5.0 extended advertising in a one-way fashion (no GATT subscriptions, no ACK), or measuring on the bench with no real payload.

The maintainability trap

This is the part that breaks teams faster than the radio ever does. You can throw money at the radio problem (better SoC, second BLE chip, mesh). You cannot throw money at the part of the codebase that is rotting.

A hand-rolled BLE central that supports a single sensor type is clean. Add a second type and it is still clean. Add a sixth and you are in trouble. Here is what literally has to change in the central every time a new sensor module is added to the fleet:

  1. New GATT service and characteristic UUIDs registered in the discovery whitelist.
  2. New parser for that sensor's payload format (endianness, scaling factor, packed struct layout).
  3. New schema for the data emitted to whatever sits behind the central (cloud, local storage, MQTT).
  4. New conditional branch in the central's notify dispatch loop: if conn.service_uuid == 0xFEAA: parse_temp(); elif ... .
  5. New error path for that sensor's failure modes (sensor X reports 0xFFFF for "not ready", sensor Y disconnects instead).
  6. New connection parameter profile (sensor X wants 30 ms, sensor Y wants 100 ms, sensor Z is intermittent and you need a reconnect backoff specifically for it).
  7. Updated test fixtures, because your existing mocks do not know about this characteristic.

Multiply that list by 6 to 12 sensor types and the central firmware becomes a 4000-line switch statement with no clear seams. The founder I quoted at the top of this post said it cleanly:

"It's not a scalable solution. If I ever need to add a new sensor or remove a sensor, I'd have to change so many things. And so it was not very dynamic."

(Founder we work with, mid-rebuild conversation, 2026)

He also said something that does not get enough attention: "each machine has their own protocol. So not only languages, but even like system architecture." That is the deeper version of the maintainability trap. It is not just that the parsers are different. The whole shape of how each peripheral expects to be talked to is different, and a hand-rolled central is the place where all those shapes get reconciled by hand. Every. Single. Time.

This is why the bluetooth sensor pipeline scale problem is rarely solved by buying a bigger radio. The radio gets you to 10 devices. The codebase has already given up by then.
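For what it is worth, the usual first refactor out of that switch statement is a parser registry keyed by service UUID, so steps 1, 2, and 4 above collapse into a table entry. A minimal sketch; the UUIDs and payload layouts here are made up for illustration:

```python
import struct

# Registry mapping a GATT service UUID to a payload parser.
# A new sensor type becomes one new entry, not a new branch in the
# notify dispatch loop.
PARSERS = {}

def parser(service_uuid):
    def register(fn):
        PARSERS[service_uuid] = fn
        return fn
    return register

@parser(0xFEAA)  # hypothetical temperature service
def parse_temp(payload: bytes) -> dict:
    (raw,) = struct.unpack("<h", payload[:2])  # little-endian int16
    return {"type": "temp", "celsius": raw / 100.0}  # 0.01 C scaling

@parser(0xFEAB)  # hypothetical vibration service
def parse_vibe(payload: bytes) -> dict:
    x, y, z = struct.unpack("<hhh", payload[:6])
    return {"type": "vibe", "xyz": (x, y, z)}

def on_notify(service_uuid: int, payload: bytes) -> dict:
    parse = PARSERS.get(service_uuid)
    if parse is None:
        raise KeyError(f"no parser for service 0x{service_uuid:04X}")
    return parse(payload)
```

This does not fix the radio wall, but it gives the central a seam: adding a sensor type becomes one registered parser instead of edits scattered across the dispatch loop. The error paths, connection profiles, and test fixtures in the list above still have to be written per type.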

The four shapes of fix

When you hit this wall, you have four real options. None of them are free. I have watched teams pick each of these and I will tell you what I have seen.

1. Add a second BLE central plus a host bridge. Run two ESP32s, each owning half the peripheral fleet, and bridge them over UART or SPI to a single host (often a Pi or an ARM Linux SBC). Cheap in hardware, cheap to start. Extends you to roughly 18 to 20 devices. Doubles your firmware maintenance because now you have two centrals to keep in sync, and you have a host bridge protocol you also have to maintain. Good move if you are at 8 devices and shipping in 6 weeks. Bad move if you are headed to 50.

2. Move to BLE mesh. This is the spec answer to scale. BLE mesh handles hundreds of nodes, uses managed flooding, and is built for exactly this problem. The catch: BLE mesh is genuinely complex (provisioning, friendship, low-power nodes, model layer), not all chips support it well, and mesh stability under packet loss is a known sore spot in production deployments. If you have one firmware engineer and a 4-month runway, BLE mesh will eat the runway. If you have a team that has shipped mesh before, it is the right answer.

3. Rebuild around an MQTT broker on a Linux gateway. The most common path. You stop trying to scale the BLE central and instead make each "peripheral" a small networked node (still BLE on the local link, but now it talks to a Linux gateway running an MQTT broker like Mosquitto or NanoMQ). The Linux gateway is the new hub. This works. The catch: real engineering work to migrate, and the edge gateway is now your single point of failure. You have just moved the wall, not removed it. Worth it if you are scaling to 50+ devices and you want a path to cloud telemetry.

4. Use a library or SDK that handles the multi-protocol fan-in for you. Instead of building the central, the bridge, the broker, and the cloud pipe yourself, drop in something that already handles that surface. This is what we are building at SCADABLE: an SDK plus edge runtime where you write the device-specific parser once and the rest (multi-protocol fan-in, gateway, telemetry, command pipe) is handled. I will not pitch hard here. Just naming it as the fourth real option, because building it yourself is also a real option, and you should know what you are choosing between.
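To make option 3 concrete, here is the shape of the contract the gateway ends up publishing. The topic scheme and field names are assumptions, not a standard, and a real gateway would push these through an MQTT client (paho, for instance) into Mosquitto or NanoMQ:

```python
import json
import time

# Per-device topic scheme: site/<gateway>/sensor/<device_id>/telemetry.
# Topic layout and payload fields are illustrative assumptions.
def telemetry_message(gateway: str, device_id: str, reading: dict):
    topic = f"site/{gateway}/sensor/{device_id}/telemetry"
    payload = json.dumps({
        "ts": int(time.time() * 1000),  # epoch millis
        "device": device_id,
        **reading,
    })
    return topic, payload

topic, payload = telemetry_message("gw-01", "temp-07", {"celsius": 21.5})
```

The point of a shape like this: adding a sensor type changes the payload body, not the topic scheme or the pipeline behind it, which is exactly the seam the hand-rolled central never had.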

When to rewrite vs when to wrap

The honest decision tree is not "rewrite is always right" or "library is always right." It depends on where you are.

If you are at 5 devices, shipping is 3 months out, and you have one firmware engineer: wrap. A rewrite at this stage will eat your whole runway and you will arrive at the same architectural decisions a 6-person team made before you, but slower and with more bugs. The rewrite-it-yourself path optimizes for control you do not need yet.

If you are at 50 devices, the pipeline is in production, and the rewrite is itself a 6-month project: stop and evaluate before you commit. A 6-month rewrite of a working-but-fragile system is the most expensive thing a small hardware team can do. The version of the question Salim faced ("code it myself or hire a co-op student to do it") is the wrong frame. The right frame is: what is the smallest change that buys me 12 more months of headroom, and what is the right architecture if I had to start clean today? Sometimes those are the same answer. Often they are not.

The teams that get burned hardest are the ones who pick option 1 (second BLE central) when they should have picked option 3 or 4, because option 1 is cheap to start and expensive to walk back. Six months later they have two centrals, a host bridge, no abstraction over the sensor protocols, and the rewrite they were trying to avoid is now twice as big.

If you are hitting this wall

If you are looking at your own central firmware and the patterns in this post feel familiar (the connection count that does not match what the spec says is possible, the switch statement that grew teeth, the dread of adding the next sensor type), we run free 30-minute architecture reviews. Builder to builder. We look at your current BLE setup, tell you honestly what breaks at 10x and 50x, and walk through whether a rewrite, a wrap, or a staged migration makes sense for your timeline and team size.

No demo, no slides, no sales pitch. Just an architecture conversation with someone who has seen this exact wall.

Book a slot at https://cal.com/rahbaral/quick-chat.