Closing the Loop — The Heater Obeys

2026-03-09

The Shelly smart plug arrived. The radiator is now under software control.

The smart plug

The Shelly Plus Plug UK sits between the wall socket and the radiator heater. It exposes a local HTTP API, so a single curl call turns the relay on or off. No cloud, no app, no subscription. Just a GET request with Digest auth.

First test from the Pi:

curl -s --digest -u "admin:$SHELLY_PASS" "http://192.168.1.200/relay/0?turn=on"
# → {"ison": true, ...}

The relay clicks. The radiator starts warming. Another curl, it stops. This is the moment the system transitions from passive monitoring to active control.
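
The same call from Python, as a rough sketch rather than the code that actually runs on the Pi. It assumes the third-party requests library is installed; the function names and the 5-second timeout are made up for illustration.

```python
def relay_url(host: str, turn_on: bool) -> str:
    """Build the same URL the curl command hits."""
    state = "on" if turn_on else "off"
    return f"http://{host}/relay/0?turn={state}"


def set_relay(turn_on: bool, password: str, host: str = "192.168.1.200",
              user: str = "admin", timeout: float = 5.0) -> dict:
    """Switch the relay and return the plug's JSON reply."""
    # Assumption: requests is installed. It is imported here so that
    # relay_url() stays usable without the dependency.
    import requests
    from requests.auth import HTTPDigestAuth

    resp = requests.get(relay_url(host, turn_on),
                        auth=HTTPDigestAuth(user, password),
                        timeout=timeout)
    resp.raise_for_status()  # treat HTTP errors as failures, not as "off"
    return resp.json()
```

Something like `set_relay(True, os.environ["SHELLY_PASS"])` would then mirror the curl test, with the password still coming from the environment.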

Why not let Claude control the heater?

The obvious approach: give the Claude agent direct access to the Shelly, add decision rules to its CLAUDE.md, and let it turn the heater on and off every 15 minutes.

The problem is thermal inertia. A room doesn’t respond to a heater instantly: there’s a lag of minutes between turning the heater on and seeing the temperature rise at the sensor. A 15-minute control loop is too slow for that. By the time the agent sees that the temperature has overshot, the heater may have been pushing it past the limit for up to 15 minutes. And with only 4 readings per hour, the control signal is too coarse for tight regulation.

You need a fast loop for the physics and a slow loop for the strategy.
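
To put rough numbers on that, here is a toy simulation. It is only a sketch: the room is modelled as a first-order lag, and the ambient temperature, heater strength and time constant are invented for illustration, not measured. The 22.6/23.0 limits are the hysteresis bands used later in this post.

```python
def peak_temperature(control_period_s: int, hours: float = 3.0) -> float:
    """Simulate bang-bang control and return the highest temperature reached."""
    # Illustrative assumptions, not measurements of the real room:
    ambient = 18.0       # °C, where the room settles with the heater off
    heater_rise = 15.0   # °C, extra steady-state rise with the heater on
    tau = 3600.0         # s, time constant of the room
    lower, upper = 22.6, 23.0
    dt = 30              # s, simulation step

    temp, heater_on, peak = 20.0, False, 20.0
    for t in range(0, int(hours * 3600), dt):
        # The controller only looks at the temperature once per period.
        if t % control_period_s == 0:
            if temp < lower:
                heater_on = True
            elif temp > upper:
                heater_on = False
        target = ambient + (heater_rise if heater_on else 0.0)
        temp += dt * (target - temp) / tau  # Euler step of the first-order lag
        peak = max(peak, temp)
    return peak


fast = peak_temperature(30)    # decision every 30 seconds
slow = peak_temperature(900)   # decision every 15 minutes
```

In this toy model the 30-second loop peaks within about 0.1°C of the upper band, while the 15-minute loop overshoots it by roughly two degrees. The exact figures depend entirely on the assumed constants; the gap between the two is the point.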

Two-layer architecture

FAST LOOP (thermostat.py, every 30s)
  Reads sensor → compares to config → controls Shelly relay
  Simple, deterministic, no AI

SLOW LOOP (Claude agent, every 15 min)
  Reviews thermostat performance → adjusts config if needed
  Intelligent, adaptive, can reason about trends

Layer 1 is a Python daemon running as a systemd service. Every 30 seconds, it reads the latest temperature from the Flask API, compares it against a setpoint with hysteresis bands, and turns the relay on or off. Bang-bang control — the simplest algorithm that works for a room heater.
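
The decision itself fits in a few lines. This is a minimal sketch of bang-bang control with hysteresis, not the actual thermostat.py; the function name and signature are assumptions.

```python
def bang_bang(temp: float, heater_on: bool, lower: float, upper: float) -> bool:
    """Return the relay state wanted for this cycle."""
    if temp < lower:
        return True       # below the lower band: heat
    if temp > upper:
        return False      # above the upper band: stop
    return heater_on      # inside the band: keep doing whatever we were doing
```

Holding the previous state inside the band is what keeps the relay from chattering when the temperature hovers around a single threshold.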

The configuration lives in a JSON file:

{
    "setpoint": 22.8,
    "lower_band": 22.6,
    "upper_band": 23.0,
    "enabled": true
}

The thermostat re-reads this file every cycle. Change the setpoint, and within 30 seconds the new value is active. No restart needed.
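
A per-cycle reload could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, and the sanity check (lower band below upper band, setpoint between them) is a guess at what is worth validating, not something the real daemon is known to do.

```python
import json
from pathlib import Path
from typing import Optional


def load_config(path: Path) -> Optional[dict]:
    """Read the thermostat config; return None if it is missing or invalid."""
    try:
        raw = json.loads(path.read_text())
        cfg = {
            "setpoint": float(raw["setpoint"]),
            "lower_band": float(raw["lower_band"]),
            "upper_band": float(raw["upper_band"]),
            "enabled": raw["enabled"],
        }
    except (OSError, ValueError, KeyError, TypeError):
        return None  # unreadable file, bad JSON, or missing/odd fields
    if not isinstance(cfg["enabled"], bool):
        return None
    # Assumed sanity check: bands must bracket the setpoint.
    if not (cfg["lower_band"] < cfg["upper_band"]
            and cfg["lower_band"] <= cfg["setpoint"] <= cfg["upper_band"]):
        return None
    return cfg
```

Returning None instead of raising lets the caller treat "no usable config" as one more reason to keep the heater off.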

Layer 2 is the Claude agent, which already runs every 15 minutes. Instead of directly controlling the heater, it now reviews the thermostat’s log — how often is the relay cycling? Is the temperature holding near the setpoint? Are the hysteresis bands too narrow (causing rapid cycling) or too wide (causing big swings)? If adjustment is needed, it modifies the JSON config file and the thermostat picks it up on the next cycle.
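
The kind of summary the agent needs can be computed from the log directly. The sketch below assumes the line format seen in the log excerpts in this post ("[HH:MM:SS] [LEVEL] temp=... | heater=..."); the function and field names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumed line shape: "[23:52:28] [INFO] temp=20.5 | heater=ON"
LOG_LINE = re.compile(r"\[\d\d:\d\d:\d\d\] \[(\w+)\] temp=(-?\d+(?:\.\d+)?)")


def summarise(lines: Iterable[str], setpoint: float) -> Optional[dict]:
    """Count relay switches and temperature spread in a slice of the log."""
    temps = []
    switches = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # fail-safe lines and anything else are skipped
        level, temp = match.groups()
        temps.append(float(temp))
        if level == "ACTION":
            switches += 1  # each ACTION line is one relay change
    if not temps:
        return None
    return {
        "switches": switches,
        "min_temp": min(temps),
        "max_temp": max(temps),
        "max_deviation": max(abs(t - setpoint) for t in temps),
    }
```

Many switches over a short window would suggest the bands are too narrow; a large max_deviation with few switches would suggest they are too wide.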

The agent can also disable the thermostat entirely by setting "enabled": false — a kill switch for when something goes wrong.

Above both sits a third, outer loop on the laptop: me and Claude reviewing both the thermostat and the agent, adjusting strategy, planning experiments. Three layers, three timescales: 30 seconds, 15 minutes, on-demand.

Fail-safe first

The thermostat was designed around one principle: if anything goes wrong, the heater turns off. A cold room is survivable. An overheated room kills cultures.

Sensor API unreachable → heater OFF
Sensor reading older than 5 minutes → heater OFF
Shelly unreachable → heater OFF
Config file unreadable → heater OFF
Process crashes → systemd ExecStopPost curls heater OFF
Process receives SIGTERM → signal handler turns heater OFF

Every failure mode defaults to cold.
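
In code, "defaults to cold" mostly means one try/except around the whole cycle. A minimal sketch, with the three callables standing in for the real sensor, decision and relay code (all names here are hypothetical):

```python
from typing import Callable


def control_step(read_temp: Callable[[], float],
                 decide: Callable[[float], bool],
                 set_relay: Callable[[bool], None]) -> bool:
    """Run one cycle; if anything fails, try to switch the heater off."""
    try:
        want_on = decide(read_temp())
        set_relay(want_on)
        return want_on
    except Exception:
        try:
            set_relay(False)  # best effort: any failure means heater OFF
        except Exception:
            pass  # plug unreachable too; retry on the next cycle
        return False
```

The broad except is deliberate here: for this loop, an unexpected error is just another reason to stop heating.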

The first real failure

Within minutes of deploying the thermostat, it proved its value — by doing nothing.

The ESP32 had been offline since 16:50. The last sensor reading was nearly 7 hours old: 18.1°C / 59.9% relative humidity. The thermostat detected the stale timestamp and refused to act:

[23:40:27] [ERROR] FAIL-SAFE — reading 24605s old (stale), heater OFF | temp=18.1

The FUMMDUS LCD hygrometer — visible in the camera photos — showed the room was actually 20.8°C. The stale 18.1°C reading would have triggered the heater needlessly. The staleness check prevented it.
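
The check behind that log line is simple arithmetic on timestamps. A sketch, assuming readings carry a Unix timestamp; the function name is hypothetical, and rejecting readings "from the future" (clock skew) is an extra assumption of this sketch.

```python
import time
from typing import Optional

MAX_AGE_S = 300  # five minutes, the limit from the fail-safe list


def reading_is_fresh(reading_ts: float, now: Optional[float] = None,
                     max_age_s: float = MAX_AGE_S) -> bool:
    """True only if the reading is at most max_age_s old and not in the future."""
    if now is None:
        now = time.time()
    age = now - reading_ts
    return 0 <= age <= max_age_s
```

A reading that is 24605 seconds old fails this test by a wide margin, so the relay stays off regardless of what temperature it reports.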

Meanwhile, the Claude agent correctly reported what was happening:

- **Thermostat:** FAIL-SAFE — reading ~24700s stale, heater held OFF; correct behavior

Two independent systems — one dumb, one smart — both arrived at the right conclusion: don’t act on bad data.

Debugging the ESP32

Why was the ESP32 offline? The USB device was still present (/dev/ttyUSB0), the ESP32 responded to ping at its WiFi IP, but it wasn’t posting readings.

First attempt: reset the ESP32 via the serial DTR line. On this board, the DTR line of the CP2102 USB-UART bridge feeds the auto-reset circuit on the ESP32’s EN (reset) pin, the same mechanism esptool uses before flashing. A quick Python script:

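import time

import serial  # pyserial

# Opening the port asserts DTR; dropping it briefly pulses the ESP32's reset line.
with serial.Serial("/dev/ttyUSB0", 115200) as port:
    port.dtr = False
    time.sleep(0.1)
    port.dtr = True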

The ESP32 rebooted. Connected to WiFi. Then: Error: failed to read from DHT22 sensor. Every 30 seconds, the same error. The sensor was returning NaN on every read.

The fix wasn’t software. The firmware has no retry logic, but that wasn’t the root cause — the DHT22 had a loose wire. A physical reseat of the jumper wires, another DTR reset, and readings started flowing immediately:

[23:51:58] [ACTION] temp=20.5 | heater=ON | ACTION=ON (temp 20.5 < lower 22.6)
[23:52:28] [INFO] temp=20.5 | heater=ON
[23:52:58] [INFO] temp=21.4 | heater=ON
[23:53:28] [INFO] temp=21.6 | heater=ON

The thermostat saw 20.5°C — below the 22.6°C lower band — and turned the heater on. Within two minutes, the temperature was climbing.

Lesson: When a sensor fails, check the wires before writing software workarounds. The DTR reset was a useful diagnostic tool (it confirmed the ESP32 itself was fine), but the fix was physical. Software can’t fix a disconnected wire, and an auto-reset watchdog would have just boot-looped the ESP32 every 10 minutes while the DHT22 stayed broken.

Flask proxy endpoints

The Shelly’s local IP (192.168.1.200) is only reachable on the home network. To control the heater remotely — from the office, from a phone, or from the outer-loop laptop — the Flask server now proxies Shelly requests:

# From anywhere via Cloudflare tunnel
ssh daniel@ssh.egc.land "curl -s 'http://localhost:5000/shelly?key=tclab2026'"
ssh daniel@ssh.egc.land "curl -s 'http://localhost:5000/shelly/on?key=tclab2026'"

The Flask proxy handles Digest authentication internally. The outer loop doesn’t need the Shelly’s password — it authenticates to Flask with the API key, and Flask talks to the Shelly.
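
The core of such a proxy is small. The sketch below leaves out the Flask route wiring and shows only the logic a route handler might call. Everything in it is an assumption for illustration: the function name, the status codes, the "off" and "status" actions, and the mapping of "status" to /relay/0. The fetch argument stands in for whatever code performs the Digest-authenticated request to the plug.

```python
import hmac
from typing import Callable, Tuple

# Assumed mapping from proxy action to Shelly path.
SHELLY_PATHS = {
    "status": "/relay/0",
    "on": "/relay/0?turn=on",
    "off": "/relay/0?turn=off",
}


def handle_shelly(action: str, supplied_key: str, api_key: str,
                  fetch: Callable[[str], str]) -> Tuple[int, str]:
    """Check the API key, then forward the action to the plug via fetch()."""
    # Constant-time comparison, so the key cannot be guessed from timing.
    if not hmac.compare_digest(supplied_key.encode(), api_key.encode()):
        return 403, '{"error": "forbidden"}'
    path = SHELLY_PATHS.get(action)
    if path is None:
        return 404, '{"error": "unknown action"}'
    try:
        return 200, fetch(path)
    except OSError:
        return 502, '{"error": "shelly unreachable"}'
```

Because the plug's password only ever appears inside fetch, callers of the proxy never need to know it.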

Similarly, the thermostat config is accessible remotely:

# Read config and recent actions
ssh daniel@ssh.egc.land "curl -s 'http://localhost:5000/thermostat?key=tclab2026'"

# Adjust setpoint remotely
ssh daniel@ssh.egc.land 'curl -s -X POST -H "Content-Type: application/json" \
  -d "{\"setpoint\": 23.0}" "http://localhost:5000/thermostat?key=tclab2026"'
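
Since the thermostat re-reads the config every 30 seconds, whatever handles that POST should never leave a half-written file behind. One way to do it, sketched under assumptions (the function name, the allowed-key check and the validation rule are all hypothetical), is to write a temporary file and rename it over the original:

```python
import json
import os
import tempfile
from pathlib import Path

ALLOWED_KEYS = {"setpoint", "lower_band", "upper_band", "enabled"}


def update_config(path: Path, patch: dict) -> dict:
    """Merge patch into the JSON config and replace the file atomically."""
    unknown = set(patch) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    cfg = json.loads(path.read_text())
    cfg.update(patch)
    try:
        ok = cfg["lower_band"] <= cfg["setpoint"] <= cfg["upper_band"]
    except (KeyError, TypeError):
        ok = False
    if not ok:
        raise ValueError("setpoint must lie between lower_band and upper_band")
    # Write next to the target, then rename: readers see old or new, never half.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(cfg, handle, indent=4)
        os.replace(tmp_name, path)
    except Exception:
        os.unlink(tmp_name)  # do not leave stray temp files behind
        raise
    return cfg
```

With the bands from the config shown earlier, a setpoint of 23.0 passes this check because it sits exactly on the upper band; 25.0 would be rejected and the file left untouched.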

Where things stand

Item	Status
Blog	Live at blog.egc.land
Agent	Running 24/7, now reviews thermostat
Thermostat	Running 24/7, bang-bang with hysteresis
Shelly	Connected at 192.168.1.200, Digest auth
Camera	4608×2592, auto-capture every cycle
Sensor cross-ref	DHT22 vs FUMMDUS LCD, validated
ESP32	Back online after wire reseat
Environment	Active temperature control, targeting 22.8°C

The system now has closed-loop temperature control. The thermostat reads the sensor, decides, acts, and the room responds. The Claude agent watches the thermostat, and I watch the agent. Three loops, three timescales, one room temperature.

Next: wait for the temperature to stabilise and see how tightly the thermostat holds the setpoint. If the hysteresis bands are right, the relay should cycle a few times per hour and the temperature should stay within ±0.2°C of 22.8°C.