TC Lab Blog

The Agent Goes Autonomous


The biggest day yet. Five sessions spanning 20 hours, from setting up an autonomous agent loop on the Pi to reviewing its first 17 hours of independent operation.

Session 1: The inner agent

Until now, Claude Code ran on my laptop. I’d SSH into the Pi, pull data, analyse it. The human was in the loop for every reading. Today that changed.

Installed Node.js 20, tmux, and Claude Code on the Pi itself. The Pi is ARM64 (aarch64) running Debian 13 — Claude Code’s npm install worked without any workarounds. Authenticated via browser OAuth (the one step that needed a human).

Created an agent loop: a bash script that invokes Claude Code every 15 minutes with a task prompt. The prompt tells Claude to read the sensors, check actuators, analyse trends, and log a structured entry. Each cycle runs with --print --dangerously-skip-permissions — non-interactive, fully autonomous.
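The actual loop is a bash script, but its shape is easy to show. A minimal Python sketch of the same loop, using the `claude` flags named above (the 600-second timeout and the exact prompt wording are my assumptions):

```python
import subprocess
import time

# Task prompt paraphrased from the post; the real wording is an assumption.
TASK_PROMPT = (
    "Read the sensors, check the actuators, analyse trends, "
    "and log a structured entry."
)

def run_cycle(claude_cmd="claude"):
    # One cycle: invoke Claude Code non-interactively with the task prompt.
    result = subprocess.run(
        [claude_cmd, "--print", "--dangerously-skip-permissions", TASK_PROMPT],
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout

def agent_loop(interval_s=15 * 60):
    # Fire a cycle every 15 minutes, forever; a supervisor restarts us on failure.
    while True:
        print(run_cycle())
        time.sleep(interval_s)
```

Swapping `claude_cmd` for something harmless (`echo`, say) makes the loop testable without burning API calls.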

The agent has its own CLAUDE.md on the Pi — a separate “brain” from my laptop’s CLAUDE.md. The inner agent’s brain contains target ranges (20-25°C, 40-70% humidity), decision rules (heater ON if below 20°C), log format specifications, and actuator config. It doesn’t need SSH instructions or WSL2 context — it only talks to localhost.
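To make the "brain" concrete, here is an illustrative excerpt. The headings, ordering, and exact wording are invented; the ranges and the heater rule are the ones described above:

```markdown
# Inner agent brain (illustrative excerpt, not the real file)

## Target ranges
- Temperature: 20-25°C
- Humidity: 40-70%

## Decision rules
- Heater ON if temperature below 20°C

## Log format
- One structured entry per cycle: timestamp, readings, trend, decision
```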

First test cycle at 03:39 UTC: read 22.8°C and 40.2% humidity, analysed trend as stable, decided no action needed, logged a structured entry. The agent was alive.

The two-loop architecture

INNER LOOP (Pi, autonomous, 24/7)
  Claude Code runs every 15 min
  Reads sensors → analyses → decides → logs

OUTER LOOP (laptop, on-demand)
  Human + Claude Code review inner agent logs
  Adjust protocols, update the agent's brain

The inner loop is the lab technician — reliable, always on, following protocol. The outer loop is the lab director — reviewing performance, adjusting strategy, planning experiments. The inner agent’s brain (CLAUDE.md) is the protocol document that the outer loop rewrites when strategy changes.

Session 2: Surviving reboots

The Pi was power-cycled. The Flask server and Cloudflare tunnel came back automatically (systemd services). The agent loop, running in a plain tmux session, did not.

Created a systemd service for the agent: Type=forking (tmux forks into background), Requires=lab-server.service (the agent needs the Flask API), Restart=on-failure with a 60-second delay. The agent now auto-starts on boot and survives power cuts.

Key lesson: use Requires= AND After= together for hard dependencies. After= alone only orders startup — it doesn’t prevent starting if the dependency failed. And when systemd manages a tmux session, all control must go through systemctl. Having wrapper scripts that also manage tmux directly causes conflicts.
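A unit file along these lines captures the setup. Apart from lab-server.service, which the text names, the unit name, session name, and paths here are assumptions:

```ini
[Unit]
Description=Autonomous agent loop (Claude Code in tmux)
# Requires= makes this unit fail if lab-server.service cannot start;
# After= only orders startup. Both are needed for a hard dependency.
Requires=lab-server.service
After=lab-server.service

[Service]
# tmux daemonises itself, so systemd must track it as a forking service.
Type=forking
ExecStart=/usr/bin/tmux new-session -d -s agent /home/pi/agent-loop.sh
ExecStop=/usr/bin/tmux kill-session -t agent
Restart=on-failure
RestartSec=60

[Install]
WantedBy=multi-user.target
```

With this in place, all start/stop/restart control goes through `systemctl`, never through tmux directly.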

Also discovered the wrong item had been delivered — which turned out to be a much bigger story than expected.

The camera saga

I ordered an official Raspberry Pi Camera Module 3 (SC0872) from Amazon UK — £35.43, “Dispatched from Amazon, Sold by Feels Like Christmas” (253 ratings, 100% positive). What arrived was a bare IMX708 sensor chip. Just the imaging sensor on a tiny flex connector. No green carrier PCB, no 15-pin ribbon cable, no lens housing, no autofocus mechanism. The bare sensor’s connector is completely incompatible with the Pi’s CSI port — different size, different pin count. It’s like ordering a complete engine and receiving a spark plug.

I checked the seller’s 1-star reviews (filtered from the default view, which hides them): “item was not the same as ordered,” “already been opened and used,” “arrived damaged.” Every negative review had the same note: “Amazon takes responsibility for this fulfillment experience.” Because Amazon’s warehouse did the picking, not the seller.

This is the commingled inventory problem. All sellers using “Dispatched from Amazon” (Fulfilled by Amazon / FBA) share the same warehouse bins. The seller ships their stock to Amazon’s warehouse, where it gets mixed with identical SKUs from every other seller. When you order, Amazon picks from the shared bin. The seller you chose never touches the product. You could order from five different “sellers” and get the same item from the same bin.

I ordered a replacement from JAF Enterprises (822 ratings, 100% positive) — same price, same “Dispatched from Amazon” — meaning almost certainly the same warehouse bin. Simultaneously ordered from The Pi Hut (£24, Cambridge-based official Raspberry Pi retailer) as the guaranteed backup. The Pi Hut holds their own stock in their own warehouse. No commingling.

Strategy: if JAF sends the right thing, keep both cameras (one for cultures, one for room overview). If wrong, return both Amazon orders and rely on The Pi Hut.

Lesson: For specialist hardware, buy from specialist retailers who hold their own stock. Amazon marketplace sellers using FBA are all drawing from the same warehouse bin — the “seller” is a fiction.

Session 3: Verbose tracing

The agent was running but its output was opaque — just the final response from each cycle. I needed to see every tool call, every command, every reasoning step.

Tested Claude Code’s CLI flags. The short version: --debug only works in interactive mode, while --verbose combined with --output-format stream-json emits the full event stream in headless runs, including every tool call, command, and reasoning step.

Built a bash+jq script (format-trace.sh) that parses the stream-json output into human-readable logs:

[INIT] session=9b9faceb... model=claude-opus-4-6
[TOOL CALL] Bash: curl -s http://localhost:5000/reading?key=...
[TOOL OUTPUT] {"humidity":40.5,"temperature":22.6,...}
[REASONING] Temp 22.6C in range. Humidity 40.5% near lower bound.
[RESULT] turns=6 duration=31901ms cost=$0.063
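format-trace.sh itself is bash+jq; a Python sketch of the same idea follows. The event field names handled here are simplified assumptions for illustration, not Claude Code’s exact stream-json schema:

```python
import json

def format_event(line):
    """Turn one JSON event line into a one-line trace entry.
    The 'type' values and field names are simplified assumptions,
    not the exact stream-json schema."""
    ev = json.loads(line)
    kind = ev.get("type")
    if kind == "init":
        return f"[INIT] session={ev['session_id']} model={ev['model']}"
    if kind == "tool_call":
        return f"[TOOL CALL] {ev['tool']}: {ev['command']}"
    if kind == "tool_output":
        return f"[TOOL OUTPUT] {ev['output']}"
    if kind == "text":
        return f"[REASONING] {ev['text']}"
    if kind == "result":
        return f"[RESULT] turns={ev['turns']} duration={ev['ms']}ms cost=${ev['cost']}"
    return f"[UNKNOWN] {line.strip()}"

def format_trace(stream):
    # One formatted trace line per non-empty JSON line in the stream.
    return [format_event(line) for line in stream if line.strip()]
```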

Each cycle: ~30 seconds, ~$0.06, ~37KB of verbose output. At 96 cycles/day that’s ~3.5MB of logs. Added daily log rotation — the filename includes the date and switches at midnight automatically. 30-day retention, cleaned on startup.
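A minimal sketch of that rotation scheme (the directory and filename pattern are assumptions):

```python
import pathlib
import time

LOG_DIR = pathlib.Path("logs")  # assumed location

def log_path(now=None):
    # The date lives in the filename, so writes switch to a fresh file
    # at midnight with no explicit rotation step.
    day = time.strftime("%Y-%m-%d", time.gmtime(now))
    return LOG_DIR / f"agent-{day}.log"

def prune_old_logs(days=30, now=None):
    # 30-day retention, cleaned once on startup.
    cutoff = (time.time() if now is None else now) - days * 86400
    for p in LOG_DIR.glob("agent-*.log"):
        if p.stat().st_mtime < cutoff:
            p.unlink()
```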

Session 4: A silent SQLite bug

During the director’s review, I noticed the agent was reporting “497 readings” for a 1-hour history request. At 30-second intervals, 1 hour should be ~120 readings. Something was very wrong.

The bug was in the Flask server’s history query:

WHERE timestamp > datetime('now', '-1 hours')

SQLite’s datetime() produces: 2026-03-07 05:00:00 (space between date and time). Python’s datetime.isoformat() stores: 2026-03-07T05:00:00.123456 (T separator).

SQLite compares these strings lexicographically. T is ASCII 84, space is ASCII 32, so any stored timestamp sharing the cutoff’s date sorts after the datetime() cutoff. The WHERE clause matched every reading from the current day: everything since the Pi booted.

One-line fix:

WHERE timestamp > strftime('%Y-%m-%dT%H:%M:%S', 'now', '-1 hours')

strftime with an explicit format produces the T separator, matching the stored format. After the fix: ?hours=1 returned 120 readings spanning exactly 60 minutes.
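The mismatch is easy to reproduce in a few lines, using an in-memory table and a pinned "now" so the counts are deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (timestamp TEXT, temperature REAL)")
# Store timestamps the way Python's datetime.isoformat() does: with a 'T'.
rows = [(f"2026-03-07T{h:02d}:00:00.000000", 22.0) for h in range(6)]
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

base = "2026-03-07 05:30:00"  # pinned 'now' for reproducibility

# Buggy: datetime() emits a space separator, and 'T' (84) > ' ' (32),
# so every same-day ISO timestamp compares greater than the cutoff.
buggy = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE timestamp > datetime(?, '-1 hours')",
    (base,),
).fetchone()[0]

# Fixed: strftime with an explicit format emits the 'T', matching storage.
fixed = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE timestamp > "
    "strftime('%Y-%m-%dT%H:%M:%S', ?, '-1 hours')",
    (base,),
).fetchone()[0]

print(buggy, fixed)  # buggy counts all 6 rows; fixed counts only the last hour
```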

This bug had been present since the server was first written but was invisible until the agent started reporting trends. The “1 hour trend” was actually a “trend since midnight.” No errors, no crashes — just silently wrong data. The agent happily analysed 500+ readings thinking it was 1 hour of data.

Lesson: Always sanity-check row counts against expected values. 30-second intervals × 60 minutes = ~120 rows per hour. If you’re getting 497, something is broken.

Session 5: This blog

Deployed an Astro site to Cloudflare Pages at blog.egc.land. Ran into a DNS caching issue — the ISP had cached an NXDOMAIN response from before the CNAME record existed, and ipconfig /flushdns only clears the local Windows cache, not the ISP’s. Fixed by switching to Cloudflare’s public DNS resolver (1.1.1.1).

The director’s review

With 17 hours of autonomous operation, I ran the first comprehensive review — the outer loop examining the inner loop.

61 cycles, zero failures

The agent completed every scheduled cycle across the full monitoring period. No missed cycles, no crashes, no incorrect decisions. The systemd service kept it running through the night.

Temperature: excellent

Metric           Value
Min              21.7°C
Max              24.3°C
Mean             22.96°C
Outside target   0%

The full day’s range stayed well within the 20-25°C target with a 2°C margin on each side. Temperature control is not a concern.

Humidity: the weak point

Metric            Value
Min               37.3%
Max               45.1%
Mean              41.69%
Below 40% floor   12.1% of readings

From roughly 02:00 to 08:30 UTC, humidity dropped below the 40% floor, bottoming out at 37.3% at 06:21. A sharp shift hit at ~06:10: temperature jumped +1.5°C and humidity dropped ~3% within minutes. The likely culprit is the freestanding electric radiator in the room, the same one we identified as the primary environmental problem in our earlier data analysis.

The agent correctly identified and flagged this with WARNING status across 9 consecutive cycles. It tracked “2nd consecutive,” “3rd consecutive,” and noted the recovery at 08:28 as “first in-range reading after 9 consecutive low cycles.” Good emergent behaviour.

Decision quality: correct across all 61 cycles

Cost

Metric                        Value
Total cost                    $7.42
Mean per cycle                $0.122
Projected daily (96 cycles)   $11.67

Higher than the initial $0.063/cycle estimate. The agent developed two different strategies for appending log entries: an efficient cat >> approach (5 turns, ~$0.07) and a less efficient Read+Edit approach (7-9 turns, ~$0.14). It uses Read+Edit 72% of the time. Fixing the append strategy should bring costs down to ~$6.72/day.
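The projections above are just the per-cycle mean scaled to a full day (rounding accounts for the last cent or so):

```python
total_cost, cycles = 7.42, 61
per_cycle = total_cost / cycles   # mean cost per cycle, ~$0.122
daily = per_cycle * 96            # projected daily at 96 cycles/day, ~$11.67
cheap_daily = 0.07 * 96           # ~$6.72/day if every cycle used cat >>
print(f"{per_cycle:.3f} {daily:.2f} {cheap_daily:.2f}")
```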

Verdict: not ready for cultures

Criterion             Status
Temperature control   READY
Humidity control      NOT READY
Monitoring system     READY
Heater control        PENDING (Shelly arrives tomorrow)
Visual monitoring     PENDING (camera replacement 8-10 March)
Data backup           READY

The monitoring and control infrastructure is solid. But the environment needs a humidifier before cultures can be safely started — humidity regularly dips below the 40% floor. Once the Shelly is connected and humidity is addressed, the system will be ready for mint.

What I learned today

  1. Claude Code runs fine on ARM64 Raspberry Pi. No special workarounds.
  2. --debug only works in interactive mode. For headless agent loops, use --verbose --output-format stream-json.
  3. tmux sessions don’t survive reboots. Always back them with systemd for 24/7 operation.
  4. SQLite’s datetime() and Python’s isoformat() produce different timestamp formats — space vs T separator. This causes silent data bugs with string comparison in WHERE clauses.
  5. An autonomous agent will develop its own strategies (like using Read+Edit instead of cat >>) that may not be optimal. The outer loop needs to review and course-correct.
  6. The first thing the agent got right without being told: when it saw 497 readings for “1 hour” of history, it autonomously piped them through python3 to extract first/last for trend analysis instead of processing all of them. Good emergent behaviour even while working with a bug.