This is a short update to My Agentic Workflow, May 2026.

The shape did not change. Codex with GPT-5.5 is still the daily driver, agent-scripts is still the base layer, and the loop is still conversational discovery first, then gates that force repo context, checks, review, and an explicit handoff.

What changed in the last month is the layer around the loop. Two things stand out: the review skill got rebuilt and consolidated, and the workflow I described in May as “still work in progress” actually landed.

The Review Skill Got Consolidated: codex-review → autoreview

In May the review gate lived in a skill called codex-review. It worked, but it was Codex-shaped: one skill, one engine, with a helper bolted on for parallel tests.

I replaced it with autoreview and removed the legacy codex-review skill, so all review routing now goes through one place.

The important changes:

  • It is engine-agnostic. Codex is still the default, but Claude, Pi, Copilot, and OpenCode are selectable when installed. My local CLI/model defaults stay in charge unless I explicitly override with --model.
  • It is explicitly one review. The helper builds one bundle, calls one selected engine, validates one structured result, and stops. No nested reviewers, no reviewer panels by default. Multi-reviewer panels are opt-in for when the risk justifies the spend.
  • Heartbeats are first-class. Lines like review still running: ... elapsed=... pid=... are treated as healthy progress, not a hang. Structured review can take up to 30 minutes with tools or web search active, so the rule is to be patient and not kill a quiet-but-alive run.
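A minimal sketch of the heartbeat rule, assuming heartbeat lines keep the prefix and the elapsed= and pid= keys shown in the bullet above. The regex, the function names, and the use of 30 minutes as a hard budget are illustrative assumptions, not autoreview's actual implementation.

```python
import re

# Assumed shape of a heartbeat line, based only on the example in the text.
# The fields between the markers are elided there, so this checks just the
# prefix and the two key names.
HEARTBEAT_RE = re.compile(r"^review still running:.*\belapsed=\S+.*\bpid=\S+")

# The text says a structured review can take up to 30 minutes.
MAX_REVIEW_SECONDS = 30 * 60


def is_heartbeat(line: str) -> bool:
    """Return True if the line looks like a healthy progress heartbeat."""
    return HEARTBEAT_RE.search(line.strip()) is not None


def should_keep_waiting(last_line: str, elapsed_seconds: float) -> bool:
    """Keep waiting while the run is heartbeating and inside the time budget."""
    return is_heartbeat(last_line) and elapsed_seconds <= MAX_REVIEW_SECONDS
```

The design choice is that silence alone is never the kill signal: a supervising script only gives up when the heartbeat stops or the budget is exhausted.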

The contract is the same one that made the May version useful, just stated more sharply: review output is advisory, never blindly applied. Every finding gets verified against the real code path and adjacent files. Unrealistic edge cases, speculative risks, and broad rewrites get rejected. Real findings get fixed at the smallest sensible ownership boundary, and if a fix changes code, the focused tests and the review run again. It keeps looping until there are no accepted findings left.

flowchart LR
  Edit[Agent edits] --> Tests[Focused tests]
  Tests --> Review[autoreview: one engine, one bundle]
  Review --> Findings{Accepted findings?}
  Findings -- yes --> Fix[Fix at right boundary]
  Fix --> Tests
  Findings -- no --> Clean[Clean closeout]
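The same loop can be sketched in a few lines of Python. This is an illustration of the control flow only: the Finding type, the three callbacks, and the max_rounds safety cap are hypothetical names introduced here, and the real skill may be structured quite differently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    summary: str
    accepted: bool  # set after verifying the finding against the real code path


def review_loop(
    run_tests: Callable[[], bool],
    run_review: Callable[[], List[Finding]],
    apply_fix: Callable[[Finding], None],
    max_rounds: int = 5,  # hypothetical safety cap, not from the post
) -> int:
    """Run focused tests and one review per round until nothing is accepted.

    Returns the number of review rounds that were needed.
    """
    for round_number in range(1, max_rounds + 1):
        if not run_tests():
            raise RuntimeError("focused tests failed; fix before reviewing")
        # Rejected findings (speculative, unrealistic) are simply dropped.
        accepted = [f for f in run_review() if f.accepted]
        if not accepted:
            return round_number  # clean closeout
        for finding in accepted:
            apply_fix(finding)  # fix at the smallest sensible boundary
    raise RuntimeError("review did not converge within max_rounds")
```

Note that a fix always sends the loop back through the tests before the next review, which matches the Fix --> Tests edge in the diagram.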

Pointing the helper at the right target is a one-liner. Set the skill path once, then review dirty local work:

export AUTOREVIEW="$HOME/Projects/agent-scripts/skills/autoreview/scripts/autoreview"
"$AUTOREVIEW" --mode local

The point is unchanged from May: this is not about obeying a second model. It is about adding one more pressure test to the patch before I see it.

The Matt Pocock Workflow Actually Landed

In May I wrote that I was experimenting with Matt Pocock’s skills and that they were “not part of my complete workflow yet.”

That changed. They are in the daily loop now:

  • grill-with-docs for grounding a problem against real docs before building.
  • to-prd for turning current context into a GitHub issue-ready PRD.
  • to-issues for splitting a vague idea into concrete issues.
  • tdd for making behavior concrete before the implementation follows.

This is the same instinct I already had — make the behavior concrete first, then let the implementation follow — but now there are named skills the agent reaches for instead of me re-explaining the shape each time. The useful side effect is the artifacts: a PRD or a set of issues that another person could read and challenge.

What sold me was building a whole project this way. I created gohealthcli as a test of the full Matt Pocock loop, and I liked working that way enough that it stuck. More on that below.

New: A Maintainer Loop For The Boring Upkeep

The biggest genuinely new piece is bram-maintainer-loop, adapted from an upstream maintainer-orchestrator.

It is a control-plane skill, not an implementation skill. It does not write much code itself. It inspects, delegates, monitors, asks for decisions, and reports, while the actual repository work happens in separate worker threads.

The operating model:

  1. Use github-project-triage to map each flagged repo’s open issues, PRs, CI, latest release, and dirty state.
  2. Classify every queue item as Autonomous (clear fit, reproducible, bounded), Needs Bram (product call, security decision, missing credential, or destructive choice), or Ignored (only when I explicitly say so).
  3. Delegate independent repos to separate Codex threads, keep the coordinator thread lightweight, and monitor by reading state rather than steering.
  4. Continue until each autonomous item is merged with proof, and each decision item has a mergeable PR ready for my land/delete/access choice.
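The classification in step 2 can be sketched as a small decision function. The field names and the Lane enum below are hypothetical, and sending anything that is not clearly autonomous to the human lane is an assumption of this sketch rather than a rule stated in the skill.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    AUTONOMOUS = "autonomous"
    NEEDS_BRAM = "needs-bram"
    IGNORED = "ignored"


@dataclass
class QueueItem:
    title: str
    clear_fit: bool = False
    reproducible: bool = False
    bounded: bool = False
    needs_product_call: bool = False
    needs_security_decision: bool = False
    missing_credential: bool = False
    destructive: bool = False
    explicitly_ignored: bool = False  # only set when the maintainer says so


def classify(item: QueueItem) -> Lane:
    """Map one queue item to a lane using the criteria from step 2."""
    if item.explicitly_ignored:
        return Lane.IGNORED
    needs_human = (
        item.needs_product_call
        or item.needs_security_decision
        or item.missing_credential
        or item.destructive
    )
    if needs_human:
        return Lane.NEEDS_BRAM
    if item.clear_fit and item.reproducible and item.bounded:
        return Lane.AUTONOMOUS
    # Assumption: when in doubt, escalate instead of acting autonomously.
    return Lane.NEEDS_BRAM
```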

The rule I like most is the decision-ready queue rule: it should never bring me a rough contributor branch or a bare URL and ask me to decide. By the time it asks, the PR is reproduced, fixed, tested, reviewed with autoreview, and CI-green. My job collapses to one of: land it, close it, give one exact access step, grant one waiver, or pick between documented alternatives.
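One way to picture the decision-ready rule is as a gate that lists what is still missing before a PR may be put in front of me. The PrState type and its field names are hypothetical, chosen to mirror the five conditions in the paragraph above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class PrState:
    reproduced: bool
    fixed: bool
    tested: bool
    reviewed_with_autoreview: bool
    ci_green: bool


def missing_gates(state: PrState) -> List[str]:
    """Return the names of the gates that have not passed yet."""
    return [f.name for f in fields(state) if not getattr(state, f.name)]


def is_decision_ready(state: PrState) -> bool:
    """A PR is decision-ready only when every gate has passed."""
    return not missing_gates(state)
```

Returning the list of missing gates, not just a boolean, gives the coordinator something concrete to hand back to a worker thread.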

The flagged repo list lives in config/bram-loop-repos.txt, starting with gohealthcli and gobankcli — the read-only bank archive from the May post is now something the loop keeps current rather than something I babysit.

A Real Example: gohealthcli

gohealthcli is the June equivalent of last month’s gobankcli example, and it is the project that convinced me to keep both the Matt Pocock skills and the loop.

It is a local-first, read-only archive for Google Health data, and I built it end to end as a test of the full workflow rather than in my old converse-then-implement style.

The first commits were not features. They were the workflow itself:

2026-05-24 chore: initial commit
2026-05-25 docs: define foundation-grade first release
2026-05-25 docs: configure agent issue workflow

The second and third commits are the point. Before any real code, the repo got a CONTEXT.md, a set of ADRs, and an explicit agent issue workflow: PRDs and issues live in GitHub Issues, with canonical triage labels (needs-triage, ready-for-agent, ready-for-human, and so on) that the engineering skills actually speak.

From there it ran as a PRD-and-issue machine. A feature became a PRD, the PRD got sliced into numbered issues, and each slice landed as its own reviewed PR. You can read the shape straight off the commit log:

feat(read): unified --db / --config resolver (PRD #144 slice 1)
feat(read): query --json JSON-column passthrough (PRD #144 slice 5)
feat(read): query default mode emits --plain shape (PRD #144 slice 7)
docs(read): refresh docs/commands + README for PRD #144 (slice 10)

One PRD, ten small slices, each one a reviewable unit instead of a giant patch. Across the project that adds up to roughly 178 commits and over 100 merged PRs, most of them issue-scoped and review-gated, including Copilot/Codex review comments addressed before merge.
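Because the subjects follow a consistent pattern, the slice structure can be recovered mechanically. The sketch below parses the commit subjects quoted above; the regex and the Slice type are assumptions fitted to those four examples, not a convention the repo is known to enforce.

```python
import re
from typing import NamedTuple, Optional


class Slice(NamedTuple):
    kind: str          # conventional-commit type, e.g. "feat" or "docs"
    scope: str         # scope in parentheses, e.g. "read"
    prd: int           # PRD issue number
    slice_number: int  # slice index within that PRD


# Accepts both "(PRD #144 slice 1)" and "PRD #144 (slice 10)".
SUBJECT_RE = re.compile(
    r"^(?P<kind>\w+)\((?P<scope>[^)]+)\):.*PRD #(?P<prd>\d+)\W*slice (?P<slice>\d+)"
)


def parse_subject(subject: str) -> Optional[Slice]:
    """Extract PRD and slice numbers from a commit subject, or None."""
    match = SUBJECT_RE.search(subject)
    if match is None:
        return None
    return Slice(
        kind=match["kind"],
        scope=match["scope"],
        prd=int(match["prd"]),
        slice_number=int(match["slice"]),
    )
```

Subjects without a PRD reference, such as the initial chore commit, simply return None.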

That is the part I liked. The work was AFK-friendly: I could grill a problem against docs, turn it into a PRD, split it into ready-for-agent issues, and let agents implement slices that came back as small PRs I could land or bounce. The loop kept the boring upkeep moving without me sitting in the chair for every step.

The proof is the same as it was for gobankcli: the system stayed inside its constraints — read-only, local-first — and the history is legible. The difference in June is that I barely drove the keyboard. I drove the queue.

Smaller Changes Worth Noting

  • Live-first GitHub cache. A gh wrapper plus an offline prewarm helper now warm a local cache on normal reads and fall back to cached data only during outages, rate limits, or explicit offline requests. It skips repos I do not maintain so it does not burn cache budget.
  • Setup cleanup. I refreshed my own setup docs and CI, and removed the leftover Peter-specific and Claw/OpenClaw skills and broken symlinks that were never part of my workflow.
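The live-first cache behavior can be sketched as follows. Everything here is illustrative: LiveFetchError stands in for an outage or rate limit, the cache is a plain dict, and none of these names come from the actual gh wrapper.

```python
from typing import Callable, Dict


class LiveFetchError(Exception):
    """Hypothetical error raised by the live fetcher on an outage or rate limit."""


def read_live_first(
    key: str,
    fetch_live: Callable[[str], str],
    cache: Dict[str, str],
    offline: bool = False,
) -> str:
    """Prefer live data and warm the cache; fall back to the cache on failure."""
    if not offline:
        try:
            value = fetch_live(key)
        except LiveFetchError:
            pass  # fall through to the cached copy
        else:
            cache[key] = value  # normal reads keep the cache warm
            return value
    if key in cache:
        return cache[key]
    raise LookupError(f"no live data and no cached copy for {key!r}")
```

The cached copy is only ever served in the three cases the bullet names: an outage, a rate limit, or an explicit offline request.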

A Note On Models: Fable 5

This month was not only about the layer above the model. I also spent real time with Fable 5, until access got blocked for me.

I liked it a lot more than the earlier Opus models. It felt closer to how that lineage worked when Opus 4.5 first landed: better taste, nicer to actually work with, less of the trigger-happy behavior I complained about in May. On design work it looked genuinely promising, and it was strong on data analysis too — the kinds of work where I already reach outside Codex.

So why am I still mainly in GPT-5.5 for the coding loop? Partly inertia in the good sense: 5.5 is already wired into agent-scripts, autoreview, and the maintainer loop, and that whole layer is the point. But partly the benchmarks. The new DeepSWE benchmark — long-horizon, multi-file tasks from real repos that are deliberately kept out of training corpora — landed credibly enough that Artificial Analysis swapped it into their Coding Agent Index. On that index Fable 5 in Claude Code and GPT-5.5 in Codex sit one point apart at the top — essentially tied for the agentic coding I actually care about. When two models land that close, there is no strong coding reason to rip out a loop that already runs on 5.5.

That is the same stance as May, just with a new contender: I like Fable 5 a lot, especially for design and analysis, and if it pulls clearly ahead on the coding loop I can move that work back. For now, 5.5 stays the daily driver because the layer around it already runs on it.

Still The Same Bet

None of this is a model story this month. It is the layer above the model getting tighter: one review skill instead of a Codex-specific one, the discovery-to-issues-to-TDD workflow promoted from experiment to default, and a maintainer loop taking over the upkeep that used to sit on me.

The May bet still holds. Keep the important context in small files and skills, keep it portable, and make the agent prove what it changed before it hands work back. June just automated more of the boring parts.