Runcastle

Refactor with Adversarial Review

MIT · 0 downloads

by runcastle

v1.0.1

A Claude Code agent performs a refactor, a hard `npm test` gate proves nothing broke, then a Codex reviewer adversarially audits the change on the same warm branch — hunting regressions, weak tests, and shortcuts — fixing and looping until it approves.

Disclosures — declared side-effect surface

Everything below runs on your machine or inside the sandbox when you use this workflow. Any mismatch between these declarations and the actual code prevents the workflow from being published.

Host hooks

Commands executed on YOUR host machine by Sandcastle lifecycle hooks.

None declared.

Sandbox hooks

Commands executed inside the sandbox container.

  • npm install

Network access

None. Both agents operate only on the local repository inside the sandbox; the sandbox hook runs `npm install` against your declared package registry.

Shell expansion

No shell-expansion blocks in prompt files.

Files

Diff vs the stock Sandcastle 0.12.0 template Dockerfile: lines prefixed with `+` were added by the author, lines prefixed with `-` were removed from stock.

+# Sandbox image for the Refactor with Adversarial Review workflow.
+# Installs both agent CLIs — Claude Code (implementer) and Codex (reviewer) —
+# on the stock Sandcastle base, running as a non-root `agent` user.
FROM node:22-bookworm
# System dependencies.
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
curl \
jq \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
-# Claude Code CLI (the agent runtime).
-RUN npm install -g @anthropic-ai/claude-code
+# Agent CLIs: Claude Code (implementer) and Codex (reviewer).
+RUN npm install -g @anthropic-ai/claude-code @openai/codex
# Non-root agent user. `sandcastle docker build-image` aligns AGENT_UID/GID to
# the host user via --build-arg to avoid permission errors on bind mounts.
+# node:22-bookworm already ships a "node" user at UID/GID 1000, so we RENAME it
+# (the stock Sandcastle template pattern) — groupadd/useradd would collide with
+# the existing IDs on a default build.
ARG AGENT_UID=1000
ARG AGENT_GID=1000
-RUN groupadd --gid ${AGENT_GID} agent \
- && useradd --uid ${AGENT_UID} --gid ${AGENT_GID} --create-home --shell /bin/bash agent
+RUN groupmod -o -g ${AGENT_GID} node \
+ && usermod -o -u ${AGENT_UID} -g ${AGENT_GID} -d /home/agent -m -l agent node
-USER agent
-WORKDIR /workspace
+USER ${AGENT_UID}:${AGENT_GID}
+WORKDIR /home/agent
+
+# Sandcastle bind-mounts the worktree and sets the working directory at
+# container start; the container just needs to stay alive until then.
+ENTRYPOINT ["sleep", "infinity"]

Full Dockerfile

# Sandbox image for the Refactor with Adversarial Review workflow.
# Installs both agent CLIs — Claude Code (implementer) and Codex (reviewer) —
# on the stock Sandcastle base, running as a non-root `agent` user.
FROM node:22-bookworm

# System dependencies.
RUN apt-get update && apt-get install -y --no-install-recommends \
      git \
      curl \
      jq \
      ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Agent CLIs: Claude Code (implementer) and Codex (reviewer).
RUN npm install -g @anthropic-ai/claude-code @openai/codex

# Non-root agent user. `sandcastle docker build-image` aligns AGENT_UID/GID to
# the host user via --build-arg to avoid permission errors on bind mounts.
# node:22-bookworm already ships a "node" user at UID/GID 1000, so we RENAME it
# (the stock Sandcastle template pattern) — groupadd/useradd would collide with
# the existing IDs on a default build.
ARG AGENT_UID=1000
ARG AGENT_GID=1000
RUN groupmod -o -g ${AGENT_GID} node \
 && usermod -o -u ${AGENT_UID} -g ${AGENT_GID} -d /home/agent -m -l agent node

USER ${AGENT_UID}:${AGENT_GID}
WORKDIR /home/agent

# Sandcastle bind-mounts the worktree and sets the working directory at
# container start; the container just needs to stay alive until then.
ENTRYPOINT ["sleep", "infinity"]

README

Refactor with Adversarial Review

Let one model do the refactor and a different model try to tear it apart. This workflow pairs a Claude Code implementer with a Codex reviewer so your structural changes survive a genuinely independent second opinion — not the same model rubber-stamping its own work.

What it does

Refactors are risky precisely because they are supposed to change nothing observable. This pipeline enforces that discipline. A Claude Code agent performs a behaviour-preserving refactor (from your `REFACTOR.md` brief, or its own pick of the worst-structured module), a hard `npm test` gate proves the suite is still green, and then a Codex reviewer adversarially audits every commit, hunting behaviour drift, weak tests, regressions, and shortcuts. The reviewer fixes what it finds directly on the branch and only emits `<promise>APPROVED</promise>` once it is genuinely satisfied.
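The approval signal described above can be checked with a plain substring test. The following is a minimal sketch, not the workflow's actual code: `isApproved` and `APPROVAL_MARKER` are hypothetical names introduced here for illustration, and the only detail taken from the README is the marker string itself.

```typescript
// Hypothetical helper for illustration; not part of Sandcastle or this workflow's source.
const APPROVAL_MARKER = "<promise>APPROVED</promise>";

/**
 * Returns true only when the reviewer's raw output contains the exact
 * approval marker. A bare "APPROVED" without the tags does not count.
 */
function isApproved(reviewerOutput: string): boolean {
  return reviewerOutput.includes(APPROVAL_MARKER);
}
```

Matching the full tagged marker rather than the word "APPROVED" avoids a false positive when the reviewer merely discusses approval in its notes.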

How it works

`main.ts` opens one warm Docker sandbox with `createSandbox()` and installs dependencies once via an `onSandboxReady` hook. Three steps run on the same warm branch: (1) the implementer (`claude-sonnet-4-6`) refactors over up to five iterations; (2) a hard `sandbox.exec("npm test")` gate aborts loudly if the tree is red, so the reviewer never audits broken code; (3) the reviewer (`gpt-5.4` via Codex) loops for up to five passes, fixing and re-testing, until it approves. Using two providers means the critique comes from a different model than the one that wrote the change.
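The control flow of those three steps can be sketched independently of any sandbox library. Everything below is an assumption made for illustration: the `Steps` type, `runWorkflow`, and the idea that the implementer returns a "done" flag are hypothetical and are not Sandcastle's API. Only the limit of five iterations, the hard test gate, and the approval marker come from the README.

```typescript
// Hypothetical step signatures; these are NOT Sandcastle's real API.
type Steps = {
  // One implementer iteration; resolves true when the agent reports it is done (assumed signal).
  implement: (iteration: number) => Promise<boolean>;
  // Runs the test suite and resolves with its exit code.
  test: () => Promise<number>;
  // One reviewer pass; resolves with the reviewer's raw output.
  review: (pass: number) => Promise<string>;
};

const MAX_ITERATIONS = 5; // "up to five" per the README
const APPROVED = "<promise>APPROVED</promise>";

async function runWorkflow(steps: Steps): Promise<"approved" | "not-approved"> {
  // Step 1: the implementer refactors over up to five iterations.
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    if (await steps.implement(i)) break;
  }

  // Step 2: hard gate. A red suite aborts before the reviewer ever runs.
  const exitCode = await steps.test();
  if (exitCode !== 0) {
    throw new Error(`npm test failed with exit code ${exitCode}; aborting before review`);
  }

  // Step 3: the reviewer loops until it emits the approval marker.
  for (let pass = 1; pass <= MAX_ITERATIONS; pass++) {
    const output = await steps.review(pass);
    if (output.includes(APPROVED)) return "approved";
  }
  return "not-approved";
}
```

Throwing at the gate, rather than returning a status, is one way to make the failure "loud": the reviewer loop is unreachable when the tree is red.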

The topology is install → refactor → verify → review → (fix loop back to implement) → approved.

Requirements

Set both `CLAUDE_CODE_OAUTH_TOKEN` (run `claude setup-token`) and `OPENAI_API_KEY` in `.sandcastle/.env`. Your repo should expose a working `npm test`. Optionally add a `REFACTOR.md` describing the exact change you want. Build the image once with `npx @ai-hero/sandcastle docker build-image`, then run the workflow with `npx tsx .sandcastle/main.ts`.
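A quick preflight check can catch a missing credential before the image is built. This is a rough sketch under stated assumptions: `missingEnvKeys` is a hypothetical helper, it takes the text of `.sandcastle/.env` as a string, and it handles only simple `KEY=value` lines (no quoting rules, no `export` prefix). Only the two key names come from the requirements above.

```typescript
// The two credentials the README requires.
const REQUIRED_KEYS = ["CLAUDE_CODE_OAUTH_TOKEN", "OPENAI_API_KEY"];

/**
 * Hypothetical helper: returns the required keys that are absent or empty
 * in the given .env file contents. Simplified parser, for illustration only.
 */
function missingEnvKeys(envFileText: string): string[] {
  const present = new Set<string>();
  for (const rawLine of envFileText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (value !== "") present.add(key); // an empty value counts as missing
  }
  return REQUIRED_KEYS.filter((k) => !present.has(k));
}
```

An empty result means both credentials are set; any returned key names can be reported before running the build or the workflow.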