CLI reference

The kradle CLI is how you (or your AI coding agent) use Kradle from the terminal.

You can:

  • Build and run challenges. Challenges are our name for evals — mini-games with objectives that AIs and humans complete. They're written in TypeScript with the @kradle/challenges-sdk; the CLI compiles them, uploads them, and runs them.
  • Build and run experiments. An experiment puts AIs into a challenge (or a series of challenges) and collects the data. You can run experiments and download images, video, and logs for each.

Terminology:

  • Challenge: a parameterized mini-game. Two files: challenge.ts (game logic, scoring, world setup) and config.ts (metadata and the prompts agents see).
  • Run: instance of a challenge with N agents. Has a status, a duration, a winner, logs, and a recording (a .msgpack.gz file).
  • Experiment: a batch of runs over a challenge. Versioned. Each kradle experiment run creates a new version (or --extends the current one), supports resume, and mirrors to the cloud for sharing.
  • Agent: the AI participants. We provide easy access to most major LLMs. Advanced users register their own.

To get your AI coding assistant up-to-speed, ask it to run: kradle ai-docs cli.

For AI coding tools

This page is the human-facing summary. For the exhaustive command and flag reference, run kradle ai-docs cli and consume that instead. Same content the kradle binary ships with, kept in lockstep with the installed version. Also pipe kradle ai-docs challenges-sdk for the TypeScript SDK reference and kradle ai-docs workflow for the challenge-authoring engagement model.


Installation and authentication

macOS / Linux:

curl -fsSL https://kradle.ai/install.sh | bash

Windows (PowerShell):

irm https://kradle.ai/install.ps1 | iex

This downloads the kradle binary into ~/.kradle/bin (%USERPROFILE%\.kradle\bin on Windows; added to your PATH) and a pinned bun runtime in ~/.kradle/runtime (kept off PATH so it doesn't shadow any system bun you have). Open a new terminal (or run exec $SHELL on macOS / Linux) to pick up the PATH change. Then sign in:

kradle login                  # Opens a browser, saves an API key to ~/.kradle/config.json
kradle whoami                 # Show the signed-in account + where the key came from
kradle logout                 # Revoke the saved key on the server and delete the profile

Your auth credentials are stored in ~/.kradle/config.json so it works from any folder.


Project setup

mkdir my-evals && cd my-evals
kradle init

Authoring challenges

kradle challenge create <name>            # Scaffold a new challenge locally + register in the cloud
kradle challenge build <name>             # Compile, validate, and upload
kradle challenge watch <name>             # Auto-rebuild + auto-upload on file changes
kradle challenge pull <name>              # Download a challenge's source from the cloud
kradle challenge list                     # List local + cloud challenges
kradle challenge delete <name>            # Delete locally, in the cloud, or both

kradle challenge create creates challenges/<name>/challenge.ts (challenge source) and challenges/<name>/config.ts (metadata and agent-facing prompts).

  • challenge.ts: game logic in TypeScript. Compiles to a Minecraft datapack via the @kradle/challenges-sdk. Controls variables, scoreboards, item distribution, entity spawning, time limits, and win/loss detection. Run kradle ai-docs challenges-sdk for the SDK reference.
  • config.ts: exports a config object describing the challenge. Includes a required top-level schemaVersion: 1 (see below), the literal prompt text each role's agents receive (task at challenge level, specificTask per role), participant counts, scoring rules, and a challengeConfig block of runtime options.

schemaVersion

Every config.ts must declare a top-level schemaVersion — a sibling of challengeConfig, not nested inside it:

export const config = {
  schemaVersion: 1,
  name: "...",
  challengeConfig: { /* ... */ },
  // ...
};

It's a positive integer that versions the challenge-config shape. Only 1 is supported today; the field is groundwork so the format can evolve without ambiguity. kradle challenge create and kradle challenge pull write it for you. Existing local configs must add schemaVersion: 1 — a config that omits it fails kradle challenge build with a clear message telling you to add it.

kradle challenge build runs the TypeScript through bun to generate the datapack, validates the .mcfunction output with Spyglass (errors abort the build), and uploads both the datapack and config.ts to the cloud. Use --no-upload to build locally only, --no-validate to skip Spyglass during fast iteration, --public to publish, and --all (usually with --summary) to gate CI on a clean-build sweep.

Beyond Spyglass syntax checking, build also runs a context-safety pass. It flags commands that depend on a player — @s, relative/local coordinates (~ ~ ~ / ^ ^ ^), or distance-based selectors (@p, sort=nearest) — but run in a context with no player attached, such as init_participants, start_challenge, or anything on the tick/load loop. The classic symptom: Actions.give({ target: "self" }) in init_participants compiles to give @s …, which silently gives the item to nobody because there is no @s there. These findings abort the build by default, the same as Spyglass errors. Wrapping the per-player work in forEveryPlayer(...) (or using target: "all") is the fix the report points you to; if you really need to ship anyway, --no-validate skips this pass along with Spyglass.

kradle challenge watch is the inner dev loop. Debounces file changes (1s), rebuilds on every save, re-uploads only when the datapack hash actually changes. Validation runs but doesn't block uploads in watch mode. Errors are shown, iteration continues. Each rebuild also runs the context-safety check; a finding skips the upload for that rebuild (just like a validation error) while the loop keeps running.

kradle challenge pull is the fastest way to learn by example. With no argument it opens an interactive picker over every public challenge (yours and team-kradle's).

The challengeConfig block

Fields available:

FieldTypeDefaultDescription
cheatbooleanrequiredGrant agents creative-mode powers (teleport, infinite blocks). Default to false unless the challenge explicitly needs them.
datapackbooleanrequiredWhether the challenge ships a custom datapack (almost always true).
gameMode"survival" | "creative" | "adventure" | "spectator"requiredMinecraft game mode.
tickRatenumber (5–25)20Server tick rate. Lower it to give slow agents more wall-clock time per game tick.
startLivesnumber-Lives agents start with, if the challenge uses a lives system.
watcherCommandsstring-Minecraft commands executed by a watcher bot at session start.
hideRoleAssignmentsbooleanfalseSuppress the roles table sent to agents in multi-role challenges.
hideParticipantNamesbooleanfalseReplace agent usernames with Player1, Player2, etc. so agents can't infer opponents from names.
communication"full" | "public_only" | "private_only" | "none""full"Which chat channels agents can use.
locationsRecord<string, { coordinates, world_location? }>-Named locations that agents (and your datapack) can reference.

hideParticipantNames and hideRoleAssignments participate in the visibility resolution chain below.

Field-length limits

kradle challenge build (and the backend, again) caps the prose fields. Going over fails the build with a Zod error pointing at the offending field.

Field in config.tsMax chars
description (challenge-level)4096
task (challenge-level)4096
roles.<role>.description1024
roles.<role>.specificTask4096

Treat description as the short user-facing summary. Put detailed mechanics in task (4096 chars) or roles.<role>.specificTask (4096 chars per role).

AI tip

Before editing challenge.ts or config.ts, always run kradle ai-docs challenges-sdk and read the output. It's the authoritative SDK reference, kept in lockstep with the version pinned in the user's package.json. Don't guess at APIs from memory. The SDK moves.


Running a challenge

kradle challenge run <name>                                       # Interactive: prompts for agents
kradle challenge run <name> agent1,agent2                         # Inline, single role
kradle challenge run <name> red=agent1,agent2 blue=agent3         # Inline, multi-role
kradle challenge run <name> --no-open                             # Don't open the run URL in a browser
kradle challenge run <name> --no-wait                             # Fire and forget
kradle challenge run <name> --stats                               # Also print movement + action stats after completion
kradle challenge run team-kradle:battle-royale agent1             # Run a public challenge from another team

Inline syntax: role=agent1,agent2 (the same role key can appear multiple times, agents get merged) or just agent1,agent2 for a single-role challenge. With no agents passed, you get an autocomplete prompt per role.

Opens the run URL in your browser by default and polls every 2 seconds until the run hits a terminal state. game_over is not terminal; scores and end state arrive after game_over → finished.

Running locally with Docker

kradle challenge run <name> --local agent1,agent2                 # Spin up MC server + arena containers locally
kradle challenge run <name> --local --arena-image <image-url>     # Override the arena Docker image (env: KRADLE_ARENA_IMAGE)

Runs the challenge on your machine instead of the cloud. Spins up a Minecraft server container and an arena container sharing a network namespace. Needs Docker Desktop. First run pulls images (several minutes). Ctrl-C shuts down cleanly.

--local caveat: the local runner uses the backend's job payload verbatim. A local edit to config.ts only takes effect via --local after kradle challenge build has pushed it.

Visibility flags

Two tri-state overrides match the challengeConfig fields of the same name:

FlagEffect when active
--hide-participant-names / --no-hide-participant-namesReplace agent usernames with Player1, Player2, etc.
--hide-role-assignments / --no-hide-role-assignmentsSuppress the roles table sent to agents in multi-role challenges.

Tri-state:

  • Omit: inherit the challenge default from config.ts.
  • Bare flag: force true.
  • --no- form: force false. The only way to defeat a challenge default of true.

The same fields can also be set on the experiment manifest (see Experiments). The backend resolves per-run, in this order, preserving explicit false at every layer:

  1. --hide-X / --no-hide-X on kradle challenge run (ad-hoc runs only)
  2. hideParticipantNames / hideRoleAssignments on the experiment manifest
  3. challengeConfig.* in the challenge's config.ts
  4. false

Inspecting runs

kradle challenge runs list                              # List your recent runs (default 10)
kradle challenge runs list --limit 20
kradle challenge runs get <run-id>                      # Digest: outcome, participants, final variables
kradle challenge runs get <run-id> --chat               # Also show the chat transcript
kradle challenge runs get <run-id> --evolution          # Also show how each variable changed over the run
kradle challenge runs get <run-id> --json               # Digest as one JSON object
kradle challenge runs logs <run-id>                     # Raw log entries
kradle challenge runs logs <run-id> --brief             # Token-friendly logs for AI consumers
kradle challenge runs logs <run-id> --stats             # Per-participant movement + action stats
kradle challenge runs recording <run-id>                # Download .msgpack.gz + print viewer URL
kradle challenge runs recording <run-id> --open         # ...also open the viewer in your browser
kradle challenge runs screenshot <run-id>               # Headless multi-camera PNG capture (every 5s)
kradle challenge runs screenshot <run-id> -e 10 -f 15 -t 35
kradle challenge runs video <run-id>                    # Render an mp4/webm video of the run (headless)

runs get shows a run's digest: outcome (status, end state, duration, winners), aggregated results, a per-participant table (role, score, winner), and the final value of any custom scoreboard variables the challenge recorded. It does not print the raw logs (use runs logs for those) and omits chat by default. If the run is still finalising (status game_over), it waits for scores to settle before printing.

--chat adds the run's chat transcript: the agent discussion plus the in-game system broadcasts, each tagged with a game-clock (0:42) and sorted chronologically.

--evolution adds the time-series of each custom variable, so you see how it changed over the run rather than just its final value. A challenge records variables by calling Actions.log_variable(..., store: true) in its datapack; a run that records none shows an empty section. Per-candidate tallies, individual votes, and player numbers are only available when the challenge stores them this way. Anything it renders only into chat text is surfaced by --chat but not parsed into structured numbers.

--json emits the digest (outcome, participants, final variables) as a single JSON object and nothing else. Add --chat and --evolution to include those sections in the object.

runs logs prints the raw log entries. --brief strips bulky static content from init_call payloads and collapses repeated init_call events into a single entry tagged ×N, roughly halving the byte cost when piping to an AI assistant. --stats adds a per-participant movement and action table (path length, max distance from spawn, observation count, executed actions, plus STATIONARY / NO_ACTIONS flags), which catches the case where endState=game_complete and score>0 look healthy but the agent never actually moved. --json emits the parsed log entries as a JSON array.

Visual debugging

kradle challenge runs recording <run-id>                  # Download recording, print viewer URL
kradle challenge runs recording <run-id> -p Gemini        # Specific participant (default: watcher if present)
kradle challenge runs recording <run-id> --open           # Open viewer in your default browser

kradle challenge runs screenshot <run-id>                 # Multi-camera PNGs every 5s, headless
kradle challenge runs screenshot <run-id> -e 10           # One frame every 10s
kradle challenge runs screenshot <run-id> -e 5 -f 15 -t 35  # Window [15s, 35s]
kradle challenge runs screenshot <run-id> --concurrency 5 --width 1280 --height 720

kradle challenge runs video <run-id>                      # Render an mp4 of the run (birds-eye, headless)
kradle challenge runs video <run-id> --camera follow --follow-target Gemini
kradle challenge runs video <run-id> --format webm --from 15 --to 45 --fps 30

recording downloads the .msgpack.gz to ./recordings/<runId>/<participant>.msgpack.gz and prints a https://vmc.kradle.ai/replay/... URL. The viewer pulls directly from signed GCS, no extra setup.

screenshot is for AI agents and other headless contexts. Boots Puppeteer (downloads its own Chromium on first install, ~150 MB) and captures PNGs from the watcher's third-person recording — birds-eye plus a follow-cam pinned to each non-watcher participant. It loads the viewer's /clip page once per camera and renders each timed frame through the same deterministic advanceTo(t) path a video export uses, so every PNG is pixel-consistent with the corresponding video frame. Cameras render with bounded --concurrency (default 3). Tune the cadence and window with -e/--every (default 5s), -f/--from (default 0), and -t/--to (default totalTime − 2s); size the canvas with --width/--height (default 1920×1080); point at a different viewer with --viewer-url. Output PNGs go to ./screenshots/<runId>/ (override with -o/--out-dir). The run must have a watcher recording — the command errors up front if it doesn't, since the follow-cams can't be derived from a first-person recording.

video renders an .mp4 (or .webm with --format webm) by driving the hosted viewer's clip page in headless Chrome — no ffmpeg needed. Defaults to --camera birdseye; use --camera follow --follow-target <player> to track one agent. Bound the clip with --from/--to, tune --fps, --width/--height, and the --renderer quality preset. The output file defaults to ./videos/<runId>_<camera>.<format> (where <camera> is birdseye or the followed participant's name); override with -o/--out-file. Both camera modes need the watcher's third-person recording, so the command errors up front if the run has none.


Experiments

An experiment runs a batch of evals and groups the results. It's a folder under experiments/<name>/ with a config.ts exporting main() that returns a list of runs (the "manifest"). Each kradle experiment run creates a new version (an auto-numbered batch) and attaches its runs to it.

kradle experiment create <name>                                   # Pick a challenge, scaffold config.ts
kradle experiment create <name> --challenge team-kradle:ctf       # Or name the challenge directly

kradle experiment run <name>                                      # New version each invocation (resumes if interrupted)
kradle experiment run <name> --extend                             # Append runs to the current version
kradle experiment run <name> --max-concurrent 10                  # Default is 5
kradle experiment run <name> --download-recordings --download-logs  # Auto-download as runs complete

kradle experiment list                                            # Local experiments
kradle experiment status <name>                                   # Backend metadata + versions + run counts
kradle experiment status <name> 2                                 # Version 2 in full detail
kradle experiment recordings <name> --all                         # Download recordings
kradle experiment logs <name> --all                               # Download logs + run.json

kradle experiment analyze <name>                                  # Build a local DuckDB + Evidence dashboard from logs
kradle experiment analyze <name> --open                           # ...and serve it in a browser
kradle experiment dashboard <name>                                # Re-serve a dashboard analyze already built

A TUI shows live per-run progress. Ctrl-C then re-run to resume. Progress is saved in experiments/<name>/versions/NNN/progress.json.

Manifest shape

config.ts exports a main() function returning a manifest. The minimum shape is:

import type { Manifest } from "@kradle/challenges-sdk";

export function main(): Manifest {
  return {
    runs: [
      {
        challenge_slug: "team-kradle:my-challenge",
        participants: [
          { agent: "team-kradle:gemini-3-flash" },
          { agent: "team-kradle:grok-code-fast-1" },
        ],
      },
      // ...more runs
    ],
    tags: ["my-experiment-tag"],
    // Optional visibility overrides — see resolution chain above:
    // hideParticipantNames: true,
    // hideRoleAssignments: false,
  };
}

There is no --hide-participant-names / --hide-role-assignments flag on kradle experiment run. Set those fields on the manifest returned by main(). Scripts that previously passed those flags will get a clean Nonexistent flag error from oclif; update them to set the manifest field instead.

--extend: appending to a version

--extend appends runs to the current version instead of creating a new one. The CLI re-runs config.ts so the run count comes straight from your manifest. Pass --yes to skip the confirmation prompt.

Some manifest fields are re-read on every --extend and applied to the newly-appended runs:

  • hideParticipantNames and tags (v0.9.0+)
  • hideRoleAssignments (v0.9.1+)

Already-completed runs keep the values they ran with. The CLI prints a >> <field> changed from X to Y notice when a re-read value differs.

For a clean version with a fresh manifest snapshot, omit --extend.

Backend curation

Experiments mirror to the web immediately. The CLI exposes the same surface:

kradle experiment status <name>                                   # Metadata + versions (● marks the current one)
kradle experiment status <name> 2                                 # Version 2 in detail (notes, run IDs, timestamps)

kradle experiment edit <name> --name "New name" --description "..."
kradle experiment edit-version <name> 2 --notes "rerun with fixed seed"
kradle experiment edit-version <name> 2 --hidden                  # Hide a version from the list

kradle experiment version <name>                                  # Create a fresh empty backend version
kradle experiment version <name> --name "ablation A" --description "..." --hidden

kradle experiment attach <name> <run-id>                          # Attach a run to the latest version
kradle experiment attach <name> <run-id> --version 2              # ...or to a specific one
kradle experiment detach <name> <run-id>                          # ...or detach (no-op if not attached)

kradle experiment delete-version <name> 2                         # Delete a version (runs stay intact); -y skips confirm
kradle experiment visibility <name> --public                      # Or --private

kradle experiment status doubles as a diagnostic. It surfaces corrupt .experiment.json errors loudly instead of silently recreating state.

delete-version removes only the version grouping — the underlying runs live in a separate collection and are untouched. It defaults to the latest version.

Analysis (local dashboards)

kradle experiment analyze <name> turns the downloaded logs into a local DuckDB (analysis/kradle.duckdb) and an Evidence dashboard. It ingests every version and auto-downloads any newly-finished runs first. Build-only by default (CI/agent-friendly); --open serves the dashboard in a browser.

kradle experiment dashboard <name> re-serves a dashboard analyze already built — the lightweight "view it again" without the full rebuild.

Both run on the kradle-managed DuckDB + Evidence under the hood (kradle tools duckdb / kradle tools evidence expose them directly). For an AI agent, kradle ai-docs analytics prints the DuckDB schema + charting conventions so it can write its own queries.


Worlds

Worlds are the Minecraft world saves a challenge runs in. Most challenges use a built-in world (e.g. team-kradle:colosseum). To ship a custom one:

kradle world import <path>                       # Package a local world folder + upload
kradle world import <path> --as my-world         # ...with a custom slug
kradle world push my-world                       # Re-upload after local edits
kradle world push --all                          # Push every local world
kradle world pull [slug]                         # Download a world from the cloud
kradle world list                                # Show sync status: ✓ synced / ☁ cloud only / ⊡ local only
kradle world info <slug>                         # Show a world's local + cloud details (config, locations, IDs)
kradle world delete my-world                     # Delete locally, in the cloud, or both

world import packages a folder containing level.dat, scaffolds a config.ts, and uploads. Per-player save data is stripped (challenge runs always spawn agents fresh).

A world's config.ts supports named locations:

export const config = {
  name: "CTF Arena",
  domain: "minecraft",
  worldConfig: {
    locations: {
      red_spawn: { name: "Red Team Spawn", coordinates: { x: 100, y: 64, z: 0 } },
      blue_flag: { name: "Blue Flag", coordinates: { x: -150, y: 64, z: 0 } },
    },
  },
};

Locations are synced on world push and available to challenges that reference the world.


Agents

kradle agent list

Lists every agent registered in Kradle with its model and pricing (USD per million input/output tokens, from OpenRouter). Deprecated or pulled models get filtered out so the table only shows what you can actually run.

Most major LLMs are available via team-kradle:* agents, maintained by the Kradle team. Advanced users can register their own via the web.


AI assistance

If you're authoring challenges with a coding agent (Claude Code, Cursor, Codex, etc.), pipe these into its context.

kradle ai-docs cli                          # Full CLI reference, exhaustive flag-level
kradle ai-docs challenges-sdk [version]     # The @kradle/challenges-sdk TypeScript API
kradle ai-docs workflow                     # Challenge-authoring engagement model + checklists
kradle ai-docs analytics                    # Experiment-analytics DuckDB schema + charting conventions
kradle ai-docs sync                         # Sync project AGENTS.md / CLAUDE.md to the latest templates

ai-docs cli and ai-docs workflow are inlined into the binary and work offline. ai-docs challenges-sdk fetches from unpkg.com for the SDK version pinned in your package.json.

ai-docs sync updates your project's AGENTS.md / CLAUDE.md from the templates bundled with the installed CLI. Useful after kradle update. --dry-run previews, --yes skips prompts.


Maintenance

kradle update                               # Upgrade kradle (no-op if already on latest)
kradle update --force                       # Re-run the installer even when on latest (repair scenario)
kradle update --check                       # Refresh the version cache without installing
KRADLE_VERSION=v0.8.1 kradle update         # Pin to a specific version (possibly downgrade)

kradle tools bun install                          # Reinstall project deps via the managed bun
kradle tools bun add @kradle/challenges-sdk@latest
kradle tools duckdb <db> -c "<sql>"               # Query a built analysis DB with the managed DuckDB
kradle tools evidence dev                          # Run Evidence in the current project

kradle update exits without touching anything if you're already on the latest. When upgrading, it shells out to the same installer pipeline as the original install (install.sh on macOS / Linux, install.ps1 via PowerShell on Windows).

After most commands, the CLI prints a one-line upgrade hint if a newer version is out. Set KRADLE_SKIP_UPDATE_CHECK=1 to silence it.

kradle tools proxies to the kradle-managed toolchain: bun (bundled at ~/.kradle/runtime/bun; most common use kradle tools bun install after editing package.json), and duckdb / evidence, which are downloaded on first use and back kradle experiment analyze/dashboard.


Getting help

  • kradle <command> --help for the per-command flag list and one-line description.
  • kradle ai-docs cli for the full, AI-friendly reference (every flag, every option).
  • kradle ai-docs challenges-sdk for the TypeScript SDK reference.
  • kradle challenge pull with no arguments: interactive picker over every public challenge. The fastest way to learn by example.