CLI reference

The kradle CLI is how you (or your AI coding agent) use Kradle from the terminal.

You can:

Build and run challenges. Challenges are our name for evals — mini-games with objectives that AIs and humans complete. They're written in TypeScript with the @kradle/challenges-sdk; the CLI compiles them, uploads them, and runs them.
Build and run experiments. An experiment puts AIs into a challenge (or a series of challenges) and collects the data. You can run experiments and download images, video, and logs for each.

Terminology:

Challenge: a parameterized mini-game. Two files: challenge.ts (game logic, scoring, world setup) and config.ts (metadata and the prompts agents see).
Run: an instance of a challenge with N agents. Has a status, a duration, a winner, logs, and a recording (a .msgpack.gz file).
Experiment: a batch of runs over a challenge. Versioned. Each kradle experiment run creates a new version (or, with --extend, appends to the current one), supports resume, and mirrors to the cloud for sharing.
Agent: an AI participant. We provide easy access to most major LLMs. Advanced users register their own.

To get your AI coding assistant up to speed, ask it to run kradle ai-docs cli.

For AI coding tools

This page is the human-facing summary. For the exhaustive command and flag reference, run kradle ai-docs cli and consume that instead: it is the same content the kradle binary ships with, kept in lockstep with the installed version. Also pipe in kradle ai-docs challenges-sdk for the TypeScript SDK reference and kradle ai-docs workflow for the challenge-authoring engagement model.

Installation and authentication

macOS / Linux:

curl -fsSL https://kradle.ai/install.sh | bash

Windows (PowerShell):

irm https://kradle.ai/install.ps1 | iex

This downloads the kradle binary into ~/.kradle/bin (%USERPROFILE%\.kradle\bin on Windows; added to your PATH) and a pinned bun runtime in ~/.kradle/runtime (kept off PATH so it doesn't shadow any system bun you have). Open a new terminal (or run exec $SHELL on macOS / Linux) to pick up the PATH change. Then sign in:

kradle login                  # Opens a browser, saves an API key to ~/.kradle/config.json
kradle whoami                 # Show the signed-in account + where the key came from
kradle logout                 # Revoke the saved key on the server and delete the profile

Your credentials are stored in ~/.kradle/config.json, so the CLI works from any folder.

Project setup

mkdir my-evals && cd my-evals
kradle init

kradle init scaffolds the project (package.json with @kradle/challenges-sdk, tsconfig.json, agent docs) and installs dependencies. It's the required first step: challenge commands run challenge TypeScript against the project's installed SDK, so kradle challenge create / build refuse to run outside an initialized project and point you back here.

In an empty directory it initializes in place. In a non-empty one it scaffolds a subdirectory and prompts for its name; pass --name <dir> to skip the prompt (in non-interactive contexts the CLI errors with this hint instead of prompting). Then cd into the new directory.

Authoring challenges

kradle challenge create <name>            # Scaffold a new challenge locally + register in the cloud
kradle challenge build <name>             # Compile, validate, and upload
kradle challenge watch <name>             # Auto-rebuild + auto-upload on file changes
kradle challenge pull <name>              # Download a challenge's source from the cloud
kradle challenge list                     # List local + cloud challenges
kradle challenge delete <name>            # Delete locally, in the cloud, or both

kradle challenge create creates challenges/<name>/challenge.ts (challenge source) and challenges/<name>/config.ts (metadata and agent-facing prompts).

challenge.ts: game logic in TypeScript. Compiles to a Minecraft datapack via the @kradle/challenges-sdk. Controls variables, scoreboards, item distribution, entity spawning, time limits, and win/loss detection. Run kradle ai-docs challenges-sdk for the SDK reference.
config.ts: exports a config object describing the challenge. Includes a required top-level schemaVersion: 1 (see below), the literal prompt text each role's agents receive (task at challenge level, specificTask per role), participant counts, scoring rules, and a challengeConfig block of runtime options.

`schemaVersion`

Every config.ts must declare a top-level schemaVersion — a sibling of challengeConfig, not nested inside it:

export const config = {
  schemaVersion: 1,
  name: "...",
  challengeConfig: { /* ... */ },
  // ...
};

It's a positive integer that versions the challenge-config shape. Only 1 is supported today; the field is groundwork so the format can evolve without ambiguity. kradle challenge create and kradle challenge pull write it for you. Existing local configs must add schemaVersion: 1 — a config that omits it fails kradle challenge build with a clear message telling you to add it.
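The rule above can be sketched as a small check. This is an illustrative helper only: checkSchemaVersion and its messages are hypothetical names, not the CLI's actual validation code or wording.

```typescript
// Hypothetical helper, not part of the Kradle CLI or SDK.
// Returns null when the config passes, otherwise a human-readable problem.
function checkSchemaVersion(config: Record<string, unknown>): string | null {
  const v = config["schemaVersion"];
  if (v === undefined) {
    return "config is missing a top-level schemaVersion; add schemaVersion: 1";
  }
  if (typeof v !== "number" || !Number.isInteger(v) || v < 1) {
    return "schemaVersion must be a positive integer";
  }
  if (v !== 1) {
    return `schemaVersion ${v} is not supported; only 1 is supported today`;
  }
  return null;
}
```

Keeping the field at the top level means a check like this never has to look inside challengeConfig to decide which config shape it is reading.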

kradle challenge build type-checks your project with tsc --noEmit, runs the TypeScript through bun to generate the datapack, validates the .mcfunction output with Spyglass (errors abort the build), and uploads both the datapack and config.ts to the cloud. The type-check matters because the datapack build executes challenge.ts / config.ts transpile-only — types are stripped, not checked — so without it a real type error (say, omitting a required Actions parameter) would slip through to runtime. Use --no-upload to build locally only, --no-validate to skip the type-check, Spyglass, and context-safety during fast iteration, --public to publish, and --all (usually with --summary) to gate CI on a clean-build sweep.

Beyond Spyglass syntax checking, build also runs a context-safety pass. It flags commands that depend on a player — @s, relative/local coordinates (~ ~ ~ / ^ ^ ^), or distance-based selectors (@p, sort=nearest) — but run in a context with no player attached, such as init_participants, start_challenge, or anything on the tick/load loop. The classic symptom: Actions.give({ target: "self" }) in init_participants compiles to give @s …, which silently gives the item to nobody because there is no @s there. These findings abort the build by default, the same as Spyglass errors. Wrapping the per-player work in forEveryPlayer(...) (or using target: "all") is the fix the report points you to; if you really need to ship anyway, --no-validate skips this pass along with Spyglass.
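To make the idea concrete, here is a rough heuristic for spotting the player-dependent constructs listed above in a single command string. It is a sketch of the concept, not the CLI's analyzer: dependsOnPlayer is a hypothetical name, and the real pass also has to know which function context a command runs in.

```typescript
// Illustrative heuristic only; the real context-safety pass is not shown in this page.
function dependsOnPlayer(command: string): boolean {
  const selfOrNearest = /@[sp](?![A-Za-z0-9_])/; // @s or @p selector
  const relativeCoords = /(^|\s)[~^]/;           // ~ ~ ~ or ^ ^ ^ style coordinates
  const nearestSort = /sort\s*=\s*nearest/;      // distance-based sorting
  return (
    selfOrNearest.test(command) ||
    relativeCoords.test(command) ||
    nearestSort.test(command)
  );
}
```

Under this heuristic, give @s minecraft:apple 1 is flagged while give @a minecraft:apple 1 is not, which matches the target: "all" fix described above.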

kradle challenge watch is the inner dev loop. It debounces file changes (1s), rebuilds on every save, and re-uploads only when the datapack hash actually changes. Validation runs on each rebuild but never stops the loop: errors are shown and iteration continues. Each rebuild also runs the context-safety check; a finding skips the upload for that rebuild (just like a validation error) while the loop keeps running.
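The "re-upload only when the hash changes" behavior can be pictured as a small gate in front of the upload step. This is a minimal sketch under stated assumptions: UploadGate and contentHash are hypothetical names, and FNV-1a stands in for whatever hash the CLI really uses.

```typescript
// FNV-1a over the string's UTF-16 code units; a stand-in hash for illustration.
function contentHash(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hypothetical gate: remembers the hash of the last uploaded build output.
class UploadGate {
  private lastUploaded: number | null = null;

  // Returns true (and records the hash) when the output differs from the last upload.
  shouldUpload(datapack: string): boolean {
    const hash = contentHash(datapack);
    if (hash === this.lastUploaded) return false;
    this.lastUploaded = hash;
    return true;
  }
}
```

A save that only touches comments or formatting typically produces the same datapack, so a gate like this skips the upload for that rebuild.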

kradle challenge pull is the fastest way to learn by example. With no argument it opens an interactive picker over every public challenge (yours and team-kradle's).

The `challengeConfig` block

Fields available:

Field	Type	Default	Description
`cheat`	`boolean`	required	Grant agents creative-mode powers (teleport, infinite blocks). Default to `false` unless the challenge explicitly needs them.
`datapack`	`boolean`	required	Whether the challenge ships a custom datapack (almost always `true`).
`gameMode`	`"survival" | "creative" | "adventure" | "spectator"`	required	Minecraft game mode.
`tickRate`	`number` (5–25)	`20`	Server tick rate. Lower it to give slow agents more wall-clock time per game tick.
`startLives`	`number`	-	Lives agents start with, if the challenge uses a lives system.
`watcherCommands`	`string`	-	Minecraft commands executed by a watcher bot at session start.
`hideRoleAssignments`	`boolean`	`false`	Suppress the roles table sent to agents in multi-role challenges.
`useAliases`	`boolean`	`false`	Rename agents from each role's `aliases` array (falling back to `Player_1`, `Player_2`, …) so agents can't infer opponents from real usernames.
`communication`	`"full" | "public_only" | "private_only" | "none"`	`"full"`	Which chat channels agents can use.
`locations`	`Record<string, { coordinates, world_location? }>`	-	Named locations that agents (and your datapack) can reference.
`votingOptions`	`Record<string, string[]>`	-	Named votes the challenge offers → their valid options, e.g. `{ selectroom: ["red", "green", "blue"] }`. Agents pick one with `skills.voteForOption`; your datapack reads the result via `Actions.votedFor`. Vote ids and options must be tag-safe and dot-free (`[A-Za-z0-9_-]+`), and each vote needs at least two options.
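The votingOptions constraints in the table (tag-safe, dot-free ids and options; at least two options per vote) can be checked with a few lines. validateVotingOptions is a hypothetical helper written for this page, not an SDK function.

```typescript
// Pattern taken from the table above: letters, digits, underscore, hyphen.
const TAG_SAFE = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper: returns a list of problems, empty when everything is valid.
function validateVotingOptions(votingOptions: Record<string, string[]>): string[] {
  const problems: string[] = [];
  for (const voteId of Object.keys(votingOptions)) {
    const options = votingOptions[voteId];
    if (!TAG_SAFE.test(voteId)) {
      problems.push(`vote id "${voteId}" is not tag-safe`);
    }
    if (options.length < 2) {
      problems.push(`vote "${voteId}" needs at least two options`);
    }
    for (const option of options) {
      if (!TAG_SAFE.test(option)) {
        problems.push(`option "${option}" of vote "${voteId}" is not tag-safe`);
      }
    }
  }
  return problems;
}
```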

useAliases and hideRoleAssignments participate in the visibility resolution chain below.

When useAliases is effective, the orchestrator renames participants from each role's optional aliases array — a list of valid Minecraft usernames (3–13 chars, letters/digits/underscore, unique within the role):

roles: {
  hider: { specificTask: "...", aliases: ["Joe", "Jack"] },
  seeker: { specificTask: "..." }, // no aliases → participants named Player_1, Player_2, …
}

Aliases are assigned in participant input order and cycle with an underscore suffix when a role has more participants than aliases; roles without aliases fall back to Player_1, Player_2, …. Full assignment and validity rules: kradle ai-docs challenge-config.
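As a quick pre-flight check, the alias rules stated above (3–13 characters, letters/digits/underscore, unique within the role) can be sketched as follows. invalidAliases is a hypothetical helper, and treating uniqueness as case-sensitive is an assumption; the authoritative rules are in kradle ai-docs challenge-config.

```typescript
// 3 to 13 characters, letters, digits, or underscore (limits as stated above).
const ALIAS_PATTERN = /^[A-Za-z0-9_]{3,13}$/;

// Hypothetical helper: returns the aliases that break a rule, in input order.
// Assumption: duplicates are compared case-sensitively.
function invalidAliases(aliases: string[]): string[] {
  const seen = new Set<string>();
  const bad: string[] = [];
  for (const alias of aliases) {
    if (!ALIAS_PATTERN.test(alias) || seen.has(alias)) {
      bad.push(alias);
    }
    seen.add(alias);
  }
  return bad;
}
```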

Judging (LLM-scored outcomes)

judging is an optional top-level config.ts field (a sibling of challengeConfig, not nested inside it). When present, a finished run is scored by a panel of LLM judges instead of being decided purely by datapack logic — for outcomes that can't be settled mechanically, like deception, negotiation, or open-ended social play. The verdict appears on the run page.

Field	Type	Description
`type`	`"llm"`	Only `"llm"` is supported today.
`prompt`	`string`	The rubric: how to classify, what to weigh (e.g. deeds over words), who to judge.
`categories`	`Record<family, { description, subcategories?: Record<label, string> }>`	The verdict taxonomy (at least one family). The family is the headline call (e.g. `HONEST` / `DECEITFUL`); each leaf label is one of the judge's choices. `subcategories` is optional: omit it for a single-tier family (the family is then its own leaf).
`panel`	`string[]`	The judge models (at least one). A cross-family mix makes the panel's agreement a calibrated confidence.
`evidence`	`("chat" | "actions" | "logs" | "variables")[]`	Optional. What the judges read; defaults to chat + actions.

// config.ts — top-level, alongside challengeConfig / roles / objective
judging: {
  type: "llm",
  prompt: "Classify each participant as honest or deceitful, weighing deeds over words.",
  categories: {
    HONEST: { description: "Cooperated in good faith.", subcategories: { FULL: "Honest throughout." } },
    DECEITFUL: { description: "Misled peers.", subcategories: { LIE: "Made a false claim." } },
  },
  panel: ["team-kradle:gemini-3-flash", "team-kradle:gpt-5-mini"],
}

Field-length limits

kradle challenge build caps the length of the prose fields, and the backend checks the same caps again. Going over a cap fails the build with a Zod error pointing at the offending field.

Field in `config.ts`	Max chars
`description` (challenge-level)	4096
`task` (challenge-level)	4096
`roles.<role>.description`	1024
`roles.<role>.specificTask`	4096

Treat description as the short user-facing summary. Put detailed mechanics in task (4096 chars) or roles.<role>.specificTask (4096 chars per role).
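If you generate prompts programmatically, a local length check avoids a failed build. The sketch below hard-codes the limits from the table above; overlongFields and ProseConfig are hypothetical names, and counting length with String.length (UTF-16 code units) is an assumption about how the real check counts characters.

```typescript
// Only the prose fields from the table above; everything else is ignored.
interface ProseConfig {
  description?: string;
  task?: string;
  roles?: Record<string, { description?: string; specificTask?: string }>;
}

// Hypothetical helper: lists every field that exceeds its documented cap.
function overlongFields(config: ProseConfig): string[] {
  const over: string[] = [];
  const check = (path: string, value: string | undefined, max: number): void => {
    if (value !== undefined && value.length > max) {
      over.push(`${path} (${value.length} > ${max})`);
    }
  };
  check("description", config.description, 4096);
  check("task", config.task, 4096);
  const roles = config.roles ?? {};
  for (const name of Object.keys(roles)) {
    check(`roles.${name}.description`, roles[name].description, 1024);
    check(`roles.${name}.specificTask`, roles[name].specificTask, 4096);
  }
  return over;
}
```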

AI tip

Before editing challenge.ts or config.ts, always run kradle ai-docs challenges-sdk and read the output. It's the authoritative SDK reference, kept in lockstep with the version pinned in the user's package.json. Don't guess at APIs from memory. The SDK moves.

Running a challenge

kradle challenge run <name>                                       # Interactive: prompts for agents
kradle challenge run <name> agent1,agent2                         # Inline, single role
kradle challenge run <name> red=agent1,agent2 blue=agent3         # Inline, multi-role
kradle challenge run <name> --no-open                             # Don't open the run URL in a browser
kradle challenge run <name> --no-wait                             # Fire and forget
kradle challenge run <name> --stats                               # Also print movement + action stats after completion
kradle challenge run team-kradle:battle-royale agent1             # Run a public challenge from another team

Inline syntax: role=agent1,agent2 (the same role key can appear multiple times, agents get merged) or just agent1,agent2 for a single-role challenge. With no agents passed, you get an autocomplete prompt per role.
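The merging rule for repeated role keys is easy to see in a small parser. This is a sketch, not the CLI's parser: parseAgentArgs is a hypothetical name, and the "default" key used for the bare agent1,agent2 form is an assumption made for illustration.

```typescript
// Hypothetical key for the bare "agent1,agent2" form; the CLI's internal name is not documented here.
const SINGLE_ROLE = "default";

// Groups agents by role; the same role key appearing twice is merged.
function parseAgentArgs(args: string[]): Record<string, string[]> {
  const byRole: Record<string, string[]> = {};
  for (const arg of args) {
    const eq = arg.indexOf("=");
    const role = eq === -1 ? SINGLE_ROLE : arg.slice(0, eq);
    const list = eq === -1 ? arg : arg.slice(eq + 1);
    const agents = list.split(",").filter((name) => name.length > 0);
    byRole[role] = (byRole[role] ?? []).concat(agents);
  }
  return byRole;
}
```

For example, parseAgentArgs(["red=agent1,agent2", "blue=agent3", "red=agent4"]) groups agent1, agent2, and agent4 under red and agent3 under blue.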

By default the command opens the run URL in your browser and polls every 2 seconds until the run reaches a terminal state. game_over is not terminal: scores and end state arrive only after the run moves from game_over to finished.

Visibility flags

Two tri-state overrides match the challengeConfig fields of the same name:

Flag	Effect when active
`--use-aliases` / `--no-use-aliases`	Rename agents from each role's `aliases` array (falling back to `Player_1`, `Player_2`, …).
`--hide-role-assignments` / `--no-hide-role-assignments`	Suppress the roles table sent to agents in multi-role challenges.

Tri-state:

Omit: inherit the challenge default from config.ts.
Bare flag: force true.
--no- form: force false. The only way to defeat a challenge default of true.

The same fields can also be set on the experiment manifest (see Experiments). The backend resolves per-run as CLI flag > experiment manifest > challenge default > false, preserving explicit false at every layer — the full chain: kradle ai-docs challenge-config.
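The resolution chain maps naturally onto nullish coalescing, which is also why an explicit false survives at every layer. A minimal sketch, with resolveVisibility as a hypothetical name (the real resolution happens in the backend):

```typescript
// undefined means "not set at this layer"; true and false are both explicit choices.
type TriState = boolean | undefined;

// CLI flag > experiment manifest > challenge default > false.
function resolveVisibility(
  cliFlag: TriState,
  manifest: TriState,
  challengeDefault: TriState,
): boolean {
  return cliFlag ?? manifest ?? challengeDefault ?? false;
}
```

Using ?? rather than || matters here: || would skip over an explicit false and fall through to a lower layer, which is exactly what the chain is designed to avoid.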

Inspecting runs

kradle challenge runs list                              # List your recent runs (default 10)
kradle challenge runs list --limit 20
kradle challenge runs get <run-id>                      # Digest: outcome, participants, final variables
kradle challenge runs get <run-id> --chat               # Also show the chat transcript
kradle challenge runs get <run-id> --evolution          # Also show how each variable changed over the run
kradle challenge runs get <run-id> --json               # Digest as one JSON object
kradle challenge runs logs <run-id>                     # Raw log entries
kradle challenge runs logs <run-id> --brief             # Token-friendly logs for AI consumers
kradle challenge runs logs <run-id> --stats             # Per-participant movement + action stats
kradle challenge runs recording <run-id>                # Download .msgpack.gz + print viewer URL
kradle challenge runs recording <run-id> --open         # ...also open the viewer in your browser
kradle challenge runs screenshot <run-id>               # Headless multi-camera PNG capture (every 5s)
kradle challenge runs screenshot <run-id> -e 10 -f 15 -t 35
kradle challenge runs video <run-id>                    # Render an mp4/webm video of the run (headless)

runs get shows a run's digest: outcome (status, end state, duration, winners), aggregated results, a per-participant table (role, score, winner), and the final value of any custom scoreboard variables the challenge recorded. It does not print the raw logs (use runs logs for those) and omits chat by default. If the run is still finalising (status game_over), it waits for scores to settle before printing.

When the challenge has an LLM-judge config and the run has been judged (post-run scoring runs asynchronously), runs get also prints an LLM Judge Verdict section: the panel-majority classification (family + leaf) with a confidence band (HIGH / MEDIUM / CONTESTED) and agreement percentage, the panel size / abstentions / cost, and each judge's individual vote with its rationale. Runs without a judging config, or not yet judged, simply omit the section.
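To read the verdict numbers, it can help to see how a panel-majority label and an agreement percentage could be derived from individual votes. This is a sketch under assumptions, not the backend's scoring code: panelMajority and JudgeVote are hypothetical names, abstentions are assumed to be excluded from the percentage, ties go to the first label seen, and the HIGH / MEDIUM / CONTESTED thresholds are not computed because this page does not state them.

```typescript
// label is null when a judge abstained. The "FAMILY/LEAF" label format is illustrative.
interface JudgeVote {
  judge: string;
  label: string | null;
}

// Returns the most common label and the share of cast votes that agree with it.
function panelMajority(votes: JudgeVote[]): { label: string; agreementPct: number } | null {
  const counts = new Map<string, number>();
  let cast = 0;
  for (const vote of votes) {
    if (vote.label === null) continue; // abstention: assumed not to count
    cast++;
    counts.set(vote.label, (counts.get(vote.label) ?? 0) + 1);
  }
  if (cast === 0) return null;
  let best = "";
  let bestCount = 0;
  counts.forEach((count, label) => {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  });
  return { label: best, agreementPct: Math.round((bestCount / cast) * 100) };
}
```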

--chat adds the run's chat transcript: the agent discussion plus the in-game system broadcasts, each tagged with a game-clock (0:42) and sorted chronologically.

--evolution adds the time-series of each custom variable, so you see how it changed over the run rather than just its final value. A challenge records variables by calling Actions.log_variable(..., store: true) in its datapack; a run that records none shows an empty section. Per-candidate tallies, individual votes, and player numbers are only available when the challenge stores them this way. Anything it renders only into chat text is surfaced by --chat but not parsed into structured numbers.

--json emits the digest (outcome, participants, final variables) as a single JSON object and nothing else. The LLM-judge verdict is included as a judgment key when the run has one. Add --chat and --evolution to include those sections in the object.

runs logs prints the raw log entries. --brief strips bulky static content from init_call payloads and collapses repeated init_call events into a single entry tagged ×N, roughly halving the byte cost when piping to an AI assistant. --stats adds a per-participant movement and action table (path length, max distance from spawn, observation count, executed actions, plus STATIONARY / NO_ACTIONS flags), which catches the case where endState=game_complete and score>0 look healthy but the agent never actually moved. --json emits the parsed log entries as a JSON array.
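The movement columns in --stats are simple geometry over a participant's position samples. The sketch below shows one way to compute path length and maximum distance from spawn; movementStats is a hypothetical name, the first sample is assumed to be the spawn point, and flagging stationary only at a path length of exactly zero is an assumption (the CLI's STATIONARY threshold is not stated here).

```typescript
interface Position {
  x: number;
  y: number;
  z: number;
}

// Straight-line distance between two positions.
function distance(a: Position, b: Position): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Assumes positions[0] is the spawn point and samples are in time order.
function movementStats(positions: Position[]): { pathLength: number; maxFromSpawn: number; stationary: boolean } {
  let pathLength = 0;
  let maxFromSpawn = 0;
  for (let i = 1; i < positions.length; i++) {
    pathLength += distance(positions[i - 1], positions[i]);
    maxFromSpawn = Math.max(maxFromSpawn, distance(positions[0], positions[i]));
  }
  return { pathLength, maxFromSpawn, stationary: pathLength === 0 };
}
```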

Playing a run yourself (interactive debugging)

Two complementary ways to debug a challenge you're building:

NPCs (--type npc, see Creating your own agents) give fixed, repeatable behavior — the reliable way to drive deterministic events and confirm end states / win conditions fire.
A local agent (--type local) lets you control a participant yourself, action by action — the interactive way to probe a challenge by playing it.

A local agent has no built-in brain: the arena spawns a normal code-executing bot for it, but it waits for actions you submit. Create one, add it to a run, then loop observe → act:

kradle agent create dbg --type local                    # one-time: a you-controlled agent
kradle challenge run my-challenge dbg --no-wait          # add it to a run; note the run id
kradle challenge runs observe <run-id> -p dbg           # read its observation feed (returns a cursor)
kradle challenge runs observe <run-id> -p dbg --cursor <cursor>   # only newer observations
kradle challenge runs act <run-id> -p dbg --code "await skills.goToPosition(0, 64, 0);"

observe returns the participant's observations (oldest-first) plus a cursor; pass the cursor back with --cursor to fetch only what's new — so you page forward as the run progresses. An empty page reuses your last cursor (nothing new yet; poll again). act submits a code action (--code or --code-file), an optional chat --message, and optional --thoughts; it echoes the actionId, which reappears as observationId on the resulting command_executed observation. The run must be active — act returns a clear error before the run starts or after it ends. Both commands take --json so the loop is easy to script. You drive an agent you created (auth is checked against the agent's owner), so it works for local agents you made; other participants can be inspected with observe but not driven.
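The cursor contract described above (page forward, and get the same cursor back on an empty page) can be modeled with an in-memory feed. Everything here is a stand-in: readPage is a hypothetical function, and the real cursor is an opaque value from the CLI, not an array index.

```typescript
interface Page<T> {
  items: T[];
  cursor: string;
}

// Hypothetical in-memory model of the observation feed; the cursor is just an index here.
function readPage<T>(feed: T[], cursor?: string): Page<T> {
  const start = cursor === undefined ? 0 : Number(cursor);
  const items = feed.slice(start); // oldest-first, only what is newer than the cursor
  return { items, cursor: String(start + items.length) };
}
```

In a scripted loop you would keep the latest cursor, poll again when items is empty, and submit an action only after reading the observations that arrived since the previous one.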

Visual debugging

kradle challenge runs recording <run-id>                  # Download recording, print viewer URL
kradle challenge runs recording <run-id> -p Gemini        # Specific participant (default: watcher if present)
kradle challenge runs recording <run-id> --open           # Open viewer in your default browser

kradle challenge runs screenshot <run-id>                 # Multi-camera PNGs every 5s, headless
kradle challenge runs screenshot <run-id> -e 10           # One frame every 10s
kradle challenge runs screenshot <run-id> -e 5 -f 15 -t 35  # Window [15s, 35s]
kradle challenge runs screenshot <run-id> --concurrency 5 --width 1280 --height 720

kradle challenge runs video <run-id>                      # Render an mp4 of the run (birds-eye, headless)
kradle challenge runs video <run-id> --camera follow --follow-target Gemini
kradle challenge runs video <run-id> --format webm --from 15 --to 45 --fps 30

recording downloads the .msgpack.gz to ./recordings/<runId>/<participant>.msgpack.gz and prints a https://vmc.kradle.ai/replay/... URL. The viewer pulls directly from signed GCS, no extra setup.

screenshot is for AI agents and other headless contexts. Boots Puppeteer (downloads its own Chromium on first install, ~150 MB) and captures PNGs from the watcher's third-person recording — birds-eye plus a follow-cam pinned to each non-watcher participant. It loads the viewer's /clip page once per camera and renders each timed frame through the same deterministic advanceTo(t) path a video export uses, so every PNG is pixel-consistent with the corresponding video frame. Cameras render with bounded --concurrency (default 3). Tune the cadence and window with -e/--every (default 5s), -f/--from (default 0), and -t/--to (default totalTime − 2s); size the canvas with --width/--height (default 1920×1080); point at a different viewer with --viewer-url. Output PNGs go to ./screenshots/<runId>/ (override with -o/--out-dir). The run must have a watcher recording — the command errors up front if it doesn't, since the follow-cams can't be derived from a first-person recording.
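To estimate how many PNGs a given -e / -f / -t combination produces per camera, the capture times can be sketched as below. frameTimes is a hypothetical helper; it uses the defaults stated above, and treating both ends of the window as inclusive is an assumption.

```typescript
// Capture timestamps in seconds. Defaults mirror the documented ones:
// every 5s, from 0, to totalTime - 2. Inclusive endpoints are an assumption.
function frameTimes(totalTime: number, every = 5, from = 0, to = totalTime - 2): number[] {
  if (every <= 0) throw new Error("every must be positive");
  const times: number[] = [];
  for (let t = from; t <= to; t += every) {
    times.push(t);
  }
  return times;
}
```

Multiply the number of timestamps by the number of cameras (birds-eye plus one follow-cam per non-watcher participant) to get a rough count of output files.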

video renders an .mp4 (or .webm with --format webm) by driving the hosted viewer's clip page in headless Chrome — no ffmpeg needed. Defaults to --camera birdseye; use --camera follow --follow-target <player> to track one agent. Bound the clip with --from/--to, tune --fps, --width/--height, and the --renderer quality preset. The output file defaults to ./videos/<runId>_<camera>.<format> (where <camera> is birdseye or the followed participant's name); override with -o/--out-file. Both camera modes need the watcher's third-person recording, so the command errors up front if the run has none.

Experiments

An experiment runs a batch of evals and groups the results. It's a folder under experiments/<name>/ with a config.ts exporting main() that returns a list of runs (the "manifest"). Each kradle experiment run creates a new version (an auto-numbered batch) and links its runs to it at submission time. Runs execute as background jobs: the CLI submits them all and the backend scheduler queues, runs, and retries them, so a submitted run shows as queued until the scheduler starts it. Before submitting, the CLI prints the batch's estimated cost (per-model breakdown + tips to reduce it) and asks to confirm — --yes skips the prompt, and the run proceeds without an estimate if pricing is unreachable.

kradle experiment create <name>                                   # Pick a challenge, scaffold config.ts
kradle experiment create <name> --challenge team-kradle:ctf       # Or name the challenge directly
kradle experiment create <name> --open                            # Also open config.ts in your editor

kradle experiment run <name>                                      # New version each invocation (resumes if interrupted)
kradle experiment run <name> --extend                             # Append runs to the current version
kradle experiment run <name> --download-recordings --download-logs  # Auto-download as runs complete

kradle experiment list                                            # Local + cloud experiments with sync status
kradle experiment list --local                                    # Local only (no auth); --cloud for cloud only
kradle experiment status <name>                                   # Backend metadata + versions + run counts
kradle experiment status <name> 2                                 # Version 2 in full detail
kradle experiment recordings <name> --all                         # Download recordings
kradle experiment logs <name> --all                               # Download logs + run.json

kradle experiment analyze <name>                                  # Build a local DuckDB + Evidence dashboard from logs
kradle experiment analyze <name> --open                           # ...and serve it in a browser
kradle experiment dashboard <name>                                # Re-serve a dashboard analyze already built

A TUI shows live per-run progress. Ctrl-C then re-run to resume. Progress is saved in experiments/<name>/versions/NNN/progress.json.

Manifest shape

config.ts exports a main() function returning a manifest. The minimum shape is:

import type { Manifest } from "@kradle/challenges-sdk";

export function main(): Manifest {
  return {
    runs: [
      {
        challenge_slug: "team-kradle:my-challenge",
        participants: [
          { agent: "team-kradle:gemini-3-flash" },
          { agent: "team-kradle:grok-code-fast-1" },
        ],
      },
      // ...more runs
    ],
    tags: ["my-experiment-tag"],
    // Optional visibility overrides — see resolution chain above:
    // useAliases: true,
    // hideRoleAssignments: false,
    // Optional version metadata stored at creation (≤32 entries; benchmark
    // flows set `subject` to the model under test):
    // metadata: { subject: "openai/gpt-5" },
  };
}

kradle experiment run has no --use-aliases / --hide-role-assignments flags; set those fields on the manifest returned by main() instead. Scripts that pass the flags get a clean Nonexistent flag error from oclif.
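Because main() is ordinary TypeScript, the runs list can be generated rather than written by hand. The sketch below builds every pairing of a set of agents, repeated a few times. The local Run and Manifest types only mirror the shape shown above so the example is self-contained (a real config.ts would import Manifest from @kradle/challenges-sdk and export main), pairwiseRuns is a hypothetical helper, and my-team:my-fighter is a placeholder agent slug.

```typescript
// Local stand-ins mirroring the manifest shape shown above.
interface Participant {
  agent: string;
}
interface Run {
  challenge_slug: string;
  participants: Participant[];
}
interface Manifest {
  runs: Run[];
  tags?: string[];
}

// Hypothetical helper: one run per unordered pair of agents, repeated `repeats` times.
function pairwiseRuns(challengeSlug: string, agents: string[], repeats: number): Run[] {
  const runs: Run[] = [];
  for (let i = 0; i < agents.length; i++) {
    for (let j = i + 1; j < agents.length; j++) {
      for (let r = 0; r < repeats; r++) {
        runs.push({
          challenge_slug: challengeSlug,
          participants: [{ agent: agents[i] }, { agent: agents[j] }],
        });
      }
    }
  }
  return runs;
}

function main(): Manifest {
  const agents = [
    "team-kradle:gemini-3-flash",
    "team-kradle:grok-code-fast-1",
    "my-team:my-fighter", // placeholder slug
  ];
  return {
    runs: pairwiseRuns("team-kradle:my-challenge", agents, 2),
    tags: ["my-experiment-tag"],
  };
}
```

Three agents give three pairings, so this manifest contains six runs; the cost estimate printed before submission grows with that count.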

`--extend`: appending to a version

--extend appends runs to the current version instead of creating a new one. The CLI re-runs config.ts so the run count comes straight from your manifest. Pass --yes to skip the confirmation prompt.

Some manifest fields are re-read on every --extend and applied to the newly-appended runs:

tags
useAliases
hideRoleAssignments

Already-completed runs keep the values they ran with. The CLI prints a >> <field> changed from X to Y notice when a re-read value differs.

For a clean version with a fresh manifest snapshot, omit --extend.

Backend curation

Experiments mirror to the web immediately. The CLI exposes the same surface:

kradle experiment status <name>                                   # Metadata + versions (● marks the current one)
kradle experiment status <name> 2                                 # Version 2 in detail (notes, run IDs, timestamps)

kradle experiment edit <name> --name "New name" --description "..."
kradle experiment edit-version <name> 2 --notes "rerun with fixed seed"
kradle experiment edit-version <name> 2 --hidden                  # Hide a version from the list

kradle experiment version <name>                                  # Create a fresh empty backend version
kradle experiment version <name> --name "ablation A" --description "..." --hidden

kradle experiment attach <name> <run-id>                          # Attach a run to the latest version (refused if sealed)
kradle experiment attach <name> <run-id> --version 2              # ...or to a specific one
kradle experiment detach <name> <run-id>                          # ...or detach (no-op if not attached)

kradle experiment delete-version <name> 2                         # Delete a version (runs stay intact); -y skips confirm
kradle experiment visibility <name> --public                      # Or --private

kradle experiment status doubles as a diagnostic. It surfaces corrupt .experiment.json errors loudly instead of silently recreating state.

delete-version removes only the version grouping — the underlying runs live in a separate collection and are untouched. It defaults to the latest version.

Analysis (local dashboards)

kradle experiment analyze <name> turns the downloaded logs into a local DuckDB (analysis/kradle.duckdb) and an Evidence dashboard. It ingests every version and auto-downloads any newly-finished runs first. Build-only by default (CI/agent-friendly); --open serves the dashboard in a browser.

kradle experiment dashboard <name> re-serves a dashboard analyze already built — the lightweight "view it again" without the full rebuild.

Both run on the kradle-managed DuckDB + Evidence under the hood (kradle tools duckdb / kradle tools evidence expose them directly). For an AI agent, kradle ai-docs analytics prints the DuckDB schema + charting conventions so it can write its own queries.

Worlds

Worlds are the Minecraft world saves a challenge runs in. Most challenges use a built-in world (e.g. team-kradle:colosseum). To ship a custom one:

kradle world import <path>                       # Package a local world folder + upload
kradle world import <path> --as my-world         # ...with a custom slug
kradle world push my-world                       # Re-upload after local edits
kradle world push --all                          # Push every local world
kradle world pull [slug]                         # Download a world from the cloud
kradle world list                                # Show sync status: ✓ synced / ☁ cloud only / ⊡ local only
kradle world info <slug>                         # Show a world's local + cloud details (config, locations, IDs)
kradle world delete my-world                     # Delete locally, in the cloud, or both

world import packages a folder containing level.dat, scaffolds a config.ts, and uploads. Per-player save data is stripped (challenge runs always spawn agents fresh).

A world's config.ts supports named locations:

export const config = {
  name: "CTF Arena",
  domain: "minecraft",
  worldConfig: {
    locations: {
      red_spawn: { name: "Red Team Spawn", coordinates: { x: 100, y: 64, z: 0 } },
      blue_flag: { name: "Blue Flag", coordinates: { x: -150, y: 64, z: 0 } },
    },
  },
};

Locations are synced on world push and available to challenges that reference the world.

Agents

kradle agent list

Lists your own agents plus team-kradle's shared agents, with model and pricing (USD per million input/output tokens, from OpenRouter). Deprecated or pulled models get filtered out so the table only shows what you can actually run; agents without a model (NPCs, local agents) show their kind instead, like (npc) or (local).

Most major LLMs are available via team-kradle:* agents, maintained by the Kradle team.

Creating your own agents

kradle agent create my-fighter --type prompt \
  --model google/gemini-2.5-flash \
  --persona "You are an aggressive close-range fighter."

Prompt agents are LLM-backed: pick an OpenRouter model and a personality prompt, and the Kradle cloud handles the observation→action loop for you.

kradle agent create goto-ridge --type npc \
  --script "await skills.goToPosition(100, 64, -200);"
kradle agent create lumberjack --type npc --script-file ./lumberjack.js

NPCs are deterministic agents: instead of an LLM, they run a fixed JavaScript snippet in the run's skills sandbox, once, when the run starts. Use them as scripted opponents, pace-setters, or fixed-behavior participants — they cost nothing per run and behave the same way every time. The script is stored in the agent config and runs directly in the arena with no extra service.

kradle agent create dbg --type local

Local agents have no built-in brain — you supply their actions yourself. The arena spawns a normal code-executing bot, but instead of an LLM driving it, the run waits for actions you submit via kradle challenge runs act. This is how you debug a challenge by playing it: add a local agent to a run, then loop observe → act. No model, no per-run cost.

Inspecting, editing, deleting

kradle agent get goto-ridge                 # full details, including the NPC script
kradle agent edit goto-ridge --script "await skills.goToPosition(0, 64, 0);"
kradle agent edit my-fighter --model google/gemini-3.5-flash
kradle agent delete goto-ridge              # asks for confirmation; --yes to skip

agent edit changes only the fields you pass and preserves everything else, including visibility.

AI assistance

If you're authoring challenges with a coding agent (Claude Code, Cursor, Codex, etc.), pipe these into its context.

Start with kradle ai-docs (no arguments): it prints a one-page capability map, a grouped index of everything the CLI does with the command to learn more about each area. It's the antidote to "the agent didn't know that feature existed." Drill into a single area with kradle ai-docs <topic> (overview, auth, setup, challenge-config, challenges, run, runs, replays, agents, experiments, worlds, docs, recipes) to pull one topic page instead of the whole reference, or add words to get a single command's docs (kradle ai-docs replays screenshot).

kradle ai-docs                              # Capability map: the index of everything the CLI can do
kradle ai-docs challenges                   # One topic page (also: challenge-config, runs, replays, experiments, …)
kradle ai-docs replays screenshot           # One command's docs only
kradle ai-docs cli                          # Full CLI reference, exhaustive flag-level (all topic pages)
kradle ai-docs challenges-sdk [version]     # The @kradle/challenges-sdk TypeScript API (add words to filter: `challenges-sdk actions give`)
kradle ai-docs agent-skills                 # Agent skill reference (skills/world/cheats) — add words to filter (`agent-skills goToPosition`)
kradle ai-docs workflow                     # Challenge-authoring engagement model + checklists
kradle ai-docs analytics                    # Experiment-analytics DuckDB schema + charting conventions
kradle ai-docs search "<query>"             # Semantic search across the AI docs
kradle ai-docs sync                         # Sync project AGENTS.md / CLAUDE.md to the latest templates

ai-docs cli, ai-docs workflow, and ai-docs analytics are inlined into the binary and work offline. ai-docs challenges-sdk fetches from jsDelivr for the SDK version pinned in your package.json.

ai-docs agent-skills prints the Minecraft agent skill reference — the JS functions (skills.* / world.*) an agent calls in its action code. Pull it before writing an NPC script (agent create --type npc) or driving a local agent (challenge runs act), so you call real functions with the right arguments (await skills.attackPlayer("Jack"), not skills.attackPlayer(bot, "Jack") — bot is supplied automatically). Fetched live from kradle.ai/js_functions, so it tracks the deployed arena.

ai-docs search is the fastest way to answer one narrow docs question without dumping the full references into context. It searches all five categories by default: bundled CLI/workflow/analytics docs plus live agent-skills and challenges-sdk docs. Remote docs degrade gracefully: if agent-skills or challenges-sdk cannot be fetched, the command warns on stderr and still ranks the docs it has.

kradle ai-docs search "how do I capture screenshots"
kradle ai-docs search "custom variables in run output" --category cli --limit 3
kradle ai-docs search "agent can place blocks" --category agent-skills --json

Flags:

Flag	Description
`--limit <n>`	Number of ranked chunks to print. Default `5`, max `25`.
`--category <id>`	Restrict search to a category. Repeatable. One of `cli`, `workflow`, `analytics`, `agent-skills`, `challenges-sdk`.
`--json`	Emit `[{ category, headingPath, score, text }]` instead of markdown sections.
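
If you call the search from a script instead of typing it, a small typed wrapper can keep the flags in the table above honest. The following is a minimal sketch, not part of the CLI or the SDK: buildSearchArgs and SearchOptions are hypothetical names, and rejecting an out-of-range --limit locally (including the lower bound of 1) is an assumption about what is convenient, not a statement of how the CLI itself reacts.

```typescript
// Hypothetical helper: builds the argument list for `kradle ai-docs search`.
// It only uses the flags documented in the table: --limit, --category, --json.

const CATEGORIES = ["cli", "workflow", "analytics", "agent-skills", "challenges-sdk"] as const;
type Category = (typeof CATEGORIES)[number];

interface SearchOptions {
  query: string;
  limit?: number;          // documented default is 5, documented max is 25
  categories?: Category[]; // --category is repeatable, one flag per category
  json?: boolean;          // emit JSON instead of markdown sections
}

function buildSearchArgs(opts: SearchOptions): string[] {
  const args: string[] = ["ai-docs", "search", opts.query];

  if (opts.limit !== undefined) {
    // Assumption: validate locally instead of relying on the CLI's own handling.
    const isWhole = Math.floor(opts.limit) === opts.limit;
    if (!isWhole || opts.limit < 1 || opts.limit > 25) {
      throw new RangeError(`--limit must be a whole number from 1 to 25, got ${opts.limit}`);
    }
    args.push("--limit", String(opts.limit));
  }

  // Repeat --category once per requested category.
  for (const category of opts.categories ?? []) {
    args.push("--category", category);
  }

  if (opts.json) {
    args.push("--json");
  }
  return args;
}

// Example: the arguments for the third command shown above.
const exampleArgs = buildSearchArgs({
  query: "agent can place blocks",
  categories: ["agent-skills"],
  json: true,
});
console.log(JSON.stringify(exampleArgs));
```

The returned array is meant to be passed as the argument list when spawning kradle (for example with Node's child_process.execFile), so the query never needs shell quoting.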

The first run downloads the local embedding runtime once: onnxruntime-web WASM plus Snowflake/snowflake-arctic-embed-s quantized q8 weights, roughly 35 MB total, into ~/.kradle/tools/embeddings (or ~/.kradledev/tools/embeddings for the dev channel). Release builds ship precomputed embeddings for the docs corpus, so after that one-time download the first search is fast; document embeddings for changed or unmatched docs are computed locally and cached under ~/.kradle/cache/ai-docs-embeddings. stdout is clean result content; progress and warnings go to stderr, so kradle ai-docs search "..." 2>/dev/null is safe for piping into another tool.
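
Because stdout carries only result content, the --json output can be handed straight to a parser. The sketch below shows one way a consuming tool might do that. The four field names come from the --json description above; their types (in particular whether headingPath is a string or a list of segments, and that score is a number) are assumptions, and parseSearchOutput, groupByCategory, and formatHeading are hypothetical helper names. The sample data in the usage lines is invented for illustration.

```typescript
// Documented field names; the types are assumptions for this sketch.
interface SearchHit {
  category: string;
  headingPath: string | string[]; // assumption: a joined string or path segments
  score: number;                  // assumption: numeric relevance score
  text: string;
}

// Parse the stdout of `kradle ai-docs search "<query>" --json`.
function parseSearchOutput(stdout: string): SearchHit[] {
  const parsed: unknown = JSON.parse(stdout);
  if (!Array.isArray(parsed)) {
    throw new TypeError("expected a JSON array of search hits");
  }
  return parsed as SearchHit[];
}

// Group hits by category, keeping the order the CLI printed them in.
function groupByCategory(hits: SearchHit[]): Record<string, SearchHit[]> {
  const groups: Record<string, SearchHit[]> = {};
  for (const hit of hits) {
    if (!groups[hit.category]) {
      groups[hit.category] = [];
    }
    groups[hit.category].push(hit);
  }
  return groups;
}

// Render the heading path as one readable label, whichever shape it has.
function formatHeading(hit: SearchHit): string {
  return Array.isArray(hit.headingPath) ? hit.headingPath.join(" > ") : hit.headingPath;
}

// Usage with invented sample output (not real CLI output).
const sampleStdout = JSON.stringify([
  { category: "cli", headingPath: ["Replays", "screenshot"], score: 0.9, text: "..." },
  { category: "agent-skills", headingPath: "placeBlock", score: 0.8, text: "..." },
]);
for (const hit of parseSearchOutput(sampleStdout)) {
  console.log(`[${hit.category}] ${formatHeading(hit)}`);
}
```

Grouping rather than re-sorting keeps the CLI's ranking intact, which avoids relying on how score should be interpreted.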

ai-docs sync updates your project's AGENTS.md / CLAUDE.md from the templates bundled with the installed CLI. Useful after kradle update. --dry-run previews, --yes skips prompts.

Install the Kradle skill (`kradle install-skills`)

Installs a small Kradle skill into your machine's global agent-skill directories so coding agents (Claude Code, and any agentskills.io-compatible client) automatically know Kradle exists and how to use it, without you pasting docs into every session. The skill is tiny: its job is to point the agent at kradle ai-docs, which does the rest.

kradle install-skills          # interactive: pick which directories to install into
kradle install-skills --yes    # non-interactive: install to the default/last-used set
kradle install-skills --remove # uninstall

By default it writes to ~/.agents/skills/kradle/SKILL.md (the cross-client convention) and ~/.claude/skills/kradle/SKILL.md (Claude Code). The install is global (one covers every project) and your choice of directories is remembered. If you haven't run it yet, kradle update will remind you. Start a fresh agent session to pick up a newly installed skill.

Maintenance

Auto-updates (`kradle config autoupdate`)

The CLI keeps itself current by default: when a newer release is published, the running command finishes normally and the installer runs in the background — your next command uses the new version. Control it with:

kradle config                  # list settings (value + provenance)
kradle config autoupdate off   # opt out; the "update available" notice returns

Defaults: on for the standard install, off for the kradle-dev channel. KRADLE_SKIP_UPDATE_CHECK=1 disables update checks entirely.

Keeping the SDK current (`kradle update-sdk`, `kradle changelog`)

The CLI is a global binary; @kradle/challenges-sdk is a per-project dependency — updating one doesn't update the other. kradle challenge build warns when your project's SDK is behind. To update:

kradle update-sdk --check      # what would change (breaking sections marked)
kradle update-sdk              # bump + install + type-check the project
kradle changelog               # full CLI + SDK changelogs (--since <version> to filter)

update-sdk ends with a project type-check, so breaking API changes surface immediately as a file:line list. Fix them guided by kradle ai-docs challenges-sdk <words>, verify with kradle challenge build, and roll back with kradle update-sdk --version <old> if needed.

kradle update                               # Upgrade kradle (no-op if already on latest)
kradle update --force                       # Re-run the installer even when on latest (repair scenario)
kradle update --check                       # Refresh the version cache without installing
KRADLE_VERSION=v0.8.1 kradle update         # Pin to a specific version (possibly downgrade)

kradle tools bun install                          # Reinstall project deps via the managed bun
kradle tools bun add @kradle/challenges-sdk@latest
kradle tools duckdb <db> -c "<sql>"               # Query a built analysis DB with the managed DuckDB
kradle tools evidence dev                          # Run Evidence in the current project

kradle update exits without touching anything if you're already on the latest version. When upgrading, it shells out to the same installer pipeline as the original install (install.sh on macOS / Linux, install.ps1 via PowerShell on Windows).

After most commands, the CLI prints a one-line upgrade hint if a newer version is out. Set KRADLE_SKIP_UPDATE_CHECK=1 to silence it.

kradle tools proxies to the kradle-managed toolchain: bun (bundled at ~/.kradle/runtime/bun; the most common use is kradle tools bun install after editing package.json), and duckdb / evidence, which are downloaded on first use and back kradle experiment analyze/dashboard.

Getting help

kradle <command> --help for the per-command flag list and one-line description.
kradle ai-docs cli for the full, AI-friendly reference (every flag, every option).
kradle ai-docs challenges-sdk for the TypeScript SDK reference.
kradle challenge pull with no arguments: interactive picker over every public challenge. The fastest way to learn by example.