
Claude Code gets messy fast when every developer prompts differently, the agent rewrites half the repo, and nobody can explain why tests were skipped.
Here's the deal: treat Claude Code like an IT-managed platform, not a magical pair programmer. That means contracts, guardrails, and verification loops. Below are best practices you can copy into your repo and workflow today.
```markdown
## CLAUDE.md (repo contract for Claude Code)

## Mission
- Primary goal: ship safe, reviewable PRs with tests.
- Never change public APIs without updating docs and changelog.

## Repo map
- /apps/api: FastAPI service
- /apps/web: Next.js app
- /packages/shared: shared types + utilities

## Non-negotiables
- Run: lint + typecheck + unit tests before proposing "done"
- No new dependencies without justification + security note
- No secrets in code, logs, or test fixtures

## Coding standards
- Python: ruff + black, type hints required for public functions
- TypeScript: strict mode, no `any` without comment
- Error handling: return typed errors, no silent catches

## Testing rules
- Every bug fix: add regression test
- Every feature: unit test + one integration test
- Prefer deterministic tests: freeze time, avoid network

## Change boundaries
- Do not modify: /infra/terraform without explicit approval
- Do not touch auth flows unless task explicitly says so

## PR format
- Summary: what changed + why
- Risk: edge cases + rollback plan
- Verification: commands run + results

## Examples
### Good commit message
feat(api): add idempotency key to payment webhook

### Good test name
test_webhook_rejects_duplicate_idempotency_key
```
A strong CLAUDE.md cuts down on style drift and stops the agent from guessing (which is where a lot of weirdness starts). Keep it short, enforceable, and packed with examples the agent can copy.
Store it in version control and review changes to it like code because, well, it's basically policy. For a deeper structure guide, see Dometrain's breakdown: Creating the Perfect CLAUDE.md for Claude Code.
[!IMPORTANT]
Treat CLAUDE.md as a contract: if the repo violates it, fix the repo or update the contract. Don't let it become aspirational.
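One way to keep the contract from drifting is to check it in CI like any other policy file. The sketch below is a minimal, hypothetical check: the `REQUIRED_SECTIONS` list and the `missing_sections` helper are assumptions for illustration, using section names from the example contract above, and are not part of Claude Code itself.

```python
import re

# Hypothetical list of sections a team might require; adjust to your own contract.
REQUIRED_SECTIONS = ["Mission", "Repo map", "Non-negotiables", "Testing rules", "PR format"]


def missing_sections(contract_text: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return the required section names that have no matching '## ' heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##[ \t]+(.+)$", contract_text, flags=re.MULTILINE)
    }
    # Preserve the order of `required` so the report is stable between runs.
    return [name for name in required if name.lower() not in headings]
```

A CI step could read `CLAUDE.md`, call `missing_sections`, and fail the build when the returned list is non-empty.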
```text
You are Claude Code working in this repo.

Task: [ONE SENTENCE GOAL]

Constraints:
- Max PR size: [X] files or [Y] lines changed
- No new dependencies
- Must include tests

Process:
1) Ask up to 5 clarifying questions if anything is ambiguous.
2) Propose a plan with 5-10 steps. Each step must name files to touch.
3) Execute step-by-step. After each step:
   - summarize changes
   - list commands to run
4) Stop before final commit and provide:
   - risk list
   - rollback plan
   - verification evidence
```
This beats one-shot prompting because it forces assumptions into the open before code changes. And it creates natural checkpoints where a human can say, "Wait, no, that's not what we meant" before the repo gets churned.
The "small steps" rule matters most when Claude Code is doing multi-file edits and refactors. For a practical walkthrough of this loop and how Plan Mode fits, reference: Claude Code Tutorial for Beginners.
[!TIP] Put "Max PR size" into the prompt. It prevents the most common failure mode: large, opaque rewrites that are painful to review.
```text
Enter Plan Mode.

Goal: Add rate limiting to /apps/api endpoints using existing middleware patterns.

Rules:
- Read relevant files before editing.
- Identify current request lifecycle and where middleware is applied.
- Propose 2 options:
  Option A: minimal change using existing components
  Option B: cleaner refactor if needed
- For each option: list files, risk, and test approach.

Stop after the plan and wait for approval.
```
Plan Mode is usually the cheapest way to avoid expensive backtracking. It nudges Claude Code to actually look for existing patterns, read the right modules, and offer options before it starts editing.
Use it for anything architectural, anything security-related, or anything that could realistically break CI. One pattern that works well in practice is plan approval gates: require a human "approve option A/B" before any edits. That one pause removes a lot of assumption-driven regressions.
```text
Produce a PR-sized change.

Output format:
1) Files changed (with reason per file)
2) Diffstat estimate (rough lines added/removed)
3) Actual patch in unified diff format
4) PR description:
   - What changed
   - Why
   - Risks and edge cases
   - How verified (commands + results)

Constraints:
- Touch at most [N] files
- No file over [M] lines changed unless approved
- Keep public interfaces stable
```
Claude Code tends to do better when you define what "done" looks like in review terms. Asking for "files changed" and a "diffstat estimate" is a nice control trick: it forces the agent to think about scope before it writes.
Asking for a PR description helps later too, especially if multiple sessions (or multiple people) touched the same area. And honestly, the teams I've seen get real throughput gains usually get them from strict loops plus reviewable diffs, not from letting the agent go bigger and more autonomous.
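The scope limits in the prompt can also be checked mechanically after the fact. This is a rough sketch, assuming you feed it the text output of `git diff --numstat` (tab-separated added, removed, path per line); the helper names `parse_numstat` and `scope_violations` and the limits passed in are hypothetical.

```python
def parse_numstat(numstat: str) -> dict[str, int]:
    """Map each path to lines added + removed, from `git diff --numstat` output."""
    changes = {}
    for line in numstat.strip().splitlines():
        added, removed, path = line.split("\t", 2)
        # Binary files are reported as "-" for both counts; count them as 0 lines.
        total = (0 if added == "-" else int(added)) + (0 if removed == "-" else int(removed))
        changes[path] = total
    return changes


def scope_violations(changes: dict[str, int], max_files: int, max_lines_per_file: int) -> list[str]:
    """Return human-readable problems; an empty list means the change is in scope."""
    problems = []
    if len(changes) > max_files:
        problems.append(f"{len(changes)} files changed (limit {max_files})")
    for path, lines in sorted(changes.items()):
        if lines > max_lines_per_file:
            problems.append(f"{path}: {lines} lines changed (limit {max_lines_per_file})")
    return problems
```

Comparing this result with the agent's own "diffstat estimate" is a cheap way to spot a session that drifted past the scope it promised.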
Source discussion on operationalizing workflows and verification: DEV Community MCP and 2026 workflow shift.
```bash
#!/usr/bin/env bash
## Minimal "agent-safe" local verification script
## Run this before you accept Claude's "done"
set -euo pipefail

# Example for a mixed TS + Python repo
npm run lint
npm run typecheck
npm test
ruff check .
pytest -q
```
The most useful Claude Code practice in 2026 is kind of boring: don't trust outputs without automated checks. Make verification a script so it runs the same way every time (no "works on my machine" vibes).
If the agent can't run commands in your environment, make it list the exact commands and expected outputs. One simple rule catches a lot: "No tests updated" is a quality warning for non-trivial changes. If a feature or bug fix doesn't touch tests, force a justification.
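The "no tests updated" rule is easy to automate as a warning. Below is a small sketch under stated assumptions: the test-file naming patterns in `is_test_path` and the list of "trivial" suffixes are guesses at common conventions, not something the tools enforce, so adapt them to your repo.

```python
def is_test_path(path: str) -> bool:
    """Heuristic: treat files under a tests/ directory or with test-style names as tests."""
    name = path.rsplit("/", 1)[-1]
    return (
        "/tests/" in f"/{path}"
        or name.startswith("test_")
        or name.endswith(("_test.py", ".test.ts", ".test.tsx", ".spec.ts"))
    )


def needs_test_justification(changed_paths, trivial_suffixes=(".md", ".txt")) -> bool:
    """True when code changed but no test file did, which should trigger a justification."""
    code = [p for p in changed_paths if not p.endswith(trivial_suffixes) and not is_test_path(p)]
    tests = [p for p in changed_paths if is_test_path(p)]
    return bool(code) and not tests
```

This only flags the change for a human; it deliberately does not try to judge whether the change was "non-trivial".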
[!WARNING] If Claude Code proposes code that "looks right" but cannot name the exact test file it would add or update, stop and re-scope. That's a common sign it didn't fully map the code paths.
```text
Before marking complete, run a verification checklist:
- Unit tests: which files changed and why
- Integration tests: which scenario proves the behavior end-to-end
- Static checks: lint, typecheck
- Security checks: dependency scan or SAST if available
- Observability: logs/metrics impacted and example log line

Return results as a table with:
Check | Command | Pass/Fail | Notes
```
This forces evidence-based completion. And it gives you artifacts you can paste straight into a PR (which reviewers will thank you for).
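If you would rather generate that evidence table than ask the agent to type it, a short runner can do it. This is a sketch, not a prescribed tool: `run_checks` and `to_markdown` are hypothetical helpers, and the only thing recorded in the Notes column here is the exit code.

```python
import subprocess


def run_checks(checks):
    """Run each (name, argv) pair and collect one evidence row per check."""
    rows = []
    for name, argv in checks:
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            status = "Pass" if result.returncode == 0 else "Fail"
            note = f"exit code {result.returncode}"
        except FileNotFoundError:
            # The tool is not installed in this environment; record that as a failure.
            status, note = "Fail", "command not found"
        rows.append((name, " ".join(argv), status, note))
    return rows


def to_markdown(rows):
    """Render rows as the Check | Command | Pass/Fail | Notes table."""
    lines = ["| Check | Command | Pass/Fail | Notes |", "|---|---|---|---|"]
    lines += [f"| {name} | `{cmd}` | {status} | {note} |" for name, cmd, status, note in rows]
    return "\n".join(lines)
```

For example, `to_markdown(run_checks([("Lint", ["ruff", "check", "."]), ("Unit tests", ["pytest", "-q"])]))` yields a table you can paste into the PR description.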
```bash
## Safe iteration loop for AI-driven changes
git checkout -b ai/[ticket-id]-[short-name]

# After each completed micro-step
git status
git add -A
git commit -m "wip: [step] [what changed]"

# If Claude goes off-track
# (this discards the last checkpoint commit AND any uncommitted changes)
git reset --hard HEAD~1
```
Frequent commits are a control surface, not a virtue signal. They give you rollback points, make review easier, and let you compare alternate approaches with git range-diff.
In agentic workflows, this is how teams keep speed without sacrificing safety. A useful variant is checkpoint commits: commit after adding tests, after refactor, after feature wiring. That way, when CI breaks, you can usually pinpoint the exact stage where it went sideways.
```text
Split this work into parallel tracks and keep outputs mergeable.

Track A (Implementation):
- Add feature behind flag [FLAG_NAME]
- Keep changes minimal

Track B (Tests):
- Add unit tests + one integration test
- Focus on edge cases

Track C (Docs):
- Update README or /docs with new behavior
- Include migration notes

Track D (Threat model):
- Identify abuse cases and data exposure risks
- Propose mitigations and a security checklist

For each track:
- list files to touch
- list acceptance criteria
- stop after a proposed patch plan
```
Parallelization works when each track has clear boundaries and merge points. The trick is to converge through a human-reviewed integration step, not to let multiple agents edit the same files and hope Git sorts it out.
For example, tests can usually be built in parallel as long as the feature surface is stable. This is also where Claude Code becomes "system design" in a very real sense: you're designing work streams and guardrails, not just prompts.
```text
MCP usage policy (copy into internal docs)

Allowed:
- Read-only access to documentation sources
- Read-only access to ticketing issues for requirements
- Read-only access to feature flag state and config

Restricted:
- No write access to production systems
- No secret retrieval
- No customer data access

Approval required:
- Any action that changes infrastructure, IAM, or billing
- Any write operation outside a sandbox environment
```
MCP servers (Model Context Protocol) change the workflow because Claude Code can pull standardized context from docs, tickets, APIs, and feature flags without you pasting everything. That's powerful, but it also widens the blast radius if access isn't tight.
Treat MCP like an internal integration: least privilege, explicit allowlists, and approval gates. The practical win is fewer glue scripts and fewer stale requirements. The practical risk is the agent acting on the wrong environment if identities and scopes are loose (I've seen that happen, and it's never fun).
Source: MCPs and the 2026 workflow shift.
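To make the usage policy enforceable rather than advisory, the rules can be encoded as a default-deny decision function that sits in front of whatever dispatches MCP calls. The sketch below is purely illustrative: the resource names, environment names, and the `decide` function are assumptions made for this example and are not part of the MCP specification or any MCP server.

```python
# Hypothetical resource categories mirroring the policy text above.
READ_ONLY_RESOURCES = {"docs", "tickets", "feature_flags"}
DENIED_RESOURCES = {"secrets", "customer_data"}
APPROVAL_RESOURCES = {"infrastructure", "iam", "billing"}


def decide(resource: str, action: str, environment: str) -> str:
    """Return 'allow', 'deny', or 'needs_approval' for one requested action."""
    if resource in DENIED_RESOURCES:
        return "deny"
    if action == "read":
        # Default deny: only explicitly allowlisted sources are readable.
        return "allow" if resource in READ_ONLY_RESOURCES else "deny"
    # Everything below is a write or other state-changing action.
    if environment == "production":
        return "deny"
    if resource in APPROVAL_RESOURCES or environment != "sandbox":
        return "needs_approval"
    return "allow"
```

Under these assumptions, `decide("tickets", "write", "dev")` returns `"needs_approval"`, while the same write in `"sandbox"` is allowed.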
```yaml
# Example policy model you can adapt (conceptual)
agent:
  name: claude-code
  environment: dev
permissions:
  repo:
    read: true
    write: true
  ci:
    read: true
    trigger: true
  cloud:
    read: true
    write: false
secrets:
  source: vault
  allowed_paths:
    - "kv/dev/*"
  rotation_days: 30
approvals:
  required_for:
    - "infra/*"
    - "iam/*"
    - "prod/*"
```
The standard approach is "agent identity equals service account": scoped tokens, short TTLs, and clear audit trails. Default to read-only where you can, especially for cloud and ticketing.
Require human approval for anything that can impact production, costs, or data exposure. This matters more in 2026 because agents are simply more capable at multi-step actions. One over-permissioned integration can turn a helpful coding session into an incident.
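The approval gate itself can be a few lines of glue in CI. As a sketch, assuming the `required_for` patterns from the conceptual policy above and a list of changed paths from your diff, a hypothetical `paths_needing_approval` helper could look like this:

```python
from fnmatch import fnmatchcase

# Patterns copied from the conceptual policy above; treat them as placeholders.
APPROVAL_PATTERNS = ["infra/*", "iam/*", "prod/*"]


def paths_needing_approval(changed_paths, patterns=APPROVAL_PATTERNS):
    """Return the changed paths that match any approval-required pattern."""
    # fnmatchcase is case-sensitive on every platform, and "*" also matches "/",
    # so "infra/*" covers nested files such as infra/terraform/main.tf.
    return [p for p in changed_paths if any(fnmatchcase(p, pat) for pat in patterns)]
```

If the returned list is non-empty, the pipeline can block the merge until a human signs off.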
```text
Decision rule for /model switching

Use a faster/cheaper model when:
- formatting, renames, comment cleanup
- writing tests from a fixed spec
- updating docs based on known behavior

Use a stronger model when:
- cross-cutting refactors
- concurrency, caching, distributed systems changes
- auth, crypto, payments, PII handling
- debugging flaky tests with multiple hypotheses

Always require Plan Mode when:
- touching > 3 files
- changing public APIs
- changing data models or migrations
```
Model switching is a budget and risk control. Save expensive reasoning for work where it prevents real defects. Use cheaper models for mechanical work where verification will catch mistakes anyway.
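The decision rule reads naturally as a small routing function. This is a sketch only: the `Task` shape, the tag vocabulary in `HIGH_RISK_TAGS`, and the tier labels are hypothetical, and nothing here calls Claude Code; it just turns the rule into something you could unit test.

```python
from dataclasses import dataclass

# Hypothetical tags standing in for the "stronger model" categories in the rule above.
HIGH_RISK_TAGS = {
    "refactor", "concurrency", "caching", "distributed",
    "auth", "crypto", "payments", "pii", "flaky-tests",
}


@dataclass
class Task:
    files_touched: int
    tags: frozenset = frozenset()
    changes_public_api: bool = False
    changes_data_model: bool = False


def route(task: Task) -> tuple[str, bool]:
    """Return (model tier, whether Plan Mode is required) for a task."""
    tier = "stronger" if task.tags & HIGH_RISK_TAGS else "faster"
    plan_mode = task.files_touched > 3 or task.changes_public_api or task.changes_data_model
    return tier, plan_mode
```

For instance, `route(Task(files_touched=5, tags=frozenset({"payments"})))` returns `("stronger", True)`.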
For feature demos and control modes, see: Claude Code ULTIMATE Beginner Guide (2026) and the longer reference guide: Claude Code Complete Guide 2026.
| Capability | What to standardize | Why it matters | Common alternative |
|---|---|---|---|
| Persistent context | CLAUDE.md contract | Reduces drift and wrong assumptions | Ad hoc prompts in tickets |
| Structured planning | Plan Mode + approval gate | Prevents expensive wrong turns | One-shot "write code" requests |
| Verification | One script that runs lint, types, tests | Turns "done" into evidence | Manual spot checks |
| Parallel work | Tracks (impl, tests, docs, threat model) | Faster without merge chaos | One agent edits everything |
| Controlled context | MCP allowlists + least privilege | Prevents overreach and data risk | Copy-paste docs into chat |
| Change containment | PR-size limits + commit checkpoints | Makes rollback and review easy | Large rewrites |
This table is the operational baseline. If a team standardizes only two things, start with CLAUDE.md plus verification scripts. Everything else builds on those controls.
Stripe achieved 99.99% API availability by pairing strict change management with automated testing and staged rollouts. Claude Code fits best when it feeds those same gates.
Netflix achieved faster recovery from failures using automated canary analysis and progressive delivery. Agent-written changes still need the same verification and rollback paths.
Shopify handled peak traffic events by investing in observability and safe deploy practices. Claude Code changes should include log and metric impacts, not only code diffs.
These aren't "Claude-specific" wins. They're the same engineering controls that keep agentic coding safe at scale.
Start here (your first step): Create CLAUDE.md in the repo root and add 12 bullets: repo map, non-negotiables, testing rules, PR format.
Quick wins (immediate impact):
- Add a ./verify.sh script and require it in every Claude Code session before "done".

Deep dive (for those who want more):
Claude Code best practices in 2026 look a lot like platform ops: a versioned CLAUDE.md contract, a strict plan-to-PR workflow, and verification that produces evidence. Keep diffs small, commit in checkpoints, and scale with parallel tracks that merge through review.
As MCP integrations expand, least privilege and approval gates are probably the difference between faster delivery and a very avoidable incident.