Open-source MCP server · MIT License · by Mathew Graham

AI wrote the code. Does it actually match the design?

AI coding agents ship UI fast — but fast isn't correct. Wrong tokens, missing states, off-by-4px padding. DesignDiff verifies that AI-generated interfaces match your design system before they reach production.

Token drift detected — when agents hardcode values instead of referencing design tokens

Missing states flagged — hover, focus, disabled, error variants compared against Figma

Patch-ready diffs returned — for your agent to apply and re-check in the same session

Figma — design spec

Button / Primary · Default

Get started

16px padding ✓
token reference ✓

States defined:

hover · focus · disabled

Property | Figma value | Status
padding | 16px 24px | correct
background | token(brand-blue) | token ✓
border-radius | 4px | correct
hover:bg | token(brand-blue-700) | defined
focus:ring | 2px offset-2 | defined

Code output · Parity 52/100

src/components/Button.tsx · rendered

Get started

12px padding ✗
hardcoded #2563EB ✗

States — all missing:

no hover · no focus · no disabled

Property | Code (computed) | Issue
padding | 12px 24px | off by 4px
background | #2563EB | hardcoded
border-radius | 4px | ✓
hover:bg | missing | no feedback
focus:ring | missing | WCAG 2.4.7

52 / 100 — below threshold

Patch-ready fix generated — agent applies and re-checks

💡 Pattern: AI-generated code signature — correct structure, hardcoded colors, missing states. Agent generated from Figma without access to the token file.

Button.tsx · Patch-ready

- padding: 12px 24px;

+ padding: 16px 24px;

- background: #2563EB;

+ background: var(--brand-blue-600);

✓ Agent re-checks: 94/100 · parity restored

Why it exists

AI ships UI fast.
Fast isn't correct.

What goes wrong when AI builds UI

When you ask Claude Code or Cursor to build a component from Figma, the agent reads the spec and writes code that looks correct. But it typically hardcodes a color value (#2563EB) instead of referencing the token (var(--brand-blue)), gets padding off by a few pixels, and skips interactive states in Figma variants it didn't fully parse.

The component looks fine visually. That's what makes this dangerous. Errors only surface when a brand refresh doesn't propagate, a keyboard user can't navigate, or an accessibility audit fails.

The same drift happens when developers implement manually, when designs change after code ships, and when tokens get renamed. It compounds silently across every PR.

What DesignDiff does about it

DesignDiff is an MCP server — a verification tool that runs inside your AI agent. After the agent generates a component, it calls DesignDiff to check the output before the code lands in a PR.

Reads Figma precisely — extracts exact pixel values, token references, and every variant state. Not a screenshot — raw design data including every Figma variable.

Renders in a real browser — Playwright captures computed CSS after the full cascade. Source code is intent. Computed styles are what users actually see.

Scores, explains, and suggests fixes — every mismatch gets a consequence, a severity, and a patch-ready diff the agent can apply in the same response.

When the gap happens

🤖

AI builds a component from Figma

The most common case. Agent generates code that looks right but has token, spacing, or state errors. DesignDiff catches them before the PR merges.

✏️

Developer implements manually

Developer eyeballs spacing, guesses token names, skips a Figma variant. Subtle drift accumulates across every component they touch.

🎨

Design changes after code ships

Designer updates spacing in Figma. Nobody tells the developer. Component in production quietly diverges. Nobody notices for months.

🔄

Tokens get renamed

The design system renames brand-blue to color-primary. Components still using the old name fall back to hardcoded values, which hold up only until the next brand refresh.

Quality Score — where this is heading

One score.
Safe to ship.

Today DesignDiff scores rendered parity. The goal is a single score across five verification dimensions — one number CI gates, managers, and developers all understand without translation.

Button/Primary · rendered parity check · v0.1

94/100

Rendered Parity Score

Design Fidelity · ✓ PASS

Token Compliance · ✓ PASS

Spacing · ✓ PASS

Typography · ✓ PASS

Borders · ✓ PASS

State Coverage · v0.2

Accessibility · v0.2

Responsive Layout · v0.2

hover:bg state missing — no interaction feedback on desktop pointer. Fix: &:hover { background: var(--brand-blue-700) }

What v0.1 scores today

Design Fidelity — spacing, color, and typography match the Figma spec within configurable tolerances

Token Compliance — no hardcoded values bypassing your design system

Spacing — padding, margin, and gap values match the spec

Typography — font size, weight, line-height

Borders — radius, width, color
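The Token Compliance check above can be illustrated with a naive scan for hex color literals. This is a rough sketch under stated assumptions, not DesignDiff's detection logic: the function name `findHardcodedColors` is hypothetical, and the scan works line by line on plain CSS declarations.

```typescript
// Matches 3- to 8-digit hex color literals such as #fff or #2563EB.
const HEX_COLOR = /#[0-9a-fA-F]{3,8}\b/g;

// Hypothetical helper: returns hex colors that appear in lines which do not
// reference a custom property. Deliberately naive: it would also flag
// ID selectors made only of hex digits, and it skips var() fallbacks.
function findHardcodedColors(css: string): string[] {
  const found: string[] = [];
  for (const line of css.split("\n")) {
    // Lines that already go through var(...) are treated as token-compliant.
    if (line.indexOf("var(") !== -1) continue;
    const matches = line.match(HEX_COLOR);
    if (matches) {
      for (const m of matches) found.push(m);
    }
  }
  return found;
}

const sampleCss = [
  "padding: 16px 24px;",
  "background: #2563EB;",
  "color: var(--text-primary);",
].join("\n");

console.log(findHardcodedColors(sampleCss));
```

A production check would more likely work on computed styles and the resolved token table rather than on source text, since the same color can be written in several notations.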

CI gate — where it's going

✓ Button/Primary: 94/100

✓ Input/Text: 98/100

✓ Modal/Confirm: 91/100

✕ Card/Product: 61/100

PR blocked when any component drops below threshold
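The gate itself reduces to a threshold comparison over per-component scores. A minimal sketch, assuming a threshold of 80 (the page does not state a default) and a hypothetical helper named `failingComponents`:

```typescript
// Placeholder threshold; an assumption for illustration only.
const THRESHOLD = 80;

// Hypothetical helper: returns the names of components scoring below the threshold.
function failingComponents(
  scores: Record<string, number>,
  threshold: number
): string[] {
  return Object.keys(scores).filter((name) => scores[name] < threshold);
}

const scores: Record<string, number> = {
  "Button/Primary": 94,
  "Input/Text": 98,
  "Modal/Confirm": 91,
  "Card/Product": 61,
};

const failing = failingComponents(scores, THRESHOLD);
console.log(failing); // ["Card/Product"]
```

In a CI script, a non-empty result would typically be turned into a non-zero exit code so the pipeline blocks the PR.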

How it works

Verify. Score.
Patch. Ship.

Runs inside your existing AI agent. No new dashboard, no browser extension. Human review focuses on product decisions, not token drift.

01 — Fetch

Pull design intent from Figma

Calls the Figma REST API with your node_id. Extracts exact pixel values, token references, and every interactive state variant. Node IDs are URL-encoded before the request (123:456 becomes 123%3A456); agents often skip this step, which causes silent 404s.

Figma REST API

GET /v1/files/{id}/nodes?ids=123%3A456

✓ component "Button / Primary"

✓ padding 16px 24px

✓ background token(brand-blue-600)

✓ states hover · focus · disabled

✓ tokens 3 variables resolved
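The node ID encoding in the Fetch step can be sketched as follows. This is an illustration rather than DesignDiff's own code: the helper name `buildNodesUrl` is hypothetical, and the base URL assumes Figma's public REST endpoint.

```typescript
// Assumed base URL of the public Figma REST API.
const FIGMA_API_BASE = "https://api.figma.com/v1";

// Hypothetical helper: builds the "file nodes" request URL.
// The IDs are joined first and encoded once, so ":" becomes "%3A".
function buildNodesUrl(fileKey: string, nodeIds: string[]): string {
  const ids = encodeURIComponent(nodeIds.join(","));
  return FIGMA_API_BASE + "/files/" + encodeURIComponent(fileKey) + "/nodes?ids=" + ids;
}

console.log(buildNodesUrl("abc123", ["123:456"]));
// https://api.figma.com/v1/files/abc123/nodes?ids=123%3A456
```

A real request would also need to send the Figma access token in a request header alongside this URL.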

02 — Render

Capture what the browser actually shows

Launches a headless Playwright browser and renders the component at its real URL — Storybook, localhost, or staging. Captures computed CSS after the full cascade. Source code is intent. Computed styles are truth.

Property | Figma says | Browser shows
padding | 16px 24px | 12px 24px
background | token(brand-blue) | #2563EB
border-radius | 4px | 4px
hover state | defined | missing
focus ring | 2px offset-2 | missing
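The comparison behind a table like this can be sketched as a property-by-property diff. The types and the function name `diffStyles` are assumptions made for illustration; they are not DesignDiff's schema, and the sketch compares raw strings instead of resolving tokens to values.

```typescript
// Illustrative types; field names are assumptions.
type StyleMap = Record<string, string>;

interface Mismatch {
  property: string;
  expected: string;
  actual: string; // "missing" when the rendered component has no value
}

// Hypothetical helper: lists every spec property whose rendered value differs.
function diffStyles(spec: StyleMap, computed: StyleMap): Mismatch[] {
  const mismatches: Mismatch[] = [];
  for (const property of Object.keys(spec)) {
    const expected = spec[property];
    const actual = computed[property] ?? "missing";
    if (actual !== expected) {
      mismatches.push({ property, expected, actual });
    }
  }
  return mismatches;
}

const spec: StyleMap = {
  padding: "16px 24px",
  background: "token(brand-blue)",
  "border-radius": "4px",
  "focus ring": "2px offset-2",
};
const computed: StyleMap = {
  padding: "12px 24px",
  background: "#2563EB",
  "border-radius": "4px",
};

console.log(diffStyles(spec, computed).map((m) => m.property));
// ["padding", "background", "focus ring"]
```

Exact string comparison keeps the sketch short. A real check would need tolerances for numeric values and a token lookup, so that a hardcoded color equal to the token's value still counts as a token violation.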

03 — Score

Consequence-first parity scoring

Token violations carry the largest penalty because they bypass your design system and break on every brand refresh. Missing focus rings are a WCAG 2.4.7 risk. Every mismatch includes the real consequence of not fixing it.

52 / 100

Below threshold — patch-ready fix generated

Color tokens · FAIL (hardcoded)

States · FAIL (3 missing)

Spacing · WARN

Typography · PASS

Borders · PASS
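One way to picture consequence-weighted scoring is a fixed penalty per violation category, subtracted from 100. The weights below are invented for illustration and do not reproduce DesignDiff's scores; the function name `parityScore` is likewise hypothetical.

```typescript
type Category = "token" | "state" | "spacing" | "typography" | "border";

// Made-up penalties: token violations cost the most, as described above.
const PENALTY: Record<Category, number> = {
  token: 25,
  state: 10,
  spacing: 5,
  typography: 5,
  border: 2,
};

// Hypothetical scoring: start at 100, subtract each penalty, clamp at 0.
function parityScore(violations: Category[]): number {
  const total = violations.reduce((sum, c) => sum + PENALTY[c], 0);
  return Math.max(0, 100 - total);
}

// One hardcoded color, three missing states, one spacing error:
// 100 - (25 + 3 * 10 + 5) = 40
console.log(parityScore(["token", "state", "state", "state", "spacing"]));
```

A flat subtraction is the simplest possible model. Real scoring could weight by component usage or severity, which this sketch does not attempt.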

04 — Fix

Patch-ready fix in the same response

When score is below threshold, DesignDiff generates a patch-ready diff and returns it immediately — no second tool call. Your agent applies it, then re-verifies to confirm the score improved.

generate_sync_patch → agent applies → re-check

- padding: 12px 24px;

+ padding: 16px 24px;

- background: #2563EB;

+ background: var(--brand-blue-600);

✓ Re-check: 94/100 · parity restored
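Turning mismatches into the minus/plus lines shown above can be sketched like this. The `CssFix` shape and the `toPatchLines` helper are hypothetical; the actual output format of generate_sync_patch is not specified on this page.

```typescript
// Hypothetical shape for one CSS correction.
interface CssFix {
  property: string;
  actual: string;
  expected: string;
}

// Emits one removed line and one added line per fix, in diff style.
function toPatchLines(fixes: CssFix[]): string[] {
  const lines: string[] = [];
  for (const fix of fixes) {
    lines.push("- " + fix.property + ": " + fix.actual + ";");
    lines.push("+ " + fix.property + ": " + fix.expected + ";");
  }
  return lines;
}

const patch = toPatchLines([
  { property: "padding", actual: "12px 24px", expected: "16px 24px" },
  { property: "background", actual: "#2563EB", expected: "var(--brand-blue-600)" },
]);

console.log(patch.join("\n"));
```

After applying such a patch, the agent would call the parity check again and compare the new score against the threshold.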

Installation

Running in five minutes.

One config entry. Your agent verifies every component it generates or modifies. Works best with Storybook, local preview routes, or any stable component URL your agent can reach.

~/.config/claude/mcp.json — Claude Code

{
  "mcpServers": {
    "designdiff": {
      "command": "npx",
      "args": ["-y", "designdiff-mcp"],
      "env": { "FIGMA_API_KEY": "your-token" }
    }
  }
}

Cursor · Windsurf · VS Code Copilot

Same config format. Works with any MCP-compatible agent.

Say to your agent:

// Agent calls DesignDiff automatically

"Build the Button from Figma file abc123, node 123:456, at localhost:6006 — and verify it matches the spec"

→ Builds · Verifies · Suggests fix · Agent re-checks: 94/100

Claude Code

Cursor

Windsurf

VS Code Copilot

Zed

Any MCP client

MCP Tools

Parity today.
Full verification tomorrow.

The core engine ships now. Each new tool adds another dimension to the quality score — another check AI-generated code passes before it ships.

Core · ships now

check_component_parity

(node_id, component_url, code_path)

Diffs Figma spec against computed CSS. Scores spacing, tokens, typography, borders, states. Returns a patch-ready fix when score drops below threshold.

Core · ships now

flag_stale_mappings

(file_id, repo_root)

Cross-references Code Connect mappings against git history. Surfaces components that have silently drifted, ranked by blast radius and days since divergence.

Core · ships now

generate_sync_patch

(node_id, code_path, format?)

Generates a surgical patch — not a rewrite. Token substitutions, prop corrections, missing state stubs. Git diff, JSX, or CSS delta output.

v0.2 · coming Q3 2026

audit_state_coverage

(node_id, component_url)

Checks whether every Figma-defined state exists in code. Flags WCAG 2.4.7 risk for missing focus rings and disabled states.

v0.2 · coming Q3 2026

check_responsive_parity

(node_id, component_url, breakpoints?)

Renders at 375/768/1280/1920px. Catches overflow, collapsed containers, wrong flex direction. The most common mobile implementation gap.

v0.2 · coming Q3 2026

check_theme_parity

(node_id, component_url, themes?)

Parity across Figma variable modes: light/dark and brand variants. Catches hardcoded colors that look correct in light mode but break in dark mode.
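For orientation, an MCP client invokes a tool such as check_component_parity with a JSON-RPC `tools/call` request. The sketch below builds that envelope. The argument names come from the signature listed above, while the value formats (raw node ID, Storybook URL, relative path) are assumptions.

```typescript
// Argument names follow the listed signature; value formats are assumed.
interface ParityArgs {
  node_id: string;
  component_url: string;
  code_path: string;
}

// Builds the JSON-RPC envelope an MCP client sends for a tool call.
function buildToolCall(id: number, args: ParityArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "check_component_parity", arguments: args },
  };
}

const request = buildToolCall(1, {
  node_id: "123:456",
  component_url: "http://localhost:6006",
  code_path: "src/components/Button.tsx",
});

console.log(JSON.stringify(request, null, 2));
```

In practice the agent's MCP client constructs and sends this message itself, so nobody writes it by hand; the sketch only shows what travels over the wire.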

Roadmap

What works today.
Where it's going.

Now — v0.1

Rendered parity check

Point DesignDiff at a component URL and a Figma node. Get a scored diff of spacing, color tokens, typography, and borders — plus a patch-ready fix for everything that's off.

check_component_parity · flag_stale_mappings · generate_sync_patch · React/CSS · MIT License

Next — v0.2

State, theme, and responsive coverage

Verify that every Figma-defined state exists in code. Check rendering across breakpoints and light/dark themes.

audit_state_coverage · check_responsive_parity · check_theme_parity · Storybook adapter
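State coverage amounts to a set difference between the states defined in Figma and the states found in code. A rough sketch of that idea, with hypothetical names (`StateGap`, `auditStates`) and no claim about how the v0.2 tool will be implemented:

```typescript
// Hypothetical result shape: a missing state plus an optional WCAG criterion.
interface StateGap {
  state: string;
  wcag?: string;
}

// Reports every design state that has no counterpart in code.
// A missing focus state is tagged with WCAG 2.4.7 (Focus Visible).
function auditStates(designStates: string[], codeStates: string[]): StateGap[] {
  const gaps: StateGap[] = [];
  for (const state of designStates) {
    if (codeStates.indexOf(state) !== -1) continue;
    gaps.push(state === "focus" ? { state, wcag: "2.4.7" } : { state });
  }
  return gaps;
}

const gaps = auditStates(["hover", "focus", "disabled"], ["hover"]);
console.log(gaps);
```

How the code-side state list is collected (for example by triggering pseudo-classes in a browser) is the hard part and is left out here.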

Later — v1.0

Team quality gates

A unified Implementation Quality Score across all dimensions. Block PRs below threshold in CI. Track score history over time. The verification layer between AI-generated UI and production.

DesignDiff verifies that AI-generated UI matches your design system before it ships.
