I tried out Spec Kit, which GitHub has just released, and summarized my impressions. This writeup assumes it is used together with Claude Code.
Spec Kit is a specification-driven development tool published by GitHub. It is comparable to AWS's Kiro, but instead of a dedicated editor it combines a CLI with a coding agent to go from spec generation all the way to code generation.
It works in three phases: spec generation, implementation planning, and task breakdown.
Installation
Install and run it with uv, the Python package manager:
uvx --from git+https://github.com/github/spec-kit.git specify init
An interactive setup starts, letting you choose among Claude Code, GitHub Copilot, and Gemini CLI.
To skip the interactive setup, specify the agent directly:
uvx --from git+https://github.com/github/spec-kit.git specify init --ai claude
Once initialization completes, a directory named after the project is created with the following files inside:
hoge
├── memory
│   ├── constitution_update_checklist.md
│   └── constitution.md
├── scripts
│   ├── check-task-prerequisites.sh
│   ├── common.sh
│   ├── create-new-feature.sh
│   ├── get-feature-paths.sh
│   ├── setup-plan.sh
│   └── update-agent-context.sh
└── templates
    ├── agent-file-template.md
    ├── plan-template.md
    ├── spec-template.md
    └── tasks-template.md
Generating the spec
The workflow from spec to task list uses the following three commands.
1. /specify
The first command, /specify, converts a vague natural-language request ("I want to build an app like this") into a structured specification. Even a terse request (e.g. "I want a simple ToDo app") yields functional requirements, user stories, edge cases, and so on, and ambiguous points are flagged with [NEEDS CLARIFICATION] markers. Requirements the tool cannot decide on its own, such as "Is task editing needed?" or "Should it work offline?", are made explicit so you can pin them down later. The more detailed your request, the clearer and more implementable the resulting spec, so writing as concretely as possible is recommended.
Running

/specify "簡単なToDoアプリを作りたい"

executes scripts/create-new-feature.sh --json "簡単なToDoアプリを作りたい", which generates the spec in the specs directory.
├── memory
│   ├── constitution_update_checklist.md
│   └── constitution.md
├── scripts
│   ├── check-task-prerequisites.sh
│   ├── common.sh
│   ├── create-new-feature.sh
│   ├── get-feature-paths.sh
│   ├── setup-plan.sh
│   └── update-agent-context.sh
├── specs
│   └── 001-todo
│       └── spec.md
A git branch such as 001-todo is also created at the same time.
The spec includes an execution flow, guidelines, user scenarios, and more. Below is an example of the generated spec.md.
spec.md
Feature Branch: 001-todo
Created: 2025-09-03
Status: Draft
Input: User description: “簡単なToDoアプリを作りたい”
Execution Flow (main)
1. Parse user description from Input
→ If empty: ERROR "No feature description provided"
2. Extract key concepts from description
→ Identify: actors, actions, data, constraints
3. For each unclear aspect:
→ Mark with [NEEDS CLARIFICATION: specific question]
4. Fill User Scenarios & Testing section
→ If no clear user flow: ERROR "Cannot determine user scenarios"
5. Generate Functional Requirements
→ Each requirement must be testable
→ Mark ambiguous requirements
6. Identify Key Entities (if data involved)
7. Run Review Checklist
→ If any [NEEDS CLARIFICATION]: WARN "Spec has uncertainties"
→ If implementation details found: ERROR "Remove tech details"
8. Return: SUCCESS (spec ready for planning)
⚡ Quick Guidelines
- ✅ Focus on WHAT users need and WHY
- ❌ Avoid HOW to implement (no tech stack, APIs, code structure)
- 👥 Written for business stakeholders, not developers
Section Requirements
- Mandatory sections: Must be completed for every feature
- Optional sections: Include only when relevant to the feature
- When a section doesn’t apply, remove it entirely (don’t leave as “N/A”)
For AI Generation
When creating this spec from a user prompt:
- Mark all ambiguities: Use [NEEDS CLARIFICATION: specific question] for any assumption you’d need to make
- Don’t guess: If the prompt doesn’t specify something (e.g., “login system” without auth method), mark it
- Think like a tester: Every vague requirement should fail the “testable and unambiguous” checklist item
Common underspecified areas:
- User types and permissions
- Data retention/deletion policies
- Performance targets and scale
- Error handling behaviors
- Integration requirements
- Security/compliance needs
User Scenarios & Testing (mandatory)
Primary User Story
A user wants to manage their personal tasks by creating a list where they can add new tasks, mark completed tasks as done, and remove tasks they no longer need. The application should provide a simple interface for basic task management without requiring complex features like sharing, categories, or due dates.
Acceptance Scenarios
- Given no existing tasks, When user adds a new task with text “Buy groceries”, Then the task appears in the task list as incomplete
- Given a task exists in the list, When user marks it as complete, Then the task status changes to completed and is visually distinguished from incomplete tasks
- Given a completed task exists, When user clicks to delete it, Then the task is removed from the list entirely
- Given an incomplete task exists, When user clicks to delete it, Then the task is removed from the list entirely
- Given multiple tasks exist, When user views the list, Then all tasks are displayed with their current status
Edge Cases
- What happens when user tries to add a task with empty text?
- How does the system behave when there are no tasks to display?
- What happens when user tries to mark an already completed task as complete again?
Requirements (mandatory)
Functional Requirements
- FR-001: System MUST allow users to add new tasks with descriptive text
- FR-002: System MUST display all tasks in a list format showing task text and completion status
- FR-003: System MUST allow users to mark incomplete tasks as completed
- FR-004: System MUST allow users to delete any task (completed or incomplete)
- FR-005: System MUST visually distinguish between completed and incomplete tasks
- FR-006: System MUST prevent adding tasks with empty or whitespace-only text
- FR-007: System MUST persist tasks so they remain available when user returns to the application
- FR-008: System MUST provide immediate visual feedback when tasks are added, completed, or deleted
[NEEDS CLARIFICATION: Should completed tasks be automatically hidden after a certain time period?]
[NEEDS CLARIFICATION: Is there a maximum number of tasks that should be supported?]
[NEEDS CLARIFICATION: Should tasks be editable after creation?]
[NEEDS CLARIFICATION: Should the application work offline or require internet connection?]
Key Entities (include if feature involves data)
- Task: Represents a single item to be completed, containing descriptive text and completion status (completed/incomplete)
- Task List: Collection of all tasks, maintaining order and providing operations for adding, updating, and removing tasks
Review & Acceptance Checklist
GATE: Automated checks run during main() execution
Content Quality
Requirement Completeness
Execution Status
Updated by main() during processing
If you want to tweak what was generated, just describe the change in a prompt and it will be incorporated. Editing the files by hand works just as well.
2. /plan
The next command, /plan, turns the "what to build" defined in the spec into "how to build it". A Constitution Check is run against the project's "constitution" (constitution.md), verifying that principles such as simplicity, test-driven development (TDD), and observability are respected; any violations must be justified, and the justification is recorded.
Running /plan executes scripts/setup-plan.sh --json and writes the implementation plan to plan.md. /plan also generates research.md, which records how the technical options for implementing the specified features were surveyed and compared, and how the tech stack and implementation approach were decided.
When /plan completes, the following new files exist: plan.md, research.md, data-model.md, quickstart.md, and the contracts directory under specs/001-todo, plus CLAUDE.md at the repository root.
.
├── CLAUDE.md
├── memory
│   ├── constitution_update_checklist.md
│   └── constitution.md
├── scripts
│   ├── check-task-prerequisites.sh
│   ├── common.sh
│   ├── create-new-feature.sh
│   ├── get-feature-paths.sh
│   ├── setup-plan.sh
│   └── update-agent-context.sh
├── specs
│   └── 001-todo
│       ├── contracts
│       │   ├── dom-interface.md
│       │   └── task-api.md
│       ├── data-model.md
│       ├── plan.md
│       ├── quickstart.md
│       ├── research.md
│       └── spec.md
Below is an example of the generated plan.md.
plan.md
Branch: 001-todo | Date: 2025-09-03 | Spec: spec.md
Input: Feature specification from /specs/001-todo/spec.md
Execution Flow (/plan command scope)
1. Load feature spec from Input path
→ If not found: ERROR "No feature spec at {path}"
2. Fill Technical Context (scan for NEEDS CLARIFICATION)
→ Detect Project Type from context (web=frontend+backend, mobile=app+api)
→ Set Structure Decision based on project type
3. Evaluate Constitution Check section below
→ If violations exist: Document in Complexity Tracking
→ If no justification possible: ERROR "Simplify approach first"
→ Update Progress Tracking: Initial Constitution Check
4. Execute Phase 0 → research.md
→ If NEEDS CLARIFICATION remain: ERROR "Resolve unknowns"
5. Execute Phase 1 → contracts, data-model.md, quickstart.md, agent-specific template file (e.g., `CLAUDE.md` for Claude Code, `.github/copilot-instructions.md` for GitHub Copilot, or `GEMINI.md` for Gemini CLI).
6. Re-evaluate Constitution Check section
→ If new violations: Refactor design, return to Phase 1
→ Update Progress Tracking: Post-Design Constitution Check
7. Plan Phase 2 → Describe task generation approach (DO NOT create tasks.md)
8. STOP - Ready for /tasks command
IMPORTANT: The /plan command STOPS at step 7. Phases 2-4 are executed by other commands:
- Phase 2: /tasks command creates tasks.md
- Phase 3-4: Implementation execution (manual or via tools)
Summary
Primary requirement: Create a simple ToDo application where users can add, complete, delete, and view tasks with persistent storage. Technical approach: Single-page web application with local storage persistence, focusing on core CRUD operations and immediate user feedback.
Technical Context
Language/Version: JavaScript ES6+ with HTML5/CSS3
Primary Dependencies: No external frameworks (vanilla JS for simplicity)
Storage: localStorage (browser local storage for persistence)
Testing: Browser-based testing with simple test framework or manual testing
Target Platform: Modern web browsers (Chrome, Firefox, Safari, Edge)
Project Type: single (simple web page, no backend required)
Performance Goals: Instant response for all user interactions
Constraints: Work offline, no server dependency, minimal resource usage
Scale/Scope: Support for hundreds of tasks per user, single user focus
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
Simplicity:
- Projects: 1 (single web page)
- Using framework directly? Yes (vanilla HTML/JS/CSS, no wrappers)
- Single data model? Yes (Task entity only)
- Avoiding patterns? Yes (no unnecessary abstractions)
Architecture:
- EVERY feature as library? N/A (single HTML page application)
- Libraries listed: N/A (vanilla implementation)
- CLI per library: N/A (web application)
- Library docs: N/A (single file application)
Testing (NON-NEGOTIABLE):
- RED-GREEN-Refactor cycle enforced? Yes (will write tests first)
- Git commits show tests before implementation? Yes (will be enforced)
- Order: Contract→Integration→E2E→Unit strictly followed? Modified for frontend (E2E→Integration→Unit)
- Real dependencies used? Yes (actual browser localStorage)
- Integration tests for: new libraries, contract changes, shared schemas? Yes (localStorage integration)
- FORBIDDEN: Implementation before test, skipping RED phase
Observability:
- Structured logging included? Basic console logging for errors
- Frontend logs → backend? N/A (no backend)
- Error context sufficient? Yes (user-facing error messages)
Versioning:
- Version number assigned? 1.0.0
- BUILD increments on every change? Yes
- Breaking changes handled? Yes (localStorage schema versioning if needed)
Project Structure
Documentation (this feature)
specs/001-todo/
├── plan.md # This file (/plan command output)
├── research.md # Phase 0 output (/plan command)
├── data-model.md # Phase 1 output (/plan command)
├── quickstart.md # Phase 1 output (/plan command)
├── contracts/ # Phase 1 output (/plan command)
└── tasks.md # Phase 2 output (/tasks command - NOT created by /plan)
Source Code (repository root)
# Option 1: Single project (DEFAULT)
src/
├── index.html
├── style.css
├── script.js
└── lib/
└── todo.js
tests/
├── integration/
│ └── todo-storage.test.js
└── e2e/
└── todo-app.test.js
Structure Decision: Option 1 (single project) – simple web page with no backend required
Phase 0: Outline & Research
1. Extract unknowns from Technical Context above:
   - No NEEDS CLARIFICATION remain – all technical decisions made based on simple requirements
2. Generate and dispatch research agents:
   - Task: "Research localStorage best practices for persistence"
   - Task: "Research vanilla JS patterns for DOM manipulation"
   - Task: "Research testing approaches for vanilla JS applications"
3. Consolidate findings in research.md using format:
   - Decision: [what was chosen]
   - Rationale: [why chosen]
   - Alternatives considered: [what else evaluated]
Output: research.md with all technical decisions documented
Phase 1: Design & Contracts
Prerequisites: research.md complete
1. Extract entities from feature spec → data-model.md:
   - Task entity with id, text, completed status
   - Task collection with CRUD operations
   - localStorage schema definition
2. Generate API contracts from functional requirements:
   - Task management interface (add, complete, delete, list)
   - localStorage contract specification
   - DOM interaction contracts
3. Generate contract tests from contracts:
   - localStorage persistence tests
   - DOM manipulation tests
   - Task state management tests
4. Extract test scenarios from user stories:
   - Each acceptance scenario → test case
   - Edge cases → error handling tests
5. Update agent file incrementally:
   - Create CLAUDE.md for Claude Code context
   - Include ToDo app technical context
   - Document key patterns and conventions
Output: data-model.md, /contracts/*, failing tests, quickstart.md, CLAUDE.md
Phase 2: Task Planning Approach
This section describes what the /tasks command will do – DO NOT execute during /plan
Task Generation Strategy:
- Load /templates/tasks-template.md as base
- Generate tasks from Phase 1 design docs (contracts, data model, quickstart)
- Each contract → contract test task [P]
- Each entity → model creation task [P]
- Each user story → integration test task
- Implementation tasks to make tests pass
Ordering Strategy:
- TDD order: Tests before implementation
- Dependency order: Core logic before DOM manipulation before styling
- Mark [P] for parallel execution (independent files)
Estimated Output: 15-20 numbered, ordered tasks in tasks.md
IMPORTANT: This phase is executed by the /tasks command, NOT by /plan
Phase 3+: Future Implementation
These phases are beyond the scope of the /plan command
Phase 3: Task execution (/tasks command creates tasks.md)
Phase 4: Implementation (execute tasks.md following constitutional principles)
Phase 5: Validation (run tests, execute quickstart.md, performance validation)
Complexity Tracking
Fill ONLY if Constitution Check has violations that must be justified
No violations – simple single-page application meets all constitutional requirements.
Progress Tracking
This checklist is updated during execution flow
Phase Status:
Gate Status:
Based on Constitution v2.1.1 – See /memory/constitution.md
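To make the plan concrete, here is a minimal vanilla-JS sketch of the Task entity and collection that plan.md's Phase 1 calls for: a Task with id, text, and completed status, plus CRUD operations persisted on every change. All names here are illustrative assumptions, not the generated code; the injected storage object mimics the localStorage getItem/setItem API so the sketch also runs under Node:

```javascript
// Sketch of the Task model and collection from plan.md's Phase 1 design
// (illustrative names only). The storage object is anything exposing
// getItem/setItem, so window.localStorage works in a browser.
class TaskList {
  constructor(storage, key = "todo.tasks.v1") {
    this.storage = storage;
    this.key = key;
    this.tasks = JSON.parse(storage.getItem(key) ?? "[]");
  }
  save() {
    // FR-007: persist tasks so they survive page reloads
    this.storage.setItem(this.key, JSON.stringify(this.tasks));
  }
  add(text) {
    if (!text || !text.trim()) throw new Error("empty task"); // FR-006
    const task = { id: Date.now() + Math.random(), text: text.trim(), completed: false };
    this.tasks.push(task);
    this.save();
    return task;
  }
  complete(id) {
    const task = this.tasks.find((t) => t.id === id); // FR-003
    if (task) { task.completed = true; this.save(); }
    return task;
  }
  remove(id) {
    this.tasks = this.tasks.filter((t) => t.id !== id); // FR-004
    this.save();
  }
  list() {
    return [...this.tasks]; // FR-002: expose tasks for display
  }
}

// In-memory stand-in for window.localStorage so the sketch runs under Node.
const memoryStorage = {
  data: new Map(),
  getItem(k) { return this.data.has(k) ? this.data.get(k) : null; },
  setItem(k, v) { this.data.set(k, String(v)); },
};

const list = new TaskList(memoryStorage);
const a = list.add("Buy groceries");
list.complete(a.id);
console.log(list.list().length, list.list()[0].completed); // 1 true
```

Injecting the storage object keeps the class testable outside a browser while still satisfying the persistence requirement when backed by real localStorage.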
3. /tasks
The final command, /tasks, breaks the implementation plan down into concrete units of work that a developer can start on immediately. The generated task list includes the individual tasks, effort estimates, dependencies between tasks, acceptance criteria (a definition of done), and the required test cases.
Running /tasks executes scripts/check-task-prerequisites.sh --json and generates the task list in tasks.md.
.
├── CLAUDE.md
├── memory
│   ├── constitution_update_checklist.md
│   └── constitution.md
├── scripts
│   ├── check-task-prerequisites.sh
│   ├── common.sh
│   ├── create-new-feature.sh
│   ├── get-feature-paths.sh
│   ├── setup-plan.sh
│   └── update-agent-context.sh
├── specs
│   └── 001-todo
│       ├── contracts
│       │   ├── dom-interface.md
│       │   └── task-api.md
│       ├── data-model.md
│       ├── plan.md
│       ├── quickstart.md
│       ├── research.md
│       ├── spec.md
│       └── tasks.md
Here is the completed task list.
tasks.md
Input: Design documents from /specs/001-todo/
Prerequisites: plan.md (required), research.md, data-model.md, contracts/
Execution Flow (main)
1. Load plan.md from feature directory
→ If not found: ERROR "No implementation plan found"
→ Extract: tech stack, libraries, structure
2. Load optional design documents:
→ data-model.md: Extract entities → model tasks
→ contracts/: Each file → contract test task
→ research.md: Extract decisions → setup tasks
3. Generate tasks by category:
→ Setup: project init, dependencies, linting
→ Tests: contract tests, integration tests
→ Core: models, services, CLI commands
→ Integration: DB, middleware, logging
→ Polish: unit tests, performance, docs
4. Apply task rules:
→ Different files = mark [P] for parallel
→ Same file = sequential (no [P])
→ Tests before implementation (TDD)
5. Number tasks sequentially (T001, T002...)
6. Generate dependency graph
7. Create parallel execution examples
8. Validate task completeness:
→ All contracts have tests?
→ All entities have models?
→ All endpoints implemented?
9. Return: SUCCESS (tasks ready for execution)
Format: [ID] [P?] Description
- [P]: Can run in parallel (different files, no dependencies)
- Include exact file paths in descriptions
Path Conventions
- Single project: src/, tests/ at repository root
- Paths shown below assume single project per plan.md structure
Phase 3.1: Setup
Phase 3.2: Tests First (TDD) ⚠️ MUST COMPLETE BEFORE 3.3
CRITICAL: These tests MUST be written and MUST FAIL before ANY implementation
Phase 3.3: Core Implementation (ONLY after tests are failing)
Phase 3.4: Integration
Phase 3.5: Polish
Dependencies
- Setup (T001-T003) before all other tasks
- Tests (T004-T008) before implementation (T009-T020)
- T009 (Task model) blocks T010-T011 (storage and API)
- T012 (HTML) blocks T014 (DOM manipulation)
- T013 (CSS) can run parallel with T014
- T011 (Task API) and T014 (DOM) must complete before T017 (integration)
- Implementation before polish (T021-T026)
Parallel Example
Phase 3.1 Setup (Run in sequence)
mkdir -p src/lib tests/integration tests/e2e
touch src/index.html src/style.css src/script.js src/lib/todo.js
Phase 3.2 Tests (Run in parallel – different files)
Phase 3.3 Core Implementation (Mixed parallel/sequential)
Notes
- [P] tasks = different files, no dependencies
- Verify tests fail before implementing
- Commit after each task
- Test in browser after each implementation task
- Follow TDD red-green-refactor cycle strictly
Task Generation Rules
Applied during main() execution
- From Contracts:
  - task-api.md → T004 (Task API contract test)
  - dom-interface.md → T005 (DOM interface contract test)
- From Data Model:
  - Task entity → T009 (Task model implementation)
  - TaskCollection → T010 (Storage layer)
- From User Stories (quickstart.md):
  - Add task scenario → Part of T007 (E2E test)
  - Complete task scenario → Part of T007 (E2E test)
  - Delete task scenario → Part of T007 (E2E test)
  - Input validation → T008 (Input validation test)
- Ordering:
  - Setup → Tests → Models → Services → DOM → Integration → Polish
  - localStorage operations depend on Task model
  - DOM operations depend on HTML structure
Validation Checklist
GATE: Checked by main() before returning
File Mapping
Tests (Phase 3.2):
- tests/integration/task-api.test.js – T004
- tests/integration/dom-interface.test.js – T005
- tests/integration/todo-storage.test.js – T006
- tests/e2e/todo-app.test.js – T007
- tests/integration/input-validation.test.js – T008
Implementation (Phase 3.3-3.4):
- src/lib/todo.js – T009, T010, T011, T018, T019
- src/index.html – T012, T022
- src/style.css – T013, T021
- src/script.js – T014, T015, T016, T017, T020, T023, T026
Quality (Phase 3.5):
- Cross-cutting testing and optimization tasks – T024, T025
Ready for Execution
All 26 tasks generated with clear dependencies, file paths, and parallel execution guidance. TDD methodology enforced with comprehensive test coverage before implementation.
Implementation
From here on, Spec Kit itself is no longer involved: you proceed with the implementation using your coding agent's own capabilities. For example, with a prompt like:
@specs/001-todo/plan.md に従って実装を進めて
("Proceed with the implementation following @specs/001-todo/plan.md")
As a tool that provides guardrails for vibe coding, I found it genuinely useful; it looks well suited to organizing specs and tasks that otherwise tend to scatter.
On the other hand, it still feels like a tool in an early stage of development. The README says it can be applied to existing projects as well as greenfield ones, but given that it automatically creates git branches and automatically generates and places CLAUDE.md and related files, applying it to an existing project currently looks difficult. For existing projects, how to ingest the existing specs and code would be the crux of the tool, and it is not yet clear how that would work.
The stated research goals include meeting enterprise-level requirements and supporting iterative processes, so I look forward to seeing how it evolves.