Sprint 1 Planning Report — Persona Voice Skill

◈ Executive Summary

Team Members: 10
Story Points Committed: 68
User Stories: 14
Sprint Duration: 2 weeks
Planning Docs: 10
Test Cases: 60+

Sprint Goal

Deliver a working data ingestion pipeline and local embedding system so that writing samples can be imported, embedded, stored, and retrieved by context — forming the foundation for voice matching in Sprint 2.

★ Product Vision

For Taps, a professional who communicates across multiple channels (WhatsApp, email, LinkedIn, professional outreach), the Persona Voice Skill is a local-first, privacy-preserving writing assistant that learns his authentic voice from real writing samples and generates context-appropriate text that sounds like him — not like AI. Unlike generic LLM outputs or cloud-based writing tools, our product keeps all data on-device, uses style embeddings for retrieval-augmented voice matching, and improves continuously as more samples are ingested.

⚙ System Architecture

 ┌──────────────────────────────────────────────────────────────────┐
 │                     CLI Interface (commander)                     │
 │   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────────┐  │
 │   │  ingest   │  │  query   │  │  voice   │  │  stats/export │  │
 │   └─────┬────┘  └────┬─────┘  └────┬─────┘  └──────┬────────┘  │
 └─────────┼────────────┼─────────────┼───────────────┼────────────┘
           │            │             │               │
           ▼            ▼             ▼               ▼
 ┌──────────────────────────────────────────────────────────────────┐
 │                      Core Service Layer                          │
 │                                                                  │
 │   ┌─────────────┐   ┌─────────────┐   ┌─────────────────────┐  │
 │   │  Ingestion   │   │  Retrieval  │   │  Voice Filter       │  │
 │   │  Pipeline    │   │  Engine     │   │  (few-shot rewrite) │  │
 │   │  ·WhatsApp   │   │  ·cosine    │   │  ·prompt assembly   │  │
 │   │  ·email      │   │  ·ranking   │   │  ·context selection │  │
 │   │  ·text       │   │  ·filtering │   │  ·output formatting │  │
 │   └──────┬───────┘   └──────┬──────┘   └──────────┬──────────┘  │
 │          │                  │                      │             │
 │          ▼                  ▼                      ▼             │
 │   ┌──────────────────────────────────────────────────────────┐  │
 │   │            Embedding Service (ONNX Runtime)               │  │
 │   │       @xenova/transformers · all-MiniLM-L6-v2            │  │
 │   │              384-dimensional vectors                      │  │
 │   └───────────────────────┬──────────────────────────────────┘  │
 │                           │                                      │
 │   ┌───────────────────────▼──────────────────────────────────┐  │
 │   │               Storage Layer (SQLite)                      │  │
 │   │    better-sqlite3 · AES-256-GCM encrypted text            │  │
 │   │    embedding BLOBs · context-indexed · metadata JSON      │  │
 │   └──────────────────────────────────────────────────────────┘  │
 └──────────────────────────────────────────────────────────────────┘

⇄ Data Flow

Ingestion Path

User provides file + context via CLI
Parser extracts writing samples (WhatsApp/email/text)
Samples cleaned and normalized
Raw text encrypted with AES-256-GCM
Cleaned plaintext embedded via ONNX model (384-dim)
Embedding + encrypted text stored in SQLite

Query Path

User provides query text + context via CLI
Query text embedded via ONNX model
Cosine similarity search over stored embeddings
Filter by context type (optional)
Top-N results retrieved and ranked
Encrypted text decrypted for few-shot prompt
Voice filter rewrites output in Taps' voice
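
The retrieval steps of the query path can be sketched as a brute-force scan over stored vectors. This is a minimal illustration and not the project's implementation: the StoredSample shape, the function names, and the decision to apply the optional context filter before scoring (rather than after) are all assumptions made for the example.

```typescript
// Hypothetical shape of a stored sample; field names are illustrative only.
interface StoredSample {
  id: number;
  context: string; // e.g. "whatsapp", "email", "linkedin"
  embedding: Float32Array; // 384 values in the real system, shorter in tests
}

interface ScoredSample {
  id: number;
  context: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every candidate against the query and return the n best, highest score first.
// The context filter is optional; when given, it runs before scoring to shrink the scan.
function topNSimilar(
  query: Float32Array,
  samples: StoredSample[],
  n: number,
  context?: string,
): ScoredSample[] {
  return samples
    .filter((s) => context === undefined || s.context === context)
    .map((s) => ({
      id: s.id,
      context: s.context,
      score: cosineSimilarity(query, s.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, n);
}
```

Filtering before scoring matches the pre-filtering mitigation listed in the risk register; a full scan is linear in the number of stored samples, so the 10k-sample search target would need to be benchmarked rather than assumed.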

☰ Sprint 1 Backlog — MoSCoW Priority

MUST (7) • SHOULD (4) • COULD (2) • WON'T (1)

MUST HAVE

US-1: Project Scaffolding

TypeScript project with build tooling, linting, test infrastructure, proper .gitignore for privacy.

MUST 3 pts

US-2: WhatsApp Export Parser

Parse WhatsApp .txt exports, filter by sender, extract messages with timestamps and context tagging.

MUST 5 pts
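
A rough sketch of the parsing step, assuming two commonly seen export layouts: "12/31/23, 10:15 PM - Name: text" and "[31/12/2023, 22:15:30] Name: text". Real exports vary by locale and app version (see the risk register), so the pattern below is an assumption and would need fallbacks. The ParsedMessage shape and function name are hypothetical. Dates are kept as raw strings because day/month order cannot be inferred from a single line.

```typescript
// Hypothetical output shape; date and time stay as raw strings.
interface ParsedMessage {
  date: string;
  time: string;
  sender: string;
  text: string;
}

// Matches the timestamp prefix of a new entry and captures the remainder of the line.
// Assumed layouts: "12/31/23, 10:15 PM - rest" and "[31/12/2023, 22:15:30] rest".
const HEADER =
  /^\[?(\d{1,2}[\/.]\d{1,2}[\/.]\d{2,4}),? (\d{1,2}:\d{2}(?::\d{2})?(?:\s?[AaPp][Mm])?)\]?(?: -)? (.*)$/;

function parseWhatsAppExport(raw: string, senderFilter?: string): ParsedMessage[] {
  const messages: ParsedMessage[] = [];
  let current: ParsedMessage | null = null;

  for (const line of raw.split(/\r?\n/)) {
    // Some exports contain invisible left-to-right marks; strip them first.
    const cleaned = line.replace(/\u200e/g, "");
    const match = HEADER.exec(cleaned);

    if (match === null) {
      // No timestamp: treat as a continuation of the previous message.
      // Blank lines are skipped, so blank lines inside a message are lost.
      if (current !== null && cleaned.length > 0) {
        current.text += "\n" + cleaned;
      }
      continue;
    }

    const [, date = "", time = "", rest = ""] = match;
    const sep = rest.indexOf(": ");
    if (sep === -1) {
      // Timestamped line without "Name: " is a system notice; skip it.
      current = null;
      continue;
    }

    current = { date, time, sender: rest.slice(0, sep), text: rest.slice(sep + 2) };
    messages.push(current);
  }

  return senderFilter === undefined
    ? messages
    : messages.filter((m) => m.sender === senderFilter);
}
```

Passing the exporting user's display name as senderFilter keeps only that person's messages, which is the set the voice model should learn from.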

US-3: Email & Text Parser

Parse plain text emails and generic text files into writing samples with context tags.

MUST 3 pts

US-4: Local Embedding Model

ONNX Runtime with all-MiniLM-L6-v2 for 384-dim embeddings. Batch and single-query modes, fully local.

MUST 8 pts
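
The batch and single-query modes could share one code path, sketched below. The actual model call is deliberately not shown: EmbedFn is a hypothetical signature standing in for whatever wraps the ONNX model, and the batch size of 32 is an arbitrary placeholder, not a measured value.

```typescript
// Hypothetical signature for the component that turns texts into vectors,
// for example a thin wrapper around the local ONNX model.
type EmbedFn = (texts: string[]) => Promise<Float32Array[]>;

// Vector size stated in the backlog for all-MiniLM-L6-v2.
const EXPECTED_DIM = 384;

// Batch mode: send texts to the embedder in fixed-size chunks and
// validate the count and dimensionality of what comes back.
async function embedInBatches(
  texts: string[],
  embed: EmbedFn,
  batchSize = 32,
): Promise<Float32Array[]> {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const vectors: Float32Array[] = [];
  for (let start = 0; start < texts.length; start += batchSize) {
    const batch = texts.slice(start, start + batchSize);
    const result = await embed(batch);
    if (result.length !== batch.length) {
      throw new Error(`Expected ${batch.length} vectors, got ${result.length}`);
    }
    for (const vector of result) {
      if (vector.length !== EXPECTED_DIM) {
        throw new Error(`Expected ${EXPECTED_DIM} dimensions, got ${vector.length}`);
      }
      vectors.push(vector);
    }
  }
  return vectors;
}

// Single-query mode: the batch path with exactly one input.
async function embedOne(text: string, embed: EmbedFn): Promise<Float32Array> {
  const [vector] = await embedInBatches([text], embed, 1);
  if (vector === undefined) {
    throw new Error("Embedder returned no vector");
  }
  return vector;
}
```

Injecting the embedder keeps the chunking logic testable without loading the model, which also helps with the CLI cold-start target.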

US-5: SQLite Vector Store

Store embeddings + encrypted text in SQLite with context-indexed cosine similarity search.

MUST 8 pts
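
One way to store a 384-dimensional embedding in a SQLite BLOB column is to write the raw bytes of the Float32Array. The helper names below are hypothetical, and the sketch assumes the database is read back on a machine with the same byte order as the one that wrote it (little-endian on common platforms).

```typescript
import { Buffer } from "node:buffer";

const EMBEDDING_DIM = 384;
const BLOB_BYTES = EMBEDDING_DIM * Float32Array.BYTES_PER_ELEMENT; // 1536 bytes per vector

// Serialize: copy exactly the bytes the typed array covers. The array may be
// a view into a larger ArrayBuffer, so byteOffset and byteLength matter.
function embeddingToBlob(embedding: Float32Array): Buffer {
  if (embedding.length !== EMBEDDING_DIM) {
    throw new Error(`Expected ${EMBEDDING_DIM} values, got ${embedding.length}`);
  }
  return Buffer.from(
    new Uint8Array(embedding.buffer, embedding.byteOffset, embedding.byteLength),
  );
}

// Deserialize: copy into a fresh buffer first. A Node Buffer can start at an
// offset that is not a multiple of 4, which Float32Array views do not allow.
function blobToEmbedding(blob: Buffer): Float32Array {
  if (blob.byteLength !== BLOB_BYTES) {
    throw new Error(`Expected ${BLOB_BYTES} bytes, got ${blob.byteLength}`);
  }
  const copy = new Uint8Array(blob.byteLength);
  copy.set(blob);
  return new Float32Array(copy.buffer);
}
```

At 1536 bytes per vector, 10k samples take roughly 15 MB of embedding data, which is small enough to scan in memory if the SQL-side search proves too slow.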

US-6: Retrieval API

Given context + query text, retrieve top-N similar samples ranked by cosine similarity.

MUST 5 pts

US-7: Data Encryption

AES-256-GCM encryption for all stored writing samples. Secure key management via .env.

MUST 5 pts
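
A minimal sketch of AES-256-GCM encryption with Node's built-in crypto module. The payload layout (12-byte IV, then 16-byte auth tag, then ciphertext), the function names, and the hex encoding of the key are assumptions for illustration; the source only states that the key comes from .env.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";
import { Buffer } from "node:buffer";

const ALGORITHM = "aes-256-gcm";
const IV_BYTES = 12; // recommended IV size for GCM
const TAG_BYTES = 16; // default auth tag size

// Assumption: the key is stored as 64 hex characters (32 bytes),
// e.g. read from an environment variable loaded from .env.
function loadKey(hexKey: string | undefined): Buffer {
  if (hexKey === undefined || !/^[0-9a-fA-F]{64}$/.test(hexKey)) {
    throw new Error("Encryption key must be 64 hex characters (32 bytes)");
  }
  return Buffer.from(hexKey, "hex");
}

// Encrypt one writing sample. A fresh random IV is generated per call;
// reusing an IV with the same key breaks GCM's guarantees.
function encryptSample(plaintext: string, key: Buffer): Buffer {
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv(ALGORITHM, key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]);
}

// Decrypt a payload produced by encryptSample. final() throws if the
// ciphertext or tag was modified, or if the key is wrong.
function decryptSample(payload: Buffer, key: Buffer): string {
  if (payload.length < IV_BYTES + TAG_BYTES) {
    throw new Error("Payload too short");
  }
  const iv = payload.subarray(0, IV_BYTES);
  const tag = payload.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = payload.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv(ALGORITHM, key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Storing the IV and tag alongside the ciphertext means each row is self-describing, so only the key itself has to be kept out of the database and the repository.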

SHOULD HAVE

US-8: Voice Filter

Few-shot prompt construction from retrieved examples. Rewrite AI drafts in Taps' voice.

SHOULD 8 pts
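
Few-shot prompt construction might look like the sketch below. The prompt wording, the VoiceExample shape, and the default of five examples are illustrative assumptions, not the project's actual template; the best wording would need to be tuned against the voice quality metrics.

```typescript
// Hypothetical shape of a retrieved, already-decrypted sample.
interface VoiceExample {
  context: string;
  text: string;
  score: number; // cosine similarity from the retrieval step
}

// Assemble a rewrite prompt from the highest-scoring examples.
function buildVoicePrompt(
  draft: string,
  context: string,
  examples: VoiceExample[],
  maxExamples = 5,
): string {
  // Copy before sorting so the caller's array is not reordered.
  const selected = [...examples]
    .sort((a, b) => b.score - a.score)
    .slice(0, maxExamples);

  const shots = selected
    .map((ex, i) => `Example ${i + 1} (${ex.context}):\n${ex.text.trim()}`)
    .join("\n\n");

  return [
    `Rewrite the draft below so it matches the author's voice in a ${context} message.`,
    "Keep the meaning; change only tone, rhythm, and word choice.",
    "",
    "Writing samples from the author:",
    shots,
    "",
    "Draft to rewrite:",
    draft.trim(),
  ].join("\n");
}
```

Capping the number of examples keeps the prompt within the target model's context window and limits how much decrypted text leaves the storage layer per request.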

US-9: CLI Tools

Commands: ingest, ingest-whatsapp, voice, similar, stats. Progress bars, color output, JSON mode.

SHOULD 8 pts

US-10: Bulk Directory Ingestion

Ingest all text files from a directory with context tagging and progress reporting.

SHOULD 3 pts

US-11: Ingestion Statistics

Display sample counts by context, database size, model info, distribution chart.

SHOULD 2 pts

COULD HAVE

US-12: OpenClaw SKILL.md

Package as an installable OpenClaw skill with proper SKILL.md specification.

COULD 3 pts

US-13: Export Embeddings

Export embeddings to file for backup and portability.

COULD 2 pts

⚙ Technology Stack

Component	Technology	Purpose
Language	TypeScript 5.4+ (strict mode)	Type-safe development, OpenClaw ecosystem compatibility
Runtime	Node.js 18+ (LTS)	Server-side JavaScript execution
Embeddings	@xenova/transformers + all-MiniLM-L6-v2	Local ONNX inference, 384-dim sentence embeddings
Database	better-sqlite3	Vector storage, metadata, encrypted text
CLI Framework	commander	Command parsing, subcommands, flags
CLI Styling	chalk + ora + cli-table3 + cli-progress	Colors, spinners, tables, progress bars
Encryption	Node.js crypto (AES-256-GCM)	Writing sample encryption at rest
Testing	vitest + @vitest/coverage-v8	Unit, integration, E2E tests with coverage
Linting	ESLint + Prettier	Code quality and formatting

☯ Team Capacity

#	Role	Capacity	Sprint 1 Focus
1	Product Owner	6 pts	Requirements, acceptance criteria, backlog prioritization
2	Scrum Master	4 pts	Sprint planning, ceremonies, process, impediment removal
3	Backend Developer	10 pts	Embedding engine, vector store, retrieval API, encryption
4	Frontend Developer	8 pts	CLI tools, command design, OpenClaw packaging
5	UI Designer	4 pts	CLI output design, formatting, accessibility
6	UX Researcher	6 pts	Style analysis framework, voice quality metrics
7	QA Engineer	10 pts	Test strategy, 60+ test cases, coverage targets
8	DevOps Engineer	8 pts	Project setup, CI/CD, deployment scripts, model management
9	Business Analyst	6 pts	Requirements spec, data formats, traceability matrix
10	Code Reviewer	6 pts	Triple-gate review process, code standards, quality gates
	Total	68 pts

⏰ Sprint Timeline

Day 1

Sprint Planning • Commit to backlog • Environment setup begins

Days 2-3

Project scaffolding • ONNX model integration • SQLite schema • Parser development starts

Day 4

Code Review Sync #1 • WhatsApp parser complete • Email parser in progress

Day 5

Backlog Refinement • Embedding service operational • Vector store insert/query working

Days 6-7

Retrieval API • CLI commands • Integration testing • Code Review Sync #2

Days 8-9

Voice filter (if time) • Polish • Bug fixes • Code Review Sync #3 • Final testing

Day 10

Sprint Review (demo) • Sprint Retrospective • Sprint 2 readiness check

⚠ Risk Register

Risk	Likelihood	Impact	Mitigation
ONNX model too slow or incompatible with Node.js 18	Medium	High	Spike on Days 1-2; fall back to a smaller model or pre-computed embeddings
SQLite cosine similarity too slow at scale (10k+ samples)	Medium	Medium	Benchmark early; consider sqlite-vec extension or pre-filtering by context
WhatsApp export format varies across locales/versions	High	Medium	Support multiple date formats; use regex patterns with fallbacks
Insufficient writing samples for meaningful voice matching	Medium	Medium	Set minimum thresholds; warn user when sample count is low
VPS memory constraints (<2GB) for ONNX model	Low	High	Profile memory usage early; all-MiniLM-L6-v2 is small (~100MB)

✓ Quality Gates

Testing

All unit tests pass (vitest)
All integration tests pass
Code coverage > 80% (lines, branches, functions)
60+ test cases across all components
Performance benchmarks met

Code Quality

TypeScript strict mode — 0 errors
ESLint — 0 errors
Prettier — all files formatted
No "any" types in production code
All functions < 50 lines

Security

AES-256-GCM encryption for all stored text
No raw text in git repository
Parameterized SQL queries only
No hardcoded secrets
npm audit clean

Performance

Single embedding < 100ms
Batch 100 embeddings < 30s
Search 10k samples < 500ms
CLI startup < 3s (cold)
Memory < 512MB RSS

🔍 Triple-Gate Review Process

  Developer        Gate 1           Gate 2           Gate 3
  ┌──────┐      ┌──────────┐    ┌─────────────┐   ┌──────────┐
  │  PR  │ ───▶ │  Peer    │ ──▶│   Code      │ ──▶│  Lead    │ ──▶ MERGE
  │Created│     │  Review  │    │  Reviewer   │   │ Approval │
  └──────┘      └──────────┘    └─────────────┘   └──────────┘
      ▲              │                │                  │
      └── Revisions ─┴────────────────┴──────────────────┘

Gate 1 — Peer Review: Backend ↔ Frontend cross-review. Focus: correctness, readability, tests. SLA: 12h.

Gate 2 — Code Reviewer: Architecture compliance, security, patterns, performance. SLA: 24h.

Gate 3 — Lead: Sprint goal alignment, cross-cutting concerns, final merge authority. SLA: 12h.

📄 Sprint 1 Planning Documents

Product Backlog: 14 user stories, MoSCoW prioritized, with acceptance criteria and story points
Sprint Plan: Capacity planning, ceremonies, task assignments, dependencies, risk register
Technical Architecture: System design, TypeScript interfaces, SQLite schema, data flow, ONNX integration
CLI Design: 8 commands, flags, UX patterns, OpenClaw SKILL.md spec, example sessions
DevOps Plan: Project structure, CI/CD, deployment scripts, model management, security
Test Strategy: 60+ test cases, coverage targets, test data strategy, quality gates
UI Design: CLI output formatting, color system, ASCII mockups, accessibility
Style Analysis Framework: Writing dimensions, context profiles, quality metrics, few-shot strategy
Requirements Spec: 32 functional + 25 non-functional requirements, traceability matrix
Code Review Standards: Triple-gate process, checklists, anti-patterns, PR template, merge criteria

✔ Definition of Done

Code compiles without errors (TypeScript strict mode)
All unit tests pass with >80% coverage
Integration tests pass for affected flows
Code reviewed through all three gates
No critical or high severity bugs open
Security checklist passed (encryption, no secrets, parameterized SQL)
Performance benchmarks met
CLI help text complete for all commands
Merged to main branch
