After this section | You will find |
|---|---|
| … | Short decisions and asks for leadership and QA. |
| … | What we mean by each test type. |
| … | What already exists in this repo (CI, tests, gaps). |
Later sections | ROI ordering, bootstrap vs ROI, mock boundaries, effort tables, risks, and QA handoff. |

Audience | Use |
|---|---|
Managers | Trade-offs (cost, schedule, risk), CI expectations, and why reserving sprint capacity for tests is rational. |
QA | What automation can replace or narrow, the scope of the weekly smoke pass, and where to log gaps when coverage does not exist yet. |

Term | Meaning here |
|---|---|
Unit test | Fast, no I/O; pure functions, parsers, small helpers. |
BLoC / Cubit test | State transitions and side-effect ordering with controlled fakes (…). |
Use case test | Application service orchestration with fake repositories or ports, not the real network. |
Widget test | Pumps widgets under … |
Golden test | Compares rasterized output to a reference image; well suited to stable design surfaces, but sensitive to fonts and OS. |
Integration test (…) | Runs a real app binding, often on a device or emulator; higher cost and flake risk. |
Contract / API test | Asserts HTTP shapes, status codes, and parsing against fixtures or a staging API; catches server drift. |
Flow snapshot test (optional pattern) | Full-app widget test: pump a near-complete app while Dio (or another client) returns static JSON from … |
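The BLoC / Cubit entry can be made concrete with a minimal sketch. It is written in Python only so that it runs on its own; the repo's real tests would be Dart, where this shape is commonly expressed with the `bloc_test` package. `ProfileCubit`, `ProfileState`, `ProfileRepository`, and `FakeProfileRepository` are hypothetical names. A single shared log records both the emitted states and the repository calls, so one assertion pins the state transitions and the side-effect ordering together.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol


@dataclass(frozen=True)
class ProfileState:
    status: str                 # "initial", "loading", "loaded" or "error"
    name: Optional[str] = None


class ProfileRepository(Protocol):
    """Hypothetical port the cubit depends on."""
    def fetch_name(self, user_id: str) -> str: ...


class FakeProfileRepository:
    """Controlled fake: canned name or error; writes each call to a shared log."""

    def __init__(self, log: List[str], name: str = "",
                 error: Optional[Exception] = None) -> None:
        self._log = log
        self._name = name
        self._error = error

    def fetch_name(self, user_id: str) -> str:
        self._log.append(f"fetch_name:{user_id}")
        if self._error is not None:
            raise self._error
        return self._name


class ProfileCubit:
    """Cubit-like holder: each emit updates `state` and notifies listeners."""

    def __init__(self, repo: ProfileRepository) -> None:
        self._repo = repo
        self.state = ProfileState("initial")
        self._listeners: List[Callable[[ProfileState], None]] = []

    def listen(self, callback: Callable[[ProfileState], None]) -> None:
        self._listeners.append(callback)

    def _emit(self, state: ProfileState) -> None:
        self.state = state
        for callback in self._listeners:
            callback(state)

    def load(self, user_id: str) -> None:
        self._emit(ProfileState("loading"))
        try:
            name = self._repo.fetch_name(user_id)
        except Exception:
            self._emit(ProfileState("error"))
            return
        self._emit(ProfileState("loaded", name))


def test_success_order():
    log: List[str] = []
    cubit = ProfileCubit(FakeProfileRepository(log, name="Ada"))
    cubit.listen(lambda s: log.append(f"state:{s.status}"))
    cubit.load("u1")
    # One timeline for states and side effects pins their relative order.
    assert log == ["state:loading", "fetch_name:u1", "state:loaded"]
    assert cubit.state == ProfileState("loaded", "Ada")


def test_failure_order():
    log: List[str] = []
    cubit = ProfileCubit(FakeProfileRepository(log, error=RuntimeError("offline")))
    cubit.listen(lambda s: log.append(f"state:{s.status}"))
    cubit.load("u1")
    assert log == ["state:loading", "fetch_name:u1", "state:error"]
```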

Area | Baseline |
|---|---|
CI | … |
Integration tests | … |
Unit / BLoC / use case | Many tests under … |
Golden tests | No |
API SDK tests | … |

Approach | When to use | Trade-off |
|---|---|---|
Mock / fake BLoC or fixed state | Most widget and golden tests | Fast, stable; does not prove HTTP parsing. |
Fake repository at domain boundary | When the widget depends on orchestration outcomes | More logic than a dumb bloc mock, still avoids JSON in every test. |
Mock Dio deep in the tree | Rarely | Couples UI tests to serialization; noisy failures; prefer dedicated API tests instead. |
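Here is a sketch of the "fake repository at domain boundary" approach, again in Python purely so that it runs standalone. `CartRepository`, `CheckoutCart`, and `FakeCartRepository` are hypothetical names, not code from this repo. The use case sees only the port, so the test asserts orchestration (discount math and what gets persisted) without any JSON or HTTP.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


class CartRepository(Protocol):
    """Hypothetical port; production would wire an HTTP-backed adapter."""

    def item_prices_cents(self, cart_id: str) -> List[int]: ...

    def save_total(self, cart_id: str, total_cents: int) -> None: ...


class CheckoutCart:
    """Use case: sum item prices, apply a percentage discount, persist the total."""

    def __init__(self, repo: CartRepository, discount_percent: int = 0) -> None:
        self._repo = repo
        self._discount_percent = discount_percent

    def __call__(self, cart_id: str) -> int:
        subtotal = sum(self._repo.item_prices_cents(cart_id))
        # Integer math in cents; the discounted total is rounded down.
        total = subtotal * (100 - self._discount_percent) // 100
        self._repo.save_total(cart_id, total)
        return total


@dataclass
class FakeCartRepository:
    """In-memory fake at the domain boundary: canned reads, recorded writes."""

    prices: Dict[str, List[int]]
    saved: List[Tuple[str, int]] = field(default_factory=list)

    def item_prices_cents(self, cart_id: str) -> List[int]:
        return self.prices.get(cart_id, [])

    def save_total(self, cart_id: str, total_cents: int) -> None:
        self.saved.append((cart_id, total_cents))


def test_discount_applied_and_total_persisted():
    repo = FakeCartRepository(prices={"cart-1": [1000, 2500]})
    total = CheckoutCart(repo, discount_percent=10)("cart-1")
    assert total == 3150  # (1000 + 2500) * 90 // 100
    assert repo.saved == [("cart-1", 3150)]


def test_unknown_cart_saves_zero():
    repo = FakeCartRepository(prices={})
    assert CheckoutCart(repo)("missing") == 0
    assert repo.saved == [("missing", 0)]
```

The fake records writes, so one test can check both the return value and the side effect. The real adapter's JSON handling is then left to contract tests rather than repeated in every use case test.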

Category | API / JSON contract drift | App logic & state | UI layout & visuals | Navigation & deep links | Real device / OS / plugins |
|---|---|---|---|---|---|
Unit | Low | Low (local only) | None | None | None |
BLoC / Cubit | Low (unless bloc parses raw JSON) | High for covered transitions | Low | Low | None |
Use case + fake repo | Low at HTTP edge | High for orchestration | None | Low | None |
Widget (scoped, fake state) | Low | Medium (binding only) | High for pumped screens | Medium | None |
Golden (selective) | Low | Low | High at snapshot points | Low | None |
Contract / API tests (fixtures or staging) | High | Low | None | Low | Low (staging deps) |
Flow snapshot (full app + JSON + goldens) | Medium (only if fixtures match reality) | High for scripted paths | High at milestones | High for scripted paths | Low |
Integration | Medium–high (if real backend) | High | Medium | High | High for covered plugins |
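Contract / API tests are the only layer rated High for API / JSON contract drift. A minimal fixture-based check might look like the following Python sketch, in which the endpoint, fixture, and field names are all hypothetical. It validates required fields and their types before parsing, so a renamed or retyped field fails with a readable message rather than a silent default.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical recorded response for a hypothetical GET /orders/{id}.
# In practice this would live in a fixtures file shared with the API repo.
ORDER_FIXTURE = '{"id": "ord_123", "status": "paid", "total_cents": 4599}'

# Fields and JSON types the client-side parser relies on (assumed contract).
REQUIRED_FIELDS: Dict[str, type] = {"id": str, "status": str, "total_cents": int}


@dataclass(frozen=True)
class Order:
    id: str
    status: str
    total_cents: int


def contract_violations(payload: dict) -> List[str]:
    """Return readable problems; an empty list means the payload matches."""
    problems: List[str] = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            actual = type(payload[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    return problems


def parse_order(payload: dict) -> Order:
    """Refuse to parse a drifted payload instead of defaulting silently."""
    problems = contract_violations(payload)
    if problems:
        raise ValueError("; ".join(problems))
    return Order(payload["id"], payload["status"], payload["total_cents"])


def test_fixture_matches_contract():
    payload = json.loads(ORDER_FIXTURE)
    assert contract_violations(payload) == []
    assert parse_order(payload) == Order("ord_123", "paid", 4599)


def test_drift_is_reported():
    # Simulated server change: "status" renamed, "total_cents" stringified.
    drifted = {"id": "ord_123", "state": "paid", "total_cents": "4599"}
    assert contract_violations(drifted) == [
        "missing field: status",
        "total_cents: expected int, got str",
    ]
```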

Layer | Effort to write | Effort to maintain | Flake risk | CI fit today |
|---|---|---|---|---|
Unit | Low | Low | Very low | Excellent |
BLoC | Low–medium | Low | Low | Excellent |
Use case + fake repo | Medium | Low–medium | Low | Excellent |
Widget | Medium | Medium | Low | Good |
Golden | Medium–high | High (design/locale) | Medium | Good if OS pinned |
Integration | High | Medium | Medium–high | Poor without device CI |
Cross-repo API hook | Medium setup | Low per change if automated from OpenAPI/fixtures | Low | Excellent once suite exists |
Flow snapshot (full app + fixtures + goldens) | Very high | High | Medium–high | Good if OS pinned for goldens |

Test category | Manual (typical range) | AI-assisted (typical range) | Notes |
|---|---|---|---|
Unit (helpers, parsers) | 0.25–1.5 h | 0.15–0.75 h | AI excels at table-driven cases. |
BLoC / Cubit | 1–4 h | 0.5–2 h | Depends on event/side-effect complexity. |
Use case + fake repo | 1.5–5 h | 1–3 h | Fakes must match real ports; AI speeds scaffolding. |
Widget (scoped subtree, fake bloc/repo) | 2–8 h | 1–4 h | Finder stability drives variance. |
Golden only (a few stable widgets or screens) | 2–6 h | 1–3 h | First-time CI + font pinning not included. |
Contract / API (first endpoints + fixtures) | 3–12 h | 2–8 h | Faster if OpenAPI or samples exist. |
Integration | 4–16 h | 2.5–10 h | Flakiness tuning often dominates. |
Flow snapshot (full app + Dio JSON + platform fakes + flow + goldens) | 1–3 days | 0.5–2 days | DI + routing + async + baseline images; AI helps but does not remove integration pain. |
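The note that AI-assisted drafting suits table-driven cases refers to tests shaped like the one below. The cases are data, so adding coverage means adding a row that a reviewer can check at a glance. This is a Python sketch built around a hypothetical `parse_duration_minutes` helper, not a helper from this repo.

```python
from typing import List, Tuple


def parse_duration_minutes(text: str) -> int:
    """Hypothetical helper: parse '1h 30m', '45m' or '2h' into total minutes."""
    parts = text.split()
    if not parts:
        raise ValueError("empty duration")
    total = 0
    for part in parts:
        unit, amount = part[-1], part[:-1]
        if unit == "h" and amount.isdigit():
            total += int(amount) * 60
        elif unit == "m" and amount.isdigit():
            total += int(amount)
        else:
            raise ValueError(f"unrecognized part: {part!r}")
    return total


# Each row is (input, expected minutes); new coverage is a new row.
CASES: List[Tuple[str, int]] = [
    ("45m", 45),
    ("2h", 120),
    ("1h 30m", 90),
    ("0h 0m", 0),
    ("  1h   5m ", 65),  # str.split() tolerates extra whitespace
]

# Inputs that must be rejected rather than silently mis-parsed.
INVALID: List[str] = ["", "90", "1x", "h", "1.5h"]


def test_valid_durations():
    for text, expected in CASES:
        assert parse_duration_minutes(text) == expected, text


def test_invalid_durations():
    for text in INVALID:
        try:
            parse_duration_minutes(text)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {text!r}")
```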

Test category | Manual (typical range) | AI-assisted (typical range) |
|---|---|---|
Unit / BLoC / use case | 0.25–2 h | 0.15–1.25 h |
Widget (scoped) | 0.5–3 h | 0.25–2 h |
Golden | 0.5–4 h | 0.25–3 h (often re-baselining images) |
Contract / API | 0.5–3 h | 0.25–2 h |
Integration | 1–6 h | 0.5–4 h |
Flow snapshot | 0.5–2 days | 0.25–1.5 days |

Topic | Classic | With AI assistance |
|---|---|---|
First draft of tests and mocks | Slower | Faster boilerplate and case enumeration |
Flaky test diagnosis | Engineer-led | Still engineer-led; AI may suggest hypotheses |
Golden review | Human judgment | Human judgment; AI should not “approve” pixels |
Secrets and restricted env | Controlled manually | Still requires a policy: no credentials in prompts, offline constraints respected |
False-green tests (pass without proving the behavior) | Rare if review is strict | Higher risk if assertions are weak; review stays in the Definition of Done |
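Tests that go green for the wrong reason are easiest to spot side by side. The Python sketch below uses a hypothetical `format_price` helper and a simulated regression of it. The weak assertion passes for both the correct and the regressed implementation. The value-pinning assertion catches the regression. This is the kind of gap review should look for in AI-drafted tests.

```python
def format_price(cents: int) -> str:
    """Hypothetical helper: 1250 -> '$12.50'. Assumes non-negative cents."""
    dollars, remainder = divmod(cents, 100)
    return f"${dollars}.{remainder:02d}"


def regressed_format_price(cents: int) -> str:
    """Simulated regression for illustration: the zero padding was dropped."""
    dollars, remainder = divmod(cents, 100)
    return f"${dollars}.{remainder}"


def weak_assertion(fmt) -> bool:
    # Passes for any string that starts with '$'.
    result = fmt(1205)
    return isinstance(result, str) and result.startswith("$")


def strong_assertion(fmt) -> bool:
    # Pins exact outputs, including the zero-padding edge cases.
    return fmt(1205) == "$12.05" and fmt(0) == "$0.00"


assert weak_assertion(format_price) and weak_assertion(regressed_format_price)
assert strong_assertion(format_price)
assert not strong_assertion(regressed_format_price)  # only this check catches it
```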