Pilot Phase Guide
"Week 1-2: Prove the concept. Calibrate the system. Build the foundation."
The pilot phase is designed to validate HEAT with minimal risk and maximum learning. By the end of Week 2, you'll have baseline data, calibrated thresholds, and proof of value to justify full team rollout.
Pilot Objectives
✅ Validate signal quality — Do Pain Streaks correlate with actual grind?
✅ Calibrate intensity scale — Is x8 meaningful vs x5?
✅ Identify tag patterns — Which work types matter for your team?
✅ Measure adoption friction — Does 30-second tagging actually work?
✅ Prove business value — Can you catch one burnout signal early?
Success criteria: At least 1 actionable insight per pilot participant by Week 2
Pilot Team Selection
Option A: Individual Pilot (Safest)
Who: 1-2 volunteer developers + their manager
Pros:
- Minimal disruption
- Easy to calibrate
- Safe to iterate
Cons:
- Limited pattern detection
- No team heatmap view
- Slower to prove ROI
Best for: Conservative organizations, initial validation
Option B: Small Team Pilot (Recommended)
Who: 3-5 developers + 1 manager (e.g., single squad)
Pros:
- Team heatmap visible
- Bus Factor detectable
- Faster ROI proof
- Realistic workflow test
Cons:
- Requires group buy-in
- More coordination
Best for: Agile teams, fast-moving organizations
Selection Criteria
Look for:
| Criterion | Why It Matters |
|---|---|
| Volunteers | Adoption works when people want it |
| Diverse work types | Mix of Feature, Bug, Support, Config work |
| Reasonable manager | Will actually use the data, not weaponize it |
| Stable project | Not in crisis mode (skews baseline) |
| Technical comfort | Can install browser extension |
Avoid:
❌ Teams in firefighting mode (skews data)
❌ Skeptical managers (kills adoption)
❌ Solo developers with no manager buy-in
❌ Teams with <1 week left on project
Week 1: Setup & Calibration
Day 1: Monday — Installation & Onboarding
Morning (30 minutes):
Install browser extension
- Download from internal deployment or public store
- Configure API endpoint
- Authenticate with SSO or team token
Quick training (15 minutes)
- Watch 5-minute tagging demo video
- Review Observable Signals guide
- Download Quick Reference Card
What to tag:
Work Type: Feature | Bug | Blocker | Support | Config | Research
Intensity: x1 (routine) to x10 (high load)
Rule of thumb:
- x1-x3: No mental fatigue
- x4-x6: Concentrated effort
- x7-x10: Grinding, stuck, frustrated
Tag your first task
- Pick task you're currently on
- Choose work type
- Estimate intensity (make your best guess)
- Click "Save Tag"
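Under the hood, a saved tag is just a small record: the task, the work type, the intensity, and a timestamp. The exact schema isn't part of this guide, so treat the field names below as a hypothetical sketch of what the extension might send to the API.

```typescript
// Hypothetical shape of a single HEAT tag. Field names are illustrative,
// not necessarily the extension's actual schema.
type WorkType = "Feature" | "Bug" | "Blocker" | "Support" | "Config" | "Research";

interface HeatTag {
  task: string;        // e.g. a ticket ID like "PROJ-42"
  workType: WorkType;  // one of the six pilot work types
  intensity: number;   // x1 (routine) through x10 (high load)
  note?: string;       // optional context, e.g. "waiting on DB team"
  taggedAt: string;    // ISO timestamp, recorded when you click "Save Tag"
}

const firstTag: HeatTag = {
  task: "PROJ-42",
  workType: "Feature",
  intensity: 4,
  taggedAt: new Date().toISOString(),
};
```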
Afternoon:
- Tag 2-3 more tasks as you switch work
- Note which parts feel unclear (we'll calibrate tomorrow)
Expected tags Day 1: 3-5 per person (getting comfortable)
Day 2: Tuesday — Intensity Calibration
Morning standup (5 minutes):
Quick round-robin:
- "What intensity did you log yesterday?"
- "Did any task feel harder than the number you gave it?"
- "Any confusion on work types?"
Intensity calibration exercise:
Review these scenarios together:
| Scenario | Initial Guess | Calibrated | Why |
|---|---|---|---|
| Reviewed PR for familiar code | x2 | x1 | Too easy to be x2, truly routine |
| Debugged SQL performance | x6 | x8 | "I was stuck for 3 hours, felt exhausting" |
| Built CRUD endpoint | x4 | x3 | Standard feature work, not intense |
| Production incident | x9 | x9 | Correctly calibrated (crisis mode) |
The Physical Test:
x1: Could do this while half-asleep
x3: Normal focus, feel fine after
x5: Deep focus, need a break after
x7: Frustrated, want to walk away
x10: Mentally fried, done for the day
Afternoon:
- Re-tag yesterday's work with calibrated understanding
- Tag today's work with new mental model
Expected outcome: Everyone aligns on intensity scale meaning
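If you want the Physical Test handy outside this doc (say, as tooltip text in an internal tool), the anchor points reduce to a small lookup. A minimal sketch: only the anchor descriptions come from the guide; the intermediate intensities and the nearest-anchor helper are assumptions.

```typescript
// Physical Test anchor points from the exercise above. Intermediate values
// (x2, x4, x6, x8, x9) are left to judgment; only the anchors are from the guide.
const physicalTest: Record<number, string> = {
  1: "Could do this while half-asleep",
  3: "Normal focus, feel fine after",
  5: "Deep focus, need a break after",
  7: "Frustrated, want to walk away",
  10: "Mentally fried, done for the day",
};

// Return the nearest anchor description for an intensity from 1 to 10.
function describeIntensity(intensity: number): string {
  const anchors = Object.keys(physicalTest).map(Number);
  const nearest = anchors.reduce((best, a) =>
    Math.abs(a - intensity) < Math.abs(best - intensity) ? a : best
  );
  return physicalTest[nearest];
}

describeIntensity(8); // "Frustrated, want to walk away"
```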
Day 3: Wednesday — Work Type Clarity
Morning standup (5 minutes):
Discuss edge cases:
| Scenario | Tag | Why |
|---|---|---|
| "Helping teammate debug their code" | Support, x2 | Assisting others, low personal effort |
| "Fixing bug in feature I just built" | Bug, x4 | It's a defect, even if recent |
| "Stuck waiting for QA environment" | Blocker, x5 | Can't proceed, frustration building |
| "Reading docs for new library" | Research, x3 | Exploration, not building yet |
| "Updating CI/CD config" | Config, x2 | Tooling/environment work |
Common confusions resolved:
Q: "What if I'm blocked on a bug?" A: Tag as Blocker (primary) + note "waiting on DB team" (we track bottlenecks, not just bug fixing)
Q: "I spent 4 hours on a feature, but 2 hours was routine and 2 hours was hard. How do I tag?" A: Two tags: Feature, x2 and Feature, x7. Be granular.
Q: "Do I tag meetings?" A: Only if it's substantive work (architecture review = Research, x4). Skip standup.
Afternoon:
- Tag with confidence on work types
- Note any remaining edge cases for Friday review
Expected outcome: <10% tag ambiguity
Day 4: Thursday — Pattern Recognition
Morning:
Pilot participants get access to Developer View dashboard:
Your Heatmap (This Week):
├── Monday: 12 intensity (🟩 Normal)
├── Tuesday: 18 intensity (🟨 Warm) — Calibration adjustments
├── Wednesday: 8 intensity (🟦 Cool)
└── Thursday: (in progress)
Tag Distribution:
├── Feature: 45%
├── Bug: 25%
├── Blocker: 15%
├── Support: 10%
└── Config: 5%
Self-reflection questions (5 minutes):
- Does my heatmap match how I felt this week?
- Am I surprised by my tag distribution?
- Did I catch any Pain Streaks forming?
Pair discussion (10 minutes):
- Share your heatmap with one other pilot participant
- Discuss: "What does this data tell you about your week?"
- Look for patterns: "Why was Tuesday warm?"
Afternoon:
- Continue tagging
- Start thinking about patterns
Expected outcome: First "aha moment" — data matches reality
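The heatmap totals and distribution percentages above are plain aggregations over your tags; the dashboard computes them for you. The sketch below shows the arithmetic so you can sanity-check a number, reusing the hypothetical tag shape from Day 1 and assuming the distribution is weighted by intensity (the guide doesn't state whether it's by count or by intensity).

```typescript
// Same hypothetical shapes as the Day 1 sketch.
type WorkType = "Feature" | "Bug" | "Blocker" | "Support" | "Config" | "Research";
interface HeatTag { workType: WorkType; intensity: number; taggedAt: string }

// Daily totals: sum of tag intensities per calendar day (the heatmap numbers).
function dailyIntensity(tags: HeatTag[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of tags) {
    const day = t.taggedAt.slice(0, 10); // "YYYY-MM-DD"
    totals.set(day, (totals.get(day) ?? 0) + t.intensity);
  }
  return totals;
}

// Tag distribution: each work type's share of total intensity.
function tagDistribution(tags: HeatTag[]): Map<WorkType, number> {
  const total = tags.reduce((sum, t) => sum + t.intensity, 0);
  const share = new Map<WorkType, number>();
  if (total === 0) return share;
  for (const t of tags) {
    share.set(t.workType, (share.get(t.workType) ?? 0) + t.intensity / total);
  }
  return share; // e.g. Feature -> 0.45 means 45% of your intensity was Feature work
}
```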
Day 5: Friday — Week 1 Retrospective
Pilot team meeting (30 minutes):
Agenda:
Data review (10 minutes)
Manager shares (anonymized or open, team decides):
- Team heatmap view
- Tag distribution
- Any Pain Streaks detected (probably none yet, Week 1 is baseline)
Calibration check (5 minutes)
- Does everyone agree on intensity scale now?
- Any work types still confusing?
- Are tags taking >30 seconds? (If yes, simplify)
Early insights (10 minutes)
Go around the table:
- "One thing I learned about my work this week?"
Examples:
- "I didn't realize 30% of my time was Support"
- "Tuesday felt bad because I had 3 blockers stacked"
- "I thought I was productive Wednesday, but it was all Config work"
Week 2 goals (5 minutes)
- Goal: Catch one Pain Streak before it becomes a problem
- Goal: Identify one Bus Factor = 1 situation
- Goal: 90%+ tagging consistency
Expected outcome: Team is comfortable, sees value, ready to continue
Week 2: Pattern Detection
Day 6: Monday — Baseline Established
Morning standup (3 minutes):
- Quick reminder: Tag as you work
- New goal: "Let's see if we can detect a Pain Streak this week"
Manager setup:
Enable Pain Streak Alerts:
- Threshold: 3 days on same task/tag combo
- Alert delivery: Slack DM or email
- Auto-check: Daily at 5pm
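Conceptually, the alert is a consecutive-day check: the same task/tag combo tagged on N days in a row triggers a notification. HEAT's actual implementation isn't shown in this guide; the sketch below is one way to express the rule, reusing the hypothetical tag shape from Day 1.

```typescript
// Same hypothetical shape as the Day 1 sketch.
type WorkType = "Feature" | "Bug" | "Blocker" | "Support" | "Config" | "Research";
interface HeatTag { task: string; workType: WorkType; intensity: number; taggedAt: string }

const STREAK_THRESHOLD_DAYS = 3; // the pilot setting above

// Count consecutive calendar days, working backwards from today, on which the
// same task/work-type combo was tagged. (HEAT's real rule may skip weekends
// or weigh intensity; this sketch keeps it simple.)
function painStreakDays(tags: HeatTag[], task: string, workType: WorkType): number {
  const days = new Set(
    tags
      .filter((t) => t.task === task && t.workType === workType)
      .map((t) => t.taggedAt.slice(0, 10))
  );
  let streak = 0;
  const cursor = new Date();
  while (days.has(cursor.toISOString().slice(0, 10))) {
    streak += 1;
    cursor.setDate(cursor.getDate() - 1);
  }
  return streak;
}

// The 5pm check: if the streak reaches the threshold, send the Slack DM or email.
// if (painStreakDays(allTags, "PROJ-42", "Blocker") >= STREAK_THRESHOLD_DAYS) { ... }
```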
Developers:
Tag as usual. Week 1 was calibration, Week 2 is real signal detection.
Day 7: Tuesday — First Pain Streak Alert (Maybe)
Scenario: Alice gets a "🔥 Streak: 2 days" notification
Task: PROJ-42 "Fix payment gateway timeout"
Tag: Blocker, x8
Streak: Day 2 (Monday + Tuesday)
Manager action:
Check-in (5 minutes): "Hey Alice, I see you've been on the payment bug for 2 days at high intensity. Need help?"
Options:
- Pair with senior developer
- Escalate to vendor support
- Deprioritize for now if not critical
Outcome:
- Alice paired with Bob (knows payment system)
- Blocker resolved by end of day
- Cascade prevented: Rushed fix that could have caused a production incident
Value demonstrated: $245K cascade avoided (see Cascade Effect example)
Day 8: Wednesday — Bus Factor Detection
Manager View shows:
Module Ownership (Last 2 Weeks):
├── Payment Gateway: Alice (100%)
├── Auth Service: Carol (90%), David (10%)
├── API Core: Bob (60%), Alice (40%)
└── Frontend: David (70%), Carol (30%)
🚨 Bus Factor = 1: Payment Gateway
Manager action:
- Immediate: Schedule knowledge transfer session (Friday)
- This sprint: Pair Carol with Alice on payment work
- Next month: Cross-training plan for all critical modules
Value demonstrated: If Alice leaves, payment module continuity preserved
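The guide doesn't spell out how HEAT derives Bus Factor from ownership shares, but one plausible rule is: the smallest number of people who cover most of a module's recent activity. A sketch of that rule, with an assumed 80% coverage cutoff:

```typescript
// One plausible Bus Factor rule: the number of people who together account for
// at least 80% of a module's recent tag intensity. The coverage cutoff is an
// assumption; HEAT may use a different rule.
function busFactor(ownership: Record<string, number>, coverage = 0.8): number {
  const shares = Object.values(ownership).sort((a, b) => b - a);
  let covered = 0;
  let people = 0;
  for (const share of shares) {
    covered += share;
    people += 1;
    if (covered >= coverage) break;
  }
  return people;
}

busFactor({ Alice: 1.0 });            // 1: the Payment Gateway silo flagged above
busFactor({ Bob: 0.6, Alice: 0.4 });  // 2: API Core looks healthier
```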
Day 9: Thursday — Tag Analysis Insight
Tag Analysis View shows:
Team Tag Distribution (Week 2):
├── Feature: 140 intensity (good, innovation work)
├── Bug: 80 intensity (normal)
├── Blocker: 180 intensity (🔴 Alert: 2× baseline)
├── Support: 60 intensity (normal)
└── Config: 120 intensity (🟡 Warning: Environment issues?)
Drill-down on Blocker spike:
- 3 of 5 developers blocked on "QA environment down"
- Total lost capacity: ~12 hours this week
Manager action:
- Immediate: Escalate to DevOps (QA env priority)
- Root cause: Deployment script broken Monday, not fixed until Thursday
- Prevention: QA env monitoring alert added
Value demonstrated: Systemic issue caught, future cascades prevented
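The "2× baseline" alert is a ratio check against the Week 1 numbers. The exact thresholds HEAT ships with aren't documented here, so the alert and warning cutoffs in the sketch below are assumptions.

```typescript
type WorkType = "Feature" | "Bug" | "Blocker" | "Support" | "Config" | "Research";
type TagTotals = Partial<Record<WorkType, number>>;

// Compare current-week intensity per work type against the Week 1 baseline.
// The 2.0 (alert) and 1.5 (warning) ratios are assumptions for illustration.
function flagSpikes(baseline: TagTotals, current: TagTotals) {
  const flags: { workType: WorkType; ratio: number; level: "alert" | "warning" }[] = [];
  for (const workType of Object.keys(current) as WorkType[]) {
    const base = baseline[workType] ?? 0;
    const now = current[workType] ?? 0;
    if (base === 0) continue; // no baseline yet, nothing to compare against
    const ratio = now / base;
    if (ratio >= 2.0) flags.push({ workType, ratio, level: "alert" });
    else if (ratio >= 1.5) flags.push({ workType, ratio, level: "warning" });
  }
  return flags;
}

// A Week 1 Blocker baseline of 90 against this week's 180 gives ratio 2.0: alert.
flagSpikes({ Blocker: 90 }, { Blocker: 180 });
```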
Day 10: Friday — Pilot Phase Retrospective
Pilot team meeting (45 minutes):
Agenda:
Success stories (15 minutes)
Share wins:
- Alice's Pain Streak caught early
- Bus Factor = 1 on Payment identified
- Config spike revealed QA env issue
Metrics review (10 minutes)
Adoption:
- Tagging consistency: 85%+ (target: 90%+)
- Avg time per tag: 25 seconds (target: <30s)
- Dashboard usage: Manager checked 4×/week
Value:
- Pain Streaks detected: 1
- Cascades prevented: 1 (estimated $245K savings)
- Systemic issues surfaced: 1 (QA env downtime)
- Bus Factor = 1 situations: 1 (cross-training initiated)
Learnings (10 minutes)
What worked:
- Daily tagging became habit by Day 3
- Manager using data proactively, not punitively
- Dashboard matched "gut feel" (validated framework)
What needs improvement:
- Tag selector UI could be faster (1-click for common tags?)
- Some edge cases still unclear (support vs collaboration?)
- Need reminder to tag at end of day
Go/No-Go decision (10 minutes)
Criteria for full rollout:
Criterion Status Notes Tagging adoption >85% ✅ 85% Good enough At least 1 actionable insight ✅ 3 insights Exceeded Team sentiment positive ✅ Unanimous Everyone wants to continue Manager buy-in ✅ Strong Already using data Technical stability ✅ No issues Extension + API solid Decision: ✅ GO for full team rollout
Post-Pilot: Preparing for Rollout
Week 3 Prep Activities
Manager:
Document pilot wins
- Write 1-page summary for leadership
- Include: Alice's Pain Streak, Bus Factor finding, QA env issue
- Quantify value: $245K cascade prevented (conservative)
Plan team onboarding
- Schedule 30-minute team kickoff meeting
- Prepare demo using pilot team's real data (anonymized if needed)
- Assign onboarding buddy for each new user
Review thresholds
- Based on pilot data, are default thresholds right?
- Intensity color bands: <5 (cool), 5-15 (normal), 15-30 (warm), 30+ (critical)
- Adjust if needed for team norms
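If you change the bands, write the mapping down somewhere visible so everyone reads the colors the same way. A sketch of the defaults listed above (the band names are the guide's; the function itself is an illustration):

```typescript
// Default daily-intensity color bands from the pilot guide:
// <5 cool, 5-15 normal, 15-30 warm, 30+ critical. Adjust for team norms.
type Band = "cool" | "normal" | "warm" | "critical";

function colorBand(dailyIntensity: number): Band {
  if (dailyIntensity < 5) return "cool";
  if (dailyIntensity < 15) return "normal";
  if (dailyIntensity < 30) return "warm";
  return "critical";
}

colorBand(12); // "normal", e.g. the Monday total from the Day 4 heatmap
colorBand(18); // "warm", the Tuesday total
```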
Pilot participants:
Become champions
- Share your experience in team channels
- Offer to pair with new users for first tagging session
- Be honest about what worked and what didn't
Refine workflows
- Identify optimal tagging moments (end of day vs real-time?)
- Share tips (e.g., "I tag in standup for yesterday's work")
Common Pilot Issues & Solutions
Issue 1: "I keep forgetting to tag"
Solutions:
- Browser extension badge reminder
- End-of-day Slack reminder bot
- Buddy system (pair checks in daily)
- Make it a standup ritual
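If your team already runs a Slack bot, the end-of-day reminder mentioned above is a few lines. A sketch assuming the @slack/web-api client and a bot token; deciding who still needs the nudge (i.e. who hasn't tagged today) depends on whatever HEAT data access you have.

```typescript
import { WebClient } from "@slack/web-api";

const slack = new WebClient(process.env.SLACK_BOT_TOKEN);

// DM one pilot participant an end-of-day nudge. This sketch only covers the
// reminder itself; the "has this person tagged today?" check is up to you.
async function sendTagReminder(userId: string): Promise<void> {
  await slack.chat.postMessage({
    channel: userId, // a Slack user ID addresses the bot's DM with that user
    text: "End-of-day reminder: take 30 seconds to tag today's work in HEAT 🔥",
  });
}
```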
Issue 2: "I'm not sure if this is x5 or x7"
Solution:
Use comparative questions:
- "Was this harder than the last x5 task I did?"
- "Did I need a break after, or could I continue?"
- "On a scale of 1-10 physical exhaustion, how tired am I?"
Remember: Precision doesn't matter as much as consistency. If you're consistently calling similar work x6, the patterns will emerge.
Issue 3: "Manager is using this against me"
RED FLAG. Stop pilot immediately.
Required conversation:
"HEAT is a team health tool, not a performance review system. Using it punitively will kill adoption and trust."
If manager doesn't course-correct, escalate or abandon pilot.
Issue 4: "This feels like surveillance"
Clarify:
| HEAT Measures | HEAT Does NOT Measure |
|---|---|
| Effort intensity patterns | Hours worked |
| Team health signals | Individual productivity scores |
| Systemic friction | Who is "best" or "worst" |
| Knowledge distribution | Surveillance metrics |
Opt-in philosophy:
- You choose what to tag
- Your data is visible to you first
- Manager sees patterns, not granular activity
If concern persists, review Architecture privacy section together.
Issue 5: "Tags don't match my reality"
Investigate:
Calibration issue?
- Review intensity scale together
- Use Physical Test framework
Work type confusion?
- Review Observable Signals
- Clarify edge cases
Legitimate gap?
- Maybe HEAT isn't capturing your work type (e.g., heavy meeting load)
- Consider custom tags or note this as limitation
Remember: Pilot is for finding these gaps. Iterate.
Pilot Success Metrics
Quantitative (Minimum Thresholds)
| Metric | Target | Acceptable | Concern |
|---|---|---|---|
| Tagging consistency | >90% | >80% | <80% |
| Avg time per tag | <20s | <30s | >30s |
| Dashboard usage (manager) | Daily | 3×/week | <2×/week |
| Actionable insights | 3+ | 1+ | 0 |
| Pain Streaks detected | 1+ | 0 (Week 1 only) | 0 (Week 2) |
Qualitative (Must-Haves)
✅ Team sentiment: Majority positive or neutral
✅ Manager buy-in: Using data proactively
✅ Technical stability: No blocking bugs
✅ Privacy comfort: No surveillance concerns
✅ Value proof: At least 1 "aha moment"
Pilot Deliverables (End of Week 2)
For Leadership:
One-page summary
- Pilot team: 5 developers, 2-week trial
- Key findings: [List 3-5 insights]
- Value demonstrated: [$X cascade prevented]
- Recommendation: Proceed to full rollout
Example dashboard screenshot (anonymized)
- Team heatmap
- Tag distribution
- Pain Streak example
For Team:
Onboarding guide (based on pilot learnings)
- 5-minute quick start
- Common edge cases FAQ
- Tag cheat sheet
Manager playbook
- Daily/weekly/monthly rituals
- Intervention matrix (when to act on signals)
- Example check-in scripts