◈ Operations Dashboard
⚡LIVE • 47 Open Incidents
🔴CRIT • 5 P1/P2 Active • ↑ 2 since yesterday
⏱AVG • 2h 14m MTTR (Mean Time to Resolve) • ↓ 18% vs last week
✓SLA • 94.2% SLA Compliance
△FIRE • 14 Active Alerts
⟲WIP • 4 Changes In Progress • 2 scheduled today
01 Incident Trend — 7 Days
Legend: P1 • P2 • P3 • P4
02 Recent Incidents
| Number | Priority | State | Description | Assigned | SLA | Created |
|---|---|---|---|---|---|---|
03 Active Alerts — Firing
| Alert Name | Severity | Config Item | Fired | Action |
|---|---|---|---|---|
04 By State
05 By Category
06 On-Call Team
Platform Engineering
Vivek K • Primary On-Call • 📱 +91-9876XXXXX
Santhosh P • Secondary • DevOps
07 SLA Compliance
P1 — 87%
P2 — 92%
P3 — 97%
P4 — 99%
⚡ Incident Management
P1 Active
SLA Watch
32 Total Open
3 P1 — Critical
7 P2 — High
4 SLA Breached
| Number | Priority | State | Description | Assigned To | SLA | Created | Actions |
|---|---|---|---|---|---|---|---|
⟲ Change Management
12 Active
12 Total Changes
3 Pending Approval
92% Success Rate
| Number | Type | State | Risk | Description | Assigned | Planned Start | Approval |
|---|---|---|---|---|---|---|---|
◉ Problem & KEDB
Root Cause Analysis
8 Total Problems
3 Under Investigation
4 Known Errors
2 RCA Complete
Problem List
| Number | Priority | State | Description | Known Error | Linked INC | RCA Status |
|---|---|---|---|---|---|---|
Known Error Database (KEDB)
| KE ID | Problem | Root Cause | Workaround | Uses |
|---|---|---|---|---|
| KE0012 | etcd slow → API latency | auto-compaction disabled | Manual compact+defrag | 5× |
| KE0011 | MongoDB pool exhaustion | maxPoolSize=50 too low | Restart pods, increase pool | 3× |
| KE0010 | cert-manager CRD mismatch | CRD v1.12 vs operator v1.14 | Manual cert renewal | 2× |
| KE0009 | Ollama GPU OOM Qwen3-32B | 32B needs 48GB, 4090=24GB | Use Q4 quantized model | 4× |
△ Alert Engine
14 Firing
5 Critical
7 Warning
2 Info
23 Resolved (24h)
| Alert Name | Severity | Status | Source | Config Item | Fired At | Actions |
|---|---|---|---|---|---|---|
▣ Assets / CMDB
30 Configuration Items
24 Live
4 Maintenance
2 Decommissioned
◎ Reports & Analytics
Incidents
SLA
Teams
Changes
312 Total (30 days)
278 Resolved
2h 14m Avg MTTR
12 SLA Breaches
Priority Distribution
Category Breakdown
P1 — 87%
P2 — 92%
P3 — 97%
P4 — 99%
| Team | Assigned | Resolved | MTTR | SLA % |
|---|---|---|---|---|
| Platform Engineering | 42 | 38 | 1h 52m | 96% |
| DevOps | 31 | 28 | 2h 06m | 93% |
| DBA | 18 | 16 | 3h 24m | 88% |
| AI/ML Engineering | 12 | 10 | 4h 10m | 85% |
| Network Operations | 9 | 9 | 1h 30m | 100% |
92% Success Rate
3% Rollback Rate
45m Avg Downtime
⬡ Integration Hub
◬ Teams & On-Call
⚡ Create New Incident
Auto-Calculated Priority • P4