workflow:full-stack-system-reliability-review
Full-Stack System Reliability Review overview
Reviews end-to-end system reliability across the full technology stack. The review analyzes frontend error rates, API latency percentiles, and database query performance together as correlated signals; evaluates SLO attainment across web, API, and data tiers; reviews cloud infrastructure capacity headroom and auto-scaling policy effectiveness; audits observability coverage for blind spots in tracing, logging, and metrics pipelines; assesses CI/CD pipeline reliability and deployment success rates; stress-tests failover procedures across availability zones; and correlates incident frequency with recent change velocity. Produces a cross-stack reliability scorecard, an SLO attainment report, and a prioritized hardening backlog. Excludes feature development.
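As an illustration of the SLO attainment evaluation across tiers described above, the following is a minimal sketch, not a prescribed implementation. It assumes each tier reports counts of good and total events over a review window; the names `TierWindow`, `attainment`, `error_budget_remaining`, and `scorecard` are hypothetical and do not come from this workflow definition.

```python
from dataclasses import dataclass


@dataclass
class TierWindow:
    """Event counts for one tier over a review window (hypothetical shape)."""
    tier: str
    good_events: int
    total_events: int
    slo_target: float  # e.g. 0.999 means 99.9% of events must be good


def attainment(w: TierWindow) -> float:
    """Fraction of good events; a window with no traffic counts as fully attained."""
    if w.total_events == 0:
        return 1.0
    return w.good_events / w.total_events


def error_budget_remaining(w: TierWindow) -> float:
    """Share of the error budget still unspent; negative means overspent."""
    allowed_bad = (1.0 - w.slo_target) * w.total_events
    actual_bad = w.total_events - w.good_events
    if allowed_bad == 0:
        # No budget at all (100% target or no traffic): any bad event exhausts it.
        return 1.0 if actual_bad == 0 else 0.0
    return 1.0 - actual_bad / allowed_bad


def scorecard(windows: list[TierWindow]) -> list[dict]:
    """One row per tier, sorted so the most at-risk tier comes first."""
    rows = [
        {
            "tier": w.tier,
            "attainment": attainment(w),
            "met": attainment(w) >= w.slo_target,
            "budget_remaining": error_budget_remaining(w),
        }
        for w in windows
    ]
    return sorted(rows, key=lambda r: r["budget_remaining"])


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any real system.
    for row in scorecard([
        TierWindow("web", 99_950, 100_000, 0.999),
        TierWindow("api", 9_900, 10_000, 0.995),
    ]):
        print(row)
```

Sorting by remaining error budget is one possible way to feed the prioritized hardening backlog: tiers that have overspent their budget surface first.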
Attributes
Outgoing edges
- domain:web-development · Domain · Web Development
- domain:databases · Domain · Databases
- domain:cloud-infra · Domain · Cloud Infrastructure
- domain:observability · Domain · Observability
- domain:devops · Domain · DevOps
- role:staff-engineer · Role · Staff Engineer
- role:platform-engineer · Role · Platform Engineer
- role:incident-commander · Role · Incident Commander
- org-unit:engineering · OrgUnit · Engineering
- org-unit:infra-engineering · OrgUnit · Infrastructure Engineering
- org-unit:incident-response-team · OrgUnit · Incident Response Team
- skill-area:sli-slo-management · SkillArea · SLI / SLO Management
- skill-area:observability-pipeline · SkillArea · Observability Pipeline
- skill-area:chaos-engineering · SkillArea · Chaos Engineering
- responsibility:slo-definition · Responsibility · SLO definition
- responsibility:capacity-planning · Responsibility · Capacity Planning
- responsibility:review-architecture-changes · Responsibility · Review architecture changes