Production Readiness Scorecard Example
Production readiness overview
Production readiness is about ensuring your software is secure, reliable, and fully observable for operational use. It minimizes downtime, improves user experiences, and reduces the chance of critical failures in a live environment.
Achieving production readiness involves setting up cross-functional standards like testing, monitoring, documentation, code reviews, observability, security controls, and deployment workflows, often across different infrastructure like AWS, GitHub, and Kubernetes. However, keeping up with the velocity of modern software development is a challenge. Many organizations still rely on spreadsheets, wikis, Git repositories, or project management software to manage this process.
But production readiness isn’t a “one-time” check; it’s an ongoing process. With software components constantly evolving, it’s essential to ensure that they remain production-ready over time. That means having a framework in place to assess ongoing health, check for functionality, latency, and error rates, and monitor adherence to standards. This allows teams to continuously align with best practices and reduce the burden on developers.
Key Elements to Track for Production Readiness
🥇 Gold Tier Requirements:
Code Coverage -> Achieve at least 90% code coverage to ensure comprehensive testing.
Critical Issues -> No open critical issues in tracking systems like Jira.
Vulnerabilities -> Zero known vulnerabilities identified by tools like SonarQube.
Alerting Configuration -> System alerts set up for monitoring key metrics.
Escalation Policies -> Defined on-call escalation policies with at least two levels.
Secure Design Reviews -> Conduct periodic reviews for secure design validation.
Availability and Latency SLOs -> Meet set SLO targets for availability and latency consistently.
Traffic KPIs -> Track key traffic metrics to monitor performance in real-time.
P99 Latency Tracking -> Maintain logs for P99 latency metrics.
Error Rate Tracking -> Ensure error tracking is enabled to capture incidents in real-time.
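The P99 latency and error rate items above can be made concrete with a small calculation. The following is a minimal sketch in Python, not tied to any monitoring tool: the `Request` record, the nearest-rank percentile method, and the choice to count only 5xx responses as errors are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record; field names are illustrative."""
    latency_ms: float
    status: int  # HTTP status code


def p99_latency(latencies):
    """Return the 99th percentile using the nearest-rank method."""
    if not latencies:
        raise ValueError("no latency samples")
    ordered = sorted(latencies)
    # ceil(0.99 * n) computed with integers to avoid float rounding
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def error_rate(requests):
    """Fraction of requests that ended in a server error (5xx, by assumption)."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return errors / len(requests)
```

In practice these values usually come from an observability platform rather than hand-written code; the sketch only shows what the two metrics measure.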
🥈 Silver Tier Requirements:
Health Dashboards -> Have dashboards for real-time system health visibility.
Code Coverage -> Maintain code coverage of at least 75%.
Advanced Security -> Enable advanced security scanning (e.g., Dependabot, CodeQL).
Rollback Documentation -> Document rollback processes for quick recovery.
Runbooks -> Provide runbooks for all critical processes.
Observability Integration -> Ensure observability tools are integrated as sources for each service.
🥉 Bronze Tier Requirements:
Git Ignore -> Confirm a .gitignore file is present in each repository.
Git Integration -> Integrate the Git provider as a data source in the catalog.
Incident Management Integration -> Integrate with incident management tools to handle issues.
Peer Reviews for PRs -> Require peer reviews for all pull requests.
Recent Deployments -> Track and log recent deployments.
Owner Assignment -> Ensure at least one owner is assigned to each service.
On-Call Definition -> Establish an on-call rotation for handling incidents.
README File -> Ensure that each repository includes a README for quick reference.
SLOs for Availability and Latency -> Define SLOs for key metrics such as availability and latency.
By following these tiers, you can keep your services production-ready without relying on manual processes to check that software still meets your standards. With a framework in place, you’ll not only meet today’s needs but also be prepared to scale and adapt as software evolves.
Example production readiness scorecard code definition
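One way to express the tiers in code is as ordered lists of named checks. The following is a minimal sketch in Python, not the syntax of any particular scorecard tool: the `Service` fields, the rule names, and the assumption that tiers are cumulative (Silver requires Bronze, Gold requires Silver) are hypothetical, and only a subset of the requirements listed above is encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical service record; field names are illustrative."""
    name: str
    owners: list = field(default_factory=list)
    has_readme: bool = False
    has_gitignore: bool = False
    pr_reviews_required: bool = False
    code_coverage: float = 0.0  # percent
    has_runbooks: bool = False
    has_rollback_docs: bool = False
    open_critical_issues: int = 0
    known_vulnerabilities: int = 0
    escalation_levels: int = 0


# Tiers in ascending order; each rule is (name, check function).
TIERS = [
    ("Bronze", [
        ("owner assigned", lambda s: len(s.owners) >= 1),
        ("README present", lambda s: s.has_readme),
        (".gitignore present", lambda s: s.has_gitignore),
        ("peer reviews required", lambda s: s.pr_reviews_required),
    ]),
    ("Silver", [
        ("code coverage >= 75%", lambda s: s.code_coverage >= 75),
        ("runbooks provided", lambda s: s.has_runbooks),
        ("rollback documented", lambda s: s.has_rollback_docs),
    ]),
    ("Gold", [
        ("code coverage >= 90%", lambda s: s.code_coverage >= 90),
        ("no open critical issues", lambda s: s.open_critical_issues == 0),
        ("zero known vulnerabilities", lambda s: s.known_vulnerabilities == 0),
        ("escalation policy has >= 2 levels", lambda s: s.escalation_levels >= 2),
    ]),
]


def evaluate(service):
    """Return (highest tier achieved, blocking tier, failed rule names).

    Tiers are treated as cumulative: evaluation stops at the first tier
    with a failing rule.
    """
    achieved = None
    for tier, rules in TIERS:
        failed = [name for name, check in rules if not check(service)]
        if failed:
            return achieved, tier, failed
        achieved = tier
    return achieved, None, []
```

Calling `evaluate` on a service that meets every Bronze and Silver rule but has 80% coverage would report Silver as achieved and list the failing Gold rules, which gives the owning team a concrete list of what to fix next.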