🚀 Executive Summary
TL;DR: Developers often deliver code deemed ‘done’ that lacks critical operational requirements for production, leading to deployment challenges. This article outlines how Senior DevOps Engineers can bridge this gap by implementing temporary ‘wrapper’ solutions, establishing a formal ‘Production Readiness Checklist’ as part of the ‘Definition of Done’, and automating enforcement through CI/CD pipeline gates.
🎯 Key Takeaways
- The ‘Wrapper’ approach involves creating a new Dockerfile and a smart entrypoint script to securely fetch secrets (e.g., from AWS Secrets Manager) at runtime, isolating developer code from production configurations.
- A ‘Production Readiness Checklist’ should be collaboratively defined with development teams, formalizing requirements such as environment variable configuration, multi-stage Dockerfiles with pinned base image versions, JSON logging to stdout/stderr, and dedicated health check endpoints.
- The ‘Paved Road’ approach leverages automated CI/CD pipeline gates, including linting, static analysis, security scanning (e.g., Trivy, Snyk), and custom checks, to enforce production-ready standards and provide immediate feedback on non-compliant code.
Tired of developers handing you code that only works on their laptop? Here’s how a Senior DevOps Engineer bridges the gap between ‘done’ and ‘deployable’ without starting a war.
From Laptop to Launch: My Guide to Handling “Production-Ready” Code That Isn’t
I still remember the night. It was 2 AM, the go-live was in six hours, and a junior full-stack dev had just triumphantly messaged “It’s done!” on Slack. He’d handed me a zip file with a single Dockerfile. My blood ran cold when I opened it. It was pulling `node:latest`, had AWS keys hardcoded directly in the source, and the entry command was just `npm start`. It “worked on his machine,” of course. That night, fueled by lukewarm coffee and pure adrenaline, we had to rebuild the entire artifact, rip out the secrets, and write a proper deployment manifest. We made the launch, but just barely. A recent Reddit thread about getting free dev work brought that night back, and it’s a situation I’ve seen play out a dozen times. The problem isn’t malice; it’s a fundamental disconnect in what “done” means.
The Core of the Problem: Two Worlds, One Goal
Let’s be clear: most developers aren’t trying to make our lives difficult. Their job is to solve a business problem with code. They think in terms of features, user interfaces, and logic. Their definition of “done” is often “the feature works as specified.”
Our world—the world of Ops, SRE, and Cloud Architecture—is completely different. We think in terms of:
- Reliability: Will this fall over if two people use it at once? What about ten thousand?
- Security: Are we leaking credentials? Is the container running as root?
- Observability: How will we know it’s broken before the customer does? Where are the logs going?
- Scalability: Can we run more than one instance of this? Is it stateless?
- Deployability: Can I deploy this with a single command, and more importantly, can I roll it back just as easily?
When a developer hands you “production-ready” code that isn’t, it’s because their checklist for “ready” is missing all of our items. Your job isn’t just to deploy the code; it’s to bridge that gap. Here are three ways to do it, from the trenches.
Solution 1: The Quick Fix (The “Wrapper” Approach)
This is the 2 AM, “we have to launch tomorrow” solution. You’ve been handed the artifact, and there’s no time to send it back for a rewrite. You accept their code as a black box and build a production-safe “wrapper” around it.
Let’s say they give you a Docker image named `dev-app:latest` that expects a database password in the `DB_PASS` environment variable.
Steps:
- Create a New Dockerfile: In your own repo, write a Dockerfile that uses their image as its base.
- Write a Smart Entrypoint Script: This script is where the magic happens. It will fetch secrets from a secure source (like AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault) before the application starts.
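```dockerfile
# Our new, production-safe Dockerfile
# Their image is the base; pin it to an exact tag or digest if you can.
FROM dev-app:latest
# entrypoint.sh calls the AWS CLI. If dev-app doesn't already include it,
# install it here with the base image's package manager.
# Add our own entrypoint script that handles secrets and configs
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh
# We could also add a non-root user here for security (Alpine/BusyBox syntax shown)
# RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# USER appuser
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
# Setting ENTRYPOINT discards any CMD inherited from the base image, so restate
# the original start command; entrypoint.sh receives it as "$@".
CMD ["npm", "start"]
```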
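```sh
#!/bin/sh
# entrypoint.sh
# Fetch the DB password from AWS Secrets Manager before the app starts.
# Assumes the AWS CLI is in the image and the container's IAM role
# allows secretsmanager:GetSecretValue on this secret.
set -eu
# Assign outside `export` so a failed `aws` call isn't masked.
if ! DB_PASS=$(aws secretsmanager get-secret-value \
    --secret-id prod/my-app/db-password \
    --query SecretString --output text); then
  echo "FATAL: Could not retrieve database password from Secrets Manager." >&2
  exit 1
fi
# An empty secret is just as fatal: fail fast so the container never starts half-configured.
if [ -z "$DB_PASS" ]; then
  echo "FATAL: Database password from Secrets Manager is empty." >&2
  exit 1
fi
export DB_PASS
# Replace this shell with the original command (CMD) so the app runs as PID 1
# and receives SIGTERM directly.
exec "$@"
```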
Warning: This is a hack. It creates technical debt and doesn’t fix the root cause. You’re now maintaining a wrapper around their code, which can lead to its own set of problems. Use this to hit a deadline, but immediately follow up with Solution 2.
Solution 2: The Permanent Fix (The “Definition of Done” Approach)
This is the real solution. It’s a process and culture change. You need to work with the development team to formally agree on what “production-ready” actually means. You create a “Production Readiness Checklist” that becomes part of their “Definition of Done” for any new service.
This isn’t a document you throw over the wall. You sit down with the dev lead and create it together. It should be non-negotiable for any code intended for production.
Sample Checklist Table:
| Category | Requirement | Why it Matters |
|---|---|---|
| Configuration | Application MUST be configurable via environment variables. No hardcoded secrets, IPs, or hostnames. | Allows us to run the same artifact in dev, staging, and prod without code changes. Essential for security. |
| Containerization | A multi-stage Dockerfile MUST be provided, pinning base image versions (e.g., `node:18-alpine`, not `node:latest`). | Prevents surprise “it broke in the pipeline” builds and slims down the final image. |
| Observability | Application MUST log to stdout/stderr in JSON format. | Allows our log aggregator (Datadog, Splunk) to parse and index logs automatically. |
| Health Checks | A `/healthz` or similar endpoint MUST be exposed that returns a 200 OK if the app is healthy. | Required for Kubernetes readiness/liveness probes and load balancer health checks. |
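The checklist rows are easier to agree on when everyone can see what a compliant service looks like. What follows is a minimal sketch in Python using only the standard library (the app in this story is Node, so treat it as a language-neutral illustration, not a template for that codebase). It reads its settings from environment variables and refuses to start without them. It also writes one JSON object per log line to stdout and answers `GET /healthz` with `200 OK`. The variable names `DB_HOST`, `PORT`, and `LOG_LEVEL`, the default port, and every function name are hypothetical; only `DB_PASS` comes from the example above.

```python
import json
import logging
import os
import sys
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

LOGGER_NAME = "app"  # hypothetical logger name


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger() -> logging.Logger:
    """Send JSON logs to stdout, where the container runtime collects them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(LOGGER_NAME)
    logger.handlers[:] = [handler]
    logger.setLevel(os.environ.get("LOG_LEVEL", "INFO"))
    logger.propagate = False
    return logger


def load_config(env=os.environ) -> dict:
    """Read deployment-specific settings from the environment, failing fast."""
    missing = [key for key in ("DB_HOST", "DB_PASS") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")
    return {
        "db_host": env["DB_HOST"],
        "db_pass": env["DB_PASS"],  # never log this value
        "port": int(env.get("PORT", "8080")),
    }


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format: str, *args) -> None:
        # Route the default access log (plain text on stderr) through the JSON logger.
        logging.getLogger(LOGGER_NAME).info(format % args)


def main() -> None:
    log = build_logger()
    config = load_config()
    log.info("listening on port %d", config["port"])
    HTTPServer(("0.0.0.0", config["port"]), HealthHandler).serve_forever()
```

In a real module you would call `main()` under an `if __name__ == "__main__":` guard and point the Kubernetes probes at `/healthz`. This handler only proves the process is serving requests; a readiness check that also verifies database connectivity is a common refinement.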
By making this a formal part of their workflow, you shift the responsibility left. They can’t mark a ticket as “done” until it meets these operational requirements.
Solution 3: The ‘Nuclear’ Option (The “Paved Road” Approach)
Sometimes, even with a checklist, old habits die hard. If you’re still getting non-compliant code, it’s time to enforce the rules automatically. This is where you build a “paved road”—a CI/CD pipeline that makes it easy to do the right thing and impossible to do the wrong thing.
In this model, developers don’t build their own images. They push code, and the pipeline does the rest. But the pipeline has gates.
Pipeline Gates:
- Linting & Static Analysis: The pipeline automatically runs a linter. It can even have custom rules, like “fail the build if the string ‘AWS_SECRET_ACCESS_KEY’ is found in the source code.”
- Security Scanning: After the Docker image is built, the pipeline runs a scanner like Trivy or Snyk. If it finds a critical CVE in the base image (like that `node:latest` image they love) or in an application dependency, the build fails.
- Checklist Enforcement: The pipeline can run a script that fails the build if the `Dockerfile` is missing or uses a `latest` base image, or if no health check endpoint is defined in the web framework’s routes. No check, no pass.
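Two of these gates can be approximated with a small script that the pipeline runs before building the image. This is a minimal Python sketch, not a replacement for a real linter or secret scanner. It fails if the `Dockerfile` is missing or if any base image is untagged or tagged `latest`; digest-pinned images, `scratch`, and references to earlier build stages are allowed. It also flags the `AWS_SECRET_ACCESS_KEY` string from the lint rule above, plus anything shaped like an AWS access key ID (`AKIA` followed by 16 uppercase letters or digits). The scanned file suffixes, the skipped directories, and all function names are assumptions. The health-check route check is left out because it depends on the web framework.

```python
import re
import sys
from pathlib import Path

# Hypothetical scan scope: extend to match your stack.
SOURCE_SUFFIXES = {".js", ".ts", ".py", ".json", ".yml", ".yaml", ".sh"}
SKIP_DIRS = {".git", "node_modules"}

# Crude heuristics: the literal variable name from the lint rule,
# plus the shape of an AWS access key ID.
SECRET_PATTERNS = [
    re.compile(r"AWS_SECRET_ACCESS_KEY"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]


def unpinned_base_images(dockerfile: str) -> list[str]:
    """Return base images that are untagged or tagged 'latest'."""
    stages: set[str] = set()
    problems: list[str] = []
    for raw in dockerfile.splitlines():
        parts = raw.split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        is_stage_ref = image.lower() in stages
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2].lower())
        # Earlier build stages, scratch, and digest-pinned images are fine.
        if is_stage_ref or image.lower() == "scratch" or "@" in image:
            continue
        # The tag follows ':' in the final path segment; a ':' earlier
        # in the reference may be a registry port, not a tag.
        name = image.rsplit("/", 1)[-1]
        tag = name.split(":", 1)[1] if ":" in name else None
        if tag is None or tag == "latest":
            problems.append(image)
    return problems


def secret_hits(text: str) -> list[int]:
    """Return 1-based line numbers that match any secret pattern."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if any(p.search(line) for p in SECRET_PATTERNS)
    ]


def run_gates(repo: Path) -> list[str]:
    failures: list[str] = []
    dockerfile = repo / "Dockerfile"
    if not dockerfile.is_file():
        failures.append("Dockerfile is missing")
    else:
        for image in unpinned_base_images(dockerfile.read_text()):
            failures.append(f"Dockerfile: base image '{image}' is not pinned")
    for path in sorted(repo.rglob("*")):
        rel = path.relative_to(repo)
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        if SKIP_DIRS.intersection(rel.parts):
            continue
        for lineno in secret_hits(path.read_text(errors="ignore")):
            failures.append(f"{rel}:{lineno}: possible hardcoded AWS credential")
    return failures


def main(argv: list[str]) -> int:
    """Print every failure and return a CI-friendly exit code."""
    repo = Path(argv[1]) if len(argv) > 1 else Path(".")
    failures = run_gates(repo)
    for failure in failures:
        print(f"GATE FAILED: {failure}", file=sys.stderr)
    return 1 if failures else 0
```

Saved as, say, `gate.py` with an `if __name__ == "__main__": sys.exit(main(sys.argv))` guard, it can run as a pipeline step (`python gate.py .`). Any non-zero exit turns the build red, and the printed lines tell the developer exactly which file and line to fix.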
Pro Tip: This approach seems harsh, but it often leads to the best results. It removes ambiguity. The pipeline becomes the single source of truth for what is acceptable in production. It stops being a personal argument between you and the developer and becomes a simple, impersonal “the build is red.” The developer gets instant feedback and learns what’s required to make it green.
Ultimately, the goal is collaboration, not confrontation. By setting clear expectations, providing tools, and automating enforcement, you can turn the dreaded “it works on my machine” into a reliable, repeatable “it’s live in production.”
🤖 Frequently Asked Questions
❓ What is the fundamental disconnect between developers and operations regarding ‘production-ready’ code?
Developers typically define ‘done’ by feature functionality, while operations focuses on reliability, security, observability, scalability, and deployability, leading to code that works on a developer’s machine but isn’t ready for production environments.
❓ How do the ‘Wrapper’ and ‘Definition of Done’ approaches compare for achieving production readiness?
The ‘Wrapper’ approach is a quick, temporary fix that encapsulates non-compliant code with production-safe configurations, incurring technical debt. The ‘Definition of Done’ approach is a permanent, collaborative solution that integrates operational requirements into the development workflow, shifting responsibility left.
❓ What is a common implementation pitfall when developers provide ‘production-ready’ code, and how can it be addressed?
A common pitfall is hardcoding sensitive information like AWS keys or using unpinned base image versions (e.g., `node:latest`). This can be addressed by enforcing configuration via environment variables, utilizing secure secret management systems via smart entrypoint scripts, and implementing CI/CD pipeline gates for static analysis and security scanning.