Step-by-step guide to building a zero‑downtime CI/CD pipeline on AWS with GitHub Actions
You’ve probably heard the phrase “zero‑downtime deployment” tossed around in meetings, and you know it sounds great—but how do you actually make it happen? In today’s fast‑moving cloud world, a single hiccup can cost users trust and revenue. That’s why I’m sharing a practical, no‑fluff walk‑through that gets your code from GitHub to a live AWS service without ever taking the site offline.
Why zero‑downtime matters right now
Most modern apps run 24/7, and users expect updates to be invisible. A broken release can trigger a cascade of alerts, angry tickets, and sleepless nights for the on‑call team. Building a pipeline that updates safely, rolls back automatically, and keeps traffic flowing is no longer a nice‑to‑have—it’s a baseline expectation.
High‑level picture
Before we dive into commands, let’s sketch the flow:
- Code push – Developer pushes to GitHub.
- GitHub Actions – Runs tests, builds Docker image, pushes to ECR.
- Blue/Green deployment – New version is launched in a separate environment (the “green” stack) while the current version (the “blue” stack) continues serving traffic.
- Traffic shift – Using the Application Load Balancer (ALB) listener, we move users to the green stack.
- Verification – Health checks confirm the new version works.
- Cutover or rollback – If everything is green, we retire the old stack; otherwise we roll back.
That pattern gives us true zero‑downtime because the old stack never goes away until the new one is proven healthy.
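As a rough sketch of that control flow (not the actual pipeline), the sequence can be written as a small shell function. Every function name here is hypothetical; in a real pipeline each hook would wrap the AWS CLI calls from the steps that follow.

```shell
# Hypothetical hooks standing in for real AWS CLI calls.
deploy_green()  { echo "deploy green"; }
shift_traffic() { echo "shift traffic to $1"; }
green_healthy() { return 0; }   # replace with a real health check
scale_down()    { echo "scale $1 to zero"; }

release() {
  deploy_green || return 1
  shift_traffic green
  if green_healthy; then
    scale_down blue            # cutover: the old stack is retired last
  else
    shift_traffic blue         # rollback: blue was never stopped
    scale_down green
    return 1
  fi
}

release
```

The point of the ordering is that blue is only scaled down on the success path, so the rollback branch has a running stack to return to.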
Prerequisites
- An AWS account with permissions to create ECR repositories, ECS clusters, ALB, and IAM roles.
- A GitHub repository with your application code.
- Basic familiarity with Docker, ECS (Elastic Container Service), and GitHub Actions.
- The AWS CLI installed locally (optional but handy for testing).
Step 1: Set up an ECR repository
ECR (Elastic Container Registry) is AWS’s private Docker registry. It stores the images your pipeline will build.
aws ecr create-repository --repository-name my-app --region us-east-1
Take note of the repository URI that the command returns – you’ll need it in the GitHub Actions workflow.
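The URI follows a fixed pattern, so it can also be assembled from the account ID, region, and repository name. A minimal sketch; the helper name ecr_repo_uri and the account ID 123456789012 are illustrative placeholders.

```shell
# Hypothetical helper: assemble an ECR repository URI from its parts.
ecr_repo_uri() {
  # $1 = AWS account ID, $2 = region, $3 = repository name
  printf '%s.dkr.ecr.%s.amazonaws.com/%s\n' "$1" "$2" "$3"
}

# 123456789012 is a placeholder account ID
ecr_repo_uri 123456789012 us-east-1 my-app
```

To read the value from AWS instead of building it, `aws ecr describe-repositories --repository-names my-app --query 'repositories[0].repositoryUri' --output text --region us-east-1` should print the same string for an existing repository.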
Step 2: Create an ECS cluster and two services (blue & green)
We’ll use the “blue/green” model inside a single ECS cluster.
aws ecs create-cluster --cluster-name my-app-cluster --region us-east-1
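Now define two services, each pointing to the same task definition but with different names. Each service is attached to its target group through --load-balancers, so create the ALB, target groups, and listener from Step 3 first (ECS rejects a target group that is not attached to a load balancer) and substitute the real ARNs. The containerPort=80 value assumes the container listens on port 80:
aws ecs create-service \
--cluster my-app-cluster \
--service-name my-app-blue \
--task-definition my-app-task \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-abc123],securityGroups=[sg-xyz789],assignPublicIp=ENABLED}" \
--load-balancers "targetGroupArn=<tg-blue-arn>,containerName=my-app,containerPort=80" \
--region us-east-1
aws ecs create-service \
--cluster my-app-cluster \
--service-name my-app-green \
--task-definition my-app-task \
--desired-count 0 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-abc123],securityGroups=[sg-xyz789],assignPublicIp=ENABLED}" \
--load-balancers "targetGroupArn=<tg-green-arn>,containerName=my-app,containerPort=80" \
--region us-east-1
The blue service starts with traffic; the green service stays at zero tasks until we’re ready to switch.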
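Both services reference my-app-task, which has to be registered before create-service will accept it. Here is a minimal sketch of a Fargate task definition, written to the ecs-task-def.json file that the workflow in Step 4 renders. The CPU and memory sizes, the container port, the account ID, and the execution role name are all assumptions to replace with your own values.

```shell
# Write a minimal Fargate task definition to ecs-task-def.json.
# Assumed values: 256 CPU units / 512 MiB, container port 80, and the
# placeholder account ID and role name in executionRoleArn.
cat > ecs-task-def.json <<'EOF'
{
  "family": "my-app-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "my-app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }]
    }
  ]
}
EOF

# Register it once so create-service can find my-app-task (needs AWS credentials):
# aws ecs register-task-definition --cli-input-json file://ecs-task-def.json --region us-east-1
```

The container name my-app matters: it has to match the container-name the workflow passes to the render step.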
Step 3: Hook an Application Load Balancer (ALB)
Create an ALB that forwards to both services. The trick is to use target groups—one for blue, one for green.
aws elbv2 create-load-balancer \
--name my-app-alb \
--subnets subnet-abc123 subnet-def456 \
--security-groups sg-xyz789 \
--region us-east-1
aws elbv2 create-target-group \
--name tg-blue \
--protocol HTTP \
--port 80 \
--vpc-id vpc-123abc \
--target-type ip \
--region us-east-1
aws elbv2 create-target-group \
--name tg-green \
--protocol HTTP \
--port 80 \
--vpc-id vpc-123abc \
--target-type ip \
--region us-east-1
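ECS registers each service’s tasks with its target group for you, as long as the service was created with a --load-balancers mapping to that group (tg-blue for my-app-blue, tg-green for my-app-green); you don’t register task IPs by hand. Keep in mind that ECS only accepts a target group that is already attached to a load balancer, and the ALB only runs health checks on attached groups, so tg-green needs a listener or rule of its own, for example a test listener on a separate port.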
Create a listener that forwards to tg-blue by default:
aws elbv2 create-listener \
--load-balancer-arn <alb-arn> \
--protocol HTTP \
--port 80 \
--default-actions Type=forward,TargetGroupArn=<tg-blue-arn> \
--region us-east-1
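The angle-bracket placeholders in these commands stand for ARNs that AWS generates. One way to look them up by name is sketched below; the helper names are hypothetical, and listener_arn assumes the ALB has exactly one listener.

```shell
# Hypothetical helpers: look up ARNs by resource name (requires AWS credentials).
alb_arn() {
  aws elbv2 describe-load-balancers --names "$1" \
    --query 'LoadBalancers[0].LoadBalancerArn' --output text --region us-east-1
}
tg_arn() {
  aws elbv2 describe-target-groups --names "$1" \
    --query 'TargetGroups[0].TargetGroupArn' --output text --region us-east-1
}
listener_arn() {
  # $1 = load balancer ARN; assumes the ALB has a single listener
  aws elbv2 describe-listeners --load-balancer-arn "$1" \
    --query 'Listeners[0].ListenerArn' --output text --region us-east-1
}

# Example (commented out so the snippet can be sourced without AWS access):
# TG_BLUE_ARN=$(tg_arn tg-blue)
# LISTENER_ARN=$(listener_arn "$(alb_arn my-app-alb)")
```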
Step 4: Write the GitHub Actions workflow
Create .github/workflows/deploy.yml in your repo.
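name: CI/CD Pipeline
on:
  push:
    branches: [ main ]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Log in to Amazon ECR
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push Docker image
        env:
          ECR_REGISTRY: ${{ secrets.ECR_REGISTRY }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t "$ECR_REGISTRY:latest" .
          docker tag "$ECR_REGISTRY:latest" "$ECR_REGISTRY:$IMAGE_TAG"
          docker push "$ECR_REGISTRY:$IMAGE_TAG"
          docker push "$ECR_REGISTRY:latest"
      - name: Update ECS task definition
        id: taskdef
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: ecs-task-def.json
          container-name: my-app
          image: ${{ secrets.ECR_REGISTRY }}:${{ github.sha }}
      - name: Register task definition
        id: register
        run: |
          # The render step only writes a JSON file; register it to get a usable ARN
          ARN=$(aws ecs register-task-definition \
            --cli-input-json "file://${{ steps.taskdef.outputs.task-definition }}" \
            --query 'taskDefinition.taskDefinitionArn' --output text \
            --region us-east-1)
          echo "arn=$ARN" >> "$GITHUB_OUTPUT"
      - name: Deploy green service
        run: |
          aws ecs update-service \
            --cluster my-app-cluster \
            --service my-app-green \
            --task-definition ${{ steps.register.outputs.arn }} \
            --desired-count 2 \
            --region us-east-1
          # Block until the green tasks are running and the service is steady,
          # so the listener is never pointed at an empty target group
          aws ecs wait services-stable \
            --cluster my-app-cluster \
            --services my-app-green \
            --region us-east-1
      - name: Shift traffic to green
        run: |
          # Change ALB listener to point to green target group
          aws elbv2 modify-listener \
            --listener-arn <listener-arn> \
            --default-actions Type=forward,TargetGroupArn=<tg-green-arn> \
            --region us-east-1
      - name: Wait for green health checks
        id: verify
        run: |
          # Simple loop that checks target health
          for i in {1..30}; do
            STATES=$(aws elbv2 describe-target-health --target-group-arn <tg-green-arn> --region us-east-1 \
              --query 'TargetHealthDescriptions[*].TargetHealth.State' --output text)
            # One state per target comes back on a single line, so check each one
            ALL_HEALTHY=true
            for s in $STATES; do
              [[ "$s" == "healthy" ]] || ALL_HEALTHY=false
            done
            if [[ -n "$STATES" && "$ALL_HEALTHY" == "true" ]]; then
              echo "Green service is healthy"
              exit 0
            fi
            echo "Waiting for green service..."
            sleep 10
          done
          echo "Green service failed health checks"
          exit 1
      - name: Roll back to blue
        if: failure() && steps.verify.outcome == 'failure'
        run: |
          # Blue was never touched, so pointing the listener back restores the old version
          aws elbv2 modify-listener \
            --listener-arn <listener-arn> \
            --default-actions Type=forward,TargetGroupArn=<tg-blue-arn> \
            --region us-east-1
          aws ecs update-service \
            --cluster my-app-cluster \
            --service my-app-green \
            --desired-count 0 \
            --region us-east-1
      - name: Update blue service to the new version
        if: success()
        run: |
          # Green is serving traffic, so blue can be rolled to the new revision safely
          aws ecs update-service \
            --cluster my-app-cluster \
            --service my-app-blue \
            --task-definition ${{ steps.register.outputs.arn }} \
            --region us-east-1
          aws ecs wait services-stable \
            --cluster my-app-cluster \
            --services my-app-blue \
            --region us-east-1
      - name: Switch ALB back to blue (now new version)
        if: success()
        run: |
          aws elbv2 modify-listener \
            --listener-arn <listener-arn> \
            --default-actions Type=forward,TargetGroupArn=<tg-blue-arn> \
            --region us-east-1
      - name: Clean up old tasks
        if: success()
        run: |
          aws ecs update-service \
            --cluster my-app-cluster \
            --service my-app-green \
            --desired-count 0 \
            --region us-east-1
What the workflow does
- Checkout – pulls the latest code.
- AWS credentials – loads the access keys from the repository secrets so that later steps can call AWS.
- Docker build – creates an image and pushes it to ECR with both latest and a SHA tag.
- Task definition render and register – points the ECS task JSON at the new image and registers it as a new revision.
- Deploy green – spins up the green service with the new image and waits until it is stable.
- Traffic shift – tells the ALB to send traffic to the green target group.
- Health check loop – waits until every green task reports healthy.
- Rollback – runs only if the health check fails; points the listener back at blue and scales green down.
- Update blue – rolls the blue service to the new task definition while green serves traffic.
- Swap listener back – now the blue target group points at the new version, so future shifts will use it as the “blue” baseline.
- Cleanup – turns the green service off until the next release.
If a step fails before the traffic shift, the pipeline stops and the old (blue) version stays live. If the health check fails after the shift, the rollback step returns traffic to blue, which is still running the old version.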
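One detail of the health check is easy to get wrong: with two or more targets, describe-target-health with --output text prints one state per target on a single line, so the check has to look at every state instead of comparing the whole string to "healthy". A small sketch of that comparison as a reusable function; the name all_healthy is hypothetical.

```shell
# Succeeds only if at least one state is given and every state is "healthy".
# Input: the whitespace-separated states printed by describe-target-health
# with --output text, for example "healthy initial".
all_healthy() (
  states="$1"
  [ -n "$states" ] || exit 1
  for s in $states; do
    [ "$s" = "healthy" ] || exit 1
  done
  exit 0
)

all_healthy "healthy healthy" && echo "all targets healthy"
all_healthy "healthy initial" || echo "still waiting"
```

An empty result is treated as not healthy on purpose, because a target group with no registered targets cannot serve traffic.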
Step 5: Store secrets safely
In your GitHub repo settings, add the following secrets:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- ECR_REGISTRY – the URI of the ECR repo (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app)
Never hard‑code credentials; GitHub Actions will inject them at runtime.
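Step 6: Test it before merging
The workflow only triggers on pushes to main, so to try it from a feature branch, temporarily add that branch to the branches list (or add a workflow_dispatch trigger), run it against a sandbox environment, and watch the AWS console. Verify that: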
- The green service spins up with the new image.
- The ALB health checks turn green.
- Traffic moves without any 5xx errors.
A quick curl loop against the public ALB URL can surface any hiccups early.
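A sketch of such a loop is shown below. The function name probe_url is hypothetical, and it is stricter than "no 5xx": anything that is not a 2xx or 3xx response, including a connection failure, counts as a failure.

```shell
# Poll a URL and count responses that are not 2xx/3xx.
# Usage: probe_url <url> <attempts> [delay_seconds]
probe_url() (
  url="$1"; attempts="$2"; delay="${3:-1}"
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s silent, -o discard the body, -w print only the status code
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || code=000
    case "$code" in
      2??|3??) ;;
      *) failures=$((failures + 1)); echo "attempt $i: HTTP $code" ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "failures=$failures"
  [ "$failures" -eq 0 ]
)
```

Start it just before the deploy and let it run through the traffic shift, for example `probe_url http://<alb-dns-name>/ 120 1`; the final line reports how many requests failed.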
Common pitfalls and how to avoid them
| Issue | Why it happens | Fix |
|---|---|---|
| Health check timeout | Target group health check path returns 404 or takes too long | Make sure your app responds to /health quickly and returns 200 |
| IAM permission errors | The IAM identity used by GitHub Actions is missing ecs:UpdateService or the elasticloadbalancing:* actions it needs | Add the missing actions to the policy attached to that IAM user (or to the role, if you authenticate through GitHub OIDC) |
| Image not found | SHA tag not pushed or task definition still points at old tag | Confirm both latest and SHA tags are pushed; double‑check the task definition rendering step |
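For the IAM row, a starting-point policy might look like the sketch below. The file name pipeline-policy.json is hypothetical, the action list is an assumption based on the CLI calls and actions used in this guide, and "Resource": "*" is there only for brevity, so narrow it to your own ARNs before attaching it.

```shell
# Write a starting-point IAM policy for the pipeline's AWS identity.
# "Resource": "*" is used for brevity; scope it to your own ARNs before use.
cat > pipeline-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PushImages",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DeployServices",
      "Effect": "Allow",
      "Action": [
        "ecs:RegisterTaskDefinition",
        "ecs:DescribeServices",
        "ecs:UpdateService",
        "elasticloadbalancing:DescribeTargetHealth",
        "elasticloadbalancing:ModifyListener",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
EOF
```

iam:PassRole is included because registering a task definition that names an execution role requires permission to pass that role to ECS.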
A quick personal note
When I first tried blue/green on a legacy monolith, I spent an entire weekend chasing a stray environment variable that broke the green stack. The lesson? Keep your config as code and test it in the same container you’ll ship. That habit saved me countless late‑night fire drills later.
Wrap‑up
Zero‑downtime deployments feel like magic until you see the pieces line up: a clean Docker image, a well‑defined task, two identical services, and an ALB that can flip traffic like a light switch. With GitHub Actions handling the orchestration, you get repeatable, auditable releases that keep users blissfully unaware of any behind‑the‑scenes work.
Give this pipeline a spin on a sandbox account, tweak the health checks to match your app, and you’ll have a solid foundation for reliable, fast releases. The DevOps Chronicle will keep sharing more tips, so stay tuned for deeper dives into canary releases and automated rollback strategies.