Step-by-step guide to building a zero‑downtime CI/CD pipeline on AWS with GitHub Actions

You’ve probably heard the phrase “zero‑downtime deployment” tossed around in meetings, and it sounds great, but how do you actually make it happen? In a fast‑moving cloud world, a single botched release can cost you user trust and revenue. That’s why I’m sharing a practical, no‑fluff walk‑through that gets your code from GitHub to a live AWS service without ever taking the site offline.

Why zero‑downtime matters right now

Most modern apps run 24/7, and users expect updates to be invisible. A broken release can trigger a cascade of alerts, angry tickets, and sleepless nights for the on‑call team. Building a pipeline that updates safely, rolls back automatically, and keeps traffic flowing is no longer a nice‑to‑have; it’s a baseline expectation.

High‑level picture

Before we dive into commands, let’s sketch the flow:

Code push – Developer pushes to GitHub.
GitHub Actions – Builds the Docker image and pushes it to Amazon ECR (your test suite belongs in a step before the build).
Blue/Green deployment – The new version is launched in a separate environment (the “green” stack) while the current version (the “blue” stack) keeps serving traffic.
Verification – Health checks confirm the new version works before it receives any production traffic.
Traffic shift – The Application Load Balancer (ALB) listener moves users to the green stack. This guide flips all traffic at once; ALB weighted target groups or weighted Route 53 records can shift it gradually instead.
Cutover or rollback – If everything is green, we retire the old stack; otherwise we roll back.

The pattern delivers zero downtime because the old stack keeps serving traffic until the new one is proven healthy.

Prerequisites

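An AWS account with permissions to create ECR repositories, ECS clusters and services, Application Load Balancers, and IAM roles.
A VPC with at least two subnets in different Availability Zones (an ALB requires two).
A GitHub repository with your application code and a Dockerfile at its root.
Basic familiarity with Docker, ECS (Elastic Container Service), and GitHub Actions.
The AWS CLI installed and configured locally; the setup steps below use it (the AWS console works too).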

Step 1: Set up an ECR repository

ECR (Elastic Container Registry) is AWS’s private Docker registry. It stores the images your pipeline will build.

aws ecr create-repository --repository-name my-app --region us-east-1

Take note of the repository URI that the command returns – you’ll need it in the GitHub Actions workflow.
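If you misplace the URI, you can look it up again. The sketch below assumes the my-app repository in us-east-1 from above; repo_uri and registry_host are illustrative names, not part of any AWS tool. repo_uri queries ECR, and registry_host strips the repository path, leaving the registry host that docker login expects.

```bash
#!/usr/bin/env bash
# Illustrative helpers; the function names are not part of any AWS tool.

# Print the repository URI of an existing ECR repository.
# Usage: repo_uri <repository-name> <region>
repo_uri() {
  aws ecr describe-repositories \
    --repository-names "$1" \
    --region "$2" \
    --query 'repositories[0].repositoryUri' \
    --output text
}

# Drop everything from the first "/" onward, leaving the registry host:
#   123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app
#   -> 123456789012.dkr.ecr.us-east-1.amazonaws.com
registry_host() {
  printf '%s\n' "${1%%/*}"
}

# Example (needs AWS credentials and Docker):
#   uri=$(repo_uri my-app us-east-1)
#   aws ecr get-login-password --region us-east-1 \
#     | docker login --username AWS --password-stdin "$(registry_host "$uri")"
```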

Step 2: Create an ECS cluster and two services (blue & green)

We’ll use the “blue/green” model inside a single ECS cluster.

aws ecs create-cluster --cluster-name my-app-cluster --region us-east-1
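Both services below reference a task definition family called my-app-task, so it has to be registered before you create them. Here is a minimal sketch, not a production definition: it assumes a container named my-app listening on port 80, 0.25 vCPU with 512 MiB, and an existing task execution role that can pull from ECR. The role ARN you pass in and the function names are placeholders. Committing the generated file as ecs-task-def.json also gives the Step 4 workflow the template it renders.

```bash
#!/usr/bin/env bash
# Sketch: write a minimal Fargate task definition and register it.
# Assumptions (adjust to your app): container "my-app" on port 80,
# 256 CPU units / 512 MiB, and an existing ECS task execution role.

# Usage: write_task_def <file> <image-uri> <execution-role-arn>
write_task_def() {
  local file="$1" image="$2" exec_role="$3"
  cat > "$file" <<EOF
{
  "family": "my-app-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "${exec_role}",
  "containerDefinitions": [
    {
      "name": "my-app",
      "image": "${image}",
      "essential": true,
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }]
    }
  ]
}
EOF
}

# Register the file as a new revision of my-app-task and print its ARN.
# Usage: register_task_def <file>
register_task_def() {
  aws ecs register-task-definition \
    --cli-input-json "file://$1" \
    --region us-east-1 \
    --query 'taskDefinition.taskDefinitionArn' \
    --output text
}

# Example:
#   write_task_def ecs-task-def.json "<repository-uri>:latest" "<execution-role-arn>"
#   register_task_def ecs-task-def.json
```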

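Now define two services. Both run the my-app-task task definition family, which must be registered before you create them, and each one registers its tasks with its own target group through --load-balancers. The target groups come from Step 3, so complete that step first and substitute their ARNs below. The container name and port have to match the task definition; this guide uses my-app and 80 throughout.

aws ecs create-service \
  --cluster my-app-cluster \
  --service-name my-app-blue \
  --task-definition my-app-task \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc123],securityGroups=[sg-xyz789],assignPublicIp=ENABLED}" \
  --load-balancers "targetGroupArn=<tg-blue-arn>,containerName=my-app,containerPort=80" \
  --region us-east-1

aws ecs create-service \
  --cluster my-app-cluster \
  --service-name my-app-green \
  --task-definition my-app-task \
  --desired-count 0 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc123],securityGroups=[sg-xyz789],assignPublicIp=ENABLED}" \
  --load-balancers "targetGroupArn=<tg-green-arn>,containerName=my-app,containerPort=80" \
  --region us-east-1

The blue service starts with two tasks and takes production traffic; the green service stays at zero tasks until a release needs it.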

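Step 3: Set up an Application Load Balancer (ALB)

Because the ECS services register their tasks with this load balancer's target groups, set it up before creating the services in Step 2. The ALB reaches the services through two target groups, one for blue and one for green, and the listener decides which of them gets production traffic. ECS only attaches a service to a target group that is already associated with a load balancer, so tg-green needs a listener (or listener rule) of its own as well.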

aws elbv2 create-load-balancer \
  --name my-app-alb \
  --subnets subnet-abc123 subnet-def456 \
  --security-groups sg-xyz789 \
  --region us-east-1

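Then create one target group per color. Both health-check /health, so your app should answer that path quickly with a 200 (change the path if yours differs):

aws elbv2 create-target-group \
  --name tg-blue \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-123abc \
  --target-type ip \
  --health-check-path /health \
  --region us-east-1

aws elbv2 create-target-group \
  --name tg-green \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-123abc \
  --target-type ip \
  --health-check-path /health \
  --region us-east-1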

Create a listener that forwards to tg-blue by default:

aws elbv2 create-listener \
  --load-balancer-arn <alb-arn> \
  --protocol HTTP \
  --port 80 \
  --default-actions Type=forward,TargetGroupArn=<tg-blue-arn> \
  --region us-east-1
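Until it sits behind a listener, tg-green cannot be attached to the green service. One option, sketched here, is a second "test" listener that forwards to tg-green; once green is running, it also lets you reach the green stack directly on that port before it takes production traffic. Port 8080 is an assumption (sg-xyz789 must allow it), the ARNs are placeholders, and create_test_listener is an illustrative wrapper.

```bash
#!/usr/bin/env bash
# Sketch: a second listener on a test port that forwards to tg-green.
# Port 8080 is an assumption; sg-xyz789 must allow inbound traffic on it.
# Usage: create_test_listener <alb-arn> <tg-green-arn>
create_test_listener() {
  aws elbv2 create-listener \
    --load-balancer-arn "$1" \
    --protocol HTTP \
    --port 8080 \
    --default-actions "Type=forward,TargetGroupArn=$2" \
    --region us-east-1
}

# Example:
#   create_test_listener "<alb-arn>" "<tg-green-arn>"
```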

Step 4: Write the GitHub Actions workflow

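Create .github/workflows/deploy.yml in your repo. The workflow expects your my-app-task task definition JSON to be committed as ecs-task-def.json at the repository root, and the placeholder ARNs in the env block must be replaced with the real values from Step 3.

name: CI/CD Pipeline

on:
  push:
    branches: [ main ]
  workflow_dispatch:

env:
  AWS_REGION: us-east-1
  CLUSTER: my-app-cluster
  # Placeholders: replace with the ARNs from Step 3
  LISTENER_ARN: "<listener-arn>"
  TG_BLUE_ARN: "<tg-blue-arn>"
  TG_GREEN_ARN: "<tg-green-arn>"

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to Amazon ECR
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push Docker image
        env:
          ECR_REGISTRY: ${{ secrets.ECR_REGISTRY }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t "$ECR_REGISTRY:$IMAGE_TAG" -t "$ECR_REGISTRY:latest" .
          docker push "$ECR_REGISTRY:$IMAGE_TAG"
          docker push "$ECR_REGISTRY:latest"

      - name: Render task definition with the new image
        id: taskdef
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: ecs-task-def.json
          container-name: my-app
          image: ${{ secrets.ECR_REGISTRY }}:${{ github.sha }}

      - name: Register task definition
        id: register
        # Without a "service" input, this action only registers the task definition
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.taskdef.outputs.task-definition }}

      - name: Deploy green service
        run: |
          aws ecs update-service \
            --cluster "$CLUSTER" \
            --service my-app-green \
            --task-definition "${{ steps.register.outputs.task-definition-arn }}" \
            --desired-count 2 \
            --region "$AWS_REGION" > /dev/null
          aws ecs wait services-stable --cluster "$CLUSTER" --services my-app-green --region "$AWS_REGION"

      - name: Wait for green health checks
        run: |
          # Pass only when tg-green has targets and every one of them is healthy
          for i in {1..30}; do
            STATES=$(aws elbv2 describe-target-health \
              --target-group-arn "$TG_GREEN_ARN" \
              --region "$AWS_REGION" \
              --query 'TargetHealthDescriptions[*].TargetHealth.State' \
              --output text)
            if [[ -n "$STATES" ]] && ! tr -s '\t ' '\n' <<< "$STATES" | grep -qvx 'healthy'; then
              echo "Green service is healthy"
              exit 0
            fi
            echo "Waiting for green service... (states: $STATES)"
            sleep 10
          done
          echo "Green service failed health checks; production traffic stays on blue"
          exit 1

      - name: Shift traffic to green
        run: |
          aws elbv2 modify-listener \
            --listener-arn "$LISTENER_ARN" \
            --default-actions "Type=forward,TargetGroupArn=$TG_GREEN_ARN" \
            --region "$AWS_REGION" > /dev/null

      - name: Update blue service to the new version
        run: |
          aws ecs update-service \
            --cluster "$CLUSTER" \
            --service my-app-blue \
            --task-definition "${{ steps.register.outputs.task-definition-arn }}" \
            --region "$AWS_REGION" > /dev/null
          aws ecs wait services-stable --cluster "$CLUSTER" --services my-app-blue --region "$AWS_REGION"

      - name: Switch ALB back to blue (now the new version)
        run: |
          aws elbv2 modify-listener \
            --listener-arn "$LISTENER_ARN" \
            --default-actions "Type=forward,TargetGroupArn=$TG_BLUE_ARN" \
            --region "$AWS_REGION" > /dev/null

      - name: Scale green back down
        run: |
          aws ecs update-service \
            --cluster "$CLUSTER" \
            --service my-app-green \
            --desired-count 0 \
            --region "$AWS_REGION" > /dev/null

What the workflow does

Checkout – pulls the latest code.
Configure AWS credentials – loads the access keys you add in Step 5 and sets the region for every later AWS call.
Docker build – builds the image and pushes it to ECR with both a SHA tag and latest.
Task definition render and register – writes the new image into ecs-task-def.json and registers the result as a new revision of my-app-task.
Deploy green – points the green service at that revision, scales it to two tasks, and waits until ECS reports the service stable.
Health check loop – waits until every target in tg-green reports healthy; production traffic has not moved yet.
Traffic shift – tells the ALB listener to send production traffic to tg-green.
Update blue – rolls the blue service onto the same new revision while green carries the traffic.
Swap listener back – production traffic returns to tg-blue, which now runs the new version, so blue remains the baseline for the next release.
Scale green down – turns the green service off until the next release.

If green never becomes healthy, the pipeline stops before the traffic shift, so blue keeps serving the previous version and users are unaffected; that is the built‑in rollback. A failure after the shift is different: green (the new version) is already live, and the workflow does not move the listener back on its own, so you need to fix forward or roll back by hand.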
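For that manual case, the quickest rollback is to make sure blue has capacity, point the listener back at tg-blue, and scale green down. This sketch assumes the blue service still runs the previous task definition (true if the failure happened before blue was updated) and takes placeholder ARNs as arguments; rollback_to_blue is a hypothetical helper, not part of the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical manual rollback helper; assumes blue still runs the previous version.
# Usage: rollback_to_blue <listener-arn> <tg-blue-arn>
rollback_to_blue() {
  local listener_arn="$1" tg_blue_arn="$2"

  # 1. Make sure blue has tasks to serve traffic (it may have been scaled down)
  aws ecs update-service \
    --cluster my-app-cluster \
    --service my-app-blue \
    --desired-count 2 \
    --region us-east-1 > /dev/null || return 1
  aws ecs wait services-stable \
    --cluster my-app-cluster \
    --services my-app-blue \
    --region us-east-1 || return 1

  # 2. Send production traffic back to the blue target group
  aws elbv2 modify-listener \
    --listener-arn "$listener_arn" \
    --default-actions "Type=forward,TargetGroupArn=${tg_blue_arn}" \
    --region us-east-1 > /dev/null || return 1

  # 3. Stop green; ECS deregisters its tasks from tg-green, which drains in-flight requests
  aws ecs update-service \
    --cluster my-app-cluster \
    --service my-app-green \
    --desired-count 0 \
    --region us-east-1 > /dev/null
}

# Example:
#   rollback_to_blue "<listener-arn>" "<tg-blue-arn>"
```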

Step 5: Store secrets safely

In your GitHub repo settings, add the following secrets:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
ECR_REGISTRY – the URI of the ECR repo (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app)
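If you have the GitHub CLI installed and authenticated, you can set the same three secrets from a terminal instead of the settings page. This sketch assumes you run it inside a clone of the repository after exporting the three values as environment variables; set_pipeline_secrets is an illustrative name. Piping each value into gh secret set keeps it off the command line.

```bash
#!/usr/bin/env bash
# Illustrative helper: run inside a clone of the repo with the GitHub CLI authenticated.
# Export AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and ECR_REGISTRY before calling it.
set_pipeline_secrets() {
  local names=(AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY ECR_REGISTRY) name

  # Check everything first so a missing value doesn't leave secrets half-updated.
  # ${!name} is bash indirect expansion: the value of the variable called $name.
  for name in "${names[@]}"; do
    if [[ -z "${!name:-}" ]]; then
      echo "Missing environment variable: $name" >&2
      return 1
    fi
  done

  # With no --body flag, gh reads the secret value from standard input
  for name in "${names[@]}"; do
    printf '%s' "${!name}" | gh secret set "$name" || return 1
  done
}
```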

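Never hard‑code credentials; GitHub Actions injects secrets at runtime and masks them in logs. If you would rather avoid long‑lived keys, aws-actions/configure-aws-credentials can instead assume an IAM role through GitHub’s OIDC provider (the role-to-assume input, plus id-token: write permission on the job), in which case the two key secrets aren’t needed.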

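Step 6: Test it before merging

Before you push to main, run the workflow from a feature branch and watch the AWS console. The push trigger only fires for main, so either add your branch to the branches list temporarily or give the workflow a workflow_dispatch trigger, which lets you start a run manually from the Actions tab and choose the branch. The run deploys to the real cluster and load balancer, so use a sandbox account. Verify that: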

The green service spins up with the new image.
The ALB health checks turn green.
Traffic moves without any 5xx errors.
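The first two checks can be scripted from a terminal. This sketch assumes the cluster, service, and container names used throughout this guide; green_image, all_targets_healthy, and wait_for_healthy are illustrative helpers. The health rule (at least one target, and every target healthy) is kept in its own function so it is easy to test; with two tasks the CLI prints two states, so comparing the whole output against the single word healthy would never match.

```bash
#!/usr/bin/env bash
# Illustrative verification helpers for the green stack.

# Print the image of the "my-app" container in green's current task definition.
green_image() {
  local td
  td=$(aws ecs describe-services \
    --cluster my-app-cluster \
    --services my-app-green \
    --region us-east-1 \
    --query 'services[0].taskDefinition' \
    --output text) || return 1
  aws ecs describe-task-definition \
    --task-definition "$td" \
    --region us-east-1 \
    --query "taskDefinition.containerDefinitions[?name=='my-app'].image" \
    --output text
}

# Succeed only if at least one state was given and every state is "healthy".
# Input: whitespace-separated states, as printed by describe-target-health --output text.
all_targets_healthy() {
  local states="$1" state
  [[ -n "${states//[[:space:]]/}" ]] || return 1
  for state in $states; do   # word-split on purpose: one state per target
    [[ "$state" == "healthy" ]] || return 1
  done
  return 0
}

# Poll a target group until all of its targets are healthy.
# Usage: wait_for_healthy <tg-arn> [attempts] [delay-seconds]
wait_for_healthy() {
  local tg_arn="$1" attempts="${2:-30}" delay="${3:-10}" i states
  for ((i = 1; i <= attempts; i++)); do
    states=$(aws elbv2 describe-target-health \
      --target-group-arn "$tg_arn" \
      --region us-east-1 \
      --query 'TargetHealthDescriptions[*].TargetHealth.State' \
      --output text) || return 1
    if all_targets_healthy "$states"; then
      echo "All targets healthy"
      return 0
    fi
    echo "Attempt $i/$attempts: ${states:-no targets}"
    sleep "$delay"
  done
  return 1
}

# Example: confirm green runs the commit you just pushed, then wait for health
#   [[ "$(green_image)" == *":$(git rev-parse HEAD)" ]] && echo "green runs HEAD"
#   wait_for_healthy "<tg-green-arn>"
```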

A quick curl loop against the public ALB URL can surface any hiccups early.
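A minimal version of that loop, assuming curl is installed; probe_alb is an illustrative name. Run it in a second terminal while the pipeline shifts traffic: it sends one request per second and counts 5xx responses and failed connections (curl reports those as status 000).

```bash
#!/usr/bin/env bash
# Illustrative traffic probe; run it in a second terminal during a deployment.
# Usage: probe_alb <url> [requests] [delay-seconds]
probe_alb() {
  local url="$1" requests="${2:-60}" delay="${3:-1}" i code failures=0
  for ((i = 1; i <= requests; i++)); do
    # -w prints only the status code; curl reports 000 when no HTTP response arrived
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)
    if [[ -z "$code" || "$code" == 5* || "$code" == 000 ]]; then
      failures=$((failures + 1))
      echo "$(date +%T) request $i -> ${code:-none}" >&2
    fi
    sleep "$delay"
  done
  echo "$failures of $requests requests failed"
  # Exit status is 0 only if no request failed
  [[ "$failures" -eq 0 ]]
}

# Example:
#   dns=$(aws elbv2 describe-load-balancers --names my-app-alb \
#           --region us-east-1 --query 'LoadBalancers[0].DNSName' --output text)
#   probe_alb "http://$dns/health" 120 1
```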

Common pitfalls and how to avoid them

Issue	Why it happens	Fix
Health check timeout	Target group health check path returns 404 or takes too long	Make sure your app responds to `/health` quickly and returns 200
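IAM permission errors	The IAM identity GitHub Actions uses is missing actions such as `ecs:RegisterTaskDefinition`, `ecs:UpdateService`, `iam:PassRole` (for the task execution role), or `elasticloadbalancing:*`	Add the missing actions to the policy attached to that IAM user, or to the role if you authenticate through GitHub OIDC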
Image not found	SHA tag not pushed or task definition still points at old tag	Confirm both `latest` and SHA tags are pushed; double‑check the task definition rendering step
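Two diagnostics for the first two rows, sketched with illustrative function names: unhealthy_reasons lists why targets in a group are failing (the Reason and Description fields of describe-target-health), and missing_actions uses the IAM policy simulator to report which of a representative set of pipeline actions the GitHub Actions identity is not allowed to perform. The action list is an assumption, not a complete policy, and running the simulator requires iam:SimulatePrincipalPolicy.

```bash
#!/usr/bin/env bash
# Illustrative diagnostics; ARNs are passed in by the caller.

# Show target ID, state, reason, and description for every target that is not healthy.
# Usage: unhealthy_reasons <tg-arn>
unhealthy_reasons() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --region us-east-1 \
    --query 'TargetHealthDescriptions[?TargetHealth.State!=`healthy`].[Target.Id,TargetHealth.State,TargetHealth.Reason,TargetHealth.Description]' \
    --output table
}

# Print the actions (from a representative subset) the principal is NOT allowed to call.
# Usage: missing_actions <iam-user-or-role-arn>
missing_actions() {
  aws iam simulate-principal-policy \
    --policy-source-arn "$1" \
    --action-names \
      ecr:GetAuthorizationToken ecr:PutImage \
      ecs:RegisterTaskDefinition ecs:UpdateService ecs:DescribeServices \
      elasticloadbalancing:ModifyListener elasticloadbalancing:DescribeTargetHealth \
      iam:PassRole \
    --query 'EvaluationResults[?EvalDecision!=`allowed`].EvalActionName' \
    --output text
}
```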

A quick personal note

When I first tried blue/green on a legacy monolith, I spent an entire weekend chasing a stray environment variable that broke the green stack. The lesson? Keep your config as code and test it in the same container you’ll ship. That habit saved me countless late‑night fire drills later.

Wrap‑up

Zero‑downtime deployments feel like magic until you see the pieces line up: a clean Docker image, a well‑defined task, two identical services, and an ALB that can flip traffic like a light switch. With GitHub Actions handling the orchestration, you get repeatable, auditable releases that keep users blissfully unaware of any behind‑the‑scenes work.

Give this pipeline a spin on a sandbox account, tweak the health checks to match your app, and you’ll have a solid foundation for reliable, fast releases. The DevOps Chronicle will keep sharing more tips, so stay tuned for deeper dives into canary releases, automated rollback strategies, and automating Kubernetes deployments with ArgoCD.