Python Microservices Made Simple: Deploying Flask Services with Docker and Kubernetes

If you’ve ever tried to squeeze a monolithic Django app onto a tiny VM and watched it sputter, you know why the microservice buzz feels like a breath of fresh air. In 2024 the cloud is a playground, and the tools to turn a single Flask endpoint into a resilient, auto‑scaling service are finally mature enough that you don’t need a PhD in distributed systems to get it right. For a practical walkthrough, see how to deploy a scalable Flask app with Docker and GitHub Actions.

Why Microservices Matter Right Now

The old “one‑code‑base‑to‑rule‑them‑all” approach is still hanging around in legacy shops, but the cost of that model is showing up in slower deployments, tangled dependencies, and a never‑ending battle with “it works on my machine.” Microservices let you isolate concerns, pick the right language or framework for each job, and—crucially—scale components independently. That means your user‑facing API can spin up more pods while a background worker stays snug on a single node, saving you both money and headaches.

The Flask Sweet Spot

Flask is the lightweight cousin of Django. It gives you just enough scaffolding to spin up a REST endpoint without the baggage of an ORM, admin panel, or built-in authentication system. For a microservice that does one thing, say calculating a shipping quote or validating a coupon, Flask's minimalism translates to faster build times and a smaller attack surface. Flask is also well covered in Docker and Kubernetes tutorials, so you'll find plenty of examples to lean on.

Dockerizing Your First Flask Service

Before you can hand your code over to Kubernetes, you need a container image that runs the same way everywhere. Here's a no-frills Dockerfile that uses a slim base image to keep the image small:

# Use the official lightweight Python image
FROM python:3.11-slim

# Set a working directory
WORKDIR /app

# Install only the runtime dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the source code
COPY . .

# Expose the port Flask will run on
EXPOSE 5000

# Serve the app with gunicorn (use Flask's built-in server only for local development)
CMD ["gunicorn", "-w", "3", "-b", "0.0.0.0:5000", "app:app"]

A few things to note:

Slim base image – python:3.11-slim strips out unnecessary OS packages, keeping the image small.
--no-cache-dir – prevents pip from storing the download cache inside the image, shaving off a few megabytes.
Gunicorn – a production-grade WSGI server that forks multiple workers (-w 3 starts three of them), giving you better concurrency than Flask's dev server. Binding to 0.0.0.0 rather than localhost makes it reachable through the port that docker run -p publishes.
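If you'd rather not bury server settings in the CMD line, gunicorn can also read a Python config file. The sketch below assumes a file named gunicorn.conf.py in the image's working directory (/app), which gunicorn loads by default; the PORT and WEB_CONCURRENCY environment variable names are our own convention for overriding the values from a Kubernetes manifest. Flags passed on the command line take precedence over this file, so drop -w and -b from CMD if you adopt it.

```python
# gunicorn.conf.py -- a minimal sketch of a gunicorn config file.
# Assumption: PORT and WEB_CONCURRENCY are our chosen env var names.
import os

# Bind to all interfaces so traffic from outside the container reaches us.
# 5000 matches EXPOSE and containerPort elsewhere in this article.
port = int(os.environ.get("PORT", "5000"))
bind = f"0.0.0.0:{port}"

# Same worker count as the original CMD (-w 3) unless overridden.
workers = int(os.environ.get("WEB_CONCURRENCY", "3"))

# "-" sends access and error logs to stdout/stderr, where `kubectl logs` sees them.
accesslog = "-"
errorlog = "-"

# Seconds a silent worker may hang before gunicorn kills and replaces it.
timeout = 30
```

With this file in place, the Dockerfile's last line can shrink to CMD ["gunicorn", "app:app"].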

Build the image with:

docker build -t shipping-quote:1.0 .

Run it locally to sanity‑check:

docker run -p 5000:5000 shipping-quote:1.0

If a POST request with a JSON body to http://localhost:5000/quote returns JSON, you're ready for the next step.
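If you prefer a scripted check to a browser or curl, here's a minimal smoke test that uses only the standard library. It assumes the /quote route accepts a POST with a JSON body like {"weight": 2.5}, matching the curl example in the walkthrough; the smoke_test name and the payload are illustrative.

```python
import json
import urllib.request


def smoke_test(base_url: str, payload: dict, timeout: float = 5.0) -> dict:
    """POST `payload` as JSON to <base_url>/quote and return the decoded reply.

    urllib raises urllib.error.HTTPError for non-2xx statuses, and json.loads
    raises json.JSONDecodeError (a ValueError) if the body isn't JSON.
    """
    request = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/quote",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)
```

While the container from the docker run command is up, call smoke_test("http://localhost:5000", {"weight": 2.5}) and inspect the returned dict.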

From Docker to Kubernetes – The Leap

Kubernetes (often shortened to “k8s”) is the orchestrator that turns a single container into a self‑healing, load‑balanced service. Think of it as a traffic cop that watches your pods, restarts any that crash, and spreads them across nodes for high availability.
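The "self-healing" part comes from a control loop: Kubernetes continually compares the desired state you declare with what is actually running and acts on the difference. The toy sketch below illustrates only that idea; it is not how the real Deployment controller is implemented, and the ToyReplicaController class and pod names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ToyReplicaController:
    """Toy model of a controller that keeps `desired` pods running."""

    desired: int
    pods: set = field(default_factory=set)
    _next_id: itertools.count = field(default_factory=itertools.count, repr=False)

    def reconcile(self) -> None:
        """Move observed state (self.pods) toward desired state (self.desired)."""
        # Too few pods (e.g. one crashed): start replacements.
        while len(self.pods) < self.desired:
            self.pods.add(f"shipping-quote-{next(self._next_id)}")
        # Too many pods (e.g. after scaling down): stop the extras.
        for name in sorted(self.pods)[self.desired:]:
            self.pods.discard(name)


controller = ToyReplicaController(desired=3)
controller.reconcile()
print(sorted(controller.pods))  # ['shipping-quote-0', 'shipping-quote-1', 'shipping-quote-2']

controller.pods.discard("shipping-quote-1")  # simulate a crashed pod
controller.reconcile()
print(sorted(controller.pods))  # ['shipping-quote-0', 'shipping-quote-2', 'shipping-quote-3']
```

The key design point is that the loop never asks "what happened?", only "how far is reality from the spec?", which is why a crash, a manual deletion, and a scale-down are all handled by the same code path.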

Deployment and Service Manifests

Kubernetes uses YAML files to describe the desired state. Below are two minimal manifests: one for the Deployment (which manages pod replicas) and one for the Service (which exposes the pods).

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: shipping-quote
spec:
  replicas: 3
  selector:
    matchLabels:
      app: shipping-quote
  template:
    metadata:
      labels:
        app: shipping-quote
    spec:
      containers:
        - name: shipping-quote
          image: shipping-quote:1.0
          ports:
            - containerPort: 5000
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: shipping-quote
spec:
  selector:
    app: shipping-quote
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
  type: LoadBalancer

A couple of quick explanations:

replicas: 3 – Kubernetes will keep three pods running. If one dies, another is spawned automatically.
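Resource limits – Setting CPU and memory caps prevents a runaway pod from hogging the node: a container that exceeds its CPU limit is throttled, while one that exceeds its memory limit is OOM-killed and restarted.
type: LoadBalancer – In a cloud environment this provisions an external IP that routes traffic to your service. On a local cluster you can swap it for NodePort and reach the service on the node's IP at the assigned node port (30000 to 32767 by default).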
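The units in the resources block are easy to misread: "500m" means 500 millicores (half a CPU core), and "256Mi" means 256 mebibytes (256 * 2^20 bytes), not megabytes. The parse_quantity helper below is a simplified sketch of how those strings convert to plain numbers; it handles only a common subset of suffixes and is not a full implementation of Kubernetes' quantity format (it ignores exponent notation such as 1e3, for example).

```python
from decimal import Decimal

# Simplified subset of Kubernetes quantity suffixes.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '500m' or '256Mi' to a plain number."""
    text = text.strip()
    # Check two-letter binary suffixes first so "Mi" isn't mistaken for "M".
    for suffix, factor in _BINARY.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    for suffix, factor in _DECIMAL.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)  # no suffix: plain cores or bytes


print(parse_quantity("500m"))   # 0.500 (CPU cores)
print(parse_quantity("256Mi"))  # 268435456 (bytes)
```

So the limits in deployment.yaml cap each pod at half a CPU core and 268,435,456 bytes of memory.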

Apply the manifests:

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

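You can watch the rollout with kubectl get pods -w, or block until it finishes with kubectl rollout status deployment/shipping-quote. When all three pods show Running with 1/1 in the READY column, your Flask microservice is live on the cluster.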
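For a deploy script or CI job, checking readiness programmatically beats eyeballing watch output. The sketch below parses the JSON printed by kubectl get pods -l app=shipping-quote -o json and counts pods whose Ready condition has status "True". The function names are our own, and fetch_pods assumes kubectl is on your PATH and configured for the right cluster.

```python
import json
import subprocess


def count_ready_pods(pod_list: dict) -> tuple:
    """Return (ready, total) for a `kubectl get pods -o json` document."""
    items = pod_list.get("items", [])
    ready = 0
    for pod in items:
        conditions = pod.get("status", {}).get("conditions", [])
        # A pod counts as ready when its "Ready" condition reports "True".
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            ready += 1
    return ready, len(items)


def fetch_pods(label_selector: str = "app=shipping-quote") -> dict:
    """Run kubectl and parse its JSON output (requires kubectl on PATH)."""
    output = subprocess.run(
        ["kubectl", "get", "pods", "-l", label_selector, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(output)
```

A deploy script might poll count_ready_pods(fetch_pods()) until the first number equals the replica count.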

You can also streamline the process with CI/CD pipelines; learn more in our guide on automating your development workflow with GitHub Actions.

A Quick End‑to‑End Walkthrough

Write the Flask endpoint – In app.py expose a /quote route that reads a JSON payload and returns a calculated price.
Pin dependencies – requirements.txt should contain only Flask and gunicorn for this example, each pinned to an exact version with == so every build installs the same code.
Dockerize – Use the Dockerfile above, build, and test locally.
Push to a registry – Tag the image with your Docker Hub or private registry name (docker tag shipping-quote:1.0 myrepo/shipping-quote:1.0) and push it (docker push myrepo/shipping-quote:1.0).
Update the Deployment – Change the image: field in deployment.yaml to the fully qualified registry name.
Deploy to k8s – Run the kubectl apply commands.
Verify – Grab the external IP (kubectl get svc shipping-quote) and curl the endpoint: curl http://<EXTERNAL_IP>/quote -d '{"weight":2.5}' -H "Content-Type: application/json".

If the response looks sane, congratulations—you just turned a single Python file into a cloud‑native microservice.

Common Pitfalls and How to Dodge Them

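Port mismatch – The port gunicorn binds to (-b 0.0.0.0:5000 in the Dockerfile's CMD) must match the containerPort in the Deployment and the targetPort in the Service. If you change it, update all three, plus the EXPOSE line, which documents the port but doesn't publish it.
Image not found – Kubernetes can't pull an image that was only built locally and never pushed to a registry the cluster can reach, and it can't pull a private image unless you create a registry secret (kubectl create secret docker-registry) and reference it under imagePullSecrets in the pod spec.
Health checks missing – Without a readiness probe, Kubernetes sends traffic to a pod as soon as its container starts; without a liveness probe, it restarts a container only when its process exits. Add a simple /health endpoint and configure probes so traffic isn't routed to a container that is still booting.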
Resource limits too low – Setting memory to 64Mi may cause the pod to be OOM‑killed under normal load. Start with generous defaults, monitor, then tighten.

Takeaway

Microservices don’t have to be a black box of obscure tooling. With Flask’s simplicity, Docker’s reproducibility, and Kubernetes’ automation, you can spin up a production‑grade service in a single afternoon. The key is to keep each layer—code, container, orchestration—lean and well‑documented. Once you have that foundation, scaling, versioning, and even swapping languages for individual services becomes a painless exercise rather than a nightmare.