Python Microservices Made Simple: Deploying Flask Services with Docker and Kubernetes
If you’ve ever tried to squeeze a monolithic Django app onto a tiny VM and watched it sputter, you know why the microservice buzz feels like a breath of fresh air. In 2024 the cloud is a playground, and the tools to turn a single Flask endpoint into a resilient, auto‑scaling service are finally mature enough that you don’t need a PhD in distributed systems to get it right.
Why Microservices Matter Right Now
The old “one‑code‑base‑to‑rule‑them‑all” approach is still hanging around in legacy shops, but the cost of that model is showing up in slower deployments, tangled dependencies, and a never‑ending battle with “it works on my machine.” Microservices let you isolate concerns, pick the right language or framework for each job, and—crucially—scale components independently. That means your user‑facing API can spin up more pods while a background worker stays snug on a single node, saving you both money and headaches.
The Flask Sweet Spot
Flask is the lightweight cousin of Django. It gives you just enough scaffolding to spin up a REST endpoint without the baggage of an ORM, admin panel, or built-in authentication system. For a microservice that does one thing (say, calculating a shipping quote or validating a coupon), Flask’s minimalism translates to faster build times and a smaller attack surface. Plus, the Flask community embraced Docker and Kubernetes early, so you’ll find plenty of examples to lean on.
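To make that concrete, here is a minimal sketch of what such a single-purpose service might look like. Treat it as an illustration rather than the real thing: the flat fee, the per-kilogram rate, the "weight" field, and the validation rules are all invented for this example. Only the /quote route and the app:app naming (module app, object app) come from the rest of this article.

```python
# app.py: a hypothetical shipping-quote service (pricing rule is invented)
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumed pricing, for illustration only
BASE_FEE = 5.00
RATE_PER_KG = 2.00


def calculate_quote(weight_kg: float) -> float:
    """Return the price for a parcel of the given weight."""
    return round(BASE_FEE + RATE_PER_KG * weight_kg, 2)


@app.route("/quote", methods=["POST"])
def quote():
    # silent=True yields None instead of an error for a missing or invalid JSON body
    payload = request.get_json(silent=True)
    weight = payload.get("weight") if isinstance(payload, dict) else None

    # bool is a subclass of int in Python, so reject it explicitly
    is_number = isinstance(weight, (int, float)) and not isinstance(weight, bool)
    if not is_number or weight <= 0:
        return jsonify(error="'weight' must be a positive number"), 400

    return jsonify(weight=weight, price=calculate_quote(weight))
```

Keeping the pricing rule in its own function means it can be unit-tested without starting a server, and the route accepts POST so it matches a JSON payload sent with curl -d.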
Dockerizing Your First Flask Service
Before you can hand your code over to Kubernetes, you need a container image that runs everywhere. Here’s a no-frills Dockerfile that keeps the image small:
# Use the official lightweight Python image
FROM python:3.11-slim
# Set a working directory
WORKDIR /app
# Install only the runtime dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the source code
COPY . .
# Expose the port Flask will run on
EXPOSE 5000
# Run under gunicorn; Flask's built-in server is for development only
CMD ["gunicorn", "-w", "3", "-b", "0.0.0.0:5000", "app:app"]
A few things to note:
- Slim base image – python:3.11-slim strips out unnecessary OS packages, keeping the image small.
- --no-cache-dir – prevents pip from storing the download cache inside the image, shaving off a few megabytes.
- Gunicorn – a production-grade WSGI server that forks multiple workers, giving you better concurrency than Flask’s dev server.
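If the gunicorn flags start to pile up, one option is to move them into a config file, which gunicorn reads as plain Python. The sketch below mirrors the flags from the CMD line above. The file name gunicorn.conf.py and the access-log setting are my additions, not part of the original setup.

```python
# gunicorn.conf.py: the same settings as the CMD flags, expressed as a config file

# Equivalent to: -b 0.0.0.0:5000
bind = "0.0.0.0:5000"

# Equivalent to: -w 3
workers = 3

# Assumption for this sketch: send access logs to stdout so `docker logs`
# and `kubectl logs` can pick them up
accesslog = "-"
```

You would then start the server with gunicorn -c gunicorn.conf.py app:app instead of passing each flag on the command line.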
Build the image with:
docker build -t shipping-quote:1.0 .
Run it locally to sanity‑check:
docker run -p 5000:5000 shipping-quote:1.0
If you can hit http://localhost:5000/quote and see JSON back, you’re ready for the next step.
From Docker to Kubernetes – The Leap
Kubernetes (often shortened to “k8s”) is the orchestrator that turns a single container into a self‑healing, load‑balanced service. Think of it as a traffic cop that watches your pods, restarts any that crash, and spreads them across nodes for high availability.
Service and Deployment Manifests
Kubernetes uses YAML files to describe the desired state. Below are two minimal manifests: one for the Deployment (which manages pod replicas) and one for the Service (which exposes the pods).
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shipping-quote
spec:
  replicas: 3
  selector:
    matchLabels:
      app: shipping-quote
  template:
    metadata:
      labels:
        app: shipping-quote
    spec:
      containers:
        - name: shipping-quote
          image: shipping-quote:1.0
          ports:
            - containerPort: 5000
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: shipping-quote
spec:
  selector:
    app: shipping-quote
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
  type: LoadBalancer
A couple of quick explanations:
- replicas: 3 – Kubernetes will keep three pods running. If one dies, another is spawned automatically.
- Resource limits – Setting CPU and memory caps prevents a runaway pod from hogging the node.
- type: LoadBalancer – In a cloud environment this creates an external IP that routes traffic to your service. On a local cluster you can swap it for NodePort and hit the node’s IP directly.
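The resource quantities in the Deployment are easy to misread, so here is a small sketch that converts them into plain numbers. The helper names are hypothetical and the parsing covers only the suffixes used in this article ("m" for millicores, and the binary suffixes Ki, Mi, Gi). It is not a full parser for Kubernetes quantities.

```python
# Illustrative helpers: they handle only the suffixes that appear in this article

BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_to_cores(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "2" into a number of cores."""
    if quantity.endswith("m"):
        # "m" means millicores: 1000m is one full core
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" into bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # No suffix: the value is already in bytes
    return int(quantity)


print(cpu_to_cores("500m"))      # 0.5
print(memory_to_bytes("256Mi"))  # 268435456
```

So each pod in this example is capped at half a core and 256 MiB of memory, and three replicas can use at most 1.5 cores and 768 MiB between them.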
Apply the manifests:
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
You can watch the rollout with kubectl get pods -w. When all pods show Running, your Flask microservice is live on the cluster.
A Quick End‑to‑End Walkthrough
- Write the Flask endpoint – In app.py, expose a /quote route that reads a JSON payload and returns a calculated price.
- Pin dependencies – requirements.txt should contain only Flask and gunicorn for this example.
- Dockerize – Use the Dockerfile above, build, and test locally.
- Push to a registry – Tag the image with your Docker Hub or private registry name (docker tag shipping-quote:1.0 myrepo/shipping-quote:1.0) and push it (docker push myrepo/shipping-quote:1.0).
- Update the Deployment – Change the image: field in deployment.yaml to the fully qualified registry name.
- Deploy to k8s – Run the kubectl apply commands.
- Verify – Grab the external IP (kubectl get svc shipping-quote) and curl the endpoint: curl http://<EXTERNAL_IP>/quote -d '{"weight":2.5}' -H "Content-Type: application/json".
If the response looks sane, congratulations—you just turned a single Python file into a cloud‑native microservice.
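If you would rather script the verification step than type the curl command each time, a standard-library sketch along these lines should do. The function names are hypothetical, and the payload shape simply mirrors the curl example. What comes back depends entirely on how your own /quote route is written.

```python
import json
import urllib.request


def build_quote_request(base_url: str, weight: float) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"weight": weight}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/quote",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_quote(base_url: str, weight: float, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_quote_request(base_url, weight)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Call it as fetch_quote("http://<EXTERNAL_IP>", 2.5), substituting the address reported by kubectl get svc shipping-quote.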
Common Pitfalls and How to Dodge Them
- Forgot to expose the port – This example listens on 5000 (Flask’s default). If you change the port, update it everywhere it appears: the gunicorn bind address and the EXPOSE line in the Dockerfile, the containerPort in the Deployment, and the targetPort in the Service.
- Image not found – Kubernetes can’t pull a private image unless you create a secret with your registry credentials and reference it in the pod spec (under imagePullSecrets).
- Health checks missing – Without liveness/readiness probes, k8s assumes a pod is healthy as soon as it starts. Add a simple /health endpoint and configure probes to avoid traffic being sent to a booting container.
- Resource limits too low – Setting memory to 64Mi may cause the pod to be OOM-killed under normal load. Start with generous defaults, monitor, then tighten.
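For the health-check pitfall, the endpoint itself can be tiny. The sketch below is one possible shape. The response body is my assumption, since Kubernetes HTTP probes look only at the status code. In a real service you would add the route to your existing app object rather than creating a new one.

```python
from flask import Flask, jsonify

# Stand-in app for this sketch; in practice, reuse the app from app.py
app = Flask(__name__)


@app.route("/health")
def health():
    # Keep this cheap: no database calls or downstream requests,
    # because the probe hits it every few seconds
    return jsonify(status="ok"), 200
```

With that in place, you would point the container’s livenessProbe and readinessProbe at it using an httpGet check on path /health and port 5000.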
Takeaway
Microservices don’t have to be a black box of obscure tooling. With Flask’s simplicity, Docker’s reproducibility, and Kubernetes’ automation, you can spin up a production-grade service in a single afternoon. The key is to keep each layer (code, container, orchestration) lean and well-documented. Once you have that foundation, scaling, versioning, and even swapping languages for individual services become painless exercises rather than nightmares.