Deploy Your First LLM Application on AWS Lambda
Ever wondered why you can now run a chat‑bot from a tiny function instead of a big server? The answer is simple: cloud providers have made it cheap and easy to spin up small pieces of code that can talk to powerful AI models. In this post I’ll walk you through the exact steps to get a Python‑based LLM (large language model) running on AWS Lambda. No fluff, just a practical guide you can follow today.
Why Lambda for LLMs?
Low cost, high scalability
Lambda charges you only for the milliseconds your code actually runs. If your LLM call takes 200 ms, you pay for that slice of time, not for an idle VM. When traffic spikes, Lambda automatically creates more instances, so you never have to worry about capacity planning.
No server maintenance
You don’t need to patch an OS, install drivers, or keep a server alive 24/7. All you need is a zip file with your code and a few environment variables. That fits perfectly with the “code‑first” mindset most Python developers have.
Easy integration with other AWS services
Lambda can read from S3, write to DynamoDB, or be triggered by API Gateway. This means you can build a full API around your LLM without touching a single line of infrastructure code.
What You Need
| Item | Reason |
|---|---|
| AWS account | To create Lambda, IAM role, and API Gateway |
| Python 3.9+ installed locally | Lambda runtime is Python 3.9 (or 3.10) |
boto3 and requests libraries | To call the LLM endpoint and interact with AWS |
| An LLM API key (e.g., OpenAI, Cohere, or a hosted Hugging Face model) | The actual language model you’ll query |
I keep a small “starter kit” in my GitHub repo, but you can copy the snippets below into a fresh folder and zip it up.
Step 1: Set Up the LLM API
For this tutorial I’ll use OpenAI’s gpt-3.5-turbo endpoint because it’s free for small tests. Sign up at openai.com, generate an API key, and store it safely. You’ll later add it to Lambda as a secret.
If you prefer a self‑hosted model, just replace the request URL and payload accordingly – the rest of the code stays the same.
Step 2: Write the Lambda Handler
Create a file called lambda_function.py with the following content:
import os
import json
import requests
# Read the API key from environment variables
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if not OPENAI_API_KEY:
raise RuntimeError('OPENAI_API_KEY not set')
def query_llm(prompt: str) -> str:
url = 'https://api.openai.com/v1/chat/completions'
headers = {
'Authorization': f'Bearer {OPENAI_API_KEY}',
'Content-Type': 'application/json'
}
data = {
'model': 'gpt-3.5-turbo',
'messages': [{'role': 'user', 'content': prompt}],
'max_tokens': 150
}
response = requests.post(url, headers=headers, json=data)
response.raise_for_status()
result = response.json()
return result['choices'][0]['message']['content'].strip()
def lambda_handler(event, context):
# Expect a JSON body with a "prompt" field
try:
body = json.loads(event.get('body', '{}'))
prompt = body.get('prompt', '')
if not prompt:
return {
'statusCode': 400,
'body': json.dumps({'error': 'Missing prompt'})
}
except json.JSONDecodeError:
return {
'statusCode': 400,
'body': json.dumps({'error': 'Invalid JSON'})
}
try:
answer = query_llm(prompt)
return {
'statusCode': 200,
'body': json.dumps({'answer': answer})
}
except Exception as e:
return {
'statusCode': 500,
'body': json.dumps({'error': str(e)})
}
A few notes:
lambda_handleris the entry point Lambda expects. It receives aneventdict that API Gateway will fill with the HTTP request.- We read the prompt from the JSON body, call
query_llm, and return the answer as JSON. - Errors are turned into proper HTTP status codes – a small but useful habit.
Step 3: Package Dependencies
Lambda’s Python runtime does not include requests by default, so we need to bundle it.
mkdir package
pip install --target ./package requests
cp lambda_function.py package/
cd package
zip -r ../lambda_deploy.zip .
cd ..
Now you have a lambda_deploy.zip file containing your code and the requests library.
Step 4: Create an IAM Role
Your Lambda function needs permission to write logs to CloudWatch. In the AWS console:
- Go to IAM → Roles → Create role.
- Choose “Lambda” as the trusted entity.
- Attach the managed policy
AWSLambdaBasicExecutionRole. - Give the role a name like
lambda-llm-exec.
Copy the ARN; you’ll need it in the next step.
Step 5: Deploy the Function
You can use the AWS console or the CLI. I’ll show the CLI because it’s reproducible.
aws lambda create-function \
--function-name llm-demo \
--runtime python3.9 \
--role arn:aws:iam::123456789012:role/lambda-llm-exec \
--handler lambda_function.lambda_handler \
--zip-file fileb://lambda_deploy.zip \
--timeout 30 \
--memory-size 512
A few things to watch:
- Timeout – LLM calls can sometimes take a second or two, so give yourself at least 30 seconds.
- Memory – More memory means a higher CPU allocation, which can shave off a few hundred milliseconds.
Step 6: Store the API Key Securely
Never hard‑code secrets. In the console, go to the Lambda function → Configuration → Environment variables. Add a key called OPENAI_API_KEY and paste your key. Mark it as “encrypted” (the console does this automatically).
Step 7: Expose the Function via API Gateway
Now you need an HTTP endpoint that your front‑end or curl can hit.
aws apigateway create-rest-api --name llm-demo-api
Take note of the id returned. Then:
# Get the root resource id
PARENT_ID=$(aws apigateway get-resources --rest-api-id $API_ID --query "items[?path=='/'].id" --output text)
# Create a POST method on the root
aws apigateway put-method \
--rest-api-id $API_ID \
--resource-id $PARENT_ID \
--http-method POST \
--authorization-type "NONE"
# Link the method to Lambda
aws apigateway put-integration \
--rest-api-id $API_ID \
--resource-id $PARENT_ID \
--http-method POST \
--type AWS_PROXY \
--integration-http-method POST \
--uri arn:aws:apigateway:$AWS_REGION:lambda:path/2015-03-31/functions/arn:aws:lambda:$AWS_REGION:123456789012:function:llm-demo/invocations
# Give API Gateway permission to invoke Lambda
aws lambda add-permission \
--function-name llm-demo \
--statement-id apigateway-invoke \
--action lambda:InvokeFunction \
--principal apigateway.amazonaws.com \
--source-arn arn:aws:execute-api:$AWS_REGION:123456789012:$API_ID/*/POST/
Finally, deploy the API:
aws apigateway create-deployment --rest-api-id $API_ID --stage-name prod
Your endpoint will be:
https://{api_id}.execute-api.{region}.amazonaws.com/prod
Step 8: Test It Out
Grab a terminal and run:
curl -X POST https://{api_id}.execute-api.{region}.amazonaws.com/prod \
-H "Content-Type: application/json" \
-d '{"prompt":"Tell me a joke about cloud computing"}'
You should see a JSON response with an answer field containing the LLM’s reply. If you get a 500 error, check CloudWatch logs – they are gold for debugging.
Tips for Production Use
- Cold start mitigation – Lambda containers spin up on first request. If you expect low latency, keep a small amount of provisioned concurrency (a few warm instances) to avoid the first‑request delay.
- Rate limiting – Most LLM providers enforce request limits. Use API Gateway usage plans or a simple token bucket in front of the function.
- Cost monitoring – Enable AWS Cost Explorer alerts. Even a few hundred requests can add up if you forget to set a usage cap.
Wrap‑Up
Deploying an LLM on AWS Lambda is surprisingly straightforward. You get a serverless, pay‑as‑you‑go endpoint that can power chat‑bots, content generators, or any text‑based feature you can imagine. The biggest hurdle is usually just wiring the pieces together, but once you have the template, re‑using it for other models is a breeze.
Give it a try, tweak the prompt handling, and you’ll see how quickly you can move from “idea” to “working demo”. That’s the kind of rapid iteration I love writing about on Tech Frontier.
- → How to Build a Scalable Flat Serverless API on AWS with Zero Cold Starts @flatservers
- → Build Your First Python Automation Script: A Step‑by‑Step Guide for Beginners @pythonstarter
- → Build a ChatGPT‑Powered Bot in Python in Under an Hour @techtrek
- → Implementing the One-In-One-Out Rule in Python: A Step-by-Step Guide for Cleaner Code @codeflowinsights
- → How to Build Your First End-to-End Machine Learning Project in Python @datascitrial