Deploy Your First LLM Application on AWS Lambda

Ever wondered why you can now run a chat‑bot from a tiny function instead of a big server? The answer is simple: cloud providers have made it cheap and easy to spin up small pieces of code that can talk to powerful AI models. In this post I’ll walk you through the exact steps to get a Python‑based LLM (large language model) running on AWS Lambda. No fluff, just a practical guide you can follow today.

Why Lambda for LLMs?

Low cost, high scalability

Lambda charges you only for the milliseconds your code actually runs. If your LLM call takes 200 ms, you pay for that slice of time, not for an idle VM. When traffic spikes, Lambda automatically creates more instances, so you never have to worry about capacity planning.

No server maintenance

You don’t need to patch an OS, install drivers, or keep a server alive 24/7. All you need is a zip file with your code and a few environment variables. That fits perfectly with the “code‑first” mindset most Python developers have.

Easy integration with other AWS services

Lambda can read from S3, write to DynamoDB, or be triggered by API Gateway. This means you can build a full API around your LLM without touching a single line of infrastructure code.

What You Need

Item	Reason
AWS account	To create Lambda, IAM role, and API Gateway
Python 3.9+ installed locally	Lambda runtime is Python 3.9 (or 3.10)
`boto3` and `requests` libraries	To call the LLM endpoint and interact with AWS
An LLM API key (e.g., OpenAI, Cohere, or a hosted Hugging Face model)	The actual language model you’ll query

I keep a small “starter kit” in my GitHub repo, but you can copy the snippets below into a fresh folder and zip it up.

Step 1: Set Up the LLM API

For this tutorial I’ll use OpenAI’s gpt-3.5-turbo endpoint because it’s free for small tests. Sign up at openai.com, generate an API key, and store it safely. You’ll later add it to Lambda as a secret.

If you prefer a self‑hosted model, just replace the request URL and payload accordingly – the rest of the code stays the same.

Step 2: Write the Lambda Handler

Create a file called lambda_function.py with the following content:

import os
import json
import requests

# Read the API key from environment variables
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if not OPENAI_API_KEY:
    raise RuntimeError('OPENAI_API_KEY not set')

def query_llm(prompt: str) -> str:
    url = 'https://api.openai.com/v1/chat/completions'
    headers = {
        'Authorization': f'Bearer {OPENAI_API_KEY}',
        'Content-Type': 'application/json'
    }
    data = {
        'model': 'gpt-3.5-turbo',
        'messages': [{'role': 'user', 'content': prompt}],
        'max_tokens': 150
    }
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()
    result = response.json()
    return result['choices'][0]['message']['content'].strip()

def lambda_handler(event, context):
    # Expect a JSON body with a "prompt" field
    try:
        body = json.loads(event.get('body', '{}'))
        prompt = body.get('prompt', '')
        if not prompt:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'Missing prompt'})
            }
    except json.JSONDecodeError:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': 'Invalid JSON'})
        }

    try:
        answer = query_llm(prompt)
        return {
            'statusCode': 200,
            'body': json.dumps({'answer': answer})
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }

A few notes:

lambda_handler is the entry point Lambda expects. It receives an event dict that API Gateway will fill with the HTTP request.
We read the prompt from the JSON body, call query_llm, and return the answer as JSON.
Errors are turned into proper HTTP status codes – a small but useful habit.

Step 3: Package Dependencies

Lambda’s Python runtime does not include requests by default, so we need to bundle it.

mkdir package
pip install --target ./package requests
cp lambda_function.py package/
cd package
zip -r ../lambda_deploy.zip .
cd ..

Now you have a lambda_deploy.zip file containing your code and the requests library.

Step 4: Create an IAM Role

Your Lambda function needs permission to write logs to CloudWatch. In the AWS console:

Go to IAM → Roles → Create role.
Choose “Lambda” as the trusted entity.
Attach the managed policy AWSLambdaBasicExecutionRole.
Give the role a name like lambda-llm-exec.

Copy the ARN; you’ll need it in the next step.

Step 5: Deploy the Function

You can use the AWS console or the CLI. I’ll show the CLI because it’s reproducible.

aws lambda create-function \
    --function-name llm-demo \
    --runtime python3.9 \
    --role arn:aws:iam::123456789012:role/lambda-llm-exec \
    --handler lambda_function.lambda_handler \
    --zip-file fileb://lambda_deploy.zip \
    --timeout 30 \
    --memory-size 512

A few things to watch:

Timeout – LLM calls can sometimes take a second or two, so give yourself at least 30 seconds.
Memory – More memory means a higher CPU allocation, which can shave off a few hundred milliseconds.

Step 6: Store the API Key Securely

Never hard‑code secrets. In the console, go to the Lambda function → Configuration → Environment variables. Add a key called OPENAI_API_KEY and paste your key. Mark it as “encrypted” (the console does this automatically).

Step 7: Expose the Function via API Gateway

Now you need an HTTP endpoint that your front‑end or curl can hit.

aws apigateway create-rest-api --name llm-demo-api

Take note of the id returned. Then:

# Get the root resource id
PARENT_ID=$(aws apigateway get-resources --rest-api-id $API_ID --query "items[?path=='/'].id" --output text)

# Create a POST method on the root
aws apigateway put-method \
    --rest-api-id $API_ID \
    --resource-id $PARENT_ID \
    --http-method POST \
    --authorization-type "NONE"

# Link the method to Lambda
aws apigateway put-integration \
    --rest-api-id $API_ID \
    --resource-id $PARENT_ID \
    --http-method POST \
    --type AWS_PROXY \
    --integration-http-method POST \
    --uri arn:aws:apigateway:$AWS_REGION:lambda:path/2015-03-31/functions/arn:aws:lambda:$AWS_REGION:123456789012:function:llm-demo/invocations

# Give API Gateway permission to invoke Lambda
aws lambda add-permission \
    --function-name llm-demo \
    --statement-id apigateway-invoke \
    --action lambda:InvokeFunction \
    --principal apigateway.amazonaws.com \
    --source-arn arn:aws:execute-api:$AWS_REGION:123456789012:$API_ID/*/POST/

Finally, deploy the API:

aws apigateway create-deployment --rest-api-id $API_ID --stage-name prod

Your endpoint will be:

https://{api_id}.execute-api.{region}.amazonaws.com/prod

Step 8: Test It Out

Grab a terminal and run:

curl -X POST https://{api_id}.execute-api.{region}.amazonaws.com/prod \
     -H "Content-Type: application/json" \
     -d '{"prompt":"Tell me a joke about cloud computing"}'

You should see a JSON response with an answer field containing the LLM’s reply. If you get a 500 error, check CloudWatch logs – they are gold for debugging.

Tips for Production Use

Cold start mitigation – Lambda containers spin up on first request. If you expect low latency, keep a small amount of provisioned concurrency (a few warm instances) to avoid the first‑request delay.
Rate limiting – Most LLM providers enforce request limits. Use API Gateway usage plans or a simple token bucket in front of the function.
Cost monitoring – Enable AWS Cost Explorer alerts. Even a few hundred requests can add up if you forget to set a usage cap.

Wrap‑Up

Deploying an LLM on AWS Lambda is surprisingly straightforward. You get a serverless, pay‑as‑you‑go endpoint that can power chat‑bots, content generators, or any text‑based feature you can imagine. The biggest hurdle is usually just wiring the pieces together, but once you have the template, re‑using it for other models is a breeze.

Give it a try, tweak the prompt handling, and you’ll see how quickly you can move from “idea” to “working demo”. That’s the kind of rapid iteration I love writing about on Tech Frontier.