How to Build a Real-World Image Classifier with PyTorch Transfer Learning


Ever looked at a photo and wondered how a computer could tell a cat from a coffee mug? In 2024, image classifiers are everywhere—from phone cameras that auto‑tag pictures to medical tools that spot anomalies. If you can follow a recipe, you can build one too. Let’s walk through a complete, hands‑on tutorial that takes you from a blank notebook to a working model that you can actually use.

Why Transfer Learning?

Training a deep network from scratch needs millions of images and a lot of GPU time. Transfer learning lets us borrow knowledge from a model that has already learned to see edges, textures, and shapes. Think of it as hiring a seasoned chef who already knows how to chop vegetables; you only need to teach them the new dish’s spices.

What You’ll Need

  • Python 3.9+ – the language we’ll write in.
  • PyTorch – the deep‑learning library we’ll use.
  • torchvision – provides ready‑made models and image utilities.
  • A modest GPU (or the free tier of Google Colab).
  • A small, labeled image folder (we’ll use a public “flowers” dataset).

If any of these sound unfamiliar, don’t worry. I explain each step in plain language, and you can copy‑paste the code directly.

Step 1: Set Up the Environment

First, install the required packages. Open a terminal or a notebook cell and run:

pip install torch torchvision matplotlib tqdm

torch is the core library, torchvision gives us pre‑trained models and data loaders, matplotlib helps us plot results, and tqdm adds a nice progress bar.

Step 2: Get the Data

For this tutorial I like to use the “Oxford 102 Flowers” dataset because it’s small enough to run quickly but still realistic. Download and unzip it into a folder called data/flowers.

import os
import urllib.request
import zipfile

url = "https://download.microsoft.com/download/3/E/1/3E1E2A0A-6F7A-4F4A-9C6A-0E8F9F2C0A5C/flowers.zip"
zip_path = "data/flowers.zip"
os.makedirs("data", exist_ok=True)

# Download the archive only once
if not os.path.exists(zip_path):
    print("Downloading dataset...")
    urllib.request.urlretrieve(url, zip_path)

# Extract only on the first run; the archive is expected to contain a top-level flowers/ folder
if not os.path.isdir("data/flowers"):
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall("data")

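The code assumes the archive unpacks to data/flowers with subfolders train and val, each holding one folder per flower class. That is the layout ImageFolder expects in Step 3. If your copy of the dataset is organized differently (the original Oxford release, for example, ships all images in a single folder with the labels in a separate file), rearrange it into this layout first.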
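Before building the data loaders, it can help to confirm that the folders really have this layout. The sketch below is an illustrative helper (the name `count_images_per_class` and the extension list are assumptions, not part of torchvision): it walks each class folder and counts image files, so an empty or misnamed class shows up before ImageFolder complains about it.

```python
import os
from collections import Counter

# Roughly the extensions torchvision's ImageFolder accepts by default (assumed list).
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")


def count_images_per_class(split_dir):
    """Map each class folder in split_dir to the number of image files it contains."""
    counts = Counter()
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # stray files next to the class folders are not classes
        # os.walk always yields class_dir itself, so empty classes get an explicit 0
        for _, _, files in os.walk(class_dir):
            counts[class_name] += sum(f.lower().endswith(IMG_EXTENSIONS) for f in files)
    return counts


if __name__ == "__main__":
    for split in ("train", "val"):
        split_dir = os.path.join("data", "flowers", split)
        if not os.path.isdir(split_dir):
            print(f"{split}: folder {split_dir} not found")
            continue
        counts = count_images_per_class(split_dir)
        empty = [name for name, n in counts.items() if n == 0]
        print(f"{split}: {len(counts)} classes, {sum(counts.values())} images, empty: {empty}")
```

ImageFolder typically refuses class folders with no valid images, so a non-empty `empty` list is the first thing to fix.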

Step 3: Prepare the Data Loaders

PyTorch works with datasets and data loaders. A dataset knows how to read an image and its label; a loader batches those images and shuffles them during training.

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Common image transformations
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),   # random crop to 224x224
    transforms.RandomHorizontalFlip(), # data augmentation
    transforms.ToTensor(),              # convert to tensor
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]) # match ImageNet stats
])

val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])

train_dataset = datasets.ImageFolder('data/flowers/train', transform=train_transform)
val_dataset   = datasets.ImageFolder('data/flowers/val',   transform=val_transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=2)
val_loader   = DataLoader(val_dataset,   batch_size=32, shuffle=False, num_workers=2)

class_names = train_dataset.classes
print(f"Found {len(class_names)} classes: {class_names}")

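ImageFolder expects the folder structure root/class_name/image.jpg and assigns label indices from the sorted class-folder names. The Normalize values are the ImageNet per-channel means and standard deviations, so the network sees inputs in the same range it was trained on. The training transform also applies random crops and horizontal flips as data augmentation, while the validation transform resizes and center-crops deterministically so evaluation results are repeatable.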

Step 4: Load a Pre‑Trained Model

We’ll use ResNet‑50, a popular architecture that balances speed and accuracy. The model comes with weights trained on ImageNet (a huge collection of everyday objects).

import torch
import torch.nn as nn
from torchvision import models

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

base_model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # ImageNet weights (older torchvision used pretrained=True)
base_model = base_model.to(device)

Freeze the Feature Extractor

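The pre-trained layers already know how to detect edges, textures, and shapes. The loop below freezes every parameter in the network, so no gradients are computed or stored for the backbone, which saves memory and training time. The new final layer added in the next step is created unfrozen.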

for param in base_model.parameters():
    param.requires_grad = False

Replace the Final Layer

ResNet‑50 ends with a fully‑connected layer that outputs 1000 classes (the ImageNet categories). We replace it with a new layer that matches our flower count.

num_features = base_model.fc.in_features
base_model.fc = nn.Linear(num_features, len(class_names))
base_model.fc = base_model.fc.to(device)

Now only the new layer’s weights will be updated during training.

Step 5: Define Loss, Optimizer, and Scheduler

We’ll use cross‑entropy loss, the standard for multi‑class classification. For the optimizer, Adam works well with a small learning rate.

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(base_model.fc.parameters(), lr=1e-3)

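# Optional: halve the learning rate when validation loss has not improved for more than 3 epochs in a row
# (the old verbose=True flag is deprecated in recent PyTorch releases, so it is omitted here)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                       mode='min',
                                                       factor=0.5,
                                                       patience=3)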
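ReduceLROnPlateau is easiest to understand by feeding it made-up losses. The sketch below uses a dummy parameter and hypothetical loss values (no real training) with the same factor and patience as above: the learning rate is halved only once the loss has failed to improve for more than `patience` consecutive epochs. `lr_history` is an illustrative helper name.

```python
import torch


def lr_history(val_losses, lr=1e-3, factor=0.5, patience=3):
    """Feed validation losses to ReduceLROnPlateau and record the learning rate after each step."""
    param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter; nothing is trained
    optimizer = torch.optim.Adam([param], lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=factor, patience=patience)
    history = []
    for loss in val_losses:
        scheduler.step(loss)  # same call as in the training loop: pass the validation loss
        history.append(optimizer.param_groups[0]["lr"])
    return history


# The first loss sets the "best" value; after that the loss stays flat.
print(lr_history([1.0, 1.0, 1.0, 1.0, 1.0, 1.0]))
# [0.001, 0.001, 0.001, 0.001, 0.0005, 0.0005]
```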

Step 6: Training Loop

Below is a compact training loop that prints loss and accuracy each epoch. I like to keep it simple so you can see what’s happening.

from tqdm import tqdm

def train_one_epoch(model, loader, criterion, optimizer):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in tqdm(loader, leave=False):
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

def evaluate(model, loader, criterion):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            running_loss += loss.item() * inputs.size(0)
            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    val_loss = running_loss / total
    val_acc = correct / total
    return val_loss, val_acc

num_epochs = 10
for epoch in range(num_epochs):
    train_loss, train_acc = train_one_epoch(base_model, train_loader, criterion, optimizer)
    val_loss,   val_acc   = evaluate(base_model, val_loader, criterion)
    scheduler.step(val_loss)

    print(f"Epoch {epoch+1}/{num_epochs} | "
          f"Train loss: {train_loss:.4f}, acc: {train_acc:.3f} | "
          f"Val loss: {val_loss:.4f}, acc: {val_acc:.3f}")

You will usually see validation accuracy climb quickly in the first few epochs and then level off. That is typical when only the final layer is being trained, since the frozen features limit how much a single linear layer can improve.
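One detail the loop glosses over: `model.train()` puts every layer in training mode, including the frozen BatchNorm layers inside ResNet-50, so their running mean and variance keep updating on flower batches even though their weights never change. A common refinement, not part of the original tutorial, is to switch frozen BatchNorm layers back to eval mode after calling `model.train()`. The helper name is hypothetical, and the demo uses a tiny stand-in model rather than ResNet-50.

```python
import torch
import torch.nn as nn

BATCHNORM_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def set_frozen_batchnorm_eval(model):
    """Put BatchNorm layers whose parameters are all frozen into eval mode.

    In eval mode BatchNorm normalizes with its stored running statistics and
    stops updating them, so the ImageNet statistics stay fixed.
    """
    for module in model.modules():
        if isinstance(module, BATCHNORM_TYPES):
            if all(not p.requires_grad for p in module.parameters()):
                module.eval()
    return model


# Tiny stand-in model: frozen conv + BN "backbone" followed by a trainable head.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 5),
)
for p in model[:2].parameters():  # freeze conv and BatchNorm
    p.requires_grad = False

model.train()                     # what train_one_epoch does first
set_frozen_batchnorm_eval(model)  # then restore eval mode for frozen BN
before = model[1].running_mean.clone()
model(torch.randn(4, 3, 32, 32))
print(model[1].training, torch.equal(before, model[1].running_mean))  # False True
```

In `train_one_epoch`, the call would go right after `model.train()`. Whether it improves accuracy depends on how different your images are from ImageNet, so it is worth comparing both variants on the validation set.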

Step 7: Test the Model on New Images

Let’s load a single picture, run it through the model, and see what it predicts.

from PIL import Image

def predict_image(image_path):
    img = Image.open(image_path).convert('RGB')
    img_t = val_transform(img).unsqueeze(0).to(device)  # add batch dim
    base_model.eval()
    with torch.no_grad():
        out = base_model(img_t)
        _, pred = torch.max(out, 1)
    return class_names[pred.item()]

sample_path = 'data/flowers/val/daisy/image_00123.jpg'
print(f"Prediction: {predict_image(sample_path)}")

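Swap the path for any picture you like, maybe a snap of your garden. If the flower belongs to one of the training classes, the model should return its name; a picture of anything else will still be assigned to whichever known class it resembles most, because the classifier has no "none of the above" option.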
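A single label hides how confident the model is. The sketch below converts raw outputs to probabilities with softmax and returns the top k classes. `topk_predictions` is a hypothetical helper, and the logits and class names in the demo are made up; inside `predict_image` you could pass `out` and `class_names` to it instead of calling `torch.max`.

```python
import torch


def topk_predictions(logits, class_names, k=3):
    """Turn a (1, num_classes) logits tensor into the k most likely (class, probability) pairs."""
    probs = torch.softmax(logits, dim=1)[0]  # probabilities for the single image in the batch
    top_probs, top_idx = probs.topk(min(k, len(class_names)))
    return [(class_names[i], p) for i, p in zip(top_idx.tolist(), top_probs.tolist())]


# Made-up logits for three hypothetical classes, just to show the output format.
logits = torch.tensor([[2.0, 0.5, 1.0]])
for name, p in topk_predictions(logits, ["daisy", "rose", "tulip"], k=2):
    print(f"{name}: {p:.3f}")
```

A low top probability, or two classes with similar scores, is a hint that the image may not belong to any of the training classes.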

Step 8: Save and Load the Model

After training, store the weights so you can reuse the model later.

torch.save(base_model.state_dict(), 'flower_classifier.pth')

To load it back:

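model = models.resnet50(weights=None)  # architecture only; the trained weights come from the file
model.fc = nn.Linear(model.fc.in_features, len(class_names))
model.load_state_dict(torch.load('flower_classifier.pth', map_location=device, weights_only=True))
model = model.to(device)
model.eval()  # switch to inference mode before making predictions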

Now you have a portable classifier that can be deployed in a Flask app, a mobile prototype, or even a simple command‑line tool.

Tips for Real‑World Use

  1. More Data Helps – If you can collect more images per class, the model becomes more robust.
  2. Fine‑Tune Deeper Layers – After the final layer is stable, unfreeze the last block of ResNet and train with a lower learning rate.
  3. Watch for Over‑fitting – If training accuracy keeps rising while validation stalls, add dropout or more augmentation.
  4. Deploy Wisely – For edge devices, consider exporting the model to ONNX or TorchScript; these formats run on optimized runtimes outside your Python training code, which can reduce latency.

Wrap‑Up

Building an image classifier with transfer learning is less about reinventing the wheel and more about wiring together proven pieces. In this tutorial we:

  • Grabbed a pre‑trained ResNet‑50 model.
  • Replaced its head to match our flower classes.
  • Trained only the new head, saving time and compute.
  • Evaluated, saved, and tested the model on fresh images.

Give it a try on a different dataset—maybe cats, traffic signs, or even your own product photos. The same steps apply, and the sense of watching a model learn is truly rewarding. As always, the ML Tutorial Hub is here to help you turn curiosity into code.
