Build a Real-World Image Classifier in PyTorch: A Step-by-Step Tutorial for Beginners

Ever looked at a photo app that instantly tells you what’s in the picture and thought “I could build that”? In 2024 the tools are cheap, the data is plentiful, and PyTorch makes the whole process feel like a friendly puzzle. Let’s walk through a complete, hands‑on project that takes you from raw pictures to a working model you can run on your laptop.

Why an Image Classifier?

Images are everywhere – from social media feeds to medical scans. Being able to label them automatically saves time and opens doors to new products. For a beginner, building a classifier teaches you data handling, model design, training loops, and evaluation – the core skills you’ll reuse in any deep‑learning job.

What You’ll Need

Python 3.8 or newer
PyTorch (latest stable release)
torchvision (for datasets and transforms)
A modest GPU is nice but not required; the tutorial runs on CPU too
A folder of images organized by class (we’ll use a small public dataset)

If you haven’t installed PyTorch yet, run:

pip install torch torchvision

Step 1: Choose a Real‑World Dataset

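For this tutorial I like the “Oxford Flowers 102” set – 8,189 pictures of 102 flower species. It’s big enough to be realistic but small enough to train quickly. Download it from the official site. The official archive ships all images in a single folder with a separate label file, so sort them into one subfolder per class first; torchvision.datasets.ImageFolder can then read the local copy.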

data_dir = "data/oxford_flowers"

The folder should look like:

oxford_flowers/
    daisy/
        img1.jpg
        img2.jpg
    rose/
        img3.jpg
        ...
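Before loading anything, it can help to check that the folder really has this shape. Here is a minimal sketch using only the standard library; the helper name count_images_per_class and the list of extensions are my own assumptions, so adjust them to your files.

```python
from pathlib import Path

# Assumed image extensions; extend this set if your files differ
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def count_images_per_class(root):
    """Return {class_name: image_count} for every subfolder of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            # rglob also looks inside nested folders of a class directory
            counts[class_dir.name] = sum(
                1 for f in class_dir.rglob("*")
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts
```

Calling count_images_per_class("data/oxford_flowers") should give one entry per class folder. An empty or missing class is easier to spot here than halfway through training.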

Step 2: Load and Transform the Images

Images come in all sizes and color formats. We need to resize them, turn them into tensors, and normalize the pixel values. Normalization means subtracting a per-channel mean and dividing by a per-channel standard deviation. We use the ImageNet statistics because the pre-trained model we load in Step 3 was trained on inputs normalized this way.

import torch
from torchvision import datasets, transforms

# Define a set of transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),          # make all images 224x224
    transforms.ToTensor(),                  # convert to tensor (0‑1 range)
    transforms.Normalize(                   # standard ImageNet stats
        mean=[0.485, 0.456, 0.406],
        std =[0.229, 0.224, 0.225])
])

# Load the dataset
full_dataset = datasets.ImageFolder(root=data_dir, transform=transform)
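To make the normalization step concrete, here is the same arithmetic done by hand on a placeholder image. This is only an illustration of the (x - mean) / std formula; the tiny 2x2 "image" is made up.

```python
import torch

# ImageNet channel statistics, shaped (3, 1, 1) so they broadcast over height and width
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Placeholder 3x2x2 image whose pixels are all 0.5 (values are in 0-1 after ToTensor)
img = torch.full((3, 2, 2), 0.5)

# Per channel: subtract the mean, then divide by the standard deviation
normalized = (img - mean) / std
print(normalized[:, 0, 0])   # one value per color channel
```

Each channel ends up with a different value even though the input pixel was the same gray, because every channel has its own mean and standard deviation.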

Train‑Validation Split

We keep 80 % for training and 20 % for validation.

train_size = int(0.8 * len(full_dataset))
val_size   = len(full_dataset) - train_size
train_set, val_set = torch.utils.data.random_split(full_dataset,
                                                  [train_size, val_size])
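One thing to be aware of: random_split draws a different split on every run unless you fix the random generator. If you want the same train and validation images each time, a sketch like this should work. The 100-item TensorDataset is a stand-in for full_dataset, and the seed 42 is arbitrary.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder dataset: 100 fake samples standing in for full_dataset
dataset = TensorDataset(torch.arange(100))

train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size

def split(seed):
    # A seeded generator makes the shuffle repeatable
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, val_size], generator=generator)

train_a, val_a = split(42)
train_b, val_b = split(42)
print(train_a.indices == train_b.indices)   # same seed, same split
```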

Create loaders that feed batches to the model.

batch_size = 32
train_loader = torch.utils.data.DataLoader(train_set,
                                           batch_size=batch_size,
                                           shuffle=True,
                                           num_workers=2)
val_loader   = torch.utils.data.DataLoader(val_set,
                                           batch_size=batch_size,
                                           shuffle=False,
                                           num_workers=2)
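If you are curious what a loader actually hands to the model, you can look at the batch shapes. The sketch below uses 70 blank placeholder images instead of real flowers, so it runs without any dataset on disk.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 70 blank RGB images (3x224x224) with random labels in [0, 102)
images = torch.zeros(70, 3, 224, 224)
labels = torch.randint(0, 102, (70,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=False)

# Each batch is (batch, channels, height, width); the last one may be smaller
batch_shapes = [tuple(x.shape) for x, _ in loader]
for shape in batch_shapes:
    print(shape)
```

With 70 images and a batch size of 32 you get two full batches and a final batch of 6, which is why code that assumes a fixed batch size can break on the last step.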

Step 3: Pick a Model Architecture

For beginners, a pre‑trained ResNet‑18 works wonders. It already knows how to detect edges, textures, and shapes. We only need to replace the final fully‑connected layer to match our number of classes.

from torchvision import models

num_classes = len(full_dataset.classes)   # 102 for Oxford Flowers
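model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # ImageNet pre-trained weights

# Freeze the whole pre-trained backbone (optional but speeds up training);
# the new fc layer created below is trainable by default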
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

Step 4: Define Loss, Optimizer, and Device

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

criterion = torch.nn.CrossEntropyLoss()          # standard for classification
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)
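A detail that trips up many beginners: CrossEntropyLoss expects raw scores (logits) and integer class indices, not probabilities or one-hot vectors. A small worked example with made-up values:

```python
import math
import torch

criterion = torch.nn.CrossEntropyLoss()

# 4 samples, 102 classes, all logits zero, so every class is equally likely
logits = torch.zeros(4, 102)
targets = torch.tensor([0, 5, 17, 101])  # class indices, not one-hot vectors

loss = criterion(logits, targets)
print(f"loss = {loss.item():.4f}, ln(102) = {math.log(102):.4f}")
```

A model that guesses uniformly over 102 classes has a loss of ln(102), so a first-batch loss in that neighborhood is a sign that things are wired up sensibly.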

Step 5: Training Loop

The heart of any deep‑learning project is the loop that feeds data, computes loss, and updates weights. I keep it simple and add a few prints so you can see progress.

def train_one_epoch(epoch):
    model.train()
    running_loss = 0.0
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if (batch_idx + 1) % 10 == 0:
            print(f"Epoch {epoch} [{batch_idx+1}/{len(train_loader)}] "
                  f"Loss: {running_loss / 10:.4f}")
            running_loss = 0.0
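Before training on real images, I like to check that the loop can learn anything at all. The sketch below runs the same five steps (zero_grad, forward, loss, backward, step) on one fixed batch of random numbers. The tiny linear model, the data, and the SGD settings are placeholders, not the tutorial's ResNet setup.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear model and one fixed batch of random data
model = torch.nn.Linear(8, 3)
inputs = torch.randn(16, 8)
targets = torch.randint(0, 3, (16,))

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(50):
    optimizer.zero_grad()                      # clear old gradients
    loss = criterion(model(inputs), targets)   # forward pass + loss
    loss.backward()                            # compute gradients
    optimizer.step()                           # update weights
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

If the loss does not go down when repeatedly fitting a single batch, the problem is in the loop or the model, not in the data.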

Validation Step

After each epoch we check how well the model does on unseen data.

def validate(epoch):
    model.eval()
    correct = 0
    total   = 0
    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total   += targets.size(0)
            correct += (predicted == targets).sum().item()
    acc = 100 * correct / total
    print(f"Validation after epoch {epoch}: {acc:.2f}% accuracy")
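The accuracy arithmetic is easier to follow on a toy example. Here are four made-up score rows for three classes, run through the same torch.max logic as validate:

```python
import torch

# Made-up model outputs: 4 samples, 3 classes
outputs = torch.tensor([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.3, 0.2, 0.1],
                        [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 1, 2, 2])

# torch.max over dim 1 returns (highest score, index of highest score) per row
_, predicted = torch.max(outputs, 1)
correct = (predicted == targets).sum().item()
acc = 100 * correct / targets.size(0)
print(predicted.tolist(), correct, acc)
```

The third sample is predicted as class 0 but labeled 2, so three of four are correct and the accuracy is 75 %.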

Run the Training

num_epochs = 5
for epoch in range(1, num_epochs + 1):
    train_one_epoch(epoch)
    validate(epoch)

Five epochs on a modest GPU usually land you above 70 % accuracy on this dataset, though the exact figure varies with the random split – a solid start for a beginner project.

Step 6: Save and Load the Model

torch.save(model.state_dict(), "flower_classifier.pth")

Later you can load it with:

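device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet18()                      # architecture only, no pre-trained weights
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
# map_location lets a checkpoint saved on a GPU load on a CPU-only machine
model.load_state_dict(torch.load("flower_classifier.pth", map_location=device))
model = model.to(device)
model.eval()                                   # switch to inference mode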
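If you want to convince yourself that saving and loading really round-trips the weights, here is a small self-contained check. It uses a tiny Linear layer and a temporary file as placeholders for the real classifier and checkpoint path.

```python
import os
import tempfile
import torch

model = torch.nn.Linear(4, 2)   # placeholder for the real classifier

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pth")
    torch.save(model.state_dict(), path)

    # The restored model must have the same architecture as the saved one
    restored = torch.nn.Linear(4, 2)
    state = torch.load(path, map_location="cpu")
    restored.load_state_dict(state)

# Compare every tensor in the two state dicts
weights_match = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print("weights match:", weights_match)
```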

Step 7: Put It to Use

Write a tiny function that takes a file path, runs the same transforms, and returns the predicted class name.

from PIL import Image

def predict(image_path):
    model.eval()                                   # make sure we are in inference mode
    img = Image.open(image_path).convert("RGB")    # force 3 color channels
    img = transform(img).unsqueeze(0).to(device)   # add batch dim
    with torch.no_grad():
        output = model(img)
        _, pred = torch.max(output, 1)
    return full_dataset.classes[pred.item()]
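A single class name hides how sure the model is. If you also want confidence scores, one option is to turn the raw outputs into probabilities with softmax and keep the top few. The helper name top_k and the three class names below are made up for illustration.

```python
import torch

def top_k(logits, class_names, k=3):
    """Return the k most likely (class_name, probability) pairs for one image."""
    probs = torch.softmax(logits, dim=1)        # raw scores -> probabilities summing to 1
    values, indices = probs.topk(k, dim=1)      # k largest probabilities and their indices
    return [(class_names[i], p)
            for p, i in zip(values[0].tolist(), indices[0].tolist())]

# Placeholder model output for one image and three made-up classes
logits = torch.tensor([[1.0, 3.0, 0.5]])
class_names = ["daisy", "rose", "tulip"]

results = top_k(logits, class_names, k=2)
for name, prob in results:
    print(f"{name}: {prob:.2%}")
```

In the real predict function you would pass the model's output and full_dataset.classes instead of these placeholders.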

Try it on a new flower picture and watch the model speak!

Tips for Going Further

More data: Augment with random flips, rotations, or color jitter to improve robustness.
Fine‑tune deeper layers: Unfreeze the last few blocks of ResNet and train with a lower learning rate.
Experiment with architectures: MobileNetV2 is lighter, EfficientNet is more accurate.
Deploy: Convert the model to TorchScript or ONNX and serve it with a simple Flask API.

Building this classifier gave me a fresh reminder of why I love teaching. The moment the model correctly names a rose it has never seen before feels like magic, and it’s a magic anyone can learn to perform.

Happy coding, and may your next project be as colorful as a garden in full bloom.