Build a Real‑World Image Classifier with PyTorch: A Step‑by‑Step Tutorial

You’ve probably seen those cool apps that can tell if a picture has a cat, a dog, or a pizza. Building something like that yourself feels like a big mountain, but it doesn’t have to be. In today’s post on ML Tutorial Hub we’ll walk through a simple image classifier using PyTorch, and we’ll keep the code short enough to fit on a coffee‑stained notebook page.

Why this matters right now

Every day more companies need quick ways to sort images – from checking if a product photo is blurry to flagging unsafe content. Knowing how to make a tiny classifier gives you a solid base for bigger projects. Plus, it’s a great way to practice the basics of deep learning without getting lost in theory.

What you’ll need

Item	Reason
Python 3.8+	The language we’ll write in
PyTorch (latest stable)	The deep‑learning library
torchvision	Handy tools for image data
A small image dataset (we’ll use CIFAR‑10)	Gives us 10 classes to practice
GPU (optional)	Nice to have; the code also runs on a laptop CPU

You can install everything with one line:

pip install torch torchvision

If you run into permission errors, add --user at the end. I once spent an hour trying to fix a “permission denied” that turned out to be a missing --user. Lesson learned: always read the error message.
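
Before moving on, it can help to confirm that both packages are visible to the Python interpreter you plan to use. Here is a minimal sketch that relies only on the standard library; the helper name check_packages is made up for this post and is not part of PyTorch.

```python
import importlib.util


def check_packages(names):
    """Return {package_name: True/False} depending on whether it can be found."""
    # find_spec locates a top-level package without importing it,
    # so the check stays fast even for large libraries.
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(["torch", "torchvision"])
for name, installed in status.items():
    print(f"{name}: {'found' if installed else 'missing, rerun pip install'}")
```

If either line says "missing", the install probably went to a different Python than the one running the script.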

Setting up the project folder

Create a folder called image_classifier. Inside, make two sub‑folders: data and src. Put the code we write in src/main.py. Keeping things tidy helps you find files later, especially when you have many experiments.

mkdir -p image_classifier/data
mkdir -p image_classifier/src
touch image_classifier/src/main.py
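
The commands above assume a Unix-style shell. If you are on Windows, or simply prefer to stay in Python, the same layout can be sketched with pathlib. The function name create_layout is an illustration, and the demo runs in a temporary folder so it doesn't leave files behind.

```python
import tempfile
from pathlib import Path


def create_layout(base):
    """Create image_classifier/data, image_classifier/src and an empty main.py."""
    project = Path(base) / "image_classifier"
    (project / "data").mkdir(parents=True, exist_ok=True)
    (project / "src").mkdir(parents=True, exist_ok=True)
    main_file = project / "src" / "main.py"
    main_file.touch(exist_ok=True)  # keeps the contents of an existing file
    return project


# Demo in a throwaway directory; pass "." to create it in the current folder.
with tempfile.TemporaryDirectory() as tmp:
    project = create_layout(tmp)
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))
    print(created)  # ['data', 'src', 'src/main.py']
```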

Loading the data

We’ll use the CIFAR‑10 dataset because it’s small (about 170 MB) and already split into train and test sets. The images are 32 × 32 pixels, which is perfect for a quick demo.

import torch
import torchvision
import torchvision.transforms as transforms

# Transform: turn images into tensors and normalize them
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

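# root is resolved relative to the folder you run the script from;
# '../data' assumes you launch main.py from inside src/
train_set = torchvision.datasets.CIFAR10(
    root='../data', train=True, download=True, transform=transform)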

test_set = torchvision.datasets.CIFAR10(
    root='../data', train=False, download=True, transform=transform)

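# num_workers=2 loads batches in background processes. On Windows and macOS
# this needs the script body under `if __name__ == '__main__':`; use 0 otherwise.
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=2)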

test_loader = torch.utils.data.DataLoader(
    test_set, batch_size=64, shuffle=False, num_workers=2)

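A quick note: ToTensor scales pixel values to [0, 1], and Normalize then subtracts the mean (0.5) and divides by the standard deviation (0.5) for each color channel, which moves the values to [-1, 1]. Inputs centered around zero usually help the model learn faster. If you’re new to this, just think of it as “making the numbers easier for the computer”.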
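
To make that arithmetic concrete, here is a tiny plain-Python sketch of the per-channel formula. The normalize function below is a stand-in written for this post, not the torchvision class itself.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Apply the per-channel formula used by Normalize: (pixel - mean) / std."""
    return (pixel - mean) / std


# Walk a few pixel intensities from black (0.0) to white (1.0).
for value in (0.0, 0.25, 0.5, 1.0):
    print(f"{value:.2f} -> {normalize(value):+.2f}")
```

Black (0.0) lands on -1, mid-gray (0.5) on 0, and white (1.0) on +1.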

Building a simple model

We’ll create a tiny convolutional neural network (CNN). A CNN is just a set of layers that look at small patches of the image, then combine what they see. Here’s a small version with two convolutional layers followed by two fully connected layers; it works reasonably well on CIFAR‑10.

import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # First conv layer: 3 input channels (RGB), 16 output channels, 3x3 kernel
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # Second conv layer: 16 -> 32 channels
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # Fully connected layers: 32 channels * 8 * 8 pixels -> 128 -> 10 classes
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # Apply conv1 + ReLU + max pool
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)   # reduces size to 16x16
        # Apply conv2 + ReLU + max pool
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)   # reduces size to 8x8
        # Flatten
        x = x.view(-1, 32 * 8 * 8)
        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

If any of those words sound scary, don’t worry. “Conv” means convolution, a fancy word for “look at a small window”. “ReLU” is just a function that turns negative numbers into zero. That small non-linearity is what lets stacked layers learn more than a single linear mapping could.
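
If you wonder where the 32 * 8 * 8 in the model comes from, the shapes can be checked by hand. The sketch below uses the standard output-size formula for convolution and pooling layers; the helper names (conv_out, conv_params, linear_params) are invented for this post. It also counts the learnable parameters, assuming every layer keeps its default bias term.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                     # CIFAR-10 images are 32x32
size = conv_out(size, kernel=3, padding=1)    # conv1 keeps 32x32
size = conv_out(size, kernel=2, stride=2)     # max pool halves it to 16x16
size = conv_out(size, kernel=3, padding=1)    # conv2 keeps 16x16
size = conv_out(size, kernel=2, stride=2)     # max pool halves it to 8x8
flat_features = 32 * size * size              # 32 channels come out of conv2


def conv_params(in_ch, out_ch, kernel):
    return in_ch * out_ch * kernel * kernel + out_ch    # weights + biases


def linear_params(in_features, out_features):
    return in_features * out_features + out_features    # weights + biases


total = (conv_params(3, 16, 3) + conv_params(16, 32, 3)
         + linear_params(flat_features, 128) + linear_params(128, 10))
print(size, flat_features, total)  # 8 2048 268650
```

Almost all of the roughly 269 thousand parameters sit in fc1, which is typical for small CNNs that flatten straight into a dense layer.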

Training the model

Training is the part where the computer learns from the data. We’ll use the cross‑entropy loss (the standard for classification) and the Adam optimizer (a popular choice that works well out of the box).

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SimpleCNN().to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

epochs = 10
for epoch in range(epochs):
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()          # clear old gradients
        outputs = model(inputs)        # forward pass
        loss = criterion(outputs, labels)  # compute loss
        loss.backward()               # backpropagation
        optimizer.step()               # update weights

        running_loss += loss.item()
    print(f'Epoch {epoch+1}/{epochs}, Loss: {running_loss/len(train_loader):.4f}')

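You should see the loss go down from one epoch to the next. If it stays flat, try a different learning rate (smaller if the loss jumps around, larger if it creeps down very slowly) and check that the data is loading correctly. One classic rookie mistake you can’t make silently is forgetting to move a batch to the GPU: PyTorch stops with a RuntimeError about tensors being on different devices instead of training badly.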
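
To get a feel for the numbers the loop prints, here is a plain-Python sketch of cross-entropy for a single image. It is a simplified stand-in for nn.CrossEntropyLoss, which by default also averages over the batch, and the example scores are made up.

```python
import math


def cross_entropy(logits, label):
    """Negative log of the softmax probability assigned to the true class."""
    highest = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(v - highest) for v in logits]
    prob_true = exps[label] / sum(exps)
    return -math.log(prob_true)


untrained = cross_entropy([0.0] * 10, label=3)               # equal scores for all 10 classes
confident_right = cross_entropy([5.0] + [0.0] * 9, label=0)  # high score on the true class
confident_wrong = cross_entropy([5.0] + [0.0] * 9, label=1)  # high score on a wrong class
print(f"{untrained:.4f} {confident_right:.4f} {confident_wrong:.4f}")
```

A model that guesses evenly across 10 classes scores ln(10), about 2.30, so a first-epoch loss near that value usually means the network has barely started learning rather than that something is broken.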

Checking how well it works

After training, let’s see the accuracy on the test set.

correct = 0
total = 0
model.eval()  # set model to evaluation mode
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Test Accuracy: {100 * correct / total:.2f}%')

On my laptop I usually get around 70 % accuracy with this tiny network. Not perfect, but good enough to show the idea works. If you want higher numbers, you can add more layers or train longer – but that’s a story for another ML Tutorial Hub post.
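
A single accuracy number hides which classes the model struggles with. As a rough sketch, the bookkeeping for per-class accuracy looks like this in plain Python. The class list follows the CIFAR‑10 label order, the function name per_class_accuracy is invented for this post, and the six predictions are made up purely to show the mechanics.

```python
from collections import Counter

CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]


def per_class_accuracy(predicted, labels):
    """Return {class_name: fraction correct} for every class present in labels."""
    seen = Counter(labels)
    hits = Counter(l for p, l in zip(predicted, labels) if p == l)
    return {CLASSES[i]: hits[i] / seen[i] for i in sorted(seen)}


# Made-up predictions for six images, only to show the bookkeeping.
labels = [3, 3, 5, 5, 8, 8]
predicted = [3, 5, 5, 5, 8, 0]
print(per_class_accuracy(predicted, labels))  # {'cat': 0.5, 'dog': 1.0, 'ship': 0.5}
```

In the evaluation loop you could collect predicted.tolist() and labels.tolist() from each batch into two long lists and pass them to this function.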

Saving and loading the model

You don’t want to retrain every time you run the script. PyTorch makes saving easy.

torch.save(model.state_dict(), 'simple_cnn.pth')

Later you can load it with:

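model = SimpleCNN()
# map_location lets weights saved on a GPU load on a CPU-only machine
model.load_state_dict(torch.load('simple_cnn.pth', map_location='cpu'))
model.eval()  # switch to evaluation mode before running predictions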

Now you have a ready‑to‑use classifier that you can plug into a web app, a phone app, or just a quick test script.
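
An app usually wants a label and a confidence rather than ten raw scores. Here is one way that last step could look, sketched in plain Python. The helper top_prediction and the example scores are hypothetical; in a real script the scores would come from the model, for example outputs[0].tolist() for a batch of one image.

```python
import math

CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]


def top_prediction(logits):
    """Turn 10 raw scores into (class_name, probability) using softmax."""
    highest = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(v - highest) for v in logits]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return CLASSES[best], exps[best] / sum(exps)


# Hypothetical scores for one image.
scores = [0.1, -1.2, 0.3, 2.9, 0.0, 1.7, -0.5, 0.2, -0.8, -1.0]
name, confidence = top_prediction(scores)
print(f"{name} ({confidence:.0%})")
```

Keep in mind that softmax confidence from a small network like this one tends to be optimistic, so treat it as a ranking aid rather than a calibrated probability.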

A quick personal note

When I first started teaching machine learning, I used a huge dataset of flower photos. The code was long, the errors were cryptic, and I spent more time debugging than learning. Switching to a tiny dataset like CIFAR‑10 and a small network helped me focus on the core ideas. That’s why I love sharing these bite‑size tutorials on ML Tutorial Hub – they let you see results fast, and you can build confidence before tackling the big stuff.

Wrap‑up

We’ve covered the whole pipeline: install PyTorch, load data, build a small CNN, train it, check accuracy, and save the model. All of this fits in a single Python file, and you can run it on a regular laptop. Next time you need an image classifier, you’ll have a solid starter that you can tweak to fit your own data.

Happy coding, and see you in the next ML Tutorial Hub tutorial!