Understanding Perceptrons: The Foundation of Modern AI

“We now have a new kind of programming paradigm. Instead of telling the computer what to do, we show it examples of what we want, and it figures out how to do it.” – Michael Nielsen

My Journey Back to the Beginning

My first encounter with Artificial Intelligence was during my college days. I didn’t understand the exam questions, and I’m pretty sure the professor didn’t understand my answers either.

Fast forward 20 years of building software systems. In all that time, I barely touched AI or ML. Sure, I designed applications that integrated with black-box AI/ML systems for OCR, but that was it.

Then ChatGPT happened.

Like many of you, I started with the ChatGPT web interface. Then I built RAG chatbots and experimented with chunking strategies, embedding models, and retrieval techniques. I built agents and worked with MCP servers and agentic patterns. I was using these tools and building with them, but something bothered me.

I didn’t understand how any of it actually worked.

So I decided to go back. Not to the latest paper or the newest framework, but to the very beginning. To the first artificial neuron. To understand AI from first principles.

Why This Matters

You might wonder why you should bother learning about a decades-old concept when we have ChatGPT, Claude, and countless AI tools at our fingertips.

Here’s why: every neuron in GPT-4, in every transformer, in every neural network you’ve ever used, builds on the same basic idea as that first artificial neuron: a weighted sum of inputs plus a bias, passed through an activation function. The perceptron isn’t just history; it’s the foundation.

Understanding it means understanding what’s actually happening when you call an LLM API. It means knowing why things work, not just that they work.

From Biology to Silicon

Frank Rosenblatt, who introduced the perceptron in the late 1950s, was inspired by biological neurons. Here’s how they compare:

Biological Neuron:

    Dendrites (receive signals)
            ↓
    Cell Body (process)
            ↓
    Threshold met?
            ↓
    Axon (fires signal)

Artificial Neuron (Perceptron):

x₁ ──×w₁──┐
x₂ ──×w₂──┤
x₃ ──×w₃──├──→ Σ(xᵢ×wᵢ) ──→ [≥ threshold?] ──→ {0 or 1}
   ...    │
xₙ ──×wₙ──┘

The key insight: Learning happens by adjusting the weights.
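The diagram above compares the weighted sum against a threshold, while the code in the next section adds a bias and compares against zero. The two describe the same neuron when the bias is the negated threshold. Here θ (threshold) and b (bias) are my notation:

```latex
\sum_{i=1}^{n} w_i x_i \ge \theta
\quad\Longleftrightarrow\quad
\sum_{i=1}^{n} w_i x_i + b \ge 0,
\qquad b = -\theta
```

The code uses a strict `> 0` test. That only differs from `≥` when the sum lands exactly on the boundary.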

How a Perceptron Works

Let’s break it down to basics.

A perceptron takes inputs, multiplies each by a weight, adds them up, and makes a decision.

def perceptron_forward(inputs, weights, bias):
    # Multiply each input by its weight
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))

    # Add bias (shifts the decision boundary)
    weighted_sum += bias

    # Activation: output 1 if positive, 0 otherwise
    return 1 if weighted_sum > 0 else 0

That’s it. That’s the core of a perceptron.

What’s happening:

Each input has a weight (how important is this input?)
We sum up: (input₁ × weight₁) + (input₂ × weight₂) + … + bias
If the sum is positive, output 1. Otherwise, output 0.
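The “sum up” step is just a dot product. The following sketch assumes NumPy is available, although the original code uses plain Python. It applies the same forward pass to a whole batch of inputs at once. The function name `perceptron_forward_batch` and the numbers are hypothetical and chosen only to illustrate the arithmetic:

```python
import numpy as np

def perceptron_forward_batch(X, weights, bias):
    """Vectorized forward pass: each row of X is one example."""
    # X @ weights computes sum(x_i * w_i) for every row at once
    weighted_sums = X @ weights + bias
    # Same step activation as perceptron_forward: 1 if positive, else 0
    return (weighted_sums > 0).astype(int)

# Hypothetical inputs and weights, for illustration only
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.5]])
weights = np.array([0.4, -0.3, 0.1])
bias = -0.5

# Row 1: 0.4 + 0.0 + 0.2 - 0.5 =  0.1  -> 1
# Row 2: 0.0 - 0.3 + 0.05 - 0.5 = -0.75 -> 0
print(perceptron_forward_batch(X, weights, bias))
```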

Example: AND gate

Let’s say we want to implement the AND logic gate:

Input: [0, 0] → Output: 0
Input: [0, 1] → Output: 0
Input: [1, 0] → Output: 0
Input: [1, 1] → Output: 1

Traditional way (if/else):

def and_gate_traditional(input1, input2):
    if input1 == 1 and input2 == 1:
        return 1
    else:
        return 0

Perceptron way (learned weights):

With the right weights ([0.5, 0.5] and bias -0.7), the perceptron can solve this:

[0, 0]: 0×0.5 + 0×0.5 – 0.7 = -0.7 → Output: 0 ✓
[0, 1]: 0×0.5 + 1×0.5 – 0.7 = -0.2 → Output: 0 ✓
[1, 0]: 1×0.5 + 0×0.5 – 0.7 = -0.2 → Output: 0 ✓
[1, 1]: 1×0.5 + 1×0.5 – 0.7 = 0.3 → Output: 1 ✓
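To check the hand calculations above, you can run the `perceptron_forward` function from earlier on all four inputs. It is redefined here so the snippet runs on its own:

```python
def perceptron_forward(inputs, weights, bias):
    # Same forward pass as before: weighted sum plus bias, then a step function
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

weights = [0.5, 0.5]
bias = -0.7

# Compare each row of the AND truth table with the hand-picked weights
for inputs, expected in [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]:
    output = perceptron_forward(inputs, weights, bias)
    print(inputs, output, "✓" if output == expected else "✗")
```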

The difference? The traditional version is hardcoded. Here I picked the perceptron’s weights by hand, but a perceptron can learn weights like these from examples. That’s the new programming paradigm Nielsen talked about.
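Here is a sketch of learning the AND weights from examples with the classic perceptron learning rule. The zero starting weights, the learning rate of 0.1, the epoch limit, and the name `train_perceptron` are my own assumptions. The repo’s `perceptron.py` may make different choices:

```python
def perceptron_forward(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

def train_perceptron(examples, n_inputs, learning_rate=0.1, max_epochs=100):
    """Classic perceptron learning rule, starting from all-zero weights."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - perceptron_forward(inputs, weights, bias)
            if error != 0:
                mistakes += 1
                # Move each weight toward the correct answer
                weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with no errors means training has converged
            break
    return weights, bias

and_examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_examples, n_inputs=2)
print(weights, bias)
print([perceptron_forward(x, weights, bias) for x, _ in and_examples])  # → [0, 0, 0, 1]
```

The learned weights will generally not be [0.5, 0.5] and -0.7. Any weights that put the line in the right place work equally well.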

What Clicked for Me

After implementing and testing the perceptron, here’s what became clear:

Weights are just numbers. There’s no magic. A weight of 0.5 means “this input counts half as much as an input with weight 1.0” (assuming the inputs are on the same scale).

The bias shifts the boundary. Without bias, the decision boundary always goes through the origin. Bias lets it move anywhere.
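Here is one concrete consequence. With the bias fixed at 0, the input [0, 0] always produces a weighted sum of 0, so the perceptron can never output 1 there. NAND must output 1 for [0, 0], so it needs a bias. In this small sketch, the sample weights and the NAND weights are my own hand-picked values:

```python
def perceptron_forward(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

# Without a bias, [0, 0] outputs 0 no matter what the weights are
sample_weights = ([1.0, 1.0], [5.0, -3.0], [-2.0, -2.0])
print([perceptron_forward([0, 0], w, bias=0) for w in sample_weights])  # → [0, 0, 0]

# With a bias, NAND becomes possible (hand-picked, illustrative values)
nand_weights, nand_bias = [-0.6, -0.6], 1.0
print([perceptron_forward(x, nand_weights, nand_bias)
       for x in ([0, 0], [0, 1], [1, 0], [1, 1])])  # → [1, 1, 1, 0]
```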

Learning is adjustment. When the perceptron makes a mistake, we adjust the weights. That’s learning.
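In the standard perceptron learning rule, error = target − output. Each weight moves by learning_rate × error × input, and the bias moves by learning_rate × error. Below is a sketch of a single adjustment. The learning rate of 0.1 and the function name `update` are my assumptions and are not necessarily what the repo uses:

```python
def perceptron_forward(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

def update(inputs, target, weights, bias, learning_rate=0.1):
    """One step of the classic perceptron learning rule."""
    error = target - perceptron_forward(inputs, weights, bias)  # -1, 0, or +1
    # Weights attached to inputs equal to 0 do not change
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias

# Starting from zero weights, the perceptron wrongly outputs 0 for AND's [1, 1]
weights, bias = update([1, 1], target=1, weights=[0.0, 0.0], bias=0.0)
print(weights, bias)  # → [0.1, 0.1] 0.1
```

If the prediction is already correct, the error is 0 and nothing changes. The perceptron only learns from its mistakes.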

It’s a linear classifier. The perceptron draws a straight line (or hyperplane) to separate classes. This is both its power and its limitation.
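With two inputs, the “straight line” is the set of points where the weighted sum plus bias is exactly zero: w₁x₁ + w₂x₂ + b = 0. For the AND weights from earlier, this gives x₂ = 1.4 − x₁, and only [1, 1] falls on the side where the perceptron outputs 1. In this small sketch, the helper names are hypothetical:

```python
def boundary_x2(x1, weights, bias):
    """Solve w1*x1 + w2*x2 + bias = 0 for x2 (assumes w2 != 0)."""
    w1, w2 = weights
    return -(w1 * x1 + bias) / w2

def signed_value(point, weights, bias):
    """Weighted sum plus bias: positive means the perceptron outputs 1."""
    return sum(x * w for x, w in zip(point, weights)) + bias

weights, bias = [0.5, 0.5], -0.7  # the AND weights from earlier

# Two points on the decision line, roughly (0, 1.4) and (1, 0.4)
print(boundary_x2(0.0, weights, bias), boundary_x2(1.0, weights, bias))

for point in ([0, 0], [0, 1], [1, 0], [1, 1]):
    label = "output-1 side" if signed_value(point, weights, bias) > 0 else "output-0 side"
    print(point, label)
```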

Explore the Code

I’ve implemented a complete perceptron from scratch with visualizations:

GitHub Repository: perceptrons-to-transformers

What you’ll find:

01-perceptron/perceptron.py – Full implementation with learning algorithm
01-perceptron/perceptron.ipynb – Interactive exploration
01-perceptron/perceptron_playground.py – Streamlit app to play with it

Note: This is a learning project created for educational purposes. The code in this repo was developed with the help of AI coding assistants.

What’s Next

The perceptron can learn AND, OR, and NAND gates perfectly. But it has a fundamental limitation.

No matter how you adjust the weights, there’s one simple logic gate it cannot learn. This limitation exposed a critical weakness in single-layer networks.

In the next post, we’ll explore this limitation and see why it led to the invention of multilayer networks.

Spoiler: the problem is called XOR, and solving it ultimately paved the way to modern deep learning.

References

Nielsen, M. (2015). Neural Networks and Deep Learning. Determination Press. Available at: http://neuralnetworksanddeeplearning.com/

Tags: #MachineLearning #AI #DeepLearning #Perceptron #NeuralNetworks

Series: From Perceptron to Transformers – Part 1 of 18

Code: GitHub Repository