From Basics to Bots: My Weekly AI Engineering Adventure-5

Hogwarts Sorting Hat Powered By k-Nearest Neighbors!

Posted by Afsal on 21-Aug-2025

Hi Pythonistas!

Ever wondered how the Hogwarts Sorting Hat might sort students if it ran on machine learning?
Today, we'll turn the magical hat into a k-Nearest Neighbors (k-NN) classifier, letting personality traits decide which house you belong to, just the way a real wizarding bot would!

About k-Nearest Neighbors (k-NN) Algorithm

k-Nearest Neighbors (k-NN) is a simple, intuitive machine learning algorithm used for classification and regression tasks. It works by finding the k closest data points (neighbors) to a new input, based on a distance metric like Euclidean or Manhattan distance. The algorithm then predicts the label for the new input by taking a majority vote (for classification) or averaging (for regression) of those neighbors’ labels.
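
For example, here's how those two distance metrics compare on a pair of made-up 2-D points (a quick sketch, nothing Hogwarts-specific yet):

import numpy as np

a = np.array([1, 2])
b = np.array([4, 6])

# Euclidean (L2): straight-line distance
print(np.sqrt(np.sum((a - b) ** 2)))  # sqrt(3^2 + 4^2) = 5.0

# Manhattan (L1): sum of absolute coordinate differences
print(np.sum(np.abs(a - b)))          # 3 + 4 = 7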

k-NN makes decisions directly from stored examples, which makes it easy to understand and implement; because all the work happens at prediction time rather than in a training phase, it's often called a lazy learner. It's widely used in tasks like image recognition, recommendation systems, and anomaly detection.

How’s This Gonna Work?

Each house has its own "student profile": bravery, intelligence, loyalty, ambition (just numbers for demo).
For a new student (with their own trait scores), the Sorting Hat looks for the most similar students in history (nearest neighbors).
The most common house among those neighbors is the sorting result!

The House Data (Training Set)

import numpy as np

# Features: [bravery, intelligence, loyalty, ambition] (scale: 0–10)

X_train = np.array([
    [9, 5, 6, 2],  # Gryffindor
    [3, 9, 5, 2],  # Ravenclaw
    [6, 4, 9, 2],  # Hufflepuff
    [5, 6, 7, 3],  # Hufflepuff
    [7, 4, 5, 8],  # Slytherin
    [5, 5, 5, 9],  # Slytherin
    [8, 8, 3, 3],  # Gryffindor
    [4, 7, 4, 4]   # Ravenclaw
])

y_train = np.array([
    'Gryffindor',
    'Ravenclaw',
    'Hufflepuff',
    'Hufflepuff',
    'Slytherin',
    'Slytherin',
    'Gryffindor',
    'Ravenclaw'
])

The Sorting Hat (k-NN Algorithm)

def l2_distance(a, b):
    # Euclidean (L2) distance between two trait vectors
    return np.sqrt(np.sum((a - b) ** 2))

def sort_hat_predict(X_train, y_train, x_new, k=3):
    # Distance from the new student to every student in history
    distances = [l2_distance(x, x_new) for x in X_train]
    # Indices of the k closest students
    k_indices = np.argsort(distances)[:k]
    k_labels = y_train[k_indices]
    # Majority "house" among those neighbors
    values, counts = np.unique(k_labels, return_counts=True)
    return values[np.argmax(counts)]

How This Works

  1. Compute the L2 (Euclidean) distance between the new student and every student in the training set.
  2. Find the indices of the k (here 3) nearest students.
  3. Fetch the house labels at those positions.
  4. Group the labels by occurrence (see the snippet after this list).
  5. Return the label with the highest count (ties are broken alphabetically, since np.unique returns the labels sorted).
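
Step 4 is the only NumPy trick that might look unfamiliar, so here's that majority vote in isolation, on a hand-written list of neighbor labels:

neighbors = np.array(['Gryffindor', 'Hufflepuff', 'Hufflepuff'])
values, counts = np.unique(neighbors, return_counts=True)
print(values)                      # ['Gryffindor' 'Hufflepuff']
print(counts)                      # [1 2]
print(values[np.argmax(counts)])   # Hufflepuff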

Make the Magic Happen!

test_students = np.array([
    [8, 4, 3, 1],   # Brave and daring
    [4, 7, 4, 3],   # Super brainy
    [6, 6, 8, 2],   # Loyal and kind
    [4, 4, 6, 9]    # Ambitious and resourceful
])

for i, student in enumerate(test_students, 1):
    house = sort_hat_predict(X_train, y_train, student, k=3)
    print(f"Student {i} (traits: {student}) goes to... {house}!")

Output:

Student 1 (traits: [8 4 3 1]) goes to... Gryffindor!
Student 2 (traits: [4 7 4 3]) goes to... Ravenclaw!
Student 3 (traits: [6 6 8 2]) goes to... Hufflepuff!
Student 4 (traits: [4 4 6 9]) goes to... Slytherin!
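
If you'd rather not hand-roll the hat, scikit-learn ships the same algorithm as KNeighborsClassifier. Here's a minimal sketch (assuming scikit-learn is installed) that should reproduce the sortings above:

from sklearn.neighbors import KNeighborsClassifier

hat = KNeighborsClassifier(n_neighbors=3)  # same k as our homemade hat
hat.fit(X_train, y_train)                  # k-NN "training" just stores the data
print(hat.predict(test_students))

Try nudging n_neighbors up or down, too: with only eight students in history, even a small change in k can flip a borderline student into another house.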

What I Learned

Logic behind k-NN:
Dressing the algorithm up as the Sorting Hat turns the number crunching into something familiar, fun, and easy to visualize.

What’s Next?

In the upcoming post, we will learn about the basics of probability.