Hi Pythonistas!
In the previous post, we learned the math behind text matching. Today we're using that concept to build our own text search engine.
What We’re Building
- Enter a search phrase.
- The code scores each of your notes/lines/sentences by how similar it is to that phrase.
- The result? The most relevant line pops up, like magic (or math).
Step-by-Step: Tiny Search Engine in Python
Step 1: Gather Some Notes
notes = [
"Hermione studied spells in the library",
"Harry practiced flying on his broomstick",
"Hagrid took care of the magical creatures",
"Dumbledore gave Harry wise advice",
"Ron played wizard chess in the common room"
]
Step 2: Grab the Needed Tools
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
Step 3: Ask for a Search Query
query = input("Search for: ")
Step 4: Add the Query to Your Notes and Vectorize
all_sentences = notes + [query]  # the query goes last, so it is row -1
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(all_sentences)
Step 5: Compare Query to Every Note
sim_scores = cosine_similarity(vectors[-1], vectors[:-1]).flatten()  # query vs. each note
best_idx = sim_scores.argmax()
if sim_scores[best_idx] == 0:  # the query shares no words with any note
    print("No match found")
else:
    print("\nMost relevant note:")
    print(notes[best_idx])
    print(f"Similarity score: {sim_scores[best_idx]:.3f}")
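Step 5 prints only the single best note. If you'd like a real ranked list, one possible extension is to sort all the scores. The helper name rank_notes and its top_k parameter are hypothetical, just for this sketch; it uses the same CountVectorizer and cosine_similarity calls as above and drops notes that score zero.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_notes(query, notes, top_k=3):
    """Hypothetical helper: return up to top_k (score, note) pairs, best first."""
    vectorizer = CountVectorizer()
    vectors = vectorizer.fit_transform(notes + [query])
    scores = cosine_similarity(vectors[-1], vectors[:-1]).flatten()
    # argsort sorts ascending, so reverse it to put the highest scores first
    order = scores.argsort()[::-1]
    # Keep the top_k candidates, but skip notes with no shared words at all
    return [(float(scores[i]), notes[i]) for i in order[:top_k] if scores[i] > 0]

notes = [
    "Hermione studied spells in the library",
    "Harry practiced flying on his broomstick",
    "Hagrid took care of the magical creatures",
    "Dumbledore gave Harry wise advice",
    "Ron played wizard chess in the common room",
]

for score, note in rank_notes("harry", notes):
    print(f"{score:.3f}  {note}")
```

For "harry" this lists both Harry notes, best first. For "ronaldo" it returns an empty list, which you could treat as "No match found".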
Some Examples
Search for: harry
Most relevant note:
Dumbledore gave Harry wise advice
Similarity score: 0.447
Search for: broomstick
Most relevant note:
Harry practiced flying on his broomstick
Similarity score: 0.408
Search for: ronaldo
No match found
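Those scores can be checked by hand. Every word in these notes appears at most once, so each count is 0 or 1, and a one-word query has length 1. Cosine similarity is the dot product divided by the two vector lengths. Here q is the query vector and d is a note vector:

```latex
\begin{aligned}
\cos(\mathbf{q}, \mathbf{d}) &= \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert} \\[4pt]
\text{harry vs. ``Dumbledore gave Harry wise advice'' (5 words):} \quad &\frac{1}{1 \cdot \sqrt{5}} \approx 0.447 \\[4pt]
\text{broomstick vs. ``Harry practiced flying on his broomstick'' (6 words):} \quad &\frac{1}{1 \cdot \sqrt{6}} \approx 0.408 \\[4pt]
\text{ronaldo vs. any note:} \quad &\frac{0}{1 \cdot \lVert \mathbf{d} \rVert} = 0
\end{aligned}
```

This also explains why "harry" prefers the Dumbledore note: both notes share exactly one word with the query, but the shorter note has a smaller length in the denominator.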
We’re using bag of words to turn each sentence into a vector of word counts, and cosine similarity to find the closest match. It’s fast, runs entirely offline, and is beginner-friendly. My kind of Python magic!
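To see there's no hidden magic inside scikit-learn, here's a rough pure-Python sketch of the same two ideas, using only the standard library. It assumes a tokenizer that lowercases text and keeps words of two or more characters, mimicking CountVectorizer's default pattern; the helper names bag_of_words and cosine are made up for this example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    # Lowercase, then count words of 2+ characters (like CountVectorizer's default)
    return Counter(re.findall(r"\b\w\w+\b", text.lower()))

def cosine(a, b):
    # Dot product over shared words; a missing word in a Counter counts as 0
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence matches nothing
    return dot / (norm_a * norm_b)

notes = [
    "Hermione studied spells in the library",
    "Harry practiced flying on his broomstick",
    "Hagrid took care of the magical creatures",
    "Dumbledore gave Harry wise advice",
    "Ron played wizard chess in the common room",
]

query_bag = bag_of_words("harry")
scores = [cosine(query_bag, bag_of_words(note)) for note in notes]
best = max(range(len(notes)), key=scores.__getitem__)
print(notes[best], f"{scores[best]:.3f}")
```

Only words that appear in both sentences add to the dot product, which is why a query with no shared words always scores 0.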
Up Next
In the upcoming post, we’ll learn about moving shapes with matrix math.