Exploring the difflib Module in Python

Posted by Afsal on 26-Jul-2024

The difflib module in Python's standard library is a powerful tool for comparing sequences such as strings, lists, or lines of text. It is particularly useful for generating differences, or "diffs", between two sequences. In this post, we'll explore the basics of the difflib module. Let's dive into the code.

difflib.SequenceMatcher

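The SequenceMatcher class compares pairs of sequences of any type, as long as their elements are hashable. It finds the blocks that match between two sequences and the operations needed to turn the first into the second. The first argument, None, means no elements are treated as junk. The get_opcodes() method returns a list of (tag, i1, i2, j1, j2) tuples, where tag is one of 'equal', 'replace', 'delete', or 'insert', and the indices describe the slices seq1[i1:i2] and seq2[j1:j2].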

Code

import difflib

seq1 = "abcdef"
seq2 = "abdfgh"

s = difflib.SequenceMatcher(None, seq1, seq2)

for tag, i1, i2, j1, j2 in s.get_opcodes():
    print(f"{tag} seq1[{i1}:{i2}] ({seq1[i1:i2]}) seq2[{j1}:{j2}] ({seq2[j1:j2]})")

Output

equal seq1[0:2] (ab) seq2[0:2] (ab)
delete seq1[2:3] (c) seq2[2:2] ()
equal seq1[3:4] (d) seq2[2:3] (d)
delete seq1[4:5] (e) seq2[3:3] ()
equal seq1[5:6] (f) seq2[3:4] (f)
insert seq1[6:6] () seq2[4:6] (gh)
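The opcodes are only one view of the comparison. As a small sketch, reusing seq1 and seq2 plus a made-up pair of word lists, the same SequenceMatcher object can also report the raw matching blocks and an overall similarity score via ratio(), which is defined as 2*M/T, where M is the number of matched elements and T is the combined length of both sequences. Because the class accepts any sequences of hashable elements, the last part compares lists of words instead of characters.

```python
import difflib

seq1 = "abcdef"
seq2 = "abdfgh"

s = difflib.SequenceMatcher(None, seq1, seq2)

# Each Match(a, b, size) means seq1[a:a+size] == seq2[b:b+size];
# the final entry is always a zero-size sentinel.
for block in s.get_matching_blocks():
    print(block)

# ratio() = 2 * M / T: here 4 matched characters out of 12 in total
print(round(s.ratio(), 3))  # 0.667

# The same class works on lists, as long as the elements are hashable
words1 = "the cat sat on the mat".split()
words2 = "the cat lay on the mat".split()
w = difflib.SequenceMatcher(None, words1, words2)
print(w.get_opcodes())
```

At the word level, the only difference is reported as a single 'replace' opcode covering "sat" in the first list and "lay" in the second.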

difflib.get_close_matches

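The get_close_matches function returns the best matches for a word from a list of possibilities, ordered from most to least similar. By default it returns at most 3 matches (n=3) and ignores any candidate whose similarity score is below 0.6 (cutoff=0.6). It is useful for spell-checking and auto-completion.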

Code

import difflib

word = "ape"

words_list = ["apple", "grapes", "ape", "cat", "app"]

matches = difflib.get_close_matches(word, words_list)

print(matches)

Output

['ape', 'apple', 'grapes']
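To see why those three words were picked, the sketch below prints the similarity score of each candidate against "ape" (get_close_matches scores candidates with SequenceMatcher ratios) and then tightens the n and cutoff parameters. The values n=2 and cutoff=0.7 are arbitrary choices for illustration.

```python
import difflib

word = "ape"
words_list = ["apple", "grapes", "ape", "cat", "app"]

# Similarity score of each candidate against the target word
for candidate in words_list:
    score = difflib.SequenceMatcher(None, candidate, word).ratio()
    print(f"{candidate:>7}: {score:.3f}")

# n caps the number of results; cutoff (0.0 to 1.0) drops weak candidates
strict = difflib.get_close_matches(word, words_list, n=2, cutoff=0.7)
print(strict)  # ['ape', 'apple']
```

Both grapes and app score about 0.667, which clears the default cutoff of 0.6, but the default n=3 leaves room for only one of them after ape and apple. That is why app does not appear in the earlier output. With cutoff=0.7, both drop out.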

difflib.ndiff

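The ndiff function compares two lists of lines and produces a human-readable, Differ-style delta. Each output line starts with a two-character code: "- " for a line found only in the first sequence, "+ " for a line found only in the second, "  " for a line common to both, and "? " for hint lines that highlight differences within a line. It is great for generating simple, readable diffs.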

Code

import difflib

text1 = """line 1
line 2
line 3
line 4"""

text2 = """line 1
line 3
line 4
line 5"""

text1_lines = text1.splitlines()
text2_lines = text2.splitlines()


diff = difflib.ndiff(text1_lines, text2_lines)
print('\n'.join(diff))

Output

  line 1
- line 2
  line 3
  line 4
+ line 5
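The example above has no "? " lines because every line is either identical or entirely added or removed. When two lines are similar but not identical, ndiff adds hint lines that point at the characters that changed. The sketch below uses a made-up pair of sentences to trigger them, and then uses difflib.restore() to rebuild either original sequence from the delta.

```python
import difflib

old = ["the quick brown fox", "jumps over the lazy dog"]
new = ["the quick brown cat", "jumps over the lazy dog"]

# ndiff returns a generator; turn it into a list so it can be read twice
delta = list(difflib.ndiff(old, new))

# "? " hint lines carry their own trailing "\n", so strip it before printing
for line in delta:
    print(line.rstrip("\n"))

# restore() rebuilds one input from the delta: 1 gives old, 2 gives new
print(list(difflib.restore(delta, 1)) == old)  # True
print(list(difflib.restore(delta, 2)) == new)  # True
```

In the printed delta, the "? " line under each changed sentence places ^ markers beneath "fox" and "cat".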

difflib.unified_diff

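The unified_diff function generates a delta in the unified diff format, which groups changes into compact hunks with a few lines of surrounding context (three by default). This is the format commonly used by version control systems such as Git. Because splitlines() removes the line endings, the example passes lineterm='' so that the header lines produced by unified_diff are newline-free as well.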

Code

import difflib

text1 = """line 1
line 2
line 3
line 4"""

text2 = """line 1
line 3
line 4
line 5"""

text1_lines = text1.splitlines()
text2_lines = text2.splitlines()

diff = difflib.unified_diff(text1_lines, text2_lines, lineterm='')

print('\n'.join(list(diff)))

Output

---
+++
@@ -1,4 +1,4 @@
 line 1
-line 2
 line 3
 line 4
+line 5
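unified_diff also accepts fromfile and tofile labels for the header lines and an n parameter for the number of context lines. Here is a small sketch with n=0 on the same input. The names a/notes.txt and b/notes.txt are purely illustrative labels: unified_diff only compares the lists it is given and never opens any files.

```python
import difflib

text1_lines = ["line 1", "line 2", "line 3", "line 4"]
text2_lines = ["line 1", "line 3", "line 4", "line 5"]

# fromfile/tofile only label the "---" and "+++" header lines;
# n=0 removes all unchanged context lines around each change
diff_lines = list(
    difflib.unified_diff(
        text1_lines,
        text2_lines,
        fromfile="a/notes.txt",
        tofile="b/notes.txt",
        n=0,
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

Without context lines, the two changes no longer share a hunk, so the output contains two @@ headers: "@@ -2 +1,0 @@" for the removed line 2 and "@@ -4,0 +4 @@" for the added line 5. A range without a count covers a single line, and a count of 0 marks an empty range.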

difflib.context_diff

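The context_diff function generates a delta in the context diff format. Instead of interleaving the changes, it shows each changed region twice, first as it appears in the old sequence (after the "***" range line) and then as it appears in the new one (after the "---" range line), with three lines of surrounding context by default.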

Code

import difflib

text1 = """line 1
line 2
line 3
line 4"""

text2 = """line 1
line 3
line 4
line 5"""

text1_lines = text1.splitlines()
text2_lines = text2.splitlines()

diff = difflib.context_diff(text1_lines, text2_lines, lineterm='')

print('\n'.join(list(diff)))

Output

***
---
***************
*** 1,4 ****
  line 1
- line 2
  line 3
  line 4
--- 1,4 ----
  line 1
  line 3
  line 4
+ line 5
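The example above only contains pure deletions and insertions, marked with "- " and "+ ". When a line is changed in place, context_diff marks it with "! " in both halves instead. A minimal sketch with made-up input lists; before.txt and after.txt are only illustrative header labels.

```python
import difflib

before = ["alpha", "beta", "gamma"]
after = ["alpha", "BETA", "gamma"]

# A replaced line shows up as "! " in the old half and in the new half
diff_lines = list(
    difflib.context_diff(
        before,
        after,
        fromfile="before.txt",
        tofile="after.txt",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```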

The difflib module provides a versatile set of tools for comparing sequences and generating differences. If you want to learn more, see the official Python documentation for difflib.

I hope you learned something from this post. Please share your suggestions with afsal@parseltongue.co.in.