The difflib module in Python is a powerful tool for comparing sequences, such as strings, lists, or lines of text. It is particularly useful for generating differences, or "diffs", between two sequences. In this post, we'll explore the basics of the difflib module. Let us dive into the code.
difflib.SequenceMatcher
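The SequenceMatcher class is used to compare pairs of sequences of any type, as long as the elements are hashable. It provides a way to identify matching blocks and differences between sequences. Its get_opcodes method returns (tag, i1, i2, j1, j2) tuples that describe how to turn the first sequence into the second, where tag is one of equal, delete, insert, or replace.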
Code
import difflib
seq1 = "abcdef"
seq2 = "abdfgh"
s = difflib.SequenceMatcher(None, seq1, seq2)
for tag, i1, i2, j1, j2 in s.get_opcodes():
    print(f"{tag} seq1[{i1}:{i2}] ({seq1[i1:i2]}) seq2[{j1}:{j2}] ({seq2[j1:j2]})")
Output
equal seq1[0:2] (ab) seq2[0:2] (ab)
delete seq1[2:3] (c) seq2[2:2] ()
equal seq1[3:4] (d) seq2[2:3] (d)
delete seq1[4:5] (e) seq2[3:3] ()
equal seq1[5:6] (f) seq2[3:4] (f)
insert seq1[6:6] () seq2[4:6] (gh)
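Beyond opcodes, the same matcher can also report an overall similarity score and the raw matching blocks. The following is a minimal sketch that reuses the two strings from the example above; ratio() and get_matching_blocks() are standard SequenceMatcher methods, while the variable names are only illustrative.

```python
import difflib

seq1 = "abcdef"
seq2 = "abdfgh"
matcher = difflib.SequenceMatcher(None, seq1, seq2)

# ratio() returns 2 * M / T, where M is the number of matched elements
# and T is the combined length of both sequences.
similarity = matcher.ratio()
print(f"similarity: {similarity:.3f}")

# Each block is (a, b, size), meaning seq1[a:a+size] == seq2[b:b+size].
# The last block is always a zero-length sentinel.
blocks = matcher.get_matching_blocks()
for block in blocks:
    print(block.a, block.b, block.size, repr(seq1[block.a:block.a + block.size]))
```

Here four characters match out of twelve in total, so the score should come out at roughly 0.667.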
difflib.get_close_matches
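The get_close_matches function finds the best matches for a word from a list of possibilities. By default it returns at most three matches (the n argument) whose similarity score is at least 0.6 (the cutoff argument), sorted with the best match first. It is useful for spell-checking and auto-completion.
Code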
import difflib
word = "ape"
words_list = ["apple", "grapes", "ape", "cat", "app"]
matches = difflib.get_close_matches(word, words_list)
print(matches)
Output
['ape', 'apple', 'grapes']
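The n and cutoff arguments can be tuned to make the search stricter or looser. The sketch below reuses the word list from the example above; the misspelled query "appel" is an assumed input added only for illustration.

```python
import difflib

words_list = ["apple", "grapes", "ape", "cat", "app"]

# n caps how many matches are returned (default 3).
best_only = difflib.get_close_matches("ape", words_list, n=1)
print(best_only)  # ['ape']

# cutoff is the minimum similarity score in [0, 1] (default 0.6);
# raising it drops the weaker candidates.
strict = difflib.get_close_matches("ape", words_list, cutoff=0.7)
print(strict)  # ['ape', 'apple']

# A misspelled word with no exact match still finds a close candidate.
suggestions = difflib.get_close_matches("appel", words_list, n=1)
print(suggestions)
```

A higher cutoff suits cases such as command-name suggestions, where a wrong guess is worse than no guess at all.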
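difflib.ndiff
The ndiff function compares sequences of lines of text and produces a human-readable delta. Each output line starts with a two-character code: "- " for a line only in the first sequence, "+ " for a line only in the second, two spaces for a line common to both, and "? " for a hint line that is present in neither input. It is great for generating simple, readable diffs.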
Code
import difflib
text1 = """line 1
line 2
line 3
line 4"""
text2 = """line 1
line 3
line 4
line 5"""
text1_lines = text1.splitlines()
text2_lines = text2.splitlines()
diff = difflib.ndiff(text1_lines, text2_lines)
print('\n'.join(diff))
Output
  line 1
- line 2
  line 3
  line 4
+ line 5
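Because an ndiff delta keeps every line from both inputs, either input can be rebuilt from it with difflib.restore. The sketch below assumes the same two texts as above, written as lists for brevity.

```python
import difflib

text1_lines = ["line 1", "line 2", "line 3", "line 4"]
text2_lines = ["line 1", "line 3", "line 4", "line 5"]

# ndiff returns a generator, so materialize it in order to reuse the delta.
delta = list(difflib.ndiff(text1_lines, text2_lines))

# which=1 rebuilds the first sequence, which=2 rebuilds the second.
original = list(difflib.restore(delta, 1))
modified = list(difflib.restore(delta, 2))

print(original == text1_lines)  # True
print(modified == text2_lines)  # True
```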
difflib.unified_diff
The unified_diff function generates a delta (difference) in a unified diff format, commonly used by version control systems like Git.
Code
import difflib
text1 = """line 1
line 2
line 3
line 4"""
text2 = """line 1
line 3
line 4
line 5"""
text1_lines = text1.splitlines()
text2_lines = text2.splitlines()
diff = difflib.unified_diff(text1_lines, text2_lines, lineterm='')
print('\n'.join(diff))
Output
---
+++
@@ -1,4 +1,4 @@
 line 1
-line 2
 line 3
 line 4
+line 5
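The empty "---" and "+++" header lines in this output can be filled in with the fromfile and tofile arguments. In the sketch below, "before.txt" and "after.txt" are hypothetical labels chosen for illustration; difflib only prints them and never opens any file.

```python
import difflib

text1_lines = ["line 1", "line 2", "line 3", "line 4"]
text2_lines = ["line 1", "line 3", "line 4", "line 5"]

# "before.txt" and "after.txt" are hypothetical names used only as
# header labels; no files are read or written.
diff = list(difflib.unified_diff(
    text1_lines,
    text2_lines,
    fromfile="before.txt",
    tofile="after.txt",
    lineterm="",
))
print("\n".join(diff))
```

Labelled headers make the output look closer to what tools such as patch or git usually work with, although this sketch does not check compatibility with any particular tool.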
difflib.context_diff
The context_diff function generates a delta in a context diff format, which shows the before and after versions of each changed region as separate blocks, together with the surrounding context lines.
Code
import difflib
text1 = """line 1
line 2
line 3
line 4"""
text2 = """line 1
line 3
line 4
line 5"""
text1_lines = text1.splitlines()
text2_lines = text2.splitlines()
diff = difflib.context_diff(text1_lines, text2_lines, lineterm='')
print('\n'.join(diff))
Output
***
---
***************
*** 1,4 ****
  line 1
- line 2
  line 3
  line 4
--- 1,4 ----
  line 1
  line 3
  line 4
+ line 5
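The amount of surrounding context is controlled by the n argument, which defaults to 3 for both context_diff and unified_diff. As a rough sketch using the same texts as above, setting n=0 should leave only the changed lines and the hunk markers.

```python
import difflib

text1_lines = ["line 1", "line 2", "line 3", "line 4"]
text2_lines = ["line 1", "line 3", "line 4", "line 5"]

# n=0 asks for no context lines around each change (the default is 3).
diff = list(difflib.context_diff(text1_lines, text2_lines, n=0, lineterm=""))
print("\n".join(diff))

# Changed lines start with "- " (removed), "+ " (added) or "! " (replaced).
changed = [line for line in diff if line.startswith(("- ", "+ ", "! "))]
print(changed)
```

With no context, each change ends up in its own hunk, which keeps the output short but makes it harder to see where a change sits in the file.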
The difflib module provides a versatile set of tools for comparing sequences and generating differences. You can visit the official documentation if you need to learn more about this module.
I hope you have learned something from this post. Please share your valuable suggestions at afsal@parseltongue.co.in