Hi Pythonistas!
Today we will learn about PyPDF2, a Python library that can read the contents of PDF files, merge two PDF files, rotate pages, and more. In this post, we will focus on extracting text from PDF files.
Let's dive into the code.
Installation
pip install PyPDF2
Extracting text from PDF
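from PyPDF2 import PdfReader

# Open and parse the PDF file
reader = PdfReader("sample.pdf")

# reader.pages is a list-like sequence of page objects
pages = reader.pages
for page in pages:
    # Extract the text of the current page as a string and print it
    text = page.extract_text()
    print(text)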
Explanation
PdfReader("sample.pdf") - opens and parses the PDF file named sample.pdf
reader.pages - returns all pages of the document as a list-like sequence that you can iterate over
page.extract_text() - extracts the text of a single page and returns it as a string
Output
A Simple PDF File
This is a small demonstration .pdf file -
just for use in the Virtual Mechanics tutorials. More text. And more
text. And more text. And more text. And more text.
And more text. And more text. And more text. And more text. And more
text. And more text. Boring, zzzzz. And more text. And more text. And
more text. And more text. And more text. And more text. And more text.
And more text. And more text.
And more text. And more text. And more text. And more text. And more
text. And more text. And more text. Even more. Continued on page 2 ...
Simple PDF File 2
...continued from page 1. Yet more text. And more text. And more text.
And more text. And more text. And more text. And more text. And more
text. Oh, how boring typing this stuff. But not as boring as watching
paint dry. And more text. And more text. And more text. And more text.
Boring. More, a little more text. The end, and just as well.
That is all it takes to read text from a PDF with PyPDF2. In an upcoming post, we will learn how to merge two PDFs using this package.
I hope you have learned something new from this post. Please share your valuable suggestions with afsal@parseltongue.co.in