Pdf-to-Text

Extract Text from a PDF file using Python

Languages and Tools

Python, pip, Visual Studio Code

Installing libraries

pip install PyPDF2

PyPDF2 is a pure-Python PDF library capable of splitting, merging, cropping,
and transforming the pages of PDF files.
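The names of the PyPDF2 reader classes and methods differ between releases, so it can help to confirm which version pip installed. The snippet below is a minimal sketch using only the standard library (Python 3.8 or newer); the helper name `installed_version` is hypothetical and not part of PyPDF2.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="PyPDF2"):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The package is not installed in the current environment.
        return None


print(installed_version())
```

If this prints `None`, the `pip install PyPDF2` step has not taken effect in the interpreter you are running.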

Breaking down the code

Importing required modules

import PyPDF2

Creating a pdf file object

pdfFileObj = open('file location', 'rb')  # Replace 'file location' with the path to your PDF

Creating a pdf reader object

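pdfReader = PyPDF2.PdfReader(pdfFileObj)  # PdfReader replaces PdfFileReader, which PyPDF2 3.0 removed

Printing number of pages in pdf file

print(len(pdfReader.pages))  # replaces the removed numPages attribute

Creating a page object

pageObj = pdfReader.pages[0]  # first page (index 0); replaces getPage(0)

Extracting text from page

print(pageObj.extract_text())  # replaces the removed extractText()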
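The steps above read only the first page. To cover the whole document, the same calls can be wrapped in a loop. This is a minimal sketch that assumes PyPDF2 2.0 or newer, where pages are exposed as `reader.pages` and text comes from `extract_text()`; the helper name `iter_page_text` is hypothetical.

```python
def iter_page_text(reader):
    """Yield (page_number, text) for every page, numbering pages from 1."""
    for page_number, page in enumerate(reader.pages, start=1):
        # Fall back to an empty string if a page yields no text.
        yield page_number, page.extract_text() or ""
```

With the reader object created earlier, it could be used as `for number, text in iter_page_text(pdfReader): print(number, text)`. Pages that contain only scanned images generally come back empty, since no text layer is present to extract.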

Closing the pdf file object

pdfFileObj.close()
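Calling `close()` by hand is easy to forget, and it is skipped if an error occurs earlier in the script. One alternative is a `with` block, which closes the file automatically. The sketch below assumes PyPDF2 2.0 or newer; the function name `read_page_text` and the `make_reader` argument are hypothetical, with `make_reader` added only so the function can be tried without a real PDF.

```python
def read_page_text(path, page_index=0, make_reader=None):
    """Return the text of one page, closing the file even if an error occurs."""
    if make_reader is None:
        # Imported lazily; assumes PyPDF2 2.0 or newer is installed.
        from PyPDF2 import PdfReader
        make_reader = PdfReader
    # The with block closes the file on exit, so no explicit close() is needed.
    with open(path, "rb") as pdf_file:
        reader = make_reader(pdf_file)
        # Extract inside the block: the reader still needs the open file here.
        return reader.pages[page_index].extract_text() or ""
```

A call such as `read_page_text('file location')` would then replace the open, read, and close steps shown above.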

Submitted By

Ankush Mishra
