Extract Text from PDF File Using Python

Last Updated: 15 Jan, 2025


In this article, we show how to extract text from a PDF file using Python.

PDF, which stands for Portable Document Format, is a popular digital document format. It is designed so that documents can be viewed and shared reliably regardless of the software, hardware, or operating system in use. PDF files have the extension .pdf.

Two Python libraries are commonly used to extract text from a PDF file: pypdf and PyMuPDF. We will show you how to extract text from a PDF using each of them.

How to Extract Text from a PDF File Using pypdf in Python

Here are the steps.

Install pypdf
Run the code given in this article
See the output

Install pypdf

You can install pypdf using the following command:

pip install pypdf

Sample Code to Extract Text from PDF using pypdf

sample.pdf - Download Link (This sample PDF will be used in the code, but you can certainly use your own PDF.)

Sample Input PDF Screenshot (sample.pdf)

Code

Here is a complete code example for extracting text from a PDF using pypdf.

Output

Here is the output of the sample code provided above.

How to Extract Text from a PDF File Using PyMuPDF in Python

Here are the steps.

Install PyMuPDF
Run the code given in this article
See the output

Install PyMuPDF

Install PyMuPDF (whose Python module has traditionally been imported under the name fitz) using the following command:

pip install pymupdf

Sample Code to Extract Text from PDF using PyMuPDF

We use the same PDF as in the previous section:

sample.pdf - Download Link (This sample PDF will be used in the code, but you can certainly use your own PDF.)

Code

Here is a complete code example for extracting text from a PDF using PyMuPDF.

Output

Here is the output of the sample code provided above.

Conclusion

In this article, we provided sample Python code, a sample file, and the resulting output to demonstrate how to extract text from a PDF using two libraries: pypdf and PyMuPDF.

If you have any questions or encounter any issues while running the code, feel free to leave a comment in our forums!
