Apache POI is a popular open-source Java library developed by the Apache Software Foundation. The name POI originally stood for “Poor Obfuscation Implementation”, a humorous reference to Microsoft’s proprietary binary file formats. The main purpose of Apache POI is to give Java developers a set of APIs for reading, writing, and manipulating Microsoft Office file formats, such as Excel spreadsheets (.xls and .xlsx), Word documents (.doc and .docx), and PowerPoint presentations (.ppt and .pptx).
Brief History of Apache POI
In the early 2000s, Java developers needed a way to work with Microsoft Office files without dealing with the underlying details of the file formats. The project that became Apache POI set out to reverse engineer the Microsoft file formats, and its developers concluded that the formats looked deliberately, but poorly, obfuscated. That is where the name POI, i.e. Poor Obfuscation Implementation, comes from. Over the years, the library has undergone significant development, adding support for new features and file formats, improving performance, and enhancing usability.
Supported File Formats
Apache POI supports Microsoft Excel, Microsoft Word, and Microsoft PowerPoint file formats, and also has components for Outlook, Publisher, and Visio files.
Apache POI has the following APIs for working with Microsoft Excel Spreadsheets.
HSSF: Horrible SpreadSheet Format – reads and writes the Excel 97-2003 binary XLS file format
XSSF: XML SpreadSheet Format – reads and writes the Office Open XML spreadsheet format, i.e. the XLSX file format
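To make the Excel APIs above more concrete, here is a minimal sketch that writes a one-cell XLSX workbook in memory with XSSF and reads it back. It assumes the poi and poi-ooxml jars are on the classpath. The class name ExcelRoundTrip and its two helper methods are made up for this illustration.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical demo class, not part of Apache POI.
public class ExcelRoundTrip {

    // Creates a workbook with one sheet and one text cell, returns the XLSX bytes.
    static byte[] writeWorkbook(String text) throws IOException {
        try (Workbook workbook = new XSSFWorkbook();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            Sheet sheet = workbook.createSheet("Demo");
            Row row = sheet.createRow(0);      // first row (index 0)
            Cell cell = row.createCell(0);     // first column (index 0)
            cell.setCellValue(text);
            workbook.write(out);
            return out.toByteArray();
        }
    }

    // Opens XLSX bytes and returns the text stored in cell A1 of the first sheet.
    static String readFirstCell(byte[] xlsx) throws IOException {
        try (Workbook workbook = new XSSFWorkbook(new ByteArrayInputStream(xlsx))) {
            return workbook.getSheetAt(0).getRow(0).getCell(0).getStringCellValue();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeWorkbook("Hello, POI");
        System.out.println(readFirstCell(bytes));
    }
}
```

Because the code is written against the shared Workbook, Sheet, Row, and Cell interfaces, replacing XSSFWorkbook with HSSFWorkbook (from org.apache.poi.hssf.usermodel) should be enough to target the older XLS format instead.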
Apache POI has the following APIs for working with Microsoft Word Documents.
HWPF: Horrible Word Processor Format – reads and writes the Microsoft Word 97-2003 DOC file format
XWPF: XML Word Processor Format – similar feature set to HWPF, but for the Office Open XML DOCX file format
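As an illustration of the Word APIs above, the following sketch creates a DOCX document with a single paragraph using XWPF and reads the paragraph text back. It assumes the poi-ooxml jar is on the classpath. WordRoundTrip and its helper methods are hypothetical names chosen for this example.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical demo class, not part of Apache POI.
public class WordRoundTrip {

    // Creates a document with one paragraph and returns the DOCX bytes.
    static byte[] writeDocument(String text) throws IOException {
        try (XWPFDocument document = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            XWPFParagraph paragraph = document.createParagraph();
            XWPFRun run = paragraph.createRun();   // a run is a span of text with one style
            run.setText(text);
            document.write(out);
            return out.toByteArray();
        }
    }

    // Opens DOCX bytes and returns the text of the first paragraph.
    static String readFirstParagraph(byte[] docx) throws IOException {
        try (XWPFDocument document = new XWPFDocument(new ByteArrayInputStream(docx))) {
            return document.getParagraphs().get(0).getText();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeDocument("Hello from XWPF");
        System.out.println(readFirstParagraph(bytes));
    }
}
```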
Apache POI has the following APIs for working with Microsoft PowerPoint presentations.
HSLF: Horrible Slide Layout Format – Java implementation for the Microsoft PowerPoint 97-2003 PPT file format
XSLF: XML Slide Layout Format – Java implementation for Office Open XML PowerPoint files, i.e. the PPTX file format
Apache POI also has the following APIs for working with other Microsoft file formats.
HSMF: Horrible Stupid Mail Format – Java implementation for the Microsoft Outlook MSG file format
HPBF: Horrible PuBlisher Format – Java implementation for the Microsoft Publisher PUB file format
HDGF: Horrible DiaGram Format – Java implementation for the Microsoft Visio VSD file format
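The component names listed in this section can be summarised as a simple lookup from file extension to POI component. The sketch below uses only the Java standard library (Java 9 or later for Map.of) and does not call Apache POI itself. PoiComponentLookup and componentFor are hypothetical names, and the "unknown" fallback is an assumption made for the example.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that mirrors the format list above; not part of Apache POI.
public class PoiComponentLookup {

    // File extension (lower case, no dot) mapped to the POI component that handles it.
    private static final Map<String, String> COMPONENT_BY_EXTENSION = Map.of(
            "xls", "HSSF",
            "xlsx", "XSSF",
            "doc", "HWPF",
            "docx", "XWPF",
            "ppt", "HSLF",
            "pptx", "XSLF",
            "msg", "HSMF",
            "pub", "HPBF",
            "vsd", "HDGF");

    // Returns the component name for a file name, or "unknown" if the extension is not listed.
    static String componentFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "unknown";   // no extension present
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return COMPONENT_BY_EXTENSION.getOrDefault(extension, "unknown");
    }

    public static void main(String[] args) {
        System.out.println(componentFor("report.xlsx"));
        System.out.println(componentFor("letter.doc"));
    }
}
```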
Install Apache POI for Java
As of writing this article, the latest stable release of Apache POI is 5.2.3, which is available from the Apache POI website, GitHub, and Maven Central. We’ll look at how to install the library from Maven and how to download it from the Apache POI website for use in your Java project.
How to install Apache POI from Maven?
Apache publishes the Apache POI artifacts to Maven so that Maven projects can install them automatically through the pom.xml file. Once the dependencies are declared, Maven fetches the jar files needed to run the application. The following steps add the dependencies to your Maven project’s pom.xml.
Step 1: Open your Maven project in your Java IDE. You can use NetBeans, Eclipse, or IntelliJ IDEA as per your own choice.
Step 2: Add the following dependency to the POM file.
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi</artifactId>
  <version>5.2.3</version>
</dependency>
Step 3: Add the poi-ooxml dependency for the Office Open XML file formats as follows.
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>5.2.3</version>
</dependency>
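Step 4: Add the commons-io dependency as follows. Maven also resolves commons-io transitively through POI, so if you declare it explicitly, avoid pinning a version older than the one POI pulls in.
<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>2.11.0</version>
</dependency>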
Step 5: Add the log4j dependency as follows. POI 5.x logs through the Log4j 2 API, and log4j-core supplies the logging implementation.
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-core</artifactId>
  <version>2.20.0</version>
</dependency>
At this stage, Maven will fetch the dependencies declared in the pom.xml file and add the respective jar files to your project, so you can start working with Microsoft Office file formats.
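One quick way to confirm that the jars really ended up on the classpath is to try loading one class from each artifact. The sketch below uses only the Java standard library, so it runs with or without POI present. PoiClasspathCheck and isOnClasspath are hypothetical names chosen for this example. HSSFWorkbook ships in the poi artifact and XSSFWorkbook in poi-ooxml.

```java
// Hypothetical helper for checking the setup; not part of Apache POI.
public class PoiClasspathCheck {

    // Returns true if the named class can be loaded, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] classesToCheck = {
            "org.apache.poi.hssf.usermodel.HSSFWorkbook",   // from the poi artifact
            "org.apache.poi.xssf.usermodel.XSSFWorkbook"    // from the poi-ooxml artifact
        };
        for (String className : classesToCheck) {
            System.out.println(className + " available: " + isOnClasspath(className));
        }
    }
}
```

If either line reports false, the corresponding dependency is most likely missing from pom.xml or has not been downloaded yet.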
Install Apache POI from GitHub
Apache POI provides a mirror of its source code on GitHub. You can browse and download the source from the Apache POI GitHub repository.
Apache POI Download
You can also install Apache POI by downloading the latest version from the official Apache POI download page. Once downloaded, unzip the package to a folder and add the jar files to your project’s classpath to get started with the Apache POI API.
Apache POI Resources
In our upcoming articles, we’ll cover the following topics with examples:
- Using Apache POI for working with MS Excel Spreadsheet files
- Using Apache POI for working with MS Word Files
- Using Apache POI for working with MS PowerPoint Presentation Files
So stay tuned for these.