Monday, 18 April 2022

Copy Table Data from PDF to Excel in Java

The PDF is a document exchange and storage program, while Excel is a spreadsheet program. In some cases, you may need to copy or import table from PDF to Excel for analyzing and editing data using Excel. This article will demonstrate how to programmatically copy or import table from PDF to Excel using Java.

DEPENDENCY

Before running codes, you need to import a JAR file to the Java program. Download Free Spire.Office for Java from this link, find Spire.Office.jar in the lib folder and add it to your program. Or if you use Maven, it’s easier to install the Jar file and you just need to type the following code in the pom.xml file.

<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId> e-iceblue </groupId>
<artifactId>spire.office.free</artifactId>
<version>5.3.1</version>
</dependency>
</dependencies>

USING THE CODE

With Free Spire.Office for Java, you can extract all tables from a single page of a PDF document and save them as individual Excel worksheets. The following are detailed steps.

l  Load a PDF sample document while initializing the PdfDocument object.

l  Create a PdfTableExtractor instance, and use PdfTableExtractor.extactTable(int pageIndex) method to extract all tables in a specified page.

l  Create a Workbook instance, and remove all default worksheets using Workbook.getWorksheets().clear() method.

l  Loop through tables in the PdfTable[] array, and get the specific one by its index.

l  Add a worksheet to the workbook using Workbook.getWorksheets.add() method.

l  Loop through rows and columns in the PDF table, and get the value of a specific cell using PdfTable.getText(int rowIndex, int columnIndex) method. Then insert the value to the worksheet using Worksheet.get(int row, int column).setText(String string) method.

l  Save the workbook to an Excel file using Workbook.saveToFile() method.

 import com.spire.pdf.PdfDocument;

import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;
import com.spire.xls.ExcelVersion;
import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

public class ExtractTableDataAndSaveInExcel {
public static void main(String[] args) {
//Load a PDF sample document
PdfDocument pdf = new PdfDocument("C:\\Users\\Tina\\Desktop\\sample.pdf");

//Create a PdfTableExtractor instance
PdfTableExtractor extractor = new PdfTableExtractor(pdf);

//Extract tables from the first page
PdfTable[] pdfTables = extractor.extractTable(0);

//Create a Workbook instance
Workbook wb = new Workbook();

//Remove default worksheets
wb.getWorksheets().clear();

//If tables are found
if (pdfTables != null && pdfTables.length > 0) {

//Loop through the tables
for (int tableNum = 0; tableNum < pdfTables.length; tableNum++) {

//Add a worksheet to workbook
String sheetName = String.format("Table - %d", tableNum + 1);
Worksheet sheet = wb.getWorksheets().add(sheetName);

//Loop through rows in the current table
for (int rowNum = 0; rowNum < pdfTables[tableNum].getRowCount(); rowNum++) {

//Loop through columns in the current table
for (int colNum = 0; colNum < pdfTables[tableNum].getColumnCount(); colNum++) {

//Extract data from the current table cell
String text = pdfTables[tableNum].getText(rowNum, colNum);

//Insert data into a specific cell
sheet.get(rowNum + 1, colNum + 1).setText(text);

}
}

//Auto fit column width
for (int sheetColNum = 0; sheetColNum < sheet.getColumns().length; sheetColNum++) {
sheet.autoFitColumn(sheetColNum + 1);
}
}
}

//Save the workbook to an Excel file
wb.saveToFile("output/ExportTableToExcel.xlsx", ExcelVersion.Version2016);
}
}



No comments:

Post a Comment

Change PDF Versions in Java

In daily work, you might need to change the version of a PDF document you have in order to ensure compatibility with another version which a...