Tina's blog: Copy Table Data from PDF to Excel in Java

Monday, 18 April 2022

Copy Table Data from PDF to Excel in Java

The PDF is a document exchange and storage program, while Excel is a spreadsheet program. In some cases, you may need to copy or import table from PDF to Excel for analyzing and editing data using Excel. This article will demonstrate how to programmatically copy or import table from PDF to Excel using Java.

DEPENDENCY

Before running codes, you need to import a JAR file to the Java program. Download Free Spire.Office for Java from this link, find Spire.Office.jar in the lib folder and add it to your program. Or if you use Maven, it’s easier to install the Jar file and you just need to type the following code in the pom.xml file.

<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId> e-iceblue </groupId>
<artifactId>spire.office.free</artifactId>
<version>5.3.1</version>
</dependency>
</dependencies>

USING THE CODE

With Free Spire.Office for Java, you can extract all tables from a single page of a PDF document and save them as individual Excel worksheets. The following are detailed steps.

l Load a PDF sample document while initializing the PdfDocument object.

l Create a PdfTableExtractor instance, and use PdfTableExtractor.extactTable(int pageIndex) method to extract all tables in a specified page.

l Create a Workbook instance, and remove all default worksheets using Workbook.getWorksheets().clear() method.

l Loop through tables in the PdfTable[] array, and get the specific one by its index.

l Add a worksheet to the workbook using Workbook.getWorksheets.add() method.

l Loop through rows and columns in the PDF table, and get the value of a specific cell using PdfTable.getText(int rowIndex, int columnIndex) method. Then insert the value to the worksheet using Worksheet.get(int row, int column).setText(String string) method.

l Save the workbook to an Excel file using Workbook.saveToFile() method.

import com.spire.pdf.PdfDocument;

import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;
import com.spire.xls.ExcelVersion;
import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

public class ExtractTableDataAndSaveInExcel {
    public static void main(String[] args) {
        //Load a PDF sample document
        PdfDocument pdf = new PdfDocument("C:\\Users\\Tina\\Desktop\\sample.pdf");

        //Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(pdf);

        //Extract tables from the first page
        PdfTable[] pdfTables  = extractor.extractTable(0);

        //Create a Workbook instance
        Workbook wb = new Workbook();

        //Remove default worksheets
        wb.getWorksheets().clear();

        //If tables  are found
        if (pdfTables != null && pdfTables.length > 0) {

            //Loop through the tables
            for (int tableNum = 0; tableNum < pdfTables.length; tableNum++) {

                //Add a worksheet to workbook
                String sheetName = String.format("Table - %d", tableNum + 1);
                Worksheet sheet = wb.getWorksheets().add(sheetName);

                //Loop through rows in the current table
                for (int rowNum = 0; rowNum < pdfTables[tableNum].getRowCount(); rowNum++) {

                    //Loop through  columns in the current table
                    for (int colNum = 0; colNum < pdfTables[tableNum].getColumnCount(); colNum++) {

                        //Extract data from the current table cell
                        String text = pdfTables[tableNum].getText(rowNum, colNum);

                        //Insert data into a specific cell
                        sheet.get(rowNum + 1, colNum + 1).setText(text);

                    }
                }

                //Auto fit column width
                for (int sheetColNum = 0; sheetColNum < sheet.getColumns().length; sheetColNum++) {
                    sheet.autoFitColumn(sheetColNum + 1);
                }
            }
        }

        //Save the workbook to an Excel file
        wb.saveToFile("output/ExportTableToExcel.xlsx", ExcelVersion.Version2016);
    }
    }

Tina's blog

Monday, 18 April 2022

Copy Table Data from PDF to Excel in Java

No comments:

Post a Comment

Change PDF Versions in Java

Search This Blog