Tuesday, 22 March 2022

Extract Images from a PDF Document in Java

Extracting images from a PDF file is not the same as converting a PDF file to an image. Extraction pulls out the individual images embedded in the document so that you can save them to a folder, whereas conversion turns the entire document into an image. This article shows how to programmatically extract images from a PDF document using Java.

DEPENDENCY

Before running the code, you need to add Spire.Pdf.jar to your Java project. You can get it from this link. Alternatively, if you use Maven, add the following configuration to the pom.xml file to import the JAR file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf.free</artifactId>
        <version>5.1.0</version>
    </dependency>
</dependencies>
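
Once the dependency is declared, a quick way to confirm that the JAR actually ended up on the classpath is to try loading one of its classes by name. The snippet below is a minimal sketch using only the Java standard library. The class DependencyCheck and the method isOnClasspath are hypothetical names made up for this illustration, not part of Spire.PDF. The class name com.spire.pdf.PdfDocument is the one imported by the extraction code in this article.

```java
// Hypothetical helper, not part of Spire.PDF: reports whether a class
// can be found on the current classpath, without initializing it.
class DependencyCheck {

    static boolean isOnClasspath(String className) {
        try {
            // "false" means: locate the class but do not run its static initializers
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class imported by the extraction code in this article
        String className = "com.spire.pdf.PdfDocument";
        System.out.println(className + (isOnClasspath(className) ? " is available" : " is missing"));
    }
}
```

If the check reports the class as missing, the repository or dependency entry in pom.xml is the first thing to review.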

USING THE CODE

The following are detailed steps to extract images from a PDF document.

•  Create a PdfDocument instance and load a sample PDF document using the PdfDocument.loadFromFile() method.

•  Loop through all pages of the document and extract the images from each page using the PdfPageBase.extractImages() method.

•  Specify the path and name of each output image file.

•  Save the images as .png files.

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class ExtractImage {
    public static void main(String[] args) throws IOException {
        //create a PdfDocument instance
        PdfDocument doc = new PdfDocument();

        //load a PDF sample file
        doc.loadFromFile("sample.pdf");

        //make sure the output folder exists before writing to it
        File outputDir = new File("C:\\Users\\Tina\\Desktop\\ExtractedImages");
        outputDir.mkdirs();

        //running counter that gives each image a unique file name
        int index = 0;

        //loop through all pages
        for (PdfPageBase page : (Iterable<PdfPageBase>) doc.getPages()) {

            //extract images from the current page
            for (BufferedImage image : page.extractImages()) {

                //specify the file path and name
                File output = new File(outputDir, String.format("Image_%d.png", index++));

                //save the image as a .png file
                ImageIO.write(image, "PNG", output);
            }
        }

        //release the resources held by the document
        doc.close();
    }
}
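
The saving step can also be isolated in a small helper that does not depend on Spire.PDF at all. The following is a minimal sketch using only the Java standard library. The class ImageSaver, the method saveAsPng and the Page_x_Image_y.png naming scheme are assumptions made for this illustration, not part of the library. The helper creates the output folder if it does not exist yet, puts the page number into the file name, and checks the value returned by ImageIO.write, which is false when no suitable writer is found.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Spire.PDF.
class ImageSaver {

    // Writes the image to <dir>/Page_<pageNumber>_Image_<imageNumber>.png
    // and returns the path of the file that was written.
    static Path saveAsPng(BufferedImage image, Path dir, int pageNumber, int imageNumber)
            throws IOException {
        // Does nothing if the folder already exists
        Files.createDirectories(dir);
        Path target = dir.resolve(String.format("Page_%d_Image_%d.png", pageNumber, imageNumber));
        // ImageIO.write returns false when no suitable writer is found
        if (!ImageIO.write(image, "png", target.toFile())) {
            throw new IOException("No PNG writer available for " + target);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an image returned by page.extractImages()
        BufferedImage sample = new BufferedImage(40, 20, BufferedImage.TYPE_INT_RGB);
        Path saved = saveAsPng(sample, Paths.get("ExtractedImages"), 1, 0);
        System.out.println("Saved " + saved);
    }
}
```

Inside the extraction loop, such a helper would be called once per extracted image, with a page counter and an image counter passed in place of the single index variable.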


