FlaDev / PDFLayoutTextStripper Download

Step 1. Add the JitPack repository to your build file

Add it in your root settings.gradle at the end of repositories:

	dependencyResolutionManagement {
		repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
		repositories {
			mavenCentral()
			maven { url 'https://jitpack.io' }
		}
	}

Add it in your settings.gradle.kts at the end of repositories:

	dependencyResolutionManagement {
		repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
		repositories {
			mavenCentral()
			maven { url = uri("https://jitpack.io") }
		}
	}

Add it to your pom.xml:

	<repositories>
		<repository>
		    <id>jitpack.io</id>
		    <url>https://jitpack.io</url>
		</repository>
	</repositories>

Add it in your build.sbt at the end of resolvers:

	resolvers += "jitpack" at "https://jitpack.io"

Add it in your project.clj at the end of repositories:

	:repositories [["jitpack" "https://jitpack.io"]]

Step 2. Add the dependency

The version is left blank in the snippets below; set it to the release tag or commit hash you want JitPack to build.

In build.gradle:

	dependencies {
		implementation 'com.github.FlaDev:PDFLayoutTextStripper:'
	}

In build.gradle.kts:

	dependencies {
		implementation("com.github.FlaDev:PDFLayoutTextStripper:")
	}

In pom.xml:

	<dependency>
	    <groupId>com.github.FlaDev</groupId>
	    <artifactId>PDFLayoutTextStripper</artifactId>
	    <version></version>
	</dependency>

In build.sbt:

	libraryDependencies += "com.github.FlaDev" % "PDFLayoutTextStripper" % ""

In project.clj:

	:dependencies [[com.github.FlaDev/PDFLayoutTextStripper ""]]

PDFLayoutTextStripper

Converts a PDF file into a text file while keeping the layout of the original PDF. This is useful for extracting the content of a table or a form in a PDF file. PDFLayoutTextStripper is a subclass of the PDFTextStripper class from the Apache PDFBox library.

Use cases
How to install
How to use

Use cases

Data extraction from a table in a PDF file

Data extraction from a form in a PDF file
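
To illustrate the table use case, here is a minimal sketch of how one row of the layout-preserving text might be split into cells once it has been extracted (see How to use). It assumes that columns end up separated by at least two consecutive spaces, which is not guaranteed for every PDF. The class LayoutRowSplitter and its method splitRow are hypothetical names used only for this illustration and are not part of PDFLayoutTextStripper or PDFBox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFLayoutTextStripper or PDFBox.
public class LayoutRowSplitter {

    // Assumption: a run of two or more spaces marks a column boundary.
    private static final Pattern COLUMN_GAP = Pattern.compile(" {2,}");

    // Splits one line of layout-preserving text into trimmed cell values.
    public static List<String> splitRow(String line) {
        List<String> cells = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return cells; // a blank line yields no cells
        }
        for (String cell : COLUMN_GAP.split(trimmed)) {
            cells.add(cell);
        }
        return cells;
    }

    public static void main(String[] args) {
        String row = "Widget A        12      4.50";
        System.out.println(splitRow(row)); // prints [Widget A, 12, 4.50]
    }
}
```

Single spaces inside a cell (as in "Widget A") are kept, which is why the sketch splits on wider gaps only. Tables whose columns are separated by a single space would need a different approach, such as fixed column offsets.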

How to install

1) Install Apache PDFBox through Maven (for example, version 1.8.13).

Warning: only PDFBox versions earlier than 2.0.0 are currently compatible with PDFLayoutTextStripper.java.

2) Copy PDFLayoutTextStripper.java into your main Java package.

How to use

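package pdftest.pt;

import java.io.FileInputStream;
import java.io.IOException;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// No import is needed for PDFLayoutTextStripper if it was copied into this package.
public class test {

    public static void main(String[] args) {
        String string = null;
        PDDocument pdDocument = null;
        try {
            // Parse the PDF (PDFBox 1.8.x API) and wrap the result in a PDDocument.
            PDFParser pdfParser = new PDFParser(new FileInputStream("sample.pdf"));
            pdfParser.parse();
            pdDocument = new PDDocument(pdfParser.getDocument());
            // Use the layout-preserving stripper through the PDFTextStripper type.
            PDFTextStripper pdfTextStripper = new PDFLayoutTextStripper();
            string = pdfTextStripper.getText(pdDocument);
        } catch (IOException e) {
            // Also covers FileNotFoundException, which is a subclass of IOException.
            e.printStackTrace();
        } finally {
            // Release the resources held by the document.
            if (pdDocument != null) {
                try {
                    pdDocument.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        System.out.println(string);
    }

}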
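
The example above prints the extracted text to the console. Since the stated goal is to convert a PDF file into a text file, the following sketch shows one way the resulting string could be saved to disk using only the Java standard library. The class TextFileWriter, the method save, and the file name sample.txt are hypothetical names chosen for illustration, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFLayoutTextStripper or PDFBox.
public class TextFileWriter {

    // Writes the text to the target file as UTF-8, creating or overwriting it.
    public static Path save(String text, Path target) throws IOException {
        return Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by pdfTextStripper.getText(pdDocument).
        String extracted = "Name      Qty\nWidget    12\n";
        Path written = save(extracted, Paths.get("sample.txt"));
        System.out.println("Wrote " + written.toAbsolutePath());
    }
}
```

Because the layout is carried by runs of spaces, the saved file is best viewed with a monospaced font.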
