Step 1. Add the JitPack repository to your build file
Add it in your root settings.gradle at the end of repositories:
dependencyResolutionManagement {
repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
repositories {
mavenCentral()
maven { url 'https://jitpack.io' }
}
}
Add it in your settings.gradle.kts at the end of repositories:
dependencyResolutionManagement {
repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
repositories {
mavenCentral()
maven { url = uri("https://jitpack.io") }
}
}
Add to pom.xml
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
Add it in your build.sbt at the end of resolvers:
resolvers += "jitpack" at "https://jitpack.io"
Add it in your project.clj at the end of repositories:
:repositories [["jitpack" "https://jitpack.io"]]
Step 2. Add the dependency
dependencies {
implementation 'com.github.FlaDev:PDFLayoutTextStripper:'
}
dependencies {
implementation("com.github.FlaDev:PDFLayoutTextStripper:")
}
<dependency>
<groupId>com.github.FlaDev</groupId>
<artifactId>PDFLayoutTextStripper</artifactId>
<version></version>
</dependency>
libraryDependencies += "com.github.FlaDev" % "PDFLayoutTextStripper" % ""
:dependencies [[com.github.FlaDev/PDFLayoutTextStripper ""]]
#PDFLayoutTextStripper
- Converts a PDF file into a text file while keeping the layout of the original PDF. Useful to extract the content from a table or a form in a PDF file. PDFLayoutTextStripper is a subclass of PDFTextStripper class (from the Apache PDFBox library).
Data extraction from a form in a PDF file
1) Install apache pdfbox through Maven (to get the v1.8.13 click here )
warning: currently only pdfbox versions strictly inferior to version 2.0.0 are compatible with PDFLayoutTextStripper.java
2) Copy PDFLayoutTextStripper.java inside your main java package
package pdftest.pt;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;
public class test {
public static void main(String[] args) {
String string = null;
try {
PDFParser pdfParser = new PDFParser(new FileInputStream("sample.pdf"));
pdfParser.parse();
PDDocument pdDocument = new PDDocument(pdfParser.getDocument());
PDFTextStripper pdfTextStripper = new PDFLayoutTextStripper();
string = pdfTextStripper.getText(pdDocument);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
};
System.out.println(string);
}
}