Command line interface
The pdf2Data Command Line Interface (CLI) extracts data from PDF files directly from the command line. Extraction output is written as XML or JSON.
System requirements
- Java 8
- Recommended minimum hardware configuration:
  - CPU: 2 cores
  - Memory: 2 GB
  - Temp storage: 2 GB free disk space
It is possible to use pdf2Data from the command line as long as you have Java 8 installed.
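Before running the CLI, it can help to confirm which Java runtime is on the PATH. The following is a minimal sketch, not part of pdf2Data itself: the helper name java_major is hypothetical, and the parsing assumes the usual version formats, where Java 8 reports itself as 1.8.x and later releases report the major version first.

```shell
# Extract the major version from a Java version string.
# Java 8 and earlier use the "1.x" scheme (e.g. 1.8.0_292 -> 8);
# Java 9 and later put the major version first (e.g. 11.0.2 -> 11).
java_major() {
  case "$1" in
    1.*) rest=${1#1.}; echo "${rest%%.*}" ;;
    *)   echo "${1%%[.-]*}" ;;
  esac
}

# Report the installed Java version, if any.
if command -v java >/dev/null 2>&1; then
  # "java -version" prints to stderr; the version is the quoted string on line 1.
  version=$(java -version 2>&1 | awk -F'"' 'NR==1 {print $2}')
  echo "Detected Java major version: $(java_major "$version")"
else
  echo "Java not found on PATH"
fi
```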
Installation
Download the CLI application from Artifactory.
Usually, you don't need to configure your environment specifically: as long as you have Java 8, you can use the pdf2Data CLI from the command line.
The steps are similar to those you would typically perform in code.
Using pdf2Data
The commands below assume that the current directory is the one where you saved the downloaded CLI build.
PDF to XML parsing
java -jar cli.jar recognize -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -x recognized.xml -l license.json
PDF to JSON parsing
java -jar cli.jar recognize -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -j recognized.json -l license.json
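The same recognize command can be applied to a whole directory of PDFs with a small wrapper script. This is only a sketch under stated assumptions: the function names recognize_one and recognize_all, the DRY_RUN switch, and the output naming scheme (name_recognized.pdf and name.json next to each input) are illustrative choices, not part of the CLI. Only the flags shown in the commands above are used.

```shell
# Run (or, with DRY_RUN set, just print) the recognize command for one PDF.
# Assumption: outputs are written next to the input with derived names.
recognize_one() {
  src=$1
  base=${src%.pdf}
  set -- java -jar cli.jar recognize -t template.p2d -s "$src" \
    -p "${base}_recognized.pdf" -j "${base}.json" -l license.json
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"   # print the command instead of executing it
  else
    "$@"
  fi
}

# Process every PDF in a directory (default: current directory).
recognize_all() {
  dir=${1:-.}
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue                        # no PDFs matched
    case "$pdf" in *_recognized.pdf) continue ;; esac # skip earlier outputs
    recognize_one "$pdf"
  done
}
```

Running DRY_RUN=1 recognize_all . first prints the commands that would be executed, which makes it easy to check the derived file names before processing anything. For XML output, replace the -j option and the .json extension with -x and .xml.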
Process pdf2Data 4.0 template
The CLI is also a convenient tool for preparing templates for use with the SDK. Another option is to use the pdf2Data UI in full mode.
The preprocessing step for .p2dta files is required only for pdf2Data versions earlier than 4.3.0.
java -jar cli.jar preprocess -s template.p2dta -d template.p2d
Help information
java -jar cli.jar -h
java -jar cli.jar --help
java -jar cli.jar preprocess -h
java -jar cli.jar preprocess --help
java -jar cli.jar recognize -h
java -jar cli.jar recognize --help
Deprecated API
Note that the recognize command was introduced in 4.4.0 and produces results in a new, refined format. Versions before 5.0.0 still contain the legacy parse command, which produces the old result format, but it will be dropped in 5.0.0, so it is recommended to migrate to the new command and format.