Command line interface

The pdf2Data Command Line Interface (CLI) extracts data from PDF files from the command line. The extracted data is written as XML or JSON.

System requirements

  • Java 8
  • Recommended minimal hardware configuration:
    • 2 core CPU
    • Memory: 2 GB
    • Temp storage: 2 GB free disk space

pdf2Data can be run from the command line on any system that has Java 8 installed.

Installation

Download the CLI application from the Artifactory.

Usually, no special environment configuration is needed: as long as Java 8 is installed, you can run the pdf2Data CLI from the command line.
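
To confirm the Java prerequisite before running the CLI, a small shell check along the following lines can help. This is a minimal sketch: the function names `java_major_version` and `check_java` are hypothetical helpers, not part of pdf2Data, and the parsing assumes the usual `java -version` banner, where the version appears in double quotes on the first line.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme as well as the newer "17.0.1" scheme.
java_major_version() {
    # $1: version string, e.g. 1.8.0_292 or 17.0.1
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;   # 1.8.0_292 -> 8
        *)   echo "$1" | cut -d. -f1 ;;   # 17.0.1    -> 17
    esac
}

# Hypothetical helper: report the major version of the Java runtime on PATH.
check_java() {
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH" >&2
        return 1
    fi
    # `java -version` writes its banner to stderr; the version is the
    # quoted token on the first line (assumption about the banner layout).
    version=$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)
    echo "Java major version: $(java_major_version "$version")"
}
```

Calling `check_java` after sourcing the script prints the detected major version, or an error if no `java` executable is found. Legacy version strings such as 1.8.0_292 are mapped to 8 because releases before Java 9 used the `1.x` numbering scheme.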

The steps are similar to the ones you would typically perform in code.

Using pdf2Data

important

The commands below assume that the current directory in the command line is the one where you saved the downloaded CLI build.

PDF to XML parsing
java -jar cli.jar recognize -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -x recognized.xml -l license.json
PDF to JSON parsing
java -jar cli.jar recognize -t template.p2d -s file_for_parsing.pdf -p recognized.pdf -j recognized.json -l license.json
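
When many files share one template, the commands above can be wrapped in a loop. The sketch below is an illustration only: `run`, `recognize_one`, `recognize_all`, the `DRY_RUN` variable, and the output naming scheme are assumptions introduced here, and it presumes that `-p` and `-j` name output files, as the sample file names suggest. Only options shown above are used.

```shell
#!/bin/sh
# Hypothetical helper: execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run `recognize` for one PDF and write JSON output into out_dir.
# Assumes cli.jar, template.p2d and license.json are in the current directory.
recognize_one() {
    src=$1
    out_dir=$2
    name=$(basename "$src" .pdf)
    run java -jar cli.jar recognize \
        -t template.p2d \
        -s "$src" \
        -p "$out_dir/$name.recognized.pdf" \
        -j "$out_dir/$name.json" \
        -l license.json
}

# Process every *.pdf file in in_dir, reporting failures without stopping.
recognize_all() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for src in "$in_dir"/*.pdf; do
        [ -e "$src" ] || continue   # no matches: the glob pattern stays literal
        recognize_one "$src" "$out_dir" || echo "failed: $src" >&2
    done
}
```

After sourcing the script, `DRY_RUN=1; recognize_all input output` prints the commands without running them, which is a cheap way to review the generated file names first. Writing results to a separate directory keeps the recognized PDFs from being picked up as inputs on a second run. Swapping `-j` for `-x` (and the `.json` suffix for `.xml`) gives the XML variant.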

Process pdf2Data 4.0 template

The CLI is also a convenient tool for preparing templates for use with the SDK. Another option is to use the pdf2Data UI in full mode.

note

The preprocessing step for .p2dta files is required only for pdf2Data versions earlier than 4.3.0.

java -jar cli.jar preprocess -s template.p2dta -d template.p2d
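
For the older versions that still need this step, converting a whole folder of templates can be scripted. The following is a hedged sketch, not an official tool: `p2d_path` and `preprocess_all` are hypothetical names, and writing each .p2d file next to its .p2dta source is an assumption. It uses only the `-s` and `-d` options shown above.

```shell
#!/bin/sh
# Hypothetical helper: derive the .p2d path for a .p2dta template
# (same directory, same base name).
p2d_path() {
    echo "${1%.p2dta}.p2d"
}

# Preprocess every .p2dta template in a directory, skipping templates
# whose .p2d counterpart already exists.
# Assumes cli.jar is in the current directory.
preprocess_all() {
    dir=$1
    for src in "$dir"/*.p2dta; do
        [ -e "$src" ] || continue   # no matches: the glob pattern stays literal
        dst=$(p2d_path "$src")
        if [ -e "$dst" ]; then
            echo "skip: $dst already exists"
            continue
        fi
        java -jar cli.jar preprocess -s "$src" -d "$dst"
    done
}
```

After sourcing the script, `preprocess_all templates` converts each template in the `templates` folder once. Skipping existing outputs makes the script safe to rerun after adding new templates.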

Help information

java -jar cli.jar -h
java -jar cli.jar --help
java -jar cli.jar preprocess -h
java -jar cli.jar preprocess --help
java -jar cli.jar recognize -h
java -jar cli.jar recognize --help

Deprecated API

caution

Note that the recognize command was introduced in 4.4.0 and produces results in a new, refined format. Versions before 5.0.0 still contain the legacy parse command, which produces the old result format, but it will be dropped as of 5.0.0, so migrating to the new command and format is recommended.