Regular expression selector
Overview
The regular expression selector is the most powerful selector in pdf2Data's toolbox. Unsurprisingly, it is also the least user-friendly.
It implements standard regular expression search and accordingly requires the user to know RegExp syntax.
Most of the data you need from a PDF can be extracted without this selector; see Getting started for example usage. However, if you are passionate about regular expressions, the regular expression selector is all you need for data extraction.
You can specify a two-line regular expression; however, in the majority of cases the Paragraph selector can be used instead.
Parameters
Pattern
This selector has only one mandatory parameter, Pattern, which contains the regular expression to search for in the PDF. The regular expression may also contain groups enclosed in parentheses. In this case, only the string captured by the group is extracted.
For example, the pattern
Invoice\s+(\d{3})
returns the 3-digit number that follows the word "Invoice", separated from it by one or more whitespace characters.
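The capture-group behavior described above can be tried outside pdf2Data. The following is a minimal sketch that uses Python's re module as a stand-in for the pdf2Data Engine; the extract helper and the sample text are hypothetical, and the regular expression flavor used by pdf2Data may differ in details.

```python
import re


def extract(pattern, text):
    """Hypothetical helper that mimics the behavior described above:
    return the first capture group if the pattern defines one,
    otherwise return the whole match. Returns None when nothing matches."""
    compiled = re.compile(pattern)
    match = compiled.search(text)
    if match is None:
        return None
    # compiled.groups is the number of capture groups in the pattern
    return match.group(1) if compiled.groups else match.group(0)


# Made-up sample text standing in for text read from a PDF
sample = "Invoice   123 issued on 2020-01-15"
print(extract(r"Invoice\s+(\d{3})", sample))
```

Note that \d{3} also matches the first three digits of a longer number, so "Invoice 1234" would yield "123". If that is not wanted, a stricter pattern such as Invoice\s+(\d{3})\b may be a better fit.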
Result overview
The resulting text is presented in lines (see the type of output in the Picker selector).
The format of the actual result produced by the pdf2Data Engine, together with an example, is described in the Recognition result specification.
Specification
For more information about properties and expert usage, visit the specification page.