OCR Data Extraction

Automatically extract data from a wide variety of government-issued documents to populate your adaptive forms.

A number of organizations provide this service, and as long as they offer well-documented REST APIs, you can integrate them with AEM Forms using the data integration capability. This tutorial uses ID Analyzer to demonstrate OCR data extraction from uploaded documents.

The following steps implement OCR data extraction with AEM Forms using the ID Analyzer service.

Create developer account

Create a developer account with ID Analyzer and make a note of the API key. This key is needed to invoke the REST APIs of the ID Analyzer service.

Create Swagger/OpenAPI file

OpenAPI Specification (formerly Swagger Specification) is an API description format for REST APIs. An OpenAPI file allows you to describe your entire API, including:

  • Available endpoints (/users) and operations on each endpoint (GET /users, POST /users)
  • Operation parameters: input and output for each operation
  • Authentication methods
  • Contact information, license, terms of use, and other information

API specifications can be written in YAML or JSON. The format is easy to learn and readable to both humans and machines.

To create your first Swagger/OpenAPI file, follow the OpenAPI documentation.


AEM Forms supports OpenAPI Specification version 2.0 (formerly known as Swagger).

Use the Swagger editor to create a Swagger file that describes the operation used to extract data from an uploaded document. The Swagger file can be created in JSON or YAML format. The completed swagger file can be downloaded from here

Considerations when defining the swagger file

  • Definitions are required
  • $ref needs to be used for method definitions
  • Prefer to have consumes and produces sections defined
  • Do not define request body parameters or response parameters inline. Modularize as much as possible. For example, the following inline definition is not supported:

    {
      "name": "body",
      "in": "body",
      "required": false,
      "schema": {
        "type": "object",
        "properties": {
          "Rollnum": {
            "type": "string",
            "description": "Rollnum"
          }
        }
      }
    }

The following is supported, because the schema references the requestBody definition:

    {
      "name": "requestBody",
      "in": "body",
      "required": false,
      "schema": {
        "$ref": "#/definitions/requestBody"
      }
    }

Create Data Source

To integrate AEM Forms with third-party applications, you need to create a data source in the cloud services configuration. Use the Swagger file to create your data source.
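Before uploading the Swagger file, it can help to check it against the considerations listed earlier. The following is only a minimal sketch, not part of AEM Forms: it assumes the Swagger 2.0 document has been loaded as a JavaScript object, and the path /extract, the operation name, and the property names are hypothetical placeholders rather than the actual ID Analyzer API. It checks two of the considerations: that definitions exist, and that body parameters and response schemas point to them through $ref.

```javascript
// Hypothetical, trimmed-down Swagger 2.0 document. The path, operationId and
// property names are illustrative only and do not describe the real service.
const swaggerDoc = {
  swagger: "2.0",
  info: { title: "OCR extraction (example)", version: "1.0" },
  consumes: ["application/json"],
  produces: ["application/json"],
  paths: {
    "/extract": {
      post: {
        operationId: "extractData",
        parameters: [
          {
            name: "requestBody",
            in: "body",
            required: true,
            schema: { $ref: "#/definitions/requestBody" }
          }
        ],
        responses: {
          "200": {
            description: "Extracted fields",
            schema: { $ref: "#/definitions/extractResponse" }
          }
        }
      }
    }
  },
  definitions: {
    requestBody: {
      type: "object",
      properties: { file_base64: { type: "string" } }
    },
    extractResponse: {
      type: "object",
      properties: { documentNumber: { type: "string" } }
    }
  }
};

// True when the schema is a $ref that resolves to an entry under definitions.
function refIsDefined(schema, definitions) {
  const prefix = "#/definitions/";
  return Boolean(
    schema &&
      typeof schema.$ref === "string" &&
      schema.$ref.startsWith(prefix) &&
      definitions[schema.$ref.slice(prefix.length)]
  );
}

// Returns a list of human-readable problems; an empty list means the
// two checked considerations are satisfied.
function checkSwagger(doc) {
  const problems = [];
  const definitions = doc.definitions || {};
  if (Object.keys(definitions).length === 0) {
    problems.push("definitions section is missing or empty");
  }
  for (const [path, operations] of Object.entries(doc.paths || {})) {
    for (const [method, op] of Object.entries(operations)) {
      const label = `${method.toUpperCase()} ${path}`;
      // Body parameters must reference a definition instead of an inline schema.
      for (const param of op.parameters || []) {
        if (param.in === "body" && !refIsDefined(param.schema, definitions)) {
          problems.push(`${label}: body parameter "${param.name}" must use $ref to a definition`);
        }
      }
      // Responses that declare a schema must reference a definition as well.
      for (const [status, response] of Object.entries(op.responses || {})) {
        if (response.schema && !refIsDefined(response.schema, definitions)) {
          problems.push(`${label}: response ${status} must use $ref to a definition`);
        }
      }
    }
  }
  return problems;
}

console.log(checkSwagger(swaggerDoc));
```

An empty list means the file passes these two checks. It does not guarantee that the data source will be created successfully.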

Create Form Data Model

AEM Forms data integration provides an intuitive user interface to create and work with form data models. Base the form data model on the data source created in the earlier step.


Create Client Lib

The REST invocation expects the uploaded document as a base64-encoded string, so a client library is needed to encode the uploaded file. The encoded string is then passed as one of the parameters of the REST invocation.
The client library can be downloaded from here.
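As an illustration of what such a client library might do, here is a minimal sketch that uses the browser's standard FileReader API. The function names are hypothetical, and the downloadable client library may be implemented differently.

```javascript
// Removes the "data:<mime>;base64," prefix that FileReader.readAsDataURL
// places in front of the encoded bytes.
function stripDataUrlPrefix(dataUrl) {
  const marker = ";base64,";
  const index = dataUrl.indexOf(marker);
  if (index === -1) {
    throw new Error("Not a base64 data URL");
  }
  return dataUrl.slice(index + marker.length);
}

// Browser-only: reads a File (for example from an <input type="file">)
// and resolves with its base64-encoded content, without the data URL prefix.
function fileToBase64(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(stripDataUrlPrefix(reader.result));
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}
```

Keeping the prefix handling in its own function makes it possible to test that part outside the browser, where FileReader is not available.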

Create Adaptive Form

Integrate the POST invocation of the form data model with your adaptive form to extract data from the document that the user uploads in the form. You are free to create your own adaptive form and use the form data model's POST invocation to send the base64-encoded string of the uploaded document.
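Once the invocation returns, the extracted values still have to be copied into form fields. The sketch below shows one possible way to prepare that mapping. It assumes the service response is a flat object, and every key and field name in it is a hypothetical placeholder, not the actual ID Analyzer response format.

```javascript
// Hypothetical mapping from keys in the service response to adaptive form
// field names. Both sides are placeholders for illustration.
const fieldMap = {
  firstName: "applicantFirstName",
  lastName: "applicantLastName",
  documentNumber: "idNumber"
};

// Returns { formFieldName: value } for every mapped key the service actually
// returned. Missing or empty values are skipped so they cannot overwrite
// anything the user has already entered.
function mapExtractedFields(result, map) {
  const values = {};
  for (const [sourceKey, fieldName] of Object.entries(map)) {
    const value = result[sourceKey];
    if (value !== undefined && value !== null && value !== "") {
      values[fieldName] = value;
    }
  }
  return values;
}

const sampleResult = { firstName: "Ann", lastName: "", documentNumber: "X1" };
console.log(mapExtractedFields(sampleResult, fieldMap));
```

The returned object can then be applied to the form fields by whatever rule or script the form uses to set field values.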

Deploy on your server

If you want to use the sample assets with your API key, follow these steps:
