Automate content extraction

Learn how to automate the extraction of content from a PDF document using the PDF Extract API. Extracting PDF content helps unlock critical business data, which can then be used for a variety of downstream processes.

Transcript
Learn how to automate the extraction of content from a PDF document using the PDF Extract API. Extracting PDF content helps unlock critical business data, which can then be used for a variety of downstream processes. The PDF Extract API uses Adobe Sensei AI to extract PDF content along with its structure and reading order. There are many ways to invoke the Extract API, such as calling the REST API from a programming language. In our example, we're going to use Power Automate, Microsoft's low-code automation solution.

The first step is to generate the credentials required to invoke Acrobat Services. To do so, go to developer.adobe.com and select Create new project. Then select Add API, choose Adobe Document Cloud and the PDF Services API, and select Next. There are two options for authentication, and the Power Automate connectors have recently been updated to support both. OAuth Server-to-Server is the preferred method, because JWT authentication is being deprecated. Select the Enterprise PDF Services Developer profile and then save the configured API. Next, generate an access token. Once you've generated it, copy the required information into the Acrobat Services Power Automate connector configuration.

Now let's go ahead and create a new flow. Select Automated cloud flow and enter a name. A flow can be triggered by many different events, but this one is triggered by a new document being added to a SharePoint folder; that document is the source PDF to extract content from. The parameters highlighted in red are the ones you need to customize when extracting content from a PDF. For this SharePoint action, we need to input a SharePoint site address and a folder ID. The second action in our flow calls the PDF Extract API using the Power Automate Acrobat Services connector.
It requires two inputs: the document from which you want to extract content, which in this case is the document uploaded to the SharePoint folder, and an instruction that defines what to extract, such as images or text. So we'll go ahead and add a new connection to the Extract API. After adding a connection name, we'll copy the values from the Acrobat Services credentials we created into the fields required by the Acrobat Services Power Automate connector. Here are the client ID and client secret values that we copied, and now we can create our connection. The last action in this flow creates a file in SharePoint, and the highlighted parameters define what the file is named, what its content is, and where it is saved.

This is a standard PDF document. It contains several different kinds of content: text, headers, fonts, text in various positions, images, and tables. The API allows you to select what you would like to extract. Uploading the PDF document into the SharePoint folder triggers the Power Automate flow. The next action calls the PDF Extract API through the Power Automate Acrobat Services connector, and the last action puts the results into another SharePoint folder.

Let's go ahead and look at the final generated JSON output. Here you can see the source PDF alongside the JSON, with the corresponding content that has been extracted into the JSON output document. And that's how the PDF Extract API can help automate the extraction of content from long-form documents such as contracts and reports, which can then be used downstream in qualitative analysis processes.
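One way to use the generated JSON downstream is to walk its list of extracted elements. The sketch below is a minimal Python example that assumes the output resembles the Extract API's structured data format: a top-level `elements` array in reading order, where each element has a `Path` such as `//Document/H1` or `//Document/Table/TR/TD/P` and, for text, a `Text` field. Those field names and the sample data are assumptions for illustration; check them against an actual output file from your flow. The helper names `load_structured_data`, `top_level_kind`, `summarize_elements`, and `text_in_reading_order` are hypothetical.

```python
import json
from collections import Counter

# Assumed shape of the Extract API JSON output (verify against a real output file):
# {"elements": [{"Path": "//Document/H1", "Text": "...", "Page": 0}, ...]}
SAMPLE = {
    "elements": [
        {"Path": "//Document/H1", "Text": "Master Services Agreement", "Page": 0},
        {"Path": "//Document/P", "Text": "This agreement is made between...", "Page": 0},
        {"Path": "//Document/P[2]", "Text": "The term of this agreement is...", "Page": 0},
        {"Path": "//Document/Table/TR/TD/P", "Text": "Annual fee", "Page": 1},
        {"Path": "//Document/Figure", "Page": 1},
    ]
}


def load_structured_data(path: str) -> dict:
    """Read the JSON file the flow saved to SharePoint, after downloading it locally."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def top_level_kind(path: str) -> str:
    """Map an element Path to its top-level structure type, e.g. '//Document/P[2]' -> 'P'."""
    parts = [p for p in path.split("/") if p]  # drop empty segments left by the leading '//'
    if len(parts) < 2:
        return parts[0] if parts else ""
    return parts[1].split("[", 1)[0]  # strip a sibling index such as '[2]'


def summarize_elements(data: dict) -> Counter:
    """Count extracted elements by top-level type (headings, paragraphs, tables, figures)."""
    return Counter(top_level_kind(e.get("Path", "")) for e in data.get("elements", []))


def text_in_reading_order(data: dict) -> list:
    """Collect element text in array order, assumed here to match the document's reading order."""
    return [e["Text"] for e in data.get("elements", []) if "Text" in e]


if __name__ == "__main__":
    print(summarize_elements(SAMPLE))
    for line in text_in_reading_order(SAMPLE):
        print(line)
```

To run it against real output, replace `SAMPLE` with `load_structured_data(...)` and the path of the file you downloaded. Counting by top-level type rather than by the last path segment keeps paragraphs inside table cells grouped under `Table` instead of inflating the paragraph count; a real pipeline might instead branch on `Path` to route tables and body text to different downstream steps.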