Programmatically Disassembling PDF Documents
The samples and examples in this document apply only to AEM Forms on JEE.
You can disassemble a PDF document by passing it to the Assembler service. Typically, this task is useful when the PDF document was originally created from many individual documents, such as a collection of statements. In the following illustration, DocA is divided into multiple resultant documents, where the first level 1 bookmark on a page identifies the start of a new resultant document.
To disassemble a PDF document, ensure that the PDFsFromBookmarks element is in the DDX document. The PDFsFromBookmarks element is a resultant element and can only be a child of the DDX element. It does not have a result attribute because it can produce multiple documents.
The PDFsFromBookmarks element causes a single document to be generated for each level 1 bookmark in the source document.
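As a rough mental model of this rule, the following Java sketch computes the page range of each resultant document from the pages targeted by level 1 bookmarks. It is a hypothetical illustration, not the Assembler service's implementation: the class and method names are invented, pages are assumed to be numbered from 1, and the sketch assumes (and checks) that page 1 carries a level 1 bookmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical model of the PDFsFromBookmarks splitting rule; not an Adobe API. */
final class BookmarkSplitModel {

    /** Inclusive, 1-based page range of one resultant document. */
    static final class PageRange {
        final int first;
        final int last;

        PageRange(int first, int last) {
            this.first = first;
            this.last = last;
        }

        @Override
        public String toString() {
            return first + "-" + last;
        }
    }

    /**
     * @param level1BookmarkPages pages targeted by level 1 bookmarks, in any order
     * @param pageCount           total number of pages in the source document
     */
    static List<PageRange> split(List<Integer> level1BookmarkPages, int pageCount) {
        // Only the first level 1 bookmark on a page starts a document,
        // so each bookmarked page is kept once, in ascending order.
        TreeSet<Integer> starts = new TreeSet<>(level1BookmarkPages);
        // Assumption of this sketch: page 1 carries a level 1 bookmark.
        if (starts.isEmpty() || starts.first() != 1 || starts.last() > pageCount) {
            throw new IllegalArgumentException(
                    "expected bookmarks within pages 1.." + pageCount + ", including page 1");
        }
        List<PageRange> ranges = new ArrayList<>();
        for (int start : starts) {
            Integer next = starts.higher(start);
            // A document ends on the page before the next bookmarked page,
            // or on the last page of the source document.
            int last = (next == null) ? pageCount : next - 1;
            ranges.add(new PageRange(start, last));
        }
        return ranges;
    }
}
```

For example, level 1 bookmarks on pages 4, 1, 9, and 4 of a 10-page document yield three ranges, 1-3, 4-8, and 9-10; the two bookmarks on page 4 start only one document.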
For the purpose of this discussion, assume the following DDX document is used.
<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/">
    <PDFsFromBookmarks prefix="stmt">
        <PDF source="AssemblerResultPDF.pdf"/>
    </PDFsFromBookmarks>
</DDX>
To disassemble a PDF document, use the invokeDDX operation rather than the invokeOneDocument operation. Although only one input PDF document is passed to the Assembler service, the service returns a collection object that contains one or more documents.
Summary of steps
To disassemble a PDF document, perform the following tasks:
- Include project files.
- Create a PDF Assembler client.
- Reference an existing DDX document.
- Reference a PDF document to disassemble.
- Set run-time options.
- Disassemble the PDF document.
- Save the disassembled PDF documents.
Include project files
Include the necessary files in your development project. If you are creating a client application by using Java, include the necessary JAR files. If you are using web services, ensure that you include the proxy files.
The following JAR files must be added to your project’s class path:
- adobe-livecycle-client.jar
- adobe-usermanager-client.jar
- adobe-assembler-client.jar
- adobe-utilities.jar (required if AEM Forms is deployed on JBoss)
- jbossall-client.jar (required if AEM Forms is deployed on JBoss)
If AEM Forms is deployed on a supported J2EE application server other than JBoss, replace adobe-utilities.jar and jbossall-client.jar with the JAR files that are specific to that application server.
Create a PDF Assembler client
Before you can programmatically perform an Assembler operation, you must create an Assembler service client.
Reference an existing DDX document
A DDX document must be referenced to disassemble a PDF document. This DDX document must contain the PDFsFromBookmarks element.
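Before sending a DDX document to the service, you may want to check its structure locally. The following Java sketch uses the JDK's built-in DOM parser to confirm that PDFsFromBookmarks is a direct child of the DDX root element and to collect the source attribute of each PDF element inside it; those values are the keys that the input map must use. The class name DdxCheck is hypothetical, elements are matched by local name so the namespace URI is not checked, and the sketch is no substitute for validation by the Assembler service.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/** Hypothetical pre-flight check for a disassembly DDX document; not an Adobe API. */
final class DdxCheck {

    /**
     * Verifies that PDFsFromBookmarks is a direct child of the DDX root and returns
     * the source attribute of each PDF element inside it.
     */
    static List<String> bookmarkSources(String ddxXml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // enables getLocalName(), ignoring prefixes
            root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(ddxXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse DDX document", e);
        }
        if (!"DDX".equals(root.getLocalName())) {
            throw new IllegalArgumentException("root element must be DDX");
        }
        List<String> sources = new ArrayList<>();
        boolean found = false;
        // Scan only the root's children: PDFsFromBookmarks may appear only directly under DDX.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "PDFsFromBookmarks".equals(n.getLocalName())) {
                found = true;
                for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c instanceof Element && "PDF".equals(c.getLocalName())) {
                        sources.add(((Element) c).getAttribute("source"));
                    }
                }
            }
        }
        if (!found) {
            throw new IllegalArgumentException("no PDFsFromBookmarks element under DDX");
        }
        return sources;
    }
}
```

Applied to the DDX document shown earlier, bookmarkSources returns a list containing the single value AssemblerResultPDF.pdf.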
Reference a PDF document to disassemble
To disassemble a PDF document, reference a PDF file that represents the PDF document to disassemble. When the document is passed to the Assembler service, the service returns a separate PDF document for each level 1 bookmark in the document.
Set run-time options
You can set run-time options that control the behavior of the Assembler service while it performs a job. For example, you can set an option that instructs the Assembler service to continue processing a job if an error is encountered.
Disassemble the PDF document
After you create the Assembler service client, reference the DDX document, reference a PDF document to disassemble, and set run-time options, you can disassemble a PDF document by invoking the invokeDDX method. Provided that the DDX document contains instructions to disassemble the PDF document, the Assembler service returns disassembled PDF documents within a collection object.
Save the disassembled PDF documents
All disassembled PDF documents are returned within a collection object. Iterate through the collection object and save each PDF document as a PDF file.
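The Assembler APIs return their own document types (com.adobe.idp.Document in Java, BLOB for web services), so the following Java sketch stays library-neutral: it assumes that the disassembled documents have already been reduced to a map from result name to PDF bytes, and it writes each entry to a file in an output directory. The class and method names are hypothetical. Because this sketch does not assume that result names are safe file names, it replaces unusual characters and adds a .pdf suffix when one is missing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: writes each disassembled document to its own PDF file. */
final class ResultWriter {

    /** Writes every entry of {@code documents} (result name to PDF bytes) under {@code outDir}. */
    static List<Path> saveAll(Map<String, byte[]> documents, Path outDir) {
        try {
            Files.createDirectories(outDir);
            List<Path> written = new ArrayList<>();
            for (Map.Entry<String, byte[]> entry : documents.entrySet()) {
                // Replace characters that may be unsafe in a file name, then ensure a .pdf suffix.
                String name = entry.getKey().replaceAll("[^A-Za-z0-9._-]", "_");
                if (!name.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                    name += ".pdf";
                }
                Path target = outDir.resolve(name);
                Files.write(target, entry.getValue()); // creates or overwrites the file
                written.add(target);
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The method returns the paths in the map's iteration order, so passing a LinkedHashMap keeps the output list in the same order as the results were added.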
Disassemble a PDF document using the Java API
Disassemble a PDF document by using the Assembler Service API (Java):
- Include project files.
  Include client JAR files, such as adobe-assembler-client.jar, in your Java project's class path.
- Create a PDF Assembler client.
  - Create a ServiceClientFactory object that contains connection properties.
  - Create an AssemblerServiceClient object by using its constructor and passing the ServiceClientFactory object.
- Reference an existing DDX document.
  - Create a java.io.FileInputStream object that represents the DDX document by using its constructor and passing a string value that specifies the location of the DDX file.
  - Create a com.adobe.idp.Document object by using its constructor and passing the java.io.FileInputStream object.
- Reference a PDF document to disassemble.
  - Create a java.util.Map object that is used to store input PDF documents by using a HashMap constructor.
  - Create a java.io.FileInputStream object by using its constructor and passing the location of the PDF document to disassemble.
  - Create a com.adobe.idp.Document object and pass the java.io.FileInputStream object that contains the PDF document to disassemble.
  - Add an entry to the java.util.Map object by invoking its put method and passing the following arguments:
    - A string value that represents the key name. This value must match the value of the PDF element's source attribute in the DDX document.
    - A com.adobe.idp.Document object that contains the PDF document to disassemble.
- Set run-time options.
  - Create an AssemblerOptionSpec object that stores run-time options by using its constructor.
  - Set run-time options to meet your business requirements by invoking a method that belongs to the AssemblerOptionSpec object. For example, to instruct the Assembler service to continue processing a job when an error occurs, invoke the AssemblerOptionSpec object's setFailOnError method and pass false.
- Disassemble the PDF document.
  Invoke the AssemblerServiceClient object's invokeDDX method and pass the following required values:
  - A com.adobe.idp.Document object that represents the DDX document to use
  - A java.util.Map object that contains the PDF document to disassemble
  - A com.adobe.livecycle.assembler.client.AssemblerOptionSpec object that specifies the run-time options, including the default font and the job log level
  The invokeDDX method returns a com.adobe.livecycle.assembler.client.AssemblerResult object that contains the disassembled PDF documents and any exceptions that occurred.
- Save the disassembled PDF documents.
  To obtain the disassembled PDF documents, perform the following actions:
  - Invoke the AssemblerResult object's getDocuments method. This returns a java.util.Map object.
  - Iterate through the java.util.Map object to obtain each resultant com.adobe.idp.Document object.
  - Invoke each com.adobe.idp.Document object's copyToFile method to save that PDF document to a file.
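The key passed to put in the "Reference a PDF document to disassemble" step must match the source attribute of the PDF element in the DDX document, and a mismatch is easy to introduce, for example by using the full file path as the key. The following Java sketch is a hypothetical pre-flight check (not part of the Assembler API) that you could run before invoking invokeDDX. Because it accepts a Map<String, ?>, it works with a map whose values are com.adobe.idp.Document objects as well as with any other value type. It compares keys by exact string equality, which is an assumption about what "match" means.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check run before invokeDDX; not part of the Assembler API. */
final class InputMapCheck {

    /** Returns each DDX source name that has no entry in the input map. */
    static List<String> missingSources(Collection<String> ddxSources, Map<String, ?> inputs) {
        List<String> missing = new ArrayList<>();
        for (String source : ddxSources) {
            // This check compares keys exactly, so "AssemblerResultPDF.pdf" and
            // "C:\\AssemblerResultPDF.pdf" are treated as different keys.
            if (!inputs.containsKey(source)) {
                missing.add(source);
            }
        }
        return missing;
    }
}
```

For the DDX document shown earlier, missingSources(List.of("AssemblerResultPDF.pdf"), inputs) returns an empty list only when inputs contains the key AssemblerResultPDF.pdf.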
See also
Programmatically Disassembling PDF Documents
Quick Start (SOAP mode): Disassembling a PDF document using the Java API
Disassemble a PDF document using the web service API
Disassemble a PDF document by using the Assembler Service API (web service):
- Include project files.
  Create a Microsoft .NET project that uses MTOM. Ensure that you use the following WSDL definition when setting a service reference:
  http://localhost:8080/soap/services/AssemblerService?WSDL&lc_version=9.0.1
  NOTE: Replace localhost with the IP address of the server hosting AEM Forms.
- Create a PDF Assembler client.
  - Create an AssemblerServiceClient object by using its default constructor.
  - Create an AssemblerServiceClient.Endpoint.Address object by using the System.ServiceModel.EndpointAddress constructor. Pass a string value that specifies the endpoint URL of the AEM Forms service (for example, http://localhost:8080/soap/services/AssemblerService?blob=mtom). You do not need to use the lc_version parameter; it is used only when you create a service reference.
  - Create a System.ServiceModel.BasicHttpBinding object by getting the value of the AssemblerServiceClient.Endpoint.Binding field. Cast the return value to BasicHttpBinding.
  - Set the System.ServiceModel.BasicHttpBinding object's MessageEncoding field to WSMessageEncoding.Mtom. This value ensures that MTOM is used.
  - Enable basic HTTP authentication by performing the following tasks:
    - Assign the AEM Forms user name to the AssemblerServiceClient.ClientCredentials.UserName.UserName field.
    - Assign the corresponding password value to the AssemblerServiceClient.ClientCredentials.UserName.Password field.
    - Assign the constant value HttpClientCredentialType.Basic to the BasicHttpBinding object's Security.Transport.ClientCredentialType field.
    - Assign the constant value BasicHttpSecurityMode.TransportCredentialOnly to the BasicHttpBinding object's Security.Mode field.
- Reference an existing DDX document.
  - Create a BLOB object by using its constructor. The BLOB object is used to store the DDX document.
  - Create a System.IO.FileStream object by invoking its constructor. Pass a string value that represents the file location of the DDX document and the mode in which to open the file.
  - Create a byte array that stores the content of the System.IO.FileStream object. You can determine the size of the byte array by getting the System.IO.FileStream object's Length property.
  - Populate the byte array with stream data by invoking the System.IO.FileStream object's Read method and passing the byte array, the starting position, and the stream length to read.
  - Populate the BLOB object by assigning the contents of the byte array to its MTOM property.
- Reference a PDF document to disassemble.
  - Create a BLOB object by using its constructor. The BLOB object is used to store the input PDF document. This BLOB object is added to the input collection that is passed to the invokeDDX method.
  - Create a System.IO.FileStream object by invoking its constructor and passing a string value that represents the file location of the input PDF document and the mode in which to open the file.
  - Create a byte array that stores the content of the System.IO.FileStream object. You can determine the size of the byte array by getting the System.IO.FileStream object's Length property.
  - Populate the byte array with stream data by invoking the System.IO.FileStream object's Read method and passing the byte array, the starting position, and the stream length to read.
  - Populate the BLOB object by assigning the contents of the byte array to its MTOM property.
  - Create a MyMapOf_xsd_string_To_xsd_anyType object. This collection object is used to store the PDF document to disassemble.
  - Create a MyMapOf_xsd_string_To_xsd_anyType_Item object.
  - Assign a string value that represents the key name to the MyMapOf_xsd_string_To_xsd_anyType_Item object's key field. This value must match the value of the PDF element's source attribute in the DDX document.
  - Assign the BLOB object that stores the PDF document to the MyMapOf_xsd_string_To_xsd_anyType_Item object's value field.
  - Add the MyMapOf_xsd_string_To_xsd_anyType_Item object to the MyMapOf_xsd_string_To_xsd_anyType object: invoke the MyMapOf_xsd_string_To_xsd_anyType object's Add method and pass the MyMapOf_xsd_string_To_xsd_anyType_Item object.
- Set run-time options.
  - Create an AssemblerOptionSpec object that stores run-time options by using its constructor.
  - Set run-time options to meet your business requirements by assigning a value to a data member that belongs to the AssemblerOptionSpec object. For example, to instruct the Assembler service to continue processing a job when an error occurs, assign false to the AssemblerOptionSpec object's failOnError field.
- Disassemble the PDF document.
  Invoke the AssemblerServiceClient object's invokeDDX method and pass the following values:
  - A BLOB object that represents the DDX document that disassembles the PDF document
  - The MyMapOf_xsd_string_To_xsd_anyType object that contains the PDF document to disassemble
  - An AssemblerOptionSpec object that specifies run-time options
  The invokeDDX method returns an AssemblerResult object that contains the job results and any exceptions that occurred.
- Save the disassembled PDF documents.
  To obtain the newly created PDF documents, perform the following actions:
  - Access the AssemblerResult object's documents field, which is a Map object that contains the disassembled PDF documents.
  - Iterate through the Map object to obtain each resultant document, and cast each entry's value to a BLOB.
  - Extract the binary data that represents each PDF document by accessing the BLOB object's MTOM property. This returns an array of bytes that you can write out to a PDF file.
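The DDX and PDF steps above fill each byte array with a single call to FileStream.Read. The .NET Stream.Read contract allows a call to return fewer bytes than requested, so robust code loops until the buffer is full. The following sketch shows the same loop in Java, the language used for the other examples in this document, where InputStream.read(byte[], int, int) has the same "up to len bytes" contract. The class name StreamBytes is hypothetical; in everyday Java code, Files.readAllBytes performs this work for you.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads exactly {@code length} bytes from a stream. */
final class StreamBytes {

    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        int offset = 0;
        // A single read() may return fewer bytes than requested, so keep reading
        // until the buffer is full or the stream ends.
        while (offset < length) {
            int n = in.read(buffer, offset, length - offset);
            if (n < 0) {
                throw new EOFException("stream ended after " + offset + " of " + length + " bytes");
            }
            offset += n;
        }
        return buffer;
    }
}
```

In the .NET client, the equivalent change is to call Read in a loop, advancing the offset by the value that each call returns, until the number of bytes read equals the stream's Length.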
See also
Programmatically Disassembling PDF Documents