Create Watched Folder configuration node

To configure a Watched Folder, create a Watched Folder configuration node. Perform the following steps to create the configuration node:

  1. Log in to CRXDE Lite as an administrator and navigate to the /etc/fd/watchfolder/config folder.

  2. Create a node of type nt:unstructured, for example, watchedfolder.

    NOTE
    The Watched Folder node name cannot include spaces or special characters.

  3. Add the following properties to the node:

    • folderPath
    • inputProcessorType
    • inputProcessorId
    • outputFilePattern

    For a complete list of supported properties, see Watched Folder properties.

  4. Click Save All. After the node is created and the properties are saved, the input, result, failure, preserve, and stage folders are created at the path specified in the folderPath property.

    The scan job then starts scanning the Watched Folder at the interval defined by the pollInterval property.
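
The steps above can be sanity-checked before saving the node. The following Python sketch is illustrative only and is not part of AEM Forms: it models the node as a plain dictionary, and the helper name check_watched_folder_node, the sample folderPath, and the sample outputFilePattern are hypothetical. It also assumes that "no spaces or special characters" means letters, digits, hyphens, and underscores only.

```python
import re

# Mandatory properties named in step 3.
MANDATORY = ("folderPath", "inputProcessorType", "inputProcessorId", "outputFilePattern")
# Processor types listed for the inputProcessorType property.
PROCESSOR_TYPES = ("workflow", "script", "service")
# Assumption: allowed node-name characters are letters, digits, '-' and '_'.
NODE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def check_watched_folder_node(name, props):
    """Return a list of problems found in a candidate configuration node."""
    problems = []
    if not NODE_NAME.fullmatch(name):
        problems.append(f"node name {name!r} contains spaces or special characters")
    for key in MANDATORY:
        if not props.get(key):
            problems.append(f"missing mandatory property {key}")
    ptype = props.get("inputProcessorType")
    if ptype and ptype not in PROCESSOR_TYPES:
        problems.append(f"unsupported inputProcessorType {ptype!r}")
    return problems


node = {
    "folderPath": "/shared/wf/invoices",  # hypothetical shared location
    "inputProcessorType": "workflow",
    "inputProcessorId": "/etc/workflow/models/<workflow_name>/jcr:content/model",
    "outputFilePattern": "%F.%E",  # hypothetical pattern
}
print(check_watched_folder_node("watchedfolder", node))  # empty list: no problems
print(check_watched_folder_node("watched folder", {}))   # name and property problems
```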

Watched Folder properties

You can configure the following properties for a Watched Folder.

  • folderPath (String): The path of the folder to be scanned at a defined time interval. For a clustered environment, the folder must be at a shared location, and all servers must have full access to the folder. It is a mandatory property.

  • inputProcessorType (String): The type of the process to start. You can specify workflow, script, or service. It is a mandatory property.

  • inputProcessorId (String): The behavior of the inputProcessorId property is based on the value specified for the inputProcessorType property. It is a mandatory property. The following list details all the possible values of the inputProcessorType property and the corresponding requirement for the inputProcessorId property:

    • For workflow, specify the workflow model to be executed. For example, /etc/workflow/models/<workflow_name>/jcr:content/model
    • For script, specify JCR path of the script to be executed. For example, /etc/fd/watchfolder/test/testScript.ecma
    • For service, specify the filter used for locating an OSGi service. The service is registered as an implementation of the com.adobe.aemfd.watchfolder.service.api.ContentProcessor interface.

  • runModes (String): A comma-separated list of allowed run-modes for workflow execution. A few examples are:

    • author

    • publish

    • author, publish

    • publish, author

NOTE
If the server hosting the Watched Folder does not have any of the specified run modes, then the Watched Folder always activates regardless of the run modes on the server.

  • outputFilePattern (String): Pattern of the output file. You can specify a folder or file pattern. If a folder pattern is specified, the output files have names as described in workflows. If a file pattern is specified, the output files have names as described in the file pattern. File and folder patterns can also specify a directory structure for the output files. It is a mandatory property.

  • stageFileExpirationDuration (Long, default -1): The number of seconds to wait before an input file/folder which has already been picked up for processing should be treated as having timed out and marked as a failure. This expiration mechanism only activates when the value for this property is a positive number.

NOTE
Even when an input is marked as having timed out using this mechanism, it may still be processing in the background but just taking more time than expected. If the input contents were consumed before the timeout mechanism kicked in, the processing may even proceed to completion later and the output be dumped into the results folder. If the contents were not consumed before the timeout, it is very likely that the processing will error out later on trying to consume the contents, and this error will also be logged in the failure folder for the same input. On the other hand, if the processing for the input never activated due to an intermittent job/workflow misfire (which is the scenario the expiration mechanism aims to address), then neither of these two eventualities will occur. Hence, for any entries in the failure folder which were marked as failures due to a timeout (look for messages of the form “File not processed after a significant amount of time, marking as failure!” in the failure log), it is advisable to scan the result folder (and also the failure folder itself for another entry for the same input) to check whether any of the eventualities described previously actually occurred.
  • deleteExpiredStageFileOnlyWhenThrottled (Boolean, default true): Whether the expiration mechanism should activate only when the watch-folder is throttled. The mechanism is more relevant for throttled watch-folders since a small number of files which are lingering around in an unprocessed state (owing to intermittent job/workflow misfires) have the potential to choke processing for the entire batch when throttling is enabled. If this property is kept as true (the default), the expiration mechanism will not activate for watch-folders which are not throttled. If the property is kept as false, the mechanism will always activate as long as the stageFileExpirationDuration property is a positive number.
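
The interaction between stageFileExpirationDuration, deleteExpiredStageFileOnlyWhenThrottled, and throttleOn can be summarized as a small truth function. This Python sketch only restates the rules described above; the function name expiration_active is hypothetical and the code is not taken from the product.

```python
def expiration_active(stage_file_expiration_duration, delete_only_when_throttled, throttle_on):
    """Model of when the stage-file expiration mechanism applies."""
    if stage_file_expiration_duration <= 0:
        # The default of -1 (or any non-positive value) switches the mechanism off.
        return False
    if delete_only_when_throttled and not throttle_on:
        # With the default of true, unthrottled watch-folders never expire stage files.
        return False
    return True


print(expiration_active(-1, True, True))     # False: duration is not positive
print(expiration_active(600, True, False))   # False: folder is not throttled
print(expiration_active(600, True, True))    # True
print(expiration_active(600, False, False))  # True: throttling is not required
```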

  • pollInterval (Long): The interval in seconds for scanning the Watched Folder for input. Unless the throttleOn setting is enabled, pollInterval should be longer than the time to process an average job; otherwise, the system may become overloaded. The default value is 5. See the description of batchSize for additional information. The value of pollInterval must be greater than or equal to 1.

  • excludeFilePattern (String): A semicolon (;) delimited list of patterns that a Watched Folder uses to determine which files and folders to skip. Any file or folder whose name matches one of these patterns is not scanned for processing. This setting is useful when the input is a folder with multiple files. The contents of the folder can be copied into a folder with a name that is picked up by the Watched Folder. This prevents the Watched Folder from picking up a folder for processing before the folder is completely copied into the input folder. The default value is null.
    You can use file patterns to exclude:

    • Files with specific filename extensions; for example, *.dat, *.xml, *.pdf, *.*

    • Files with specific names; for example, data* would exclude files and folders named data1, data2, and so on.

    • Files with composite expressions in the name and extension, as in these examples:

      • Data[0-9][0-9][0-9].[dD][aA][tT]
      • *.[dD][Aa][Tt]
      • *.[Xx][Mm][Ll]

For more information about file patterns, see About file patterns.

  • includeFilePattern (String): A semicolon (;) delimited list of patterns that the Watched Folder uses to determine which folders and files to scan and pick up. For example, if includeFilePattern is input*, all files and folders that match input* are picked up. This includes files and folders named input1, input2, and so on. The default value is * and indicates all files and folders. You can use file patterns to include:

    • Files with specific filename extensions; for example, *.dat, *.xml, *.pdf, *.*

    • Files with specific names; for example, data* would include files and folders named data1, data2, and so on.

    • Files with composite expressions in the name and extension, as in these examples:

      • Data[0-9][0-9][0-9].[dD][aA][tT]
      • *.[dD][Aa][Tt]
      • *.[Xx][Mm][Ll]

For more information about file patterns, see About file patterns.
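
To get a feel for how include and exclude patterns combine, the following Python sketch models them with the standard fnmatch module, whose glob syntax (*, ?, and [...] character classes) resembles the patterns shown above. This is an approximation under stated assumptions, not the product's matcher: the helper names are hypothetical, and the sketch assumes that matching is case-sensitive and that an exclude match takes precedence over an include match.

```python
from fnmatch import fnmatchcase


def split_patterns(value):
    """Split a semicolon-delimited pattern property into individual patterns."""
    return [p.strip() for p in value.split(";") if p.strip()]


def is_picked_up(name, include="*", exclude=None):
    """True if name matches an include pattern and no exclude pattern.

    Assumption: exclusion wins when a name matches both lists.
    """
    if exclude and any(fnmatchcase(name, p) for p in split_patterns(exclude)):
        return False
    return any(fnmatchcase(name, p) for p in split_patterns(include))


print(is_picked_up("input1", include="input*"))              # True
print(is_picked_up("notes.txt", include="input*"))           # False
print(is_picked_up("report.XML", exclude="*.[Xx][Mm][Ll]"))  # False
print(is_picked_up("data1", exclude="data*;*.tmp"))          # False
```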

  • waitTime (Long): The time, in milliseconds, to wait before a folder or file is scanned after it is created. For example, if the wait time is 3,600,000 milliseconds (one hour) and the file was created one minute ago, this file is picked up after 59 or more minutes have passed. The default value is 0. This setting is useful to ensure that a file or folder is completely copied to the input folder. For example, if you have a large file to process and the file takes ten minutes to download, set the wait time to 10 * 60 * 1000 = 600,000 milliseconds. This prevents the Watched Folder from scanning the file until it is at least ten minutes old.
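
The waitTime arithmetic in the examples above can be written out as follows. This Python sketch is illustrative; the function name remaining_wait_ms is hypothetical.

```python
ONE_MINUTE_MS = 60 * 1000


def remaining_wait_ms(wait_time_ms, file_age_ms):
    """Milliseconds left before a file of the given age becomes eligible for scanning."""
    return max(0, wait_time_ms - file_age_ms)


# One-hour wait time, file created one minute ago: 59 minutes remain.
print(remaining_wait_ms(3_600_000, ONE_MINUTE_MS) // ONE_MINUTE_MS)  # 59

# Wait time for a file that takes ten minutes to download.
print(10 * 60 * 1000)  # 600000
```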

  • purgeDuration (Long): Files and folders in the result folder are purged when they are older than this value, which is measured in days. This setting is useful in ensuring that the result folder does not become full. A value of -1 means that the contents of the result folder are never deleted. The default value is -1.

  • resultFolderName (String): The folder where the saved results are stored. If the results do not appear in this folder, check the failure folder. Read-only files are not processed and are saved in the failure folder. This value can be an absolute or relative path with the following file patterns:

    • %F = filename prefix
    • %E = filename extension
    • %Y = year (full)
    • %y = year (last two digits)
    • %M = month
    • %D = day of month
    • %d = day of year
    • %H = hour (24-hour clock)
    • %h = hour (12-hour clock)
    • %m = minute
    • %s = second
    • %l = millisecond
    • %R = random number (between 0 and 9)
    • %P = process or job id

    For example, if it is 8 PM on July 17, 2009 and you specify C:/Test/WF0/failure/%Y/%M/%D/%H/, the result folder is C:/Test/WF0/failure/2009/07/17/20

    If the path is not absolute but relative, the folder is created inside the Watched Folder. The default value is result/%Y/%M/%D/, which is the Result folder inside the Watched Folder. For more information about file patterns, see About file patterns.

NOTE
The smaller the size of the result folders, the better the Watched Folder performs. For example, if the estimated load for the Watched Folder is 1000 files every hour, try a pattern like result/%Y%M%D%H so that a new subfolder is created every hour. If the load is smaller (for example, 1000 files per day), you could use a pattern like result/%Y%M%D.

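
The date and time tokens in a folder pattern can be modelled with a simple substitution. The Python sketch below is not the product's implementation: the function name expand_folder_pattern is hypothetical, it handles only the %Y, %y, %M, %D, %H, %m, and %s tokens, and it assumes two-digit zero padding, as suggested by the 07 in the July example above.

```python
from datetime import datetime


def expand_folder_pattern(pattern, when):
    """Replace date and time tokens in a folder pattern with values taken from `when`."""
    tokens = {
        "%Y": f"{when.year:04d}",
        "%y": f"{when.year % 100:02d}",
        "%M": f"{when.month:02d}",
        "%D": f"{when.day:02d}",
        "%H": f"{when.hour:02d}",
        "%m": f"{when.minute:02d}",
        "%s": f"{when.second:02d}",
    }
    # Substituted values contain only digits, so a replacement never creates a new token.
    for token, value in tokens.items():
        pattern = pattern.replace(token, value)
    return pattern


eight_pm = datetime(2009, 7, 17, 20, 0, 0)
print(expand_folder_pattern("C:/Test/WF0/failure/%Y/%M/%D/%H/", eight_pm))
# C:/Test/WF0/failure/2009/07/17/20/
print(expand_folder_pattern("result/%Y%M%D%H", eight_pm))
# result/2009071720
```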
  • failureFolderName (String): The folder where failure files are saved. This location is always relative to the Watched Folder. You can use file patterns, as described for Result Folder. Read-only files are not processed and are saved in the failure folder. The default value is failure/%Y/%M/%D/.

  • preserveFolderName (String): The location where files are stored after successful processing. The path can be an absolute, a relative, or a null directory path. You can use file patterns, as described for Result Folder. The default value is preserve/%Y/%M/%D/.

  • batchSize (Long): The number of files or folders to be picked up per scan. Use this setting to prevent an overload on the system; scanning too many files at one time can cause a crash. The default value is 2.

    The Poll Interval and Batch Size settings determine how many files Watched Folder picks up in every scan. Watched Folder uses a Quartz thread pool to scan the input folder. The thread pool is shared with other services. If the scan interval is small, the threads scan the input folder often. If files are dropped frequently into the Watched Folder, then you should keep the scan interval small. If files are dropped infrequently, use a larger scan interval so that the other services can use the threads.

    If there is a large volume of files being dropped, make the batch size large. For example, if the service started by the Watched Folder endpoint can process 700 files per minute, and users drop files into the input folder at the same rate, then setting batchSize to 350 and pollInterval to 30 seconds helps Watched Folder performance without incurring the cost of scanning the Watched Folder too often.

    When files are dropped into the Watched Folder, the scan job lists the files in the input folder, which can reduce performance if scanning happens every second. Increasing the scan interval can improve performance. If the volume of files being dropped is small, adjust batchSize and pollInterval accordingly. For example, if 10 files are dropped every second, try setting pollInterval to 1 second and batchSize to 10.
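
The relationship between batchSize and pollInterval in the examples above is simple arithmetic. The Python sketch below assumes that each scan picks up at most batchSize items and that scans run exactly every pollInterval seconds; the function name max_files_per_minute is hypothetical.

```python
def max_files_per_minute(batch_size, poll_interval_s):
    """Upper bound on the number of files picked up per minute: one batch per scan."""
    if poll_interval_s < 1:
        raise ValueError("pollInterval must be greater than or equal to 1")
    return batch_size * 60 / poll_interval_s


print(max_files_per_minute(350, 30))  # 700.0, matches a 700 files-per-minute service
print(max_files_per_minute(10, 1))    # 600.0, that is, 10 files every second
print(max_files_per_minute(2, 5))     # 24.0 with the default batchSize and pollInterval
```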

  • throttleOn (Boolean): When this option is selected, it limits the number of Watched Folder jobs that AEM Forms processes at any given time. The maximum number of jobs is determined by the batchSize value. The default value is true. (See About throttling.)

  • overwriteDuplicateFilename (Boolean): When set to true, files in the result folder and preserve folder are overwritten. When set to false, files and folders are given names with a numeric index suffix. The default value is false.

  • preserveOnFailure (Boolean): Preserves input files if the operation fails to run on a service. The default value is true.

  • inputFilePattern (String): Specifies the pattern of the input files for a Watched Folder. Creates an allowlist of the files.

  • asynch (Boolean): Identifies the invocation type as asynchronous or synchronous. The default value is true (asynchronous). File processing is a resource-intensive task, so keep the asynch flag set to true to prevent choking the main thread of the scan job. In a clustered environment, it is critical to keep the flag set to true to enable load balancing of the files being processed across the available servers. If the flag is false, the scan job attempts to process each top-level file or folder sequentially within its own thread. Do not set the flag to false without a specific reason, such as workflow-based processing on a single-server setup.

NOTE
By design, workflows are asynchronous. Even if you set the value to false, workflows are launched in asynchronous mode.

  • enabled (Boolean): Activates or deactivates scanning for a Watched Folder. Set enabled to true to start scanning the Watched Folder. The default value is true.

  • payloadMapperFilter: When a folder is configured as a Watched Folder, a folder structure is created within it. The structure has folders to provide inputs, receive outputs (results), save data for failures, preserve data for long-lived processes, and save data for various stages. The folder structure of a Watched Folder can serve as the payload of Forms-centric workflows. A payload mapper lets you define the structure of a payload that uses a Watched Folder for input, output, and processing. For example, the default mapper maps the content of the Watched Folder to the [payload]/input and [payload]/output folders. Two out-of-the-box payload mapper implementations are available. If you do not have a custom implementation, use one of the out-of-the-box implementations:

    • Default mapper: Use the default payload mapper to keep the input and output contents of the Watched Folder in separate input and output folders in the payload. In the payload path of a workflow, use the [payload]/input/ and [payload]/output paths to retrieve and save content.

    • Simple File-based payload mapper: Use the Simple File-based payload mapper to keep input and output contents directly in the payload folder. Unlike the default mapper, it does not create any extra hierarchy.

Custom configuration parameters

Along with the Watched Folder configuration properties listed above, you can also specify custom configuration parameters. The custom parameters are passed to the file processing code, which can change its behavior based on their values. To specify a parameter:

  1. Log in to CRXDE Lite and navigate to the Watched Folder configuration node.
  2. Add a property param.<property_name> to the Watched Folder configuration node. The type of the property can only be Boolean, Date, Decimal, Double, Long, or String. You can specify single-valued and multi-valued properties.

NOTE
If the data type of a property is Double, specify a decimal point in its value. If a property of type Double has a value without a decimal point, its type is converted to Long.

These properties are passed as an immutable map of type Map<String, Object> to the processing code. The processing code can be an ECMAScript, Workflow, or a Service. The values provided for the properties are available as key-value pairs in the map. Key is the name of the property and value is the value of the property. For more information about custom configuration parameters, see the following image:

A sample watch-folder configuration node with mandatory properties, a few optional properties, a few configuration parameters
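
As a rough illustration of the rules above, the following Python sketch models how param.<property_name> properties could end up in a read-only map and how a numeric value without a decimal point ends up as a Long. It is not AEM code: the helper names are hypothetical, Python's int and float stand in for Long and Double, and the sketch assumes that the param. prefix is stripped from the map keys, which the text above does not state.

```python
from types import MappingProxyType


def coerce_numeric(raw):
    """Model of the NOTE: a numeric value without a decimal point becomes a Long (int)."""
    return float(raw) if "." in raw else int(raw)


def build_params(node_props):
    """Collect param.<name> properties into a read-only mapping.

    Assumption: keys are the property names without the 'param.' prefix.
    """
    prefix = "param."
    params = {k[len(prefix):]: v for k, v in node_props.items() if k.startswith(prefix)}
    return MappingProxyType(params)


node_props = {
    "folderPath": "/shared/wf/invoices",      # hypothetical path
    "param.threshold": coerce_numeric("10"),  # no decimal point: treated as Long
    "param.ratio": coerce_numeric("10.0"),    # decimal point: stays Double
    "param.tags": ["a", "b"],                 # multi-valued property
}
params = build_params(node_props)
print(type(params["threshold"]).__name__)  # int
print(type(params["ratio"]).__name__)      # float
```

Writing to the mapping, for example params["threshold"] = 1, raises a TypeError, which mirrors the immutability of the map handed to the processing code.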


Mutable variables for workflows

You can create mutable variables for workflow-based file processing methods. These variables serve as containers for the data flowing between the steps of a workflow. To create such variables:

  1. Log in to CRXDE Lite and navigate to the Watched Folder configuration node.

  2. Add a property workflow.var.<variable_name> to the Watched Folder configuration node.

    The type of the property can only be Boolean, Date, Decimal, Double, Long, or String. Multi-valued properties are also supported. For a multi-valued property, the value available to the workflow step is an array of the specified type.

    NOTE
    If the data type of a property is Double, specify a decimal point in its value. If a property of type Double has a value without a decimal point, its type is converted to Long.

NOTE
The JCR specification mandates a default value for properties. The default values are available to the steps of a workflow for processing, so specify appropriate default values.


Various methods for processing files

You can start a workflow, service, or script to process the documents placed in a Watched Folder.

Using a Service to process files of a Watched Folder  

A Service is a custom implementation of the com.adobe.aemfd.watchfolder.service.api.ContentProcessor interface. It is registered with OSGi along with a few custom properties. The custom properties of the implementation make it unique and help in identifying the implementation.