Text parser
You can use the Text parser tool to parse text for use in other Adobe Workfront Fusion scenario modules. The Text parser does not require a connection.
Access requirements
You must have the following access to use the functionality in this article:
New: Standard
Or
Current: Work or higher
New:
- Select or Prime Workfront package: Your organization must purchase Adobe Workfront Fusion.
- Ultimate Workfront package: Workfront Fusion is included.
Or
Current: Your organization must purchase Adobe Workfront Fusion.
For more detail about the information in this table, see Access requirements in documentation.
For information on Adobe Workfront Fusion licenses, see Adobe Workfront Fusion licenses.
Text parser API information
The Text parser connector uses the following:
Text parser modules and their fields
When you configure Text parser modules, Adobe Workfront Fusion displays the fields listed below. A bolded title in a module indicates a required field.
If you see the map button above a field or function, you can use it to set variables and functions for that field. For more information, see Map information from one module to another.
Transformers
Get Elements from HTML
Retrieves the desired elements from HTML code.
Select the type of element you want to retrieve from the HTML code.
- Image
- Link
- iFrame element(s)
Get Elements from text
Parses elements from text based on the given pattern.
HTML to Text
Match Pattern
The Match pattern module enables you to find and extract string elements matching a search pattern from a given text. This module uses regular expressions (also known as regex or regexp).
A regular expression is a sequence of characters in which each character is either a metacharacter, which has a special meaning, or a regular character, which has a literal meaning. These characters and metacharacters identify a pattern that can be used to search text. For example, if you wanted to search for names, you could set up a regular expression to search for a pattern that consists of two consecutive words that begin with capital letters. Regular expressions are a powerful tool for searching and manipulating text.
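The name-search idea above can be sketched in TypeScript, which uses the ECMAScript regular expression flavor. The sample text and the exact pattern are illustrative assumptions, not something Workfront Fusion provides:

```typescript
// Hypothetical sample text, chosen only for illustration.
const nameSampleText: string =
  "The report by Ada Lovelace was reviewed by Alan Turing last week.";

// Assumed pattern: two consecutive words, each starting with a capital
// letter followed by one or more lowercase letters.
const namePattern: RegExp = /\b[A-Z][a-z]+ [A-Z][a-z]+\b/g;

// Collect every match by calling exec() until it returns null.
const foundNames: string[] = [];
let nameMatch: RegExpExecArray | null;
while ((nameMatch = namePattern.exec(nameSampleText)) !== null) {
  foundNames.push(nameMatch[0]);
}

console.log(foundNames); // ["Ada Lovelace", "Alan Turing"]
```

A pattern this simple also matches any other pair of capitalized words, such as a capitalized word at the start of a sentence followed by a name, so real-world patterns usually need more context.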
A discussion of regular expressions is beyond the scope of this article. We recommend the following resources:
- For the complete list of metacharacters, see Regular expressions in MDN web docs.
- For a tutorial on how to create regular expressions, we recommend RegexOne.
- For experimenting with regular expressions, we recommend the Regular Expressions 101 website. Select the ECMAScript (JavaScript) FLAVOR in the left panel.
Enter the regular expression pattern.
Example: [+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)? extracts all numerals in the provided text.
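To see what the example pattern matches, here is a minimal TypeScript sketch that applies it to a hypothetical input string. It only illustrates the regular expression itself; how the module packages matches into bundles is not shown here.

```typescript
// The example pattern from this article, with the global flag so that
// exec() can step through every match.
const numeralPattern: RegExp = /[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?/g;

// Hypothetical input text, chosen only for illustration.
const numeralText: string = "Order 42 costs -3.5 units, or 6.02e23 at most .75";

const numerals: string[] = [];
let numeralMatch: RegExpExecArray | null;
while ((numeralMatch = numeralPattern.exec(numeralText)) !== null) {
  // Index 0 is the full match; indexes 1 and up are the capture groups.
  numerals.push(numeralMatch[0]);
}

console.log(numerals); // ["42", "-3.5", "6.02e23", ".75"]
```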
Note:
The pattern should contain at least one capture group in parentheses (). If the pattern does not contain any capture groups, the output bundle is empty.
When multiline matching is enabled, the beginning and end metacharacters (^ and $) match the beginning or end of each line, not just the very beginning or end of the whole input string.
Replace
Searches the entered text for a specified value or regular expression and replaces the result with the new value.
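The general idea of replacing a literal value versus replacing regular expression matches can be sketched with JavaScript's built-in String.prototype.replace. This is an illustration under assumed sample data, not the module's implementation:

```typescript
// Hypothetical input text, chosen only for illustration.
const contactText: string = "Contact: jane@example.com, joe@example.com";

// Replacing a literal value: with a string search argument,
// replace() changes only the first occurrence.
const literalResult: string = contactText.replace("Contact", "Email");

// Replacing by regular expression: the global flag (g) replaces every match.
const maskedResult: string = contactText.replace(/[a-z]+@example\.com/g, "<hidden>");

console.log(literalResult); // "Email: jane@example.com, joe@example.com"
console.log(maskedResult);  // "Contact: <hidden>, <hidden>"
```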
When multiline matching is enabled, the beginning and end metacharacters (^ and $) match the beginning or end of each line, not just the very beginning or end of the whole input string.
Data Scraping
Data scraping, sometimes called web scraping, data extraction, or web harvesting, is the process of collecting data from websites and storing it in your local database or spreadsheets. If you want to scrape data from a website and you are not familiar with regular expressions, you may use a data scraping tool.
If the data scraping tool provides a REST API, you can connect to it via our universal HTTP modules and Webhooks modules.
Text parser troubleshooting
Use this information if you cannot get a text parser to produce any output.
Example:
The module should parse the file type of a document such as “filename.docx”, where the file extension varies (for example, DOCX, PDF, or CSV).
The expression that you may choose to use in this case is \..+
This regular expression would normally result in a full match.
However, implementing this expression in your text parser does not result in a match:
The reason is that the “i” output shows only the ordinal number of each match. In this case there are 2 matches, so “i” has the values 1 and 2. This is useful if you ever need to pass only a particular match, such as the second one, through a filter: the numerical value identifies which match it is.
To get the matched values themselves, add parentheses around the part of the expression that you want to parse. For example, to extract only “docx” from “filename.docx” with the expression used in this scenario, apply the parentheses as follows: \.(.+)
This captures the DOCX, places it in a group, and leaves the “.” out of it.
In the output shown in the picture below, the capturing group will match any character (except for line terminators).
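The difference that the capture group makes can be sketched in TypeScript. This shows only how the regular expression behaves in the ECMAScript flavor; the variable names are hypothetical, and the module's own output structure is not reproduced here.

```typescript
const sampleFileName: string = "filename.docx";

// Without parentheses: the expression matches ".docx" as a whole,
// but there is no capture group to return.
const withoutGroup: RegExpExecArray | null = /\..+/.exec(sampleFileName);

// With parentheses: group 1 holds the text after the dot.
const withGroup: RegExpExecArray | null = /\.(.+)/.exec(sampleFileName);

const fullMatch: string | null = withoutGroup ? withoutGroup[0] : null;
const groupCount: number = withoutGroup ? withoutGroup.length - 1 : 0;
const extension: string | null = withGroup ? withGroup[1] : null;

console.log(fullMatch);  // ".docx"
console.log(groupCount); // 0
console.log(extension);  // "docx"
```

Because .+ is greedy and the match starts at the first dot, a name with several dots such as "a.b.docx" would yield "b.docx" in the group, so a stricter pattern may be needed for such filenames.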
Another workaround that also incorporates regex is to use the replace function:
{{replace("abcdefghijklmno pqr stuvw xyz.docx"; "/.*\./"; ".")}}
Then replace abcdefghijklmno pqr stuvw xyz.docx with your actual filename variable.