Smart Translation Search lets non-English search terms resolve to English content. To set up AEM for Smart Translation Search, install and configure the Apache Oak Search Machine Translation OSGi bundle, along with the pertinent free and open source Apache Joshua language packs that contain the translation rules.
Smart Translation Search must be set up on each AEM instance that requires it.
Download and install the Oak Search Machine Translation OSGi bundle
Install the bundle into AEM via /system/console/bundles.
Download and update the Apache Joshua language packs
Download and unzip the desired Apache Joshua language packs.
Edit the joshua.config file and comment out the two lines that begin with:
feature-function = LanguageModel ...
Determine and record the size of the language pack’s model folder, as this influences how much extra heap space AEM will require.
Move the unzipped Apache Joshua language pack folder (with the joshua.config edits) to
.../crx-quickstart/opt/<source_language-target_language>
For example:
.../crx-quickstart/opt/es-en
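The language pack preparation steps above can be sketched as a small shell script. This is a minimal sketch rather than an official procedure: the function names, the `PACK_DIR` and `AEM_HOME` variables, and their example values are hypothetical, and it assumes a POSIX shell, that `#` marks a comment line in joshua.config, and that the model folder is a directory named `model` inside the language pack.

```shell
#!/bin/sh
# Hypothetical helpers; names and paths are illustrative assumptions.

# Comment out every line that begins with "feature-function = LanguageModel"
# in the given joshua.config, keeping a .bak copy of the original file.
comment_out_language_models() {
    config_file="$1"
    cp "$config_file" "$config_file.bak"
    sed 's/^feature-function *= *LanguageModel/#&/' "$config_file.bak" > "$config_file"
}

# Print the size of the language pack's model folder in kilobytes.
model_size_kb() {
    du -sk "$1/model" | cut -f1
}

# Example usage (uncomment and adjust the assumed paths before running):
# PACK_DIR=/tmp/apache-joshua-es-en   # unzipped language pack (assumed location)
# AEM_HOME=/opt/aem                   # AEM install directory (assumed location)
# comment_out_language_models "$PACK_DIR/joshua.config"
# model_size_kb "$PACK_DIR"           # record this value for the heap calculation
# mv "$PACK_DIR" "$AEM_HOME/crx-quickstart/opt/es-en"
```

Keeping the `.bak` copy makes it easy to compare the edited file against the original before moving the folder into place.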
Restart AEM with updated heap memory allocation
Stop AEM
Determine the new required heap size for AEM
AEM’s pre-language-pack heap size + (the size of the model directory, rounded up to the nearest 2GB)
For example: if the AEM installation requires 8GB of heap to run before any language packs are added, and the language pack’s model folder is 3.8GB uncompressed, the new heap size is:
The original 8GB + (3.8GB rounded up to the nearest 2GB, which is 4GB), for a total of 12GB
Verify the machine has this amount of extra available memory.
Update AEM’s start-up scripts to adjust for the new heap size
java -Xmx12g -jar cq-author-p4502.jar
Restart AEM with the increased heap size.
The required heap space for language packs can grow large, especially when multiple language packs are used.
Always make sure the instance has enough memory to accommodate the increases in allocated heap space.
The base heap must always be calculated to support acceptable performance without any language packs installed.
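The heap calculation described above can be sketched in shell arithmetic. This is an illustrative sketch only: the function name `required_heap_gb` is hypothetical, the model size is passed in megabytes to keep the arithmetic in integers, and 1GB is assumed to be 1024MB.

```shell
#!/bin/sh
# Hypothetical helper: add the model folder size, rounded up to the next
# multiple of 2 GB (2048 MB), to the base heap. Prints the new heap in GB.
required_heap_gb() {
    base_gb="$1"    # heap AEM needs without any language packs, in GB
    model_mb="$2"   # uncompressed size of the model folder, in MB
    step_mb=2048
    extra_gb=$(( (model_mb + step_mb - 1) / step_mb * 2 ))
    echo $(( base_gb + extra_gb ))
}

# 8 GB base heap and a 3.8 GB model folder (about 3892 MB)
heap=$(required_heap_gb 8 3892)
echo "java -Xmx${heap}g -jar cq-author-p4502.jar"
```

When several language packs are installed, one reasonable reading is to sum the model folder sizes before rounding, but the source does not state this explicitly, so verify the result against actual memory usage.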
Register the language packs via Apache Jackrabbit Oak Machine Translation Full-text Query Terms Provider OSGi configurations
For each language pack, create a new Apache Jackrabbit Oak Machine Translation Full-text Query Terms Provider OSGi configuration via the AEM Web Console’s Configuration manager.
Joshua Config Path is the absolute path to the joshua.config file. The AEM process must be able to read all files in the language pack’s folder.
Node types are the candidate node types whose full-text search will engage this language pack for translation.
Minimum score is the minimum confidence score a translated term needs in order to be used.
For example, the Spanish word “hombre” may translate to the English word “man” with a confidence score of 0.9 and also translate to the English word “human” with a confidence score of 0.2. Tuning the minimum score to 0.3 would keep the “hombre” to “man” translation but discard the “hombre” to “human” translation, because its score of 0.2 is less than the minimum score of 0.3.
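The effect of the minimum score can be illustrated with a small filter. This sketch only mimics the threshold logic described above and is not how Oak implements it: the function name `filter_by_min_score` and the `<term> <score>` input format are hypothetical, and scores equal to the minimum are assumed to be kept.

```shell
#!/bin/sh
# Hypothetical helper: read "<translated term> <score>" lines on stdin and
# print only the terms whose score is at least the given minimum score.
filter_by_min_score() {
    min_score="$1"
    awk -v min="$min_score" '$2 + 0 >= min + 0 { print $1 }'
}

# Candidate translations for "hombre" with the scores from the example above
printf 'man 0.9\nhuman 0.2\n' | filter_by_min_score 0.3
```

With a minimum score of 0.3, only `man` is printed, because the score for `human` falls below the threshold.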
Perform a full-text search against assets
Updating language packs
Apache Joshua language packs are wholly maintained by the Apache Joshua project, and their updating or correction is at the discretion of the Apache Joshua project.
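If a language pack is updated, installing the update in AEM requires repeating steps 2 through 4 above (downloading and updating the language pack, restarting AEM with updated heap memory allocation, and registering the language pack), adjusting the heap size up or down as needed.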
If AEM does not require a restart, then the relevant Apache Jackrabbit Oak Machine Translation Full-text Query Terms Provider OSGi configuration(s) that pertain to the updated language pack(s) must be re-saved so AEM processes the updated files.
In order for AEM Smart Tags to be affected by AEM Smart Translation, AEM’s /oak:index/damAssetLucene index must be updated to mark predictedTags (the system name for “Smart Tags”) as part of the asset’s aggregate Lucene index.
Under /oak:index/damAssetLucene/indexRules/dam:Asset/properties/predictedTags, ensure the configuration is as follows:
<damAssetLucene jcr:primaryType="oak:QueryIndexDefinition">
    <indexRules jcr:primaryType="nt:unstructured">
        <dam:Asset jcr:primaryType="nt:unstructured">
            <properties jcr:primaryType="nt:unstructured">
                ...
                <predictedTags
                    jcr:primaryType="nt:unstructured"
                    isRegexp="{Boolean}true"
                    name="jcr:content/metadata/predictedTags/*/name"
                    useInSpellcheck="{Boolean}true"
                    useInSuggest="{Boolean}true"
                    analyzed="{Boolean}true"
                    nodeScopeIndex="{Boolean}true"/>