[Beta]{class="badge informative"}

Machine learning-assisted schema creation

AVAILABILITY
  • Machine learning-assisted schema creation is currently in beta. The documentation and the functionality are subject to change.

Use machine learning (ML) algorithms to generate a schema from sample data. This process saves time and reduces errors when you define the structure, fields, and data types for large, complex datasets.

With ML schema generation, you can quickly integrate new data sources and reduce the mistakes that come from manual creation. Non-technical users can use it to generate schemas or manage large, complex datasets without defining every field manually. This assistance shortens the path from getting data to gaining insights, and makes it easier to combine new data sources and perform data analysis.

Getting started

This tutorial requires a working understanding of the requirements for schema creation. Before continuing with this guide, you should read the UI guide to creating and editing schemas.

This guide explains how to use machine learning (ML) algorithms to generate a schema from sample data. For information on creating schemas manually, see the manual schema creation workflow guide. To deepen your understanding of the schema creation process, see the document on field-based workflows in the Schema Editor.

NOTE
You can also compose a schema using the Schema Registry API. To create a schema manually using the API, read the Schema Registry developer guide before attempting the tutorial on creating a schema using the API.
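
As a rough illustration of the API route mentioned in the note, the sketch below builds (but does not send) a request that creates a schema. The endpoint path, header names, and payload shape are assumptions based on the pattern described in the Schema Registry developer guide, so confirm them there before use. The helper `build_create_schema_request`, the class reference, and all credential values are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed base URL; confirm it in the Schema Registry developer guide.
BASE_URL = "https://platform.adobe.io/data/foundation/schemaregistry"


def build_create_schema_request(title, description, class_ref, credentials):
    """Build, without sending, a POST request that would create a schema."""
    payload = {
        "title": title,
        "description": description,
        "type": "object",
        # The base class is referenced by its identifier (assumed payload shape).
        "allOf": [{"$ref": class_ref}],
    }
    headers = {
        "Authorization": f"Bearer {credentials['access_token']}",
        "x-api-key": credentials["api_key"],
        "x-gw-ims-org-id": credentials["org_id"],
        "x-sandbox-name": credentials["sandbox_name"],
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/tenant/schemas",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder values for illustration only; replace them with your own.
request = build_create_schema_request(
    title="Loyalty Members",
    description="Example schema created through the API.",
    class_ref="https://ns.adobe.com/xdm/context/profile",
    credentials={
        "access_token": "ACCESS_TOKEN",
        "api_key": "API_KEY",
        "org_id": "ORG_ID",
        "sandbox_name": "SANDBOX_NAME",
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it. Building the request separately makes it possible to inspect the URL, headers, and body first.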

From the left navigation of the Platform UI, select Schemas. The Schemas workspace appears. Select Create schema to start the schema creation workflow.

The Schemas workspace with Schemas in the left navigation and Create schema highlighted.

Create a schema

The Create a schema dialog appears. Select the ML-Assisted schema creation option, followed by Select to confirm your choice.

The Create a schema dialog with ML-Assisted highlighted.

Select a base class

The Create schema workflow appears. Select a base class for your schema followed by Next.

The Schema details workspace with a class and Next highlighted.

Upload a CSV file

The Select data stage of the creation workflow appears. In the Upload files section, select Choose files or use the Drag and Drop files section. Select a .csv file from your computer to generate a schema.

The Select data stage of the Create Schema workflow with the Upload files section highlighted.
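
Before uploading, it can help to check that the CSV file is well formed. The sketch below is one way to do that locally. The checks it runs (a non-blank, unique header row and a consistent number of values per record) are generic assumptions, not documented requirements of the upload step, and `check_csv` is a hypothetical helper.

```python
import csv
import io


def check_csv(text):
    """Return a list of problems found in CSV text; an empty list means none were found."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header row contains a blank column name")
    if len(set(header)) != len(header):
        problems.append("header row contains duplicate column names")
    if len(rows) == 1:
        problems.append("file has a header but no data rows")

    # Record numbers count the header as record 1.
    for record_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"record {record_number} has {len(row)} values, expected {len(header)}"
            )
    return problems


# Illustrative sample: the last record is missing a value.
sample = (
    "email,first_name,signup_date\n"
    "a@example.com,Ana,2024-01-15\n"
    "b@example.com,Bo\n"
)
problems = check_csv(sample)
```

To check a file on disk, read it with `open(path, newline="", encoding="utf-8").read()` and pass the result to `check_csv`.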

Preview data

The Upload files section displays the name of the CSV file that you uploaded, and the Preview section displays rows of sample data from that file. Select Next to continue the workflow.

Rows of sample data highlighted in the preview section, and Next highlighted.
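
To get a feel for what "generating a schema from sample data" involves, the sketch below previews the first rows of a CSV file and guesses a data type for each column. It uses a simple rule-based heuristic, not the machine learning model behind this feature, and the type names it returns are illustrative. The functions `guess_type` and `preview_with_types` are hypothetical.

```python
import csv
import io
from datetime import date


def guess_type(values):
    """Guess a column type from sample strings using simple parsing rules."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "String"

    def all_parse(parser):
        # True only if every sample value parses without a ValueError.
        for value in non_empty:
            try:
                parser(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "Integer"
    if all_parse(float):
        return "Double"
    if all_parse(date.fromisoformat):
        return "Date"
    return "String"


def preview_with_types(text, max_rows=5):
    """Return the first rows of CSV text and a guessed type per column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [row for _, row in zip(range(max_rows), reader)]
    types = {
        name: guess_type([row[name] or "" for row in rows])
        for name in (reader.fieldnames or [])
    }
    return rows, types


# Illustrative sample data.
sample = (
    "email,age,signup_date\n"
    "a@example.com,34,2024-01-15\n"
    "b@example.com,29,2024-02-01\n"
)
rows, types = preview_with_types(sample)
```

A heuristic like this only looks at value formats. Reviewing the recommendation in the next stage is still the place to correct any data type or mapping that does not match your data.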

Review and edit schema

The Review and edit stage of the creation workflow appears, displaying the machine learning-assisted schema recommendation in a tabular view. At this stage, you can edit, add, or remove fields from the recommended schema generated by the machine learning model. The table contains the following columns:

| Field Name | Description |
| --- | --- |
| Data table | The dataset or database where the field originates. |
| Source Field | The original field name from the source system. |
| Target Field | The field name in the target system where the data will be mapped. |
| Display Name | The name used to display the field in the user interface. This name should be more user-friendly or descriptive. |
| Data Type | The type of data stored in the field (for example, String, Date). |
| Field Group | A categorization of the field based on its use or context (for example, Demographic Details, Commerce Details). |

The Review and Edit stage of the schema creation workflow.
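
One way to picture the recommendation table is as a list of records, one per row, with the add, edit, and remove actions as operations on that list. The sketch below models this in Python. It is only a mental model, not how the UI stores the recommendation, and the class `RecommendedField`, the helper functions, and all sample values are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RecommendedField:
    """One row of the recommendation table, using the columns listed above."""
    data_table: str
    source_field: str
    target_field: str
    display_name: str
    data_type: str
    field_group: str


def add_field(rows, new_row):
    """Return a new list with the row appended."""
    return rows + [new_row]


def edit_field(rows, source_field, **changes):
    """Return a new list with the matching row updated."""
    return [
        replace(row, **changes) if row.source_field == source_field else row
        for row in rows
    ]


def remove_field(rows, source_field):
    """Return a new list without the matching row."""
    return [row for row in rows if row.source_field != source_field]


# Illustrative rows only; real recommendations come from the workflow.
rows = [
    RecommendedField("customers.csv", "first_name", "person.name.firstName",
                     "First Name", "String", "Demographic Details"),
    RecommendedField("customers.csv", "signup", "signupDate",
                     "Signup", "String", "Demographic Details"),
]
rows = add_field(rows, RecommendedField(
    "customers.csv", "email", "personalEmail.address",
    "Email", "String", "Demographic Details"))
rows = edit_field(rows, "signup", display_name="Signup Date", data_type="Date")
rows = remove_field(rows, "first_name")
```

Only the Target Field, Display Name, Data Type, and Field Group can be changed when editing a row in the UI, which corresponds to the keyword arguments passed to `edit_field` here.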

Add a field

To add a field to the schema, select Add new field.

The Review and Edit stage of the schema creation workflow with Add new field highlighted.

The Select field dialog appears. The dialog contains a diagram of the schema as it currently exists. Select the desired field, then select Select to add it to the schema. To close the dialog without adding a field, select Cancel.

The Select field dialog with a field selected and Select highlighted.

A new row appears in your recommended schema. You can now edit the field.

Edit a field

To edit a field, select the pencil icon on the row you want to edit. A details panel appears on the right, where you can edit the custom field mapping. The details panel contains the Target Field, Display Name, Data Type, and Field Group. Make any necessary changes and select Apply to confirm. Select the pencil icon again to close the details panel.

The Review and Edit stage of the schema creation workflow with the pencil icon and details panel highlighted.

Remove a field

To remove a field, select the minus icon on the row you want to delete.

CAUTION
No confirmation dialog appears when you remove a field.

The Review and Edit stage of the schema creation workflow with the minus icon highlighted.

To approve your recommended schema and continue the Create schema workflow, select Next.

The Review and Edit stage of the schema creation workflow with Next highlighted.

Name and save schema

The Name and save stage of the creation workflow appears. Enter a Schema display name and an optional description. The Schema generated section provides a diagram of the ML-generated schema. Select Finish to complete the schema creation workflow.

The Name and Save Schema stage of the schema creation workflow with Finish highlighted.

View in the Schema Editor

The Schema Editor appears with your newly created schema displayed in the canvas. Select Save to return to the Schemas workspace.

The Schema Editor displaying your named ML-generated schema.

Next steps

After creating your schema, you can use the Schema Editor to make further modifications, if necessary. Your new schema is now ready to be integrated with your data sources and used for data analysis.

See the Edit an existing schema guide for more information on using the Schema Editor.
