Replication methods and rechecks are used to identify new or updated data in your database tables. Setting them correctly is crucial to ensuring both data accuracy and optimized update times. In this article, we will be focusing just on replication methods.
When new tables are synced in the Data Warehouse Manager, a replication method will automatically be chosen for the table. Understanding the various replication methods, how tables are organized, and how the table data behaves will allow you to choose the best replication method for your tables.
Replication methods fall into three groups -
Full Table, and
Incremental Replication means that MBI will replicate only new or updated data on every replication attempt. As these methods will greatly reduce latency, we highly recommend using it where possible.
Full Table Replication means that MBI will replicate the entire contents of a table on every replication attempt. Because of the potentially large amount of data to be replicated, these methods can increase latency and update times. If a table contains any timestamped or datetime columns, we recommend using an Incremental method instead.
Paused indicates that replication for the table is stopped, or paused. MBI will not check for new or updated data during an update cycle; this means no data will be replicated from a table that has this as its Replication Method.
Modified At replication method uses a datetime column - which is populated when a row is created and then updated when data changes - to find data to replicate. This method is designed to work with tables that meet the following criteria:
datetimecolumn that is initially populated when a row is created and is updated whenever the row is modified;
datetimecolumn is never null;
In addition to those criteria, we also highly recommend indexing the
datetime column used for
Modified At replication, as this will help optimize replication speed.
When the update runs, new or changed data is identified by searching for rows that have a value in the
datetime column that occurred after the most recent update. When new rows are discovered, they are replicated to your Data Warehouse. If any rows already exist in the Data Warehouse, they will be overwritten with the current database values.
For example, a table may have a column called
modified\_at that indicates the last time data was changed. If the most recent update ran Tuesday at noon, the update will search for all rows having a
modified\_at value greater than Tuesday at noon. Any discovered rows that were either created or modified since Tuesday at noon will be replicated to the Data Warehouse.
Did you know?
Even if your database cannot currently support an
Incremental Replication method, you may be able to make some changes to your database that would enable use of
Modified At or
Single Auto Incrementing PK.
Modified At is not only the most ideal replication method, it is also the fastest. This method not only produces noticeable speed increases with large data sets, it also does not require configuring a recheck option. Other methods will need to iterate through an entire table to identify changes, even if a small subset of data has changed.
Modified At iterates through only that small subset.
Auto Incrementing is a behavior that sequentially assigns primary keys to rows. If a table is
Auto Incrementing and the highest primary key in the table is currently 1000, then the next primary value will be 1001 or higher. A table that does not use
Auto Incrementing behavior may assign a primary key value that is less than 1000 or jump to a much bigger number, but this is not commonly used.
This method is designed to replicate new data from tables that meet the following criteria:
single-column primary key; and
primary keydatatype is
auto incrementingprimary key values.
When a table is using
Single Auto Incrementing Primary Key replication, new data is discovered by searching for primary key values that are higher than the current highest value in your Data Warehouse. For example, if the highest primary key value in your Data Warehouse is 500, when the next update runs it will search for rows with primary key values of 501 or higher.
Add Date method functions similarly to the
Single Auto Incrementing Primary Key method. Instead of using an integer for the table’s primary key, this method will use a
timestamped column to check for new rows.
When a table uses
Add Date replication, new data is discovered by searching for timestamped values that are greater than the latest date synced to your Data Warehouse. For example, if an update last ran on 20/12/2015 09:00:00, any rows with a timestamp greater than this will be marked as new data and replicated.
Modified At method,
Add Date will not check existing rows for updated information - it will only look forward to new rows.
Full table replication refreshes the entire table any time new rows are detected. This is by far the least efficient replication method, due to the fact that all data must be reprocessed during every update, assuming there are new rows.
New rows are detected by querying your database at the start of the synchronization process and counting the number of rows. If your local database contains more rows than MBI, then the table is refreshed. If the row counts are identical, or if MBI contains more rows than your local database, then the table is skipped.
This raises the important point that
Full Table replication is incompatible when:
In either of the above scenarios,
Full Table replication will not detect any changes and your data will become stale. Due to the inefficiency of this replication method, and the requirements mentioned above,
Full Table replication is only recommended as a last resort.
When a table uses
Primary Key Batch (PK Batch), new data is discovered by counting rows inside ranges, or batches, of primary key values. While we typically think of this being used with integers, even text values can be ordered in a way that allows the system to define constant ranges.
For example, Let us say an update runs and performs a row count for the range of keys from 1 to 100. In this update, the system finds and logs 37 rows. In the next update, a row count is performed again on the 1-100 range and finds 41 rows. Because there is a difference in the number of rows compared to the last update, the system will inspect that range (or batch) in more detail.
This method is intended to replicate data from tables that meet the following criteria:
This method is not ideal, as it is incredibly slow due to the amount of processing that must occur to examine batches and find changes. We do not recommend using this method unless it is not possible to make the modifications necessary to support the other replication methods. Expect update times to increase if this method must be used.
Replication methods are set on a table-by-table basis. To set a replication method for a table, you need
Admin permissions so you can access the Data Warehouse Manager.
Once in the Data Warehouse Manager, select the table from the
Synced Tables list to display the table’s schema.
The current replication method is listed below the table name. To change it, click the link.
In the pop-up that displays, click the radio button next to either
Full Table replication to select a replication type.
Next, click the Replication Method dropdown to select a method - for example,
Some Incremental methods require you to set a
Replication Key. MBI will use this key to determine where the next update cycle should begin.
For example, if we want to use the
modified at method for our
orders table, we need to set a
date column as the replication key. Several options for replication keys may exist, but we will select
created at, or the time the order was created. If the last update cycle stopped at 12/1/2015 00:10:00, the next cycle would begin replicating data with a
created at date greater than this.
When finished, Click Save.
Take a look at the whole process:
To finish up, we have put together this table that compares the various replication methods. We find it incredibly handy when selecting a method for the tables in our data warehouse.