In many cases we need to do some manipulation of the source data before committing it to the product database, so we use a TableScript to apply a bit of C# magic to perfect the data. In other scenarios we might use a NotificationSubscriber to fix up the data to perfection after it has been committed to the live database.
This has been a natural way of doing things, because these are the tools the platform provides for transforming the data we get from e.g. Business Central to match a certain use case. But after considering the risks of executing my otherwise perfect piece of code at that exact point in the synchronization process, this would keep me awake at night. Not only is it a fragile place in the process to start injecting code, but if something should happen (Murphy says it will), debugging is a pain and I would almost need to catch it in the act.
So, I started looking into some good old database tricks as an alternative. One thing that has really been missing in the standard integrations, where we normally expect perfect data from the ERP, is a staging area where we can validate the data before moving it into the production tables. It is not an out-of-the-box feature, but we can easily work the concept into the Data Integration app, so I wanted to share a few tricks for doing the transformation in a less risky place.
Setting up Staging Tables
We start by creating staging tables in the database, which can be done using the SQL Firehose. If you want to use CREATE TABLE, go ahead, but this small script will create a table matching another table, e.g. EcomProducts, so you can import to a similar table first and catch errors there instead of in the production table:
SELECT TOP 0 * INTO Stage_EcomProducts from EcomProducts
Now, we can import to Stage_EcomProducts and do troubleshooting on the data before we update the shop table. In my experience, troubleshooting data using SQL is much easier and more powerful than troubleshooting an XML file or multiple batches of XML files.
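For example, a couple of quick checks on the staged rows before promoting them could look like this (a sketch, assuming the standard ProductId, ProductVariantId, and ProductLanguageId key columns and the ProductName field):
-- Staged rows that would collide on the EcomProducts key
select ProductId, ProductVariantId, ProductLanguageId, count(*) as Occurrences
from Stage_EcomProducts
group by ProductId, ProductVariantId, ProductLanguageId
having count(*) > 1
-- Staged rows missing a product name
select * from Stage_EcomProducts where ProductName is null or ProductName = ''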
I also use something similar to back up a single table if I need to do some dangerous work on the data (which of course I never do on a production environment):
SELECT * INTO EcomProducts_bak from EcomProducts
The difference compared to the staging table is that when backing up I select everything, so I get both the schema (keys not included) and the data, whereas for the staging table I select no records and get only the schema.
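And if the dangerous work does go wrong, the backup makes it easy to roll a single column back; a sketch, assuming the standard EcomProducts key columns:
-- Restore ProductName from the backup table for all matching rows
update p
set p.ProductName = b.ProductName
from EcomProducts p
inner join EcomProducts_bak b on b.ProductId = p.ProductId
  and b.ProductVariantId = p.ProductVariantId
  and b.ProductLanguageId = p.ProductLanguageId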
Views
If you’re using other sources than the ERP Plug-In Unit, e.g. importing from OData, the data structure will probably not be compatible with Dynamicweb. For example, if you are importing contact persons and addresses from the ERP: in Dynamicweb, addresses are associated with users/contacts, but in BC, contacts and addresses are associated with companies, so we need to do some transformation.
Assuming we have this:
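A minimal sketch of what those two staging tables could look like (ContactName, AddressCity, and the sample values are made up for illustration; the ContactNo, ContactCustomerNumber, AddressNumber, and AddressCustomerNumber columns match the queries below):
create table Contacts (ContactNo nvarchar(20), ContactName nvarchar(100), ContactCustomerNumber nvarchar(20))
create table Addresses (AddressNumber nvarchar(20), AddressCustomerNumber nvarchar(20), AddressCity nvarchar(100))
-- Jill Doe's company (CUST1300) has no address, and CUST1400 has an address but no contact
insert into Contacts values ('CT1000', 'John Doe', 'CUST1100'), ('CT1001', 'Jane Doe', 'CUST1200'), ('CT1002', 'Jill Doe', 'CUST1300')
insert into Addresses values ('ADR01', 'CUST1100', 'Copenhagen'), ('ADR02', 'CUST1200', 'Aarhus'), ('ADR03', 'CUST1400', 'Odense')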
If we join these two tables by customer number, we get unique combinations of contacts and addresses, which we can map into AccessUserAddress.
select Contacts.ContactNo + '_' + Addresses.AddressNumber as AddressExternalId, Contacts.*, Addresses.* from Contacts inner join Addresses on AddressCustomerNumber = ContactCustomerNumber
If we want to work with this in Data Integration, we can create a view with this join:
create view ContactAddresses as select Contacts.ContactNo + '_' + Addresses.AddressNumber as AddressExternalId, Contacts.*, Addresses.* from Contacts inner join Addresses on AddressCustomerNumber = ContactCustomerNumber
And then we have a view which we can use as data source in Data Integration, and we can even debug the data by querying that view.
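For example, before using AddressExternalId as the key in the activity, we can check that it really is unique:
-- AddressExternalId values that occur more than once would break the import
select AddressExternalId, count(*) as Occurrences
from ContactAddresses
group by AddressExternalId
having count(*) > 1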
If we ever want to modify the view, we first select the definition so we don't have to rewrite it completely:
select OBJECT_DEFINITION(OBJECT_ID('ContactAddresses')) as ViewDef
Then, we drop the view:
drop view ContactAddresses
And then we can create a new view with the same name but a new definition.
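For example, re-creating the view with an extra filter (the condition on AddressNumber is just a placeholder for whatever change you need):
create view ContactAddresses as
select Contacts.ContactNo + '_' + Addresses.AddressNumber as AddressExternalId, Contacts.*, Addresses.*
from Contacts
inner join Addresses on AddressCustomerNumber = ContactCustomerNumber
where Addresses.AddressNumber <> ''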
For all the queries in this post, you can create them as views on the database and then use that view in a Dynamicweb Source Provider in a Data Integration activity. If you don't use Dynamicweb as the staging area, you can use an SQL Source Provider, which could just as well connect to an Azure-hosted database shared between Dynamicweb and the source.
Joins
If you know everything about joins, just skip to the next part. Otherwise, let's have a look at some basic examples.
Left Outer Join takes all records from the table specified on the left-hand side of the OUTER JOIN statement and joins them with whatever rows it can find in the table specified on the right-hand side. So here we find all contacts and see that Jill Doe’s company doesn’t have any addresses associated.
select * from Contacts left outer join addresses on Addresses.AddressCustomerNumber = contacts.ContactCustomerNumber
Right outer join does the opposite and takes all rows from the table specified on the right-hand side of OUTER JOIN and joins them with whatever rows are available in the table specified on the left-hand side. We see that there are no contacts associated with the CUST1400 address.
select * from Contacts right outer join addresses on Addresses.AddressCustomerNumber = Contacts.ContactCustomerNumber
Full outer join takes everything from both tables and joins whatever rows it can. We see that both rows missing from left and right outer are shown here:
select * from Contacts full outer join addresses on Addresses.AddressCustomerNumber = contacts.ContactCustomerNumber
See the pattern with the yellow marked NULL values?
Destination Conditionals
One thing that is really cool about the staging approach is that it allows us to do imports based on conditions in the destination table, something we've often wanted to do but which the Data Integration app doesn't support. Let's say we have a huge number of products in the ERP, but only a limited range in eCommerce. We don't want inventory for all ERP products in EcomStockUnit, only for the ones we actually sell on the webshop, but the ERP is not aware of which products are in the webshop. Loading the total inventory into a staging table allows us to create a view that joins the staging table with EcomStockUnit so it only contains inventory for the items that are in the shop. Let's assume we have this inventory table:
But our product table only has BK100 and BK200:
By checking the existence of the products in EcomProducts, we only get the inventory numbers of the products in our database:
select * from Inventory where exists(select * from EcomProducts where ProductId = ProductNo)
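Wrapped in a view, this can be used directly as the source in a Data Integration activity (WebshopInventory is just a name I picked):
create view WebshopInventory as
select Inventory.*
from Inventory
where exists (select * from EcomProducts where ProductId = Inventory.ProductNo)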
In another scenario we might want to import only a delta to boost the performance of a synchronization, but the source system might not support that. The staging approach has a solution for that too. Assume that in our source data the product name of one of the products has been updated:
And in EcomProducts we haven’t updated the name yet:
When joining the two tables, we get the records that match the join condition:
select * from Products inner join EcomProducts on ProductId = products.ProductNo
But when using a checksum on the attributes that we expect to contain changes from the source, we have an easy way of detecting changes:
select BINARY_CHECKSUM(Products.ProductNo, Products.ProductName) as ProductChk, BINARY_CHECKSUM(EcomProducts.ProductNumber, EcomProducts.ProductName) as EcomProductsChk, * from Products
inner join EcomProducts on ProductId = Products.ProductNo
where BINARY_CHECKSUM(Products.ProductNo, Products.ProductName) <> BINARY_CHECKSUM(EcomProducts.ProductNumber, EcomProducts.ProductName)
(We don’t need to select the checksum values; they are just included here for demonstration.)
The checksums are calculated from the input values, and they are not guaranteed to be unique. But when comparing a few fields in two almost identical records, I'd say they are unique enough.
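And as before, wrapping the comparison in a view gives us a delta source for the Data Integration activity (ChangedProducts is just a name I picked):
create view ChangedProducts as
select Products.*
from Products
inner join EcomProducts on ProductId = Products.ProductNo
where BINARY_CHECKSUM(Products.ProductNo, Products.ProductName) <> BINARY_CHECKSUM(EcomProducts.ProductNumber, EcomProducts.ProductName)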
Although we would still get all data from ERP into our staging table, this approach allows us to reduce the effect on the production table at the critical time of import. This can greatly reduce the time for the total synchronization to finalize, especially when our Data Integration activity has been configured to only update changes in the Lucene index, which may otherwise take many minutes after the import. And in that period, the active index instance may contain incorrect data.
Transformation
Sometimes we want to convert rows into columns and columns into rows, and there is a SQL approach to that as well. In a theoretical scenario we might want to import product specifications from a look-up table into custom fields on the EcomProducts table. Here we have two products:
Each of these is associated with a set of specifications:
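A minimal sketch of what that specification look-up table could look like (the sample values are made up; the ProductNo, SpecId, and SpecValue columns match the pivot query below):
create table Specifications (ProductNo nvarchar(20), SpecId nvarchar(50), SpecValue nvarchar(100))
-- One row per product per specification
insert into Specifications values
('BK100', 'FrameSize', '54'),
('BK100', 'Gears', '22'),
('BK200', 'FrameSize', '58'),
('BK200', 'Gears', '11')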
Using Pivot we can take the rows from the specification table and add them as columns in our query or view:
select ProductNo, FrameSize, Gears
from (select Specifications.ProductNo, SpecId, SpecValue from Specifications) d
pivot (max(SpecValue) for SpecId in (FrameSize, Gears)) p
We can do the opposite and transform columns into rows, e.g. for populating EcomPrices from the Products table shown earlier:
SELECT newid() as PriceId, PriceProductId, PriceProductVariantId, PriceCurrency, PriceAmount
FROM
(SELECT ProductNo as PriceProductId, '' as PriceProductVariantId, PriceDKK as DKK, PriceEUR as EUR, PriceUSD as USD
FROM Products) d
UNPIVOT
(PriceAmount FOR PriceCurrency IN
(DKK, EUR, USD)
) AS u
I’m using a guid for PriceId because EcomPrices requires a unique PriceId, and I don’t have one in my source table. For the Data Integration activity, I would then use PriceProductId, PriceProductVariantId, and PriceCurrency as keys as they identify the unique rows in my source.
This is all good if we're dealing with regular data, but in some places in the Dynamicweb database we're not just dealing with regular data. E.g. OrderLineFieldValues in EcomOrderLines contains a chunk of XML in an nvarchar field. So, what if we want to send the orders to the ERP, but the ERP doesn't understand our XML? Luckily there is a solution for that too. Let's assume we have this chunk in OrderLineFieldValues:
<OrderLineFieldValueCollection>
  <OrderLineFieldValue>
    <OrderLineFieldSystemName>OrderLineDeliveryDate</OrderLineFieldSystemName>
    <Value>9.5.2023</Value>
  </OrderLineFieldValue>
</OrderLineFieldValueCollection>
Using an XPath expression, we can extract the value into its own column:
SELECT OrderLineId, OrderLineProductNumber, cast(OrderLineFieldValues as xml).query('/OrderLineFieldValueCollection/OrderLineFieldValue/Value[../OrderLineFieldSystemName/text() = "OrderLineDeliveryDate"]/text()') from EcomOrderLines
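If the receiving system needs a plain string rather than an XML fragment, the value() method returns it as a regular column instead (the nvarchar(50) type and the OrderLineDeliveryDate alias are my own choices):
-- Extract the delivery date as a plain string column
SELECT OrderLineId, OrderLineProductNumber,
cast(OrderLineFieldValues as xml).value('(/OrderLineFieldValueCollection/OrderLineFieldValue[OrderLineFieldSystemName/text() = "OrderLineDeliveryDate"]/Value)[1]', 'nvarchar(50)') as OrderLineDeliveryDate
from EcomOrderLines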
A few other benefits of moving more of this kind of logic over to the database:
- The database is usually generously scaled and performs very well
- Adding a table script requires me to boot up Visual Studio, program a C# class, build, deploy, and test. Hence, the process from idea to feedback is very long compared to adding a view through the firehose
- I can add validation in my views, so I don’t import corrupt data to production tables
- I can add a view on a standard Dynamicweb solution and won’t need to customize it