Developer forum


Delete missing rows from source for specific source

Adrian Ursu Dynamicweb Employee

Hi guys,

I have a project where we get products from multiple external sources.

The problem we have to solve is removing the missing products depending on the source of the data.

I am unsure if we can configure the DataIntegration job to delete the missing rows for a specific data source based on a filter on the table mapping.

Any suggestions?

Thank you,
Adrian

 


Replies

 
Rasmus Sanggaard Dynamicweb Employee

Hi Adrian, 

You could create a data integration activity for each source, each with the "Delete missing rows" source setting enabled.

 

BR Rasmus Sanggaard

 
Adrian Ursu Dynamicweb Employee

Hi Rasmus,

I already have separate data integration jobs for each source; what I haven't done yet is enable "Delete missing rows".

How do I prevent each job from deleting the products that came from the other sources?

Thank you,
Adrian

 
Imar Spaanjaars Dynamicweb Employee

I don't think that's possible, as it seems that Delete missing rows deletes all rows that are not in the source. So, running an import against Source A will delete all missing A rows, but also ALL B rows.

We normally solve this ourselves with subscribers that run at the end of the job. It's a bit messy, quite invisible, and prone to index and cache issues, but it usually works well enough.

What would be *really* nice is if we could specify one or more filters on the Delete operation, like we do on the source. Then you could check "Delete missing rows" and configure one or more filters like "ProductSource = 'A'" for job A, which would then delete all missing rows, but only when their source equals A. Or we could add "ProductUpdate < DateTime.Now.AddDays(-21)" to only remove products we haven't seen in the last 3 weeks. I can't count the solutions where this would have saved tons of custom development work!
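To make the idea concrete, here is a minimal SQLite sketch (not Dynamicweb's actual implementation; the table and column names ProductSource and TempProducts are taken from the examples in this thread) of what a filtered "delete missing rows" could do:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EcomProducts (ProductId TEXT, ProductSource TEXT)")
con.execute("CREATE TABLE TempProducts (ProductId TEXT)")

# Existing data: two products from source A, one from source B.
con.executemany("INSERT INTO EcomProducts VALUES (?, ?)",
                [("PROD1", "A"), ("PROD2", "A"), ("PROD3", "B")])
# The new import from source A only contains PROD1.
con.execute("INSERT INTO TempProducts VALUES ('PROD1')")

# A plain "delete missing rows" would also remove PROD3 (source B).
# With a delete filter, only source A rows are candidates for deletion:
con.execute("""
    DELETE FROM EcomProducts
    WHERE ProductSource = 'A'
      AND ProductId NOT IN (SELECT ProductId FROM TempProducts)
""")

remaining = [r[0] for r in con.execute(
    "SELECT ProductId FROM EcomProducts ORDER BY ProductId")]
print(remaining)  # ['PROD1', 'PROD3'] — PROD2 removed, source B untouched
```

The filter simply narrows the set of deletion candidates before the "missing from source" check is applied.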

Imar

 
Rasmus Sanggaard Dynamicweb Employee

Hi,

Ah, I understand now. We will look for a solution to the problem. I have noted it down and added it to the backlog for future improvements.

BR Rasmus Sanggaard

 
Adrian Ursu Dynamicweb Employee

Hi guys,

Maybe using the filter you define on the Table mapping?

Of course, this would be complicated if you have multiple table mappings.

In the meantime, I will probably use Imar's approach.

Thank you,
Adrian

 
Imar Spaanjaars Dynamicweb Employee

@Adrian: reusing the table mapping's filter might not be enough; I think it'll have to be a different filter, as you may filter your incoming rows differently than the rows to be deleted.

 
Adrian Ursu Dynamicweb Employee

@Imar: I agree. It should be something specific in the configuration of the job. But it should take into account what happens if you have multiple table mappings in the same job, for example EcomProducts and EcomPrices.

Adrian

 
Matthias Sebastian Sort Dynamicweb Employee

Hello guys,

 

The way our logic works today is that we move data from the source into temp tables. Once all source tables have been handled, we then merge the data from all temp tables into the main table.

 

If you have Remove missing rows set to true, we then loop through the temp tables and execute the Remove missing rows operation one by one. So, if you have multiple mappings into the same destination table, for example EcomProducts, the logic will first remove missing rows based on temp table one, then based on temp table two, leaving only the data that both temp tables have in common.
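The sequential behaviour described above can be reproduced with a small SQLite sketch (hypothetical table names, not Dynamicweb internals), showing that only the intersection of the temp tables survives:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EcomProducts (ProductId TEXT)")
con.execute("CREATE TABLE Temp1 (ProductId TEXT)")
con.execute("CREATE TABLE Temp2 (ProductId TEXT)")

con.executemany("INSERT INTO EcomProducts VALUES (?)",
                [("P1",), ("P2",), ("P3",)])
con.executemany("INSERT INTO Temp1 VALUES (?)", [("P1",), ("P2",)])
con.executemany("INSERT INTO Temp2 VALUES (?)", [("P2",), ("P3",)])

# Remove missing rows per temp table, one by one, as described:
for temp in ("Temp1", "Temp2"):
    con.execute(f"DELETE FROM EcomProducts "
                f"WHERE ProductId NOT IN (SELECT ProductId FROM {temp})")

survivors = [r[0] for r in con.execute("SELECT ProductId FROM EcomProducts")]
print(survivors)  # ['P2'] — only rows present in BOTH temp tables survive
```

The first pass deletes P3 (missing from Temp1), the second deletes P1 (missing from Temp2), leaving only the common row P2.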

 

With that said, would your issue be solved if we totalled all the rows from the mappings that share the same destination (e.g., temp table one + temp table two) and then removed missing rows against that combined set?

BR

Matthias Sort

 
Imar Spaanjaars Dynamicweb Employee

If I understand you correctly, I don't think it does. The use case I have in mind is where we have multiple jobs importing into the same entity with an additional marker to identify the source. This happens often when we have multiple ERP systems (for different ERP companies, departments and so on). What we then do is this:

1. Create a job that maps XML from the ERP to EcomProducts. Set a hardcoded value identifying the ERP source into a custom column (ErpSource)
2. Duplicate the job and change the hardcoded value to something else
3. Set up two scheduled tasks to fetch data from each ERP and run the associated job.

The final result of the data could look like this:

Here, I imported two rows for the UK and one for DE. Today, if I used Remove missing rows on the UK job, it would remove the German one, as it's not in the source. Vice versa, it would remove all UK rows when running the DE job.

If we could add a filter (like we add conditionals today), I could specify:

Remove missing rows WHERE ErpSource = 'UK'
Remove missing rows WHERE ErpSource = 'DE'

on the respective jobs. Then when I reimport UK and only get back PROD1, it would remove only PROD2 because of the ErpSource = 'UK' filter. And likewise for the DE job.

I am using EcomProducts as an example here, but we have the same with other entities. For example, importing Companies as users into AccessUser. If you enable Remove missing rows, it would also delete all manually created user accounts. If I could add a filter like "WHERE ErpSource = 'UK' AND IsImported = 1", then I could remove rows that are missing in the context of that one source only (i.e. only remove the rows that were originally created by that source).
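The AccessUser case can be sketched the same way in SQLite (column names ErpSource and IsImported are the hypothetical markers from the example above, not real AccessUser columns); the two-part filter protects both the other ERP's rows and manually created accounts:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE AccessUser "
            "(UserName TEXT, ErpSource TEXT, IsImported INTEGER)")
con.execute("CREATE TABLE TempUsers (UserName TEXT)")

con.executemany("INSERT INTO AccessUser VALUES (?, ?, ?)", [
    ("erp-uk-1", "UK", 1),   # imported by the UK job
    ("erp-uk-2", "UK", 1),   # imported by the UK job, now gone from source
    ("erp-de-1", "DE", 1),   # imported by the DE job
    ("admin",    None, 0),   # created manually in the backend
])
con.execute("INSERT INTO TempUsers VALUES ('erp-uk-1')")  # current UK import

# Only rows this job originally created are candidates for deletion:
con.execute("""
    DELETE FROM AccessUser
    WHERE ErpSource = 'UK' AND IsImported = 1
      AND UserName NOT IN (SELECT UserName FROM TempUsers)
""")

left = [r[0] for r in
        con.execute("SELECT UserName FROM AccessUser ORDER BY UserName")]
print(left)  # ['admin', 'erp-de-1', 'erp-uk-1'] — only erp-uk-2 was removed
```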

Does this make any sense?

Imar

 
Rasmus Sanggaard Dynamicweb Employee

Hi,

 

I think in that scenario, the best-practice way to do multiple imports from different ERP sources in DW9 would be to import the products into different shops and then select the respective shop:

The shop selector on the destination setting only deletes missing rows from the shop selected. 

 

BR Rasmus Sanggaard

 
Imar Spaanjaars Dynamicweb Employee

That could work when you can use multiple shops, but that's not always the case. For example, for products, I can have three different (language-based) ERPs that should all populate the same shop (and then present the different languages on the frontend of a single shop). For other entities, like users, there is no shop at all.

 
Lars Hejgaard Sørensen Dynamicweb Employee

Hi,

@Adrian @Imar, just my personal opinion (and recommendation): I would never rely on that checkbox for interacting with live data. Let's say one morning something has gone wrong and the data source is empty? Or Dynamicweb was updated to another ring, and they forgot to test that setting? You risk accidentally deleting all products, which the customer may have put a lot of effort into enriching in PIM, or may even have paid someone to do. It's IT, so you should expect whatever can go wrong to actually go wrong. I've previously used "Deactivate missing rows" exactly to avoid losing data, but that only works for products when using the Dynamicweb provider. In multiple scenarios, though, I've been glad I did it.

For many years we've been wishing for a data proofing area in Dynamicweb, where we could set up rules for verifying the quality of data before actually committing data changes to our production tables, but this has never been a focus in the product. So, another approach, which can also be applied here, is to set up your own proofing area using staging tables that you import data into from the actual source, then create SQL views in which you can apply different rules, and use those views to import to your actual destination. In your scenario, you can create a view to feed a dedicated job with "Delete incoming rows", so that rather than relying on the provider logic, you can apply your own rules.

I've made a simple example here based on a staging setup for EcomPrices that I happened to have at hand. The situation is that I have 12 records in EcomPrices that are not in my staging table, so these should be removed from EcomPrices:

The deleteprice column indicates whether it's safe to remove the record. In this scenario I'm just checking that the staging table is not empty, but this could of course be more sophisticated. So, in the data integration activity I would make a conditional for "deleteprice equals true", so if my rule indicates that it is safe to delete, we go ahead and delete.

If I change my rule, just for the example here, so that no records are returned, then the deleteprice column contains false, and with the conditional in the mapping, nothing gets deleted:
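The staging-view idea can be sketched in SQLite (hypothetical view and column names; the real setup would live in SQL Server alongside the Dynamicweb database): a view flags rows missing from staging, but the deleteprice safety flag is only true while staging actually contains data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EcomPrices (PriceId TEXT)")
con.execute("CREATE TABLE StagingPrices (PriceId TEXT)")
con.executemany("INSERT INTO EcomPrices VALUES (?)",
                [("PRICE1",), ("PRICE2",)])
con.execute("INSERT INTO StagingPrices VALUES ('PRICE1')")

# A view listing rows missing from staging, with a deleteprice flag that is
# 1 only when the staging table actually contains data (the safety rule):
con.execute("""
    CREATE VIEW PricesToDelete AS
    SELECT p.PriceId,
           CASE WHEN (SELECT COUNT(*) FROM StagingPrices) > 0
                THEN 1 ELSE 0 END AS deleteprice
    FROM EcomPrices p
    WHERE p.PriceId NOT IN (SELECT PriceId FROM StagingPrices)
""")

rows = list(con.execute("SELECT PriceId, deleteprice FROM PricesToDelete"))
print(rows)  # [('PRICE2', 1)] — safe to delete while staging has data

# If staging is empty (e.g. a broken export), every price is "missing",
# but deleteprice = 0, so a conditional on it deletes nothing:
con.execute("DELETE FROM StagingPrices")
rows_empty = list(con.execute(
    "SELECT PriceId, deleteprice FROM PricesToDelete"))
print(rows_empty)  # [('PRICE1', 0), ('PRICE2', 0)]
```

The job consuming the view only acts on rows where deleteprice is true, so an empty or broken source can never wipe the destination table.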

So, rather than relying on this or that checkbox in the UI, you can take control of the process and make an educated assessment of what is actually safe to remove or, preferably, deactivate. As @Imar suggests, tagging records with the source allows you to delete within a limited range of records rather than the entire table, and then the trick is just to build that into the rule.

Br.

Lars

 
