Posted on 07/02/2025 15:13:10
Apologies for taking a while to respond. Been a busy start to the year.
Just to note, we have since moved our Azure instance back to a VM as we just had too many issues on Azure, and all the issues disappeared after the move.
Our solution is DW9.17.7 and we integrate using live integration with LS Retail.
The live integration connects to OMNI and has been modified. The main modification is that all the discounts are handled by LS Retail, so the integration is crucial in calculating the cart value.
We did have a WAF enabled and Azure CDN configured.
The issues we experienced could be due to setup and/or configuration, as there is no real guidance as to what the minimum specs and exact setup requirements are. But we really did try everything for almost a year before deciding that the VM is just less of a headache - we haven't had any of the issues since moving back to the VM.
The main issues we had were the following:
1. IOExceptions
Sometimes we would go days or even weeks with no issues and then we would get 40k+ IOException errors.
They varied between "Could not delete" and "Could not write":
Could not delete C:\local\Temp\RazorEngine_egtrnefw.4x2: System.UnauthorizedAccessException: Access to the path 'CompiledRazorTemplates.Dynamic.RazorEngine_0af8fc86b440401e8ea2689c855109b3.dll' is denied. at System.IO.Directory.DeleteHelper(String fullPath, String userPath, Boolean recursive, Boolean throwOnTopLevelDirectoryNotFound, WIN32_FIND_DATA& data) at System.IO.Directory.Delete(String fullPath, String userPath, Boolean recursive, Boolean checkHost) at RazorEngine.Compilation.CrossAppDomainCleanUp.CleanupHelper.DoCleanUp()
Exception message: Could not write parsed file: C:\home\site\wwwroot\Files\Templates\Designs\Rapido\_parsed\ContentPage.parsed.cshtml System.IO.IOException: The process cannot access the file 'C:\home\site\wwwroot\Files\Templates\Designs\Rapido\_parsed\ContentPage.parsed.cshtml' because it is being used by another process.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
2. Indexing
Running indexes would cause the site to crash. They would take much longer to run (compared to the VM) and it was a gamble as to whether they would complete or not. This would result in products disappearing from the site. Yes, we have Primary and Secondary indexes, but the problem with DW is that Primary would fail and then it would run Secondary without first checking whether Primary was successful.
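A minimal sketch of the kind of guard described above, written in Python purely for brevity. This is not Dynamicweb's API: `rebuild_indexes`, `build_primary` and `build_secondary` are hypothetical names, and the only point is that the secondary build is skipped when the primary build fails, so one working index is always left in place.

```python
from typing import Callable, List


def rebuild_indexes(build_primary: Callable[[], None],
                    build_secondary: Callable[[], None]) -> List[str]:
    """Build the primary index, and only touch the secondary if that succeeded."""
    log: List[str] = []
    try:
        build_primary()
        log.append("primary ok")
    except Exception as exc:
        # Primary failed: leave the secondary untouched so it can keep serving.
        log.append(f"primary failed: {exc}")
        log.append("secondary skipped")
        return log
    try:
        build_secondary()
        log.append("secondary ok")
    except Exception as exc:
        log.append(f"secondary failed: {exc}")
    return log


def failing_build() -> None:
    # Hypothetical stand-in for an index build that dies part way through.
    raise IOError("build timed out")


def working_build() -> None:
    # Hypothetical stand-in for an index build that completes.
    pass


result_when_primary_fails = rebuild_indexes(failing_build, working_build)
result_when_both_work = rebuild_indexes(working_build, working_build)
```

With this ordering, a failed primary build never takes the secondary down with it.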
3. Thread Abort Exceptions
When there was too much going on, the site crashed - by this I mean it could be increased traffic (not necessarily high), someone running a large index, integrating data (ie products), sending of marketing emails, editing content, etc.
The weird thing is that the process monitor didn't seem to show much load when it crashed. Memory would creep up, but neither CPU nor memory got anywhere near server capacity.
The cart recalculates (which means a call to the OMNI web services) on each page refresh (cart load/add/rem/upd/inc qty/dec qty, etc), but not only this, it tended to run multiple times on a page (mini cart, notification subscribers, etc). We thought this may have been the issue, and we did a lot of work around this to minimise the number of recalculations/integrations happening.
This helped with the load, but we still had issues.
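To illustrate the idea of cutting down repeated recalculations within one page load, here is a rough sketch in Python (used only to keep it short). It is not the actual code from this solution and not a Dynamicweb or LS Retail API: `RequestCartCache` and `fake_remote_calculation` are made-up names, and the flat price of 10.0 per unit is invented for the example. The cache is meant to live for a single request, so the expensive call happens once per distinct cart state.

```python
from typing import Callable, Dict, Tuple

# A cart is represented as (product id, quantity) pairs.
CartLines = Tuple[Tuple[str, int], ...]


class RequestCartCache:
    """Keeps one calculated total per distinct cart state for a single request."""

    def __init__(self, calculate: Callable[[CartLines], float]) -> None:
        self._calculate = calculate  # the expensive call, e.g. a web service round trip
        self._totals: Dict[CartLines, float] = {}
        self.remote_calls = 0

    def total(self, lines: CartLines) -> float:
        key = tuple(sorted(lines))  # same cart contents give the same key
        if key not in self._totals:
            self.remote_calls += 1
            self._totals[key] = self._calculate(key)
        return self._totals[key]


def fake_remote_calculation(lines: CartLines) -> float:
    # Stand-in for the real integration call; assumes 10.0 per unit.
    return sum(qty * 10.0 for _, qty in lines)


cache = RequestCartCache(fake_remote_calculation)
cart = (("SKU-1", 2), ("SKU-2", 1))
page_total = cache.total(cart)       # main cart rendering
mini_cart_total = cache.total(cart)  # mini cart on the same page reuses the result
```

Any change to the cart (add, remove, quantity change) produces a different key, so it still triggers a fresh calculation.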
4. Importing data from ERP (Data Import and Integration)
This would take anywhere from 15-45min to complete - on the VM it is less than 5min.
5. App Service Restart Takes Ages
If there was a crash, the site would take 5-6 minutes to reload (ie Stop and Start of the App Service). There were times that it just kept on crashing - so we would have to restrict access to the VPN, restart, and then once it had restarted we could open it up to the public again.
6. Product Feeds
Product feeds would take a long time to run on the page, anywhere from 1.7s to more than 8s.
7. Admin section
Content editing - sometimes the page would hang and not open, or take ages to open - especially the Visual Editor and Product Catalogues - generally with a thread abort exception.
This was the case with most functions on the backend.
On the VM, this is almost instantaneous and never gives issues.
I still think it was probably due to disk issues (all or any IO operations seemed to be an issue), but when we looked at those metrics it all seemed normal.
Would love to know what tiers and settings everyone has configured, as well as any special or differing configurations (ie do you have integration, product counts, etc).