Developer forum


repository remove html-tags from "Long description" before indexing

Thomas Jensen

Hi

Is there a way to remove the HTML tags from a field in the repository before indexing, something like "striphtml"?

If a product's long description contains an image that has been inserted as content, like
(src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAgAAZABkAAD/7AAR ......)

then a search for part of a product number, or for a short product type name (like RAM), can result in a hit on the encoded image data.

 

The best solution is not to insert images as content.

 

Regards Thomas


Replies

 
Nicolai Pedersen

Hi Thomas

Currently it is not possible to strip HTML out of the box. I don't think the analyzer removes HTML. You can create a Lucene analyzer that strips HTML, or create a ProductIndexBuilderExtender that strips it. Or create a custom (hidden) field where you store the stripped text, and use a ProductSave notification to update the field...
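
Whichever of these hooks is used, the core step is the same: reduce the HTML to its text nodes before the value reaches the index. Here is a minimal sketch of that step, written in Python purely for illustration. It does not use the platform's or Lucene's API, and `strip_html` and `_TextExtractor` are hypothetical names; a real extender or notification subscriber would do the equivalent in its own language.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes only; tags and their attributes are discarded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for text between tags, never for attribute values,
        # so an inline src="data:image/...;base64,..." is not collected.
        self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent blocks do not fuse into one
        # token, then collapse the whitespace left behind by removed tags.
        # Trade-off: a word split by inline markup ("<b>R</b>AM") becomes
        # two tokens.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the plain text of an HTML fragment, suitable for indexing."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Applied to a description such as `<p>Fast <b>RAM</b> module</p><img src="data:image/jpeg;base64,...">`, this keeps only the visible text, so the base64 payload in the `src` attribute can no longer match a search for a short term or a partial product number.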

Generally, yes, unstructured data, i.e. content inserted in the editor, should be avoided...

 
