Hi guys,
I'm trying to set up a freetext search on page items and their paragraph item content. It's indexing just fine... but whenever the search query includes special characters, it returns nothing.
I have inspected the index with Luke and found that all HTML editor fields have the special characters stored as HTML entities (ø = &oslash; etc.). It seems to be CKEditor that does this...
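To illustrate why this breaks matching, here is a small Python sketch. The tokenizer is a hypothetical, simplified stand-in (lowercase, split on non-word characters), not Lucene's StandardTokenizer, but the effect is the same in spirit: the stored text "K&oslash;benhavn" never yields the token "københavn", so a query for "København" can't hit it. Decoding the entities first (here with Python's html.unescape) makes the tokens line up.

```python
import html
import re


def tokenize(text):
    # Hypothetical, simplified analyzer: lowercase, then split on anything
    # that is not a (Unicode) word character. NOT Lucene's StandardTokenizer.
    return [t for t in re.split(r"\W+", text.lower()) if t]


stored = "Velkommen til K&oslash;benhavn"  # what the rich text editor saved
query = "København"                        # what the user types

indexed_raw = set(tokenize(stored))                     # entity splits the word
indexed_decoded = set(tokenize(html.unescape(stored)))  # entity decoded first

print(all(t in indexed_raw for t in tokenize(query)))      # False
print(all(t in indexed_decoded for t in tokenize(query)))  # True
```

With the raw text, "&" and ";" act as separators, so the word falls apart into "k", "oslash" and "benhavn".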
Is there any way to make the search match those entities?
Do we have to implement something like this? https://lucene.apache.org/solr/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.HTMLStripCharFilterFactory
Nicolai linked to a Romanian analyzer in another post, and in that same place I found this: https://github.com/apache/lucenenet/tree/master/src/Lucene.Net.Analysis.Common/Analysis/CharFilter
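As far as I understand it, the idea behind HTMLStripCharFilter is to clean the field text before the tokenizer sees it: remove the markup and decode entities like &oslash; back to ø. Below is a rough Python sketch of that idea using only the standard library's html.parser. It is an illustration, not Lucene.Net's implementation, and how you would plug a char filter into the DW index builder is not shown here (that part is an assumption I haven't verified).

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content only. With convert_charrefs=True (the default),
    entities such as &oslash; are already decoded when handle_data runs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_starttag(self, tag, attrs):
        # Emit a space per tag so "<p>a</p><p>b</p>" doesn't become "ab".
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")


def strip_html(markup):
    """Strip tags and decode entities, roughly what a char filter would do
    before the analyzer tokenizes the field."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    # Collapse the extra whitespace introduced for tags.
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


print(strip_html("<p>R&oslash;dgr&oslash;d med <strong>fl&oslash;de</strong></p>"))
# Rødgrød med fløde
```

Unlike Lucene's HTMLStripCharFilter, this sketch doesn't keep character offsets (needed for hit highlighting) and doesn't drop the contents of script/style blocks.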
P.S. I just tested on Rapido 3.0... same problem there...
So I would say we need HTMLStripCharFilterFactory.cs added to standard DW... otherwise freetext search on e.g. Danish content created in rich text editors is more or less useless...