Lucene query with special characters

Rune Peter Olsen

It seems that some special characters are handled as a space divider in a query, so that "stål alu", "stål/alu", "stål?alu" etc. return the same results. I want to be able to search specifically for e.g. "stål/alu", with the forward slash being handled as... you guessed it... a forward slash.

Does anyone know how to set this up in DW 9.9.5 with Rapido 3.4?

 


Replies

 
Nicolai Pedersen

Yes - you have to use the whitespace analyzer instead of the standard analyzer.

You can do that by creating a new field type and applying another analyzer. See the docs here: https://doc.dynamicweb.com/documentation-9/indexing/indexing-search/indexes#4786

So something like this:

As you can see I have given it a boost.

Then you can add an extra field to the index using this field type, once for each field you want to handle in this way.

Also add an additional expression to the query using an "or" group. If you add this field instead of replacing the one that uses the standard analyzer, both scenarios will work. The boost will then take care of moving the right result to the top.
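
To illustrate why the choice of analyzer matters here, the following is a minimal sketch in plain C# that mimics the two tokenization styles. It does not call Lucene.Net or Dynamicweb; the class and method names (TokenizerSketch, SplitOnPunctuation, SplitOnWhitespace) are hypothetical, and the punctuation splitter is only a rough approximation of what a standard-style analyzer does (the real one has more rules).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Illustration only: plain string handling that mimics two tokenization
// styles. Nothing here is part of Lucene.Net or Dynamicweb.
public static class TokenizerSketch
{
    // Rough stand-in for a standard-style analyzer (assumption): every
    // character that is not a letter or digit ends the current token,
    // and tokens are lowercased.
    public static List<string> SplitOnPunctuation(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    // Stand-in for a whitespace-style analyzer: split on whitespace only,
    // leaving punctuation and casing untouched.
    public static List<string> SplitOnWhitespace(string text)
    {
        // A null separator array makes Split use whitespace as the delimiter.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).ToList();
    }
}
```

With this sketch, SplitOnPunctuation gives the two tokens "stål" and "alu" for "stål alu", "stål/alu" and "stål?alu" alike, which is why those queries return the same results. SplitOnWhitespace("stål/alu") keeps the single token "stål/alu", so the slash survives into the index and the query.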

 
Rune Peter Olsen

Thank you for your quick reply Nicolai. I will look into your suggestion right away :-)

 
Rune Peter Olsen

@nicolai-pedersen - it works... kind of... :-)

The WhitespaceAnalyzer is case sensitive, which means it's useless in a free-text search scenario (IMHO). Does this mean I need to write a custom analyzer, or do you know a trick for this? In regular Lucene I would use a LowerCaseFilter in conjunction with the WhitespaceTokenizer, but I cannot find anything about this in DW.

If so, do you have any documentation on how to plug a custom analyzer into DW? I cannot find anything about it on the doc site.

 
Nicolai Pedersen
This post has been marked as an answer

Yes, good point.

I just added a case-insensitive whitespace analyzer to our analyzers. Attached is a pirate build of Dynamicweb.Indexing.Lucene that you can add to your project to see if it works for you.

It will be released officially on NuGet very soon.

You can always add any provider-based feature in a dll and put it in the bin folder, and it will show up in the backend. I.e. you can take this analyzer code, put it in your own custom dll, and it will then show up in the backend.

using System.IO;
using Lucene.Net.Analysis;

namespace Dynamicweb.Indexing.Lucene.Analyzers
{
    /// <summary>
    /// Splits text on whitespace only and lowercases each token, so characters
    /// such as "/" stay inside the token while matching is case insensitive.
    /// </summary>
    public class DynamicwebCaseInsensitiveWhitespaceAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, TextReader reader)
        {
            // Tokenize on whitespace first, then lowercase every token.
            TokenStream t = new WhitespaceTokenizer(reader);
            t = new LowerCaseFilter(t);

            return t;
        }
    }
}
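
As a rough illustration of what the tokenizer and filter chain above does to text, here is a dependency-free sketch in plain C#. It does not use Lucene.Net: the names LowercaseWhitespaceSketch, Analyze and Matches are hypothetical, ToLowerInvariant is an assumed stand-in for the lowercasing step, and Matches is a simplified "every query term must be indexed" check rather than Lucene's actual query evaluation or scoring.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: mimics "split on whitespace, then lowercase" with
// plain string operations. Nothing here is part of Lucene.Net or Dynamicweb.
public static class LowercaseWhitespaceSketch
{
    // Split on whitespace only, then lowercase each token.
    public static List<string> Analyze(string text)
    {
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.ToLowerInvariant())
            .ToList();
    }

    // Simplified matching (assumption): the same analysis is applied to the
    // indexed text and to the query, and every query token must be present
    // among the indexed tokens.
    public static bool Matches(string indexedText, string query)
    {
        var indexed = new HashSet<string>(Analyze(indexedText));
        return Analyze(query).All(indexed.Contains);
    }
}
```

In this sketch, Matches("Profil Stål/Alu 40mm", "STÅL/ALU") is true because both sides end up with the token "stål/alu", while Matches("Profil Stål Alu 40mm", "stål/alu") is false because the indexed text never produces a token containing the slash.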

BR Nicolai

Votes for this answer: 2
 
Rune Peter Olsen

Everything worked fine with the custom analyzer. Easy to set up :-)

 
