Developer forum


Lucene product search - language, stemming, stopwords

Morten Bengtson

Sometimes the search results and suggestions presented in product search (Lucene index search) are a bit odd, and I suspect this has something to do with how languages are handled.

 

Is the language setting on each product used when indexing products, or is everything treated as English?

 

Is it possible to improve the search results and suggestions by specifying the language along with the query or in some other way?

 

Are you (or is Lucene by default) using stemming when searching for products, and does this stemming depend on the current language?

 

And is it possible to specify stopwords for different languages?

 

In English, the words "is" and "and" are considered stopwords. In Danish, the word "is" means ice and "and" means duck, neither of which is a stopword. If I am searching for a duck, show me a duck! :) These are just a few simple examples.
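
To make the problem concrete, here is a minimal sketch of language-dependent stopword removal. It is not how the product search works internally; the tiny stopword lists and the function name remove_stopwords are assumptions made up for illustration.

```python
# Hypothetical, deliberately tiny stopword lists; real lists are much longer.
STOPWORDS = {
    "en": {"is", "and", "the", "a"},
    "da": {"og", "er", "en", "et"},
}

def remove_stopwords(query, language):
    """Lowercase and split the query, then drop the stopwords of the given language."""
    stopwords = STOPWORDS.get(language, set())  # unknown language: remove nothing
    return [term for term in query.lower().split() if term not in stopwords]

print(remove_stopwords("and", "en"))  # English list: "and" is dropped
print(remove_stopwords("and", "da"))  # Danish list: "and" (duck) is kept
```

With only an English list applied to every query, a Danish search for "and" would lose its only term before it ever reached the index.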

 


Replies

 
Pavel Volgarev

Hi Morten,

 

  1. Yes, each language version of a product is a separate document in Lucene.
  2. You could use a "Delimit filter" with "LanguageID" as the field name and the desired language ID as the value. By default, the system applies the current language ID.
  3. No stemming.
  4. No, but good idea, thanks! TFS #11246.
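
For readers following along, points 1 and 2 can be pictured with a small in-memory sketch. This is not the actual index or filter API: only the field name "LanguageID" comes from the reply above, while the documents, the language IDs "LANG1" and "LANG2", and the search function are assumptions for illustration.

```python
# Toy index: one document per language version of a product (point 1).
# The IDs "LANG1"/"LANG2" and the product data are made up for illustration.
DOCUMENTS = [
    {"ProductID": "P1", "LanguageID": "LANG1", "Name": "Duck"},
    {"ProductID": "P1", "LanguageID": "LANG2", "Name": "And"},
    {"ProductID": "P2", "LanguageID": "LANG1", "Name": "Ice"},
    {"ProductID": "P2", "LanguageID": "LANG2", "Name": "Is"},
]

def search(term, language_id):
    """Return documents in the given language whose name contains the term (point 2)."""
    term = term.lower()
    return [
        doc for doc in DOCUMENTS
        if doc["LanguageID"] == language_id      # restrict to one language version
        and term in doc["Name"].lower().split()  # exact term match, no stemming
    ]

print(search("and", "LANG2"))  # matches the Danish version of P1
print(search("and", "LANG1"))  # no match among the English versions
```

Restricting on the language field keeps documents from other language versions out of the result, but it does not change how the query terms themselves are analyzed.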

 

-- Pavel

 
