Synonyms with spaces not working

Nuno Aguiar Dynamicweb Employee

Hi,

 

I was trying to set up synonyms and, despite following the example, I can't get one case to work: a stem that matches a multi-word synonym.

 

 

Did this ever work or is it a bug?

 

Best Regards,

Nuno Aguiar


Replies

 
Nicolai Pedersen

Analyzed vs. not analyzed?

When analyzed, "armored personnel carrier" becomes the terms "armored", "personnel", "carrier", so the synonym filter will not detect "armored personnel carrier" because it no longer exists as a single term.

 
Nuno Aguiar Dynamicweb Employee

Hi Nicolai,

 

I am assuming you mean the setting on the field that uses the synonyms, but it still does not work for me, and the results are now worse, since even a basic synonym has stopped working.

https://www.screencast.com/t/7wrLORyzy

 

Do you think it may be a bug? Or what am I doing wrong (besides working way too late just like you :P )

 

Nuno Aguiar

 
Nuno Aguiar Dynamicweb Employee

Hi Nicolai,

 

Did you get my latest reply? I need to get this working somehow. I have a 2nd customer requesting this now.

 

Best Regards,

Nuno Aguiar

 
Nicolai Pedersen
This post has been marked as an answer

Hi Nuno

Let me try to explain: the synonyms are applied to terms. If you have a field with the value "one two three four", analyzing it produces 4 terms, "one", "two", "three", "four", and you can create a synonym on each of them, e.g. "one" -> "uno".

If you do NOT analyze the field, you get one term, "one two three four", and you can create a synonym on the entire term, e.g. "one two three four" -> "Four Numbers". You cannot create a synonym "two three" -> "two numbers" in either case, because "two three" is not a term in either situation.
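To make the two cases concrete, here is a minimal Python sketch. It is not the Dynamicweb or Lucene.Net code: `analyze`, `keep_whole` and `apply_synonyms` are hypothetical stand-ins, and it assumes the filter simply adds the synonyms next to each matching term.

```python
def analyze(text):
    """Stand-in for an analyzed field: lowercase and split on whitespace."""
    return text.lower().split()


def keep_whole(text):
    """Stand-in for a field that is NOT analyzed: the whole value is one term."""
    return [text.lower()]


def apply_synonyms(terms, synonyms):
    """Look up each term on its own and add its synonyms (assumed behaviour)."""
    result = []
    for term in terms:
        result.append(term)
        result.extend(synonyms.get(term, []))
    return result


# Hypothetical synonym table with a one-word, a two-word and a full-value key.
synonyms = {
    "one": ["uno"],
    "two three": ["two numbers"],
    "one two three four": ["four numbers"],
}

analyzed = apply_synonyms(analyze("one two three four"), synonyms)
not_analyzed = apply_synonyms(keep_whole("one two three four"), synonyms)

print(analyzed)      # only the "one" synonym is found
print(not_analyzed)  # only the full-value synonym is found
```

In both runs the "two three" entry is never looked up, because no single term ever equals "two three".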

The synonyms filter comes from this code: https://www.codeproject.com/Articles/32201/Lucene-Net-Custom-Synonym-Analyzer - and that filter is wrapped in an Analyzer that encapsulates other built-in tokenizers and filters. So the string is tokenized (and filtered) first, and only then are the synonyms applied.

What you are trying to do will require another approach. Because you need the synonym replacement to run before any of the other tokenizers and filters are applied, you would have to do a string replace on the original text input. And it would still not work very well, because your synonym "two three" -> "two numbers" would, after being analyzed later in the process, become 2 terms, "two" and "numbers". That might be what you want, but it does not make much sense... unless you replace "two three" with "twonumbers".
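A rough sketch of that string-replace idea, in Python for illustration only. `preprocess` and the phrase tables are hypothetical, and the whitespace split stands in for whatever analyzer runs afterwards.

```python
def preprocess(text, phrase_map):
    """Replace multi-word phrases in the raw text BEFORE it is tokenized."""
    lowered = text.lower()
    for phrase, replacement in phrase_map.items():
        lowered = lowered.replace(phrase, replacement)
    return lowered


# Replacing with a two-word value still ends up as two separate terms.
two_word = preprocess("One two three four", {"two three": "two numbers"}).split()

# Replacing with a single made-up word survives tokenization as one term.
one_word = preprocess("One two three four", {"two three": "twonumbers"}).split()

print(two_word)
print(one_word)
```

The same replacement would have to be applied to the search input as well, otherwise the query terms and the indexed terms would not line up.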

I hope this makes things a little clearer and gives you an idea of how to achieve what you need.

BR Nicolau

Votes for this answer: 1
 
Nicolai Pedersen

If you are trying to convert "two three" to "twonumbers", you might want to turn it upside down: map the one-word term to the two-word synonym and not the other way around.

So this should work, as you saw in the videos:

<group stem="apk">
    <synonym>armored personnel carrier</synonym>
</group>

This will not work:

<group stem="12-ply">
    <synonym>dwna</synonym>
</group>

But this should also work:

<group stem="dwna">
    <synonym>12-ply</synonym>
</group>

Note that 12-ply would be analyzed to "12" and "ply"; 12 is a number, and I am not 100% sure how the analyzer handles that...

 
Adrian Ursu Dynamicweb Employee

Hi guys,

@Nicolai: A very good explanation of the synonym process. It clarifies a lot. Thank you very much for it.

@Nuno: I have tried using synonyms for strings containing numbers and I did not succeed.

Reading the description, the numbers might get lost during tokenization, since that is the first step in the process. But this is just an assumption; it could also be caused by some other misconfiguration.

Adrian

 

 
Nicolai Pedersen

Yes, I think the issue is the standard analyzer. It probably has to be the KeywordAnalyzer, which handles IDs etc. differently.

Our version of the synonyms analyzer does have an option to configure it. Try changing the config to what you see in my screenshot; that should help with numbers. In the example I am switching to the keyword analyzer.

Screenshot_2020-12-18_151646.JPG
 
Nicolai Pedersen

That is of course wrong. The keyword analyzer will not split the value into terms; it keeps the entire string as one term.

It has to be the WhiteSpaceTokenizer. See dump

Screenshot_2020-12-18_151949.JPG
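
For readers without the screenshots, the difference between the three strategies can be sketched in Python as below. These functions are simplified stand-ins, not the Lucene.Net classes; in particular `split_on_punctuation` is only an assumption about how an analyzer that breaks on punctuation might treat "12-ply", and the sample value is made up.

```python
import re


def keyword_terms(text):
    """Keyword-style: the entire string is kept as a single term."""
    return [text]


def whitespace_terms(text):
    """Whitespace-style: split on whitespace only, so "12-ply" stays intact."""
    return text.split()


def split_on_punctuation(text):
    """Hypothetical analyzer that also breaks on hyphens and other punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)


value = "Tire 12-ply DWNA"

print(keyword_terms(value))
print(whitespace_terms(value))
print(split_on_punctuation(value))
```

Only the whitespace variant yields "12-ply" as a term of its own, which is what a synonym entry for "12-ply" needs in order to match.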
 
Adrian Ursu Dynamicweb Employee

Hi Nicolai,

Very good suggestion. I will try it out.

Thank you,

Adrian

 
Adrian Ursu Dynamicweb Employee

Hi Nicolai,

IT WORKS!

Thank you very much.


Adrian

 
Nuno Aguiar Dynamicweb Employee

Hi Nicolai,

 

Thank you for the thorough explanation. It makes more sense to me now that I know the order in which things are processed. I am getting it to work, with one slight difference: "12-ply" is kept as a single term, not two terms (check the attachment), and that proves why I am not getting results when searching for "12 ply".

 

All in all, it's working, and thank you very much :)

 

Nuno

Terms.gif
 
Nicolai Pedersen

Cool - then you might be able to put "12 ply" in as a synonym as well with the others.

 
Nuno Aguiar Dynamicweb Employee

Hi Nicolai,

 

It does not work for me https://www.screencast.com/t/18hZW6mJr6vg

  • I've set the synonyms right
    • the main term as the stem
    • all synonyms below
  • I can get all expected results searching for a synonym
  • I cannot get results searching with spaces (despite the synonym)

 

 

I then looked into Luke to see how it all looked, and I got lost looking for a visual reference of "dwna". I am guessing the matching is done at query time, not index time, because I don't get any results searching for it https://www.screencast.com/t/JkPdBJJTmU (in the screencast above I also show that we can't see "dwna" in the term vector).

 

If I am right about synonyms being matched at query time, that explains not seeing them in the index, but it makes me wonder why searching for "12 ply" would not resolve to 12-ply. Maybe it is because the entire search term is being analyzed, so you'd get "12" (and try to resolve synonyms for it) and then "ply".
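
Here is a small Python sketch of that guess. Everything in it is an assumption for illustration: the index terms, the synonym table and the idea that synonyms are expanded per term at query time are hypothetical, not confirmed behaviour.

```python
# Hypothetical index content: "12-ply" was indexed as one term.
index_terms = {"12-ply", "tire"}

# Hypothetical synonym table, looked up one query term at a time.
synonyms = {"dwna": ["12-ply"], "12 ply": ["12-ply"]}


def expand_query(query):
    """Split the query on whitespace, then add synonyms for each single term."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(synonyms.get(term, []))
    return expanded


def matches(query):
    """A query matches if any expanded term exists in the index."""
    return any(term in index_terms for term in expand_query(query))


print(expand_query("dwna"), matches("dwna"))
print(expand_query("12 ply"), matches("12 ply"))
```

Under these assumptions "dwna" finds the product, while "12 ply" is split into "12" and "ply" before the lookup, so the "12 ply" entry is never reached.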

 

That actually explains why this would never work, unless there's a way for us to do something like Google does, which is to consider values within quotes as a single string (i.e. Nicolai Pedersen returns different results than "Nicolai Pedersen").

 

Is there any way we can tell the index to treat the entire search parameter as a single string? I was "expecting" that having the Search parameter be of type System.String instead of System.String[] would be enough.

 

Thoughts?

Nuno Aguiar

 
