Sort by substring relevance

Nuno Aguiar Dynamicweb Employee

Hi,

Is there a way to sort by (sub)string relevance using the Query Publisher? If not, could this be implemented?

Consider the following example: https://www.screencast.com/t/3LIIeIIcc

The customer is asking that equally scored results (the 2nd, 3rd and 4th) be sorted by the relevance/proximity of the match in the page name. In their mind, the 3rd result should be 2nd.

I googled a bit and came across a way to do it in Python (https://stackoverflow.com/questions/47682491/python-how-to-sort-a-list-of-strings-by-substring-relevance), but I could not find a C# version or work out how to get this to work in Dynamicweb.

Has anyone come across a similar need, or does anyone have a solution for this?

Best Regards,

Nuno Aguiar


Replies

 
Nicolai Pedersen

You cannot get Lucene to do this unless you write something that changes the scoring.

But given the list of results that you have, you can calculate a relevance value based on the names of the articles etc., add it to this page of search results, and re-sort the results based on that.

Here is some code from our SEO module. Using e.g. density (what percentage of a phrase a given term makes up) or prominence (how early a given term appears in a phrase) can help you.

using System;
using System.Text.RegularExpressions;
using Dynamicweb.Core;

namespace Dynamicweb.Analytics.Seo
{
    /// <summary>
    /// Calculates various SEO related numbers
    /// </summary>
    public class Calc
    {

        /// <summary>
        /// Gets the frequency of word in the given text.
        /// </summary>
        /// <param name="text">The text.</param>
        /// <param name="word">The word.</param>
        public static int GetFrequency(string text, string word)
        {
            if (string.IsNullOrEmpty(word))
            {
                return 0;
            }

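            word = word.Replace('-', ' ');
            // Escape the word so regex metacharacters (e.g. "(" or "+") are matched literally instead of being parsed as a pattern.
            MatchCollection matchCollection = Regex.Matches(Converter.ToString(text), "\\b" + Regex.Escape(word) + "\\b", RegexOptions.IgnoreCase);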

            return matchCollection.Count;
        }

        /// <summary>
        /// Gets the word count of the specified text.
        /// </summary>
        /// <param name="text">The text.</param>
        public static int GetWordCount(string text)
        {
            MatchCollection matchCollection = Regex.Matches(Converter.ToString(text), "\\b\\w+?\\b", RegexOptions.IgnoreCase);
            return matchCollection.Count;
        }

        /// <summary>
        /// Gets the phrase count of the specified text.
        /// </summary>
        /// <param name="text">The text.</param>
        public static int GetPhraseCount(string text)
        {
            if (!string.IsNullOrEmpty(text))
            {
                return text.Split(',').Length;
            }
            else
            {
                return 0;
            }
        }

        /// <summary>
        /// Gets the density of words in the specified text.
        /// </summary>
        /// <param name="text">The text.</param>
        /// <param name="word">The word.</param>
        /// <returns>The density as a percentage: Frequency / WordCount * 100, rounded to two decimals.</returns>
        public static double GetDensity(string text, string word)
        {
            int frequency = GetFrequency(text, word);
            int totalWordCount = GetWordCount(text);

            if (frequency > 0 && totalWordCount > 0)
            {
                return Math.Round(((double)frequency / totalWordCount) * 100, 2);
            }
            else
            {
                return 0;
            }
        }

        /// <summary>
        /// Gets the prominence of the word in the specified text.
        /// </summary>
        /// <param name="text">The text.</param>
        /// <param name="word">The word.</param>
        public static double GetProminence(string text, string word)
        {
            if (string.IsNullOrEmpty(word))
            {
                return 0;
            }

            word = word.Replace('-', ' ');
            WordCollection wordCollection = Word.GetWords(text);

            int wordCount = GetWordCount(word);
            if (wordCount > 1)
            {
                wordCollection = HtmlDocument.GetDocumentPhrases(wordCollection, wordCount);
            }

            int findings = 0;
            double prominence = 0;

            for (int i = 0; i <= wordCollection.Count - 1; i++)
            {
                if (string.Compare(wordCollection[i].Text, word, true) == 0)
                {
                    if (wordCollection.Count > 1)
                    {
                        // Scale linearly from 100 (first position) down to 0 (last position).
                        prominence += ((double)100 / (wordCollection.Count - 1)) * (wordCollection.Count - i - 1);
                    }
                    else
                    {
                        prominence += 100;
                    }

                    findings++;
                }
            }

            if (findings > 0)
            {
                prominence = Math.Round(prominence / findings, 0);

                if (prominence == 0)
                {
                    prominence = 1;
                }
                return prominence;
            }
            else
            {
                return 0;
            }

        }

        /// <summary>
        /// Gets the character count.
        /// </summary>
        /// <param name="text">The text.</param>
        public static int getCharacterCount(string text)
        {
            return Converter.ToString(text).Length;
        }
    }
}
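
As a rough sketch of the re-sorting idea described above, and not actual Dynamicweb code: the snippet below keeps the engine score as the primary sort key and only breaks ties by a simplified prominence of the search term in the name. `Hit`, `TieBreaker`, `Prominence` and `Resort` are hypothetical names made up for this illustration, the prominence calculation handles single-word terms only, and how you obtain the hits and write them back (for example from a notification subscriber) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for one search hit; not a Dynamicweb type.
public sealed class Hit
{
    public string Name { get; }
    public double Score { get; }
    public Hit(string name, double score) { Name = name; Score = score; }
}

public static class TieBreaker
{
    // Simplified prominence for a single-word term: 100 when the term is the
    // first word of the name, falling linearly towards the last word, but never
    // below 1 when the term is present; 0 when the term is absent.
    public static double Prominence(string name, string term)
    {
        if (string.IsNullOrEmpty(name) || string.IsNullOrEmpty(term)) return 0;

        string[] words = Regex.Matches(name, @"\w+").Cast<Match>().Select(m => m.Value).ToArray();
        int index = Array.FindIndex(words, w => string.Equals(w, term, StringComparison.OrdinalIgnoreCase));
        if (index < 0) return 0;
        if (words.Length == 1) return 100;

        double value = 100.0 * (words.Length - 1 - index) / (words.Length - 1);
        return Math.Max(value, 1);
    }

    // Score stays the primary key; prominence only reorders hits with equal scores.
    // LINQ ordering is stable, so hits equal on both keys keep their original order.
    public static List<Hit> Resort(IEnumerable<Hit> hits, string term)
    {
        return hits
            .OrderByDescending(h => h.Score)
            .ThenByDescending(h => Prominence(h.Name, term))
            .ToList();
    }
}
```

With the term "bike" and three equally scored hits named "Our bike shop", "Road and city bike" and "Bike accessories", `TieBreaker.Resort` would place "Bike accessories" first, then "Our bike shop", then "Road and city bike", while any higher-scored hit stays above all three.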
 
Nuno Aguiar Dynamicweb Employee

Hi Nicolai,

This is perfect, thanks.

Depending on how hard they twist my arm, I can try to do this in a Query.AfterQuery notification. The worst case would be when the results they want re-sorted fall on the boundary between two pages, but that would be much, much harder for them to even find and reproduce, so it should be safe :P

Best Regards,

Nuno Aguiar

 
