Lucene: Document AsDocument(bool createSumaryField)

Anders Ebdrup

Hi

 

It would be nice to have the method "internal Document AsDocument(bool createSumaryField)" made virtual in "Dynamicweb.Searching.IndexEntry", so that it is possible to decide how the Lucene document is stored.

 

Best Regards

Anders


Replies

 
Anders Ebdrup

Does anyone have an idea of how to store multiple values in a field for a Lucene document? This is not yet possible with the internal method, and SetValue and GetValue do not seem to support it.

 

I would like to achieve this:

You can have a field for a Lucene document occur multiple times. Create the document, add the values for the name and author, then do the same for each category:

  • create new lucene document
  • add name field and value
  • add author field and value
  • for each category:
    • add category field and value
  • add document to index

When you search the index for a category, it will return all documents that have a category field with the value you're after. The category should be a 'Keyword' field.
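
The steps above can be sketched in Lucene.Net roughly as follows. This is only a minimal sketch, assuming the Lucene.Net 2.9/3.0 `Field` constructor and the same `Field.Store` / `Field.Index` values that appear elsewhere in this thread. The field names ("name", "author", "category") and the `MultiValuedFieldSketch` helper are hypothetical, and `Field.Index.NOT_ANALYZED` is assumed to be the counterpart of the older 'Keyword' field type.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;

// Hypothetical helper, not part of Dynamicweb or Lucene.Net.
public static class MultiValuedFieldSketch
{
    public static Document BuildDocument(string name, string author, IEnumerable<string> categories)
    {
        // Create a new Lucene document.
        var document = new Document();

        // Single-valued fields: added once each.
        document.Add(new Field("name", name, Field.Store.YES, Field.Index.ANALYZED));
        document.Add(new Field("author", author, Field.Store.YES, Field.Index.ANALYZED));

        // Multi-valued field: the same field name is added once per value.
        // NOT_ANALYZED keeps each value as a single, untokenized term.
        foreach (var category in categories)
        {
            document.Add(new Field("category", category, Field.Store.YES, Field.Index.NOT_ANALYZED));
        }

        // The caller would then pass the document to IndexWriter.AddDocument.
        return document;
    }
}
```

With a document built this way, `document.GetValues("category")` should return one entry per category that was added.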

 
Pavel Volgarev

Hi Anders,

 

Making the "AsDocument" method public would violate our rule of abstracting the caller from the native Lucene.Net APIs (this is for your own safety: it lets us, for example, freely upgrade the engine without worrying about compatibility issues). Could you elaborate on what exactly you are trying to achieve?

 

If you have a product field that can accept multiple values (for example, a checkbox list), then the system will take care of this automatically: the field values will be separated by a single space, which allows querying for single "tokens", e.g. productfield_field01:Value1, where the actual value can be "Value1 Value2 Value3".

 

-- Pavel

 
Anders Ebdrup

Hi Pavel

 

I do understand your view on the problem, but if I override a method like that, I think it will be my problem if you upgrade the engine :-)

 

What I am trying to achieve is to have multiple values indexed in one field. If I use your approach it will be possible for me to search the index, but I will not be able to calculate my facets properly.

 

My facets are calculated by this method:

        protected virtual IEnumerable<KeyValuePair<String, Int32>> GetQueryFacets(Query query, string facetField)
        {
            facetField = facetField.ToLowerInvariant();
            if (!FacetsCacheContainer.ContainsKey(facetField))
            {
                var cache = new List<FacetValueCache>();
                var allDistinctField = FieldCache_Fields.DEFAULT.GetStrings(IndexReader, facetField).Distinct();

                foreach (var fieldValue in allDistinctField)
                {
                    if (!String.IsNullOrWhiteSpace(fieldValue)) // We do not want to search for null values
                    {
                        var facetQuery = new TermQuery(new Term(facetField, fieldValue));
                        var facetQueryFilter = new CachingWrapperFilter(new QueryWrapperFilter(facetQuery));
                        cache.Add(new FacetValueCache(fieldValue, facetQueryFilter));
                    }
                }
                FacetsCacheContainer.Add(facetField, cache);
            }

            //now calculate facets.
            var mainQueryFilter = new CachingWrapperFilter(new QueryWrapperFilter(query));
            var facetDefinition = FacetsCacheContainer[facetField];
            return facetDefinition.Select(fd =>
                new KeyValuePair<String, Int32>(fd.FacetValue, fd.GetFacetCount(IndexReader, mainQueryFilter)));
        }

The call FieldCache_Fields.DEFAULT.GetStrings(IndexReader, facetField).Distinct() cannot handle multiple values packed into the same field value, and that is why I need the values to be indexed as separate field instances in the index.

 

Otherwise I need you to support indexing values of type List<string>, which could be done by changing IndexEntryFieldInfo.Value to object and changing the method internal Document AsDocument(bool createSumaryField) to the following (please note that I have not tested whether the code actually works):

        internal Document AsDocument(bool createSumaryField)
        {
            var document = new Document();
            var dictionary = new Dictionary<string, string>(this.Fields);
            if (createSumaryField)
                dictionary.Add(IndexEntry.GetSystemFieldName(IndexEntrySystemField.Summary), this.Summary);
            foreach (string name in dictionary.Keys)
            {
                if (!string.IsNullOrEmpty(dictionary[name]))
                {
                    string str2 = dictionary[name];
                    IndexEntryFieldInfo indexEntryFieldInfo = this.InitializeField(name, str2);
                    if (indexEntryFieldInfo != null)
                    {
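                        // Assumes Value has been changed to object; a List<string> then yields one field per value.
                        // (Enumerable is a static class, so the check has to be against IEnumerable<string>.)
                        var values = indexEntryFieldInfo.Value as IEnumerable<string>;
                        if (values != null)
                        {
                            foreach (var value in values)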
                            {
                                document.Add(
                                    new Field(indexEntryFieldInfo.Name, value,
                                              indexEntryFieldInfo.Store ? Field.Store.YES : Field.Store.NO,
                                              indexEntryFieldInfo.Tokenize ? Field.Index.ANALYZED : Field.Index.NOT_ANALYZED,
                                              indexEntryFieldInfo.UseTermVector ? Field.TermVector.YES : Field.TermVector.NO));
                            }
                        }
                        else
                        {
                            document.Add(
                                new Field(indexEntryFieldInfo.Name, Convert.ToString(indexEntryFieldInfo.Value),
                                          indexEntryFieldInfo.Store ? Field.Store.YES : Field.Store.NO,
                                          indexEntryFieldInfo.Tokenize ? Field.Index.ANALYZED : Field.Index.NOT_ANALYZED,
                                          indexEntryFieldInfo.UseTermVector ? Field.TermVector.YES : Field.TermVector.NO));
                        }
                    }
                }
            }
            return document;
        }

Please let me know what you think.

 

Best regards, Anders

 
Pavel Volgarev

Hi Anders,

 

There are "BulkEstimateResults" (specific to a filter) and "BulkQueryIndex" (generic) methods that you can use. Here's an example of how to return a map "Group Id -> Number of search results" using the API (group Ids are indexed as a space-separated string):

 

using System.Collections.Generic;
using System.Linq;
using Dynamicweb.Searching;
using Dynamicweb.Searching.Queries;
using Dynamicweb.Searching.Queries.Criterias;

...

// Select distinct group Ids either from DB or search index
IEnumerable<string> groupIds = GetGroupIds();

// Queuing group Ids
var groups = new Queue<string>(groupIds);

// The actual search query, e.g. "sys_description:bike"
var query = new SearchQuery(/* ... */);

// Serving next group Id without rebuilding the query (performance)
query.Add(new VolatileQueryElement((reset) =>
	{
		return groups.Any() ?
			new Criteria("groups", groups.Dequeue()) 
			: null;
	}));

// Querying the index (bulk mode)
BulkQuerySearchResult<string, GeneralIndexEntry> result =
	IndexManager.Current.BulkQueryIndex<string, GeneralIndexEntry>("Products", query, groupIds);

var estimates = new Dictionary<string, long>();

// Building a dictionary with estimates (group Id -> # of search results)
foreach (var groupId in result.Keys)
	estimates.Add(groupId, result.GetResult(groupId).Total);

 

 

Hope this helps.

 

-- Pavel

 

 
Anders Ebdrup

Hi Pavel

 

Thanks for trying to help me, but the main problem with this approach is that I have to know the groups in advance for the facets to be calculated, which I don't.

Because of that I need a generic method to fetch the facets (like the one I showed you, GetQueryFacets), and as I see it, the only way to do this is to use Lucene as intended and index multiple values for a field as separate field instances.

 

I really hope that this clarifies my needs.

 

Best regards, Anders

 
Pavel Volgarev

Hi Anders,

 

This line:

 

var allDistinctField = FieldCache_Fields.DEFAULT.GetStrings(IndexReader, facetField).Distinct();

 

does exactly what "GetGroupIds" would do (I omitted the implementation, but for the above example it can be as simple as "SELECT DISTINCT [GroupID] FROM [EcomGroups]" if it is done against the DB, since the index is usually in sync with it).

 

I can see now that my example does not do what you need (mine returns the number of search results for each facet, whereas yours returns the number of facets for each result), but it is relatively easy to add the transformation once you have the search results. The performance will vary depending on how many search results you get back and the algorithm you choose to transform them (linear in the worst case).
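
One way such a transformation might look is sketched below. It is only an illustration under stated assumptions: the `FacetCountSketch` class and its `CountFacets` method are hypothetical and not part of the Dynamicweb API, and the input is assumed to be the raw, space-separated field value of each search result (for example the "groups" field). It makes a single pass over the results, so the cost grows linearly with the total number of tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Dynamicweb or Lucene.Net.
public static class FacetCountSketch
{
    // Takes one space-separated field value per search result and returns
    // "facet value -> number of results containing that value".
    public static Dictionary<string, int> CountFacets(IEnumerable<string> fieldValues)
    {
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);

        foreach (var raw in fieldValues)
        {
            // Skip results that have no value for the facet field.
            if (string.IsNullOrWhiteSpace(raw))
                continue;

            // Split on spaces and count each value at most once per result.
            var tokens = raw
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Distinct(StringComparer.Ordinal);

            foreach (var token in tokens)
            {
                int current;
                counts.TryGetValue(token, out current); // current stays 0 when the key is missing
                counts[token] = current + 1;
            }
        }

        return counts;
    }
}
```

For example, `FacetCountSketch.CountFacets(new[] { "G1 G2", "G2 G3", "G2" })` gives G1 = 1, G2 = 3 and G3 = 1. The facet values are discovered from the results themselves, so they do not need to be known up front, but every matching result has to be read.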

 

Let me know.

 

-- Pavel

 
Anders Ebdrup

Hi again

No, the line:


var allDistinctField = FieldCache_Fields.DEFAULT.GetStrings(IndexReader, facetField).Distinct();


only returns the first value in the field if the field is tokenized and the values are separated by whitespace.

 

My implementation is also much faster, because it relies only on the index and does not query the database.

Best regards, Anders

 
Anders Ebdrup

At the same time, it would be great to have the ability to disable the summary field in the index entries, as it takes up half of the space in the index.

 

Best regards,

Anders