Tutorial 5: Extending Indexing

The Dynamicweb indexing engine – sometimes referred to as New Indexing – is a powerful and fast generalized search framework based on Lucene 3.0.3.

Broadly speaking, New Indexing consists of the following elements:

  • Indexes – which are data structures optimized for data retrieval
  • Queries – which are requests for data limited by criteria you define
  • Facets – which are used to create filters in frontend by passing parameters to a query
  • Tasks – which are used to rebuild indexes at an interval

All of these elements can be heavily configured to suit your particular scenario – and all exist within a so-called repository, which is simply a folder in the file archive containing configuration files.

New Indexing may be extended in a number of ways:

  • Indexes may be extended by extending an existing IndexBuilder, creating a custom IndexBuilder, or creating a custom Schema Extender
  • Queries may be extended by creating custom macros, code providers, and value mappers

An index is a data structure optimized for data retrieval operations – which means that querying it is much faster than querying a database directly. It consists of the following elements:

  • An instance is a physical data structure which can be queried for data
  • A build configuration is a set of instructions to an IndexBuilder for retrieving data from a data source and building an instance.
  • Field mappings are a set of instructions for which data from the source should be added to which fields in the instance, and how the data should be analyzed and stored. A schema extender is a predefined set of field mappings for e.g. a product index or a content index.

Dynamicweb ships with four standard builders: a ProductIndexBuilder, a ContentIndexBuilder, an SQLIndexBuilder, and a UserIndexBuilder.

Likewise, Dynamicweb ships with three schema extenders: a ContentIndexSchemaExtender, a ProductIndexSchemaExtender, and a UserIndexSchemaExtender.

Both the default builders and the schema extenders are described in the Indexing & Search documentation article.

Of course, you can also extend the default functionality by either:

  • Extending an existing IndexBuilder
  • Creating a custom IndexBuilder
  • Creating a custom SchemaExtender

Most of the default IndexBuilders – the Product-, Content- and UserIndexBuilders – support IndexBuilderExtenders, which extend the build process with data from e.g. a remote source.

The process is:

  • Make sure the SkipExtenders setting on the IndexBuilder is set to False
  • Manually add a field with a custom source to the index field mappings
  • Write some awesome code which will populate the custom field with data

Before proceeding, you must make sure that the IndexBuilder setting SkipExtenders is set to False.

  • Go to Settings > Repositories > Your Index
  • Open the build definition
  • Verify that SkipExtenders is set to False (Figure 4.1)
Figure 4.1 Activating IndexBuilderExtenders

It should be set to False by default – but it never hurts to check. Next you must create a place in the index to store the remote data.

In order to use data from a remote source, you must have a place to put it – an index field with a custom source.

To add an index field with a custom source to the index:

  • Click Add field
  • Select a Field or Grouping type field and enter a name and a system name
  • Click the green plus icon and enter a custom source name, then select the custom source using the dropdown
  • Select a data type matching the source data, then check the stored and indexed checkboxes (Figure 5.1)
  • Click OK
  • Save
Figure 5.1 Creating a data destination

This leaves only one thing – adding data to the field.

The final step is to write the code which will populate your custom field with values during the indexing process.

You must write a class that implements the extender interface for the index builder you want to extend. For the ProductIndexBuilder that is IIndexBuilderExtender<ProductIndexBuilder>; for the other IndexBuilders it is IIndexBuilderExtender<[YourIndexBuilder]>.

Here is sample code that populates the custom field with a string value:

using Dynamicweb.Ecommerce.Indexing;

namespace Dynamicweb.Indexing.Examples
{
    public class ProductIndexExtender : IIndexBuilderExtender<ProductIndexBuilder>
    {
        public void ExtendDocument(IndexDocument indexDocument)
        {
            string myCustomFieldValue = "sample value";
            if (!indexDocument.ContainsKey("MyCustomField"))
            {
                indexDocument.Add("MyCustomField", myCustomFieldValue);
            }
            else
            {
                indexDocument["MyCustomField"] = myCustomFieldValue;
            }
        }
    }
}

To compile the code you will need to include a NuGet reference to Dynamicweb.Indexing and Dynamicweb.Ecommerce.

Once compiled and uploaded to the bin folder, you can build your index and verify that the new field has been assigned the value from the IndexBuilderExtender.
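
To populate the field from a remote source rather than a constant, one possible pattern is to load the remote values once and look them up per document. The sketch below illustrates that pattern with plain .NET types only. A Dictionary<string, object> stands in for IndexDocument, and RemoteFieldSketch, the "ProductID" key, and the sample values are hypothetical names chosen for illustration; none of them are part of the Dynamicweb API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: caches values from a remote source so that each
// indexed document only costs a dictionary lookup.
public static class RemoteFieldSketch
{
    // Stand-in for data that would be fetched once from a remote service.
    private static readonly Lazy<Dictionary<string, string>> RemoteValues =
        new Lazy<Dictionary<string, string>>(() => new Dictionary<string, string>
        {
            { "PROD1", "In stock" },
            { "PROD2", "Backorder" }
        });

    // A plain dictionary stands in for IndexDocument here (assumption).
    public static void ExtendDocument(IDictionary<string, object> document)
    {
        // Skip documents without a usable ID instead of failing the build.
        if (!document.TryGetValue("ProductID", out object id) || id == null)
        {
            return;
        }

        // The indexer assignment adds the key or overwrites an existing value.
        document["MyCustomField"] =
            RemoteValues.Value.TryGetValue(id.ToString(), out string value)
                ? value
                : "Unknown";
    }
}
```

In a real extender the same lookup would sit inside ExtendDocument(IndexDocument), and the cache would be filled from the actual remote source.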

If the default IndexBuilders supplied by us are inadequate for your project, you can of course create a custom IndexBuilder from scratch.

To do so you must inherit from the IndexBuilderBase class in the Dynamicweb.Indexing package.

In the code sample below a custom file indexer is built. It extracts the content of PDF files and makes the textual content searchable.

Note that the open-source iTextSharp library is used for parsing the PDF documents. The easiest way to add the assembly is through NuGet: search for itextsharp on nuget.org. At the time of writing, the version is 5.5.13.

using Dynamicweb.Diagnostics.Tracking;
using Dynamicweb.Indexing.Schemas;
using Dynamicweb.Configuration;
using Dynamicweb.Content.Files;
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using iTextSharp;

namespace Dynamicweb.Indexing
{
    public class CustomPDFIndexBuilder : IndexBuilderBase
    {
        // No http context available - getting domain from custom setting. Used for building complete link to file.
        private string Domain = SystemConfiguration.Instance.GetValue("/Globalsettings/Settings/CustomPDFFileIndexer/Domain"); // your-domain.com
        private string StartFolder = FilesAndFolders.GetFilesFolderName();

        /// <summary>
        /// List of supported actions
        /// </summary>
        public override IEnumerable<string> SupportedActions
        {
            get { return new string[] { "Full", "Update" }; }
        }

        /// <summary>
        /// Gets default settings collection
        /// </summary>
        public override IDictionary<string, object> DefaultSettings
        {
            get
            {
                return new Dictionary<string, object>
                {
                    { "StartFolder", StartFolder },
                    { "Domain", Domain }
                };
            }
        }

        /// <summary>
        /// Default constructor
        /// </summary>
        public CustomPDFIndexBuilder()
        {
            Action = "Full";
            Settings = new Dictionary<string, string>();
        }

        /// <summary>
        /// Creates new object using settings data
        /// </summary>
        /// <param name="settings"></param>
        public CustomPDFIndexBuilder(IDictionary<string, string> settings)
        {
            Action = "Full";
            Settings = settings;
        }

        /// <summary>
        /// Gets index builder fields
        /// </summary>
        /// <returns>Set of key-value pairs</returns>
        public override IEnumerable<FieldDefinitionBase> GetFields()
        {
            FileIndexSchemaExtender extender = new FileIndexSchemaExtender();
            var schemaExtenderFields = extender.GetFields() as List<FieldDefinitionBase>;

            // Add your custom fields
            if (schemaExtenderFields != null)
            {
                schemaExtenderFields.Add(new FieldDefinition()
                {
                    Name = "Text Content",
                    SystemName = "TextContent",
                    Source = "TextContent",
                    TypeName = "System.String",
                    Group = "PDF Specific",
                    Indexed = true,
                    Analyzed = false,
                    Stored = true
                });
                schemaExtenderFields.Add(new FieldDefinition()
                {
                    Name = "Link to file",
                    SystemName = "LinkToFile", // must match the key written in Build()
                    Source = "LinkToFile",
                    TypeName = "System.String",
                    Group = "PDF Specific",
                    Indexed = true,
                    Analyzed = false,
                    Stored = true
                });
            }
            return schemaExtenderFields;
        }

        /// <summary>
        /// Builds the PDF file index
        /// </summary>
        /// <param name="writer"></param>
        /// <param name="tracker"></param>
        public override void Build(IIndexWriter writer, Tracker tracker)
        {
            string directory = string.Empty;
            tracker.LogInformation("{0} building using {1}", GetType().FullName, writer.GetType().FullName);
            try
            {
                tracker.LogInformation("Opening index writer.");
                writer.Open(false);
                tracker.LogInformation("Opened index writer to overwrite index");

                //load builder settings
                if (Settings.ContainsKey("StartFolder"))
                    StartFolder = Settings["StartFolder"];
                if (Settings.ContainsKey("Domain"))
                    Domain = Settings["Domain"];
                tracker.LogInformation("StartFolder: '{0}'", StartFolder);
                tracker.LogInformation("Domain: '{0}'", Domain);

                if (Action.Equals("Full"))
                {
                    //process files
                    tracker.LogInformation("Starting processing files.");
                    directory = Core.SystemInformation.MapPath("/Files/") + "\\" + StartFolder.Trim(new char[] { '/', '\\' });
                    if (Directory.Exists(directory))
                    {
                        List<string> fileList = FileList(directory, tracker);
                        tracker.Status.TotalCount = fileList.Count();
                        foreach (string file in fileList)
                        {
                            try
                            {
                                FileInfo fileInfo = new FileInfo(file);
                                IndexDocument document = new IndexDocument();
                                document["FileName"] = fileInfo.Name;
                                document["FileFullName"] = fileInfo.FullName;
                                document["LinkToFile"] = LinkToFile(fileInfo.FullName);
                                document["Extension"] = fileInfo.Extension;
                                document["TextContent"] = GetPdfText(fileInfo.FullName, tracker);
                                document["DirectoryFullName"] = fileInfo.DirectoryName;
                                WriteDocument(writer, tracker, document, fileInfo.FullName);
                            }
                            catch (Exception ex)
                            {
                                tracker.LogInformation(string.Format("Failed getting file-info from '{0}'. Failed with exception: {1}", file, ex.Message));
                            }
                        }
                    }
                    tracker.LogInformation("--- Finished processing files ---");
                }
                else
                {
                    //check other actions and handle them
                }
            }
            catch (Exception ex)
            {
                tracker.Fail("Custom index builder experienced a fatal error: ", ex);
            }
        }

        private void WriteDocument(IIndexWriter writer, Tracker tracker, IndexDocument document, string filePath)
        {
            //allow extenders to process the index document
            foreach (var extender in Extenders)
            {
                extender.ExtendDocument(document);
            }

            //write to index
            writer.AddDocument(document);
            tracker.Status.Meta["CurrentFile"] = filePath;
            tracker.IncrementCounter();
        }

        private List<string> FileList(string dir, Tracker tracker)
        {
            // Prepare the final list of PDF files
            string[] files = Directory.GetFiles(dir, "*.pdf", SearchOption.AllDirectories);
            List<string> returnList = new List<string>();
            for (int i = 0; i < files.Length; i++)
            {
                try
                {
                    if (files[i].Length > 260)
                    {
                        tracker.LogInformation(string.Format("Length of full path to file exceeded 260 characters. File ignored: '{0}'", files[i]));
                    }
                    else
                    {
                        FileInfo fileInfo = new FileInfo(files[i]);
                        if (fileInfo != null)
                            returnList.Add(files[i]);
                    }
                }
                catch (Exception ex)
                {
                    tracker.LogInformation(string.Format("Preparing file list failed with the exception: '{0}'", ex.Message));
                }
            }
            return returnList;
        }

        private string GetPdfText(string InputFile, Tracker tracker)
        {
            string sOut = string.Empty;
            iTextSharp.text.pdf.PdfReader reader = null;
            try
            {
                reader = new iTextSharp.text.pdf.PdfReader(InputFile);
                // Page numbers are 1-based, so the last page is NumberOfPages itself
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy tes = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
                    sOut += iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, i, tes);
                }
            }
            catch (Exception ex)
            {
                tracker.LogInformation(string.Format("iTextSharp failed parsing PDF: '{0}'. Failed with exception: {1}", InputFile, ex.Message));
            }
            finally
            {
                // Release the file handle even when parsing fails
                if (reader != null)
                    reader.Close();
            }
            return sOut;
        }

        private string LinkToFile(string filePath)
        {
            try
            {
                // Without a domain no absolute link can be built
                if (string.IsNullOrEmpty(Domain))
                    return "";
                string file = filePath.Substring(filePath.IndexOf(@"\Files"));
                file = file.Replace(@"\", "/");
                string link = string.Format("https://{0}{1}", Domain, file);
                return link;
            }
            catch (Exception)
            {
                return "";
            }
        }
    }
}

Recap: the example above creates a custom FileIndexBuilder that extracts the text content of PDF files and builds an index with the content, making PDF content searchable.
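
The LinkToFile step in the example turns a physical file path into an absolute URL. As a minimal standalone sketch of that mapping, assuming the path contains a \Files segment and using the hypothetical names FileLinkSketch and ToLink, it could look like this:

```csharp
using System;

// Hypothetical standalone version of the path-to-link mapping.
public static class FileLinkSketch
{
    public static string ToLink(string domain, string fullPath)
    {
        // Without a domain or a path there is nothing to link to.
        if (string.IsNullOrEmpty(domain) || string.IsNullOrEmpty(fullPath))
        {
            return "";
        }

        // Keep everything from the \Files segment onwards.
        int start = fullPath.IndexOf(@"\Files", StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return "";
        }

        // Convert Windows path separators to URL separators.
        string relative = fullPath.Substring(start).Replace('\\', '/');
        return "https://" + domain + relative;
    }
}
```

For example, ToLink("your-domain.com", @"C:\Sites\Demo\Files\Documents\manual.pdf") yields https://your-domain.com/Files/Documents/manual.pdf. The sketch does not URL-encode spaces or special characters in file names, which a production version would likely need.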

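The using directives at the top of the sample show the namespaces the project relies on; add references to the corresponding Dynamicweb assemblies and to iTextSharp in your Visual Studio project.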

And a brief rundown:

  • In the SupportedActions property you can define the builder actions the builder can handle, e.g. Full or Update.
  • In the DefaultSettings property you can define builder settings with default values that your builder supports – the user will be able to change them in GUI.
  • In the GetFields() method you can define the list of fields that you want to be saved in the index. It usually uses an instance of a schema extender class that returns the list of fields.
  • In the Build() method you need to handle the actions and build your index data. In this example, the builder processes the PDF files under the start folder and saves them to the index.

Following the example, you can write your own index builder and index any other data you need. Once your custom IndexBuilder has been built and uploaded to the bin folder for the solution, it will be available alongside all the standard IndexBuilders when creating a new build, with the actions and settings you created.

Please note that a builder retrieves data from a source and also handles the build process – but depends on field mappings to know where in the index the data should be placed. This can be done by manually adding fields to the index definition – or through a schema extender, which is a predefined set of field mappings and storage instructions tailored to a particular IndexBuilder.

If you want to know how index settings are stored, you can go to the /Files/System/Repositories/[Your repository name] folder and look into *.index xml files with settings.

You may want to define your own schema extender to allow you to specify a default list of field mappings for a particular IndexBuilder.

The field object contains the following basic properties:

  • Name – field name that will be shown in UI
  • System Name and Source – name that will be stored in the index configuration/settings and in the index column name
  • TypeName – the .NET type name, for example: “System.String”, “System.Int32”, “System.Int64” or “System.String[]” if you want to store array of values in one field.

The following storage instructions can also be enabled:

  • Analyzed – the field is run through an analyzer, and tokens emitted are indexed. This only makes sense as long as the field is also indexed.
  • Indexed – the field is made searchable, and stored as a single value. This is appropriate for keyword or single-word fields, and for multi-word fields you want to retrieve and display as single values (e.g. for facets).
  • Stored – the field has its value stored as-is in the index. This does not affect indexing or searching – it simply controls whether you want the index to act as a data store for the value. Since most of your data will be in the Dynamicweb database, you usually don’t need to store a field in the index.
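
The difference between an analyzed field and a field indexed as a single value can be illustrated with a deliberately simplified sketch. The tokenizer below only lowercases and splits on non-alphanumeric characters; it is a stand-in for illustration and not the analyzer Lucene actually applies. AnalysisSketch and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of which terms end up in the index.
public static class AnalysisSketch
{
    // Indexed but not analyzed: the whole value becomes one term.
    public static IReadOnlyList<string> IndexAsSingleTerm(string value)
    {
        return new[] { value };
    }

    // Indexed and analyzed: the value is broken into several terms.
    // Simplified stand-in: lowercase, then split on non-alphanumerics.
    public static IReadOnlyList<string> IndexAnalyzed(string value)
    {
        char[] normalized = value
            .Select(c => char.IsLetterOrDigit(c) ? char.ToLowerInvariant(c) : ' ')
            .ToArray();
        return new string(normalized)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With the value "Mountain Bike, Red", IndexAsSingleTerm keeps one term ("Mountain Bike, Red"), which suits facets and exact matches, while IndexAnalyzed produces the three terms mountain, bike, and red, which suits free-text search.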

To create your own Schema extender you need to implement the IIndexSchemaExtender interface – this example returns two fields, matching the example of the custom IndexBuilder from the previous section:

using System.Collections.Generic;
using Dynamicweb.Indexing.Schemas;

namespace Dynamicweb.Indexing
{
    public class CustomIndexBuilderSchemaExtender : IIndexSchemaExtender
    {
        public IEnumerable<FieldDefinitionBase> GetFields()
        {
            List<FieldDefinitionBase> fields = new List<FieldDefinitionBase>();
            fields.Add(new FieldDefinition
            {
                Name = "File full name",
                SystemName = "FileFullName",
                Source = "FileFullName",
                TypeName = "System.String",
                Analyzed = false,
                Indexed = true,
                Stored = true
            });
            fields.Add(new FieldDefinition
            {
                Name = "Extension",
                SystemName = "Extension",
                Source = "Extension",
                TypeName = "System.String",
                Analyzed = false,
                Indexed = true,
                Stored = true
            });
            return fields;
        }
    }
}

After compiling this code together with the custom index builder class, you will be able to select the schema extender in the default manner (when adding field mappings to an index definition). Save the index to see the list of schema extender fields in the Schema Extender Fields section.

Please note that you need to (re)build the index before you will be able to query the index fields.

For debugging purposes you may find the Query publisher app useful, or any other external tool that can open your index files, e.g. the Luke All tool (provided that you are using the default LuceneIndexProvider). The index files are usually located in the /Files/System/Indexes/[Your repository name]/[Your index name]/[Your index instance name] folder.

A query is simply a request for data submitted to an index – e.g. ‘Return all active products with a price below 100’.

Queries are created by stringing together a number of expressions (Figure 9.1) which limit the data you receive from a query (an empty query returns all data in the index).

Figure 9.1 Query expressions

An expression is not particularly complicated – it consists of a field in the index, an operator, and a test value.

Test values can be either constants, parameter values, macros, term values, or dynamic values from a code provider.
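
To make the structure of an expression concrete, the sketch below models a query as a list of field/operator/test value triples and applies them to in-memory documents. It only illustrates the concept: QueryExpression, ComparisonOperator, and QuerySketch are hypothetical names, dictionaries stand in for index documents, and the real query engine works against the Lucene index rather than in-memory collections.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ComparisonOperator { Equal, LessThan }

// Hypothetical model of one expression: field, operator, test value.
public sealed class QueryExpression
{
    public string Field { get; }
    public ComparisonOperator Op { get; }
    public IComparable TestValue { get; }

    public QueryExpression(string field, ComparisonOperator op, IComparable testValue)
    {
        Field = field;
        Op = op;
        TestValue = testValue;
    }

    // The field value and the test value must have the same type.
    public bool Matches(IDictionary<string, IComparable> document)
    {
        if (!document.TryGetValue(Field, out IComparable actual))
        {
            return false;
        }
        int comparison = actual.CompareTo(TestValue);
        return Op == ComparisonOperator.Equal ? comparison == 0 : comparison < 0;
    }
}

public static class QuerySketch
{
    // Every expression narrows the result; no expressions returns everything.
    public static List<IDictionary<string, IComparable>> Run(
        IEnumerable<IDictionary<string, IComparable>> index,
        params QueryExpression[] expressions)
    {
        return index.Where(doc => expressions.All(e => e.Matches(doc))).ToList();
    }
}
```

The request ‘Return all active products with a price below 100’ then corresponds to two expressions: ("Active", Equal, true) and ("Price", LessThan, 100.0).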

You can extend the default functionality by:

  • Creating custom macros
  • Creating custom code providers
  • Creating custom value mappers

A macro is a dynamic value retrieved from the context, e.g. a PageID, WebsiteID, UserID, CartID, etc.

To create a custom macro, you need to inherit from the base class Dynamicweb.Extensibility.Macros.Macro in the Dynamicweb.Extensibility NuGet package.

After that you will need to implement the following abstract members:

  • Name – this is your macro name, which will appear in the macro list when selecting a test value
  • SupportedActions – the list of actions your macro supports
  • object Evaluate(string action) – the method that handles the actions and returns their results

See this sample custom Favorites Macro:

using Dynamicweb.Core;
using Dynamicweb.Data;
using Dynamicweb.Extensibility.Macros;
using Dynamicweb.Logging;
using System;
using System.Collections.Generic;
using System.Data;

namespace Dynamicweb.Indexing.Examples
{
    public class FavoritesMacro : Macro
    {
        public override object Evaluate(string action)
        {
            try
            {
                if (action == "FavoritesByRequestUserId")
                {
                    return GetFavoritesByUserId();
                }
            }
            catch (Exception ex)
            {
                ILogger logger = LogManager.Current.GetLogger("FavoritesMacro");
                logger?.Error("FavoritesMacro error", ex);
            }
            return null;
        }

        private static object GetFavoritesByUserId()
        {
            int userid = Converter.ToInt32(Context.Current.Request["UserID"]);
            if (userid > 0)
            {
                List<string> ids = new List<string>();
                string sql = string.Format("SELECT favoritesProducts.ProductId FROM AccessUser users JOIN EcomCustomerFavoriteLists favoritesList ON users.AccessUserId = favoritesList.AccessUserId JOIN EcomCustomerFavoriteProducts favoritesProducts ON favoritesList.Id = favoritesProducts.FavoriteListId WHERE users.AccessUserId = {0}", userid);
                using (IDataReader reader = Database.CreateDataReader(sql))
                {
                    while (reader.Read())
                    {
                        ids.Add(reader.GetString(0));
                    }
                }
                return ids.ToArray();
            }
            return null;
        }

        public override string Name
        {
            get { return "Custom.Ecommerce.Favorites"; }
        }

        public override IEnumerable<string> SupportedActions
        {
            get { return new List<string>() { "FavoritesByRequestUserId" }; }
        }
    }
}

This macro returns the IDs of the user’s favorite products.

If you compile the sample code and add the library to the bin folder you will be able to select the custom macro as a test value in an expression (Figure 10.2).

Figure 10.2 Using a custom macro

Like macros, a code provider allows you to dynamically construct a test value for an expression. However, a code provider allows you to define extra parameters and use them to calculate the test value whenever a query is being executed.

To create a custom code provider you need to create a class that inherits from the Dynamicweb.Extensibility.CodeProviderBase class. Here is sample code based on the already implemented “DateTime” code provider:

using System;
using System.Collections;
using Dynamicweb.Extensibility.AddIns;
using Dynamicweb.Extensibility.Editors;
using Dynamicweb.SystemTools;
using Dynamicweb.Extensibility;

namespace Extensibility.CodeProviders
{
    [AddInName("Dynamicweb.ICodeProvider"), AddInLabel("Custom Code Provider"), AddInActive(true), AddInGroup("System.DateTime")]
    public class CustomCodeProvider : CodeProviderBase, IDropDownOptions
    {
        [AddInParameter("Number"), AddInParameterEditor(typeof(IntegerNumberParameterEditor), "inputClass=inputControl")]
        public int Number { get; set; }

        [AddInParameter("Interval"), AddInParameterEditor(typeof(DropDownParameterEditor), "inputClass=inputControl")]
        public string Interval { get; set; }

        public override string BuildCodeString()
        {
            return string.Format("@Code(DateTime.Now.Add{0}({1}))", Interval, Number);
        }

        public override string BuildDisplayValue()
        {
            return string.Format("Today {0} {1} {2}", Number >= 0 ? "+" : "-", Math.Abs(Number), Interval);
        }

        public Hashtable GetOptions(string dropdownName)
        {
            Hashtable options = new Hashtable();
            options.Add("Minutes", Translate.Translate("Minutes"));
            options.Add("Hours", Translate.Translate("Hours"));
            options.Add("Days", Translate.Translate("Days"));
            options.Add("Months", Translate.Translate("Months"));
            options.Add("Years", Translate.Translate("Years"));
            return options;
        }
    }
}

This code provider evaluates the expression based on the selected parameters Interval and Number (Figure 11.2).

Figure 11.2 A custom code provider

The BuildCodeString method builds the code string that is evaluated when the query is executed. With Interval set to Hours and Number set to 2, it produces @Code(DateTime.Now.AddHours(2)), which evaluates to a DateTime object.

In the BuildDisplayValue method, the display value for the UI is calculated – in this case, the display value is Today + 2 hours (Figure 11.3).

Figure 11.3 A code provider generating a test value
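
To see what the two methods produce for given parameter values, the string formatting can be reproduced outside Dynamicweb. The sketch below uses the same format strings as the sample, with the hypothetical class name CodeStringSketch; formatting with the invariant culture is an added precaution here (an assumption, not taken from the sample) so that negative numbers are always written with a plain hyphen-minus.

```csharp
using System;
using System.Globalization;

// Hypothetical standalone copy of the two format strings from the sample.
public static class CodeStringSketch
{
    // Builds the code string that is evaluated at query time.
    public static string BuildCodeString(string interval, int number)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "@Code(DateTime.Now.Add{0}({1}))", interval, number);
    }

    // Builds the human-readable label shown for the test value.
    public static string BuildDisplayValue(string interval, int number)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "Today {0} {1} {2}", number >= 0 ? "+" : "-", Math.Abs(number), interval);
    }
}
```

BuildCodeString("Hours", 2) returns @Code(DateTime.Now.AddHours(2)) and BuildDisplayValue("Hours", 2) returns Today + 2 Hours; with "Days" and -3 the results are @Code(DateTime.Now.AddDays(-3)) and Today - 3 Days.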

A value mapper allows you to map a list of terms from your already built index to the appropriate object.
For example, the DateTimeValueMapper converts the index terms to a date time object, the GroupIDsValueMapper gets the available Ecommerce product groups by terms values and returns the list of group id - group name pairs, and it’s the same for the LanguageIDValueMapper, the ManufacturerIDValueMapper, and the VariantGroupValueMapper.

A value mapper must inherit from the abstract ValueMapperBase class – here is an example of a custom ProductType value mapper:

using System;
using System.Collections.Generic;
using Dynamicweb.Extensibility.AddIns;
using Dynamicweb.Indexing.Querying;
using Dynamicweb.Ecommerce.Products;

[AddInName("ProductType"), AddInGroup("Dynamicweb.Ecommerce.Indexing.ProductIndexBuilder, Dynamicweb.Ecommerce"), AddInName("Type")]
public class CustomProductTypeValueMapper : ValueMapperBase
{
    public override IEnumerable<FieldValueMapping> MapValues(IEnumerable<string> terms)
    {
        List<FieldValueMapping> mappings = new List<FieldValueMapping>();
        foreach (string term in terms)
        {
            ProductType type;
            if (Enum.TryParse(term, out type))
            {
                mappings.Add(new FieldValueMapping { Key = term, Value = type.ToString() });
            }
        }
        return mappings;
    }
}

This mapper converts the term values to the “Dynamicweb.Ecommerce.Products.ProductType” enum.
In the AddInGroup attribute you need to define the type name of the index builder which the mapper will be used for.
In the AddInName attribute you must specify the column names for which this mapper will be applied.

After building this mapper and uploading the library to the bin folder, you will be able to view the mapper results in the UI when using the term selector on an expression querying the product Type column (Figure 12.2).

Figure 12.2 A custom value mapper in action
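
One detail of the term-to-enum mapping is worth knowing: Enum.TryParse accepts both member names and numeric strings, and it succeeds for numbers that do not correspond to any enum member. The standalone sketch below shows the mapping with an added Enum.IsDefined check. SampleProductType is a hypothetical stand-in enum, not the real Dynamicweb.Ecommerce.Products.ProductType, and key/value pairs stand in for FieldValueMapping.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real product type enum.
public enum SampleProductType { Stock = 0, Service = 1, GiftCard = 2 }

public static class TermMappingSketch
{
    // Maps index terms to display names; unknown terms are skipped.
    public static List<KeyValuePair<string, string>> MapTerms(IEnumerable<string> terms)
    {
        var mappings = new List<KeyValuePair<string, string>>();
        foreach (string term in terms)
        {
            // TryParse accepts names and numeric strings, including
            // numbers with no matching member, so IsDefined filters those.
            if (Enum.TryParse(term, out SampleProductType type)
                && Enum.IsDefined(typeof(SampleProductType), type))
            {
                mappings.Add(new KeyValuePair<string, string>(term, type.ToString()));
            }
        }
        return mappings;
    }
}
```

Given the terms "1", "GiftCard", "7", and "bogus", the sketch returns two mappings: "1" to Service and "GiftCard" to GiftCard. Without the IsDefined check, "7" would also be mapped, to the display value 7.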

In this tutorial, you’ve learned how to:

  • Extend an existing IndexBuilder with custom data using an IndexBuilderExtender
  • Create an IndexBuilder from scratch
  • Create a custom SchemaExtender for automatically mapping incoming data to fields in the index
  • Extend queries with custom macros, code providers, and value mappers

In the next tutorial you will learn about extending the integration area by creating integration providers, using table script, and creating a custom scheduled task add-in.