Indexes

An index is a data structure optimized for data retrieval operations – which means that querying it is much faster than searching through each row in the database whenever a table is accessed.
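
As a rough illustration of why this is faster, the following Python sketch contrasts scanning every row with looking a term up in a prebuilt mapping. The rows, field names, and helper functions are invented for the example and say nothing about how Lucene stores data internally.

```python
from collections import defaultdict

# Hypothetical sample rows; not a real Dynamicweb table.
ROWS = [
    {"id": 1, "name": "red mountain bike"},
    {"id": 2, "name": "blue city bike"},
    {"id": 3, "name": "red helmet"},
]


def scan(rows, term):
    """Check every row on every query (cost grows with the table size)."""
    return [row["id"] for row in rows if term in row["name"].split()]


def build_index(rows):
    """Build a term -> row ids mapping once, ahead of any query."""
    index = defaultdict(list)
    for row in rows:
        for token in row["name"].split():
            index[token].append(row["id"])
    return index


def lookup(index, term):
    """Answer a query with a single dictionary lookup."""
    return index.get(term, [])


INDEX = build_index(ROWS)
```

Both approaches return the same row IDs, but the lookup does not need to touch rows that do not contain the term.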

To create an index:

Click Add index in the repository toolbar (Figure 1.1)
Name it
Click OK

Figure 1.1 Creating an index

This will open the index configuration page (Figure 1.2) from which you can add and configure the various index components.

Figure 1.2 The index configuration page

An index consists of the following components:

Instances – the data structures which are queried
Build configurations – a set of instructions for retrieving data from Dynamicweb and building an instance
Field definitions – a set of instructions detailing what goes in the index and how it should be stored
(Optional) Field types – custom field types can be used when you need to analyze data in a non-standard manner.

Read about the components below.

Instances

Instances are used to specify a location and an IndexProvider, which will create the index files at the location.

By default, Dynamicweb uses Lucene 3.0.3 and comes with a LuceneIndexProvider which builds index files in a folder in the file archive, e.g. /Files/System/Indexes/YourIndexName/YourInstanceName.

To create an instance:

Click Add instance on the index configuration page to open the Instance configuration (Figure 2.1)
Provide a name
Select a provider – this step is easy, as we supply only one provider out of the box
Specify a folder
Click OK

Figure 2.1 Creating an instance

Once created, an instance will look like the Lucene B index in Figure 2.2 – this is because it has not been built yet. Once it has been built it will look like the Lucene A index.

Figure 2.2 Instance configuration page

An instance only contains the data which existed at the time it was built. This means that your instances must be periodically rebuilt to include new data – and that you should always have more than one instance defined, since an instance cannot be queried while it is being built.

You can create scheduled tasks for automatically rebuilding your instances at an interval – see details below – and these tasks will build the instances sequentially, and will not rebuild the next instance if the previous instance failed to build correctly. This means that you will never be without an instance to query – even if something goes wrong during the build process.
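
The sequential behavior can be sketched roughly as follows. This is an illustration in Python, not the scheduled task's actual implementation; the function name and the `build` callable are hypothetical.

```python
def rebuild_sequentially(instances, build):
    """Build instances one at a time and stop at the first failure.

    `instances` is an ordered list of instance names and `build` is a
    callable returning True on success. Both are hypothetical stand-ins.
    """
    rebuilt = []
    for position, name in enumerate(instances):
        if not build(name):
            # Leave the remaining instances untouched so they stay queryable.
            return {
                "rebuilt": rebuilt,
                "failed": name,
                "untouched": instances[position + 1:],
            }
        rebuilt.append(name)
    return {"rebuilt": rebuilt, "failed": None, "untouched": []}


# Example: instance A fails, so instance B is never taken offline.
RESULT = rebuild_sequentially(["A", "B"], lambda name: name != "A")
```

With two instances, a failed build of A leaves B in the state of its previous successful build.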

Balancing instances

If you have more than one instance defined, you can choose between two different methods for selecting an alternative instance when the primary instance is being rebuilt:

ActivePassive mode selects the next active instance available on the list. So if instance A is unavailable (it is being built, or has failed to build), instance B will be used unless it is unavailable, in which case instance C will be used, and so forth.
LastUpdated mode selects the most recently built instance and uses that.
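
The two selection strategies can be sketched as follows. This is a minimal Python illustration, not the balancer's real code: the `Instance` class and both function names are hypothetical, and skipping unavailable instances in the LastUpdated case is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Instance:
    name: str
    available: bool                 # False while building or after a failed build
    last_built: Optional[datetime]  # None if the instance was never built


def active_passive(instances: List[Instance]) -> Optional[Instance]:
    """Return the first available instance in list order."""
    for instance in instances:
        if instance.available:
            return instance
    return None


def last_updated(instances: List[Instance]) -> Optional[Instance]:
    """Return the available instance with the most recent build time."""
    candidates = [i for i in instances if i.available and i.last_built is not None]
    return max(candidates, key=lambda i: i.last_built, default=None)


# Example: A is being rebuilt, B is older than C.
SAMPLE = [
    Instance("A", False, datetime(2024, 1, 3)),
    Instance("B", True, datetime(2024, 1, 1)),
    Instance("C", True, datetime(2024, 1, 2)),
]
```

For the sample list, the first strategy picks B (next in line) while the second picks C (most recently built).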

ActivePassive mode is used by default – to change to LastUpdated mode:

Click Balancer in the ribbon bar (Figure 3.1)
Use the dropdown to select the LastUpdated balancer
Click OK

Figure 3.1 Choosing a balancing mode

For solutions receiving heavy traffic and frequent product data updates we recommend using LastUpdated to ensure that visitors are always shown the most recently updated data.

Builds

A build is a set of instructions for retrieving data from Dynamicweb and delivering it to the IndexProvider on an instance, which will then build the physical index files.

To create a build configuration:

Click Add build on the index configuration page to open the build configuration dialog
Provide a name
Select a builder
(Optional) Configure the settings exposed by the builder
(Optional) Set up notifications on run or failure
Click OK

See below for a detailed look at the builders provided by Dynamicweb.

ProductIndexBuilder

The ProductIndexBuilder is used to index products. It indexes data from multiple Ecommerce data tables, calculating group hierarchies and more – so the index contains all product fields, variant group fields, custom fields, category fields, stock location fields and a number of generated fields.

Figure 5.1 The ProductIndexBuilder

The ProductIndexBuilder supports the following builder actions:

Full builds everything from scratch
Update rebuilds only the products which have been edited within the number of hours given by the HoursToUpdate setting, counted back from the current time
UpdateWithIds is a mode used by auto-builds from e.g. Dynamicweb PIM. Although this action cannot be called manually with a list of IDs, we recommend creating a separate builder action called AutoBuild to avoid conflicts between regular Update builds and auto-builds.
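
To make the Update time window concrete, here is a small Python sketch. The product dictionaries and the function name are invented for the example; the real builder reads its data from the Ecommerce tables.

```python
from datetime import datetime, timedelta


def edited_recently(products, hours_to_update, now):
    """Return IDs of products edited within the last `hours_to_update` hours."""
    cutoff = now - timedelta(hours=hours_to_update)
    return [p["id"] for p in products if p["updated"] >= cutoff]


# Hypothetical sample data.
NOW = datetime(2024, 6, 1, 12, 0)
PRODUCTS = [
    {"id": "P1", "updated": datetime(2024, 6, 1, 9, 0)},
    {"id": "P2", "updated": datetime(2024, 5, 30, 9, 0)},
]
```

With a 24 hour window only P1 qualifies; widening the window to 72 hours also includes P2.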

The following settings can be configured:

Setting	Value	Comments
BulkSize	Integer – default is 500	The number of products being built at a time
DoNotAnalyzeDefaultFields	Boolean – defaults to False	If True, schema extender fields are not set to analyzed by default
DoNotStoreDefaultFields	Boolean – defaults to False	If True, schema extender fields are not set to stored by default
DoNotFailOnMismatchingProductCount	Boolean - defaults to False	If true, building an index will not fail even if the product count before indexing and after indexing is different. This may be desirable if an import job happens while the index is being built.
EmptyStringReplacement	String – default is an empty string	NULL values are not indexed by Lucene, so to be able to locate an empty field you need to index it with a dummy value – this dummy value can be specified here.
HandleInheritedCategoryValues	Boolean - defaults to False	If True, inherited product category values are indexed. This is very slow, so please don't set this to true unless you really need to.
HoursToUpdate	An integer – not set by default	If combined with the builder action Update, only the products updated within the hours specified here are rebuilt
MaxProductsToIndex	Integer – default is 2147483647	The maximum number of products to index
OnlyIndexActiveProducts	Boolean – defaults to False	If set to True, only active products are indexed
ShopsToIndex	Comma-separated list of shop IDs	This setting makes it possible to create indexes which only contain products from the specified shops/warehouses.
SkipAllExtendedFields	Boolean – defaults to False	If set to True, the fields "CampaignStartTime", "CampaignEndTime", "CampaignShowProductsAfterExpiration", "IsVariant", "ManufacturerName", "AssortmentIDs", and "StockLocationProductAvailable" are skipped
SkipCategoryFields	Boolean – defaults to False	If set to True, all product category fields are skipped
SkipDetailImages	Boolean – defaults to False	If set to True, detail images are not indexed
SkipExtenders	Boolean – defaults to False	If set to True, no custom Extenders can extend (update, remove, add) the fields in the index
SkipGrouping	Boolean – defaults to False	If set to True, the fields "GroupIDs", "ShopIDs", "GroupNames", "GroupNumbers", "GroupDescriptions", "PrimaryGroupSort", "ParentGroupIDs", and "ParentGroupNames" are skipped
SkipGroupSorting	Boolean – defaults to false	If set to true, group sorting fields are not indexed – this may improve performance.
SkipImages	Boolean - defaults to False	If true, image paths are not indexed
SkipImagePatternImages	Boolean – defaults to false	If true, image pattern images are not indexed
SkipPrices	Boolean – defaults to false	If true, product prices are not indexed
SkipRelatedProducts	Boolean – defaults to false	If true, related products are not indexed
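
Two of these settings, BulkSize and EmptyStringReplacement, describe behavior that is easy to picture in code. The Python sketch below is only an illustration under assumed data shapes; the function names and the `__empty__` placeholder are hypothetical.

```python
def in_batches(items, bulk_size=500):
    """Yield consecutive slices of at most `bulk_size` items."""
    for start in range(0, len(items), bulk_size):
        yield items[start:start + bulk_size]


def replace_empty(document, replacement):
    """Swap None and empty strings for a searchable placeholder value."""
    if not replacement:
        # No replacement configured: leave the values as they are.
        return dict(document)
    return {
        key: (replacement if value is None or value == "" else value)
        for key, value in document.items()
    }


BATCH_SIZES = [len(batch) for batch in in_batches(list(range(1200)))]
CLEANED = replace_empty({"Color": None, "Size": "", "Name": "Bike"}, "__empty__")
```

A query can then search for the placeholder value to find products where the field is empty.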

ContentIndexBuilder

The ContentIndexBuilder is used for indexing content – pages, their paragraphs, and their item fields.

The index is built by enumerating all available pages, then handling active paragraphs & item fields for each page.

The corresponding schema extender – the ContentIndexSchemaExtender – contains the following types of fields:

All fields from the Page table – e.g. PageActive, PageID, PageItemType, etc.
A number of Page content fields:
- Paragraph headers contains an array of all paragraph headers on a page
- Paragraph texts contains an array of all paragraph text content on a page
- Paragraph content contains an array of the item type properties for each item-based paragraph on a page
- Page property item type contains the name of the item type used to extend the page properties of this page (if relevant)
All item type fields in the format [item.SystemName]_[itemField.SystemName] and Property_[item.SystemName]_[itemField.SystemName], except the fields marked as 'do not include in search' in the item field settings.
Possibly a number of App fields – see more below.
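
The item field naming format can be sketched as a small helper. This Python snippet is illustrative only: the function name and the sample system names are invented, and treating the Property_ prefix as a simple flag is an assumption.

```python
def item_field_name(item_system_name, field_system_name, property_prefix=False):
    """Build an index field name from an item type and one of its fields."""
    name = f"{item_system_name}_{field_system_name}"
    return f"Property_{name}" if property_prefix else name


# Hypothetical item type "BlogPost" with a field "Teaser".
PLAIN = item_field_name("BlogPost", "Teaser")
PREFIXED = item_field_name("BlogPost", "Teaser", property_prefix=True)
```

Knowing the format makes it easier to find a specific item field in the list of fields on the index.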

Figure 6.1 The ContentIndexBuilder

The following settings are available:

ExcludeItemsFromIndex allows you to control whether or not item-based content should be indexed. False by default – which means item content IS indexed.
AppsToHandle allows you to specify exactly which ContentAppIndexProviders to include. Valid input is a comma-separated list of ContentAppIndexProviders to include. If nothing is set here, all ContentAppIndexProviders are included.
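
The AppsToHandle rule (an empty value means all providers) can be sketched like this. The Python function is hypothetical, as is the provider name MyAppIndexProvider; whether the setting expects short or fully qualified class names is not covered here.

```python
def providers_to_run(available, apps_to_handle=""):
    """Filter providers by a comma-separated setting; empty means all."""
    wanted = [name.strip() for name in apps_to_handle.split(",") if name.strip()]
    if not wanted:
        return list(available)
    return [name for name in available if name in wanted]


AVAILABLE = ["ContentForumIndexProvider", "MyAppIndexProvider"]
```

Surrounding whitespace in the list is ignored in this sketch, so "A, B" and "A,B" select the same providers.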

By default, we deliver a ContentAppIndexProvider for the forum – see the API documentation for the ContentForumIndexProvider.

Due to complexity issues the ItemListEditor field type is not indexed.

Creating a custom ContentAppIndexProvider

If you want to extend the content index with app-specific fields or documents, you must create a class inheriting from the ContentAppIndexProvider Class and override the relevant methods.

For an example, take a look at the ContentForumIndexProvider class.

SqlIndexBuilder

The SqlIndexBuilder is used to index a table from the SQL Server database – it executes a query without manipulating any data. It currently only supports the builder action Execute.

Figure 7.1 The SqlIndexBuilder

The following settings are available:

Connection String can contain an SQL connection string, e.g. “Server=.;Database=test;User Id=sa;Password=sa;”
Query can contain an SQL query which retrieves the columns and rows which should be indexed, e.g. “SELECT * FROM AccessUser”
Query to get count can contain an SQL query which returns a count of the rows being added to the index, e.g. “SELECT COUNT(*) FROM AccessUser”
UseStoredProcedure – a setting which can be set in the index XML config file. When set to True, the “Query” setting must contain the name of a stored procedure, which is then executed when the builder runs to index the data.
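
How the Query and Query to get count settings relate can be sketched in Python. An in-memory SQLite database stands in for SQL Server so the snippet runs anywhere, and the two columns are assumed for the example; this is not the builder's own code.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server in this sketch.
connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row
connection.execute(
    "CREATE TABLE AccessUser (AccessUserID INTEGER, AccessUserName TEXT)"
)
connection.executemany(
    "INSERT INTO AccessUser VALUES (?, ?)",
    [(1, "anna"), (2, "bo")],
)

QUERY = "SELECT * FROM AccessUser"
COUNT_QUERY = "SELECT COUNT(*) FROM AccessUser"

# The count query reports how many rows the data query should deliver.
expected = connection.execute(COUNT_QUERY).fetchone()[0]

# Each row becomes one document, with column names as field names.
documents = [dict(row) for row in connection.execute(QUERY)]
counts_match = len(documents) == expected
```

If the two queries use different WHERE clauses the counts will drift apart, so it is worth keeping them in sync.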

UserIndexBuilder

The UserIndexBuilder indexes all fields on users, including custom fields (but not user behavior, like orders placed, or order value, etc.).

It executes the following query to retrieve users:

"SELECT * FROM AccessUser WHERE AccessUserType in (1, 3, 5)"

As of Dynamicweb 9.7, the UserIndexBuilder supports the following builder actions (Figure 8.2):

Full builds everything from scratch
Update rebuilds only the users which have been edited within the number of hours given by the HoursToUpdate setting, counted back from the current time
UpdateWithIds updates a set of users passed to the builder – this action is used by the system when a user is saved or user impersonation settings are changed

Figure 8.2 The UserIndexBuilder

In addition to the standard user fields, the index contains the following generated fields for each user:

Field	Field content
Groups	An array of user group IDs where the user is a member
GroupNames	An array of user group names where the user is a member
Is Admin	True if System Administrator or Administrator
Combined order totals	The sum of Order Price with VAT from orders completed by this user
Largest order price	Largest Order Price with VAT entry associated with this user
Order count for last 30 days	A count of completed orders associated with this user within the last 30 days
Bought products	An array of product IDs from orders completed by this user
Loyalty points total	The sum of LoyaltyUserTransactionPoints from EcomLoyaltyUserTransaction associated with this user
Loyalty point last added	A DateTime entry of the last time loyalty points were added to the user
Loyalty point next expirery	The date of the user's oldest loyalty point transaction plus the period set in the global setting /Globalsettings/Ecom/LoyaltyPoints/ExpirationPeriodInMonths
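
The order-based fields in the table can be pictured as simple aggregates. The Python sketch below assumes an invented order shape (`price_with_vat`, `completed`, `product_ids`) and assumes that bought product IDs are deduplicated; neither is confirmed by this document.

```python
from datetime import datetime, timedelta


def order_fields(orders, now):
    """Derive order-based user fields from a list of completed orders."""
    recent_cutoff = now - timedelta(days=30)
    prices = [order["price_with_vat"] for order in orders]
    return {
        "combined_order_totals": sum(prices),
        "largest_order_price": max(prices, default=0),
        "order_count_last_30_days": sum(
            1 for order in orders if order["completed"] >= recent_cutoff
        ),
        "bought_products": sorted(
            {pid for order in orders for pid in order["product_ids"]}
        ),
    }


# Hypothetical completed orders for one user.
ORDERS = [
    {"price_with_vat": 100.0, "completed": datetime(2024, 5, 20), "product_ids": ["P1", "P2"]},
    {"price_with_vat": 250.0, "completed": datetime(2024, 1, 5), "product_ids": ["P2"]},
]
FIELDS = order_fields(ORDERS, now=datetime(2024, 6, 1))
```

Because the values are calculated at build time, they are only as fresh as the most recent build of the instance.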

FileIndexBuilder

The FileIndexBuilder (Figure 9.1) indexes various data about the files in the file system – NOT the content of the files. This can be used to create e.g. a searchable media library for images, pdf files, etc.

The following standard data is indexed:

File name
Directory path (/Files/whatever/Folder/OtherFolder/)
Directory (OtherFolder)
ParentDirectory (Folder)
RootDirectory (Files)
Extension (e.g. jpg, png, txt)
Filesize in bytes
LastWriteTime
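
The relationship between the directory fields can be shown with a short Python sketch. The dictionary keys are labels chosen for the example and may not match the exact field names in the index.

```python
from pathlib import PurePosixPath


def path_fields(file_path):
    """Split a file path into the directory-related values listed above."""
    path = PurePosixPath(file_path)
    # Folder names from the root down, without the leading "/" entry.
    folders = [part for part in path.parent.parts if part != "/"]
    return {
        "FileName": path.name,
        "DirectoryPath": str(path.parent).rstrip("/") + "/",
        "Directory": folders[-1] if folders else "",
        "ParentDirectory": folders[-2] if len(folders) > 1 else "",
        "RootDirectory": folders[0] if folders else "",
        "Extension": path.suffix.lstrip("."),
    }


SAMPLE = path_fields("/Files/whatever/Folder/OtherFolder/photo.jpg")
```

For the sample path this yields OtherFolder, Folder, and Files for the three directory fields, matching the examples in the list.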

The following fields are generated:

FileFullName - file path and name
Date created time/Date created time UTC
Last access time/Last access time UTC
Last write time UTC
Is read only

We also index metadata (EXIF, XMP, and IPTC) for certain file types, mainly images.

Currently, we can index metadata for the following file formats: .pdf, .gif, .jpg, .jpeg, .psd, .bmp, .png, .tiff, .tif, and .ai.

Figure 9.1 The FileIndexBuilder

The following settings can be used to tweak the builder behavior:

Recursive can contain a Boolean value, and controls whether subfolder content is indexed. Defaults to True.
StartFolder contains the path to a folder, defaults to /Files.
SkipMetadata contains a Boolean value, and controls whether metadata (EXIF, XMP, and IPTC) on image files is indexed.
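
The effect of StartFolder and Recursive can be sketched with a folder walk in Python. The function and the temporary folder layout are made up for the example and do not reflect how the builder itself enumerates files.

```python
import os
import tempfile


def list_files(start_folder, recursive=True):
    """Return file paths under `start_folder`, optionally with subfolders."""
    found = []
    for folder, subfolders, files in os.walk(start_folder):
        found.extend(os.path.join(folder, name) for name in files)
        if not recursive:
            subfolders.clear()  # stops os.walk from descending further
    return sorted(found)


# Build a tiny throwaway folder tree to try both settings.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "Images"))
    open(os.path.join(root, "a.txt"), "w").close()
    open(os.path.join(root, "Images", "b.jpg"), "w").close()
    all_files = [os.path.relpath(p, root) for p in list_files(root)]
    top_level = [os.path.relpath(p, root) for p in list_files(root, recursive=False)]
```

With the recursive flag off, only files directly inside the start folder are returned.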

Schema Extender fields

Fields are mappings between the data retrieved by the builder and the index – a set of instructions detailing which fields should be added to the index and how they should be stored.

To make things easier for you, we’ve created schema extenders for products, content and users – these are predefined sets of field mappings with everything defined for you.

To use a schema extender:

Click Add field on the index configuration page to open the field configuration overlay (Figure 10.1)
Select the schema extender field type
Select the appropriate schema extender: FileIndexSchemaExtender, ContentIndexSchemaExtender, ProductIndexSchemaExtender, or UserIndexSchemaExtender
Click OK
Save

Figure 10.1 Using the Schema Extender

Once you’ve saved the index, you will see a list of fields provided by the schema extender in question, e.g. the fields provided by the ProductIndexSchemaExtender in Figure 10.2.

Figure 10.2 The ProductIndexSchemaExtender

Now, the schema extender naturally makes some choices on your behalf – that’s the tradeoff with a predefined set.

Here are some headlines:

All string type fields are analyzed by default, which means that spaces are considered a divider (which in turn makes it possible to conduct free-text searches on the data).
The fields cannot be assigned a custom boost value.
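
What "analyzed" means in practice can be sketched in Python. The splitting below is a deliberately crude stand-in for a real analyzer (Lucene analyzers do more than split on spaces), and all function names are hypothetical.

```python
def analyze(value):
    """Crude stand-in for an analyzer: split the value on whitespace."""
    return value.split()


def index_terms(value, analyzed=True):
    """Return the terms that would be searchable for this field value."""
    return analyze(value) if analyzed else [value]


def matches(terms, query):
    """A query term matches only if it equals one of the stored terms."""
    return query in terms


ANALYZED = index_terms("Red Mountain Bike")
NOT_ANALYZED = index_terms("Red Mountain Bike", analyzed=False)
```

An analyzed field can be found by any single word in it, while a field that is not analyzed only matches the complete original value. This is why fields such as IDs are often better left not analyzed.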

If this behavior is a problem for your setup – which it often will be – you can exclude fields from the schema extender, and then add them manually and with the settings matching your needs.

To exclude a field from the schema extender:

Click the schema extender in the Fields area to open the settings (Figure 10.3)
Under Excluded fields click Add
Select the field(s) you want to exclude
Click OK
Save

The field will now be excluded the next time the index is built.

Figure 10.3 Excluded fields in the Schema Extender