Sitecore Synonym Search with Lucene

Sitecore’s Content Search API comes configured with the standard analyzer by default; however, it’s possible to configure a synonym analyzer if you need this functionality (i.e. searching for a synonym of a word in the content still finds that result). If you’re not sure what an analyzer is, head over to Adam Conn’s blog to understand it, because it’s a key component of how both indexing and searching work.

Sitecore ships with its own implementation of a synonym analyzer: Sitecore.ContentSearch.LuceneProvider.Analyzers.SynonymAnalyzer. As you can see in Adam’s post referenced above, this analyzer uses the same standard tokenizer as the standard analyzer, and also leverages the same lowercase and stop filters to rule out case differences and stop words from searches. The key to the synonym analyzer is providing it with a list of synonyms, which needs to be defined in your own custom XML file, because Sitecore’s own synonym engine implementation reads its synonym mappings from XML files.
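To make the pipeline described above more concrete, here is a minimal Python sketch of the general idea: tokenize, lowercase, drop stop words, then expand synonyms. It is not Sitecore’s or Lucene’s code. The stop-word list, the synonym group, the regex tokenizer and the function name `analyze` are all assumptions made purely for illustration, and the real filter order inside SynonymAnalyzer may differ.

```python
import re

# Hypothetical stop words and synonym groups, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}
SYNONYM_GROUPS = [{"fast", "quick", "rapid"}]


def analyze(text):
    """Return the set of terms produced for a piece of text."""
    # Crude stand-in for a standard tokenizer: split on non-word characters.
    tokens = re.findall(r"\w+", text)
    # Lowercase filter: make matching case-insensitive.
    tokens = [t.lower() for t in tokens]
    # Stop filter: drop very common words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Synonym filter: add every term that shares a group with the token.
    terms = set()
    for token in tokens:
        terms.add(token)
        for group in SYNONYM_GROUPS:
            if token in group:
                terms |= group
    return terms
```

With this sketch, `analyze("The Quick fox")` yields the terms quick, fast, rapid and fox, which is why a search for any one synonym can match content containing another.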

Configuring the Synonym Analyzer

In Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config (or in a patch file that overrides it), change the inner defaultAnalyzer parameter reference from the standard analyzer to the synonym analyzer:

[xml]
<analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.PerExecutionContextAnalyzer, Sitecore.ContentSearch.LuceneProvider">
  <param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
    <param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.SynonymAnalyzer, Sitecore.ContentSearch.LuceneProvider">
[/xml]

Unlike the standard analyzer, the synonym analyzer requires an implementation of ISynonymEngine as its parameter:

[xml]
<param hint="engine" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.XmlSynonymEngine, Sitecore.ContentSearch.LuceneProvider">
[/xml]

Sitecore’s implementation of that engine reads from XML files, and it requires the path to the XML file as its only parameter:

[xml]
<param hint="xmlSynonymFilePath">C:\inetpub\wwwroot\yoursite\Data\synonyms.xml</param>
[/xml]

Putting it all together:

[xml]
<analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.PerExecutionContextAnalyzer, Sitecore.ContentSearch.LuceneProvider">
  <param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
    <param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.SynonymAnalyzer, Sitecore.ContentSearch.LuceneProvider">
      <param hint="engine" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.XmlSynonymEngine, Sitecore.ContentSearch.LuceneProvider">
        <param hint="xmlSynonymFilePath">C:\inetpub\wwwroot\yoursite\Data\synonyms.xml</param>
      </param>
    </param>
  </param>
  <!-- the existing per-culture analyzer mapping stays here, unchanged -->
</analyzer>
[/xml]

Defining Synonyms in XML

Now that you have the synonym analyzer configured to use your custom XML file, the file itself needs to follow this basic XML structure:

[xml]
<?xml version="1.0" encoding="utf-8" ?>
<synonyms>
  <group>
    <syn>fast</syn>
    <syn>quick</syn>
    <syn>rapid</syn>
  </group>
  <group>
    <syn>slow</syn>
    <syn>decrease</syn>
  </group>
  <group>
    <syn>google</syn>
    <syn>search</syn>
  </group>
  <group>
    <syn>check</syn>
    <syn>lookup</syn>
    <syn>look</syn>
  </group>
</synonyms>
[/xml]

All terms listed in the same group are synonyms of each other. So for example, if a content item has the word “quick” in its CMS content but you search for the word “rapid” you will get that content item as a result.
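The group semantics can be sketched in a few lines of Python. This is only an illustration of how a file with this structure could be turned into a lookup table. It is not how XmlSynonymEngine is implemented, and the names `load_synonyms` and `synonyms_for`, along with the shortened sample XML, are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same <synonyms>/<group>/<syn> structure.
SYNONYMS_XML = """<synonyms>
  <group>
    <syn>fast</syn>
    <syn>quick</syn>
    <syn>rapid</syn>
  </group>
  <group>
    <syn>check</syn>
    <syn>lookup</syn>
    <syn>look</syn>
  </group>
</synonyms>"""


def load_synonyms(xml_text):
    """Map every term to the full set of terms in its group."""
    root = ET.fromstring(xml_text)
    lookup = {}
    for group in root.findall("group"):
        # Collect the non-empty terms of this group, normalized to lowercase.
        terms = {
            syn.text.strip().lower()
            for syn in group.findall("syn")
            if syn.text and syn.text.strip()
        }
        # Every term in the group points at the whole group.
        for term in terms:
            lookup.setdefault(term, set()).update(terms)
    return lookup


def synonyms_for(lookup, word):
    """Return the word's group, or just the word if it has no synonyms."""
    key = word.lower()
    return lookup.get(key, {key})
```

Here `synonyms_for(load_synonyms(SYNONYMS_XML), "rapid")` includes “quick”, which mirrors the example in the paragraph above, while a word that appears in no group simply maps to itself.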

Understanding the Full Context of the Analyzer Configuration

If you want to understand how this little piece fits into the overarching analyzer configuration, it’s actually quite impressive how much Sitecore has abstracted into configuration:

  1. The overarching <analyzer> is the PerExecutionContextAnalyzer, which is described on the Sitecore 7 Dev blog. It’s essentially an overarching “switcher analyzer” that allows a separate analyzer to be used per context.
  2. That analyzer takes a default analyzer and a mapping which can map unique analyzers per culture (also described in the same post above). If the culture matches a mapped analyzer it will use it; otherwise it will fall back to the default analyzer.
  3. The default analyzer is the DefaultPerFieldAnalyzer, which looks further down the same config file for field-specific analyzers in the <fieldMap> section.
  4. It too has a default analyzer for the case where an individual field doesn’t have a specific analyzer set. This default analyzer is now our new SynonymAnalyzer, which is the final fallback.
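The fallback chain in the list above can be summarized with a small Python sketch. It only models the decision order as described here, not Sitecore’s actual classes; the function name `resolve_analyzer`, its parameters, and the plain strings standing in for analyzers are all hypothetical.

```python
def resolve_analyzer(culture, field, culture_map, field_map, default_analyzer):
    """Pick an analyzer following the fallback order described above."""
    # Steps 1 and 2: a culture-specific analyzer wins if one is mapped.
    if culture in culture_map:
        return culture_map[culture]
    # Step 3: otherwise look for a field-specific analyzer.
    if field in field_map:
        return field_map[field]
    # Step 4: final fallback, which is now the synonym analyzer.
    return default_analyzer


# Hypothetical mappings; strings stand in for analyzer instances.
culture_map = {"de-DE": "GermanAnalyzer"}
field_map = {"_name": "LowerCaseKeywordAnalyzer"}

chosen = resolve_analyzer("en-US", "title", culture_map, field_map, "SynonymAnalyzer")
```

In this example neither the culture nor the field has a mapping, so `chosen` ends up as the default, which is where the SynonymAnalyzer sits in the configuration.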
 

Mark Ursino

Mark is Sr. Director at Rightpoint and a Sitecore MVP.

 

4 thoughts on “Sitecore Synonym Search with Lucene”

  1. Thanks for this post. Very helpful.
    Do you know if it’s possible to get the synonym data from the CMS itself instead of from a file on the filesystem?

    Cheers

  2. Great post – do you know if it is possible to allow the CMS editor to control the synonyms instead of an XML file?

  3. Hi Mark,

    I followed the instructions in this post. My understanding is that if the synonyms “test” and “testing” are in the same group and someone searches for “test”, the search should return results for both “test” and “testing”.

    I tried the same thing, but it still returns results for only one keyword.
    Are there any other steps I need to follow?

    Any help would be appreciated!

    Regards,
    Yogesh
