Creating fast, Examine-indexable, widgetized Umbraco websites (Part 2)

In part one I outlined how widget-based Umbraco sites allow greater content reuse and more page layout flexibility. When using this approach, however, there are two areas (search and performance) that need a little more care from the developer to keep Umbraco working as well as it normally does.

Search

Search in Umbraco is provided by Examine, a wrapper around Lucene.NET. Examine indexes the data saved in the Umbraco nodes; this differs from how Google indexes a site, because Google scrapes the rendered markup of the page itself. As a result, on a widget-based site the results you get back from the Examine searcher may not be what you expect.

This is because the widget content is not saved in the page’s node; the node only references it. To get Examine to index everything that is displayed on the page, we have to do a little extra work.

When I first came across this problem I was stumped, and my confused brain concocted a wide range of nasty ways to get the appropriate content indexed for the node. These included populating a field on the node with all the content from its children and widgets whenever it was published. I even thought of indexing the widgets or child pages themselves and redirecting to the pages that included them. Luckily I found another way that was both elegant and simple to implement!

Examine is both powerful and extensible, and it lets you add your own data to the index while it is indexing a node. Examine raises an event called GatheringNodeData whenever it gathers a node's data for indexing, which happens every time a node is published or a manual reindex is triggered. Below is a simple example of how to use this event to add fields to the index that don't actually exist on the Umbraco nodes. These extra fields can then store the content of selected child nodes or widgets.

1. Create a class that implements IApplicationEventHandler, so that Umbraco runs your code when the application starts.

2. In its OnApplicationStarting method, subscribe to the GatheringNodeData event that your indexer raises while gathering a node's data:

ExamineManager.Instance.IndexProviderCollection["{NameOfYourIndexerHere}"].GatheringNodeData += GatheringNodeDataHandler;

3. In the handler you create (GatheringNodeDataHandler(object sender, IndexingNodeDataEventArgs e) in this example), add your extra fields to e.Fields, the data that will be saved to the index.

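    // All this stuff uses uSiteBuilder, so it's not suitable for copying and pasting, but you can easily see what's going on.
    // I use my own helper class on top of uSiteBuilder to get nodes so I have strongly typed classes.
    // Requires: using System.Linq; (for Select) and using Examine; (for IndexingNodeDataEventArgs)
    private void GatheringNodeDataHandler(object sender, IndexingNodeDataEventArgs e)
    {
        // The helper must resolve the node being indexed (identified by the event args),
        // not a "current page": indexing runs outside a normal front-end page request.
        var estate = NodeFinder.GetCurrentNode<Estate>();
        if (estate != null)
        {
            var tabs = estate.GetChildNodesOfType<ContentTab>();
            // Join the main content of all the tabs into one space-separated string
            var mergedTabs = string.Join(" ", tabs.Select(t => t.MainContent));
            // Store it in an extra field that exists only in the index, not on the node
            e.Fields.Add("contentTabs", mergedTabs);
        }
    }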

4. Set up your index in ExamineIndex.config so that it includes the newly added field.

<IndexSet SetName="{YourIndexSetNameHere}" IndexPath="~/App_Data/TEMP/ExamineIndexes/{YourIndexNameHere}/">
    <IndexAttributeFields>
        <add Name="id" />
        <add Name="nodeName" />
        <add Name="updateDate" />
        <add Name="writerName" />
        <add Name="loginName" />
        <add Name="email" />
        <add Name="nodeTypeAlias" />
    </IndexAttributeFields>
    <IndexUserFields>
        <add Name="mainTitle" />
        <add Name="mainContent" />
        <add Name="contentTabs" />
    </IndexUserFields>
    <IncludeNodeTypes>
        <add Name="Estate" />
    </IncludeNodeTypes>
    <ExcludeNodeTypes />
</IndexSet>

5. Trigger a reindex.

6. If you now open the generated index in Luke (a browser for Lucene indexes), you'll see that every Estate node has a field called contentTabs containing the combined contents of its child tabs.

You have to set this up for each case you want indexed in this way, but it is a quick process.
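To see why the extra field matters without needing an Umbraco install, here is a small self-contained C# model of the flow. `ToyNode`, `ToyNodeDataEventArgs`, `ToyIndexer` and `EstateIndexing` are hypothetical stand-ins invented for this sketch, not Examine or uSiteBuilder types. The toy event plays the role of GatheringNodeData and passes the node itself for brevity, whereas in the real handler you look the node up yourself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an Umbraco content node (NOT an Umbraco type).
public class ToyNode
{
    public int Id { get; set; }
    public string NodeTypeAlias { get; set; } = "";
    public string MainContent { get; set; } = "";
    public List<ToyNode> Children { get; } = new List<ToyNode>();
}

// Hypothetical stand-in for Examine's IndexingNodeDataEventArgs.
public class ToyNodeDataEventArgs : EventArgs
{
    public ToyNodeDataEventArgs(ToyNode node, Dictionary<string, string> fields)
    {
        Node = node;
        Fields = fields;
    }

    public ToyNode Node { get; }
    public Dictionary<string, string> Fields { get; }
}

public class ToyIndexer
{
    // Raised once per node, before its fields are stored (the role GatheringNodeData plays).
    public event EventHandler<ToyNodeDataEventArgs> GatheringNodeData;

    private readonly Dictionary<int, Dictionary<string, string>> _index =
        new Dictionary<int, Dictionary<string, string>>();

    public IReadOnlyDictionary<int, Dictionary<string, string>> Index => _index;

    public void IndexNode(ToyNode node)
    {
        // By default only the data saved on the node itself is gathered.
        var fields = new Dictionary<string, string>
        {
            ["nodeTypeAlias"] = node.NodeTypeAlias,
            ["mainContent"] = node.MainContent
        };
        // Subscribers may add extra, index-only fields.
        GatheringNodeData?.Invoke(this, new ToyNodeDataEventArgs(node, fields));
        _index[node.Id] = fields;
    }

    // Ids of indexed nodes whose given field contains the term (case-insensitive).
    public List<int> Search(string field, string term) =>
        _index.Where(entry => entry.Value.TryGetValue(field, out var value)
                              && value.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
              .Select(entry => entry.Key)
              .ToList();
}

public static class EstateIndexing
{
    // Counterpart of GatheringNodeDataHandler: merge the child tabs into "contentTabs".
    public static void AddContentTabs(object sender, ToyNodeDataEventArgs e)
    {
        if (e.Node.NodeTypeAlias != "Estate") return;
        var tabs = e.Node.Children.Where(c => c.NodeTypeAlias == "ContentTab");
        e.Fields["contentTabs"] = string.Join(" ", tabs.Select(t => t.MainContent));
    }
}
```

With the handler attached, a search on contentTabs for a word that only appears in a child tab returns the Estate node itself rather than the tab, which is the behaviour the steps above are aiming for. Without it, the only match is the tab node, or nothing at all if the tabs are not indexed.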

Performance

The other problem with this approach is that the site becomes very macro-heavy, with more calls to more content nodes. This is more processor-intensive than simply rendering page fields, so it is very important to set up Umbraco's caching properly.

Macro caching is often overlooked, and there really isn’t any excuse, as it is so easy to set up. Out of the box you get three caching options:

Cache Period - The time, in seconds, for which Umbraco keeps a cached copy of the macro's output before executing the script again.

Cache By Page - Umbraco caches the macro output separately for each page. You need this whenever the macro takes data from the current page; without it, every page would show the output cached for whichever page rendered the macro first. As a rule of thumb:

If the macro renders a widget and its output will be exactly the same on every page, it is safe to uncheck the box; if not, leave it checked.

Cache Personalized - Umbraco caches the macro output separately for each logged-in member. This is used less often than Cache By Page and is only really appropriate when the site serves personalised content.
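The effect of these three options can be sketched as a cache key that only varies along the dimensions you tick. The C# below is a hypothetical model (`MacroCacheSettings` and `ToyMacroCache` are made up for illustration), not Umbraco's actual macro cache implementation. It also varies the key by the macro's parameter values and, as an assumption of the model, treats a cache period of 0 as "not cached".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the three options in the macro editor (NOT Umbraco's own type).
public class MacroCacheSettings
{
    public int CachePeriodSeconds { get; set; }   // 0 = not cached, in this model
    public bool CacheByPage { get; set; }
    public bool CachePersonalized { get; set; }
}

public class ToyMacroCache
{
    private readonly Dictionary<string, (string Output, DateTime Expires)> _entries =
        new Dictionary<string, (string Output, DateTime Expires)>();

    // How many times the macro actually had to run.
    public int Executions { get; private set; }

    public static string BuildKey(string alias, MacroCacheSettings settings, int pageId,
                                  int? memberId, IDictionary<string, string> parameters)
    {
        var parts = new List<string> { alias };
        if (settings.CacheByPage) parts.Add("page=" + pageId);
        if (settings.CachePersonalized) parts.Add("member=" + (memberId?.ToString() ?? "anonymous"));
        // Parameter values always vary the key; sort so attribute order doesn't matter.
        parts.AddRange(parameters.OrderBy(p => p.Key, StringComparer.Ordinal)
                                 .Select(p => p.Key + "=" + p.Value));
        return string.Join("|", parts);
    }

    public string Render(string alias, MacroCacheSettings settings, int pageId, int? memberId,
                         IDictionary<string, string> parameters, DateTime now, Func<string> execute)
    {
        if (settings.CachePeriodSeconds > 0)
        {
            var key = BuildKey(alias, settings, pageId, memberId, parameters);
            if (_entries.TryGetValue(key, out var entry) && entry.Expires > now)
                return entry.Output;   // served from cache, macro not executed
            Executions++;
            var output = execute();
            _entries[key] = (output, now.AddSeconds(settings.CachePeriodSeconds));
            return output;
        }
        Executions++;
        return execute();
    }
}
```

In this model, rendering the same widget on two pages with Cache By Page unchecked runs the macro once and serves the second page the first page's output, which is why the box should only be unchecked when the output is identical everywhere. Changing a parameter value, or letting the cache period elapse, causes a fresh execution.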

Umbraco caches a separate version of the macro output for each distinct set of parameter values; this is why it is safe to use caching heavily on a widget-based site. You can also cache different versions of a macro's output depending on a query string value (and, I believe, a cookie value too); this is especially useful if you handle pagination through the query string. To do this:

1. Add a parameter to the macro (e.g. “page”) with the type text.

2. When inserting the macro, set that parameter using Umbraco's syntax for reading a value from the query string: [@{yourParamNameGoesHere}].

<umbraco:Macro Alias="BlogArticles" page="[@page]" runat="server" />

Now, whenever this value changes, Umbraco caches a separate version of the output, just as it would if a different value had been hard-coded.
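Here is a sketch of how a `[@page]` value could feed into the cache key, assuming a simplified model in which `[@key]` is read from the query string and a missing key resolves to an empty string. `ToyMacroParameters` is hypothetical; Umbraco performs this substitution internally and its real code differs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (NOT an Umbraco API) modelling [@key] query string parameters.
public static class ToyMacroParameters
{
    // Resolve "[@key]" values from the query string; other values pass through unchanged.
    // Assumption of this model: a missing query string key resolves to an empty string.
    public static Dictionary<string, string> Resolve(IDictionary<string, string> macroAttributes,
                                                     IDictionary<string, string> queryString)
    {
        var resolved = new Dictionary<string, string>();
        foreach (var attr in macroAttributes)
        {
            var value = attr.Value;
            if (value.StartsWith("[@") && value.EndsWith("]"))
            {
                var key = value.Substring(2, value.Length - 3);   // "[@page]" -> "page"
                resolved[attr.Key] = queryString.TryGetValue(key, out var fromQuery) ? fromQuery : "";
            }
            else
            {
                resolved[attr.Key] = value;   // hard-coded value
            }
        }
        return resolved;
    }

    // A cache key made from the macro alias plus the resolved parameter values.
    public static string CacheKey(string alias, IDictionary<string, string> resolved) =>
        alias + "|" + string.Join("|", resolved.OrderBy(p => p.Key, StringComparer.Ordinal)
                                               .Select(p => p.Key + "=" + p.Value));
}
```

Resolving `?page=2` gives the same parameter set, and therefore the same key, as hard-coding `page="2"`, while `?page=1` gives a different one, so each page of results gets its own cached copy.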

Conclusion

In part one we outlined why a widget-based approach is such a good way to architect Umbraco sites: content reuse and controlled page flexibility give editors more freedom to build engaging sites. Unfortunately, this approach can slow down page rendering and keep widget content out of the Examine index, so pages don't show up in searches for the content they display. In this part I explained how Examine’s extensibility and Umbraco’s great caching support let us overcome these problems with only a small amount of work.

I’d be surprised if many of you in the community weren’t doing this already, but if you haven’t used this approach before, it’s definitely worth a try.
