Chapter 4. Planning for Search
Given the potential benefits and challenges of enterprise search it is surprising that the 2012 Findwise Enterprise Search and Findability survey indicated that only 14% of respondents had a search strategy, though 30% were planning to develop a strategy in 2012/2013. This result is consistent with the Digital Workplace Trends report from NetStrategyJMC and tends to support the view that search is not seen as a business-critical element.
Search does need to be planned. It is technically challenging, users have both high expectations and a high dependency on the success of search, and there will need to be a substantial investment in personnel for the search support team. As you read through this book you will find there is one overriding theme. I call it White’s Rule of Search Investment:
The impact of search on business performance depends more on the level of investment in a skilled team of people to support search than it does on the level of investment in search technology.
There is a corollary:
Without an investment in a skilled team of people to support search no matter how great the investment is in search technology there will be no impact on business performance.
Enterprise search also bumps into many business operations. The search engine will need to interface with other applications and there are some legal and compliance issues. In the future the boundaries between search, business intelligence and content analytics are going to become increasingly blurred, and delivering access to enterprise search through mobile devices is going to be essential within a year or so.
In this chapter some of the business and technology issues that need to be addressed in a business plan and search strategy document are set out.
Making a Business Case
If you are asked to make a business case for enterprise search based on a financial Return-On-Investment model of just the costs associated with the software take heed of two important pieces of advice. The first is that it shouldn’t be done and the second is that it can’t be done.
When I am initially asked to help an organization make an ROI case for investing in enterprise search my initial request is to see business cases for other enterprise applications where the investment has been justified using an ROI calculation. Usually no such business case exists, or where it does the calculation is based on vague assumptions and yet approval is given for the investment. The justification is usually based on the proposition that without the investment the organization will not be able to function, supported by the signature of someone on the Board of the organization. Sadly at present no one on the Board wishes to be the sponsor for enterprise search and knows little about the technology itself or the value of the technology to the organization. The request for an ROI is purely a defensive measure in case things go wrong down the line.
Search is a high-touch application. It will be used personally by a substantial proportion of all employees in a service business and even in a manufacturing business enterprise search will support decisions being made which affect even staff on a production line. This cannot be said for a finance system, a customer relationship management system or a treasury management system, as just three examples.
The main reason why an ROI cannot be calculated is that there are no standard processes involved such as entering an invoice or updating an HR system. These processes can (at least in theory) be timed and costed based on the salary and overheads of an employee, but that is not the case with search. There is some published research from International Data Corporation which suggests employees spend perhaps eight hours a week looking for information. Making assumptions about the time that could be saved by investing in a good search application will not be founded on any sort of reality.
The reality is that if an organization needs a financial ROI to make an investment in search then it fails to appreciate the value of information as an asset, and in addition thinks that there is just a single metric on which to judge the business case for search. In fact there are multiple business cases, probably the same as the number of use cases set out in Chapter 3. Search has to be seen within the overall context of the business and its objectives, and that is why a full business plan is needed for search.
Invest in Skills Before Software
Over the last couple of years much of the investment in collaboration applications has been justified on the basis that the organization is not working together as effectively as it should, and implementing a collaboration application will transform the situation. There is usually only anecdotal evidence about poor collaboration and in due course only anecdotal evidence about improvement. When it comes to search the situation is the same. Someone (usually senior) has complained that they cannot find anything with the current search application and the organization needs to get something better.
Almost without exception, if the organization is using a search application that was implemented in the last five years, with all the upgrades installed and all the bugs fixed, then significant benefits will arise from increasing the size and skill base of the search support team. During the course of writing this book I was involved in helping a major UK university achieve a higher level of satisfaction with its web site search application. There was virtually no internal support. One developer occasionally took time out to have a look at the search logs, but did not have the time to do anything more. I discovered that another UK university was using the same search application and was very pleased with the performance. The difference? Two members of the web team were allocated full-time to web site search.
Search Support Team
Technology can be bought, but a search support team needs the sort of people who are in very short supply inside most companies. Most of Chapter 10 is about the skills needed in a search team. It would not be too much of an overstatement to say that if you cannot find the people who will form the search support team there is really no point in making any investment in an enterprise search application.
To summarize Chapter 10 there are five search team roles:
Search Manager, taking management responsibility for search delivery
Search Technology Manager, looking after the IT elements
Search Analytics Manager, running and analyzing search logs
Search Information Specialist, with responsibility for search quality
Search Support Manager, providing training and user support
In the initial stages of an enterprise search project these roles could be undertaken alongside other work, but once the implementation begins these roles need to be filled on a full-time basis. There simply is no option. If senior managers say that this is a ridiculous number of people to be supporting a single application, ask how many staff support the ERP, business intelligence and document management applications. None of these are used by almost every member of staff every day.
Stakeholder Analysis
The term ‘stakeholder’ is common parlance in organizations but usually little is done to analyze them in any formal sense. The matrix (see Figure 4-1) below can be useful in this respect.
The first step in creating this matrix is to brainstorm a list of potential stakeholders and for each group, or individual, define the following elements:
Name and position
Potential positive and negative impacts on the project (Influence)
What would the project expect the stakeholder to contribute (Influence)
What is the stakeholder’s expectation of the outcome of the project (Value)
Once the stakeholders have been plotted onto the matrix the stakeholder strategy starts to take shape. For the High Influence/High Value quadrant (General Manager?) a member of the project team should build and maintain a one-to-one relationship. The High Value/Low Influence quadrant would have intranet management, document management and records management teams. These teams need to be kept informed and their views brought into the discussions of the project team. The classic example in the High Influence/Low Value quadrant would be the entire IT department, and its members need to be reassured that it is in their interests to be supportive. Finally in the Low Influence/Low Value quadrant would come the managers of most corporate departments who already have a lead application (finance, marketing etc). The ‘keep aware’ strategy is a blend of keeping them informed on a regular basis and being aware if they start to have concerns about the direction of the search project.
As the project proceeds some of the stakeholders may need to be moved to a different quadrant, but the maximum effort needs to be expended on identifying the stakeholders in the high value quadrants and delivering on their expectations.
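The quadrant strategies described above can be written down as a simple lookup. The following is a minimal sketch, not a prescribed method: the 1 to 10 scoring scale, the threshold of 5 and the example scores are all assumptions made for illustration, and the strategy labels paraphrase the text.

```python
from dataclasses import dataclass

# Strategy labels paraphrase the four quadrants in the text.
# Keys are (high_influence, high_value).
STRATEGIES = {
    (True, True): "build a one-to-one relationship",
    (False, True): "keep informed and bring views into project discussions",
    (True, False): "reassure that being supportive is in their interests",
    (False, False): "keep aware",
}


@dataclass
class Stakeholder:
    name: str
    influence: int  # hypothetical scale: 1 (low) to 10 (high)
    value: int      # hypothetical scale: 1 (low) to 10 (high)


def strategy_for(stakeholder: Stakeholder, threshold: int = 5) -> str:
    """Return the engagement strategy for the stakeholder's quadrant.

    The threshold separating 'low' from 'high' is an assumption.
    """
    high_influence = stakeholder.influence > threshold
    high_value = stakeholder.value > threshold
    return STRATEGIES[(high_influence, high_value)]


# Example scores are invented purely to place each group in a quadrant.
stakeholders = [
    Stakeholder("General Manager", influence=9, value=9),
    Stakeholder("Records management team", influence=3, value=8),
    Stakeholder("IT department", influence=8, value=2),
    Stakeholder("Finance manager", influence=2, value=2),
]

for s in stakeholders:
    print(f"{s.name}: {strategy_for(s)}")
```

Re-scoring a stakeholder as the project proceeds moves them to a different quadrant, and the lookup then returns the new strategy.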
Business Impact
It is important to set out the objectives of the organization as these could have a major influence on the way that search develops. The acquisition strategy is especially important as this could require the search application to index substantial new repositories at short notice and also result in the negotiation of new license deals with a number of different vendors. There could be challenging divergences in metadata values and consistency.
One way of identifying ways in which the search application could support the business is to review the risks that are almost always published in the annual report, and work up a mitigating approach to each risk that involves the search application. In 2011 Hoffmann-La Roche, one of the world’s leading pharmaceutical companies, identified five simple but challenging questions that employees were probably asking themselves and colleagues on a regular basis:
Can I handle this?
What is the implication?
Can we find out sooner?
Will it work?
Have I chosen wisely?
If enterprise search can provide answers to those questions then the business case can be made on just a few pieces of paper.
As well as this top-down approach the techniques set out in the previous chapter will provide all the evidence that is needed as to how search can have an impact on business operations and the achievement of objectives.
Even though search is used very widely in the organization it is virtually impossible to make a convincing business case across all, or at least, most employees. It is advisable to build the overall business case for investment on a number of individual business cases that resonate with senior managers. These might include improving customer service, shortening the time to prepare business proposals, reducing the time to develop a new product or being more responsive to the actions of competitors. The metrics that will illustrate success will be different for each of these business cases but will be grounded in business processes.
Nothing carries more weight than a story from a respected manager about how they failed to find information that could have made a positive impact on the organization. It’s a trick used by many management authors, who use call-out stories to make an impact in an otherwise mundane book on some aspect of business operations. Beginning the business case document with a really strong search success or search failure (or both!) is a guarantee that readers of the business plan will already be pre-disposed to agree to the investment.
The 2012 Findwise survey found that the top ten justifications for enterprise search implementation were the following, based on a summation of respondents ranking these reasons as ‘imperative’ or ‘significant’. The list is in decreasing order of importance:
Accelerate retrieval of information from known information sources
Improving the re-use of information and knowledge
Increasing the extent of collaboration through finding people with relevant expertise
Eliminating information silos and the risk that important information was not being found and used
Accelerating the speed of finding people both by name and expertise
Raising the awareness of what was already known
Eliminate duplication of work because relevant information could not be found
Improving the consistency and quality of response to queries from customers and partners
Creating a more personalized intranet solution
Providing support for compliance management
Unfortunately none of those are easy to quantify.
Search Owner
If building a search team is difficult, finding someone who will take business responsibility can be even harder. This is probably because there are no business and compliance-critical workflow processes that are supported by enterprise search. Look around at the main enterprise systems and they are owned by the manager responsible for the workflow: Sales Director, Operations Director, HR Director and so on. In around 70% of organizations (based on the Findwise survey) the decision on a search application and the management of the application are the responsibility of the IT department. In many organizations search is owned by Corporate Communications, almost certainly when the same search application is being used for the web site and for internal enterprise search.
Here are two questions for you. In your organization what percentage of the total amount of content being indexed is owned by either Corporate Communications or IT and what percentage of the total number of employees work in IT and Corporate Communications? The answers will be small numbers, almost certainly less than 10%. IT should be delivering support services and certainly have an important role in search sub-system performance management. When the day comes that IT people regularly attend meetings with business units with supplier and customer facing staff then IT can own search. But not until that day.
In an ideal world search should report to the senior manager whose performance bonus is based on meeting customer requirements, either through product development and delivery or service development and delivery. This could be a General Manager, or Director of Manufacturing. All that the search owner really needs to do is fight for a sensible capital and operating budget.
Content
A search engine needs to be instructed about the content that needs to be indexed. The place to start is a content audit based around the repositories of information that the organization holds. The content audit needs to cover the following elements for each repository and application.
Owner
The first, and often most difficult, step is to find out who owns the repository. It may have been set up some time ago and the initial owner might even have left the company. If there is a current owner the chances are that the original intention of the repository has long since been overridden. This is often the case with a departmental repository where the department has been merged or fragmented over time.
Scope
A brief description of the content should be prepared, along with a description of the user categories who contribute and use the information in the repository. The total file size and total number of documents are important to know when it comes to sizing the search application. For the same reason the rate of addition of documents by time will give an indication of how frequently the repository needs to be re-indexed, or whether the documents are such that they need to be indexed as soon as they are added to the repository.
Document Size and File Formats
All the file extensions should be identified and listed out and at a minimum the maximum file size of the collection should be ascertained.
Metadata Management
If the content has been contributed through a content management or document management application then there will probably be good metadata tagging. If it is just a shared file server then even basic folder metadata could be inconsistent.
Language
Making the assumption that all the content is in a single language and that the language is English is only a safe assumption in a very few countries of the world. It could be that the French version of a document has been stored alongside the English version in what otherwise looks like a totally English-language collection.
Security
Working out the security rights is essential, and like so many other elements in the audit list it may not be immediately obvious what these are, especially if they are at document level rather than server access level.
The work involved in undertaking this content audit should not be under-estimated. In the process of converting this list into an Excel database any cell that is not completed could mean that the content is not indexed, or not indexed properly, and so becomes invisible.
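The audit elements above could also be held as one structured record per repository, which makes incomplete entries easy to spot. This is a minimal sketch only: the field names, units and the example repository are assumptions for illustration and are not taken from any particular audit template.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RepositoryAudit:
    """One row of the content audit. Field names are illustrative."""
    repository: str
    owner: Optional[str] = None
    scope: Optional[str] = None
    total_size_gb: Optional[float] = None
    document_count: Optional[int] = None
    documents_added_per_week: Optional[int] = None
    file_extensions: Optional[List[str]] = None
    max_file_size_mb: Optional[float] = None
    metadata_quality: Optional[str] = None
    languages: Optional[List[str]] = None
    security_model: Optional[str] = None


def missing_fields(row: RepositoryAudit) -> List[str]:
    """List the audit fields that are still empty for this repository."""
    return [
        f.name
        for f in fields(row)
        if getattr(row, f.name) in (None, "", [])
    ]


# Hypothetical example: a shared drive whose owner has not been traced.
row = RepositoryAudit(
    repository="Shared drive (hypothetical)",
    scope="Project files",
    document_count=120000,
    file_extensions=["docx", "pdf"],
    languages=["en", "fr"],
)

print(missing_fields(row))
```

Any repository that still returns a non-empty list is a candidate for content that will not be indexed, or not indexed properly.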
Technology
An important component of the technology section is to provide a list of current applications that already have search functionality, possibly as an embedded application. Examples might include document management and records management systems and enterprise resource planning systems. There are often more of these embedded search applications than most managers appreciate. These are often optimized for a specific application, and probably an enterprise application could not provide the same level of search performance and satisfaction. Having a list of these applications enables decisions to be made about whether there would be a benefit in providing a federated search environment.
There are a number of technologies that need to be surfaced in a search business plan, and these include:
The use of open-source software
Mobile access to enterprise information assets
The adoption of cloud/software-as-a-service applications
In-house versus external development and maintenance
In most organizations there will already be a number of search applications in use. It is easy to suggest that having just one powerful engine will solve all current search problems at a stroke. It could easily add to them. An important section of any business plan should set out how the process of migration from a number of different search applications to one single enterprise search application is going to be accomplished. This is not just a technology issue, but has to be approached both from an IT and a user perspective. Without doubt there will be some considerable change management, training and support issues that will have to be addressed and solutions put in place long before the technical migration occurs and for some time afterwards.
As with any software application a search vendor will release versions of the software to either address bugs or to provide additional functionality. This section of the business plan should set out the basis for considering whether to implement a new version of the software, bearing in mind that there could be risks with connectors to other applications.
Infrastructure
Enterprise search can have some challenging infrastructure requirements as far as storage in particular is concerned. The topology of a large-scale enterprise search application with good disaster recovery and the minimum latency on queries will need careful planning. The issues will not just be about the size of the index relative to the size of the repository but also the write speeds of the disk arrays. Many capacity planning specialists will be in novel territory when it comes to planning search capacity.
Almost certainly there will be a need for test, development and production servers. For large-scale enterprise search applications there will need to be multiple production servers with distributed indexes.
Network bandwidth to distant but still important offices can also present issues that need careful review. Substantial files could be downloaded very rapidly for perhaps 10-15 minutes as a user works their way through the top 50 results looking for a specific piece of information. This can be a particular issue with PowerPoint files, and a number of vendors offer a feature to render the document, or PowerPoint file, as a small HTML thumbnail image.
Disaster Recovery
It is tempting to put search down the bottom of the disaster recovery priority list but arguably it should be right at the top. It may enable the organization to keep going while other applications are brought back to life. After all the search index will contain a copy of most, if not all, of the information that the organization possesses, and if the application can provide users with an HTML thumbnail of a document that could be more than enough for business-as-usual to continue.
A disaster recovery plan usually sets out a Recovery Time Objective (RTO) defining the maximum application downtime and a Recovery Point Objective (RPO) noting an acceptable restore point. For an enterprise search application these need to be considered from basics rather than through the blind adoption of objectives from other enterprise applications. It is not just a case of getting the application back and running from a user perspective but of understanding and accounting for content that may not have been completely crawled or an index that has not been correctly updated. The index itself may have been distributed around the world, and with disaster recovery will come the need to re-synchronize the indexing process.
Security
Organizations are rightly very concerned about the risks from employees finding information that they are not entitled to see. Even if there is a corporate security policy, the question that should be asked is whether it is granular enough and implemented rigorously enough to ensure that access control lists (ACLs) can be created and maintained. The potential impact on staff and the reputation of the organization from a failure of the security policy could be very dramatic.
This is especially the case if the document being created is being indexed as soon as it is saved to a repository. At that moment in time there is in effect a duplicate of the content accessible to anyone with the correct security permissions. The index will almost certainly be backed up on a second server for disaster recovery purposes. Removing the document from the repository will almost certainly not remove the content from the index. If the document was a list of senior executive salaries then a search for this information might well disclose the amounts even if the document has been deleted.
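The point that the index holds its own copy of the content can be illustrated with a toy example. This is a sketch under stated assumptions, not how any particular search product works: `ToyIndex`, the group name and the file name are all hypothetical, and a real engine would handle permissions and deletions through its own connectors.

```python
class ToyIndex:
    """A deliberately simple in-memory index with per-document ACLs."""

    def __init__(self):
        self._entries = {}  # doc_id -> (text, allowed_groups)

    def add(self, doc_id, text, allowed_groups):
        # The index stores its own copy of the text.
        self._entries[doc_id] = (text, frozenset(allowed_groups))

    def remove(self, doc_id):
        self._entries.pop(doc_id, None)

    def search(self, term, user_groups):
        """Return matching doc ids the user's groups are allowed to see."""
        groups = set(user_groups)
        return sorted(
            doc_id
            for doc_id, (text, allowed) in self._entries.items()
            if term.lower() in text.lower() and allowed & groups
        )


repository = {}
index = ToyIndex()

# The document is indexed as soon as it is saved to the repository.
repository["salaries.xlsx"] = "Senior executive salaries 2012"
index.add("salaries.xlsx", repository["salaries.xlsx"], {"hr-directors"})

# Deleting from the repository does not touch the index copy.
del repository["salaries.xlsx"]
still_found = index.search("salaries", {"hr-directors"})

# Only an explicit removal from the index makes the content disappear.
index.remove("salaries.xlsx")
after_removal = index.search("salaries", {"hr-directors"})

print(still_found, after_removal)
```

The same reasoning applies to any backup copy of the index, which would need its own removal step.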
Performance
There are many ways of measuring search performance. User-centric measurements are set out in Chapter 10 and certainly need to be summarized in the business plan. In addition there are also some system performance measures that need to be set out. These typically include:
The optimum freshness for all categories of content
Crawl times and crawl frequency
The rate at which content is going to have to be ingested in order to achieve the desired freshness
The latency between ingestion and the index being updated
The amount of temporary disk space used in the process of creating and updating the index
Total size of the index as a percentage of the total amount of content
Elapsed indexing time for a new content set
Indexing processor time, which excludes speed gains from parallel processing
The expected number of queries-per-second
Desired response times
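Several of the measures listed above reduce to simple arithmetic once the inputs have been gathered. The sketch below shows three of them. The formulas are straightforward ratios, but the function names and all of the example figures are assumptions for illustration rather than benchmarks.

```python
def index_size_percent(index_bytes: float, content_bytes: float) -> float:
    """Total index size as a percentage of the total amount of content."""
    return 100.0 * index_bytes / content_bytes


def required_ingest_rate(changed_docs: int, freshness_target_hours: float) -> float:
    """Documents per second needed to ingest a batch of new or changed
    documents within the desired freshness window."""
    return changed_docs / (freshness_target_hours * 3600)


def queries_per_second(queries_in_peak_hour: int) -> float:
    """Average query rate during the busiest hour."""
    return queries_in_peak_hour / 3600


# Hypothetical figures: a 300 GB index over 1,000 GB of content,
# 36,000 changed documents to be searchable within 2 hours,
# and 7,200 queries in the peak hour.
print(index_size_percent(300, 1000))    # 30.0
print(required_ingest_rate(36000, 2))   # 5.0
print(queries_per_second(7200))         # 2.0
```

Averages of this kind hide bursts, so a business plan would normally set the targets with some headroom above the calculated figures.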
In cases where the enterprise search application is also going to be used for site search on the corporate web sites the internal performance and external (public) performance metrics may be quite different.
When desired response times are being established a number of special cases need to be taken into consideration. Federated search will almost certainly be slower than searching on a single repository or application and the way in which security management has been implemented will also have an impact on search response times. The time taken to display a results set on the desktop is only a small element of the search process, and users will quickly become dissatisfied with any response time to open up a document in excess of perhaps 10 seconds. This latency time is not just absolute but relative. If one search provides a user with response times of only a few seconds and the next search takes perhaps 30 seconds to call up a document the user reaction will be one of considerable disappointment even if in absolute terms the search application is working to the maximum of its technical performance.
Metadata and Taxonomies
A few years ago at a search conference there was a presentation about a new search implementation. The search manager reported that one of the tests that had been run during the implementation was to find members of staff called Jane. To everyone’s surprise most of the high relevance results were for male employees. It turned out that they had all written and submitted their CVs on a template owned by someone called Jane, and the search engine was placing more value on this metadata item than on the name field.
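The effect in this anecdote can be reproduced with a toy field-weighted ranking. This is an illustrative sketch only: the scoring function, the field names, the weights and the people are all invented, and it does not represent the ranking algorithm of the engine in the story.

```python
def score(doc: dict, term: str, weights: dict) -> float:
    """Sum the weights of every field that contains the term as a word."""
    term = term.lower()
    return sum(
        weight
        for field, weight in weights.items()
        if term in doc.get(field, "").lower().split()
    )


def rank(docs: list, term: str, weights: dict) -> list:
    """Return ids of matching documents, highest score first."""
    scored = [(score(d, term, weights), d["id"]) for d in docs]
    matches = [(s, doc_id) for s, doc_id in scored if s > 0]
    # Sorting on the negated score keeps input order for ties.
    return [doc_id for s, doc_id in sorted(matches, key=lambda pair: -pair[0])]


# Hypothetical CVs: two were written on a template owned by "Jane Doe".
docs = [
    {"id": "cv-john", "name": "John Brown", "template_owner": "Jane Doe"},
    {"id": "cv-peter", "name": "Peter White", "template_owner": "Jane Doe"},
    {"id": "cv-jane", "name": "Jane Green", "template_owner": "Alex Admin"},
]

# Weighting the template metadata above the name field pushes the
# wrong people to the top of the results.
template_heavy = {"name": 1.0, "template_owner": 3.0}
print(rank(docs, "jane", template_heavy))

# Ignoring the template owner for people search fixes the ranking.
name_only = {"name": 3.0, "template_owner": 0.0}
print(rank(docs, "jane", name_only))
```

The first ranking lists the two male employees ahead of Jane Green, and the second returns only her CV.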
The problem with metadata is that the content contributor has to add it, and does so either with reluctance, or without due care or a combination of both conditions. This section of the search strategy needs to highlight the importance of metadata and how it will be generated, either automatically (e.g. the name of the content contributor from the system log-in information) or through manual addition. Entity extraction is a half-way house.
All the evidence points to the benefit of a taxonomy and metadata enhancing search performance, and especially in presenting highly-relevant information. However taxonomies are time-consuming to compile and to maintain. As with so many search-related issues a balance needs to be established between the value of the taxonomy and the benefit to users, taking into account that the users of the information may not be the people who have to add the taxonomy metadata in the course of saving the document.
Help Desk
A search application needs its own help desk, even if it is a virtual one, and there needs to be a Service Level Agreement both ways between the IT and Search Help Desks because it may take quite a bit of effort to work out what is causing the problem and what actions should be taken to remedy the problem.
Usability
Despite the high profile efforts of usability experts such as Jakob Nielsen few organizations seem to take usability seriously. Search usability testing is especially important because of the complexity of many search user interfaces with a profusion of filters, facets, annotations to results and perhaps even graphical representations of clusters of search results.
Training and Support
The view is sometimes taken that search should be so intuitive that there should be no need to provide training and support. This view is often based on the ‘simplicity’ of the Google search box, a view that ignores the fact that books have been written on the very wide range of hacks that are available to users of Google search.
The same is true of an enterprise search application. Certainly there should be as few barriers as possible to carrying out a basic search, but in an enterprise context there are probably very few basic searches as finding most, and ideally all, of the relevant information is very important.
Risks
It is always advisable to have a risk management strategy for enterprise search, and this probably needs to cover off the following risks:
Lack of resources in the search team leading to poor search performance
Search manager leaves and there is no internal candidate
Search vendor is acquired or goes out of business
Search vendor unable to provide an adequate level of support
No clear roadmap for development
Changes in senior management at the vendor result in a repositioning of the search engine
Inadequate security management leads to a breach of access permissions
Key development skills in the open-source contractor are not available
Disaster recovery procedures prove to be inadequate
Enterprise networks are giving rise to significant performance problems
Poor performance of connectors and APIs
Best bets are no longer best bets
Web Site Search
An enterprise search strategy should also include a strategy for web site search. The major issue here will be whether using a search application optimised for internal information will be a good solution for external users of corporate web sites. There is no right answer. The requirements of both need to be defined and then the extent of the overlap and the implications of any compromises that might arise from using the same search application need to be considered.
Summary
Writing an enterprise search strategy or a business plan is not an exercise that can be completed in a few days sitting at a desk. Of the list of topics covered in this chapter the most time consuming will be the content audit. I find it difficult to understand why organizations do not document their search strategy, and then maintain it on at least an annual basis. As with any major IT project the more work undertaken in the planning stages the lower the risk and the greater the benefits post-implementation.
Further reading
You’ll find some additional information regarding the subject matter of this chapter in the Further Reading section in Appendix A.