Preface

No matter who they work for or what work they do, most people seem to want their internal search application to work like Google. When you ask what it is about Google Search that makes it so desirable, the response is usually about the very simple interface, the speed with which results are presented, and the high probability that the first page of results will provide at least a good start in the information discovery process. That’s quite an achievement for a company whose business is advertising and not search. The investment in research and development at Google reached $10 billion in 2014, and most of this is spent on nearly 20,000 staff. This effort was largely responsible for Google selling close to $60 billion of advertising in 2014. Does your organization spend 13% of its revenue on search?

There is more to the Google success story than technology. Website owners and contributors spend a considerable amount of time and effort to make sure that Google indexes their content and presents it at the highest possible position in a list of search results. However, internally, there are never any rewards for making sure that information is of the highest quality and presented in a way that will make it easy for the technology to work its retrieval magic. There is rarely more than one lonely person with the responsibility for supporting the search application and making sure that it is tuned to meet user requirements. Investment in search is never seen as a priority.

There is another aspect of Google Search that is not fully appreciated. For almost any search, the same information will be presented from multiple sources, be they restaurant reviews, airline flight times, or the distance from the Earth to the Moon. The real reason that people want Google is that they trust it to deliver something, even if not everything. For most purposes, something is good enough. If Google cannot find British Airways flight times from London to New York, then you can check out the airline or the airport or a multitude of other sources.

Inside an organization, information that cannot be found is information that, in effect, does not exist. It has vanished. Permanently. No one will ever see it again. There is a chance that a call to a colleague might result in a document with the anticipated title, but can you be certain it is the latest version? Something is not good enough. Meanwhile, the colleague might be annoyed to be interrupted yet again by other colleagues asking about the same document. In the course of writing this edition, I interviewed over 20 senior managers in a global business about their requirements for a proposed intranet upgrade. Within a minute of the introductions, without exception, they started to talk at length about the poor quality of the intranet search, and several wanted Google. I explained to them that the search application they had was significantly more powerful and had a wider range of functionality than a Google enterprise appliance. It came as quite a shock!

The fundamental problem is that organizations do not see information as an asset. They know how many desks there are, how much money is in the bank, the names of every employee and customer, and the depreciated value of buildings and IT hardware. They have no idea of the amount of information they have. The CIO might quote a total storage volume, but that is not the same as the amount of credible, trustworthy information. It is usually not until the intranet is migrated from a perfectly usable content management system (CMS) to SharePoint that the organization finds that it has perhaps 500,000 documents hidden (and I use the word advisedly) in the application.

CEOs, managers, and other leadership personnel are now beginning to appreciate that information that cannot be found and shared might well be putting their organizations at risk. All directors have a responsibility to minimize the risk profile of their organizations, but rarely does information risk appear explicitly on the risk register. If it did, the market demand for search would escalate exponentially. Information management and the currently more popular term of information governance are gradually moving center stage. Knowledge management still has a role to play, but as with Big Data, it is not easy to translate into improvements in revenues and profits. The impact of information is much easier to assess.

Google is certainly impressive, but it is not quite the information access panacea that it seems to be. Searching scholarly articles that have very few links is very challenging, which is why Google offers Google Scholar. Google knows that one size does not fit all, but that is a message lost on far too many IT managers. Entering search terms into the Google search box seems very easy, but why then are there books with several hundred pages of advice on how to get the best from Google? People need to be trained in how to search. Even Google is not totally intuitive.

This second edition is almost twice the length of the first edition, published in late 2012. The increase in size is not because there have been dramatic changes in technology, but rather because in the previous edition I passed over some subjects on the assumption that they were already well known. The feedback from readers was that I had skimmed over some topics, notably website search and user interface design. Moreover, both open source search and the search application in SharePoint 2013 have increased awareness of what search can offer.

If you follow the guidance I offer in this book, then I am confident that you will achieve one of the most important attributes of Google—users will trust your enterprise search applications to deliver significant amounts of relevant information on which they can base decisions that will benefit their organizations and their careers. In my view, search is a decision support application. If you agree with me, then making a business case for the resources needed to capitalize on the existing investment in technology will be very easy, and could potentially save your organization from unnecessarily purchasing new technology “because the current technology is broken.”

This book is technology-light. I have described the core elements in search technology in a way that does not require a degree in computer science, primarily because I don’t have one. I am an information scientist. There are chapters on open source search and on SharePoint search but they are written for business managers and not for developers and systems administrators.

In the final analysis, good search depends on good content and good people. People who have the time to write and tag high-quality content, people to support search, people who have been trained to use search applications to their maximum potential, and people who are willing to provide an appropriate level of investment.

How to Use This Book

This book has been written to help business managers, and the IT teams supporting them, understand why effective enterprise-wide search is essential in any organization. The focus is on how to ensure that search applications deliver the information needed to make sound business decisions that enhance business performance. This second edition is almost twice the size of the first edition, which was published in 2012. Virtually every chapter has been revised, and chapters on open source search, SharePoint search, user interface design, website search, and search governance have been added.

One of the most visible changes to this edition is that each chapter includes a list of books, papers, reports, and web resources for readers who wish to dig deeper into search technology and its applications.

The first two chapters set the scene, explaining why effective enterprise search is essential to any organization. Chapters 3 and 4 then provide a low-technology description of how search works. After an overview of the search business in Chapter 5, there are individual chapters on open source search (Chapter 6) and on SharePoint 2013 search (Chapter 7), as both are widely used for enterprise search purposes.

Chapter 8 is all about search governance, and in particular, the skills needed to provide support to enterprise search. In almost every case I have come across where search fails to meet the requirements of the business, the cause is a lack of skilled support for an application that is probably used every day by most employees.

The process of specifying, selecting, and implementing a search application is set out in Chapters 9 to 14, including substantial chapters on how to define user requirements and the design of user interfaces. Search performance is not just about speed but about employees feeling that the search application meets their expectations and has a significant impact on business performance. Chapter 15 covers how to assess technical performance, discovery, satisfaction, and business impact.

Corporate websites are also one element of an enterprise search strategy and are the subject of Chapter 16, followed by chapters on eDiscovery (Chapter 17) and content analytics (Chapter 18). In Chapter 19, I have been brave enough to suggest how search technology will develop over the next five years, by which time I will have retired!

The book sets out all the information you need, or need to collect, to write a search strategy. Every organization writes strategies to its own house style. Appendix A provides an A–Z of all the topics that need to be included. Finally, there is a list of 10 critical success factors (Appendix B), a list of books and blogs on search (Appendixes C and D), a list of search vendors (Appendix E), and a Glossary. Hopefully, this will be all you need to make enterprise search a success for your organization.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://oreil.ly/1VHpGLW.

To comment or ask technical questions about this book, send email to the publisher.

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

I could not have written this book without the generous support of many colleagues over quite a number of years. In particular, I must thank Stephen Arnold, Paul Clough (Information School, University of Sheffield), Jed Cawthorne, Ed Dale (E&Y), Jeff Fried (BA Insight), Charlie Hull (Flax), Miles Kehoe (New Idea Engineering), Helen Lippell, Andrew McFarlane (City University), Agnes Molnar, Valentin Richter (Raytion), and Professor Elaine Toms (Information School, University of Sheffield). Agnes, Charlie, Jed, Jeff, and Elaine read through substantial portions of the book at a draft stage and made invaluable contributions.

Other colleagues who in various ways have shaped my understanding of search technology and search good practice include Denise Bedford, Thomas Borch, Susan Feldman, David Hawking, Jane McConnell, Kristian Norling, Sam Marshall, Peter Morville, James Robertson, Tony Russell-Rose, Michal Szlanowski, and many members of the Findwise team.

Janus Boye (JBoye), Erik Hartman (Hartman Event), Kurt Kragh Sørensen (IntraTeam), and Val Skelton (UKeiG) have given me many opportunities to run search workshops at their events. These have been invaluable in learning from the experiences of enterprise search managers across Europe. Katherine Allen and the team at Information Today Europe have supported my vision for the Enterprise Search Europe conference with enormous enthusiasm and skill. I have had the honor of being a Visiting Professor at the Information School, University of Sheffield, since 2002 and everyone on the academic staff has given generously of their time and expertise.

I am grateful to the Aberdeen Group, AIIM, Alta Plana, and Findwise for the use of information from their surveys. Over the last decade, I have carried out many enterprise search consulting assignments, but I am not in a position to list the organizations involved. Each of these assignments has given me additional insights into the technology and use of enterprise search.

I would like to thank Allyson MacDonald at O’Reilly Media for commissioning this second edition and guiding it from spreadsheet to bookshelf. It is a privilege to be an O’Reilly author.

It has not been easy for my wife, Cynthia, when people ask her what I do for a living. Being an information scientist is fascinating for me but difficult for Cynthia to describe. She has been immensely supportive during 12 career changes and 8 books.

This book is dedicated to the memory of W. Gordon Graham, who was Chairman of Butterworths when I was involved in launching Lexis in the UK. Gordon was a distinguished publisher and an early and passionate believer in digital publishing. Although I left Reed Publishing in 1984, he continued to be a valued friend and mentor. He died early in 2015.

Search has been a part of my life since 1970 when I was using 10,000-hole optical coincidence cards at the British Non-Ferrous Metals Research Association. The principles I learned at that time in terms of indexing, constructing a query, relevance, recall, and precision have sustained me throughout the last 45 years. In many respects, nothing has changed, and yet everything has changed.
