The World of Enterprise Search

By Susanne Koch, Pandia Search Central and The Norwegian Centre for ICT in Education, with contributions from Stephen E. Arnold, Beyond Search/ArnoldIT

Photo by The Planet

In recent years, enterprise search has lagged Web search in certain features. This paper presents four trends that suggest enterprise search is catching up in information retrieval and outlines some features that show the contours of innovation in the field of enterprise search. You will also learn which challenges face an organisation looking to acquire an enterprise search system.



The challenge

You may find it easy to search the web using Google or Bing. But what do you do when you want to search your own information: the data found on your company web site, on the intranet, in the document management system, in emails, in in-house databases and in other types of content?

We have all experienced how useless some web site search engines are for finding information on company web sites. Many people avoid such search forms altogether and use Google instead, limiting the search to one particular web site (as in “enterprise search” site:http://www.pandia.com). Even if Google has not indexed all the pages of that web site, the results are often much more relevant than the listings presented by the embedded site search engine.

The reason such web site search engines so often fail is that it is extremely hard to rank content without the kind of link and usage information Google makes use of when ranking sites. Google’s algorithms draw on information given by millions of users in the form of links and search behaviour. When the only data available is generated by a few hundred content providers and searchers, the search engine has to rely on other types of information when ranking content, most often page text and metatags, and that is rarely enough.

Web site search solutions of this kind are often part of a larger enterprise search package. In-house enterprise search engines for employees face many of the same problems as enterprise search engines developed for web site visitors. Without the necessary traffic, the search engine finds it hard to determine what is important (last week’s Board presentation) and what is not (the agenda of a meeting held in 1996). In other words: Is this page the main source of information on this topic, or is it a letter containing no more than a short reference?

And like Google, enterprise search engines must also try to decide what kind of information is most relevant to your need. Are you looking for a web page? A manual? An email? A spreadsheet? And how do you index and compare information found on different platforms following different file standards, taxonomies and types of tagging?

All of this means that if your company or organization is going to invest a large sum of money in a search engine that can crawl and make sense of your own data, you had better know what the enterprise search providers can offer. Over the last five years, the mainstream enterprise search vendors have improved the usability, performance, and functionality of their respective systems, and some of them have managed to overcome some of the problems presented above. This makes for a very competitive scene.

The trends presented here are not the only ones that dot the landscape of search, but they identify some major features.
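To make the ranking problem described above more concrete, here is a minimal Python sketch of a ranker that has nothing but page text and metatags to go on. Everything in it is an assumption made for illustration: the two pages, the field weights and the scoring function are invented, and no vendor’s actual algorithm is implied.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    keywords: str  # contents of a keywords metatag (hypothetical data)
    body: str


# Hypothetical field weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}


def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def text_score(page, query):
    """Weighted count of query terms in each field. No link or usage data is used."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokens(getattr(page, field)))
        score += weight * sum(counts[term] for term in tokens(query))
    return score


def rank(pages, query):
    """Order pages by descending text score."""
    return sorted(pages, key=lambda page: text_score(page, query), reverse=True)


PAGES = [
    Page("/board/latest-presentation", "Board presentation", "strategy",
         "Budget and strategy for next year."),
    Page("/archive/1996-agenda", "Budget meeting agenda 1996", "budget, agenda",
         "Budget items: travel budget, office budget."),
]

ranked = rank(PAGES, "budget")
for page in ranked:
    print(page.url, text_score(page, "budget"))
```

In this toy data set the 1996 agenda outranks the Board presentation for the query "budget", simply because it repeats the word more often. Without links or usage data, the ranker has no signal telling it which page actually matters.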


Trend one: Enabling search within applications

We have seen a steady increase in the appetite for search-enabled applications and a decrease in users’ interest in learning how to hunt for information in several systems. In response, enterprise search has shifted from a "one size fits all" solution, where the searcher is forced to go to one particular search application or web page, to enterprise search as an enabler within many applications. Search has disappeared, or been submerged, into other applications. This means that you don’t have to open the search application to perform a search. Instead the enterprise search solution integrates with the systems holding the information and makes it searchable within those systems. Autonomy and Exalead are leaders in this field, although other firms are shifting to this orientation as well.

Trend two: Combining and analyzing structured and unstructured data

Enterprise search is no longer focused on laundry lists. Search results are being presented in new, user-friendly ways, and the search tools are trying to help the searcher identify what is relevant. The outputs are no longer lists of links and summaries that the searcher has to analyse herself. Instead the search systems generate reports that summarise the data the search retrieved. The enterprise search engine may also attempt to visualize the data you requested. Solutions like Endeca’s blend traditional outputs with a variety of report formats. New versions of well-known systems now perform "mash up" operations between structured and unstructured data from the different source systems that are being searched, be they intranets, archives, or other databases.
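The idea of a report that mashes up structured and unstructured data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer table, the documents, the customer ids attached to each document and the `report` function are all hypothetical, and real products do far more (entity extraction, security trimming, visualisation).

```python
from collections import defaultdict

# Structured data: rows from a hypothetical customer database.
CUSTOMERS = {
    "C1": {"name": "Acme", "region": "Nordic"},
    "C2": {"name": "Globex", "region": "UK"},
}

# Unstructured data: hypothetical emails and memos, each already tagged
# with the customer id it refers to (an assumption of this sketch).
DOCUMENTS = [
    {"id": "mail-17", "customer": "C1", "text": "Acme complains about delayed delivery"},
    {"id": "memo-4", "customer": "C2", "text": "Globex renewal meeting notes"},
    {"id": "mail-22", "customer": "C1", "text": "Second delivery delay reported"},
]


def report(query):
    """Summarise matching documents per customer region instead of listing links."""
    term = query.lower()
    by_region = defaultdict(list)
    for doc in DOCUMENTS:
        if term in doc["text"].lower():
            # Join the unstructured hit with the structured customer record.
            region = CUSTOMERS[doc["customer"]]["region"]
            by_region[region].append(doc["id"])
    return {
        region: {"hits": len(ids), "documents": ids}
        for region, ids in by_region.items()
    }


print(report("delivery"))
```

Instead of a list of three links, the searcher gets one line per region with a hit count, which is the kind of summary the trend describes.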

Trend three: Combining different content types in search results

The sharp increase in rich media is bringing many existing enterprise search applications to their knees. Video is becoming the primary means of explaining a product or a service for many organisations. Google itself allows engineers to explain a service in a short video placed on YouTube.com. Indexing video content and making it searchable is going to force some vendors to exit the enterprise search sector, for the simple reason that their core competence is in text and the analysis of codified information (text or numbers). Analysing speech in a soundtrack requires a different set of competences. Deciphering the visual content in a photo or a video is even more complicated. At present this is not feasible in any productive manner, but recent face recognition technologies applied by Google, Facebook and Apple tell us that it is only a matter of time before enterprise search customers will require the capability to search for “photos of the CEO taken after midnight at Christmas parties between 2006 and 2010”. At the moment enterprise search vendors need accompanying text to perform such a feat, and such text is often missing.

That being said, enterprise vendors have made some headway as regards media files. They no longer present separate lists of search results for different types of content. Instead, enterprise systems display integrated and deduplicated results lists. Texts, images, videos etc. are all presented, sorted not by content type, but by relevance. A vendor with effective technology for this kind of “federated search” (searching multiple disparate content sources with one query) and deduplication is Vivisimo.
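A minimal Python sketch of the federated search and deduplication described above might look as follows. It is not how Vivisimo or any other vendor implements the feature. The `Hit` record, the `fingerprint` field used to recognise duplicates, the two stub sources and the assumption that relevance scores are directly comparable across sources are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str
    content_type: str
    title: str
    fingerprint: str   # hypothetical identifier shared by copies of the same item
    relevance: float   # assumed to be comparable across sources


def federated_search(query, sources):
    """Send one query to every source, drop duplicates, sort by relevance."""
    best = {}
    for search in sources:
        for hit in search(query):
            current = best.get(hit.fingerprint)
            # Keep only the most relevant copy of each duplicated item.
            if current is None or hit.relevance > current.relevance:
                best[hit.fingerprint] = hit
    # One integrated list, ordered by relevance rather than by content type.
    return sorted(best.values(), key=lambda hit: hit.relevance, reverse=True)


# Two stub sources standing in for real connectors (hypothetical data).
def intranet(query):
    return [
        Hit("intranet", "text", "Q3 sales report", "doc-1", 0.82),
        Hit("intranet", "image", "Sales chart", "img-7", 0.40),
    ]


def file_share(query):
    return [
        Hit("file share", "text", "Q3 sales report (copy)", "doc-1", 0.75),
        Hit("file share", "video", "Sales kickoff video", "vid-3", 0.64),
    ]


results = federated_search("sales", [intranet, file_share])
for hit in results:
    print(f"{hit.relevance:.2f}  {hit.content_type:5}  {hit.title}")
```

The copy of the sales report on the file share is dropped, and the text document, the video and the image end up in a single list. In practice the hard parts are exactly what this sketch assumes away: recognising duplicates and making scores from different systems comparable.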

Trend four: Social functions integrated into search

A final trend is the use of a platform to perform social functions. Enterprise search systems are struggling to cope with blog updates, RSS feeds, the stream of text snippets generated by Twitter, and constantly changing dynamic pages from, for instance, popular airline reservation systems. Some vendors let users create affinity groups around a particular topic (i.e. non-hierarchical groups of people sharing a common interest or cause). The search system then allows a person looking for information to identify these groups or search the content tagged by these groups. Some suppliers also support external social network content from sources such as Facebook or Twitter. Although many vendors assert the social functionality of their information retrieval systems, Microsoft has been among the most successful in making sales of a content platform with search plus collaboration features.

The buyer’s conundrum

The buyer has the choice between several complex and quite different products, and the sales pitches make each of them sound like just the thing for the organisation in question. Every search system vendor has licensees who sing the praises of the search system but, not surprisingly, every search system vendor also has clients who are deeply dissatisfied. Who can you believe? The reality is that the nature of information, and the inability of a potential licensee to state exactly what information access problem is to be solved, are roadblocks. So is the nature of the sales and marketing processes.

Most organizations today have experience with search and content processing systems. Users are familiar with the performance, interface, and features of services from Google, Microsoft, and some promising start-ups like Blekko.com and DuckDuckGo.com. Many vendors have worked to make their enterprise systems more like the free Web search services, but there are important differences of which many decision makers are unaware.

One downside of today’s systems is that licensees have to know what specific functions the search-and-retrieval system is to perform. Furthermore, licensees have to know what content the system is to process and how frequently the indexes of that content must be updated. Moreover, they have to be prescient and try to foresee future needs when looking for an enterprise search solution. Given that most of them lack experience with this particular type of search, that is not easy. The fact that this technology can, to some degree, be shaped to the needs of the buyer adds to the complexity. There is no one solution that fits all, and the buyer has to carefully weigh the costs against all the different features marketed by the supplier.

Most organizations already have multiple search-and-retrieval systems. A system may be built into Microsoft SharePoint. Enterprise applications like content management or customer support systems typically provide search. Some departments have acquired specialized search systems. In chemical and pharmaceutical companies, research units may license specialized systems to search chemical structure information. In the business development unit, a manager may have leased a Google Search Appliance or installed a department search system from Coveo, ISYS Search Software, or Fabasoft Mindbreeze. If the firm has an information centre, there may be a search system providing access to books, monographs, and other third party content. Most buyers would like to keep the existing search functionality, and preferably have it integrated into a complete enterprise search solution. But that is not always possible, given the diversity of software solutions, file types and tagging standards.

The dialogue with the vendors

Often there is no strong connection between the potential licensee and the users of the proposed system, i.e. between those who are buying the solution and those who are going to use it. A search vendor cannot easily bridge this gap. Because of this, it is a challenge to figure out what the system should do. Once armed with that information, the potential buyer can try to match the requirements to vendors’ offerings. The supplier, when presented with a list of requirements, works to demonstrate how its system meets them.

The problem, of course, is that words describing search are not the same as using the system in day-to-day business. Different “tribes” use different jargon. Not surprisingly, the vendor’s system often triggers pushback from the buyer. The buyer may indeed not know what is needed until the system is deployed. The vendor does not learn what the licensee really meant in the system requirements document until it is too late.


Some consultants talk up the importance of “bake offs” or “head-to-head” competitions, yet the financial climate is such that very few organizations have the time, expertise, or motivation to run objective tests. Testing the different offerings on the market against each other is very time consuming and requires special skills. The result is that almost anyone can set himself or herself up as a search expert and offer help with procurement. Hundreds of technology services firms assert their ability to work through engineering problems, and most list a number of search systems as part of their service base. Some of them are good; some of them are not. But even the best consultants need time and resources to go beyond what the prospective buyer says that she wants, and find out what she really needs.

How to make your new enterprise search system work

Enterprise search systems are complex, and even supposedly plug-and-play systems like the Google Search Appliance do not work out of the box. Once the new system is working on the specific data of the organisation, problems arise. The reality is that even engineers working on a search system may not know how to resolve a problem due to the extreme complexities “in the depths”, be they data formats or the techniques needed to merge different data sources. The only way to remediate some problems is to invest technical resources and time in solving (or, more commonly, working around) them.

There are some steps that will make this process easier, though. In general, more frequent indexing requires a more robust content processing system and sufficient bandwidth to identify, copy, and process the source material. Licensees must provide adequate resources so that the enterprise search system can operate in a satisfactory manner. Starve the search-and-retrieval system of bandwidth, memory, and CPU horsepower, and it will not perform at its optimum level. And with the types and amounts of data that are being produced in most of today’s businesses, storage space must in many cases be doubled every four to six months.

It is not cheap. But not being able to find the information you need, when you need it, may cost you more – much more!

This paper is partly based on a new monograph, The Landscape of Enterprise Search, written by Stephen E. Arnold and published by Pandia.com this year. The ebook contains discussions of all the major enterprise search vendors, including Autonomy, Endeca, Exalead, Google Search Appliance, Microsoft/Fast and Vivisimo. The ebook can be found at http://www.pandia.com/enterprise-search/

The paper was presented at the Online Information Conference 2011.
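As a back-of-the-envelope check on the storage figure quoted in the section on making the system work (space doubling every four to six months), the following Python sketch computes the implied growth. The function name and the starting volume of one terabyte are assumptions chosen for illustration; only the doubling periods come from the text.

```python
def storage_after(months, doubling_period_months, start_tb=1.0):
    """Storage needed after `months` if the requirement doubles every period."""
    return start_tb * 2 ** (months / doubling_period_months)


for period in (4, 6):
    one_year = storage_after(12, period)
    two_years = storage_after(24, period)
    print(f"doubling every {period} months: "
          f"x{one_year:.0f} after one year, x{two_years:.0f} after two years")
```

Doubling every six months means four times the storage after one year, and doubling every four months means eight times. Over two years the factors are 16 and 64, which gives a sense of why the resources set aside for an enterprise search system have to be planned well ahead.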