Finding The Needle in a Haystack

An experiment in improving the search experience for the end user.

“Ask and it will be given to you; seek and you will find; knock and the door will be opened to you.” (Matthew 7:7)

No one can deny that we live in the age of data. Data engulfs us, and its constant stream of information is channeled through machines, which in turn churn out useful aggregates and actionable intelligence. It is only logical to infer that efficient cataloging of data is necessary to promote discovery, knowledge and exploration of this massive data set.

Perhaps the most famous example of them all is Google search. Google is engaged in a constant endeavor to provide the most relevant search results given minimal search input. It takes a phenomenal effort to offer fast, reliable search at scale all over the globe.

“Whoa there, Google fan!” you might say. I’ll admit it: I use it for navigation, for searching through code documentation, and even for queries like “car door repaint self”. However, this article is not about Google, nor about its search algorithm. It is about trying to achieve a better user experience by providing a meaningful search interface with a small effort.

The Keyword Argument

It is of little surprise that most users search the web using a keyword-centric approach. The user breaks down the sentence in their mind and picks out a few prominent words that are likely to be present in the target article. For example, if I want to learn about the effects of Hurricane Harvey on Houston, my brain constructs “What are the effects of Hurricane Harvey on Houston?” (now try that on Google), whereas my hand types “hurricane harvey houston”.

Why is that?

1. It’s easier to type 😅

2. You are interested in a bunch of articles, not a single article that matches your query exactly.

3. You did not pay attention in English class. (It is ungrammatical.)

You and I implicitly understand that the search result is most likely a news/magazine article and there is probably no match for the exact sentence. Language is hard and ambiguous, but keywords are specific. In essence, your search engine does not understand what you are asking for. It does not understand what a hurricane means, not to mention the effects of a hurricane.

The sad part is that we, as users of the web, have grown accustomed to this style of search. Deconstructed word tokens are preferred in the quest for surfacing relevant results. The success of Elasticsearch and Solr, both built on Lucene, and of any other search solution backed by an inverted index, is a testament to this.
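
To make the inverted index idea concrete, here is a minimal sketch in JavaScript. It is a toy illustration, not how Lucene is implemented: the sample documents and the function names (tokenize, buildIndex, search) are made up for this example, and the search simply requires every keyword to be present.

```javascript
// Toy documents standing in for indexed articles (hypothetical data).
const documents = [
  { id: 1, text: "Hurricane Harvey floods Houston" },
  { id: 2, text: "Houston rebuilds after the hurricane" },
  { id: 3, text: "Harvey Mudd College admissions" },
];

// Lowercase the text and split it into word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Map each token to the set of document ids that contain it.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const token of tokenize(doc.text)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(doc.id);
    }
  }
  return index;
}

// Return the ids of documents that contain every keyword in the query.
function search(index, query) {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  let result = null;
  for (const token of tokens) {
    const ids = index.get(token) || new Set();
    result = result === null
      ? new Set(ids)
      : new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result].sort((a, b) => a - b);
}

const index = buildIndex(documents);
console.log(search(index, "hurricane harvey houston")); // ids: 1
console.log(search(index, "hurricane houston")); // ids: 1 and 2
console.log(search(index, "What are the effects of Hurricane Harvey on Houston?")); // no ids
```

The last query returns nothing because words like “effects” appear in none of the documents. The index only knows which words occur where; it has no notion of what a hurricane is.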

Well then, if you are not using Google search, what choice do you have? How about we journey together through an experimental (rather silly) implementation, and catalog its effects?

Sneak Peek

In ancient times, gates represented a city’s access point. Search is like a portal, or a gate to the vast amount of (walled) knowledge inside.

Search is a portal into the knowledge realm of your application. It is a unified interface that enables exploration of the data that should be exposed to the end user.

In other words, a search tool opens up your application for the end user to explore. Put even more simply, it tries to answer the question “What data do you have to offer?”

Show me the phones

As depicted above, we use the example of an application that allows users to search for phones. Pay close attention to the input queries. Note that there is no keyword matching here; what you are seeing is the input query being translated directly into a database representation (SQL in this case).

This might be a narrow use case, but it can easily be extended to examples other than phones. Also notice that the language support is not universal; it can only handle a very small subset of the English language. Yet even in its simple form, it offers a more meaningful user experience than a pure keyword matching approach.

The key difference here is that it understands that “Lenovo phones” are phones made by Lenovo, rather than matching some words in the description of the phone.
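
The difference can be sketched by generating both kinds of SQL for the same input. This is only an illustration under assumptions: the table phones, its columns make and description, the list of known makes, and the function names are all hypothetical and not taken from the demo.

```javascript
// Hypothetical schema: phones(id, name, make, country, price, description).
const KNOWN_MAKES = ["lenovo", "samsung", "apple"];

// Keyword approach: look for each word anywhere in the free-text description.
function keywordSql(query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return { sql: "SELECT * FROM phones", params: [] };
  return {
    sql: "SELECT * FROM phones WHERE " +
      words.map(() => "description LIKE ?").join(" AND "),
    params: words.map((word) => `%${word}%`),
  };
}

// Structured approach: recognise "<make> phones" and filter on the make column.
function structuredSql(query) {
  const match = /^(\w+) phones$/.exec(query.trim().toLowerCase());
  if (!match || !KNOWN_MAKES.includes(match[1])) return null;
  return { sql: "SELECT * FROM phones WHERE make = ?", params: [match[1]] };
}

console.log(keywordSql("Lenovo phones"));
console.log(structuredSql("Lenovo phones"));
```

The keyword version would also return any other phone whose description happens to mention Lenovo, and would miss a Lenovo phone whose description never uses the word “phones”. The structured version asks the database the question the user actually meant.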

Understanding the user’s input significantly boosts the user experience, as it feels much more natural to search for phones with a phrase like “phones made in korea or usa”. It mirrors the way our brain constructs the query in the first place.

Thus, the user experience of search improves drastically when the search tool is capable of parsing the input query. In simple terms, it is as if the search tool understands exactly what you need. Note that the input query supports composition: “phones made in korea” are also “phones”. Therefore, “samsung phones” is valid, and so is “samsung phones made in korea”. This recursive structure is what semantic parsing libraries use to derive a parse of natural language input.
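
The composition described above can be sketched as a tiny recursive parser. This is not the code behind the demo, nor the algorithm of any particular semantic parsing library; it assumes a hypothetical grammar with three rules (the bare word “phones”, a make in front of a phones phrase, and “made in <country> or <country>” after a phones phrase), hypothetical make and country lists, and hypothetical column names.

```javascript
const MAKES = ["lenovo", "samsung", "apple"];
const COUNTRIES = ["korea", "usa", "china"];

// Parse "<country> (or <country>)*" into a list of countries, or null.
function parseCountries(words) {
  const countries = [];
  for (let i = 0; i < words.length; i++) {
    const expectCountry = i % 2 === 0;
    if (expectCountry && COUNTRIES.includes(words[i])) countries.push(words[i]);
    else if (!expectCountry && words[i] === "or") continue;
    else return null;
  }
  // A valid list ends on a country, so it has an odd number of words.
  return words.length % 2 === 1 ? countries : null;
}

// Parse a phones phrase into a list of filters, or null if not understood.
function parsePhones(words) {
  // Base case: the bare noun "phones" carries no filters.
  if (words.length === 1 && words[0] === "phones") return [];

  // Rule: a make placed in front of any valid phones phrase.
  if (MAKES.includes(words[0])) {
    const rest = parsePhones(words.slice(1));
    if (rest !== null) return [{ column: "make", values: [words[0]] }, ...rest];
  }

  // Rule: "made in ..." placed after any valid phones phrase.
  const at = words.lastIndexOf("made");
  if (at > 0 && words[at + 1] === "in") {
    const countries = parseCountries(words.slice(at + 2));
    const rest = parsePhones(words.slice(0, at));
    if (countries !== null && rest !== null) {
      return [...rest, { column: "country", values: countries }];
    }
  }
  return null;
}

// Turn the parsed filters into a parameterised SQL statement.
function toSql(query) {
  const filters = parsePhones(query.toLowerCase().split(/\s+/).filter(Boolean));
  if (filters === null) return null;
  const params = [];
  const clauses = filters.map((filter) => {
    params.push(...filter.values);
    return `${filter.column} IN (${filter.values.map(() => "?").join(", ")})`;
  });
  const where = clauses.length ? " WHERE " + clauses.join(" AND ") : "";
  return { sql: "SELECT * FROM phones" + where, params };
}

console.log(toSql("phones"));
console.log(toSql("samsung phones made in korea or usa"));
console.log(toSql("phones samsung")); // null: not a phrase this grammar accepts
```

Because parsePhones calls itself on the smaller phrase, every rule only has to handle the one piece it adds. “samsung phones made in korea” parses because “phones made in korea” parses, which in turn parses because “phones” does.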

Needle

Search is not simply about viewing the list of all phones. How about we get creative and ask some questions related to the make and the country it is made in?

1. Show me the Lenovo phones priced under 20,000 INR

2. How about Apple phones under 50k or Samsung phones over 30k?

and so on…
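
Price questions like these can be handled in the same spirit. The sketch below is an assumption-laden illustration rather than the demo’s code: it assumes a hypothetical price column stored in INR, treats a trailing “k” as thousands, ignores lead-ins such as “show me the” and “how about”, and does not check the make against a known list.

```javascript
// Turn "20,000", "50k" or "30000" into a number; return null for anything else.
function parseAmount(text) {
  const match = /^(\d[\d,]*)(k?)$/i.exec(text);
  if (!match) return null;
  const base = Number(match[1].replace(/,/g, ""));
  return match[2] ? base * 1000 : base;
}

// Parse one alternative such as "apple phones under 50k".
function parseAlternative(phrase) {
  const pattern = /^(\w+) phones (?:priced )?(under|over) (\S+)(?: inr)?$/;
  const match = pattern.exec(phrase);
  if (!match) return null;
  const amount = parseAmount(match[3]);
  if (amount === null) return null;
  const operator = match[2] === "under" ? "<" : ">";
  return {
    clause: `(make = ? AND price ${operator} ?)`,
    params: [match[1], amount],
  };
}

// Split the question on "or", parse each side, and join the clauses with OR.
function priceQueryToSql(query) {
  const cleaned = query
    .toLowerCase()
    .replace(/\?/g, "")
    .trim()
    .replace(/^(show me the|how about)\s+/, "");
  const parts = cleaned.split(/\s+or\s+/).map((p) => parseAlternative(p.trim()));
  if (parts.some((part) => part === null)) return null;
  return {
    sql: "SELECT * FROM phones WHERE " + parts.map((p) => p.clause).join(" OR "),
    params: parts.flatMap((p) => p.params),
  };
}

console.log(priceQueryToSql("Show me the Lenovo phones priced under 20,000 INR"));
console.log(priceQueryToSql("How about Apple phones under 50k or Samsung phones over 30k ?"));
```

Each side of the “or” becomes its own parenthesised clause, so the second question turns into one statement that covers both alternatives.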

If you observe closely, you can see how each search is guided by the results of the previous one. Compare this to the search approach seen on Amazon, where filtering is commonly done via an array of checkboxes and sliders. Surely they have their place: for example, it is hard to type “apple phones with a warranty of 2 years or is refurbished and sold by xy retailer”. But they are not necessary for simple queries like “lenovo phones”. A good search experience should take advantage of both approaches, showing the filters after the user has sufficiently narrowed down the original search using a suitable natural language phrase.

Haystack

You might wonder what framework is working behind the scenes. The answer is none. This entire demo was written in under 400 lines of JavaScript, along with a stand-up Express.js server to talk to MySQL. The objective of this demo is not to criticize you or to say that you are not doing enough when it comes to implementing search in your web application. It is to ponder the possibility of improving the end user’s search experience with a small incremental effort.

I hope you find it thought-provoking.

Epilogue

Things are changing fast, though. Try typing “samsung phones under 10000” into Google. Search engines are using machine learning and NLP to understand the user’s intent a little better every day. We live in the age of chat bots and conversational user interfaces (Apple Siri, Amazon Echo, Google Now). Hopefully, in the near future, we will have a truly seamless integration of a conversational style search interface with the applications we build.

Code

Watch this space for some updates: https://github.com/tmpaul06/sneak-peek

Related

ClearGraph
Airbnb’s tool for searching data.

Tools used

Illustrations drawn with Boxy SVG
GIFs made with QuickTime, gifsicle and ImageMagick.

Tharun Paul