Behind Google’s Data Buying Binge

Friday, August 6th, 2010

Google used to be based on a simple premise.  The web is a big place, we help you find the relevant piece of information for your question and direct you there — as quickly as possible.  You don’t consume information on Google, you simply find it.  Users only spend 3.4% of their time on search engines.  This is changing.  Having the best algorithm is no longer enough.  Google is investing heavily to own the data across key vertical categories and slowly becoming a destination experience for consuming this data.  Unless you own and curate rich data-sets there are natural limits to both the search relevancy and experience you can provide.  Google is quickly adapting by buying access to vast and rich data sets. Let’s look at their recent buying binge:

  • Travel: Acquired ITA Software which aggregates flight routes and pricing information and enables advanced search capabilities on travel data.
  • Local: Attempted to purchase Yelp.com, and after that fell through they ramped up investment to build their own local data set.  Additionally, Google has been investing for years in maps with Street View and satellite imagery.
  • Metaweb (Freebase): Structured data on people, places, and things.

When Google focuses on a category like local, this is what happens to the search experience.  For a query like Delfina Pizzeria (an excellent pizza place in San Francisco), rather than linking to the best sources of information like Yelp, Zagat, SF Chronicle, etc., Google first pushes you towards Google Places. It currently includes a mix of their own content and licensed content from other sources.  What happens when they have their own pictures, reviews, and check-in data? Do they really need to license all this other content from the likes of Zagat and SF Chronicle?

I expect Google to complete the buying binge by acquiring companies with rich data sets across other highly monetizable categories:

  • Shopping: Amazon, and eBay to a lesser extent, are capturing a significant percent of query share in a very lucrative area.  If users bypass Google and go directly to Amazon for their product queries this represents a serious threat to Google’s business.  They need to acquire a company with a huge selection of product data: rich and structured product attributes (size, color), inventory availability, pricing and promotions, and user reviews.  I can’t think of one company (beyond Amazon) that has done this well at the scale Google would need.  This may require acquiring multiple companies.
  • Real-Estate: Is somebody like Trulia next?

Enterprise vs. Consumer Products II: Managing Different Cuisines

Tuesday, December 30th, 2008

Continuing the series on managing enterprise vs. consumer software, one of the most significant changes a product manager needs to quickly grasp is the notion that great consumer services have a machine learning component to them while most enterprise systems are deterministic by design. This is important as these are two very different types of cuisine which require one to change their mindset and optimization priorities. Yet, once one becomes facile with both there are opportunities to take elements of each and infuse them into the other.

What’s The Difference

Let’s briefly define machine learning and deterministic systems. While there are examples of these types of systems in aviation and network systems I will confine this definition to the software application domain. Machine learning systems learn from the input of users and automatically correct themselves with limited to no human intervention. On day one they are not perfect, but a well designed one with a positive feedback loop will continuously improve. Web search is a good example of a machine learning system: it leverages implicit actions like clicks, time spent, and query refinements, and re-ranks the results (both paid and algorithmic) based on these implicit signals. The set of results (output) for a given query (input) will change over time as the system weeds out the less relevant results.
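To make the feedback loop concrete, here is a minimal Python sketch of click-based re-ranking. It is a toy under stated assumptions, not how any real search engine works: the class name, the learning rate, and the reward and penalty values are all made up for illustration.

```python
from collections import defaultdict


class ClickFeedbackRanker:
    """Toy re-ranker: results that get clicked for a query drift upward."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        # (query, url) -> learned adjustment added to the base score
        self.adjustment = defaultdict(float)

    def record_impression(self, query, url, clicked):
        # Reward a click, mildly penalise a result that was shown but skipped.
        # Both values are arbitrary illustrative choices.
        signal = 1.0 if clicked else -0.2
        self.adjustment[(query, url)] += self.learning_rate * signal

    def rank(self, query, base_scores):
        # base_scores: {url: score from a static ranking function}
        return sorted(
            base_scores,
            key=lambda url: base_scores[url] + self.adjustment[(query, url)],
            reverse=True,
        )


ranker = ClickFeedbackRanker()
base = {"a.example": 0.50, "b.example": 0.45}
before = ranker.rank("pizza", base)
for _ in range(3):
    ranker.record_impression("pizza", "a.example", clicked=False)
    ranker.record_impression("pizza", "b.example", clicked=True)
after = ranker.rank("pizza", base)
```

The same input ("pizza") produces a different output once users have voted with their clicks, which is exactly the property a deterministic system is designed not to have.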

Deterministic systems, by contrast, execute a defined process, and any modifications to the process require changes in the underlying product. For example, an order entry system for cable TV service takes an expected input from the user (address, cable package, installation time-frame, etc.), processes it, and returns the time of installation and a confirmation number. As a product manager designing or working on the implementation side of an enterprise system, even a minor error in a business rule acting on a data field can cause significant harm downstream, so one rightfully becomes paranoid about data integrity issues.

Most enterprise applications optimize for accuracy and precision. Each year Comcast processes millions of orders – everything from a simple new service order to a more complicated change service order. An order entry error rate of even 2% will cost Comcast hundreds of millions of dollars as trucks roll to the wrong address or at the wrong time. Each order must capture a very specific set of data in a specific format (i.e. high accuracy), send the data to various downstream systems (billing, scheduling, network provisioning) and repeat this exact process millions of times a year (i.e. high precision).
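A deterministic order entry check might look something like the following Python sketch. The field names, the package list, and the date format are hypothetical; the point is only that the same order always produces the same answer, and that changing a rule means changing the code.

```python
import re
from dataclasses import dataclass

# Hypothetical package names, for illustration only.
ALLOWED_PACKAGES = {"basic", "standard", "premium"}


@dataclass(frozen=True)
class CableOrder:
    address: str
    package: str
    install_date: str  # expected as YYYY-MM-DD


def validate_order(order):
    """Return a list of rule violations; an empty list means the order is accepted."""
    errors = []
    if not order.address.strip():
        errors.append("address is required")
    if order.package not in ALLOWED_PACKAGES:
        errors.append(f"unknown package: {order.package}")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", order.install_date):
        errors.append("install_date must be YYYY-MM-DD")
    return errors


good = CableOrder("123 Main St", "basic", "2009-01-15")
bad = CableOrder("", "gold", "Jan 15")
```

Here `validate_order(good)` returns an empty list and `validate_order(bad)` reports three violations, every single time.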

Now contrast that with a web search engine, which is an example of a machine learning system. Notwithstanding the significant improvement in web search, a user’s query returns hundreds of thousands of results, and of these only the first ten or so are relevant to their intent – clearly search is ripe for more innovation. Whereas deterministic enterprise systems are meant to handle consistent inputs and repeatable tasks, machine learning systems such as a web search engine are meant to handle unique inputs and ambiguous intent. More specifically, 25% of web-search queries are unique – i.e. the search engine has never seen that query before. Furthermore, the user’s intent is oftentimes highly ambiguous, e.g. “lions fight”: is the user looking for a recent fight at the Detroit Lions game, or are they interested in understanding how lions fight with one another?

Infusion

So, how can a product manager with knowledge and experience across both of these “cuisines” infuse elements of one into the other? Simply put, we can bring machine learning techniques into the enterprise world to build better enterprise applications, and vice versa. Let’s look at two examples.

Case I:

Smart Drop-Down Menus come to Web Search

As established above, one of the advantages that enterprise systems have is consistent input. Obviously if a search engine knew every possible query a user could input the results would be perfect. While that is not possible, at least for now, we can improve the input on two levels: by reducing query uniqueness and ambiguity. A little over a year ago Yahoo! launched SearchAssist. It works as follows: as the user begins to type their query the SearchAssist technology engages and gently drops down an assistance tray of potential similar queries. The user can either select one of the query suggestions from the drop-down tray or continue typing. Provided that the query suggestion worked, this helps users clarify their intent (i.e. reduces ambiguity), provides a more predictable set of query patterns (the user is likely to select from the existing set of queries that are presented), and saves users some time (hitting enter is faster than typing seven or eight additional characters). Extending our analogy above, in many ways this is similar to a drop-down menu on an order entry form for Comcast cable service.
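The core idea of a suggestion tray can be sketched in a few lines of Python. This is not how SearchAssist is implemented (I am only assuming a log of past queries and simple prefix matching); the `suggest` function and its parameters are hypothetical.

```python
from collections import Counter


def suggest(prefix, query_log, limit=3):
    """Return the most frequent past queries that start with the typed prefix."""
    prefix = prefix.lower().strip()
    if not prefix:
        return []
    counts = Counter(q.lower() for q in query_log)
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Most frequent first; ties broken alphabetically so the output is stable.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


log = ["lions fight video", "lions game score", "lions game score", "lion king"]
suggestions = suggest("lions", log)
```

Every time a user picks a suggestion instead of typing something new, the query log becomes a little more concentrated, which is what nudges a free-text box towards the predictability of a drop-down menu.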

Case II:

Building Robust CRM Data Sets from Unstructured Email Data

Pattern recognition and machine learning are hallmarks of web search systems. For example, once a web crawler downloads a web page, extractors identify design elements that help separate the header/footer and navigational elements of the page from the content, product description and price, amongst others. With a large enough training set the machine can start to detect these patterns accurately. Making sense of unstructured content (services like Dapper are simplifying this for all of us) is an essential element of building a great search engine – the better the search engine understands each piece of data on a page, the better the search engine.

Infusing some of these techniques into enterprise systems can significantly improve data freshness and quality. CRM sales systems are notorious for their lack of data: unless sales executives prod their sales reps with a stick or carrot they rarely use these software tools, and when ultimately forced to do so, they enter the minimum set of data to be compliant. Want to know how many product issues a customer is having or the status of a renewal contract? This valuable yet unstructured data sits siloed in email and attachments.

What we want to develop is a tool (which I will refer to as the “DataGenie”) which crawls all sales reps’ email data, extracts the valuable data, and generates new data in the CRM sales system. Extracting this unstructured data is complicated, but there is some low hanging fruit to start with: data elements such as the name, role, email address, dates, priority and subject are all formatted data elements that can be easily pulled from email messages. Now, in decreasing order of data detection accuracy, let’s supplement it with richer data sets:
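As a rough illustration of the low hanging fruit, Python’s standard `email` package can already pull the formatted fields out of a raw message. The sample message and the output field names below are invented, and role is left out because it is not a standard header.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Invented sample message, for illustration only.
RAW = """\
From: Pat Buyer <pat@customer.example>
To: rep@vendor.example
Subject: Renewal contract questions
Date: Mon, 15 Dec 2008 09:30:00 -0800
X-Priority: 1

Can we schedule a call this week?
"""


def extract_fields(raw_message):
    msg = message_from_string(raw_message)
    name, address = parseaddr(msg["From"])
    return {
        "contact_name": name,
        "contact_email": address,
        "subject": msg["Subject"],
        "sent_at": parsedate_to_datetime(msg["Date"]),
        # Not every mail client sets this header, so it may be None.
        "priority": msg.get("X-Priority"),
    }


fields = extract_fields(RAW)
```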

Detecting Addresses and Phone Numbers

Consumer mail applications like Yahoo! Mail and Gmail already detect these data types, and their accuracy is reliable. This data can then be validated against the user’s contact address book or, more generally, the company’s internal CRM address book.
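A simplified Python sketch of detecting phone numbers and checking them against an address book follows. The regular expression assumes North American formats only, and the function names are hypothetical; the detectors in real mail products are certainly more thorough.

```python
import re

# Assumes North American formats such as 415-555-0100 or (415) 555-0100.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})\b")


def normalize(number):
    """Strip everything except digits so different formats compare equal."""
    return re.sub(r"\D", "", number)


def find_phone_numbers(text, address_book):
    """Return (digits, known) pairs; known is True if the number is in the address book."""
    known_numbers = {normalize(n) for n in address_book}
    found = []
    for match in PHONE_RE.finditer(text):
        digits = "".join(match.groups())
        found.append((digits, digits in known_numbers))
    return found


body = "Call me at (415) 555-0100 or my cell 408.555.0199."
book = ["415-555-0100"]
results = find_phone_numbers(body, book)
```

A number that is detected but not found in the address book is a candidate for a new CRM contact field.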

Events and milestones

Let’s look at a few examples of things we can expect to see in email threads which can be detected fairly reliably; mail services like Yahoo! Mail are already doing so.

  • “product demonstration next Tuesday at 10AM in our offices”
  • “all RFPs will be due on Friday December 19th by 5PM PST”
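Phrases like the two above can be caught with a fairly small pattern. The Python sketch below only looks for a weekday followed later by a clock time. It is a deliberately naive assumption about how such a detector could start, not a description of what any mail service does.

```python
import re

DAY = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
TIME = r"\d{1,2}(?::\d{2})?\s?(?:AM|PM)"
# A weekday, then (lazily) anything, then a clock time such as 10AM or 5:30 PM.
EVENT_RE = re.compile(rf"\b(?P<day>{DAY})\b.*?\b(?P<time>{TIME})", re.IGNORECASE)


def find_event(sentence):
    """Return the weekday and time mentioned in a sentence, or None."""
    match = EVENT_RE.search(sentence)
    if not match:
        return None
    return {
        "day": match.group("day"),
        "time": match.group("time").upper().replace(" ", ""),
    }


demo = find_event("product demonstration next Tuesday at 10AM in our offices")
rfp = find_event("all RFPs will be due on Friday December 19th by 5PM PST")
```

Resolving “next Tuesday” to a calendar date would need the message’s sent date as well, which is one reason accuracy drops as we move down this list.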

Deriving Issue Type:

One can auto-generate dictionaries from the company’s website. For example, for a refrigeration company these would include terms like “technical account manager”, “24/7 support” and product names. Leveraging these dictionaries, the detectors can determine what product is under consideration and whether it is a sales or product/technical issue.
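Here is a small Python sketch of dictionary-based detection. The dictionaries are hand-written stand-ins for ones that would be generated from a website, and the product names and most of the terms are invented for the example.

```python
# Hand-written stand-ins for auto-generated dictionaries (all terms illustrative).
PRODUCT_TERMS = {"coldline 3000", "frostguard"}  # hypothetical product names
SUPPORT_TERMS = {"technical account manager", "24/7 support", "outage"}
SALES_TERMS = {"quote", "pricing", "renewal", "rfp"}


def classify(message):
    """Guess which products are mentioned and whether the issue is sales or technical."""
    text = message.lower()
    products = sorted(t for t in PRODUCT_TERMS if t in text)
    support_hits = sum(t in text for t in SUPPORT_TERMS)
    sales_hits = sum(t in text for t in SALES_TERMS)
    if support_hits > sales_hits:
        issue_type = "technical"
    elif sales_hits > support_hits:
        issue_type = "sales"
    else:
        issue_type = "unknown"
    return {"products": products, "issue_type": issue_type}


technical = classify(
    "Our FrostGuard unit is down, please loop in our technical account manager"
)
sales = classify("Can you send pricing for the renewal?")
```

Plain substring matching is crude (a short term like “rfp” could match inside a longer word), so a real detector would want word-boundary matching at the very least.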

Building Priority via Sentiment Analysis

Given that users tend to misuse the priority setting on emails there are other ways to determine priority from emails. Sentiment analysis technologies can detect the tone of the message based on the use of character types (bold, exclamation points) and keywords (unacceptable, failure, etc.).

There tends to be a fair number of false positives (e.g. “49ers really suck this year… horrible QB” may register as negative sentiment), but this technology is improving as startups like BuzzLogic and BlogPulse experiment with it, and companies like P&G and ConAgra Foods are looking to sentiment analysis techniques to gauge consumer response to their brands in blogs and message boards.
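A crude version of this kind of scoring fits in a few lines of Python. The keyword list and weights are illustrative assumptions, and since plain text carries no bold formatting I use all-caps words as a stand-in for emphasis. Real sentiment analysis products are far more sophisticated than this.

```python
import re

# Illustrative keyword list; a real system would learn or curate this.
NEGATIVE_KEYWORDS = {"unacceptable", "failure", "horrible", "suck"}


def priority_score(text):
    """Crude urgency score: negative keywords plus exclamation marks and shouting."""
    words = re.findall(r"[a-z']+", text.lower())
    keyword_hits = sum(1 for w in words if w in NEGATIVE_KEYWORDS)
    exclamations = text.count("!")
    # Words of three or more capital letters stand in for bold emphasis.
    shouted = len(re.findall(r"\b[A-Z]{3,}\b", text))
    return 2 * keyword_hits + exclamations + shouted


angry = priority_score("This outage is unacceptable! Third failure this month!")
calm = priority_score("Thanks, see you Tuesday.")
football = priority_score("49ers really suck this year... horrible QB")
```

The football message scores above zero even though it has nothing to do with the customer relationship, which is exactly the false positive problem: keywords alone cannot tell what the anger is about.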

Once “DataGenie” extracts and populates the data here is what it would look like to a user of the CRM sales system:

[Image: DataGenie]

On day one, the data generated by “DataGenie” will not be perfect, yet it’s an improvement over the status quo of limited and stale data. So, how do we improve the data with some fairly simple positive feedback loops? Simple controls such as edit, delete, and add, or the absence of any action, can provide important hints. Let’s see how we can interpret these actions if the user…

  • Adds to the record then the underlying data is solid — we can assume that “DataGenie”
  • Edits data elements (Events + Milestones) then reliability is low. With enough edits on certain data elements, and the before and after values, we can pick up patterns. For example, the extractor may be truncating important event or milestone data.
  • Deletes a data element within the record – data may not be associated properly. For example, the events & milestones data is not associated with this contract renewal issue. Why bother editing the data when the entire thing is wrong?
  • Takes no action. This depends on the overall level of user engagement: for a heavy user (lots of delete and edit actions) the absence of any action could mean that the data is reliable.
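The interpretation above can be turned into a simple scoring loop. In this Python sketch the weights, the starting score of 0.5, and the class name are all hypothetical; the only thing taken from the list is the direction of each signal.

```python
from collections import defaultdict

# Hypothetical weights: how much each user action shifts trust in a field's extractor.
ACTION_WEIGHTS = {"add": 0.10, "edit": -0.10, "delete": -0.25}
NO_ACTION_WEIGHT_HEAVY_USER = 0.05


class ReliabilityTracker:
    def __init__(self):
        # field name -> reliability in [0, 1], starting from a neutral 0.5
        self.score = defaultdict(lambda: 0.5)

    def record(self, field, action, heavy_user=False):
        if action == "none":
            # Silence only counts as approval from users who normally correct data.
            delta = NO_ACTION_WEIGHT_HEAVY_USER if heavy_user else 0.0
        else:
            delta = ACTION_WEIGHTS[action]
        # Clamp so the score always stays within [0, 1].
        self.score[field] = min(1.0, max(0.0, self.score[field] + delta))
        return self.score[field]


tracker = ReliabilityTracker()
after_edit = tracker.record("events", "edit")
after_delete = tracker.record("events", "delete")
quiet_light_user = tracker.record("contact", "none", heavy_user=False)
quiet_heavy_user = tracker.record("contact", "none", heavy_user=True)
```

Fields whose score keeps sinking are the ones whose extractors deserve attention first, and the before and after values of the edits are the training data for fixing them.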

To the best of my knowledge “DataGenie” does not exist – if you are aware of a product that does this or something similar drop a comment below.

These are just two of the many ways in which a product manager can take their learnings from the enterprise world (highly deterministic systems) and apply them in the consumer software space (bias for machine learning systems) and vice-versa. If you have other interesting examples please share.

Inquisitor Comes to Life for Firefox and IE

Thursday, October 30th, 2008

After acquiring Inquisitor in the early summer we pulled together a team of developers who love Inquisitor to extend Inquisitor to Firefox and IE. The Inquisitor team includes folks in HQ Sunnyvale, Bangalore, Orlando, and Vancouver.  Each project and team is different – there were three things that we did well that made this product.

Passion for the Product

People who are genuinely interested in the product they are building are 5x better than people who are just as smart and capable but only mildly interested in the product.  We have people like David Watanabe, Paul Alex Broman, Giju Eldhose, and Priya Vadivel who are passionate about Inquisitor and it shows in the effort and the end-product.

Details, Details, Details

The team’s passion to get every detail right (no pixel or bug was considered too small) was incredibly satisfying.  It is so tempting to add more features, but until you nail every feature in your product, which is really hard to do, don’t start implementing new ones. And if you don’t nail a feature or don’t believe you have to, then you should strongly consider removing it from the product before it becomes dead weight.

Working with Firefox

Working with the Mozilla team during the development process improved the end-product.  Folks like Basil Hashem, Rey Bango and Arch provided excellent suggestions and a thorough review of the product.  While people give the Apple AppStore and Firefox grief for their rigorous app and extension approval processes, it is actually a super-smart thing to do in the early years of a developer marketplace – Facebook is learning this the hard way.  In the early years, I say the more rigorous the better, as it will quickly separate the developers who are committed to building great products from those simply looking to do something trivial.

So, whether you prefer IE, Firefox, or Safari, Inquisitor is now available on all of them. If you are in the market for a faster and smarter search experience from your browser, check it out.

We are still in the early days of web search

Wednesday, March 7th, 2007

Tom Foremski asks some rhetorical questions regarding the state of web search, and while some of his points are inaccurate (e.g. Google is not encouraging your avg. publisher to upload content into Google Base) his larger point is a fair one: search engines still need the help of humans (in this case publishers, webmasters, etc.).

Given how much room for innovation is left, combined with the fact that there are still billions of dollars of ad spend that will come online in the next few years, these calls and rumors of consolidation of the major players (e.g. Microsoft buying Yahoo!, Google buying Yahoo!, Microsoft buying Google, and every other combination you can possibly think of) are somewhat silly.