Enterprise vs. Consumer Products II: Managing Different Cuisines

Tuesday, December 30th, 2008

Continuing the series on managing enterprise vs. consumer software, one of the most significant changes a product manager needs to grasp quickly is that great consumer services have a machine learning component, while most enterprise systems are deterministic by design. This matters because these are two very different types of cuisine, and moving between them requires one to change one’s mindset and optimization priorities. Yet once one becomes facile with both, there are opportunities to take elements of each and infuse them into the other.

What’s The Difference

Let’s briefly define machine learning and deterministic systems. While there are examples of these types of systems in aviation and network systems, I will confine this definition to the software application domain. Machine learning systems learn from the input of users and automatically correct themselves with limited to no human intervention. On day one they are not perfect, but a well-designed one with a positive feedback loop will continuously improve. Web search is a good example of a machine learning system: it leverages implicit actions like clicks, time spent, and query refinements, and re-ranks the results (both paid and algorithmic) based on these implicit signals. The set of results (output) for a given query (input) will change over time as the system weeds out the less relevant results.
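
To make the feedback loop concrete, here is a minimal Python sketch of click-based re-ranking. It is a toy under stated assumptions, not how any real search engine works: the class name `ClickFeedbackRanker`, the `learning_rate` parameter, the page names, and the base scores are all hypothetical.

```python
from collections import defaultdict


class ClickFeedbackRanker:
    """Toy re-ranker: results that get clicked for a query drift upward."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate  # hypothetical tuning knob
        # (query, result) -> learned score adjustment, starting at 0.0
        self.adjustment = defaultdict(float)

    def record_impression(self, query, shown, clicked):
        """Nudge clicked results up and shown-but-skipped results down."""
        for result in shown:
            signal = 1.0 if result in clicked else -1.0
            self.adjustment[(query, result)] += self.learning_rate * signal

    def rank(self, query, base_scores):
        """Sort results by base relevance plus the learned adjustment."""
        return sorted(
            base_scores,
            key=lambda r: base_scores[r] + self.adjustment[(query, r)],
            reverse=True,
        )


ranker = ClickFeedbackRanker()
base = {"page_a": 0.9, "page_b": 0.8, "page_c": 0.7}  # made-up scores
before = ranker.rank("lions fight", base)

# Simulate five users who all skip page_a and page_b and click page_c.
for _ in range(5):
    ranker.record_impression("lions fight", shown=list(base), clicked={"page_c"})
after = ranker.rank("lions fight", base)
```

With no human intervention, `page_c` moves from last place in `before` to first place in `after`, which is the "same input, different output over time" behavior described above.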

Deterministic systems, by contrast, execute a defined process, and any modification to that process requires changes in the underlying product. For example, an order entry system for cable TV service takes an expected input from the user (address, cable package, installation time frame, etc.), processes it, and returns the time of installation and a confirmation number. For a product manager designing or working on the implementation side of an enterprise system, even a minor error in a business rule acting on a data field can cause significant harm downstream, so one rightfully becomes paranoid about data integrity issues.
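
As a rough illustration, a deterministic order entry step might look like the Python sketch below. The field names, the package list, and the confirmation number format are assumptions made up for the example; the point is only that the rules are fixed in code and the same input always produces the same output.

```python
from dataclasses import dataclass

VALID_PACKAGES = {"basic", "standard", "premium"}  # hypothetical package names


@dataclass(frozen=True)
class Order:
    address: str
    package: str
    install_window: str  # e.g. "2008-12-30 AM"


def validate_order(order):
    """Return a list of rule violations; an empty list means the order is clean."""
    errors = []
    if not order.address.strip():
        errors.append("address is required")
    if order.package not in VALID_PACKAGES:
        errors.append(f"unknown package: {order.package!r}")
    if not order.install_window.endswith(("AM", "PM")):
        errors.append("install window must end in AM or PM")
    return errors


def submit_order(order, sequence_number):
    """Reject bad data up front; otherwise return a confirmation number."""
    errors = validate_order(order)
    if errors:
        raise ValueError("; ".join(errors))
    return f"CONF-{sequence_number:06d}"


confirmation = submit_order(Order("1 Main St", "basic", "2008-12-30 AM"), 42)
```

Changing what counts as a valid order means editing `validate_order` and shipping a new release, which is exactly the "changes in the underlying product" point made above.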

Most enterprise applications optimize for accuracy and precision. Each year Comcast processes millions of orders, everything from a simple new service order to a more complicated change service order. An order entry error rate of even 2% will cost Comcast hundreds of millions of dollars as trucks roll to the wrong address or at the wrong time. Each order must capture a very specific set of data in a specific format (i.e. high accuracy), send the data to various downstream systems (billing, scheduling, network provisioning), and repeat this exact process millions of times a year (i.e. high precision).

Now contrast that with a web search engine, which is an example of a machine learning system. Notwithstanding the significant improvement in web search, a user’s query returns hundreds of thousands of results, and of these only the first ten or so are relevant to the user’s intent. Clearly search is ripe for more innovation. Whereas deterministic enterprise systems are meant to handle consistent inputs and repeatable tasks, machine learning systems such as a web search engine are meant to handle unique inputs and ambiguous intent. More specifically, 25% of web search queries are unique, i.e. the search engine has never seen that query before. Furthermore, the user’s intent is often highly ambiguous. Take “lions fight”: is the user looking for a recent fight at the Detroit Lions game, or are they interested in understanding how lions fight with one another?

Infusion

So how can a product manager with knowledge and experience across both of these very different product “cuisines” infuse elements of one into the other? Simply put, we can bring machine learning techniques into the enterprise world to build better enterprise applications, and vice versa. Let’s look at two examples.

Case I:

Smart Drop-Down Menus come to Web Search

As established above, one of the advantages that enterprise systems have is consistent input. Obviously, if a search engine knew every possible query a user could input, the results would be perfect. While that is not possible, at least for now, we can improve the input on two levels: by reducing query uniqueness and by reducing ambiguity. A little over a year ago Yahoo! launched SearchAssist. It works as follows: as the user begins to type their query, the SearchAssist technology engages and gently drops down an assistance tray of potentially similar queries. The user can either select one of the query suggestions from the drop-down tray or continue typing. Provided that the query suggestion works, this helps users clarify their intent (i.e. reduces ambiguity), provides a more predictable set of query patterns (the user is likely to select from the existing set of queries that are presented), and saves users some time (hitting enter is faster than typing seven or eight additional characters). Extending our analogy above, in many ways this is similar to a drop-down menu on an order entry form for Comcast cable service.
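
I do not know how SearchAssist is actually implemented, but the general idea of suggesting popular past queries for a typed prefix can be sketched in a few lines of Python. The `QuerySuggester` class and the sample query log below are hypothetical.

```python
from collections import Counter


class QuerySuggester:
    """Suggest previously seen queries that start with what the user typed."""

    def __init__(self, query_log):
        # Count how often each normalized query appears in the log.
        self.counts = Counter(q.strip().lower() for q in query_log)

    def suggest(self, prefix, limit=5):
        prefix = prefix.strip().lower()
        if not prefix:
            return []
        matches = [(q, n) for q, n in self.counts.items() if q.startswith(prefix)]
        # Most popular first; ties broken alphabetically for stable output.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [q for q, _ in matches[:limit]]


log = [  # made-up query log
    "detroit lions fight",
    "detroit lions fight",
    "detroit lions schedule",
    "how lions fight",
    "detroit weather",
]
suggester = QuerySuggester(log)
suggestions = suggester.suggest("detroit li")
```

Here `suggestions` holds the two "detroit lions" queries, with the more frequent one first. A production system would use a prefix index rather than scanning every query, but the effect on the input is the same: users are steered toward a known, less ambiguous set of queries.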

Case II:

Building Robust CRM Data Sets from Unstructured Email Data

Pattern recognition and machine learning are hallmarks of web search systems. For example, once a web crawler downloads a web page, extractors identify design elements that help them separate the header, footer, and navigational elements of the page from the content, product description, and price, amongst others. With a large enough training set the machine can start to detect these patterns accurately. Making sense of unstructured content (services like Dapper are simplifying this for all of us) is an essential element of building a great search engine: the better the search engine understands each piece of data on a page, the better the search engine.

Infusing some of these techniques into enterprise systems can significantly improve data freshness and quality. CRM sales systems are notorious for their lack of data. Unless sales executives prod their sales reps with a stick or carrot, the reps rarely use these software tools, and when ultimately forced to do so, they enter the minimum set of data needed to be compliant. Want to know how many product issues a customer is having, or the status of a renewal contract? This valuable yet unstructured data sits siloed in email and attachments.

What we want to develop is a tool (which I will refer to as the “DataGenie”) that crawls all of the sales reps’ email data, extracts the valuable data, and generates new data in the CRM sales system. Extracting this unstructured data is complicated, but there is some low-hanging fruit to start with: data elements such as the name, role, email address, dates, priority and subject are all formatted data elements that can be easily pulled from email messages. Now, in decreasing order of data detection accuracy, let’s supplement it with richer data sets:
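
For the low-hanging fruit, Python’s standard `email` package already does most of the work. The sketch below is only an illustration: the function name `extract_header_fields`, the sample message, and the output field names are hypothetical, and it assumes the message carries the widely used but non-standard `X-Priority` header. Role is not an email header, so it would have to come from somewhere else, such as a signature block.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def extract_header_fields(raw_message):
    """Pull the already-structured fields out of a raw email message."""
    msg = message_from_string(raw_message)
    name, address = parseaddr(msg.get("From", ""))
    date_header = msg.get("Date")
    return {
        "sender_name": name,
        "sender_email": address,
        "subject": msg.get("Subject", ""),
        "sent_at": parsedate_to_datetime(date_header) if date_header else None,
        # X-Priority is a convention, not a standard; default when absent.
        "priority": msg.get("X-Priority", "normal"),
    }


raw = (  # made-up sample message
    "From: Pat Example <pat@example.com>\n"
    "To: rep@example.com\n"
    "Subject: Contract renewal question\n"
    "Date: Fri, 19 Dec 2008 17:00:00 -0800\n"
    "X-Priority: 1\n"
    "\n"
    "All RFPs will be due on Friday.\n"
)
fields = extract_header_fields(raw)
```

Each key in `fields` maps naturally onto a column in a CRM record, which is why these elements are the easiest place to start.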

Detecting Addresses and Phone Numbers

Consumer mail applications like Yahoo! Mail and Gmail already detect these data types, and their accuracy is reliable. This data can then be validated against the user’s contact address book or, more generally, the company’s internal CRM address book.

Events and milestones

Let’s look at a few examples of things we can expect to see in email threads. These can be detected fairly reliably, and mail services like Yahoo! Mail are already doing so.

  • “product demonstration next Tuesday at 10AM in our offices”
  • “all RFPs will be due on Friday December 19th by 5PM PST”
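
Phrases like the two above are regular enough that even a simple pattern catches them. The Python sketch below is an assumption-laden toy (the pattern and the `detect_events` helper are mine, and real mail services surely do something more robust), but it shows why this tier of detection is fairly reliable.

```python
import re

WEEKDAYS = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"

EVENT_PATTERN = re.compile(
    rf"(?P<day>(?:next\s+)?{WEEKDAYS})"                          # "next Tuesday"
    r"(?:\s+(?P<date>[A-Z][a-z]+\s+\d{1,2}(?:st|nd|rd|th)?))?"   # "December 19th"
    r"\s+(?:at|by)\s+"
    r"(?P<time>\d{1,2}(?::\d{2})?\s?(?:AM|PM))"                  # "10AM", "5PM"
    r"(?:\s+(?P<tz>[A-Z]{2,4}))?"                                # "PST"
)


def detect_events(text):
    """Return one dict per detected event, keeping only the parts that matched."""
    return [
        {key: value for key, value in m.groupdict().items() if value is not None}
        for m in EVENT_PATTERN.finditer(text)
    ]


demo = detect_events("product demonstration next Tuesday at 10AM in our offices")
rfp = detect_events("all RFPs will be due on Friday December 19th by 5PM PST")
```

`demo` contains the day and time, while `rfp` also picks up the calendar date and time zone. Turning "next Tuesday" into an actual date would additionally need the message’s sent date.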

Deriving Issue Type:

One can auto-generate dictionaries from the company’s website. For example, for a refrigeration company these would include terms like “technical account manager” and “24/7 support”, as well as product names. Leveraging these dictionaries, the detectors can determine what product is under consideration and whether it is a sales or product/technical issue.
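
A dictionary-based detector can be sketched very simply in Python. Everything specific here is hypothetical: the product names are invented, and I am assuming terms like “renewal” and “pricing” signal a sales issue. In practice the term sets would be generated from the website rather than typed in by hand.

```python
# Hypothetical dictionaries; in practice these would be auto-generated.
PRODUCT_TERMS = {"coldline 500", "frostguard"}  # invented product names
SUPPORT_TERMS = {"technical account manager", "24/7 support", "outage"}
SALES_TERMS = {"renewal", "quote", "pricing"}


def classify_issue(text):
    """Tag an email with the products it mentions and a coarse issue type."""
    lowered = text.lower()
    products = sorted(term for term in PRODUCT_TERMS if term in lowered)
    support_hits = sum(term in lowered for term in SUPPORT_TERMS)
    sales_hits = sum(term in lowered for term in SALES_TERMS)
    if support_hits > sales_hits:
        issue_type = "technical"
    elif sales_hits > support_hits:
        issue_type = "sales"
    else:
        issue_type = "unknown"  # no signal, or a tie between the two
    return {"products": products, "issue_type": issue_type}


technical = classify_issue(
    "Our FrostGuard unit failed; please loop in our technical account manager."
)
sales = classify_issue("Can you send pricing for the renewal?")
```

Plain substring matching like this is crude (it has no notion of word boundaries or context), which is one reason this tier sits lower on the accuracy scale than header fields or dates.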

Building Priority via Sentiment Analysis

Because users tend to misuse the priority setting on emails, it helps to have other ways to determine priority. Sentiment analysis technologies can detect the tone of the message based on the use of character types (bold, exclamation points) and keywords (unacceptable, failure, etc.).
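
Here is a deliberately naive Python sketch of that idea. The weights, the threshold, and the all-caps check are assumptions of mine (bold text is not visible in plain-text email, so shouting in capitals stands in for it), and real sentiment analysis is far more sophisticated than counting keywords.

```python
import re

NEGATIVE_KEYWORDS = {"unacceptable", "failure"}  # the two keywords named above


def priority_score(text):
    """Score a message from keywords, exclamation points, and shouted words."""
    words = re.findall(r"[A-Za-z']+", text)
    keyword_hits = sum(word.lower() in NEGATIVE_KEYWORDS for word in words)
    exclamations = text.count("!")
    shouted = sum(1 for word in words if len(word) > 3 and word.isupper())
    return 2 * keyword_hits + exclamations + shouted  # arbitrary weights


def priority_label(text, threshold=3):
    """Map the score onto a coarse label; the threshold is a guess."""
    return "high" if priority_score(text) >= threshold else "normal"


urgent = priority_label("This failure is unacceptable!")
routine = priority_label("Thanks, see you Tuesday.")
misfire = priority_label("No failure this time, great work!")
```

`urgent` comes out as "high" and `routine` as "normal", as hoped. `misfire` also comes out as "high" even though the message is good news, because keyword counting cannot see the negation.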

There tend to be a fair number of false positives (e.g. “49ers really suck this year… horrible QB” may be flagged), but this technology is improving as startups like BuzzLogic and BlogPulse experiment with companies like P&G and ConAgra Foods, which are looking to sentiment analysis techniques to gauge consumer response to their brands in blogs and message boards.

Once “DataGenie” extracts and populates the data, here is what it would look like to a user of the CRM sales system:

DataGenie

On day one, the data generated by “DataGenie” will not be perfect, yet it’s an improvement over the status quo of limited and stale data. So, how do we improve the data? With some fairly simple positive feedback loops. Simple controls such as edit, delete, and add, or the absence of any action, can provide important hints. Let’s see how we can interpret these actions if the user…

  • Adds to the record then the underlying data is solid — we can assume that “DataGenie”
  • Edits data elements (Events + Milestones), then reliability is low. With enough edits on certain data elements, and the before and after values, we can pick up patterns. For example, the extractor may be truncating important event or milestone data.
  • Deletes a data element within the record, then the data may not be associated properly. For example, the events & milestones data is not associated with this contract renewal issue. Why bother editing the data when the entire thing is wrong?
  • Takes no action, then the interpretation depends on the overall level of user engagement. For a heavy user (lots of delete and edit actions), the absence of any action could mean that the data is reliable.
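
The interpretations in this list translate almost directly into code. The Python sketch below is one hypothetical way to tally them per data element; the label names and the `user_is_active` flag are my own, and since “DataGenie” does not exist this is purely illustrative.

```python
from collections import Counter


def interpret_action(action, user_is_active=False):
    """Map one user action on an extracted field to a reliability hint."""
    if action == "add":
        return "reliable"          # user built on the data, so it is solid
    if action == "edit":
        return "low_reliability"   # extraction was close but wrong
    if action == "delete":
        return "misassociated"     # data probably attached to the wrong record
    if action == "none":
        # Silence only means something when it comes from an engaged user.
        return "reliable" if user_is_active else "inconclusive"
    raise ValueError(f"unknown action: {action!r}")


def summarize_feedback(events):
    """events: iterable of (field, action, user_is_active) tuples."""
    summary = {}
    for field, action, user_is_active in events:
        hint = interpret_action(action, user_is_active)
        summary.setdefault(field, Counter())[hint] += 1
    return summary


summary = summarize_feedback([  # made-up feedback events
    ("events", "edit", True),
    ("events", "edit", True),
    ("phone", "none", True),
    ("phone", "none", False),
    ("contract", "delete", True),
])
```

A field that keeps accumulating `low_reliability` hints, like `events` here, is the one whose extractor deserves attention first.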

To the best of my knowledge “DataGenie” does not exist. If you are aware of a product that does this or something similar, drop a comment below.

These are just two of the many ways in which a product manager can take their learnings from the enterprise world (highly deterministic systems) and apply them in the consumer software space (bias for machine learning systems), and vice versa. If you have other interesting examples, please share.