Why (& How) We're Building Toucan

We’re building Toucan AI because we think that the medium of conversation deserves a lot more respect than it gets.

Conversation is, according to some, the key component that makes humans unique[1]. It’s a fundamental part of every person’s daily experience, whether it takes the shape of speech, text, sign language, braille, or the voice in the back of your head that encourages you to eat just one more donut. A baby’s first words are so momentous because they signal a first step into a global community, defined by the ability to make one’s thoughts and opinions known to the world. Over the centuries, despite taking many forms, language has remained the most natural, most effective, and most universal means by which individuals express their thoughts.

Unfortunately, companies are having a hard time listening.

There’s no doubt that conversation is an increasingly hot topic in tech and marketing circles, and buzzwords like “conversational commerce” or “conversational marketing” are on the tips of everyone’s tongues. Given the hype, you’d almost believe that every chatbot or smart-speaker device was truly C-3PO in the flesh (in the silicon?). Take one for a spin, though, and you’ll probably be pretty disappointed[2]. Examples range from pointless gimmicks at worst to incrementally loaded forms at best. That’s not to say they’re all useless: I definitely appreciate using my Google Home to turn off my lights at night, and I admire the slick on-boarding flows that chatbots can facilitate on certain websites. At the end of the day, though, they all feel more like a glorified menu than an actual conversation. I’m never actually being understood by the bot that I’m speaking to; I’m just using my voice or my keyboard to choose between a handful of pre-defined options that some developer hard-coded beforehand[3].

Conversations offer a unique window into the minds of consumers, but pre-scripted dialog flows, even when painstakingly crafted to be as exhaustive as possible, largely fail to take advantage of this opportunity. When a bot begins its life with a pre-programmed dialogue it needs to get through, it ends up as the digital equivalent of that guy who corners you at a party and insists on talking about himself for 30 minutes. It’s an experience that neglects the things that make conversation meaningful, and one that typically isn’t nearly as helpful as companies seem to believe[4].

At Toucan AI, we’d rather listen. That commitment has led to a pretty radically different take on conversation.

First, Do No Harm

We began with the principle that our product must, first and foremost, add value. Far too many examples of AI conversation seem to exist just for the sake of keeping up with the latest marketing trends, and they’re consistently frustrating (or ignored). We wanted Toucan to be genuinely helpful, both to the companies that use our platform and to the end-users or consumers that they serve. So we reached out to the people, from CEOs to interns, who’d been tirelessly staffing those omnipresent live-chat widgets, and asked them where they felt they’d been adding the most value. Then we went and got the other side of the story, finding consumers who’d interacted with live-chat widgets and asking what they felt created the most enjoyable and worthwhile experience. This might not come as a surprise, but it turns out that one of the most valuable uses of conversation is in helping people make decisions, especially about what to buy. There’s a reason every store you walk into has staff waiting to help; throwing a hundred choices in front of a consumer and asking them to pick the ones they want just doesn’t cut it. There’s also a reason websites, apps, and smart-speakers haven’t adopted this approach: live chat is almost impossible to scale, and chatbots following scripts can’t hope to address every possible customer query. To Arjun and me, this seemed like the perfect opportunity to build a brand-new kind of conversational AI.

Ask, Answer, Amend

The foundation of our AI is a framework we refer to as Ask, Answer, & Amend. We came up with this framework after poring over hours’ worth of chat transcripts between live agents and customers. We paid attention to which conversations were successful, which ones weren’t, and what they all had in common. Across those transcripts, we noticed that effective conversations (i.e. the ones that resulted in satisfied shoppers) tended to alternate between three key Dialogue Acts:

  1. Asking questions: The shopper asks about a detail regarding a product/offering (“How many come in each pack?”), or the live agent asks a question that helps the shopper more effectively compare some options (“Do you prefer a calm environment, or somewhere with a lively nightlife?”).

  2. Answering questions: The live agent provides the shopper with specific information that helps with decision making (“This one has 3 per pack”), or the shopper provides an answer that helps the live agent better understand their needs (“I’d want the one with a good party scene”).

  3. Expressing preferences: The shopper expresses a preference, either one that is specific to the decision being made (“I want a really cheap projector” or “Which projectors are relatively cheap but have a high resolution?”) or a general statement (“I love watching sports and action films”). In either case, the preference prompts the live agent to amend their profile of the customer and then make new suggestions that incorporate this new information.
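To make the Amend step concrete, here is a minimal Python sketch built on a deliberately simple model that is an assumption, not Toucan’s actual representation: each product carries hypothetical feature scores between 0 and 1, the shopper profile is a set of feature weights, and every expressed preference adds weight to a feature before the suggestions are re-ranked. The names `Product`, `ShopperProfile`, `amend`, and `suggest` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    # Hypothetical normalized feature scores in [0, 1], e.g. {"cheap": 0.9}.
    features: dict[str, float]


@dataclass
class ShopperProfile:
    # How much the shopper cares about each feature (an assumed representation).
    weights: dict[str, float] = field(default_factory=dict)

    def amend(self, feature: str, weight: float = 1.0) -> None:
        """Record a newly expressed preference, strengthening any existing one."""
        self.weights[feature] = self.weights.get(feature, 0.0) + weight


def score(product: Product, profile: ShopperProfile) -> float:
    """Weighted sum of the product's features under the current profile."""
    return sum(w * product.features.get(f, 0.0) for f, w in profile.weights.items())


def suggest(products: list[Product], profile: ShopperProfile, k: int = 2) -> list[Product]:
    """Return the top-k products for the profile as it stands right now."""
    return sorted(products, key=lambda p: score(p, profile), reverse=True)[:k]


projectors = [
    Product("A", {"cheap": 0.9, "high_resolution": 0.2}),
    Product("B", {"cheap": 0.3, "high_resolution": 0.9}),
    Product("C", {"cheap": 0.7, "high_resolution": 0.7}),
]

profile = ShopperProfile()
profile.amend("cheap")  # "I want a really cheap projector"
first = [p.name for p in suggest(projectors, profile)]

profile.amend("high_resolution")  # "...but it should have a high resolution"
second = [p.name for p in suggest(projectors, profile)]

print(first, second)  # ['A', 'C'] ['C', 'B']
```

The point of the sketch is the loop itself: each preference changes the profile, and the suggestions are recomputed from scratch rather than advanced along a fixed script.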

Over and over again, conversations that followed this pattern resulted in customers who felt appreciative, informed, and ready to make the purchase that was perfect for them. No choice paralysis, and a dramatically lower bounce rate. With Toucan AI, we seek to recreate this experience in a way that’s incredibly easy to set up, completely automated, and accessible to companies of any scale.

Know What I Mean?

When a user says something to a typical chatbot, the only “AI” involved is matching what the user said to the corresponding item in the list of things the bot can respond to. That approach works adequately for an assistant like Siri, which carries out a fixed set of tasks, but there’s no way to make an exhaustive list of every question or preference an online shopper might express. Furthermore, we didn’t want our system to be limited to companies with the resources to build and maintain complex scripts for chatbots to follow; we wanted something that would work well for everyone. Most importantly, we wanted a system that listened to and understood what shoppers were saying, rather than just looking for a match in a pre-written script. So, we turned to deep learning.
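For contrast, the list-matching approach described above can be sketched in a few lines of Python. This is a hypothetical keyword-based intent matcher written purely for illustration (the intent names, keywords, and replies are invented), not any particular vendor’s implementation; the structural limitation it shows is the fixed list of intents, so anything outside that list falls through to a fallback reply.

```python
from __future__ import annotations

import string

# Hypothetical intents: each maps a set of required keywords to a canned reply.
INTENTS = {
    "lights_off": ({"turn", "off", "lights"}, "Turning off the lights."),
    "order_status": ({"where", "order"}, "Your order is on its way."),
}
FALLBACK = "Sorry, I didn't understand that."


def normalize(utterance: str) -> set[str]:
    """Lowercase, strip punctuation, and split into a set of words."""
    cleaned = utterance.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())


def respond(utterance: str) -> str:
    """Return the reply of the first intent whose keywords all appear, else a fallback."""
    words = normalize(utterance)
    for keywords, reply in INTENTS.values():
        if keywords <= words:
            return reply
    return FALLBACK


print(respond("Please turn off the lights"))  # Turning off the lights.
print(respond("Which projectors are relatively cheap but have a high resolution?"))
# Sorry, I didn't understand that.
```

Handling the shopper’s question here would mean hand-writing yet another intent, which is exactly the kind of exhaustive list that can’t be built for open-ended shopping conversations.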

Arjun and I have been doing research in the field of Natural Language Processing for several years, and we’ve witnessed first-hand the massive impact that deep learning architectures have had on the field. These architectures, combined with concepts like transfer learning and language modeling, are giving computers the ability to actually comprehend language, in a sense (blog posts that dive into these topics with more technical depth are coming soon!). They allow machines to reason about concepts and to read and understand text in ways that weren’t possible just a few years ago. They even allow for the incorporation of world knowledge, so that, for example, our AI can tell that a vacation to the Sierra Nevada calls for hiking boots while a trip to Miami is better suited to a pair of sandals. We take advantage of these properties heavily at Toucan AI; they’re the basis for almost every model we create. Building on this base of machine comprehension, we’ve been constructing artificial neural networks that can handle all the sub-tasks of the Ask, Answer, Amend framework. This includes things like identifying which products best match a preference that the shopper expresses, or telling which product facts or features can answer a specific question. It also involves lower-level tasks at the individual sentence level, like recognizing when a sentence contains multiple preferences and deciding how to deal with them, or the task of Dialog Act Recognition that I wrote about in my last blog post.

An Information-First Approach

These models allow us to take a unique, information-first approach. While most bots and virtual assistants are created by specifying exactly what they can say, Toucan’s AI agents are created by specifying what they need to know. The level of semantic understanding in our models allows us to take only the existing information that a company has about its products (e.g. product descriptions in a product catalog) and instantly make it conversationally accessible. No restrictive scripts, and no burdensome data requirements: our AI reads what you already have and understands it well enough to converse about it with consumers. This means that our AI can let the user guide the conversation, serving as a source of quick and accurate information when needed rather than as an annoyance.
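As a rough illustration of specifying what an agent needs to know rather than what it can say, the Python sketch below treats a product description as the agent’s only input and answers a shopper’s question by returning the description sentence that shares the most words with it. This lexical word-overlap matching is a deliberately simple stand-in chosen for brevity, not the deep-learning comprehension described above; the catalog entry, stopword list, and `answer` function are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical catalog: the agent is configured with existing product text, not a script.
CATALOG = {
    "Trail Snack Bars": "Oat and almond snack bars. Each pack contains 12 bars. Gluten free.",
}

# Small, illustrative stopword list so question words don't dominate the overlap.
STOPWORDS = {"a", "an", "the", "in", "of", "each", "is", "are", "how", "many",
             "do", "does", "come", "what"}


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def answer(product: str, question: str) -> str | None:
    """Return the catalog sentence sharing the most words with the question, or None."""
    sentences = re.split(r"(?<=\.)\s+", CATALOG[product])
    q = tokens(question)
    best = max(sentences, key=lambda s: len(q & tokens(s)))
    # Decline rather than guess when nothing in the description overlaps.
    return best if q & tokens(best) else None


print(answer("Trail Snack Bars", "How many come in each pack?"))  # Each pack contains 12 bars.
```

Returning `None` when nothing matches keeps the agent from inventing an answer. Exact word matching is also where this toy version breaks down: it would miss “packs” versus “pack” or any paraphrase, which is the gap that semantic models are meant to close.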

We’re also making it super easy to integrate what happens within a Toucan AI conversation with what’s going on in your site: as the AI agent amends its suggestions and gets better at figuring out what the shopper is looking for, we send you events that you can use to update the product listing dynamically. All of this combines to create an experience that lets shoppers take the reins and express themselves in the most natural way, one that leaves them feeling satisfied that they found exactly what they were looking for. It also lets companies genuinely listen to what their customers have to say, and learn what they want, what they couldn’t find, and what left them thrilled: a truly two-way interaction that facilitates genuine expression.

To us at Toucan AI, that’s what conversation is all about.

- Vishnu