Build for the User, not the Device

Creating a conversational app from scratch is like building a visual app pixel-by-pixel. Sinking time and effort into low-level details like rectangles and coordinates is no way to build an engaging visual application, and the same argument holds for a conversational one. SayKit takes care of the details, freeing you to take care of the experience.

Go beyond Question and Answer

Conversational apps should involve more than just answering questions. The SayKit SDK manages the state of a conversation between the user and the app, enabling you to build voice experiences that go deeper.

Feature Details

Intent Recognition

Defining your own actions and detecting their parameters is the most basic capability of a voice-based experience. For example, given the utterance “I’m looking for peanut butter”, intent recognition maps “I’m looking for” to an action in the app (in this case, a product search). Entity extraction then separates “peanut butter” from the intent so you can use it as the action’s parameter (a search for “peanut butter”).
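The split between intent and entity can be illustrated with a minimal sketch. This is not SayKit's API: the intent names, the pattern table, and the `recognize` function are all hypothetical, and a simple regular-expression match stands in for whatever recognizer a real app would use.

```python
import re

# Hypothetical intent table (an assumption for illustration, not SayKit's API).
# Each intent name maps to a pattern; named groups capture the entities.
INTENT_PATTERNS = {
    "product_search": re.compile(r"^i['’]?m looking for (?P<query>.+)$", re.IGNORECASE),
    "add_to_list": re.compile(r"^add (?P<item>.+) to my list$", re.IGNORECASE),
}


def recognize(utterance):
    """Return (intent_name, entities) for an utterance, or (None, {}) if nothing matches."""
    # Drop surrounding whitespace and trailing punctuation before matching.
    text = utterance.strip().rstrip(".!?")
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.match(text)
        if match:
            return intent, match.groupdict()
    return None, {}


intent, entities = recognize("I’m looking for peanut butter")
```

Here `intent` names the action to run (the product search) and `entities["query"]` carries the text that specializes it, so the app can dispatch on one and pass the other along as a parameter.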

Always Listening Mode

Speaking to an app without using your hands makes possible many powerful applications that would otherwise be impractical, for example:


  • Recipe and cooking app (in the kitchen while cooking)
  • News reading app (in the car while driving)
  • Checklist app (in a warehouse while moving boxes)

Dialogue Management

Dialogue Management allows an application to chain a sequence of interactions together. This is most commonly used to clarify spoken input, but it can also be used to simplify complex interactions.


  • “I’d like to fly to New York on Friday.”
    “That is Friday 12/18. Correct?”
  • “I’m looking for bread.”
    “What type of bread are you looking for?”
    “I’m looking for white bread.”
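The bread exchange above can be sketched as a small state holder that remembers what the user has already said and asks a follow-up question until the request is specific enough. This is an illustrative sketch only, not SayKit's API: the class name, the list of bread types, and the final confirmation wording are assumptions.

```python
class BreadSearchDialogue:
    """Hypothetical dialogue that chains turns until the search is specific enough."""

    # Assumed vocabulary, for illustration only.
    KNOWN_TYPES = ("white", "wheat", "rye", "sourdough")

    def __init__(self):
        # Conversation state carried from one turn to the next.
        self.product = None
        self.variety = None

    def handle(self, utterance):
        """Update the state from one user utterance and return the app's reply."""
        words = utterance.lower().strip(".!?").split()
        if "bread" in words:
            self.product = "bread"
        for word in words:
            if word in self.KNOWN_TYPES:
                self.variety = word

        # Ask a clarifying question for whichever detail is still missing.
        if self.product is None:
            return "What are you looking for?"
        if self.variety is None:
            return "What type of bread are you looking for?"
        return f"Searching for {self.variety} bread."


dialogue = BreadSearchDialogue()
first_reply = dialogue.handle("I’m looking for bread.")
second_reply = dialogue.handle("I’m looking for white bread.")
```

Because the object keeps its state between calls, the second utterance is interpreted in the context of the first rather than as a fresh, unrelated request.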

GUI Syncing

Syncing the conversational state of the app with the GUI without creating two entirely separate application flows (one for the conversational part of the app and one for the visual) can be challenging. Because visual information is expressed instantaneously and audio information is expressed over time, keeping the two modalities in sync requires a lot of maintenance. Moreover, specifying code and content separately for visual and audio presentation is repetitive, which makes software less maintainable. SayKit helps smooth over both of these problems.

Client Side State Management

When it comes down to it, creating the voice experience for an app is simply creating its User Interface (UI). When you build an app, the graphical elements are defined in code or as assets stored on the device. They are not loaded from a server. For example, no mobile or web framework would ask the developer to contact a server to determine whether a user touch was a single or double tap. However, that is not the case with voice-based frameworks.


Mirroring application state or structure on a server can become problematic, either making applications unmanageable or limiting the UI to shallow interactions. SayKit allows a developer to sidestep this problem by keeping application state local.