make voice culture

Chop Chop

A hands-free kitchen companion serving up fun, easy-to-follow video tutorials on chopping fresh produce. From the basics to the exotics, Chop Chop’s step-by-step instructional videos will have you slicing and dicing in no time.

invocation name chop chop
release date may 2018
category food & drink
catalog # bon016
product page 🇺🇸

The kitchen is ripe for voice. Ever struggled, with food on your hands, to Touch ID unlock an iPhone that went to sleep right before you needed step 3 of a recipe? The burner’s already on, better hurry! So you peck in your passcode with your knuckle, only to be greeted by an email list popup overlay from some janky recipe site. Chop Chop isn’t a recipe skill, but it emerged from a desire for reduced friction with technology in the kitchen.
Since your hands are tied up while you’re cooking, voice interaction makes a lot of sense. But the addition of a screen can be especially useful in the kitchen. In this case, a quick visual cue about how to chop a mysterious piece of produce might be all you need to get an injection of confidence and some direction so you can keep things moving with your cooking. Reading or hearing about how to chop something from a recipe can feel super abstract. But when you see it, it just clicks.

native content creation

Chop Chop is an exploration of native voice content creation. There are loads of how-to videos on YouTube from various channels showing how to chop all sorts of things. But you need to search around quite a bit. And once you find a relevant video, you still have to get past the ad and scrub through the intro and host chatter to find what you need. This is super inefficient when you’re cooking. Chop Chop strips all of that away. It’s designed to give you exactly what you need, when you need it. There are no menus. No categories. And intentionally minimal navigation. “Hello, what would you like to chop?” If there’s a match, a video immediately begins to play.

audience engagement

If Chop Chop doesn’t find a match, users are prompted to submit a chop request. The skill featured 40+ original videos at launch, and we’ll continue adding new fruits and veggies to the catalog every Friday. The intention here, beyond creating a dynamic skill with fresh content, is to empower users with a feeling that their voice is being heard and that they can directly influence the evolution of the skill. If someone requests a cherimoya, we’ll go find a cherimoya and chop it up. And hopefully they’ll find it the next time they return to the skill.

technical development

Chop Chop was built using the Alexa Skills Kit (V1) for Node.js, leveraging Entity Resolution to disambiguate synonyms in the voice model, DisplayDirective to identify multi-modal Echo devices, BodyTemplate7 for the splash screen, and VideoApp for playback. The backend is hosted in AWS Lambda.
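
To make the Entity Resolution and device-detection pieces a bit more concrete, here’s a minimal sketch that works on the raw request JSON Alexa sends to Lambda, rather than on any SDK helper. It is not the skill’s actual code: the slot name `produce` and the function names `resolveSlot` and `deviceCapabilities` are assumptions made up for illustration. The field names (`resolutionsPerAuthority`, `ER_SUCCESS_MATCH`, `supportedInterfaces`) follow the documented request format.

```javascript
// Hypothetical slot name; the real voice model's slot name isn't given here.
const SLOT_NAME = 'produce';

// Returns the canonical slot value when Entity Resolution reports a match
// (so a synonym like "aubergine" can come back as "eggplant"), else null.
function resolveSlot(request, slotName = SLOT_NAME) {
  const slots = (request.intent && request.intent.slots) || {};
  const slot = slots[slotName];
  const authorities =
    (slot && slot.resolutions && slot.resolutions.resolutionsPerAuthority) || [];
  for (const authority of authorities) {
    const matched = authority.status && authority.status.code === 'ER_SUCCESS_MATCH';
    if (matched && authority.values && authority.values.length > 0) {
      return authority.values[0].value.name;
    }
  }
  return null;
}

// Reports which screen-related interfaces the requesting device advertises.
function deviceCapabilities(context) {
  const system = (context && context.System) || {};
  const device = system.device || {};
  const interfaces = device.supportedInterfaces || {};
  return {
    display: 'Display' in interfaces,
    video: 'VideoApp' in interfaces,
  };
}
```

A handler could call `resolveSlot(event.request)` and `deviceCapabilities(event.context)` up front, then decide whether to launch a video or fall back to a voice-only reply.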

media production

Video was shot on iPhone 7+ in 4K (although it nearly melted 🔥🚒) and edited in Final Cut Pro. Logos and images were created in Pixelmator. Alexa dialogue within the video assets was generated in the Developer Console testing simulator, captured by Audio Hijack, trimmed & normalized in Fission (Rogue Amoeba FTW!) and imported into FCPX for final assembly.

Even though Echo Show and Echo Spot have different screen shapes and resolutions, they’re powered by shared media assets. So to ensure a good experience for Spot users, we had to frame our shots and edits with a strong center of gravity and maintain a consistent safe zone, especially when using graphics or text. Final exports were only 720p, but shooting in 4K ensured we had enough headroom to push in on shots as needed. This took quite a bit of trial and error. If you’re serious about multi-modal skill development, you can’t rely on the simulators. Testing on devices is mission critical.

Bringing this skill to life required concurrent creative and technical development. We piloted the video format and production process quite a bit before moving into full production. Creating native voice content is like nothing I’ve ever worked on because the creative development informs the technical development which informs the creative development and around and around we go. In most creative mediums, the media is locked and then moves down the conveyor belt to its various distribution platforms. But with voice, the two are inseparably intertwined from creation to consumption.

On the technical side, my initial skill architecture didn’t account for the ability to add new video content without needing to re-certify the skill. In this first iteration, I generated a long list of every conceivable type of produce that we didn’t have and made them all synonyms of a single slot value called “Missing Produce.” So any ER-Match to this group would resolve to a message soliciting suggestions for #FlavorFriday. In my second iteration, I moved all of these synonyms to the top level of the voice model and instead handle them with an if/else statement directed by a match between the user’s spoken slot value and the available video assets in the media array. This way, I can simply drop a new video into the Lambda function and our voice model is already built & certified to accommodate the slot value. I’ve only just scratched the surface of Entity Resolution – there’s much more to explore here!
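
The second iteration might look roughly like the sketch below. This is an illustration under assumptions, not the code running in Lambda: the `MEDIA` entries, the example.com URLs, the function names, and the fallback wording are all placeholders. The one piece taken from the Alexa docs is the shape of the `VideoApp.Launch` directive.

```javascript
// Hypothetical media array; names and URLs are placeholders.
const MEDIA = [
  { name: 'mango', url: 'https://example.com/videos/mango.mp4' },
  { name: 'pineapple', url: 'https://example.com/videos/pineapple.mp4' },
];

// Looks up a video by the (already resolved) slot value, case-insensitively.
function findVideo(slotValue, media = MEDIA) {
  if (!slotValue) return undefined;
  const wanted = slotValue.trim().toLowerCase();
  return media.find((item) => item.name === wanted);
}

// The if/else at the heart of the second iteration: play the video when the
// slot value matches an asset, otherwise ask for a chop request.
function buildResponse(slotValue, media = MEDIA) {
  const video = findVideo(slotValue, media);
  if (video) {
    return {
      directives: [
        {
          type: 'VideoApp.Launch',
          videoItem: {
            source: video.url,
            metadata: { title: `How to chop ${video.name}` },
          },
        },
      ],
    };
  } else {
    return {
      outputSpeech: {
        type: 'PlainText',
        // Placeholder wording, not the skill's actual prompt.
        text: `We don't have a video for ${slotValue} yet. Send us a chop request!`,
      },
    };
  }
}
```

With this shape, adding a new fruit is just one more entry in `MEDIA`. As long as the slot value already exists in the certified voice model, nothing else has to change.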
︎ ︎
Bondad, LLC 2020
Little Bangladesh, Los Angeles