The Voice of Alexa – Skills I build, stuff I learn, and more…

Voice First: the one stop shop for Voice User Interface jobs

Blessed Twitter has led me to Voice First, which seems to be an employment agency listing a ton of ultra-interesting voice jobs.

If you’re UK- or US-based, you’ll surely find something for you.

(Cool logo, by the way!)

Amazon is worried you’re not using Alexa that much. Are you?


Yup, it’s been a while. That doesn’t mean I haven’t been involved with voice; in fact, the opposite is true. But here we are!

This Christmas week, an Amazon memo has emerged showing the company’s concern about the lack of engagement with its line of Echo smart speakers [Bloomberg, The Verge]. That is why Alexa is more talkative than ever, offering us more tips on how to get value from it.

Is it shocking to you to learn that between 15% and 25% of new users stop using Alexa altogether just two weeks after buying it?

I personally find it very useful in my day-to-day life, but it seems I don’t differ much from the ordinary user, as these are my main use cases:

Set timers (especially when cooking).
Listen to Amazon Music Unlimited (here in Mexico we benefit from a 39 pesos monthly subscription, valid for one device only).
Ask general questions (when talking with friends and family and we need to check a fact, like the birthplace of someone famous, what something is made of, stuff like this).
Ask the date and the time.
Switch the lights on and off with the awesome Steren connected home devices.
[Mexico City specific use case] Check which vehicles aren’t allowed on the road today.

What do you use it for?

Alexa Conversations is here!

Photo by https://www.pexels.com/@jopwell

Trying to create natural conversations in your skill was very, very, VERY cumbersome. I would say humans don’t think straight (luckily, as this makes us way more interesting!), and we certainly aren’t comfortable following scripts unless we are call center operators armed with the appropriate tools. We are also genius synonym finders, right up until we have to write an essay and get writer’s block! Defining the utterances in your skill was a combinatorial exercise that produced a staggering number of combinations. I guess every developer had a strategy to deal with it. Mine was a relational database containing the possible values for each expression, plus some SQL queries that generated all the possible permutations and populated the required JSON files on the fly; then I pasted it all into the Amazon Developer Console.
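The SQL approach above boils down to a Cartesian product of phrase fragments. The sketch below does the same in Python with `itertools.product`. The fragment lists, the `LightsOnIntent` name and the output shape (an object with `name` and `samples`, loosely modelled on the intents section of an Alexa interaction model) are illustrative assumptions, not the original database schema.

```python
import itertools
import json

# Hypothetical phrase fragments; in the setup described above these lived in database tables.
OPENERS = ["", "please", "can you"]
VERBS = ["turn on", "switch on"]
OBJECTS = ["the lights", "the {Room} lights"]  # {Room} marks a slot inside the sample


def build_utterances(*parts: list[str]) -> list[str]:
    """Combine one fragment from each list, in order, for every possible combination."""
    utterances: list[str] = []
    for combo in itertools.product(*parts):
        # Skip empty fragments so "" + "turn on" does not leave a leading space.
        text = " ".join(fragment for fragment in combo if fragment)
        if text not in utterances:  # keep each sample only once
            utterances.append(text)
    return utterances


def intent_json(intent_name: str, samples: list[str]) -> str:
    """Serialise the samples as a name/samples object for one intent."""
    return json.dumps({"name": intent_name, "samples": samples}, indent=2)


if __name__ == "__main__":
    samples = build_utterances(OPENERS, VERBS, OBJECTS)
    print(len(samples))  # 3 * 2 * 2 = 12 sample utterances
    print(intent_json("LightsOnIntent", samples))
```

Three small lists already yield 12 samples; every extra synonym multiplies the total, which is exactly why doing this by hand was so painful.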

Keeping track of state in the skill was also hard. Say we need to gather information to populate three slots. We can ask for slot 1, then slot 2, then slot 3. Or we could ask for slot 1, then slot 3, then slot 2. Or, in this order, slots 2, 1, 3. Or slots 2, 3, 1. Or slots 3, 1, 2. Or slots 3, 2, 1. That’s six orderings. But users could also decide to give us the information for two slots unprompted, in a single utterance. So it could be 1 and 2, then 3; or 1 and 3, then 2; or 2 and 3, then 1. Or… Surely you get the idea: managing this by hand was too hard. The alternative was a fixed order in the dialog with the user, which could feel stiff to free spirits, or boring and robotic if the skill is used frequently.
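The ordering problem above can be counted. If each turn is a non-empty group of slots the user provides, the possible sequences are the ordered set partitions of the slots. This sketch assumes the user may also give all three slots in a single utterance, which the paragraph only hints at with "Or…".

```python
from typing import Iterator, Sequence


def turn_sequences(slots: Sequence[int]) -> Iterator[list[tuple[int, ...]]]:
    """Yield every way to collect `slots` over successive turns.

    Each turn fills a non-empty group of slots, and the order of the turns matters.
    """
    if not slots:
        yield []
        return
    n = len(slots)
    # Every non-empty bitmask picks the group of slots filled in the first turn.
    for mask in range(1, 2 ** n):
        first = tuple(slots[i] for i in range(n) if mask & (1 << i))
        rest = [slots[i] for i in range(n) if not mask & (1 << i)]
        # Recursively collect the remaining slots in later turns.
        for tail in turn_sequences(rest):
            yield [first] + tail


if __name__ == "__main__":
    sequences = list(turn_sequences([1, 2, 3]))
    one_slot_per_turn = [s for s in sequences if all(len(turn) == 1 for turn in s)]
    print(len(one_slot_per_turn))  # 6: the one-slot-at-a-time orderings
    print(len(sequences))  # 13 once grouped answers are allowed
```

With four slots the count jumps to 75 (these are the ordered Bell, or Fubini, numbers), which is why hand-written dialog state did not scale.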

Dialog Management was quite a step change from the early days of Skill creation; nevertheless, leveraging it still required writing code.
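A common way out of that explosion, and roughly the idea behind Dialog Management, is to stop tracking the order and only track which slots are still empty. This is a plain-Python sketch with hypothetical slot names and prompts; it is not the ASK SDK's API.

```python
from typing import Optional

# Hypothetical slots and prompts, matching the three-slot example above.
REQUIRED_SLOTS = ["slot1", "slot2", "slot3"]
PROMPTS = {
    "slot1": "What should I use for slot 1?",
    "slot2": "And for slot 2?",
    "slot3": "Finally, what about slot 3?",
}


def next_prompt(state: dict[str, str]) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when all are filled."""
    for slot in REQUIRED_SLOTS:
        if not state.get(slot):
            return PROMPTS[slot]
    return None


def handle_turn(state: dict[str, str], provided: dict[str, str]) -> Optional[str]:
    """Merge whatever the user just said (one slot or several) and decide what to ask next."""
    state.update({name: value for name, value in provided.items() if value})
    return next_prompt(state)


if __name__ == "__main__":
    state: dict[str, str] = {}
    print(handle_turn(state, {"slot2": "b"}))  # asks for slot 1
    print(handle_turn(state, {"slot1": "a", "slot3": "c"}))  # None: all three slots filled
```

Because the state is just the set of filled slots, every order in which the user supplies them ends in the same place without being enumerated.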

It’s no surprise I was very happy to read about the July 22nd announcement of Alexa Conversations. As the name suggests, it places the conversation at the core of the Skill creator experience (note I did not say “developer”). The definition of what a Skill is gets clearer: a set of possible dialogs between Alexa and the user whereby enough information is gathered, then Alexa provides some useful functionality to the user.

The cornerstone of Skills development is the creation of possible dialogs, specified, well, as a dialog in a novel: in turns.

User: I want cookies.
Alexa: What type of cookie do you want?
User: Round ones.
Alexa: Do you like cinnamon?
User: No, I hate it.
Alexa: Then I will bake a batch of chocolate chip cookies. Are you very hungry?
User: Oh yes.
Alexa: Then I’ll bake 24 chocolate chip cookies.

You should create one dialog for every distinct combination of turns that is possible (hint: but not for every order in which they can happen). Another possible dialog for the example above could be:

User: I would like to have 24 chocolate chip cookies.
Alexa: OK, I’ll bake 24 chocolate chip cookies.

Once you are happy with your dialogs, you need to annotate each turn: what role the turn plays in the conversation (is Alexa asking for parameters? Is the user providing information? Is the user confirming what Alexa is suggesting?), what other ways you expect the user to say the same thing (these are called Utterance sets), what other ways Alexa could say the same thing (so that the Skill doesn’t feel monotonous if it’s used frequently), and so on.
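One way to picture an annotated dialog is as a list of turns, each carrying its role and its alternative phrasings. The sketch below encodes the short cookie dialog that way. The `Act` labels, the class names and the extra phrasings are hypothetical and do not reproduce the Alexa Conversations file format.

```python
from dataclasses import dataclass, field
from enum import Enum


class Act(Enum):
    """Hypothetical labels for the role a turn plays in the conversation."""
    ALEXA_REQUESTS_ARGS = "Alexa asks for parameters"
    USER_INFORMS = "user provides information"
    USER_CONFIRMS = "user confirms Alexa's suggestion"
    ALEXA_RESPONDS = "Alexa delivers the result"


@dataclass
class Turn:
    speaker: str
    act: Act
    text: str
    # Utterance set for user turns, alternative responses for Alexa turns.
    variations: list[str] = field(default_factory=list)

    def phrasings(self) -> list[str]:
        """All the ways this turn may be worded, starting with the sample text."""
        return [self.text] + self.variations


@dataclass
class Dialog:
    turns: list[Turn]


short_dialog = Dialog(turns=[
    Turn("User", Act.USER_INFORMS,
         "I would like to have 24 chocolate chip cookies.",
         ["Bake me 24 chocolate chip cookies.", "24 chocolate chip cookies, please."]),
    Turn("Alexa", Act.ALEXA_RESPONDS,
         "OK, I'll bake 24 chocolate chip cookies.",
         ["Sure, 24 chocolate chip cookies coming up."]),
])

if __name__ == "__main__":
    for turn in short_dialog.turns:
        print(turn.speaker, "|", turn.act.value, "|", len(turn.phrasings()), "phrasings")
```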

The second key concept is that of an API definition. The API is the provider of the service rendered by Alexa and it will be implemented programmatically. The API definition is just its representation: what parameters it takes as input, what kind of data it returns as output, and so on.
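An API definition describes inputs and outputs without saying how the work gets done. As a sketch, the cookie example's API could be written as two typed records. The names `BakeCookiesRequest` and `BakeCookiesResult` and their fields are hypothetical, inferred from the sample dialog.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BakeCookiesRequest:
    """Inputs Alexa must gather during the dialog."""
    shape: str
    likes_cinnamon: bool
    very_hungry: bool


@dataclass(frozen=True)
class BakeCookiesResult:
    """Data the API hands back for Alexa to read out."""
    flavor: str
    quantity: int


def describe(request_type: type, result_type: type) -> dict[str, dict[str, str]]:
    """Return only the definition: parameter names and type names, no behaviour."""
    def signature(record: type) -> dict[str, str]:
        return {f.name: getattr(f.type, "__name__", str(f.type)) for f in fields(record)}

    return {"inputs": signature(request_type), "outputs": signature(result_type)}


if __name__ == "__main__":
    print(describe(BakeCookiesRequest, BakeCookiesResult))
```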

To build the Skill, Amazon will use its computational power to “train the model”. This means creating a program that takes into account all the possible combinations of how a conversation could happen: both the dialog sequences and the language elements (vocabulary, synonyms, grammatically equivalent sentences, etc.). Here’s a comprehensive explanation of the AI techniques involved. Tens of thousands of permutations are typically created, which greatly increases the probability of Alexa reacting correctly to user input.
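The following toy calculation, which is not Amazon's training procedure, shows why the numbers grow so fast: if every turn has its own set of phrasings, the number of concrete conversations is the product of the set sizes. The phrasings are hypothetical.

```python
import itertools
from math import prod

# Hypothetical phrasings for the first four turns of the cookie dialog.
turn_variants = [
    ["I want cookies.", "I'd like some cookies.", "Bake me some cookies."],
    ["What type of cookie do you want?", "Which kind of cookie?"],
    ["Round ones.", "The round kind."],
    ["Do you like cinnamon?", "Shall I add cinnamon?"],
]


def expand(turns: list[list[str]]) -> list[tuple[str, ...]]:
    """Every concrete conversation obtained by picking one phrasing per turn."""
    return list(itertools.product(*turns))


def count_variants(turns: list[list[str]]) -> int:
    """Same count without building the list: the product of the set sizes."""
    return prod(len(options) for options in turns)


if __name__ == "__main__":
    print(len(expand(turn_variants)), count_variants(turn_variants))  # 24 24
```

An eight-turn dialog with five phrasings per turn already gives 5**8 = 390,625 variants before any slot values are substituted.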

So the bulk of the work shifts from development to a friendly user interface where the designer can focus on the voice experience rather than excessive technicalities.

Development of the API itself becomes a simple business of receiving parameters, probably invoking a mature API that is already being used by other clients (web, apps…), and passing the result back to the Skill.
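Here is a sketch of how thin that layer can be, assuming the existing backend is reachable as a Python callable. `existing_bakery_service`, the argument names and the 12-cookie default are hypothetical. Injecting the backend as a parameter also makes the handler easy to test with a fake.

```python
from typing import Any, Callable


def existing_bakery_service(flavor: str, quantity: int) -> dict[str, Any]:
    """Stand-in for a mature backend already used by web and app clients (hypothetical)."""
    return {"order_id": "demo-001", "flavor": flavor, "quantity": quantity}


def bake_cookies_api(args: dict[str, Any],
                     backend: Callable[..., dict[str, Any]] = existing_bakery_service) -> dict[str, Any]:
    """Skill-facing API: translate dialog arguments, call the backend, return the result."""
    # Hypothetical business rules inferred from the sample dialog.
    flavor = "cinnamon" if args.get("likes_cinnamon") else "chocolate chip"
    quantity = 24 if args.get("very_hungry") else 12
    order = backend(flavor=flavor, quantity=quantity)
    # Only pass back what Alexa needs to say.
    return {"flavor": order["flavor"], "quantity": order["quantity"]}


if __name__ == "__main__":
    print(bake_cookies_api({"shape": "round", "likes_cinnamon": False, "very_hungry": True}))
```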

If you’re a one-woman band (my case), you greatly appreciate the speed with which you can bring your concept to reality. If you’re a company, you can better leverage the skills of your associates by having dedicated roles. Psychologists, marketers, product owners: almost anyone who is web savvy and is a subject matter expert can work on the dialog curation part of building a Skill.

MWC 2018: BMW Autonomous Car and Customer Experience

[This post is a bit off topic, but as cars are already an everyday object we talk to, I thought it would be of interest.]

I had read somewhere about Elon Musk’s digital transformation ideas around passenger transport and the Tesla car. I recall him saying that once electric cars and fully autonomous driving (level 5) are achieved, the concept of owning a car (what you do with the car, how you use that resource called your car) totally changes. And BMW are helping me understand that. This week, during the Mobile World Congress 2018, BMW showcased the Customer Journey for their brave new autonomous vehicle. Video (short intro in Spanish, English from then on):

Main takeaways (apart from seeing the steering wheel eerily turning itself):

There is a companion app.
Through the app you rendezvous with the car, including directions on how to walk there.
You set the route and get in the car (in no particular order).
Sit back and relax!
There is an infotainment system; you can use it, or the app, during your trip.
You can change your destination halfway there. Obviously, in real life the car won’t stop in its tracks as it does in the video; it will just smoothly recalculate the route.
The car will drop you off at your destination, and it will go park itself.
You can lend your car to friends. They’ll need the app, and you, as the owner, will manage from your app what you lend them the car for.

I feel so lucky to live in these times – I now know that when I become too old to drive (not that I drive much these days anyway!), that won’t hinder my mobility one iota.

It’s also easy to imagine a mashup of what I have just explained and what Uber / Lyft currently do. You can think of your car as what you use for your own transportation when you need it; the rest of the time, instead of sitting idle inside a garage, the car can be out and about earning you some bucks.

That’s what Musk was talking about in that interview that stuck in my mind. And it’s great to know that there’ll be plenty of makes and models to choose from. Kudos to BMW for this!

MWC 2018: Google Assistant Demo

Alexa has a competitor: Google Assistant, plus the plethora of home pods that take it beyond the boundaries of the phone’s not-so-good ambient mics.

I was fortunate to attend a demo this morning at the Mobile World Congress 2018. You can watch the videos below.

My takeaways:

Amazon has a one-year head start over its competitors, but no more than that.
They offer practically the same functionality, but Alexa has been around for longer and has more skills (apps).
The only area where Google Assistant has an advantage over the Amazon Echo family and Alexa is that it can switch languages on the fly, and French is already implemented.

It’s time to develop skills in a way that they are easily portable to other platforms.
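One way to keep skills portable is to put the logic in a core function and wrap it in thin adapters for each platform's request and response format. In this sketch the Alexa adapter reads the standard `request.intent.name` and `request.intent.slots` fields and returns a plain-text `outputSpeech`. The intent and slot names are hypothetical, and the second adapter's payload shape is invented for illustration; it does not describe Google's actual format.

```python
from typing import Any


def core_answer(intent: str, params: dict[str, str]) -> str:
    """Platform-independent skill logic (hypothetical intent and slot names)."""
    if intent == "BakeCookiesIntent":
        quantity = params.get("quantity", "12")
        flavor = params.get("flavor", "chocolate chip")
        return f"I'll bake {quantity} {flavor} cookies."
    return "Sorry, I can't help with that yet."


def handle_alexa(event: dict[str, Any]) -> dict[str, Any]:
    """Adapter for an Alexa IntentRequest: unpack intent and slot values, wrap the speech."""
    intent = event["request"]["intent"]
    slots = intent.get("slots", {})
    params = {name: slot["value"] for name, slot in slots.items() if "value" in slot}
    text = core_answer(intent["name"], params)
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def handle_other_platform(payload: dict[str, Any]) -> dict[str, Any]:
    """Adapter for a hypothetical second platform sending {"intent": ..., "params": {...}}."""
    return {"speech": core_answer(payload["intent"], payload.get("params", {}))}


if __name__ == "__main__":
    event = {"request": {"type": "IntentRequest", "intent": {
        "name": "BakeCookiesIntent",
        "slots": {"quantity": {"name": "quantity", "value": "24"}}}}}
    print(handle_alexa(event)["response"]["outputSpeech"]["text"])
```

Porting to a new assistant then means writing one more adapter, while `core_answer` stays untouched.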

Enjoy the videos!

Alexa is coming home for Christmas: available worldwide (>80 countries)

Good news! After the initial US release, followed by Germany, the UK and India, the family of Amazon Echo products can be purchased in over 80 countries. The languages supported are German and English, the latter with 3 different locales (US, Britain and India). Here’s the official news.

For Europe, the German Amazon store has a bargain Echo Dot 2nd generation for just 35 euros.

Time for me to review my skills and make sure multilanguage support is well implemented, so I’m ready to add my own language. Surely it’s around the corner 😉

Who am I? Alexa introduces Voice Profiles

Privacy within the privacy of your home is a concern for users of the Amazon Echo and of any other voice assistant, especially since skills that sync it with your personal accounts became available. I am okay with my spouse checking my calendar, but I would not be so happy to mistake her appointments for mine! Voice assistants also bring out an age-old problem that we techies detect very well, but others not so much: that of cardinality. An example: when you have one Echo (or multiple linked Echos acting as one, if your home is bigger than mine!) but you don’t live alone, it’s quite likely that more than one human will speak to Alexa. Why does this one-to-many relationship between humans and machines represent a cardinality problem?

Let’s continue with the example. At home, my spouse and I use our Amazon Echo. We’re both non-native English speakers and have distinct accents in English (we learnt the language on different continents). Our Echo sometimes struggles to understand one or the other. The Machine Learning element of Alexa must be very confused by supposedly the same human saying the same thing in such different ways at random moments in time! I bet Alexa would be happier if we could let her know that we’re two humans, teach her to tell us apart, and then teach her to understand each of us better.

If, on top of having different voices and different accents, you wish to use information from individual services (personal calendars, mail accounts…), then you need to somehow link those individual services with your Echo devices; again, a cardinality problem. Which one will Alexa use? Mine or my spouse’s? Why does it have to be only one? Can’t it be both?
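The cardinality issue can be sketched as a small data model: one device, many people, and one set of linked services per person. The class below is hypothetical; it only illustrates the choice Alexa has to make once it can tell speakers apart, and it does not show how Voice Profiles work internally.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Household:
    """One Echo (or group of linked Echos) shared by several people (hypothetical model)."""
    account_holder: str
    # One calendar per person: the "many" side of the one-to-many relationship.
    calendars: dict[str, list[str]] = field(default_factory=dict)

    def appointments_for(self, speaker: Optional[str], everyone: bool = False) -> list[str]:
        """Pick the calendar for the recognised speaker, or merge them all."""
        if everyone:
            return sorted(item for items in self.calendars.values() for item in items)
        # Without speaker recognition, everything falls back to the account holder.
        person = speaker if speaker in self.calendars else self.account_holder
        return self.calendars.get(person, [])


if __name__ == "__main__":
    home = Household("me", {"me": ["10:00 dentist"], "spouse": ["09:00 gym"]})
    print(home.appointments_for("spouse"))  # the spouse's calendar
    print(home.appointments_for(None))  # unrecognised voice: the account holder's calendar
    print(home.appointments_for(None, everyone=True))  # both calendars merged
```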

Luckily, Amazon has just launched Voice Profiles to achieve this. You configure your Echo devices to pair with as many humans as needed. How? Through the Alexa app on your Smartphone. Here’s how:

The person whose Amazon account is linked with the Echo device must launch the Alexa app on their Smartphone, visit Settings -> Accounts -> Voice, and follow the instructions.
The second adult in the household must do the following:

When both of you are at home, launch the Alexa app on the primary user’s Smartphone.
Settings -> Accounts -> Household profile, and follow the instructions to set up this new user.
With any of your Smartphones, log in to the Alexa app with the second adult’s credentials.
Follow the instructions below.

Anyone other than the primary account holder must do the following:

Install the Alexa app on your Smartphone if you haven’t done so.
Log in with your Amazon account (or create one if you’re not the second adult in the household).
Provide the info that’s required to pair up with the Echo device.
(You can skip Alexa Calling and Messaging if you don’t want to use them with your Echo.)
Settings -> Accounts -> Voice, and follow the instructions.

Here are the full instructions.

New generation of Alexa-enabled devices is here!

Last week, Amazon announced the next generation of voice-enabled devices (and tools for devs!). Here’s what we could learn from the official announcement and subsequent media coverage.

Echo Plus: Same form factor as the original Echo device, but enhanced in many ways. It will act as the control center for the home. It can manage over 100 IoT home devices “out of the box” and without the Bluetooth fuss. A simple “Alexa, find my devices” will get them all hooked up. The big question is, when will we start to hear about cheeky neighbours going all Poltergeist on your living room lights, or worse?

Echo new generation: Same functionality as the original Echo device, but smaller and covered in cloth (in different colors). It will sell for $99, according to The Verge.

Echo Spot: Finally! Some years ago I fell in love with a device/idea called Chumby. It was a sort of potato-shaped, Internet-enabled alarm clock. Sadly (or not!) I never got one. Echo Spot will fill that gap in my life: a device slightly bigger than a baseball, with a nice screen, that you can talk to and that can wake you up.

I foresee the Echo Spot being the bestseller of the three. So for us devs, this means we must enhance our Skills with visual functionality (a.k.a. cards).
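As a starting point, this is how a skill's JSON response can carry a card alongside the speech. The `Simple` and `Standard` card types and their fields follow the Alexa Skills Kit response format; the image URL in the usage example is a placeholder, and richer screen layouts on devices like the Spot rely on separate display features not shown here.

```python
from typing import Any, Optional


def build_response(speech: str, title: str, text: str,
                   image_url: Optional[str] = None) -> dict[str, Any]:
    """Alexa response with spoken text plus a card for devices and apps that show one."""
    if image_url:
        # Standard cards carry text and an image (image URLs must be HTTPS).
        card = {
            "type": "Standard",
            "title": title,
            "text": text,
            "image": {"smallImageUrl": image_url, "largeImageUrl": image_url},
        }
    else:
        # Simple cards carry a title and plain content only.
        card = {"type": "Simple", "title": title, "content": text}
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "card": card,
            "shouldEndSession": True,
        },
    }


if __name__ == "__main__":
    print(build_response("Your cookies are ready.", "Cookies", "24 chocolate chip cookies",
                         image_url="https://example.com/cookies.png"))
```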

Willkommen! Amazon Echo and Alexa now speak “the Queen’s English”… and German!

In September 2016 (time flies…), Amazon announced that the Amazon Echo and therefore Amazon Alexa would be made available in the UK and in Germany.

One would think that this would affect two geographic areas and only one language, but nothing could be further from the truth. Trying to make Alexa understand Geordie or Scouse makes Deutsch look crystal clear.

So, from now on, there are three languages that you should consider when you define your skill: English (US), English (GB) and German.

It’s very important that you realise that Geography and Language are different things, and you have to make decisions in both areas. For example, you can publish a Skill in Germany in English (US) and German, or you can decide that your Skill won’t cater to expats and publish it in Germany in German only. When you define the Interaction Model, you define one model per language you wish to implement. When you provide the publishing information, you decide on the geography.
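Because the Interaction Model is defined per language but the code can be shared, a skill can keep a single handler and pick its wording from the `locale` field of the incoming request. This sketch assumes a simple in-code string table with hypothetical messages; geography, by contrast, is chosen when publishing and never appears in the code.

```python
from typing import Any

DEFAULT_LOCALE = "en-US"

# One table of strings per locale; the skill logic itself is written once.
MESSAGES = {
    "en-US": {"welcome": "Welcome! What can I do for you?"},
    "en-GB": {"welcome": "Welcome! How can I help?"},
    "de-DE": {"welcome": "Willkommen! Wie kann ich helfen?"},
}


def translate(locale: str, key: str) -> str:
    """Look up a string for the locale, falling back by language, then to the default."""
    table = MESSAGES.get(locale)
    if table is None:
        language = locale.split("-")[0]
        # e.g. an en-IN request reuses the first English table (en-US here).
        table = next((strings for code, strings in MESSAGES.items()
                      if code.startswith(language + "-")), MESSAGES[DEFAULT_LOCALE])
    return table.get(key, MESSAGES[DEFAULT_LOCALE][key])


def handle_launch(event: dict[str, Any]) -> str:
    """Same handler for every language: only the strings change."""
    locale = event["request"].get("locale", DEFAULT_LOCALE)
    return translate(locale, "welcome")


if __name__ == "__main__":
    print(handle_launch({"request": {"type": "LaunchRequest", "locale": "de-DE"}}))
```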

In our next post we will solve the following riddles: What happens to my “functionality”? Do I need to create one version per language (hint: don’t do it!!!)? What are the implications of limiting my Skill to a certain geography? Then we will write a bit about predefined Slot types and their multilanguage implications.

Amazon Lex, the beating heart of aLEXa, opened for conversational bot creation


This morning I finally got my invitation for the beta/preview program of Amazon LEX, the heart of Alexa’s voice recognition system.

I am just browsing through the documentation, so bear with me, but it looks very exciting. There are lots of concepts that will be familiar to any Alexa skills developer, especially around the interaction definition area. Others are brand new.

Hope to have a bot up and running in the upcoming weeks. I’ll keep you posted!