Blessed Twitter has led me to Voice First, which seems to be an employment agency listing a ton of ultra-interesting voice jobs.
If you’re UK- or US-based, you’re sure to find something for you.
(Cool logo, by the way!)
Yup, it’s been a while. It doesn’t mean I haven’t been involved with voice, in fact, the opposite is true. But here we are!
This Christmas week, a memo from Amazon has emerged revealing the company’s concern about the lack of engagement with its line of Echo smart speakers [Bloomberg, The Verge]. That is why Alexa is more talkative than ever, trying to give us more tips on how to get value from it.
Is it shocking to you to learn that between 15% and 25% of new users stop using Alexa altogether within two weeks of purchase?
I personally find it very useful in my day-to-day, but it seems I don’t differ much from the average user, as these are my main use cases:
What do you use it for?
Trying to create natural conversations in your skill was very, very, VERY cumbersome. I would say humans don’t think in straight lines (luckily, as this makes us way more interesting!) and we certainly aren’t very comfortable following scripts unless we are call-center operators armed with the appropriate tools. And we are genius synonym finders as long as we’re not writing an essay (then we get writer’s block!). Defining the utterances in your skill was a combinatorial exercise with a huge number of combinations. I guess every developer had a strategy to deal with it. For me it was a relational database containing the possible values for expressions, and some SQL queries that would create all the possible permutations and populate the required JSON files on the go, ready to paste into the Amazon Developer console.
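I used SQL for this, but the same trick works in a few lines of Python: take lists of interchangeable phrase fragments, compute their Cartesian product, and shape the result like the sample-utterance section of the interaction-model JSON. The phrase lists and intent name below are hypothetical, just to illustrate the approach.

```python
from itertools import product
import json

# Hypothetical building blocks for one intent's utterances.
openers = ["tell me", "give me", "what is"]
objects = ["the weather", "the forecast"]
places = ["in {city}", "for {city}"]

# The Cartesian product plays the role of the SQL query that
# generated every permutation of expression values.
samples = [" ".join(parts) for parts in product(openers, objects, places)]

# Shape the result like the sample-utterances section of the
# interaction-model JSON pasted into the Developer console.
intent = {"name": "WeatherIntent", "samples": samples}
print(json.dumps(intent, indent=2))
print(len(samples))  # 3 * 2 * 2 = 12 combinations
```

Add one more list of five synonyms and you already have 60 utterances, which is exactly why doing this by hand didn’t scale.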
Keeping track of state in the skill was also hard. Say we need to gather information to populate three slots. We can ask for slot 1, then slot 2, then slot 3. Or we could ask for slot 1, then slot 3, then slot 2. Or, in this order, slots 2, 1, 3. Or slots 2, 3, 1. Or slots 3, 1, 2. Or slots 3, 2, 1. That’s six combinations. But users could also decide to give us the information for two slots unprompted, in a single utterance. So it could be 1, 2, then 3; or 1, 3, then 2; or 2, 3, then 1. Or… Surely you get the idea: managing this by hand was too hard. The alternative was a fixed order in the dialog with the user, which could feel stiff to free spirits, or boring and robotic if the skill is used frequently.
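The combinatorics above are easy to check with a few lines of Python (the slot names are hypothetical):

```python
from itertools import combinations, permutations

slots = ["origin", "destination", "date"]  # hypothetical slot names

# The six orders in which a user could fill the slots one at a time.
one_at_a_time = list(permutations(slots))

# Plus the cases where two slots arrive in a single utterance,
# followed by the remaining one.
two_then_one = [(pair, rest)
                for pair in combinations(slots, 2)
                for rest in slots if rest not in pair]

print(len(one_at_a_time), len(two_then_one))  # 6 and 3 paths already
```

Nine conversational paths for just three slots, before even counting the case where the user volunteers everything at once; each path needed its own state handling in the old model.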
Dialog Management was quite a step change from the early days of Skill creation; nevertheless, the only way to leverage it was through code.
It’s no surprise I was very happy to read about the July 22nd announcement of Alexa Conversations. As the name suggests, it places the conversation at the core of the Skill creator experience (note I did not say “developer”). The definition of what a Skill is gets clearer: a set of possible dialogs between Alexa and the user whereby enough information is gathered, then Alexa provides some useful functionality to the user.
The cornerstone of Skills development is the creation of possible dialogs, specified, well, as a dialog in a novel: in turns.
You should create as many dialogs as there are different combinations of turns (hint: not different orderings, though). Another possible dialog for the example above could be:
Once you are happy with your dialogs, you need to annotate each turn: what role in the conversation does the turn have (is Alexa asking for parameters? Is the user providing information? Is the user confirming what Alexa is suggesting?), what are other ways that you expect the user to say the same thing (these are called Utterance sets), what are other ways for Alexa to say the same thing (so that the Skill doesn’t feel monotonous if it’s used frequently), etc.
The second key concept is that of an API definition. The API is the provider of the service rendered by Alexa and it will be implemented programmatically. The API definition is just its representation: what parameters it takes as input, what kind of data it returns as output, and so on.
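To make the idea concrete, here is a sketch of what such a definition captures. The names and types below are illustrative only; the real API definition is entered in the Alexa Conversations UI, not written as Python.

```python
from dataclasses import dataclass, field

@dataclass
class Parameter:
    """One input the API takes, typed with a slot type."""
    name: str
    slot_type: str  # e.g. "AMAZON.City", "AMAZON.DATE"

@dataclass
class ApiDefinition:
    """The representation of the API: inputs in, a typed result out."""
    name: str
    inputs: list = field(default_factory=list)
    returns: str = "WeatherResult"  # hypothetical return type name

get_weather = ApiDefinition(
    name="GetWeather",
    inputs=[Parameter("city", "AMAZON.City"),
            Parameter("date", "AMAZON.DATE")],
)
print(get_weather.name, [p.name for p in get_weather.inputs])
```

The point is that the definition is purely declarative: it says nothing about *how* the weather is fetched, only what goes in and what comes out.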
To build the Skill, Amazon will use its computational power to “train the model”. This is the creation of a program that takes into account all the possible combinations of how a conversation could unfold: both the dialog sequences and the language elements (vocabulary, synonyms, grammatically equivalent sentences, etc.). Here’s a comprehensive explanation of the AI techniques involved. Tens of thousands of permutations are typically created, which greatly increases the probability of Alexa reacting correctly to user input.
So the bulk of the work shifts from development to a friendly user interface where the designer can focus on the voice experience rather than excessive technicalities.
Development of the API itself becomes a simple business of receiving parameters, probably invoking a mature API that is already being used by other clients (web, apps…), and passing the result back to the Skill.
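As a sketch, such an API implementation can be little more than a pass-through. Everything below is hypothetical (the backend function, field names and values are mine, not the Conversations request schema): the point is just how thin this layer can be.

```python
def fetch_forecast(city, date):
    # Stand-in for the mature backend API already serving
    # web and app clients.
    return {"city": city, "date": date, "summary": "sunny, 21°C"}

def handle_get_weather(api_request):
    """Receive resolved slot values, call the backend, pass back the result."""
    args = api_request["arguments"]  # resolved slot values from the dialog
    result = fetch_forecast(args["city"], args["date"])
    # The returned data becomes the API response the dialog model
    # uses to phrase Alexa's answer.
    return {"apiResponse": result}

response = handle_get_weather(
    {"arguments": {"city": "Madrid", "date": "2019-08-01"}})
print(response["apiResponse"]["summary"])
```

All the conversational heavy lifting (re-prompts, confirmations, out-of-order slots) stays in the trained model; the code only does business logic.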
If you’re a one-woman band (my case), you greatly appreciate the speed with which you can bring your concept to reality. If you’re a company, you can better leverage the skills of your associates by having dedicated roles. Psychologists, marketers, product owners: almost anyone who is web-savvy and a subject matter expert can work on the dialog-curation part of building a Skill.
[This post is a bit off topic, but as cars are already an everyday object we talk to, I thought it would be of interest.]
I had read somewhere about Elon Musk’s digital-transformation ideas around passenger transport and the Tesla car. I recall him saying that once the electric car and autonomous driving (level 5) are achieved, the concept of owning a car, what you do with it, and the use of that resource called your car totally change. And BMW are helping me understand that. This week, during the Mobile World Congress 2018, BMW showcased the customer journey for their brave new autonomous vehicle. Video below (short intro in Spanish, English from then on):
Main takeaways (apart from seeing the steering wheel eerily turning itself):
I feel so lucky to live in these times: I now know that when I become too old to drive (not that I drive much these days anyway!), that won’t hinder my mobility one iota.
It’s also easy to imagine a mashup of what I have just explained and what Uber / Lyft currently do. You can think of your car as what you use for your own transportation when you are using it, and the rest of the time, instead of sitting idle inside a garage, the car can be out and about earning you some bucks.
That’s what Musk was talking about in that interview that stuck in my mind. And it’s great to know that there’ll be plenty of makes and models to choose from. Kudos to BMW for this!
Alexa has a competitor: Google Assistant and the plethora of home pods that let it escape the boundaries of the phone’s not-so-good ambient mics.
I was fortunate to attend a demo this morning at the Mobile World Congress 2018. You can watch the videos below.
It’s time to develop skills in a way that they are easily portable to other platforms.
Enjoy the videos!
Good news! After the initial US release, followed by Germany, the UK and India, the family of Amazon Echo products can now be purchased in over 80 countries. The supported languages are German and English, the latter in three different locales (US, UK and India). Here’s the official news.
For Europe, the German Amazon store has a bargain Echo Dot 2nd generation for just 35 euros.
Time for me to review my skills and make sure multilanguage support is well implemented, ready for my own locale to be added. Surely it’s around the corner 😉
Privacy within the privacy of your home is a concern for users of the Amazon Echo and of any other voice assistant, especially since skills that sync it with your personal accounts were made available. I am okay with my spouse checking my calendar, but I would not be so happy to mistake her appointments for mine! Voice assistants also bring out an age-old problem that we techies detect very well, but others not so much: that of cardinality. An example: when you have one Echo (or multiple, linked Echos acting as one, if your home is bigger than mine!) but you don’t live alone, then it’s quite likely that more than one human will speak to Alexa. Why does this one-to-many relationship between humans and machines represent a cardinality problem?
Let’s continue with the example. At home, my spouse and I use our Amazon Echo. We’re both non-English speakers and have distinct accents in English (we learnt the language in different continents). Our Echo sometimes goes crazy understanding one or the other. The Machine Learning element of Alexa must be very confused about supposedly the same human saying the same thing in such different ways at random moments in time! I bet Alexa would be happier if we could let her know that we’re two humans, if we could teach her to tell us apart, and then teach her to understand us better one by one.
If on top of having different voices and different accents, you wish to use individual services information (personal calendars, mail accounts…) then you need to be able to somehow link those individual services with your Echo devices – again, cardinality problem. Which one will Alexa use? Mine or my spouse’s? Why does it have to be only one? Can’t it be both?
Luckily, Amazon has just launched Voice Profiles to achieve this. You configure your Echo devices to pair with as many humans as needed. How? Through the Alexa app on your smartphone. Here’s how:
Here are the full instructions.
Last week, Amazon announced the next generation of voice-enabled devices (and tools for devs!). Here’s what we could learn from the official announcement and subsequent media coverage.
Echo Plus: Same form factor as the original Echo device, but enhanced in many ways. It will act as the control center for the home. It can manage over 100 IoT home devices “out of the box” and without the Bluetooth fuss. A simple “Alexa, find my devices” will get them all hooked up. The big question is, when will we start to hear about cheeky neighbours going all Poltergeist on your living room lights, or worse?
Echo new generation: Same functionality of the original Echo device, but smaller, and covered in cloth (different colors). It will sell for $99, according to The Verge.
Echo Spot: Finally! Some years ago I fell in love with a device/idea called Chumby. It was some sort of potato-shaped, Internet-enabled alarm clock. Sadly (or not!) I never got one. Echo Spot will fill that gap in my life: a device slightly bigger than a baseball with a nice screen that you can talk to, that can wake you up.
I foresee the Echo Spot being the bestseller of the three. So for us devs, this means we must enhance our Skills with visual functionality (a.k.a. cards).
In September 2016 (time flies…), Amazon announced that the Amazon Echo and therefore Amazon Alexa would be made available in the UK and in Germany.
One would think this would affect two geographic areas and only one language, but nothing could be further from the truth. Trying to make Alexa understand Geordie or Scouse makes German seem crystal clear by comparison.
So, from now on, there are three languages that you should consider when you define your skill: English (US), English (GB) and German.
It’s very important that you realise that geography and language are different things, and you have to make decisions in both areas. For example, you can publish a Skill in Germany in English (US) and German, or you can decide that your Skill won’t apply to expats and publish it for Germany in German only. When you define the Interaction Model, you define as many models as the languages you wish to implement. When you provide publishing information, you decide on the geography.
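In practice, keeping one code base across languages comes down to switching strings on the locale the request carries. A minimal sketch (the greeting strings and the fallback choice are my own illustration):

```python
# One code base, per-locale strings: the request tells us the locale,
# so the same handler can serve en-US, en-GB and de-DE.
GREETINGS = {
    "en-US": "Howdy! What can I do for you?",
    "en-GB": "Hello! How can I help?",
    "de-DE": "Hallo! Wie kann ich helfen?",
}

def greet(request):
    locale = request.get("locale", "en-US")  # Alexa sends e.g. "de-DE"
    # Fall back to US English for locales we haven't localised yet.
    return GREETINGS.get(locale, GREETINGS["en-US"])

print(greet({"locale": "de-DE"}))
```

Only the interaction models (one per language) and the string tables differ; the logic stays single-sourced.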
In our next post we will solve the following riddles: What happens to my “functionality”: do I need to create one version per language? (Hint: don’t do it!) What are the implications of limiting my Skill to a certain geography? Then we will write a bit about predefined Slot types and their multilanguage implications.
This morning I finally got my invitation for the beta/preview program of Amazon LEX, the heart of Alexa’s voice recognition system.
I am just browsing through the documentation, so bear with me, but it looks very exciting. There are lots of concepts that will be familiar to any Alexa skills developer, especially around the interaction-definition area. Some others are brand new.
Hope to have a bot up and running in the upcoming weeks. I’ll keep you posted!