Voice is the future - ask anyone. The Echo, Amazon's internet-connected speaker, became a sleeper hit when Google and Apple should have had the smart home sewn up, and what some perceived to be a Trojan horse for getting Amazon's storefront into our living rooms turned into a bigger attack on the smart home.
Sales of the Echo rocketed through 2016 and last month the platform topped 7,000 skills. For the uninitiated, skills are the tricks you can teach the Echo assistant Alexa to perform, be it a weather update, a news flash, some facts about past Presidents, or switching on your Philips Hue light bulbs.
Alexa has attracted a host of developers who see a future in voice-controlled technology, but building for voice is different. The good news is that you can make a basic skill today with no coding knowledge; just a head for problem solving, a spare weekend and a dollop of patience. But whether or not you want to build, you may have a few questions about how Alexa does the things she does. How long does it take to make a skill? Can anyone do it? How does Amazon stop "bad" skills getting through?
And of course, can you make Alexa swear?
Building a skill - it ain't so tricky
The natural language processing bit of skill-building is handled on Amazon's side, so all developers have to do is provide the information that tells Alexa what to listen for and how she should respond.
"What you supply to Amazon as a developer is basically your voice interaction model," says Baq Haidri, a software engineer who developed 60dB's Alexa skill. "That's stuff like saying things like 'play', 'next', 'resume'. You supply Amazon with a file that says 'Here's an input the user can say'."
Then Amazon takes care of the rest - reviewing, deploying and scaling your skill. Yes, you have to provide code for anything to work, but there are a lot of templates provided by Amazon and other developers to get you started.
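To give a rough idea of what that voice interaction model holds, here's a minimal sketch in Python. The invocation name, the intent names and the layout are my own assumptions for illustration, not the exact schema Amazon expects, so treat it as a picture of the idea rather than something to upload.

```python
import json

# Illustrative only: intent names and layout are assumptions for this sketch,
# not the exact schema Amazon's developer tools expect.
INTERACTION_MODEL = {
    "invocation_name": "example player",
    "intents": [
        {"name": "PlayIntent", "samples": ["play", "start playing", "play something"]},
        {"name": "NextIntent", "samples": ["next", "skip", "skip this one"]},
        {"name": "ResumeIntent", "samples": ["resume", "carry on", "keep playing"]},
    ],
}


def sample_utterances(model):
    """Flatten the model into (intent name, phrase) pairs, one per sample."""
    return [
        (intent["name"], phrase)
        for intent in model["intents"]
        for phrase in intent["samples"]
    ]


if __name__ == "__main__":
    # The file a developer supplies would be a serialised form of this data.
    print(json.dumps(INTERACTION_MODEL, indent=2))
```

Each entry pairs something the user might say with the intent it should trigger; the speech recognition itself stays on Amazon's side.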
So I tried it myself, and as it turns out, building and publishing a skill is easier than I initially expected. Ideally you want to have some coding knowledge, but as I quickly discovered, even without it you can still customise template code by applying a little common sense. You'll also find plenty of step-by-step guides for skill building, and I'd advise signing up to GitHub, a code hosting community where developers share loads of skills, too.
Making a skill comes in two parts: you need to write the code and host it somewhere; then you need to create the skill and point it at said code. The first part is the trickier bit, but Amazon Web Services offers a platform called Lambda that lets you host code on Amazon's servers instead of your own. It supports a variety of coding languages including Java, C# and Python, but again, if you don't know any of these, don't worry - you can still have some fun.
Second, you need an account on Amazon's Skills Portal. This is where you add a skill and point it at the code you put into Lambda. It might sound complicated, but on my first run-through it took me about 15 minutes to get everything in place.
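For a sense of what the hosted code looks like, here's a minimal sketch of a Python function of the kind Lambda runs. The function names and the spoken phrases are placeholders I've made up; the templates you'll find online are longer, but as far as I can tell they follow the same shape: read the incoming request, decide what to say, and hand back a small JSON reply.

```python
def build_response(speech_text, end_session=True):
    """Wrap plain text in the JSON envelope an Alexa skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point Lambda calls with the request Alexa sends to the skill."""
    request_type = event["request"]["type"]
    if request_type == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return build_response("Welcome. Ask me for a fact.", end_session=False)
    if request_type == "IntentRequest":
        intent_name = event["request"]["intent"]["name"]
        return build_response("You asked for " + intent_name + ".")
    # Any other request type: sign off and close the session.
    return build_response("Goodbye.")
```

Keeping the reply-building in its own function means every branch returns the same structure, which makes template code easier to tweak without breaking it.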
If you've used an Echo, you might have tried a skill that throws out a fact of the day, or gives you a joke when prompted. These kinds of skills are easy to make by reworking some of the basic fact-skill templates found around the web. I studied a little C# coding back in school, but truth be told it escapes me now, so I was essentially code illiterate. Nonetheless I wanted to make a skill called Hugh Facts that would relay a series of totally solid, not-at-all embellished info bites. So I hunted down a basic template and put it in Lambda.
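The heart of a fact skill is tiny. Here's a sketch of the idea, assuming the facts live in a plain list inside the code; the facts and function names below are placeholders of mine, not lifted from any particular template.

```python
import random

# Placeholder facts; a real template ships with its own list to edit.
FACTS = [
    "Placeholder fact number one.",
    "Placeholder fact number two.",
    "Placeholder fact number three.",
]


def pick_fact(facts=FACTS, rng=random):
    """Return one fact chosen at random from the list."""
    return rng.choice(facts)


def fact_speech(fact):
    """Prefix the fact so the spoken reply sounds like a sentence."""
    return "Here's your fact: " + fact


if __name__ == "__main__":
    print(fact_speech(pick_fact()))
```

Customising a template like this mostly means swapping the strings in the list, which is why it's doable without really knowing the language.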
Once you have your skill pointing at your code, you need to think about what kinds of words people should say to interact with it. This is important, because when it comes to speaking to AI we want it to feel as natural as possible.
"When you say something to Alexa, it's passing a bucket of information, such as what the user says or what it interprets the user says," says Justin Kovac, developer of the 7-Minute Workout skill and prior technical program manager for Alexa at Amazon. "It translates that into different buckets called intents."
This could be something as simple as saying "yes". Kovac again: "Think of all the different ways the user can say yes - yes, yuhuh, let's go, sure. That would be a yes intent. Or there might be a 'Start a workout' intent - begin a workout, get a new workout going, etc."
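Kovac's buckets can be pictured as a lookup from phrases to intent names. The sketch below is a toy stand-in for the matching Amazon does on its side, with intent names I've invented; the real thing copes with accents, mumbling and variations that an exact-phrase lookup never could.

```python
# Toy stand-in for the matching Amazon performs on its side; real speech
# recognition is far more forgiving than this exact-phrase lookup.
SAMPLE_UTTERANCES = {
    "YesIntent": ["yes", "yuhuh", "let's go", "sure"],
    "StartWorkoutIntent": [
        "begin a workout",
        "start a workout",
        "get a new workout going",
    ],
}


def resolve_intent(spoken_text, samples=SAMPLE_UTTERANCES):
    """Return the intent whose sample list contains the phrase, else None."""
    normalised = spoken_text.strip().lower()
    for intent_name, phrases in samples.items():
        if normalised in phrases:
            return intent_name
    return None
```

Saying "Sure" lands in the yes bucket, while a phrase that isn't in any list comes back as `None`, which is the skill's cue that it didn't understand.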
"Alexa, give me some Hugh Facts" seemed like a good one to invoke my own skill, but I needed to think carefully about how people speak when they're talking to devices like Echo and Google Home, and the sample utterances I should consider Alexa listen out for.
"Amazon does things to help you," says Justin. "They have built-in intents, the list of utterances of 'yes', 'sure', 'let's go' - anything that may be considered the yes intent. But the sky's the limit". You could tell Alexa to recognise "Macchiato" as an intent word if you wanted, but all you do is provide the text, as the voice recognition is all done on Amazon's end.
Which is why I also had to remind myself that Alexa isn't perfect, and sometimes it will get things wrong. That's why you should avoid utterances that could be confusing or unclear for Alexa. And what happens if she does misunderstand? "The only way to recover is that you have to ask the user to repeat it again at this point in time," says Matthias Keller, chief scientist at Kayak who worked on the travel company's own skill. "If you try to guess it, you're already down the path of a very bad user experience. Same if you're asking [the user] for confirmation."
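Keller's advice, ask again rather than guess, can be sketched as a reply that keeps the session open and repeats the question. The wording of the prompt and the function names are my own assumptions.

```python
def ask_again(prompt="Sorry, I didn't catch that. Could you say it again?"):
    """Build a response that keeps the session open and asks for a repeat."""
    speech = {"type": "PlainText", "text": prompt}
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": speech,
            # Spoken if the user stays silent after the first prompt.
            "reprompt": {"outputSpeech": dict(speech)},
            "shouldEndSession": False,
        },
    }


def respond(intent_name, known_intents):
    """Answer known intents; otherwise ask the user to repeat, never guess."""
    if intent_name in known_intents:
        return known_intents[intent_name]()
    return ask_again()
```

Here `known_intents` is a hypothetical dictionary mapping intent names to the functions that answer them; anything outside it triggers the repeat request instead of a best guess.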
It wasn't long before I had Hugh Facts up and running, spitting out a random fact on command. But it was time to take it to the next level, so I found another template for a multiple choice quiz Amazon provided for developers, called Reindeer Games.
I was able to edit a series of questions already included in the code provided, and again, I needed to think about the utterances people would use to start the game and fetch new questions.
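A quiz has one extra wrinkle compared with a fact skill: it has to remember which question you're on between turns. Here's a minimal sketch of that bookkeeping, assuming the position and score are carried in a small dictionary that travels back and forth with each turn. The questions and names are placeholders, not taken from Amazon's template.

```python
# Placeholder questions standing in for the ones a quiz template contains.
QUESTIONS = [
    {"question": "Placeholder question one?", "options": ["A", "B", "C"], "answer": 1},
    {"question": "Placeholder question two?", "options": ["A", "B", "C"], "answer": 3},
]


def check_answer(session_attributes, spoken_number):
    """Score the current question and return (speech, updated attributes)."""
    index = session_attributes.get("index", 0)
    score = session_attributes.get("score", 0)
    correct = QUESTIONS[index]["answer"] == spoken_number
    if correct:
        score += 1
    verdict = "Correct." if correct else "Wrong."
    next_index = index + 1
    if next_index < len(QUESTIONS):
        speech = verdict + " Next question: " + QUESTIONS[next_index]["question"]
    else:
        total = str(len(QUESTIONS))
        speech = verdict + " You scored " + str(score) + " out of " + total + "."
    return speech, {"index": next_index, "score": score}
```

Editing the quiz is then mostly a matter of rewriting the entries in `QUESTIONS`; the scoring logic doesn't need to change.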
In this case, all of the questions and answers were contained in the code, but you can also point a skill at a source on the internet, in which case you need to use an API. "Someone else is managing that database," says Justin. "If you wanted something like what's the new movies coming out, you could pull those facts or updates from someone else's API easily."
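Pulling answers from someone else's API might look something like the sketch below. The URL and the shape of the data (a JSON object with a `titles` list) are hypothetical; a real API will have its own address, format and probably an access key.

```python
import json
import urllib.request

# Hypothetical endpoint; swap in whichever API actually holds the data.
MOVIES_URL = "https://example.com/api/new-movies"


def fetch_json(url):
    """Download and decode a JSON document from the given URL."""
    with urllib.request.urlopen(url, timeout=5) as reply:
        return json.loads(reply.read().decode("utf-8"))


def new_movies_speech(fetch=fetch_json, url=MOVIES_URL):
    """Turn the API's list of titles into a sentence for Alexa to speak."""
    try:
        titles = fetch(url)["titles"]
    except (OSError, KeyError, ValueError):
        # Network trouble or an unexpected reply: apologise instead of crashing.
        return "Sorry, I couldn't reach the movie listings."
    if not titles:
        return "I couldn't find any new releases."
    return "New releases: " + ", ".join(titles) + "."
```

Passing the download function in as a parameter means the speech-building part can be tried out with canned data before it's pointed at a live service.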
Getting it certified and published
Once I was done with my trivia skill, it was a case of submitting it for review. This process can take anywhere between three and seven days, during which time your skill will be analysed for bugs and content. "Amazon has a testing process called certification," says Justin. "So they'll test from a security standpoint to make sure there are no obvious loopholes with your skill's code that could be malicious to another user."
So it needs to be functioning and there must be no possibility it will get confused or trip up on any bugs in the code. But what about the content part?
"There's also the content policy, which dictates what she can respond with," says Justin."You can't build something that specifically markets to children unless certain conditions are met. So they test for that and make sure you're not making something [that should be] restricted for kids." Amazon is subject to internet laws protecting children, such as the COPPA (Children's Online Privacy Protection Rule) in the US, but some things you can around by providing a thorough description of your skill in the submission notes.
"Say if I had a bartender skill, if they found your skill isn't appropriate for all ages, you'll have to put that it's for 21 and older in the skill description."
You must also declare in your submission notes if your skill is going to collect any personal information about the user. As for swear words, these are censored from Amazon's side. I tried a few and "bastard" was the only word I could get through unsanitised - all others were bleeped out. Amazon is unlikely to publish something incredibly vulgar, but there is a grey area and it's best to be upfront in declaring everything in your description.
"They've been doing a good job of burying the less relevant stuff lately," says Justin. In the same way Apple and Google are good at hiding clones, Amazon seems to be learning too.
My skill was definitely vulgar, but only in its unabashed uselessness. Still, it was time to submit it, and at the time of writing I await the verdict of whether it's good enough for publishing. Considering everything I've learned, it seems like it should be a safe bet.
So of course, my mind then goes to money. Not that I see any justifiable reason or way to monetise my skill, but how will more deserving skills make cash?
"Right now there's no monetisation for developing a skill but that doesn't mean there won't be," says Justin. "Of course that's something people want to figure out. People will come to me and say 'Hey, you built that skill, want to build a skill for us?'
Right now it's a bit like social media; Alexa can help brands enhance themselves, and become more discoverable. For developers getting in the game early, it may have financial rewards in the long term.
Plus, there's the incentive that voice recognition is just… cool. And it can only get better.
Better, but how?
I was keen to get some thoughts from Alexa developers on how they'd like to see voice platforms get better, and what advice they'd give to any budding developers.
"There are times you need a user's eyeballs, and I don't think that can necessarily be replaced by voice soon," says 60dB's Baq Haidri. "I think the problem with these platforms right now is that they're very constrained in what they can do. It's essentially a call and response and there's really no memory. So if you want to use voice to cook something from a recipe, you need to be able to have a conversation."
Kayak's Matthias Keller says he'd like it if, in the future, some of the natural language processing could be done on the developer side. For now, he says people "should just try it".
"It is super easy, and if you have experience in programming, it's probably a Sunday afternoon to get something going. A recommendation is to really spend time in building the voice interface and what you can do to build a meaningful service that has some sort of impact.
"Spend more time finding a problem."