Microsoft Shows Off the Future of Digital Assistants

This little guy is powered by Kinect. His job is to offer visitors directions when they come out of the elevator and arrive on a floor.

This might be what Stephen Hawking and friends were warning us about the other day. Microsoft is giving us a glimpse at what they’re cooking up deep inside Microsoft Research HQ – some serious AI advances that could someday make Cortana less like a glorified personalized search engine and more like the actual Halo Cortana.

Currently, what we see on our phones with Cortana, Siri, and Google Now alike is natural language recognition combined with long-term memory that saves your search tendencies and actions. They’ve even managed to get contextual learning in there – to some extent, those digital assistants can change what they do based on where you are or what time it is. That said, you can still clearly see the delineations in what virtual assistants can and can’t do – you can see the sketch lines of the programming behind the finished painting. What Microsoft is working on aims to erase those sketch lines completely – your virtual assistants will become less obviously ones and zeroes, and a little bit more like you or me.

That’s what Microsoft’s Situated Interaction project is about. It ranges from something as deceptively simple as smart elevators – elevators with detectors that can differentiate between passersby and people who actually need a lift – to virtual assistants that go far beyond what’s currently on phones. The relatively straightforward tech going into those smart elevators – looking at shapes of people and how those shapes usually move when going to press an elevator button, then using that information to make inferences on a case-by-case basis – is being brought over to increasingly complex situations, including the most complex kind: communicating with people.
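
For the curious, here is a toy sketch of that elevator idea in Python. To be clear, this is not Microsoft's code, and every name in it (`approach_score`, `wants_elevator`, the thresholds) is made up for illustration. It assumes a sensor has already turned a person into a list of (x, y) floor positions, then guesses intent from whether that path keeps closing in on the button and ends up near it.

```python
import math

def approach_score(track, button):
    """Fraction of steps in which the person moved closer to the button."""
    distances = [math.dist(point, button) for point in track]
    steps = list(zip(distances, distances[1:]))
    if not steps:
        return 0.0
    closer = sum(1 for before, after in steps if after < before)
    return closer / len(steps)

def wants_elevator(track, button, threshold=0.8, max_final_distance=2.0):
    """Hypothetical rule: mostly heading toward the button and ending near it."""
    if not track:
        return False
    ends_near = math.dist(track[-1], button) <= max_final_distance
    return ends_near and approach_score(track, button) >= threshold

button = (0.0, 0.0)
rider = [(5, 0), (4, 0), (3, 0), (2, 0), (1, 0)]         # walks straight to the button
passerby = [(-5, 3), (-2, 3), (0, 3), (2, 3), (5, 3)]    # walks past along the hallway

print(wants_elevator(rider, button))
print(wants_elevator(passerby, button))
```

A real system would presumably learn those thresholds from lots of observed walks rather than hard-coding them, which is where the "inferences on a case-by-case basis" part comes in.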

Take the directions robot at Microsoft Research. The robot sees human shapes approaching and, based on movement patterns, can guess whether or not you’ll be needing in-office directions. The potential for improvement is always there – as the programmers better understand and model human movement, and as the robot goes through more and more test cases to learn from, the models the robot uses to make inferences will get sharper and sharper over time. There’s an easier way to put that – it’s learning, and on the robot’s part, it’s not terribly different from how humans learn. The result is a robot that can ask if you need help, understand natural speech, and give you directions, with accompanying hand gestures.

Maybe even more impressive is Monica, the virtual assistant of Eric Horvitz, a researcher at Microsoft. Monica has the same kind of spatial recognition technology that the directions robot has – made possible by a Kinect, of course – and can interact with people in a natural way. Monica is tapped into Horvitz’ life – based on his tendencies, which are gleaned from the way he interacts with his office computer, Monica can guess whether or not Horvitz will be busy, or if it’s a good time for him to see visitors. You can imagine that as this technology improves, an assistant like Monica would have facial recognition software that could recognize particularly important people in Horvitz’ life, and alter her reactions accordingly.

Ideally, Monica – and the virtual assistants we now use on phones – will be tapped into all devices in the future. After all, more data means more accurate models, and better results when you want or need your virtual assistant to do something for you.

So, are we headed into the dark future that Stephen Hawking has foretold? Who knows? Then again, I guess nobody knowing was the root of Hawking’s warning, wasn’t it?

Well, whatever. If it means a virtual assistant that knows exactly when I want a pizza delivered, and can get my order right, I’m all for it, at least until the machines rise and the figurative battle between Cortana, Siri, and Google Now becomes a literal one.