Apple's Big Experiment
Vision Pro is packed with AI but Apple won't say it out loud
Today’s post requires that you are up-to-date on Apple’s big news. So if you haven’t already done so, watch the following video.
By and large, according to initial reviews, it looks like all of this actually works as promised.
That means that yesterday was one of the biggest days ever for AI. But if you watched Apple’s keynote, you would be forgiven for thinking it had nothing to do with AI. That is because Apple, to my knowledge, has never used the terms “AI” or “artificial intelligence” to describe anything it is doing. Yet by the now-common meaning of the term, it is probably applying AI to more distinct things than any other company in the world. When it is pushed to describe the technology, it does so through the more technical and precise mantras of machine learning, neural engines and, just yesterday, “transformers.”
The other terms Apple did not use were augmented reality (AR), virtual reality (VR) and mixed/extended reality (XR). Well, they mentioned augmented reality very briefly before moving on to talk about something new: spatial computing.
What should we make of all of this?
Where’s the AI?
It’s everywhere. To create a display and user interface for a computer that can be worn on your face or head, Apple had to solve several unsolved problems. First, it needed to deliver little to no latency, so that the experience feels natural rather than so strange that it makes people nauseous. Second, it needed a means of interacting with the device without having to hold some other device, like a controller. Finally, all of this had to be at a resolution and quality that made it possible to read and work. All but the last of these (I think) required major advances in AI.
The most obvious is the interaction. Apparently, you move around a screen using your eyes, so there has to be a way of predicting what your eye is really looking at. And you then interact by using simple hand gestures. Not raising your hands and swishing Minority Report style (although that sounds like it would be fun), but staying relaxed without having to move your hands or hold them up at all. This required putting a camera on the bottom of the headset to watch for your hands and then AI to interpret those gestures. That whole thing is a crazy advance that I don’t believe any other company has seriously attempted. And Apple opted for that rather than, say, relying on Siri, which is interesting.
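To make the idea concrete, here is a deliberately tiny sketch of what “interpreting a gesture” can reduce to once a hand-tracking model has done the hard work of locating your fingers. Nothing here is Apple’s actual pipeline; the types, the 15 mm threshold and the sample numbers are all invented for illustration.

```swift
// A toy sketch of gesture interpretation, not Apple's pipeline.
// Assume an upstream hand-tracking model (hypothetical) gives us 3D
// fingertip positions in metres; we reduce them to a "pinch" event.

struct Point3D {
    var x: Double, y: Double, z: Double

    func distance(to other: Point3D) -> Double {
        let dx = x - other.x, dy = y - other.y, dz = z - other.z
        return (dx * dx + dy * dy + dz * dz).squareRoot()
    }
}

struct HandPose {
    var thumbTip: Point3D
    var indexTip: Point3D
}

enum Gesture {
    case pinch   // thumb and index touching: "select"
    case idle    // hands relaxed in the lap: no action
}

// Classify a single frame: a pinch is just "fingertips close enough",
// with the 15 mm threshold picked arbitrarily for illustration.
func classify(_ pose: HandPose, pinchThreshold: Double = 0.015) -> Gesture {
    pose.thumbTip.distance(to: pose.indexTip) < pinchThreshold ? .pinch : .idle
}

// Example frame: fingertips 8 mm apart reads as a pinch.
let frame = HandPose(
    thumbTip: Point3D(x: 0.10, y: -0.32, z: 0.410),
    indexTip: Point3D(x: 0.10, y: -0.32, z: 0.418)
)
print(classify(frame)) // pinch
```

The genuinely hard parts sit upstream of a snippet like this: finding relaxed hands in a downward camera feed, in real time and at low power, and fusing the result with eye tracking so the pinch applies to whatever you were looking at.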
Relatedly, Apple also uses AI to identify people who come into the room and work out whether you want to interact with them. This is another great use of AI.
AI also solves the latency problem. This is not Apple’s first rodeo in that regard. As we wrote about in Prediction Machines, the iPhone almost failed to be a thing because Apple couldn’t get the keyboard to work. Then one engineer fixed it overnight with some basic predictive analytics, solving what would otherwise have been a “fat thumb” problem.
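The keyboard trick, as it has been publicly described, was roughly to let next-letter predictions enlarge the effective hit areas of likely keys while the visible keyboard stays the same. Here is a toy version of that idea; the layout, probabilities and weighting below are mine, not Apple’s.

```swift
// A toy version of the "fat thumb" fix: keys keep their visual size,
// but their effective hit areas grow with the predicted probability
// of being the next letter. Probabilities here are made up.

struct Key {
    let letter: Character
    let centerX: Double   // key centre on screen, in points
    let centerY: Double
}

func effectiveDistance(x: Double, y: Double,
                       key: Key,
                       probs: [Character: Double]) -> Double {
    let dx = x - key.centerX, dy = y - key.centerY
    let distance = (dx * dx + dy * dy).squareRoot()
    // A likely letter behaves as if its target were bigger.
    let probability = probs[key.letter] ?? 0.01
    return distance / (1.0 + 4.0 * probability)
}

// Score a tap: closer keys win, but likely next letters get a bonus
// by shrinking their effective distance.
func resolveTap(x: Double, y: Double,
                keys: [Key],
                nextLetterProbability: [Character: Double]) -> Character? {
    keys.min(by: { a, b in
        effectiveDistance(x: x, y: y, key: a, probs: nextLetterProbability) <
        effectiveDistance(x: x, y: y, key: b, probs: nextLetterProbability)
    })?.letter
}

// After typing "th", a sloppy tap landing between "e" and "r"
// resolves to "e" because "e" is the far more likely next letter.
let keys = [Key(letter: "e", centerX: 60, centerY: 100),
            Key(letter: "r", centerX: 90, centerY: 100)]
let probs: [Character: Double] = ["e": 0.7, "r": 0.1]
print(resolveTap(x: 76, y: 100, keys: keys, nextLetterProbability: probs) ?? "?")
// e
```

The general lesson carries over: the better the prediction, the more forgiving the interface can afford to be.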
The same type of magic is at work with the Vision Pro. A former Apple scientist described it this way:
Generally as a whole, a lot of the work I did involved detecting the mental state of users based on data from their body and brain when they were in immersive experiences.
So, a user is in a mixed reality or virtual reality experience, and AI models are trying to predict if you are feeling curious, mind wandering, scared, paying attention, remembering a past experience, or some other cognitive state. And these may be inferred through measurements like eye tracking, electrical activity in the brain, heart beats and rhythms, muscle activity, blood density in the brain, blood pressure, skin conductance etc.
There were a lot of tricks involved to make specific predictions possible, which the handful of patents I’m named on go into detail about. One of the coolest results involved predicting a user was going to click on something before they actually did. That was a ton of work and something I’m proud of. Your pupil reacts before you click in part because you expect something will happen after you click. So you can create biofeedback with a user's brain by monitoring their eye behavior, and redesigning the UI in real time to create more of this anticipatory pupil response. It’s a crude brain computer interface via the eyes, but very cool. And I’d take that over invasive brain surgery any day.
Other tricks to infer cognitive state involved quickly flashing visuals or sounds to a user in ways they may not perceive, and then measuring their reaction to it. Another patent goes into details about using machine learning and signals from the body and brain to predict how focused, or relaxed you are, or how well you are learning. And then updating virtual environments to enhance those states. So, imagine an adaptive immersive environment that helps you learn, or work, or relax by changing what you’re seeing and hearing in the background.
All of these details are publicly available in patents, and were carefully written to not leak anything. There was a ton of other stuff I was involved with, and hopefully more of it will see the light of day eventually.
There are 5,000 patents, so that may take some time. We have seen so many advances in AI this year, but this one just boggles the mind. They are beating your own eye to the target!
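The patents spell out the real machinery, but the core idea in the quote, an anticipatory pupil response that precedes a deliberate selection, can be caricatured in a few lines: watch for pupil diameter rising above its recent baseline while the gaze dwells on a single target, and flag a likely selection before the hand gesture arrives. Everything below, from the thresholds to the type names, is an invented illustration, not Apple’s method.

```swift
// A toy illustration of "predicting the click before it happens":
// if pupil diameter rises noticeably above its recent baseline while
// gaze dwells on one target, flag an anticipated selection. The
// thresholds and window size are made up; the real models are far
// more sophisticated.

struct GazeSample {
    let pupilDiameterMM: Double
    let targetID: Int        // which UI element the eye is on
}

func anticipatesClick(samples: [GazeSample],
                      baselineWindow: Int = 20,
                      riseThreshold: Double = 0.15) -> Bool {
    guard samples.count > baselineWindow,
          let current = samples.last else { return false }

    // Gaze must be dwelling on a single target over the recent window.
    let recent = samples.suffix(baselineWindow)
    guard recent.allSatisfy({ $0.targetID == current.targetID }) else { return false }

    // Compare the latest pupil diameter with the window's average.
    let baseline = recent.map(\.pupilDiameterMM).reduce(0, +) / Double(recent.count)
    return current.pupilDiameterMM - baseline > riseThreshold
}

// Simulate a dwell on target 7 that ends with a pupil dilation.
var stream = (0..<20).map { _ in GazeSample(pupilDiameterMM: 3.0, targetID: 7) }
stream.append(GazeSample(pupilDiameterMM: 3.3, targetID: 7))
print(anticipatesClick(samples: stream)) // true
```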
All this is both amazing and amusing. The amazing stuff is obvious. What is amusing is the continued stream of breathless takes that Apple is somehow lagging behind in AI. This device was built for AI. It has 12 cameras (!) and more sensors than I can count. It has solved some truly audacious prediction problems. Google, in their recent keynote, couldn’t go 30 seconds without saying AI this and AI that. Apple doesn’t care what you call this and clearly sees the term as a distraction that sets expectations poorly.
A “Best Foot Forward” Experiment
Just as with AI, people are going to try to debate whether this device is AR, VR or XR. Apple didn’t bother with that. They don’t care. But they have been pushing AR-type developments on the iPad and iPhone through ARKit for at least five years. Those were low-grade experiments that showed a flat screen wasn’t going to cut it for anything really useful there.
How immersive technologies like AR and VR lead to real uses is something that I have been working on with Abhishek Nagaraj for about a year. Last week, because we wanted to get ahead of any Apple announcement, we released a paper describing our views on that. Bottom line: previous moves by Google, Meta and Snap had focussed on applications with very low value. Higher-value ones require thinking more clearly about the decisions users are making and how AR and VR might help them.
We need not have rushed. To the extent that there is any AR or VR here, it is basically the ability to have a bigger and potentially more useful screen without having to own the physical product. The AR and VR will come not from the device but from the apps, and those apps are yet to be built and will likely not be built by Apple, even if they run on its spatial computing platform. So we will talk about AR and VR on a future occasion.
So what is all this? It is a big experiment. In a paper I published last year, I argued that innovators conduct experiments that provide different signals. I distinguished between “raise-the-bar” experiments and “best foot forward” experiments. “Raise-the-bar” experiments are where you put out an intentionally kind of crappy product with the notion that if people still like it, you are on to something. Think of the minimum viable products that startups often release. It is the sort of thing you do when you are not really expecting it all to work.
By contrast, a “best foot forward” experiment is what Apple has done here. They have made sure they provided a spatial computer with a user interface that just worked. That meant solving the three problems I mentioned earlier, giving it the best chance of being something people would actually use. It also meant, of course, compromising on two things that will eventually matter: power and price. Both are clearly the weak points for mass adoption. But to compromise there would have harmed the experiment.
If people buy the device but don’t end up using it, Apple’s big experiment will tell them that spatial computing is not a thing. It will send a clear signal that it won’t be a thing for at least two more decades and that Apple’s user interface research didn’t cut it. If it passes the test, it will send a signal that it is worth investing more in this direction of technology.
You may think that a $3,499 device is hardly an experiment. Won’t people simply not use it because they can’t afford it? Apple tried to make a marketing pitch that it was better than a high-end home theatre system, but that was weak because it isn’t the right comparison; for starters, what sort of home theatre system is enjoyed by only one person at a time?
The better comparison is Apple’s Pro Display XDR, which can cost $6,000. If the Vision Pro works, it is better than that and costs less. In other words, there are users who will pay for this thing, and that will be enough to see if it is a thing.
This is something Apple has done before. The original iPhone was very expensive and limited to the US only. It was so expensive that Apple actually dropped its price and gave customers some of the difference back. But those who got it loved it, and so the technology took off.
More critically, however, this experiment is an industry-wide experiment. If Apple fails, all investment in AR and VR will likely halt for the better part of the next two decades. If they succeed, others will have some catching up to do, but I suspect that will happen. The big value will end up being in the apps created.
Summing Up
We have potentially seen one of the most important advances in the use of AI to build a new interface. At the moment, it is for a desktop or non-mobile experience. Apple didn’t show anyone using it outside of the home or office. They don’t expect this particular device to be worn out and about. My guess is that we are five years away from a set of glasses that can be worn and made useful in public. And that will likely be tethered to the phone for computing.
The question you might be asking is: will I, Joshua Gans, be buying one of these early next year? Well, for starters, it looks like it will be US-only, so I will have to go to some extra effort to get around that. But more critically, I am not sure. I have plenty of devices that looked interesting and now sit in a closet. I can justify that for something worth $500, but $3,500 gives me significant pause. I don’t know that I will want to watch a movie alone or change how I work, as I like my current setup. But then again, it is cool and I’m a sucker for cool tech.