I have had the opportunity to use Apple’s Vision Pro for many hours. Throughout that time, I was wondering what I might write about it. I could be gushing in amazement. I could be whiny that it is not naturally comfortable. I could claim that I was able to prepare a meal and wash dishes while having a big-screen TV in my kitchen (I was). I could also comment on how I finally had a laptop screen I could read easily (I did). I could embark on an experiment, “I used a Vision Pro to extract my kid’s appendix myself!” (I didn’t), or I could forecast that I have finally seen how people will find me when I am dead, lying on a couch with a big arse set of goggles strapped to my head and it would take them hours to realise I was gone (we’ll see). But all (most?) of that has been written about by others. Instead, I want to consider what it means for the evolution of computing technology.
In Power and Prediction, our big point is that AI will prove transformative only once new systems are developed to leverage the power of AI. Everything we have seen up until now has been point solutions — uses that slot into existing systems or applications, or new stand-alone things you can do. System innovation goes beyond that — think electricity leading to production-line manufacturing or the mobile phone leading to ride-sharing. For new systems to be developed off the back of AI, you need (1) pretty great AI prediction and (2) a new system. Suffice it to say that while we can be confident that is what you need, it is not exactly helpful. Many have asked us for more details on step (2).
As I played with the Vision Pro, it came to me that Apple is pretty amazing at seeding new systems. Pretty much all of our digital ecosystems were seeded by Apple. There are two exceptions — the Internet broadly (by Netscape) and the latest wave of AI (by OpenAI) — but as I will argue, they are exceptions that prove the rule.
With the Vision Pro, there is a good chance Apple is going to do it again, because it has finally worked out the one critical thing you need to seed an ecosystem: the right user interface (UI) for the technology.
Stepping back … the importance of UI invention
A UI invention is the discovery of a way that ordinary people can interface with a new technology. To explain Apple’s role in this, I created an infographic.
We could quibble about some of the details here — e.g., was it the Mac or the Lisa that seeded graphical computing? Was it the Apple I or Apple II that mattered for personal computing? Was it the iPod or the iMac that seeded the changes in digital music? And didn’t Palm create a pen computing ecosystem, or Alexa something for voice computing? But it is the big picture that is important. The really large leaps in ecosystem development came with graphical computing and mobile computing. In those cases, Apple worked out a new UI that could spawn so much more as others developed apps to take advantage of it. In both cases, the general idea was already out there, but it was Apple that first put it together into what would become a dominant design. The graphical UI is the UI for all personal computing up to the present day. The UI of the original iPhone is the UI for all mobile computing up to the present day. These are the very definition of dominant design. When kids today are given those original devices, they can use them, which is more than we can say for other stuff.
The point here is that once a UI was discovered, it lowered accessibility costs for people and created massive incentives to develop new applications based on that UI. It offloaded the task of working out user interactions from developers so they could build everything else. So, one answer to “Where do we get new systems from?” is: the discovery of the UI for a new technology.
Before going on, a quick story: before I decided to be an academic economist, I did interviews for other careers. One, in particular, sticks in my memory: IBM. I applied to work with them back in 1988 because I liked computers. I was offered a job, which I turned down. The reason was that I had an argument with the interviewer that convinced me that IBM had the wrong vision. They were asking me about the importance of “computer literacy.” This was the 1980s version of the “learn to code” movement. IBM thought it was very important that people be taught to be computer literate. I made the case that this was a bad mindset because it would surely be easier to design computers better, so that someone could just use them without having to learn much at all. I used the Mac as an example, which went down as well as you might have expected. I recount that story because (a) I was right, (b) IBM went into sharp decline thereafter, and (c) it illustrates a critical point about the right UI: people shouldn’t have to become literate in anything to use it. And if you want to know what it means to have that right, just watch this video.
The Vision Pro’s UI
The Vision Pro has only one significant innovation, and it is the big one: it gets the UI for spatial computing right. Everything else it does has been done by other VR/AR devices before — some of it better, some of it worse. There is no application on it that is really new. From big screens to 3D environments to immersion to timers for cooking pasta, these uses are not new. (By the way, the same was true of mobile computing with the iPhone — AT&T saw pretty much that future back in 1993.) But what is new — and it is the thing that makes the rest seem new — is the UI.
As soon as you have set up your headset, you can use it naturally. You may have tried other VR or AR, and it all requires some education. I had a Meta Quest 2, where you used a controller to do stuff, and I just hated it. It was unintuitive and annoying. Yes, you could look around, but as soon as you had to do anything, the magic quickly wore off.
The Vision Pro requires that you do only two things: (1) you look at the thing you are interested in or want to manipulate (like a button), and (2) you put your thumb and forefinger together in a pinch. After that, you can drag things around or scroll just as you would on a phone; it is just that your forefinger and thumb touch each other rather than a screen.
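This is also what makes life easy for developers — the offloading point from above. Here is a minimal SwiftUI sketch of a hypothetical visionOS view (the view and its names are my own illustration, not anything Apple ships), showing that standard controls respond to look-and-pinch automatically, so app code looks much as it would for a touch screen:

```swift
import SwiftUI

// A minimal, hypothetical visionOS view. The point: standard SwiftUI
// controls already respond to look-and-pinch, so the developer writes
// roughly the same code they would for a touch screen and never handles
// eye or hand tracking directly.
struct PastaTimerCard: View {
    @State private var minutes = 10
    @State private var dragOffset = CGSize.zero

    var body: some View {
        VStack(spacing: 20) {
            Text("Pasta timer: \(minutes) min")

            // Looking at the button highlights it; a pinch activates it.
            Button("Add a minute") {
                minutes += 1
            }
            .hoverEffect() // gaze highlight supplied by the system
        }
        .padding(40)
        .glassBackgroundEffect() // standard visionOS window material
        .offset(dragOffset)
        // Pinch-and-drag moves the card, mirroring touch-drag on a phone.
        .gesture(
            DragGesture()
                .onChanged { value in dragOffset = value.translation }
                .onEnded { _ in dragOffset = .zero }
        )
    }
}
```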
You have to try it to appreciate it. And then you can marvel that it works so well. The eye tracking is seamless. Sometimes you don’t quite focus on the right thing — less so with native apps, but it can happen when browsing the internet (which, by the way, is why Netflix sucks to use on the Vision Pro). But it is pretty rare. Then, when you want to interact, you don’t have to move your hand anywhere. You can just use it where it is. All of this requires the Vision Pro to keep track of your eyes and your (right) hand in real time. The amount of technology that had to be developed and worked out for that to occur is mind-boggling. It will take years for others to get to this level. But they will, because this will be the only way to interface in spatial computing. There may be new gestures and interactions, but the core of it is already done.
Indeed, I suspect even eye tracking hasn’t yet reached its full potential. At the moment, you often have to enter a passcode, although Optic ID can bypass that. But I could imagine entering it with eye tracking alone, or using a keyboard with eye tracking alone, just like the swipe keyboards on phones. Or we will just use Siri. On no device have I found it more natural to use Siri to do stuff than on a Vision Pro. Because the keyboard isn’t really there, I have found myself asking Siri first to see how it goes. And this works more naturally than on AirPods because I can see what Siri is doing. (This is why I put Siri as a “Bust?” in the infographic. It may still have its role. For starters, it is the only one of those voice things that is culturally significant.)
The ecosystem can be built on the basis of this. And it could easily lead to everyone wearing a spatial computing device for much of the day, especially at work. The energy and space-saving considerations alone could make that worth it, and that is before the usual metaverse fantasies. You can physically be in a confined or crowded space but feel like you are not. That has the elements of transformation about it.
What about AI?
If the answer to where new systems start is UI innovation, then what about AI? There is a good case that LLMs — and ChatGPT in particular — are where that starts. ChatGPT certainly is natural to use. What it doesn’t have is reliability of the sort that Apple’s UI discoveries have, but that may reflect the limits of the AI rather than a problem with the interface itself. Others have thought that the right UI for AI will be robots with human physical form. Part of that is a hypothesis that this is the only way to develop human-like intelligence. But another part is that the human form is the way to slot AI into human physical systems. That approach, though, takes the system as given rather than treating the system as the thing that will change.
In the end, I don’t have the answer for AI transformation. But the Apple Vision Pro reminded me how important the UI is, and so that is something I’ll be paying attention to.