Another example of the phenomenon I call ‘a world, not a wall’. Here Angels fans will get an augmented experience at the game with more-or-less standard Android tablets that will be pushing stats and closeups. But augmented reality is perfect for settings like the ballpark.
Imagine watching a basketball game and seeing all of the vital statistics surround your favorite player without taking your eye off the game. CrowdOptic aims to visually enhance the event experience through a heads-up display on an iPhone. Instead of marketing to consumers, however, CrowdOptic is able to charge event organizers, sports managers, and advertisers a (sizable) premium for hyper-detailed analytics of knowing which performers are most popular and when. While the app itself is certainly a step forward, what it represents is the next stage in the event experience.
Clever readers may know that commercial optical recognition, such as Google Goggles, hasn’t advanced to the point where users can snap a picture to identify a person, let alone a moving target. CrowdOptic works by sensing the iPhone’s GPS location, compass heading, and time of day to know which object is most likely being viewed through the iPhone screen. It needs at least one other user looking at the same object to triangulate the position. Thus, it can tell which band is on stage, which side of a tennis court a player is on, or which soccer player is running downfield.
CrowdOptic then overlays the screen with data such as the name of the song being played, say, or the point guard’s free throw percentage. “It’s the live action that matters,” says founder Jon Fisher. Users can then snap photos, and share with friends across the social media universe.
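The triangulation described above (two viewers, each with a GPS position and a compass heading, looking at the same object) can be sketched in a few lines. This is only an illustration of the geometry, not CrowdOptic’s actual method, which the article does not detail. The function names are hypothetical, the time-of-day signal is ignored, and the code assumes distances small enough that a flat local map is a fair approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used for the flat-map approximation


def to_local_xy(lat, lon, ref_lat, ref_lon):
    """Convert lat/lon in degrees to metres (east, north) from a reference point.

    Uses an equirectangular approximation, reasonable over a stadium-sized area.
    """
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def intersect_bearings(p1, bearing1_deg, p2, bearing2_deg):
    """Return the (x, y) point where two sight lines cross.

    p1, p2 are (east, north) positions in metres; bearings are compass degrees,
    clockwise from north. Returns None if the sight lines are parallel or if the
    crossing point lies behind either viewer.
    """
    d1 = (math.sin(math.radians(bearing1_deg)), math.cos(math.radians(bearing1_deg)))
    d2 = (math.sin(math.radians(bearing2_deg)), math.cos(math.radians(bearing2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product of the two directions
    if abs(denom) < 1e-9:
        return None  # parallel sight lines never cross
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom  # distance along viewer 1's sight line
    t2 = (dx * d1[1] - dy * d1[0]) / denom  # distance along viewer 2's sight line
    if t1 < 0 or t2 < 0:
        return None  # crossing point is behind one of the viewers
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# Two fans 100 m apart, one looking north-east and one looking north-west.
# Their sight lines cross roughly 50 m east and 50 m north of the first fan.
target = intersect_bearings((0.0, 0.0), 45.0, (100.0, 0.0), 315.0)
print(target)
```

With more than two viewers, a real system would presumably have to reconcile many noisy sight lines rather than intersect a single clean pair, since phone compasses and GPS fixes are imprecise; that step is outside this sketch.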
There’s a great deal of heat these days (and little light, I think) about intentional evolution, where humanity will tinker with its own DNA, or put gizmos in our brains, causing a step function in the way that humans think, act, or perceive the world.
Mark Changizi thinks the most obvious path to some significant change in human cognitive capabilities is neuronal recycling, where we take advantage of the plasticity of our minds to develop new cultural tools, like written language, or mathematics:
If there is something next, some imminently arriving transformative development for human capabilities, then the key will not be improved genes or cortical plug-ins. But what other way forward could humans possibly have? With genetic and cyborg enhancement off the table for many years, it would seem we are presently stuck as-is, sans upgrades.
There is, however, another avenue for human evolution, one mostly unappreciated in both science and fiction. It is this unheralded mechanism that will usher in the next stage of human, giving future people exquisite powers we do not currently possess, powers worthy of natural selection itself. And, importantly, it doesn’t require us to transform into cyborgs or bio-engineered lab rats. It merely relies on our natural bodies and brains functioning as they have for millions of years.
This mystery mechanism of human transformation is neuronal recycling, coined by neuroscientist Stanislas Dehaene, wherein the brain’s innate capabilities are harnessed for altogether novel functions.
This view of the future of humankind is grounded in an appreciation of the biologically innate powers bestowed upon us by hundreds of millions of years of evolution. This deep respect for our powers is sometimes lacking in the sciences, where many are taught to believe that our brains and bodies are taped-together, far-from-optimal kluges. In this view, natural selection is so riddled by accidents and saddled with developmental constraints that the resultant biological hardware and software should be described as a “just good enough” solution rather than as a “fine-tuned machine.”
So it is no wonder that, when many envisage the future, they posit that human invention—whether via genetic engineering or cybernetic AI-related enhancement—will be able to out-do what evolution gave us, and so bootstrap our species to a new level. This rampant overoptimism about the power of human invention is also found among many of those expecting salvation through a technological singularity, and among those who fancy that the Web may some day become smart.
The root of these misconceptions is the radical underappreciation of the design engineered by natural selection into the powers implemented by our bodies and brains, something central to my 2009 book, The Vision Revolution. For example, optical illusions (such as the Hering) are not examples of the brain’s poor hardware design, but, rather, consequences of intricate evolutionary software for generating perceptions that correct for neural latencies in normal circumstances. And our peculiar variety of color vision, with two of our sensory cones having sensitivity to nearly the same part of the spectrum, is not an accidental mutation that merely stuck around, but, rather, appears to function with the signature of hemoglobin physiology in mind, so as to detect the color signals primates display on their faces and rumps.
These and other inborn capabilities we take for granted are not kluges, they’re not “good enough,” and they’re more than merely smart. They’re astronomically brilliant in comparison to anything humans are likely to invent for millennia.
Neuronal recycling exploits this wellspring of potent powers. If one wants to get a human brain to do task Y despite it not having evolved to efficiently carry out task Y, then a key point is not to forcefully twist the brain to do Y. Like all animal brains, human brains are not general-purpose universal learning machines, but, instead, are intricately structured suites of instincts optimized for the environments in which they evolved. To harness our brains, we want to let the brain’s brilliant mechanisms run as intended—i.e., not to be twisted. Rather, the strategy is to twist Y into a shape that the brain does know how to process.
But how do I know this is feasible? This tactic may use the immensely powerful gifts that natural selection gave us, but what if harnessing these powers is currently far beyond us? How do we find the right innate power for any given task? And how are we to know how to adapt that task so as to be just right for the human brain’s inflexible mechanisms?
I don’t want to pretend that answers to these questions are easy—they are not. Nevertheless, there is a very good reason to be optimistic that the next stage of human will come via the form of adaptive harnessing, rather than direct technological enhancement: It has already happened.
We have already been transformed via harnessing beyond what we once were. We’re already Human 2.0, not the Human 1.0, or Homo sapiens, that natural selection made us. We Human 2.0’s have, among many powers, three that are central to who we take ourselves to be today: writing, speech, and music (the latter perhaps being the pinnacle of the arts). Yet these three capabilities, despite having all the hallmarks of design, were not a result of natural selection, nor were they the result of genetic engineering or cybernetic enhancement to our brains. Instead, and as I argue in both The Vision Revolution and my forthcoming Harnessed, these are powers we acquired by virtue of harnessing, or neuronal recycling.
In this transition from Human 1.0 to 2.0, we didn’t directly do the harnessing. Rather, it was an emergent, evolutionary property of our behavior, our nascent culture, that bent and shaped writing to be right for our visual system, speech just so for our auditory system, and music a match for our auditory and evocative mechanisms.
And culture’s trick? It was to shape these artifacts to look and sound like things from our natural environment, just what our sensory systems evolved to expertly accommodate. There are characteristic sorts of contour conglomerations occurring among opaque objects strewn about in three dimensions (like our natural Earthly habitats), and writing systems have come to employ many of these naturally common conglomerations rather than the naturally uncommon ones. Sounds in nature, in particular among the solid objects that are most responsible for meaningful environmental auditory stimuli, follow signature patterns, and speech also follows these patterns, both in its fundamental phoneme building blocks and in how phonemes combine into morphemes and words. And we humans, when we move and behave, make sounds having a characteristic animalistic signature, something we surely have specialized auditory mechanisms for sensing and processing; music is replete with these characteristic sonic signatures of animal movements, harnessing our auditory mechanisms that evolved for recognizing the actions of other large mobile creatures like ourselves.
Culture’s trick, I have argued in my research, was to harness by mimicking nature. This “nature-harnessing” was the route by which these three kernels of Human 2.0 made their way into Human 1.0 brains never designed for them.
The road to Human 3.0 and beyond will, I believe, be largely due to ever more instances of this kind of harnessing. And although we cannot easily anticipate the new powers we will thereby gain, we should not underestimate the potential magnitude of the possible changes. After all, the change from Human 1.0 to 2.0 is nothing short of universe-rattling: It transformed a clever ape into a world-ruling technological philosopher.
The Web is a new form of knowledge tool, like writing and mathematics, but of astonishingly greater power, and one that potentially bends and shapes our cortex in novel ways. The rise of gestural user experience, messiness-at-scale social systems, and augmented reality is likely to usher in drastically different cultural norms and forms of interaction and perception based on a web-mediated experience of the world.
Maarten Lens-FitzGerald, one of Layar’s founders and its current general manager, was kind enough to answer some of my questions in the following short interview.
[…]
For the rest of the year we had four principles that worked for us: sense, scale, open and pull.
Sense means that we don’t always understand everything but trust that on a deeper level we know what direction to take. The mobile industry moves swiftly and is very complex. We trust our instincts most of the time and are not the types for elaborate business planning. It’s no coincidence we are in the sensing business.
Scale means that we create systems that can grow. Augmented Reality is an economy of abundance. There is no limit. We host in the cloud, limitless scalability as the Lakers – Celtics effect showed us. We also don’t know what is relevant in Argentina or Tokyo. That’s why we don’t do content. Others make it, and make a good business when they sell their layer work and AR. We can’t talk to everyone to make a business. But together with the developers and publishers we can. And another one is that we knew the Layer catalog wouldn’t scale for the many, many layers and their content. You need a discovery mechanism to open up the augmented world. Like the EPG for TV, Google for the web etc. That’s why we launched Stream.
Open means that we share and give away as much as we can. The internet has great examples of openness like the protocols, websites like Wikipedia and software like Apache. This helps us see that to scale we need to be open. To last we need to be open, to give away and share the opportunity. We love the idea of infrastructure and its ideals. AR needs infrastructure and hopefully Layar can help by being open as much as we can.
Pull means that we don’t push. We don’t call people and try to sell our product. We don’t do anything that costs too much time and energy. We’d rather put the energy in a great product that attracts, that pulls everyone to us. Instead of spending money and a big marketing campaign we’d rather create a great feature that everyone will talk about and can be introduced with one blog post. John Hagel was a good inspiration for us for this.
For us this works, they are principles we work by and that are closely linked together.
I guess we’ll have to see if Layar becomes another TiVo: emblematic of a fundamental transition in communications and media, but unable to capitalize on it in a large way. It remains to be seen if they can compete with folks like Google.
I loved Minority Report’s gestural interface, as a scifi representation of what we think a police state might use to watch us, given the ability to move through a nearly infinite amount of data — and time — searching for clues.
Apparently, that interface is not just the stuff of Hollywood, as it appears that John Underkoffler, the guy that mocked up that experience for the movie, has been off actually building the system he literally is hand waving into existence.
MG Siegler thinks this represents the future of computing. I disagree, but first, MG’s thoughts:
While we may not have been at this year’s TED conference, apparently, Oblong was. And apparently, it wowed the crowd. And it should have. If you’ve seen the movie Minority Report, you’ve seen the system they’re building.
No, really. The co-founder of Oblong, John Underkoffler, is the man who came up with the gesture-based interface used in the Steven Spielberg movie. And now he’s building it in real life.
The demo I saw a couple years ago was stunning, but it was still just a video. Apparently, at TED, the audience got to see it in action. NYT’s Bits blog detailed some of it in a post yesterday. For those not at TED, Oblong has also made a few demo videos in the past, which I’ll embed below. Again, this is Minority Report.
Oblong’s coming out party couldn’t come at a better time. Following the unveiling of Apple’s iPad, there has been a lot of talk about the future of computing at a fundamental level. That is to say, after decades of dominance by the keyboard and mouse, we’re finally talking about other, more natural, methods of input. The iPad is one step to a multi-touch gesture system (as is this 10/GUI awesome demo), but this Oblong system is the next step beyond that.
I don’t believe that huge displays based on petabytes of information, like the one Cruise was surfing, are likely to be the prototypical user experience for normal people in the near term. In some narrowly defined industries (military, cinematography) such displays may be temporarily of interest. But the future of user experience is a logical extension of what we have been seeing in consumer electronics: a continued movement to small, mobile, and personal.
Yesterday, I posted a Nokia video that I think is much more true to life. I reproduce it here, again, in the form of the complete video, and an image pulled out.
[Embedded video: the complete Nokia video]
[Image: one screenshot from the video]
The Nokia example is based on a few assumptions:
Augmented reality glasses will become the standard user display: Instead of huge displays, hung on walls with giant panels, people will wear augmented reality glasses. These will display, on the inside of the glass, images that provide access to various sorts of information.
Displays will become less complex than today’s file/folder/desktop jumble, and interaction will be based on simple eye movements and gestures: User interaction will rely on eye tracking and gestural interfaces to represent selection, expansion, playing video or audio, and the like. In this example, the woman looks at the name of an artist in a playlist long enough and the environment interprets that as a selection. At some points in the demo she flicks her hand to represent clicking or scrolling. Note she doesn’t wear gloves or special hand gear: the glasses have cameras that watch her hands. The video doesn’t show her doing it, but for more complex communications (more than selecting an emoticon, as she does) either a generalized sign language could be used or a virtual keyboard could be displayed, with ‘keystrokes’ recorded, again, by the glasses observing our hands.
I don’t think that the grand gestural, ‘orchestra conductor’ sort of scenario that we saw in Minority Report will be the norm, although in specialized contexts (like gaming, war fighting, and brain surgery) those sorts of advanced gestural languages might be developed.
Social interaction with others will be the primary modality of all future operating environments, and other activities will principally be constructed to help filter and aggregate social channels: This is not well-represented in either the Nokia video or Minority Report. In the Nokia example, the woman is mostly dabbling with relatively conventional streams and stores (weather and news, and riffling through a music library) while occasionally being pinged by an overly attentive boyfriend. Imagine a richer scenario of a marketing executive racing through the streets of New York, communicating with four colleagues in an open semi-public sort of way, with integrated information streams of plans, designs, and marketing campaign mockups. And at the same time receiving local augmented reality information about the streets she is passing through, like GPS coordinates, a map showing her destination and where her four colleagues are, offers from the food truck she passes, and a global stream of socialized news and information from her network of friends, fans, and connections.
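The gaze-based selection in the second assumption above (looking at an item ‘long enough’ counts as choosing it) is usually called dwell selection, and a minimal sketch makes the idea concrete. This is not how the Nokia concept is implemented, which the video does not say. The function name, the one-second threshold, and the sample format are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple

GazeSample = Tuple[float, Optional[str]]  # (timestamp in seconds, item under gaze or None)


def dwell_select(samples: Iterable[GazeSample], dwell_s: float = 1.0) -> List[Tuple[float, str]]:
    """Turn a time-ordered stream of gaze samples into selection events.

    An item is selected once the gaze has rested on it continuously for at
    least dwell_s seconds. Each continuous look fires at most one selection.
    """
    selections: List[Tuple[float, str]] = []
    current: Optional[str] = None  # item the gaze is currently resting on
    start = 0.0                    # time at which the current look began
    fired = False                  # whether this look has already produced a selection
    for t, target in samples:
        if target != current:
            # Gaze moved to a different item (or to nothing): restart the timer.
            current, start, fired = target, t, False
        elif current is not None and not fired and t - start >= dwell_s:
            selections.append((t, current))
            fired = True
    return selections


# A glance at one playlist entry, then a longer look at another.
stream = [(0.0, "artist_a"), (0.4, "artist_a"), (0.8, "artist_b"),
          (1.2, "artist_b"), (1.9, "artist_b"), (2.5, "artist_b"), (2.6, None)]
print(dwell_select(stream))
```

The threshold is the design choice that matters: too short and every glance becomes a click, too long and the interface feels sluggish.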
***
This can be condensed to the shorthand: not a wall, a world.
The steampunk idea that we will continue to have displays like today’s TV screens or PC monitors is dubious. I would give up mine in a heartbeat. More important, there is a world out there, and amplifying what we are already looking at — like the street we are walking on — with relevant information — like where the bus stop is, or what kind of food that restaurant serves — is so obviously helpful it doesn’t really need to be motivated.
We will continue to have personal and mobile computing experiences in the near future, because mostly we work and play on personal devices. Yes, there is the occasional meeting in a face to face setting where currently we use large displays, but this will be replaced by shared augmented reality: a presentation, for example, could be controlled by one person (or more) and viewed by a larger group. But this wouldn’t be projected on the wall, necessarily. Instead, it would be shared via each attendee’s glasses. We might be looking at a blank wall, or we might be walking through a virtual representation of a building being designed, or a product being assembled.
Amplifying the social through this sort of user experience would be phenomenal. Wandering around at a business meeting, a party, or a conference, and seeing salient information about the people you are looking at — where they work, when you last talked, the names of their loved ones, their pet peeves, whether they follow you and know of your work — would be an immense help. And would potentially change the nature of our social contract in startling ways. This is what I am expecting to appear, and very soon. Not 2045.
The technology industry is going retro — moving away from remote controls, mice and joysticks to something that arrives without batteries, wires or a user manual.
It’s called a hand.
In the coming months, the likes of Microsoft, Hitachi and major PC makers will begin selling devices that will allow people to flip channels on the TV or move documents on a computer monitor with simple hand gestures. The technology, one of the most significant changes to human-device interfaces since the mouse appeared next to computers in the early 1980s, was being shown in private sessions during the immense Consumer Electronics Show here last week. Past attempts at similar technology have proved clunky and disappointing. In contrast, the latest crop of gesture-powered devices arrives with a refreshing surprise: they actually work.
[…]
Just as Microsoft’s gaming system hits the market, so should TVs from Hitachi in Japan that will let people turn on their screens, scan through channels and change the volume on their sets with simple hand motions. Laptops and other computers should also arrive later this year with built-in cameras that can pick up similar gestures. Such technology could make today’s touch-screen tools obsolete as people use gestures to control, for instance, the playback or fast-forward of a DVD.
To bring these gesture functions to life, device makers needed to conquer what amounts to one of computer science’s grand challenges. Electronics had to see the world around them in fine detail through tiny digital cameras. Such a task meant giving a TV, for example, a way to identify people sitting on a couch and to recognize a certain hand wave as a command and not a scratching of the nose.
Little things like the sun, room lights and people’s annoying habit of doing the unexpected stood as just some of the obstacles companies had to overcome.
GestureTek, with offices in Silicon Valley and Ottawa, has spent a quarter-century trying to perfect its technology and has enjoyed some success. It helps TV weather people, museums and hotels create huge interactive displays.
This past work, however, has relied on limited, standard cameras that perceive the world in two dimensions. The major breakthrough with the latest gesture technology comes through the use of cameras that see the world in three dimensions, adding that crucial layer of depth perception that helps a computer or TV recognize when someone tilts their hand forward or nods their head.
This advance is one of several that will form the basis of an entirely new experience for computing.
Gestural UI, or ‘hand jive’ as I call it, once deployed as a built-in aspect of future computers, like touchpads and mice are today, will set the stage for a rethink about user experience.
First we will see hand jive as a way to manipulate the gears of now-traditional windowed UIs: pulling down a menu in an app, moving windows around, dragging a file to the trash.
In the future, we’ll have real Minority Report stuff, without the enormous touch screens: we’ll also see the emergence of augmented reality goggles (Terminator goggles) where we can toggle back and forth between a 100% computer-screen sort of display and 100% augmented reality. And the goggles, as an integrated part of the computing device, will be watching our hands for commands, and watching the world for reality to augment.
The combination of these trends will make computing primarily mobile: we’ll have an iPhone-sized device we carry all the time, which will be a phone and a PC. We will be free of LCD screens, in general, courtesy of our goggles, and free of keyboards, courtesy of hand jive. A keyboard can be imaged on any flat surface by the goggles, and we can type without a physical keyboard because the gestural system is watching our fingers in 3D. And of course, a lot of things could be done without typing, especially once kids start using sign language and voice to communicate with computers. (I say kids because that’s who starts first.)
The other parts of this tectonic shift in UX will include the end of the document-centric folder/file/desktop metaphor, where information is managed in documents, based on old-school filing cabinets. I believe that innovations like the Litl OS, which has shifted to a TV-influenced UI of channels, point the way by treating information as something that flows and not something static, sitting in a document.
And of course, the social web will be the foundation of future computing, as opposed to a document-centric world in which people are an afterthought.