A UNC student points his smartphone at the fountain in front of Bynum Hall. A Staryu — one of many Water-type Pokémon — appears on the screen. It jumps into the fountain as the student throws Pokéball after Pokéball at the creature. Although it’s just an image of the pop-culture critter overlaid on the live video the phone is capturing, it looks like the Pokémon is actually there with the student, in his world.
This is Pokémon Go — a game that recently introduced the world to augmented reality, a technology that places computer-generated objects or data into the world around us. Although it wasn’t the first app of its kind, it’s certainly the most successful to date and has brought the technology to the mainstream. While an experience like Pokémon Go is definitely novel, the real power in augmented reality is its ability to embed data into our everyday lives.
Imagine being able to point your phone camera at a car and getting the make, model, and price listings of that car at nearby dealerships. What if you could do the same thing with someone’s face? The app would reveal their name, a short bio they’ve uploaded to it, and any other information relevant to the situation you’re both in.
This past semester, I have been developing an app to do exactly that in the Reese News Lab under the guidance of journalism professor Steven King, who is collaborating with another team on a version for the Microsoft HoloLens. Users will be able to point their phones at someone who has uploaded information to the database and get an overlay, on the phone’s video feed and in real time, of whatever information that person provided. The app uses facial recognition algorithms to compare the faces that come through the video feed to the people in the database.
Through the eyes of a computer
Facial recognition is part of a larger family of technologies called computer vision. Computer vision is exactly what it sounds like — it allows computers to “see” things in pictures and video. For our app, we use a code library called OpenCV (Open Source Computer Vision Library) to process video as it comes through the phone. Most facial recognition today happens on pictures: you take a picture, process it, and the program returns whose face it thinks it is. Facebook already does this by suggesting people to tag in photos on your profile. Video works the same way, except that instead of using one picture, the program takes frames from the video and processes them one at a time.
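To make that concrete, here is a minimal sketch in Python with OpenCV of what processing a single frame looks like: detect any faces in the frame, ask the recognizer whose they are, and draw the result back onto the image. This isn’t our actual phone code; the webcam stands in for the phone camera, and the trained model file and the label-to-name map are placeholders (training is covered below).

import cv2

# Face detector (ships with OpenCV) and a recognizer loaded from a file that,
# for this sketch, we assume was trained ahead of time.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read("trained_recognizer.yml")        # placeholder file name
names = {0: "Alice", 1: "Bob"}                   # label-to-name map (made up)

cap = cv2.VideoCapture(0)                        # webcam stands in for the phone camera
ok, frame = cap.read()                           # grab a single frame from the stream
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
        label, distance = recognizer.predict(gray[y:y + h, x:x + w])
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, names.get(label, "unknown"), (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
cap.release()

In the app, this same loop runs continuously and the drawn labels become the augmented-reality overlay the user sees.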
For accuracy, you need multiple pictures of a face for each person. Most algorithms in OpenCV combine the photos of one individual into a single model of that person’s face and then use that model to identify people. This helps the technology recognize people whether they’re smiling, frowning, or even wearing glasses, provided there’s a picture of each of these in the database. Using multiple pictures also covers in-between expressions, so the widest possible range of emotions can be recognized while still being attributed to the same person.
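As a rough sketch of what that training data looks like, assuming OpenCV’s LBPH recognizer (one of its built-in face recognizers) and made-up file names: every photo of a person carries the same numeric label, so smiles, frowns, and glasses all end up under one identity.

import cv2
import numpy as np

# Several grayscale face crops per person (file names are placeholders).
def load_face(path):
    return cv2.imread(path, cv2.IMREAD_GRAYSCALE)

faces = [load_face("alice_smile.jpg"), load_face("alice_frown.jpg"),
         load_face("alice_glasses.jpg"),
         load_face("bob_smile.jpg"), load_face("bob_neutral.jpg")]
labels = np.array([0, 0, 0, 1, 1])          # 0 = Alice, 1 = Bob

# Every photo of a person carries the same label, so the recognizer
# treats them all as one identity.
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(faces, labels)

# A new, in-between expression should still come back as label 0 (Alice),
# along with a distance score: lower means a closer match.
label, distance = recognizer.predict(load_face("alice_smirk.jpg"))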
Photo overload
Having three to seven pictures for each face in a database would immediately deplete any user’s data limit for the month — and all that processing could actually overheat some phones. Plus, downloading all those photos each time a user opens the app would take far too long to be acceptable. It is possible, though, to “train” a recognizer. Once trained, it can be condensed into a single file that only needs to be updated when new faces are added to the database.
Training simply means feeding the recognition algorithms values that describe the people in your database. The app then reads these values from a single file and applies them to each video frame. You can train a recognizer on a web server, meaning the pictures never have to be present on the phone at all. When the app opens, it downloads this file — the “trained recognizer” — and loads it into OpenCV’s algorithms.
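Here is a minimal sketch of that train-on-a-server, download-one-file flow, again assuming the LBPH recognizer; the file paths are placeholders and the network download itself is omitted.

import cv2
import numpy as np

# --- on the web server, rerun whenever new faces join the database ---
faces = [cv2.imread(p, cv2.IMREAD_GRAYSCALE)
         for p in ["alice_1.jpg", "alice_2.jpg", "bob_1.jpg"]]   # face crops
labels = np.array([0, 0, 1])
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(faces, labels)
recognizer.write("trained_recognizer.yml")   # the one condensed file users download

# --- on the phone, each time the app opens ---
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read("trained_recognizer.yml")    # no photos ever reach the phone
# recognizer.predict(face_crop) can now run against every processed video frame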
Facial recognition, what is it good for?
An idea we’ve considered is connecting this app with LinkedIn for large events like business conferences. It would identify people at the conference, pull up their LinkedIn profiles, and display the information on-screen. Then, employers and recruiters could see who would be available to hire and who matches their needs before a conversation even happens. This saves a lot of time in determining whether somebody is qualified for a position, and helps potential employers judge whether a person is a good fit for the company’s culture.
A simpler application would let users display basic information along with a short message about themselves. If someone seems interesting, you can talk to them and mention something from their profile or message. While the initial interaction happens on-screen, the app is designed to help people put their phones away once they’ve found someone interesting to chat with.
Saving face
When I tell people about this project, I’ve gotten every response from “Wow, I could see the TSA or someone using this” to “You’re literally making that thing from Black Mirror.” When you use people’s data, naturally they’re going to be protective of it — and for good reason. So when you attach that data to their faces, it’s no surprise they get a bit nervous.
My goal is to make the apps using this technology completely voluntary. No one has to put any information they don’t want to share into the database. It’s best not to store sensitive information in these programs anyway, since there’s always the potential for that data to be displayed next to someone’s head. And since the end goal of the app is to facilitate conversation, any crucial information that isn’t provided, such as a person’s full name or email, can be gained by simply talking to them.
Viewing the future
We’re still working out a few kinks with this app, as well as with the HoloLens version. Even though the processing is fairly fast, it still lags a bit. This means we have to skip some frames and engineer ways to track people across the frames that aren’t directly processed by the facial recognition code. We also still have a fair bit of testing to do on questions like how lighting affects recognition and the minimum number of pictures of someone we need to accurately and reliably recognize them in a video stream. We don’t want to ask users to take 10 pictures, because that’s annoying, and by the end they’ll probably stop trying to make sure each picture is good enough to use for recognition.
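One simple version of that workaround, sketched below with placeholder pieces: run the expensive recognition step only on every few frames and reuse the most recent results for the frames in between.

import cv2

# Face detector used inside the expensive step (the Haar cascade ships with OpenCV).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def recognize_faces(frame):
    # Stand-in for the full detect-and-recognize pipeline sketched earlier;
    # here it only detects faces and returns a placeholder name for each.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return [((x, y, w, h), "unknown")
            for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5)]

PROCESS_EVERY = 5        # assumption: recognition keeps up at roughly 1 in 5 frames
last_results = []        # results from the most recent processed frame

cap = cv2.VideoCapture(0)               # webcam stands in for the phone camera
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % PROCESS_EVERY == 0:
        last_results = recognize_faces(frame)      # expensive call, done rarely
    for (x, y, w, h), name in last_results:        # cheap: redraw cached overlays
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, name, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("overlay", frame)
    frame_idx += 1
    if cv2.waitKey(1) == 27:                       # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()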
Despite these challenges, the future of this technology looks really promising. It’s an app that can put the power of facial recognition, combined with structured data and augmented reality, into the hands of the average person. Since the project is now open source, other coders and programmers will be able to add to it and improve on it once it’s done. But, like any technology, there is some potential for misuse. If we address those issues now, we can produce something that will truly make a positive impact on the world.