Unseen Science

RENCI is Carolina’s hub for supercomputing and data science. The institute is the backbone of a slew of successful projects, from the data management software used by the National Library of France to a storm surge modeling system relied upon by FEMA, and now a global platform for researchers to develop and test new internet architectures.

In September 2019, RENCI announced it would lead a $20 million project to create a platform for testing novel internet architectures that could enable a faster, more secure internet. (image composed by Corina Cudebec & Alyssa LaFaro)
January 21st, 2020

Sometimes the most powerful things in life are those we can’t see. Atoms. Oxygen. Love. Radio waves. Faith.

And the internet. What appears in your web browser is just the top layer of the sprawling, invisible tangle of pathways and protocols that makes a worldwide network of electronic devices possible.

What most people don’t know is that the story of the internet is a beautiful tragedy. Just three months after NASA put the first man on the moon, researchers at UCLA and Stanford sent the first message over the ARPANET. But at that time, the high cost and lack of technology to support its function — things like processors, disk storage, and memory — made it inefficient and painfully slow.

Nonetheless, it was enticing. Students and hobbyists began using the system to build electronic bulletin boards and FidoNet, a network to connect them. This new method of sending and receiving information wouldn’t become mainstream until the early 1990s, when a computer scientist named Tim Berners-Lee invented the World Wide Web, which created an internet experience much closer to the one we know today.

And then came the companies: Mosaic, Netscape, Amazon, eBay, Google, each realizing the profit-making potential.

Fast-forward to today, the era of supercomputing, when components are inexpensive and extremely powerful and the user base is a lofty 4.5 billion people. Even so, the internet doesn’t seem much more efficient. Most people still run into frustrations online, from slow load times to glitches.

That’s because the internet’s computer networking architectures are outdated. As the story above suggests, most were developed by a series of government-funded programs between the 1960s and ’80s, and despite all the advances of the last 50 years, the internet itself has yet to catch up.

“Nobody wants to throw the baby out with the bathwater, but the foundational principles of the internet should be reexamined,” says Ilya Baldin. And that’s exactly what RENCI, UNC’s computing institute, hopes to do in its recent $20 million project called FABRIC.

“Think of it like a set of LEGO blocks,” says Baldin, director of network research and infrastructure at RENCI and the project lead. “They can be put together in a number of ways to make different kinds of networks. It’s up to the imagination of the researcher as to what they can make with those blocks.”

This is what most RENCI projects do, in a nutshell, and FABRIC is just one iteration of some of the most groundbreaking work at the institute. From long-term data management solutions to supercomputing power, RENCI rebuilds essential computing services to streamline collaboration at an international level.

This could lead to a faster, safer online experience for all — but, like so much of our infrastructure, it’s not something we’ll see with our own eyes.

“If the technology and infrastructure are really good, they vanish,” RENCI Director Stan Ahalt says. “It becomes something you don’t have to think about. It’s almost magical. That’s what we want for science: We want it to be done easily and quickly without a lot of friction.”

Data deluge

Ninety percent of the world’s data has been created in the last two years. As of 2020, each person on Earth generates an estimated 1.7 megabytes of data every single day. And that data is like a houseplant. It needs somewhere to live. It needs someone to take care of it, to integrate it, protect it, and check on it from time to time.
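For a sense of scale, a quick back-of-envelope calculation shows what that per-person figure adds up to. The world-population value below is our assumption (roughly 7.8 billion in 2020); only the 1.7-megabyte figure comes from the paragraph above.

```python
# Back-of-envelope: aggregate daily data volume implied by the
# "1.7 megabytes per person per day" figure quoted above.
population = 7.8e9              # assumed 2020 world population
mb_per_person_per_day = 1.7     # figure from the text

total_mb = population * mb_per_person_per_day
total_pb = total_mb / 1e9       # 1 petabyte = 1e9 megabytes

print(f"~{total_pb:.0f} petabytes of new data per day")  # ~13 PB/day
```

That works out to something like 13 petabytes of new data every single day, all of it needing somewhere to live.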

RENCI has developed a solution for institutions to do just that: iRODS.

iRODS stands for integrated Rule-Oriented Data System, software that manages and protects large amounts of data. It’s like the glue on the binding of a book — when it does its job well, users don’t even know it’s there. Without that adhesive, all that’s left is a pile of disordered pages.

“It’s basically invisible,” says Jason Coposky, iRODS’ executive director. “And it protects your data because, eventually, your infrastructure is going to change out, or you’ll need to buy new storage so you need to move your data, which is a very difficult thing to do when you’re dealing with petabytes and millions of files.”

iRODS is the Swiss Army knife of data management, and it does a lot more than protect data from aging technology. It can examine the data over time, improve the speed of data retrieval operations, and set rules for who can and can’t access the data.

“iRODS helps tell the story of the data from front to back,” Coposky says. It establishes a data management policy — the “secret sauce,” according to Coposky — that can assign data to specific media, folders, and even geographic locations based on organizational needs.
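The flavor of that workflow can be sketched with the open-source Python client for iRODS (python-irodsclient). This is a minimal, hypothetical example, not RENCI’s deployment: the host, zone, user names, file paths, and metadata values are all invented, and exact API details vary by client version.

```python
from irods.session import iRODSSession
from irods.access import iRODSAccess

# Hypothetical connection details for an iRODS zone.
with iRODSSession(host='data.example.org', port=1247, user='alice',
                  password='secret', zone='demoZone') as session:
    path = '/demoZone/home/alice/results.csv'

    # Put a local file under iRODS management.
    session.data_objects.put('results.csv', path)

    # Attach metadata so the catalog can help tell the story of the data.
    obj = session.data_objects.get(path)
    obj.metadata.add('project', 'glacier-survey')

    # Grant a collaborator read access; rules about who can and can't
    # see the data are part of the management policy described above.
    session.permissions.set(iRODSAccess('read', path, 'bob', 'demoZone'))
```

Because every one of these operations goes through the iRODS catalog, the upload, the metadata, and the permission change are all recorded, which is what makes the kind of history-keeping Ahalt describes possible.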

“Telling the story of the data is like telling the story of the glaciers,” Ahalt agrees. “It’s an evolving, living, growing, changing thing. And if you want to keep track of those changes, there has to be some way to write the history. A log of all those changes is a powerful way of capturing the evolution of the data.”

Biomedical bartering

Not all data are created equal: some are harder to manage than others and require a more robust management system. Medical data, for example, require a lot more security than other types of information. Plus, they often live in multiple places.

“Once you start having health issues, you build a lot of data from different medical facilities, and if those facilities don’t talk to each other, that’s a problem,” says Stephanie Suber, RENCI’s communications and marketing manager.

And that’s just data for individual people, Suber points out. When research institutions like UNC host ongoing studies involving multiple patients, the data only continue to grow and become more complex. Creating new treatments for disease requires examining all sorts of biomedical data.

“We believe that it’s rarely the case that a single observation or data set contains all of the information required to divine a new approach,” says Chris Bizon, RENCI’s director of analytics and data science. Combining data sets is the key to discovery, he adds.

Launched in October 2016 with contributions from RENCI, the NCATS Biomedical Data Translator Program combines disparate biomedical data to identify groups of patients who are likely or unlikely to respond to specific treatments. That information could generate new hypotheses, therapies and procedures, and clinical trials. Just as the development of the periodic table transformed the field of chemistry, RENCI researchers believe Data Translator will do the same for the treatment of disease.

“Our rate of discovery increases when we understand the fundamental principles of biology or medicine,” Ahalt says. “When Translator hopefully helps a doctor understand a genetic pattern and thus understand how to treat a patient better, that’s a direct benefit.”

The Data Translator project has led to a variety of tools. ICEES shares clinical data across institutions without compromising privacy. TranQL helps researchers search biomedical knowledge bases, while ROBOKOP uses algorithms to help them answer questions. BioData Catalyst will house decades of human subjects data, making it more accessible to researchers.

“In many ways, RENCI’s ultimate contribution is to make an impact on everyday life, but it may not be one you know or feel immediately,” Ahalt says. “It may be a longer-term impact that shapes the future.”

Environmental computation

Since its inception in 2006, RENCI has also helped build a foundation for storm surge prediction during major weather events like hurricanes. Storm surge occurs when wind pushes water up onto land.

On September 14, 2018, intense winds from Hurricane Florence, which hovered over North Carolina’s coast for three days, forced water as high as 13 feet above the shoreline in some places. The storm broke high-water records dating back 65 years for the cities of New Bern, Wrightsville Beach, and Wilmington. In just a few days’ time, all roads to Wilmington flooded and were deemed impassable, isolating the entire city and prompting the rescue of more than 450 people.

“While winds are strong and can do a lot of damage, water is harder to shelter from,” UNC marine scientist Rick Luettich says. “It causes more damage and more deaths.”

Luettich is the co-creator of ADCIRC, a storm-surge modeling system that runs every six hours during active hurricanes. The data produced in these runs help organizations like the U.S. Coast Guard, U.S. Army Corps of Engineers, Federal Emergency Management Agency, and local emergency management teams make decisions about evacuations, supply locations, and response personnel.

What the general public sees when looking at an ADCIRC model is a map of the Eastern United States, along with the projected path and storm surge levels of the current active hurricane. But beneath that map, along the western edge of the Atlantic Ocean, sits a grid of tiny triangles. Wind, coastline, and ocean depth data are applied to each vertex of those triangles in computer code, which allows researchers to calculate potential water levels and where the currents will take that water.
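To make that structure concrete, here is a toy sketch in Python of how such an unstructured mesh can be represented: node coordinates, triangle connectivity, and per-node depth and wind values. Every number and the “surge” formula below are invented for illustration; the real ADCIRC model solves the shallow-water equations over the mesh.

```python
import numpy as np

# Hypothetical 4-node, 2-triangle mesh off a coastline.
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # lon, lat
triangles = np.array([[0, 1, 2], [1, 3, 2]])    # vertex indices into `nodes`
depth = np.array([5.0, 10.0, 2.0, 8.0])         # bathymetry (m) at each node
wind_stress = np.array([0.3, 0.3, 0.4, 0.2])    # made-up wind forcing per node

# Crude stand-in for a surge estimate: shallow water piles up more,
# so scale the forcing by an inverse-depth factor at every node.
surge = wind_stress * (1.0 / np.sqrt(depth)) * 10.0

for tri in triangles:
    # A real solver integrates the governing equations over each
    # element; here we simply average the nodal values per triangle.
    print(f"triangle {tri}: mean surge ~ {surge[tri].mean():.2f} m")
```

The appeal of this layout is that triangles can be made tiny near complex coastlines and large in the open ocean, and the per-node calculations parallelize naturally, which is where the processor count matters.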

“And the more processors you have available, the more accurate the model you can run,” says Brian Blanton, director of Earth data science at RENCI. That’s why ADCIRC is hosted at RENCI, which houses a supercomputer that allows Blanton to run the model quickly. Before RENCI opened its doors, Blanton and Luettich ran the ADCIRC model on computer systems that could not accommodate such on-demand simulations. It took them 10 days to process the same runs. Now, it takes just a few hours.

A robust and stable computing infrastructure that doesn’t experience constant hardware failure is vital, according to Blanton, especially when running data that affects people’s lives.

“New technology that helps us squeeze more out of that six-hour time period is what we need now,” Blanton says. “Because things never get simpler; they get more complex, whether that’s the computers, the data, the software, or even the people running it.”

A small world, after all

While we may never “see” these technological advances, we’ll certainly experience them — one LEGO block at a time.

Good technology, according to Ahalt, brings us closer, making our seemingly big world a little smaller.

“You know how people gather around a fire and warm their hands and talk to each other and share bread?” he asks. “That’s what good technology does for you, too. It’s a social thing if it’s done well and done with respect. It’s something we can share.”

Ilya Baldin is the director of network research and infrastructure at RENCI.

Stan Ahalt is the director of RENCI and a professor in the Department of Computer Science within the UNC College of Arts & Sciences.

Jason Coposky is the executive director of the iRODS Consortium, located within RENCI.

Stephanie Suber is the communications and marketing manager at RENCI.

Chris Bizon is the director of analytics and data science at RENCI.

Rick Luettich is Alumni Distinguished Professor, director of the Institute of Marine Sciences, and lead investigator at the Coastal Resilience Center of Excellence. He is a member of the faculty in the Department of Marine Sciences within the UNC College of Arts & Sciences and in the Department of Environmental Sciences and Engineering within the UNC Gillings School of Global Public Health.

Brian Blanton is the director of Earth data science at RENCI.