Faster data crunching
Photos by Evan Krape December 13, 2018
UD network infrastructure upgrades support data-hungry research
The University of Delaware’s networking infrastructure is getting an upgrade, and it is coming just in time.
As UD builds major research initiatives focused on biotechnology, energy, biopharmaceuticals and data science, among others, the need for advanced cyberinfrastructure to support researchers, faculty and students working in these areas becomes increasingly critical.
While existing teaching, learning and operational needs are well supported through UD’s present cyberinfrastructure, current and emerging research needs related to data creation, transfer, archival and analysis are outpacing UD’s Internet connection and bandwidth capacity.
The project, funded through a $447,089 grant from the National Science Foundation’s (NSF) Office of Advanced Cyberinfrastructure (OAC), will expand UD’s network connection to the Internet2 Network and create specific networking configurations to enable researchers to access new high-bandwidth capabilities.
“UD faculty and students conduct high-quality research with national and global relevance,” said Sharon P. Pitt, UD vice president of information technology and chief information officer. “The much-needed boost in connectivity will ensure UD’s place among the top-tier research institutions around the country and offer a state-of-the-art capability to our researchers needing expanded architecture to support their research goals.”
The upgrades also underpin UD’s effective and continued participation in the national research community, which benefits society and improves U.S. competitiveness, Pitt said.
“This networking infrastructure will greatly support research at the convergence of big data and high-performance computing to enable novel scientific discoveries and urgent decision making,” said Cathy Wu, founding director of UD’s Data Science Institute. “It will be instrumental to accelerate collaborative interdisciplinary research and inspire new research directions in data science at UD, across the region and joint global initiatives.”
Scaffolding UD’s strong research platform
Researchers across UD, including those located at the Science, Technology and Advanced Research (STAR) Campus, continue to push the architectural boundaries of UD’s cyberinfrastructure through new data-intensive scientific exploration. Data science, for example, is a powerful tool that helps researchers make sense of numbers, measurements, graphics, statistics and all manner of experimental, theoretical and computational findings. UD’s Data Science Institute brings faculty from the University’s seven colleges together to work collaboratively on a wide range of topics that intersect statistics, computer science, engineering, mathematics, information sciences and numerous related fields.
The University is expanding its current STAR Campus facilities by 200,000 square feet to house enterprises in bioinformatics, biotechnology and biopharmaceuticals — fields that are experiencing transformational growth.
“These are data-hungry fields that increasingly depend on a robust network and powerful computing resources,” said William Totten, UDIT enterprise architect.
Planned enhancements include the creation of a network capable of transferring information at a rate of 100 gigabits per second (Gb/s), a steep increase from UD’s current 10 Gb/s capability.
Not sure what Gb/s actually means in terms of speed? Here’s a quick comparison: To download all of the data in the Library of Congress (about 15 terabytes’ worth) using UD’s current 10 Gb/s connection would take about a day and a half. Over UD’s new 100 Gb/s link, the same data could be transferred in about four hours.
A film buff instead of a bookworm? In high definition, the Harry Potter franchise of movies contains nearly 20 hours of footage totaling about 75 gigabytes (GB) of data. To download the entire collection at 10 Gb/s speeds would take a couple of minutes, but at 100 Gb/s it would take little more than 10 seconds.
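For readers who want to check this kind of estimate themselves, the following is a minimal sketch of the underlying arithmetic, not an official UD calculation. It assumes decimal units (1 GB = 10^9 bytes), 8 bits per byte, and a link running at its full rated speed with no protocol overhead, disk limits or competing traffic. The function name `ideal_transfer_seconds` is hypothetical. Because of those assumptions, the results are best-case minimums; real-world transfers take longer, so practical estimates such as those quoted above are higher.

```python
def ideal_transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Best-case transfer time: total bits divided by link rate in bits per second.

    Assumes the link is fully used and ignores all overhead (an idealization).
    """
    bits = size_bytes * 8                 # 8 bits per byte
    return bits / (link_gbps * 1e9)       # Gb/s -> bits per second


GB = 1e9   # decimal gigabyte, in bytes (assumed unit convention)
TB = 1e12  # decimal terabyte, in bytes (assumed unit convention)

# Data sizes are the approximate figures given in the article.
examples = [
    ("Library of Congress (about 15 TB)", 15 * TB),
    ("Harry Potter films in HD (about 75 GB)", 75 * GB),
]

for label, size in examples:
    for rate in (10, 100):
        secs = ideal_transfer_seconds(size, rate)
        print(f"{label} at {rate} Gb/s: {secs:,.0f} s ({secs / 60:.1f} min) minimum")
```

Whatever the overhead, the ratio between the two links stays the same: a tenfold increase in bandwidth cuts the best-case transfer time to one tenth.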
This boost in speed is exactly what UD researchers engaged in heavy computational research — and the colleagues and students working with them — need.
The infrastructure upgrade will remove delays that faculty with computationally intensive research now experience because of the smaller pipeline connecting UD to outside resources, such as supercomputing centers and colleagues at universities and national or international laboratories. Likewise, greater transfer capacity will limit downtime between computational runs and analysis.
Arthi Jayaraman, associate professor of chemical and biomolecular engineering, and students and postdocs working in her research lab are looking forward to fewer lulls between successive simulations in their work. Jayaraman’s research relies on molecular level simulations of polymers to design materials for a variety of applications in the biomedical, optical and energy fields.
As part of this work, researchers in Jayaraman’s lab transfer files several times a day between UD’s Farber high-performance computing cluster and supercomputers at NSF and external institutions. Often, file transfers take hours, hampering efficiency, collaboration and engagement with colleagues, and sometimes even research advances.
“Time spent waiting for file transfers could be better spent toward analyzing and interpreting the data from these polymer simulations, and thus, accelerating materials innovation and engineering,” Jayaraman said.
The expanded infrastructure will be particularly helpful for those who require a scalable, secure research environment tailored to support growing research requirements, such as bulk data transfers, data visualization and remote experimental control.
The new high-bandwidth capabilities will improve sharing of large, time-sensitive data sets within what’s known as the Science Demilitarized Zone (Science DMZ). For example, after processing samples, many UD biotech labs currently ship the digital data back to customers on external hard drives. Now, researchers will be able to securely download the data digitally.
In addition, a dedicated file transfer server in the Science DMZ will be linked to UD’s high-performance computing clusters (Mills, Farber and Caviness), maximizing the speed and security of shared dataset transfers across both infrastructures.
Scientists at the Delaware Biotechnology Institute’s Sequencing and Genotyping Center, for example, routinely receive biological samples for sequencing from researchers in the United States, France, Peru, Turkey, India and other countries. According to Center Director Bruce Kingham, speeding up transmission times means “more sampling and more support of important research across the country.”
In UD’s Bartol Research Institute, based in the Department of Physics and Astronomy, researchers study turbulence and magnetic reconnection in the heliosphere. The increased bandwidth will allow more simulation data to be archived at UD, making it more readily available for analysis.
This is good news for William Matthaeus, an astrophysicist whose expertise extends to research in turbulence theory, solar wind, space plasma physics and numerical simulations.
“The new generation of simulations and associated research projects and space missions that we are involved in require immediate access to hundreds of Gigawords or even Terawords of data to manipulate, analyze and visualize physical results,” said Matthaeus, Unidel Professor of Physics and Astronomy.
“Many research projects require thousands — if not millions — of file accesses and transfers per week, without which, the scientific community worldwide would have limited access to work being done and data being collected by colleagues, as well as critical databases and data analysis results that support productivity, further understanding of the field, and allow for timely knowledge transfer.”
Faculty, staff and students who do not require capabilities beyond UD’s current connections will still feel a noticeable boost when using UD’s growing array of cloud-based services (e.g., Google Email, Microsoft’s Exchange Online and Office 365, and Zoom) or systems hosted off-campus, such as research clusters, state and federal data repositories or systems maintained by collaborating research institutions and organizations.
Broadening UD participation in future improvements
This expanded capacity also will enable UD to more broadly connect with the Internet2 Network, a consortium of members from U.S. higher education institutions, leading corporations, government agencies and community anchor institutions that exchange ideas, transfer knowledge, engage in collaborative research and develop network technologies to power the Internet for years to come. Access to this vibrant community will help UD continue to grow its research platform and broaden collaborations both globally and with connected community members.
According to Pitt, the University’s IT group is also actively talking with researchers and educators on campus to ensure UD has access to the best resources and services available.
“We are laying the groundwork for UD’s researchers to conduct their research and are also participating in broader discussions to improve connectivity for research, including Internet2’s Next Generation Infrastructure initiative and a multi-institutional effort to build a National Research Platform for the East Coast,” said Pitt.
The technical components of the upgrade are expected to be completed by early 2019. Educational sessions for research groups across campus interested in making the most of the new network capacity will be scheduled later in the year.