A Computer Simulator Could Usher in Next Wave of Supercomputers
Computer scientists are grappling with an existential issue: The next step in computer evolution is upon us, and the technology available now won't be able to keep up.
But Jason Lowe-Power, an associate professor in the Department of Computer Science at the University of California, Davis, may have the beginnings of a solution. With $6.5 million in combined funding from the National Science Foundation, or NSF, and the Department of Energy, or DOE, Lowe-Power aims to scale up gem5, a computer simulation tool, so that it can be used to design the next generation of supercomputers.
"If we keep designing computers as they're designed today, they're not going to get faster, they're not going to get better," he said. "There's very little that we can do to make them better without doing a hardware-software codesign. If we want to see computers get better, we need tools like gem5 that allow us to do these kinds of investigations."
Currently, gem5 is open-source, community-driven software used in computer architecture research to investigate new computer chip designs, develop hardware-software codesign and perform system-level performance analysis. Universities worldwide employ it for research and as a teaching tool in computer science, and organizations such as Advanced Micro Devices (AMD), Google, Rivos, Riken and Arm also use it to test their latest technologies.
The project began in 2000 as two separate entities: M5, a central processing unit simulator from the University of Michigan, and GEMS, a University of Wisconsin-Madison project that focused on caches and moving data around the memory system. The two merged to become gem5 in 2011, the same year Lowe-Power began working on his Ph.D. in computer architecture at UW-Madison.
Lowe-Power used gem5 as his primary research tool throughout his Ph.D. while also making tweaks and improvements. As his colleagues working on gem5 moved on in their careers or to other projects, Lowe-Power saw the need to create formal leadership around it. When he joined UC Davis in 2017, he saw an opportunity to lead the charge and drive the community forward — a mission that can continue with these newly funded projects.
The Next Phase of gem5
The NSF grant will provide Lowe-Power and his collaborators (including electrical and computer engineering professor Houman Homayoun and computer science professor Vladimir Filkov from UC Davis, and researchers from Georgia Tech, Cornell, and the universities of Washington, Kansas and Wisconsin-Madison) with $3.5 million over four years to increase the overall impact of gem5.
"We're going to add a lot of different kinds of models of modern and future computer hardware, and we're looking at ways to scale up the simulation from its current capabilities, where it's mostly focused on laptops and desktops, to supercomputers," he said.
Lowe-Power is also collaborating on two grants, one for $1.7 million and one for $1.3 million, from DOE's Exploratory Research for Extreme-Scale Science, or EXPRESS, program, which focuses on modeling future supercomputer systems. This year, EXPRESS asked researchers to propose ways of fueling the next step toward a new generation of supercomputers.
"We know that we can't use the technology of today, so we need new kinds of simulation and modeling capabilities to model future technology that we could use for future supercomputers," he said. "The gem5 simulator is a key tool that the different teams in this grant are using to meet that challenge."
One EXPRESS grant is in collaboration with the University of California, Berkeley, and Lawrence Berkeley National Laboratory. The team will be developing models for superconducting circuits, which are complex circuits that have minimal heat output and can run much faster than typical computers. However, these circuits are kept in liquid helium inside a refrigerator at a temperature near absolute zero. The researchers will devise techniques that use light to transfer data from inside the fridge to the outside, and will create models to simulate the circuits.
With the other grant, Lowe-Power is partnering with UW-Madison and Oak Ridge National Laboratory to develop new technologies that can model a 100-exaflop supercomputer. (DOE defines "flops" as a measure of computer performance in floating-point operations per second, while "exa" denotes a factor of 10 to the 18th, a 1 followed by 18 zeros: 1 exaflop = 1,000,000,000,000,000,000 flops.)
"Today, gem5 runs about 10 to the sixth times slower than real life, which means if we want to model a 100 exaflop system, we need to somehow improve the performance of gem5 by a factor of 10 to the 12th," he said.
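To put those figures in perspective, the arithmetic can be checked in a few lines of Python. This is an illustrative sketch only and is not part of gem5: the function names are hypothetical, and it treats the quoted slowdown of 10 to the sixth as exact. It does not attempt to reproduce how the 10-to-the-12th speedup target was derived, which is not spelled out here.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# All names are hypothetical; nothing here is part of gem5.

EXA = 10**18             # "exa" prefix: 1 exaflop = 10^18 flops
SECONDS_PER_DAY = 86_400


def exaflops_to_flops(exaflops: int) -> int:
    """Convert a performance figure in exaflops to flops."""
    return exaflops * EXA


def wall_clock_seconds(target_seconds: float, slowdown: float) -> float:
    """Wall-clock time needed to simulate `target_seconds` of target
    machine time when the simulator runs `slowdown` times slower
    than the real hardware."""
    return target_seconds * slowdown


# A 100-exaflop system performs 10^20 floating-point operations per second.
target_flops = exaflops_to_flops(100)

# Assumed slowdown of 10^6, taken from the quote and treated as exact.
slowdown = 10**6
seconds_needed = wall_clock_seconds(1, slowdown)
days_needed = seconds_needed / SECONDS_PER_DAY

print(f"100 exaflops = {target_flops:.0e} flops")
print(f"Simulating 1 second takes about {days_needed:.1f} days")
```

Under these assumptions, a single second of activity on the target machine would take about 11.6 days of wall-clock time to simulate, which helps explain why large speedups are needed before gem5 can model systems at that scale.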
A gem5 for All
According to Lowe-Power, gem5 is not as widely used as it could and should be, partly due to a lack of accessibility. As gem5 grows in its capabilities, Lowe-Power aims to expand both the community around gem5 and access to it.
"We're going to lower the barrier to entry to use gem5, which is going to allow the community to grow and for new and more people to be involved in this," he said. "We're going to generate new ideas, new kinds of computers, new computer hardware that we couldn't have done before because we have this tool enabling it."
One way Lowe-Power plans to do that is by establishing his gem5 boot camp as a yearly event. The first, held in the summer of 2022, brought 50 first- and second-year Ph.D. students to UC Davis to spend a week learning how to use gem5 as a research tool. Through funding from NSF, the camp was able to pay for the students to be there, which meant those who might have faced financial barriers to attending were able to participate.
At the boot camp, Lowe-Power noted, the students not only built gem5 skills that could support their research, but also established a network and community of other researchers using gem5, something many of them don't have at their respective institutions. Lowe-Power recalled how crucial these peer connections were when he was earning his Ph.D.
"When I learned gem5, I was just told, 'Here's gem5. Figure out how to use it,'" he said. "I was lucky because the research group that I was in had a lot of experts in gem5, so I could bug my officemate, I could bug the person in the hall and say, 'Hey, I'm trying to do this thing.' Most people don't have that. Now these students know 49 other students that are all at the same point in their careers, and they will all be going to the same conferences. They were able to establish that baseline network that is really difficult to establish for early career researchers."
Looking Toward the Future
Lowe-Power, who has long seen gem5 as more of a passion project, envisions it becoming the default simulator for all computing research and sees this new round of funding as gem5's gateway to the next step in its evolution.
"I see in five or 10 years that gem5 will be the keystone to enable new paradigms of computing," he said. "These projects are allowing me to nurture gem5 into what I think it can be, and it will enable researchers to study superconducting circuits for supercomputers. To see this tool and its community grow and become richer, that's very satisfying."