Skia: Shedding Light on Shadow Branches

Researchers at Texas A&M University developed a new technique to improve efficiency and performance in computer processors.

April 24, 2025 By Katie Satterlee

Engineering News

engineeringnews@tamu.edu

Chrysanthos Pepi and Dr. Paul Gratz standing in front of HPRC server clusters. | Image: Hayden Schonhoeft/Texas A&M Engineering

What happens when trailblazing engineers and industry professionals team up? The answer may transform the future of computing efficiency for modern data centers.

Data centers house large computers that process massive amounts of data. Often, the processors can’t keep up with this workload because predicting and preparing the instructions to carry out is taxing. This slows the flow of data. As a result, when you type a question into a search engine, the answer arrives more slowly or doesn’t provide the information you need.

To remedy this issue, researchers at Texas A&M University developed a new technique called Skia in collaboration with Intel, AheadComputing, and Princeton to help computer processors better predict future instructions and improve computing performance.

The team includes Dr. Paul V. Gratz, a professor in the Department of Electrical and Computer Engineering, Dr. Daniel A. Jiménez, a professor in the Department of Computer Science and Engineering, and Chrysanthos Pepi, a graduate student in the Department of Electrical and Computer Engineering.

“Processing instructions has become a major bottleneck in modern processor design,” Gratz said. “We developed a new technique, Skia, to better predict what's coming next and alleviate that bottleneck.”

A common problem for modern data center workloads is that the instruction stream – the steps a computer must take for processing – can be too large or difficult to process. Skia, a Greek word for shadow, not only helps predict future instructions more accurately but also uses that information to improve the throughput of instructions on the system. Throughput refers to units of completed processing per unit of time.

“Think of throughput in terms of being a server in a restaurant,” Gratz said. “You have lots and lots of jobs to do. How many tasks can you complete, or how many instructions can you execute per unit time? You want high throughput, especially for computing.”

Improving throughput can lead to faster performance and lower power consumption for the data center.

“There are new bottlenecks in data center workloads associated with the instruction footprint and by fixing these, we can make the hardware better mapped and suited to that workload,” Gratz added. “If we make it up to 10% more efficient, a company previously needing to make 100 data centers around the country, now only needs to make 90, which is 10 less data centers. That's pretty significant. These data centers cost millions of dollars, and they consume roughly the equivalent of the entire output of a power plant.”

In data centers, modern processors improve efficiency by predicting instructions and retrieving them before they’re needed, relying on a system known as Fetch Directed Instruction Prefetching (FDIP). FDIP uses the Branch Prediction Unit to anticipate and fetch instructions.

However, as data center applications grow more complex, issues can occur when the Branch Target Buffer (BTB), which tracks branch instructions and their targets, misses, meaning it has no entry for an upcoming branch. This hinders FDIP’s effectiveness, causing incorrect predictions and cache pollution. Many of these missed branches, termed “Shadow Branches,” exist in previously fetched cache lines but aren’t being used by the current instruction sequence and remain undecoded.

Skia identifies and decodes these shadow branches in unused bytes, storing them in a memory area called the Shadow Branch Buffer, which can be accessed alongside the BTB.
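The idea described above can be illustrated with a toy software model. This is only a sketch for intuition, not the hardware design from the research: the class name `ToyFrontEnd`, its method names, the unbounded dictionary storage, and the example addresses are all assumptions made for illustration. It shows branches on the executed path going into the BTB, branches in the unused bytes of a fetched cache line going into a Shadow Branch Buffer, and a lookup that consults both.

```python
from typing import Dict, List, Optional, Tuple


class ToyFrontEnd:
    """Toy model of a BTB plus a Shadow Branch Buffer (SBB).

    Hypothetical and simplified: real buffers have limited capacity and
    are hardware structures, not Python dictionaries.
    """

    def __init__(self) -> None:
        self.btb: Dict[int, int] = {}  # branch address -> target address
        self.sbb: Dict[int, int] = {}  # shadow branch address -> target address

    def record_executed_branch(self, pc: int, target: int) -> None:
        # Branches seen on the executed path are tracked in the BTB.
        self.btb[pc] = target

    def scan_fetched_line(
        self,
        branches: List[Tuple[int, int]],
        used_start: int,
        used_end: int,
    ) -> None:
        # 'branches' lists (address, target) pairs found in one fetched line.
        # Bytes in [used_start, used_end] belong to the current instruction
        # sequence; branches outside that range are treated as shadow branches.
        for pc, target in branches:
            in_used_bytes = used_start <= pc <= used_end
            if not in_used_bytes and pc not in self.btb:
                self.sbb[pc] = target

    def predict_target(self, pc: int) -> Tuple[Optional[int], str]:
        # Consult the BTB and fall back to the SBB. In this sequential toy
        # the BTB is checked first; the article describes the SBB as being
        # accessed alongside the BTB.
        if pc in self.btb:
            return self.btb[pc], "BTB"
        if pc in self.sbb:
            return self.sbb[pc], "SBB"
        return None, "miss"


if __name__ == "__main__":
    fe = ToyFrontEnd()
    # The current path enters the line at 0x1000 and leaves via a taken
    # branch at 0x1008, so the bytes after it go unused.
    fe.record_executed_branch(0x1008, 0x2000)
    fe.scan_fetched_line(
        [(0x1008, 0x2000), (0x1030, 0x3000)],
        used_start=0x1000,
        used_end=0x1008,
    )
    # Later, execution reaches the branch at 0x1030 for the first time.
    print(fe.predict_target(0x1030))
```

In this toy model the branch at 0x1030 was never executed, so the BTB alone would miss it; because it sat in the unused part of an already fetched line, the shadow buffer can still supply its target.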

“What makes this technique interesting is that most of the future instructions were already available, and we demonstrate that Skia, with a minimal hardware budget, can make data centers more efficient, nearly twice the performance improvement versus adding the same amount of storage to the existing hardware as we observe,” Pepi said.

Their findings, “Skia: Exposing Shadow Branches,” were published in one of the leading computer architecture conferences, the ACM International Conference on Architectural Support for Programming Languages and Operating Systems. The team also traveled to the Netherlands to present their work to colleagues from around the globe.

Other collaborators on the project include David I. August, a professor in the Department of Computer Science at Princeton University; Krishnam Tibrewala, a graduate student in the Department of Computer Science and Engineering at Texas A&M; Gilles Pokam, a senior principal engineer at Intel Corporation; and Bhargav Reddy Godala and Gino Chacon, senior central processing unit architects at AheadComputing.

Funding for this research is administered by the Texas A&M Engineering Experiment Station (TEES), the official research agency for Texas A&M Engineering.