Loguinov receives NSF grant for big data research

Dr. Dmitri Loguinov, a professor in the Department of Computer Science and Engineering at Texas A&M University, was recently awarded a grant from the National Science Foundation for his work centered on high-performance virtual-memory streaming.

Modern programs commonly process large amounts of input and operate on data that resides in external memory, such as hard drives and the network. One particular type of sequential access, streaming, has become dominant in such applications, especially in a now-ubiquitous computational model known as MapReduce. Traditionally, however, operating systems have provided a poor input/output (I/O) interface for streaming and required tedious, nonreconfigurable, and error-prone code development.
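To give a sense of the bookkeeping involved, here is a minimal Python sketch of conventional chunked streaming. It is an illustration only and is not taken from Loguinov's project; the function names `stream_lines` and `word_count` and the 64 KB chunk size are assumptions made for this example. The program must read fixed-size blocks itself and stitch together any record that straddles a block boundary before it can do useful work.

```python
def stream_lines(path, chunk_size=64 * 1024):
    """Yield newline-delimited records from a file, one chunk at a time.

    Hypothetical helper for illustration: the caller never holds more
    than one chunk plus one partial record in memory.
    """
    leftover = b""  # tail of the previous chunk that had no newline yet
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            pieces = (leftover + chunk).split(b"\n")
            leftover = pieces.pop()  # last piece may be an incomplete record
            for piece in pieces:
                yield piece
    if leftover:
        yield leftover  # final record without a trailing newline


def word_count(path, chunk_size=64 * 1024):
    """Count words in a file of any size using the streaming reader above."""
    counts = {}
    for line in stream_lines(path, chunk_size):
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

Even in this small case, the boundary handling (the `leftover` buffer) is easy to get wrong, and the chunk size is fixed by the caller rather than tuned by the system.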

Loguinov’s project aims to create algorithms for a virtual-stream interface that would enable development of big data software that is more easily managed, simpler to understand, inherently faster and less susceptible to bugs. Besides I/O, his work aims to provide novel abstractions for moving data between threads, which are parallel parts of a program that execute within the space of a single process.

Operating systems have used memory paging for many decades; however, the idea of exposing an interface that offers user-level processes a true streaming experience has not been widely considered before. Loguinov’s research shows that this is not only possible, but also highly desirable in data-intensive applications.

“In a nutshell, virtual streams are a great way to speed up transfer between threads, increase efficiency of external-memory algorithms, simplify the programming interface and improve various applications that require computing at large scale,” Loguinov said.

This research can not only address the key deficiencies in the current I/O model but also offer additional benefits, such as the ability to reuse existing libraries on input larger than RAM, significantly faster sorting and more scalable MapReduce computation.

The initial framework, in which the process directly manages its memory from user space, can be deployed immediately; however, Loguinov expects the project’s results to eventually be of interest to operating system developers. With additional help from the operating system, streams should support even faster, lower-overhead operation, lifting the various barriers that exist today.

This project spans a variety of areas in computer science, including operating systems, algorithms, networking, distributed systems and architecture. Loguinov credits undergraduate students at Texas A&M, especially John Keech and Yuan Yao, with developing the initial concepts and prototypes, citing their relentless pursuit of this topic despite their heavy course load.