As part of a three-year research project funded by the National Science Foundation, Dr. Chao Tian, associate professor in the Department of Electrical and Computer Engineering at Texas A&M University, and graduate student Dr. Ruida Zhou are seeking to provide a better balanced and comprehensive understanding of the tradeoffs between different constraints in general information retrieval systems.
In addition to the project, a paper was published by Zhou, the Chevron Fellow for the department, discussing some of their findings, specifically furthering knowledge on capacity-achieving private information retrieval codes from maximum distance separable (MDS)-coded databases with minimum message size. The paper was recently awarded the Institute of Electrical and Electronics Engineers Data Storage Best Student Paper Award for 2020-21.
“It is a great honor to receive the award and to have our work recognized,” said Zhou.
According to Tian, an information retrieval system can be viewed as a general mathematical model for data storage systems, where digital information can be stored and retrieved efficiently. Users send queries to the system, and the information retrieval system will provide answers to user queries after appropriate operations. Information retrieval systems have many applications and are referred to differently depending on the application.
“For example, traditional databases store simple structured data records, and mathematical operations are sometimes required on these records to provide answers to user queries,” said Tian. “Online video services are usually stored on distributed servers, and the video content is delivered through the internet. Such a system is referred to as a content delivery network.”
However, systems are inhibited by constraints — conditions under which the system must operate, such as the amount of storage space, privacy requirements, security requirements and the computation complexity requirement of the system.
Throughout the project, Tian and Zhou studied several significant constraints in private information retrieval systems. Although there has been considerable research on information systems, without a good understanding of the fundamental limits of such systems, it is unclear what performance can never be achieved due to restrictive constraints and what performance is not yet achieved because there are no known methods to achieve it.
The research makes the private retrieval of information more efficient and flexible.
To understand these limitations, Tian and Zhou looked at efficient coding approaches to develop converse bounds that describe fundamental limits. Their findings showed that certain performance is only possible when the system operates efficiently, regardless of innovation in coding.
“A deeper understanding has led to better and more flexible code designs that allow the system to operate more efficiently,” said Tian.
In his paper, Zhou delves further into capacity-achieving private information retrieval codes from MDS-coded databases with minimum message size. MDS codes are a type of efficient storage code widely used in data storage systems to combat device failures.
Using this knowledge, Zhou developed a new code construction that uses the shortest number of symbols per record (message). All previous capacity-achieving code constructions require many data symbols; each retrieved file or message needs to be big enough for these codes to be efficient. This implies that the code can be used in many more scenarios than previous codes since previous codes can only be applied to very large files.
“The research makes the private retrieval of information more efficient and flexible,” said Zhou. “It could be applied to information retrieval systems such as medical data, where privacy and efficiency are critical.”
Although their combined research findings are theoretical, they expect the research to be applied in future information retrieval systems since privacy has become more and more critical due to the prevalence of digital data and user data collection.
“The research may inspire new approaches to designing codes that achieve capacity while using fewer resources and improving system performance,” said Zhou. “This could lead to more efficient and secure data storage and transmission in various applications.”