Humanities take digital leap with Mellon Foundation grant

An interdisciplinary team of Texas A&M University faculty recently received a two-year, $734,000 development grant from the Andrew W. Mellon Foundation to improve scholarly access to early modern texts. Funds from the grant will go toward the University’s Early Modern Optical Character Recognition (OCR) Project (eMOP), created to develop new methods and tools to improve the digitization, transcription and preservation of early modern texts.

Led by faculty members in the College of Liberal Arts and Dwight Look College of Engineering, the project research team includes Dr. Laura Mandell, professor in the Department of English and director of the Initiative for Digital Humanities, Media, and Culture (IDHMC); Dr. Richard Furuta, professor in the Department of Computer Science and Engineering; and Dr. Ricardo Gutierrez-Osuna, an associate professor also in the Department of Computer Science and Engineering. Other project leaders include Dr. Todd Samuelson and Mr. Anton DuPlesis, book historians at Texas A&M’s Cushing Memorial Library & Archives. 

“The peculiarities of early printing technology make it difficult for OCR software to discern discrete characters and, thus, to render readable digital output,” said Mandell. “Receiving this grant will make it possible to improve the machine-translation of digital page images with cutting-edge crowd-sourcing and OCR technologies, both guided by book history.”

eMOP promises to improve the quality of digital surrogates for early modern texts by creating a database of early modern fonts, by training software that mechanically types page images (OCR) to read those typefaces and by creating crowd-sourced correction tools. 

“Our goal is to further the digital preservation processes currently taking place in institutions, libraries and museums globally,” stated Mandell.

eMOP is part of the IDHMC, an initiative in the College of Liberal Arts developed as one of the eight Initial University Multidisciplinary Research Initiatives in Texas A&M’s Academic Master Plan. In collaboration with participating institutions and individuals, the IDHMC aims to aggregate and re-tool many of the recent innovations in OCR in order to provide a stable community and expanded canon for future scholarly pursuits. In concert with the Advanced Research Consortium (ARC) and its digital hubs (NINES, 18thConnect, ModNets, REKn and the Medieval Electronic Scholarly Alliance), eMOP has received permission to work with more than 300,000 documents from Early English Books Online (EEBO) and Eighteenth Century Collections Online (ECCO), totaling 45 million page images of documents published before 1800.

The IDHMC is committed to the improvement and growth of digital projects and resources. The two-year funding support for eMOP from the Andrew W. Mellon Foundation will help IDHMC educate, preserve and develop the future of humanities scholarship for the scholarly community.