Could too much documentation be a bad thing in software engineering?

May 2, 2019 By Lorian H Dusek

Felysha Walker

Communications Specialist II

979-458-4412

Computer Science and Engineering

Dr. Stephen Hughes and Scott Kolodziej in front of a presentation screen at the SIGCSE 2019 conference. | Image: SIGCSE 2019

Scott Kolodziej and a team of undergraduate students wanted to go back to basics and study something that everyone takes for granted: what is the best way to document your computer code? Everyone agrees that good documentation is important, but everyone seems to have a slightly different opinion on what exactly that means.

“I led a team of undergraduates in designing and executing an experiment to help determine exactly that,” said Kolodziej, a graduate student in the Department of Computer Science and Engineering at Texas A&M University. “We recruited students to participate in the study, which involved reading some code and answering questions to determine how well they understood what the code did.”

The team used code samples that were functionally identical but documented differently, which let them correlate higher scores and faster completion times with particular documentation styles.

“This research is exciting to me because even with a relatively small amount of data we had statistically significant results that really help to inform how we should document the code we write,” Kolodziej said. “Ultimately, we’ve helped to answer the question of what constitutes good documentation: first, good naming, and second, good comments.”

They also uncovered an interesting correlation: poor documentation seems to lead to a more correct understanding of the code, at the cost of time.

“While we’re not advocating that you document your code poorly, it may imply that too much documentation distracts the reader from what the code actually does, or lulls them into a false sense of understanding, even when the documentation is not meant to mislead,” Kolodziej said.

Opinions vary about what makes documentation good, clear and understandable. But hard evidence in support of these opinions can be hard to come by. Kolodziej and his team want to remove the anecdote and opinion and replace them with objective data about what makes quality software.

“Ultimately, we would like to add to a statistically and methodologically sound foundation to software engineering,” Kolodziej said. “In this case, what does good documentation look like?”

The project started as part of the Aggie Research Leadership/Scholars Program, an on-campus program that bridges the gap between undergraduates interested in research and graduate students looking for mentoring opportunities. Kolodziej built a team of seven undergraduates and met with them weekly to plan and design the details of the study. Once the experiment was designed, another 24 undergraduate students were recruited to participate in the study.

“I first became interested in empirical software engineering after reading some papers by Dr. Andreas Stefik from the University of Nevada, Las Vegas,” Kolodziej said. “I wanted to help contribute to the body of empirical evidence underpinning software engineering by conducting my own experimental study.”

Scott Kolodziej stands with fellow 2019 SIGCSE winners in the graduate student category of the ACM student research competition. | Image: SIGCSE 2019

Kolodziej’s work resulted in a recent first-place finish in the graduate category of the Special Interest Group on Computer Science Education (SIGCSE) Association for Computing Machinery (ACM) Student Research Competition. His findings also showed that better variable and function naming was much more effective than relying on comments alone.

“This was very surprising, especially given the traditional importance given to commenting code,” Kolodziej said. “It implies that a software engineer should spend at least as much time coming up with descriptive and clear names for their variables and functions than simply commenting their code and hoping that makes up for names like ‘x’ and ‘y’.”
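To make the contrast concrete, here is a small hypothetical illustration in Python. It is not taken from the study's materials, and the function names, the exam-score scenario and the threshold parameter are all invented for this example. Both versions compute the same result; the first relies on comments to explain names like "x" and "y," while the second lets descriptive names carry the meaning.

```python
# Hypothetical illustration only; these functions are NOT from the study.

# Version A: terse names, with comments carrying all of the meaning.
def f(x, y):
    # x is a list of exam scores, y is the passing threshold
    n = 0
    for v in x:
        # count the scores at or above the threshold
        if v >= y:
            n += 1
    # return the fraction of scores that pass (0.0 for an empty list)
    return n / len(x) if x else 0.0


# Version B: descriptive names, with no comments in the body.
def passing_fraction(exam_scores, passing_threshold):
    if not exam_scores:
        return 0.0
    passing_count = sum(1 for score in exam_scores if score >= passing_threshold)
    return passing_count / len(exam_scores)
```

A reader of Version A has to keep checking the comments to remember what each letter stands for, while Version B can be read almost as a sentence. Which style readers actually understand faster or more accurately is the kind of question the study set out to measure.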

Kolodziej collaborated with Dr. Elizabeth Kolodziej from the Department of Statistics, and Dr. Jeff Huang and Dr. Tim Davis from the Department of Computer Science and Engineering. His undergraduate team included Spencer Anderson, Polina Golikova, Yara Mohamed, Sager Patel, Sahil Patel, Akash Ramesh, all from computer science and engineering; and Raghav Tankasala in the Department of Industrial and Systems Engineering.
