A 2009 paper led by alumnus Christian Bird ’10 and professors Premkumar Devanbu and Vladimir Filkov received a test-of-time award for its long-lasting impact on the field of software engineering from the Association for Computing Machinery’s 27th Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019).
The paper, co-authored by undergraduates Eirik Aune ’09 and John Duffy ’10, along with Adrian Bachmann and Professor Avi Bernstein of the University of Zurich, discovered bias in bug-fix reporting data that can make the study and exploitation of software defect data less effective.
“People assumed that bug fix data was representative of actual changes in the source code, but it’s not,” said Devanbu. “No one was talking or thinking about it, but it has the potential to affect everybody’s experimental work.”
Software with millions of lines of code can’t be analyzed by hand, so developers rely on automatic tools to identify bugs so they can be fixed faster and more efficiently. These tools are trained using bug-fix data from bug databases, which contain information on what the problem is, where the bug is in the code and who’s working on it.
Developers who fix a bug are supposed to mark it fixed in the database and write a commit message such as “fixes bug #245” to link the two together and form a data point, but they often don’t, especially for more severe bugs. This leads to heavily biased data in which over 60% of minor fixes are linked this way but only 15-20% of major ones are.
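As a rough illustration of the linking step described above, the following Python sketch scans commit messages for bug references and reports what fraction of fixed bugs in each severity class ends up linked. It is not the method used in the paper: the regular expression, the function names and the toy data are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches "fixes bug #245", "Fixed #246", "fix bug 247".
BUG_REF = re.compile(r"\bfix(?:es|ed)?\s+(?:bug\s+)?#?(\d+)", re.IGNORECASE)


def linked_bug_ids(commit_messages):
    """Return the set of bug IDs mentioned in any commit message."""
    ids = set()
    for message in commit_messages:
        ids.update(int(m) for m in BUG_REF.findall(message))
    return ids


def link_rate_by_severity(fixed_bugs, commit_messages):
    """Fraction of fixed bugs per severity that some commit refers to.

    fixed_bugs maps bug ID -> severity label for bugs marked fixed.
    """
    linked = linked_bug_ids(commit_messages)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for bug_id, severity in fixed_bugs.items():
        totals[severity] += 1
        if bug_id in linked:
            hits[severity] += 1
    return {sev: hits[sev] / totals[sev] for sev in totals}


# Toy data, invented for illustration only.
fixed_bugs = {245: "minor", 246: "minor", 247: "major", 248: "major"}
commits = [
    "fixes bug #245",
    "Fixed #246: typo in label",
    "refactor parser",  # a fix with no bug reference stays unlinked
    "fix bug 247",
]
rates = link_rate_by_severity(fixed_bugs, commits)
```

A large gap between the rates for different severity classes is the kind of imbalance the researchers describe.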
“This was a revelation,” said Filkov.
Bug-finding tools should be able to identify all bugs in a codebase regardless of severity, but tools trained on this biased data perform in a biased way, finding mostly minor bugs.
“Any statistical method is reliant on having a fair sample, because you’re assuming a sample is a good representative of the population,” said Devanbu. “If your sample is not, you get unexpected results.”
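The sampling point can be made concrete with a little arithmetic. The sketch below assumes a hypothetical population with equal numbers of minor and major fixed bugs and link rates of 60% and 20%, chosen to echo the rough figures quoted above; the counts are invented for illustration and are not data from the paper.

```python
def share_in_linked_sample(population, link_rate):
    """Expected share of each severity class among linked bug fixes.

    population maps severity -> number of fixed bugs (hypothetical counts).
    link_rate maps severity -> probability that a fix is linked to a commit.
    """
    linked = {sev: population[sev] * link_rate[sev] for sev in population}
    total = sum(linked.values())
    return {sev: linked[sev] / total for sev in linked}


# Invented population: major bugs are half of all fixed bugs.
population = {"minor": 100, "major": 100}
link_rate = {"minor": 0.6, "major": 0.2}

shares = share_in_linked_sample(population, link_rate)
```

Under these assumptions major bugs make up half of the real population but only about a quarter of the linked sample, so a tool trained on the sample sees far fewer major bugs than actually exist.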
In the 10 years since the paper’s publication, the software community has become more serious about bug tracking and reporting and has embraced analytical solutions to these problems. Though there is still a way to go, Devanbu and Filkov have seen improvement as the field continues to grow and flourish.
“You need good, well-balanced data sets for [the] AI revolution that’s coming,” said Filkov, “because it’s all going to be data, and if that data is skewed, the results will be skewed.”
ESEC/FSE 2019 is one of the world’s premier software engineering conferences, bringing together researchers, practitioners and educators to discuss the latest innovations, trends and challenges in the field. Devanbu will travel to Tallinn, Estonia, from August 26-30 to receive the award and give a plenary talk at this year’s conference.
This is Bird and Devanbu’s fourth (and Filkov’s third) award for a durably influential paper and the second in 2019, as they received an award from the Mining Software Repositories 2019 conference this spring.