Coding for Two Audiences: Humans and Computers
Software code is written to be read by both computers and humans. Machines quickly and perfectly understand the computational meaning, while humans read it the same way they read natural language: not as quickly and sometimes incorrectly. With a new $1.2M three-year NSF-funded project, a group of software engineers and social scientists at UC Davis will leverage this bimodality to develop tools that make writing, reading and maintaining code easier and improve the overall programming experience.
The project is led by computer science distinguished professor Prem Devanbu, together with his colleagues Cindy Rubio González and Aditya Thakur and his cross-campus collaborators Gerardo Con Díaz, from the Department of Science and Technology Studies, and Emily Morgan, from the Department of Linguistics.
“When you write a program, it has two audiences,” explained Devanbu. “Code is meant for human consumption and it’s meant for computer consumption, and the fact that people choose to write code in ways that are easy for human beings to read reflects this bimodality.”
Since code has a human audience, there are millions upon millions of lines of code freely available on the internet that the team can use. With this wealth of data from different types of programs and coders of all experience levels, they can train algorithms to improve the human experience of programming.
Writing for Humans
Making code easy for people to read is a critical part of programming. Humans write, edit, check and maintain code, and the best way to catch potentially catastrophic errors is code review, a proofreading-like process where a second person reads through code and explains what it’s doing. The team plans to train an algorithm that can re-write code so it’s easier to read without changing the computational meaning.
Initial studies, led by Morgan, have shown that small changes of this kind (the equivalent of changing “pepper and salt” to “salt and pepper”) can make a huge difference. The team’s goal is to train an algorithm that knows this and can read through code to give it a readability score. On this scale, a higher score means the code is harder to read. The program can then re-write the code until the score drops as low as possible. Improving readability will help coders and code reviewers parse software and more easily find and fix potential errors.
“Because these programs have well-defined meanings, we can change the code around without affecting the meaning,” said Devanbu. “If I can change around my program so the meaning doesn’t change but it’s much easier to read, then code review will be much easier.”
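As a small illustration of what a meaning-preserving rewrite might look like, the Python sketch below shows two versions of the same range check. The function names and the example are hypothetical and are not taken from the project; they only show how reordering operands, much like swapping “pepper and salt” for “salt and pepper,” can leave the computational meaning untouched while changing how easily a person reads it.

```python
def in_range_unusual(value, low, high):
    # Operands appear in an order people rarely write.
    return high > value and low <= value


def in_range_conventional(value, low, high):
    # Same check, written in the familiar "low <= value < high" order.
    return low <= value < high


def same_meaning(candidates, low=0, high=10):
    # Compare the two versions on every candidate input.
    return all(
        in_range_unusual(v, low, high) == in_range_conventional(v, low, high)
        for v in candidates
    )


print(same_meaning(range(-5, 16)))  # True
```

Testing on sample inputs, as done here, only suggests that the two versions agree. A real rewriting tool would presumably rely on transformations that are known in advance to preserve meaning.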
Learning Through Bimodality
The team also plans to use the data to make programming better for beginners. Beginners can be disheartened if their program keeps crashing, but often the problem is a slight error like a typo or a missing parenthesis instead of more significant systematic issues. If the team can train an algorithm that recognizes enough different programs, syntaxes and operations, Devanbu thinks it can work like a spellchecker that can identify spelling and grammatical errors in code. This would keep beginners encouraged and focused on learning how to code instead of finding these mistakes.
“The machine learning models are clever, so if you are converting Celsius to Fahrenheit, for example, the model would have seen that formula many times so it would remember it,” he explained. “If you use variable names that look like they’re temperature variable names, then it knows that that’s probably what you meant to do and it would be able to fix that.”
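To make the temperature example concrete, here is a hypothetical Python sketch of the kind of slip such a tool might catch. The function and variable names are assumptions chosen for illustration, and the sketch shows only the mistake and its correction, not how the team’s model would detect it.

```python
def celsius_to_fahrenheit_buggy(temp_celsius):
    # Beginner slip: the constants 9 and 5 are swapped.
    return temp_celsius * 5 / 9 + 32


def celsius_to_fahrenheit(temp_celsius):
    # Standard formula: F = C * 9/5 + 32.
    return temp_celsius * 9 / 5 + 32


boiling_celsius = 100
print(celsius_to_fahrenheit(boiling_celsius))        # 212.0
print(celsius_to_fahrenheit_buggy(boiling_celsius))  # not 212.0
```

The buggy version runs without crashing, which is what makes this kind of error hard for a beginner to spot. The name temp_celsius is the sort of hint that could signal which formula the programmer intended.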
Thinking bimodally about code forces programmers to think both about the software they’re writing and the human context it’s being developed in. Devanbu believes this is a crucial part of the computer science curriculum, so for the final part of the project, he will be working with Con Díaz to develop a new curriculum in which students learn coding and ethics simultaneously from the beginning of their college careers.
“Programming is a deeply human experience, even though we don’t always think of it that way,” he said. “When you learn to write in a natural language, you’re taught to write in the context of some human experience, like war or romance, and you are asked to think about how that maps onto the way you write. Code shouldn’t be taught any differently.”