UD researchers mine words to help programmers
Vijay Shanker and Lori Pollock, both professors of computer and information sciences, have received a $400,000 NSF grant to use natural language processing to improve software maintenance tools so that programmers can spend less time troubleshooting and more time writing new code.
1:59 p.m., Oct. 5, 2007--Two University of Delaware scientists have received a $400,000 National Science Foundation grant to help computer programmers maintain large and complex systems by mining software for important data that often goes overlooked--the simple words used in naming software components.

The project aims to improve analytical tools by analyzing how programmers have named the various components of their software, thereby easing system maintenance and freeing programmers to spend less time troubleshooting and more time writing new software.

Leading the research team, which includes both UD graduate and undergraduate students, are Lori L. Pollock and Vijay K. Shanker, both professors in the Department of Computer and Information Sciences.

“Today's software is so huge and complex that programmers spend most of their time maintaining existing software as opposed to writing new software,” Pollock said. “We hope to provide tools for software evaluation and maintenance that will make their job easier. That, in turn, will make software less expensive and more reliable for the consumer.”

“We are processing natural language use in these programs, looking at the language the programmers have used to name the components,” Shanker said. “We have found a lot of useful information in the way programmers select names.”

He said the project takes on added importance given the movement to open-source programs, which must be more readable so that other users can easily modify them.

Pollock said the researchers have had much positive reaction to their work during recent workshop presentations, adding that the future is unlimited. “This just keeps exploding as we constantly find new applications for the extracted information,” she said.

The research centers on improving the quality of various tools that developers use throughout the lifetime of a large software system. The team is particularly interested in helping programmers maintain existing software because that is where a considerable amount of their time is spent, Pollock said.

In fact, it has been estimated that because of the size and complexity of modern software and increased code reuse, between 60 and 90 percent of programming resources are devoted to modifying applications to meet new requirements or to fix discovered bugs.

To make modifications or fix bugs, programmers first must identify the concept that must be changed. Then they must locate it in the code and comprehend it before carefully implementing a change.

Software engineers increasingly rely on available software tools to automate maintenance tasks as much as possible. However, Pollock said that despite all of the available automated support, recent studies have shown that more development time is still spent reading, locating and comprehending code than actually writing code.

Pollock and Shanker believe that software maintenance tools can be significantly improved by adapting natural language processing to source code analysis.

The researchers said the approach is novel in that it analyzes how programmers have named the various components of their software. For example, the appearance of words such as “store” and “write” in component names indicates “saving.” While these words are not synonymous in normal English, they are used interchangeably in programs. By applying and adapting the analysis of natural language, such as English, to source code, the researchers are able to improve search and program navigation tools.
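To make the “store” and “write” example concrete, here is a minimal sketch in Python of how a search tool might use the words inside component names. It is an illustration only, not the researchers' actual tool: the function names, the sample identifiers, and the concept table are hypothetical, and the word “save” is an assumed addition to the two words mentioned above.

```python
import re

# Hypothetical concept table. The grouping of "store" and "write" under
# "saving" comes from the article; "save" is an assumed addition.
CONCEPT_WORDS = {
    "saving": {"save", "store", "write"},
}


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    words = []
    # First break on underscores and any other non-alphanumeric characters.
    for part in re.split(r"[_\W]+", name):
        # Then break camelCase, keeping acronyms such as "XML" together.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def find_by_concept(identifiers, concept):
    """Return identifiers that contain any word associated with the concept."""
    targets = CONCEPT_WORDS[concept]
    return [name for name in identifiers
            if targets & set(split_identifier(name))]


# Hypothetical component names from an imagined program.
names = ["writeConfigFile", "store_user_prefs", "parseHeader", "restoreSession"]
print(find_by_concept(names, "saving"))
```

Because the match is made on whole words rather than on raw text, a search for the “saving” concept finds both writeConfigFile and store_user_prefs but skips restoreSession, even though “restore” contains the letters “store.”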

Pollock and Shanker evaluate their newly developed strategies by designing and conducting experimental studies in which software developers use the tools. One evaluation involves Quantum Leap Innovations, a firm that was founded by UD graduates and is headquartered in the Delaware Technology Park.

The professors said that although they have been working in the same department for 15 years, this is the first time they have collaborated on research. The project was spurred by a UD graduate student, David Shepherd, who has since earned his doctorate and taken a postdoc position at the University of British Columbia. Shepherd was studying software engineering tools and, after taking a course with Shanker, realized Shanker's work in natural language processing could be coupled with Pollock's in optimization and automatic program analysis.

Pollock joined the UD faculty in 1992. She received bachelor's degrees in computer science and economics from Allegheny College, and a master's and doctorate in computer science from the University of Pittsburgh.

Shanker joined the UD faculty in 1987. He received bachelor's degrees in physics from the University of Madras and in electronics from the Indian Institute of Science, a master's in automation from the Indian Institute of Science and a doctorate in computer science from the University of Pennsylvania.

Article by Neil Thomas