

The way a system is designed is not always how it is used in practice.

Codex: Illuminating How Programmers Use Ruby in Practice

Ethan Fast, Lucy Wang, Daniel Steffee, Michael Bernstein (Stanford University)
Joel Brandt (Adobe Research)

INTRODUCTION

Often, the way a system is designed to be used differs from the way it is used in practice. Programming language idioms evolve over time, and users leverage library APIs in undocumented ways for purposes unanticipated by the engineers who originally developed them. Both users and creators of such systems must sometimes ask questions with no concrete answer. What is the best idiom or library to use for a particular task? Does my code follow common practice? How is the language being used today?

By mining open source software, it is possible to measure how code is being used in practice. We present Codex, a data store that models this knowledge for the Ruby programming language. Codex is a database constructed from over 3 million lines of Ruby code that can be queried across a variety of parameters (e.g., identifier names, method calls, code complexity, and AST nodes) to enable new data-driven applications. We describe three applications that demonstrate how Codex can help programmers: statistical linting, pattern annotation, and an open source Ruby Gem.
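To make the idea of querying code by method calls and AST nodes concrete, here is a minimal sketch that tallies method-call names in a snippet using Ruby's standard Ripper parser. It is not the Codex implementation, which the source does not show; the helper name count_method_calls and the sample snippet are assumptions made for illustration.

```ruby
require "ripper"

# Hypothetical helper: recursively walk a Ripper s-expression and tally
# the names of method calls that have an explicit receiver.
def count_method_calls(node, counts = Hash.new(0))
  return counts unless node.is_a?(Array)

  # A :call node has the shape [:call, receiver, operator, name_token],
  # where name_token looks like [:@ident, "downcase", [line, column]].
  if node[0] == :call && node[3].is_a?(Array) && node[3][0] == :@ident
    counts[node[3][1]] += 1
  end

  node.each { |child| count_method_calls(child, counts) }
  counts
end

# Made-up stand-in for one file of a mined corpus.
sample_source = <<~RUBY
  names = words.map { |w| w.downcase }
  title = heading.downcase
  shout = heading.upcase
RUBY

call_counts = count_method_calls(Ripper.sexp(sample_source))
call_counts.each { |name, n| puts "#{name}: #{n}" }
```

A real data store would persist such counts across many projects and index them by further parameters; the in-memory hash here only illustrates the kind of measurement involved.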

We thank the Stanford University Computer Science Department and the CURIS Program for their support.

CODEX: A Database of Ruby Patterns

DETECTING UNLIKELY CODE

Programmers sometimes write incorrect code or violate language conventions. CodexLint points out unlikely components of code, indicating that they may be incorrect or merely unconventional. We show users a statistic that represents how many times we saw their code relative to a more common piece of code, which is often interesting even when the unlikely code is not actually incorrect. We highlight unlikely components of code from four categories: function chaining, function signatures, block return values, and identifier types.

[Poster panels: Ruby Gem, Statistical Linting, Pattern Finding, Codex Editor]

Function chaining. For example, this line will cause unexpected behavior:

    foo = bar.downcase!

We have never seen the function downcase! used as the value of an assignment. The reason is that bar.downcase! modifies bar in place and returns nil when bar is already lowercase, so foo can end up nil. The non-bang version bar.downcase always returns the lowercased string and works as intended.
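The difference between the two return values can be checked directly. The variable names below are chosen for illustration only.

```ruby
mixed = String.new("Hello")
already_lower = String.new("hello")

changed = mixed.downcase!            # modifies mixed in place and returns it
unchanged = already_lower.downcase!  # nothing to change, so this is nil

puts changed.inspect
puts unchanged.inspect

# The non-bang version returns a new lowercased String in both cases.
safe = already_lower.downcase
puts safe.inspect
```

Because the bang version only sometimes returns nil, a bug of this kind can pass tests whose inputs happen to contain uppercase characters.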

Function signatures. The arguments in this line should be swapped:

    arr = Array.new([1,2,3], 2)

Once swapped, arr will equal [[1,2,3], [1,2,3]].
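A short sketch of the corrected call follows. It also shows one caveat that the text above does not mention: with a default-value argument, every slot refers to the same inner array, whereas the block form builds a separate one per slot. Variable names are illustrative.

```ruby
# Correct order: size first, then the default value.
pairs = Array.new(2, [1, 2, 3])

# In this form both slots reference the same inner array object.
shared = pairs[0].equal?(pairs[1])

# The block form creates a fresh inner array for each slot.
independent = Array.new(2) { [1, 2, 3] }

# The swapped call from the text raises rather than building an array.
swapped_error =
  begin
    Array.new([1, 2, 3], 2)
    nil
  rescue TypeError => e
    e.class
  end

puts pairs.inspect
puts shared
puts swapped_error.inspect
```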

Block return values. Not all methods use the return value of the block they are given. For example, this line does nothing:

    arr = arr.each { |word| word.downcase }

Instead, either change the method that calls the block (replacing each with map) or make the block modify each element in place (replacing downcase with downcase!).

Identifier types.

  • "array" is an unusual name for a hash
  • "ptr", "ind", and "count" are unusual names for a string
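The statistic behind all four categories above, how often a piece of code was seen relative to a more common alternative, can be sketched as a simple ratio check. Everything here is an assumption for illustration: the counts are made up, and the lint helper and its 0.05 threshold are hypothetical rather than taken from CodexLint.

```ruby
# Made-up corpus counts: how often each normalized pattern was observed.
PATTERN_COUNTS = {
  "x = y.downcase"  => 412,
  "x = y.downcase!" => 0,
  "x = y.map { }"   => 958,
  "x = y.each { }"  => 3
}.freeze

# Hypothetical helper: return a message when `pattern` is much rarer than
# `alternative`, and nil when it is not unusual enough to report.
def lint(pattern, alternative, counts, threshold: 0.05)
  seen = counts.fetch(pattern, 0)
  common = counts.fetch(alternative, 0)
  return nil if common.zero?

  ratio = seen.to_f / common
  return nil if ratio >= threshold

  "'#{pattern}' seen #{seen} times vs. #{common} for '#{alternative}'"
end

puts lint("x = y.downcase!", "x = y.downcase", PATTERN_COUNTS)
puts lint("x = y.each { }", "x = y.map { }", PATTERN_COUNTS)
```

Reporting the raw counts alongside the warning matches the goal stated earlier: the comparison can be informative even when the unlikely code turns out to be correct.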

CodexLib is an open source Ruby Gem generated to factor out particularly common idioms into a new standard library. We found 13 instances of this pattern, which extracts a class from a given name and also handles namespaces:

    if var0.const_defined?(var1)
      var0.const_get(var1)
    else
      var0.const_missing(var1)
    end

This can be combined into a single method, const_full_get(var1), and would help those trying to use const_get by itself (a problem that has been discussed on StackOverflow).
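One way such a method might look is sketched below. The source names const_full_get but does not show its body, so the signature (taking the scope as an explicit first argument) and the Outer::Inner example are assumptions.

```ruby
# Hypothetical implementation that wraps the mined pattern in one method.
def const_full_get(scope, name)
  if scope.const_defined?(name)
    scope.const_get(name)
  else
    # Defer to the scope's const_missing hook, which raises NameError
    # by default but may be overridden (for example, by autoloaders).
    scope.const_missing(name)
  end
end

module Outer
  class Inner; end
end

found = const_full_get(Outer, :Inner)

missing =
  begin
    const_full_get(Outer, :Nope)
  rescue NameError => e
    e.class
  end

puts found
puts missing
```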

RUBY IDIOMS

Many valuable Ruby idioms are not collated in documentation. While standard library documentation is available, common idioms that combine and extend basic functions are not codified in a single place. Instead, they live in the minds of programmers and on the message boards of community forums.

PATTERN ANNOTATION

Codex identifies common idioms and has a crowd of human workers annotate them. For instance, consider the idiom: sum every element in a list. It cannot be found in any source of official documentation. However, Codex identifies inject { |x,y| x + y } as a common snippet and passes it off to human workers who provide the annotation.
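For reference, the snippet behaves as follows on a small list. The annotation record at the end is a hypothetical illustration of pairing a mined snippet with a worker-written description; its field names are not taken from Codex.

```ruby
numbers = [1, 2, 3, 4]

total_block  = numbers.inject { |x, y| x + y }  # the snippet from the text
total_symbol = numbers.inject(:+)               # shorthand for the same fold

# Without a seed value, inject returns nil for an empty list.
empty_total  = [].inject { |x, y| x + y }
seeded_total = [].inject(0) { |x, y| x + y }

# Hypothetical record: a snippet together with its human annotation.
annotation = {
  snippet: "inject { |x,y| x + y }",
  description: "sum every element in a list"
}

puts total_block
puts "#{annotation[:snippet]}: #{annotation[:description]}"
```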