
    Emacs for Data Science

    Robert Vesco

    June 18, 2015

    Data Scientist, Bloomberg LP
    Insight Fellow, 2015
    University of Maryland

    Robert Vesco is an alumnus of the January 2015 session of Insight in New York City. He recently received his Ph.D. in Management from the University of Maryland. In the following post, which originally appeared on his personal blog, Robert discusses Emacs as a tool for data scientists. Robert is now a data scientist at Bloomberg LP.

    If you want an editor that works with R, Python, SAS, Stata, SQL and almost any other data science language, if you want an editor with IDE-like features, if you want an editor that works on any platform as well as in the terminal, if you're a fan of literate programming, or if you want an editor that is highly customizable and will be around after most editors have come and gone, then you'd be hard pressed to find anything better than Emacs.

    Each programming language has a text editor or IDE that is well suited for that language. If you work exclusively in R, you might want to work in RStudio. If you work in Python, you might be tempted by Spyder. Chances are there is a specialized IDE for whatever language you typically work in. But that's the rub. What if you want to work in another language? Or combine languages? You end up using several IDEs, but not knowing them well. Plus, once they fall out of favor or stop being updated, your hard-gained knowledge is lost. At the other end of the spectrum there are text editors like Notepad++ and Sublime. These work with just about any language you can imagine and with some add-ons you can get additional features, but they tend to be limited to certain platforms and customization is often non-trivial.

    A modern data scientist often has to work on multiple platforms with multiple languages. Some projects may be in R, others in Python. Or perhaps you have to work on a cluster with no GUI. Or maybe you need to write papers with LaTeX. You can do all that with Emacs and customize it to do whatever you like. I won't lie, though: the learning curve can be steep, but I think the investment is worth it.

    Below are some key features that I think make Emacs an excellent editor for any data scientist.

    IDE-like features

    For most programming languages, you get out-of-the-box syntax highlighting. Packages like ESS and Elpy provide additional features like autocompletion, documentation and debugging capabilities. The number of IDE features available will vary by language, but at minimum there is probably syntax highlighting and some form of autocompletion.

    Figure 1: Autocompletion

    One of the things that I enjoy is easy access to help and function parameters ... which often also come with autocomplete.

    Figure 2: Help for Functions

    Figure 3: Parameter help for Function

    Enough with the print statements already: debug that R and Python code!

    Figure 4: Interactive debugging with conditional breakpoint

    One of the features that first sold me on Emacs was interactive commands. With a keyboard shortcut you can send a buffer, function, paragraph or line to the interpreter. Let me be clear: you don't even have to highlight the code. This saves you a ton of time when you're doing statistical analysis [1].

    Figure 5: Interactive Commands
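    To make "you don't even have to highlight the code" concrete, here is a rough Python sketch of the idea: find the blank-line-delimited paragraph around the cursor and hand it to an interpreter. This is only an illustration, not how ESS or Elpy implement it; the function name paragraph_at and the toy buffer are made up for the example, and exec stands in for the real interpreter process.

```python
def paragraph_at(lines, cursor):
    """Return the blank-line-delimited block of lines that contains index `cursor`."""
    if not lines[cursor].strip():
        return []  # cursor is on a blank line: nothing to send
    start = cursor
    while start > 0 and lines[start - 1].strip():
        start -= 1
    end = cursor
    while end + 1 < len(lines) and lines[end + 1].strip():
        end += 1
    return lines[start:end + 1]


# A toy "buffer" with two paragraphs of code.
buffer = [
    "x = 2",
    "y = x * 21",
    "",
    "z = 'untouched'",
]

# The cursor sits on line 1; no selection is made.
chunk = paragraph_at(buffer, 1)

# Stand-in for "send to the interpreter": run only that paragraph.
namespace = {}
exec("\n".join(chunk), namespace)
print(namespace["y"])
```

    Only the first paragraph runs; the line defining z is never sent.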

    SQL Too

    Do you work with databases? Many of the same benefits mentioned above also apply to SQL. Work with SQLite, PostgreSQL, MySQL and other databases interactively. Do you have a long SQL statement you are debugging? No problem. Iterate quickly.

    Figure 6: Interactive SQL

    Org mode / Literate Programming

    Do you write publications? Do you want to keep your code and paper together? Are you a believer in reproducible research? With Emacs you can put any language you want in your document. RStudio allows this too, but there you're limited to just R and LaTeX.

    Figure 7: Literate Programming: Code & Stata

    Do you need LaTeX? No problem.

    #+BEGIN_LaTeX
    \frac{3}{4}
    #+END_LaTeX

    The key to this magic is a monster package called Org mode. It is one of Emacs' killer features. You can also use this to organize your code ... or your life.

    $$\frac{3}{4}$$
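    For a feel of what "any language in your document" means, Org mode marks code with #+BEGIN_SRC <language> ... #+END_SRC blocks sitting between ordinary prose. The Python sketch below pulls those blocks out of a small Org document. It is only an illustration of the file layout, not Org mode's own machinery; the sample text, the SRC_BLOCK pattern and the source_blocks helper are all invented for this example.

```python
import re

# A tiny Org document: prose mixed with code in two different languages.
ORG_TEXT = """\
* Results
Some prose explaining the analysis.

#+BEGIN_SRC python
print(1 + 1)
#+END_SRC

More prose.

#+BEGIN_SRC R
summary(cars)
#+END_SRC
"""

# Match "#+BEGIN_SRC <lang>" up to the next "#+END_SRC" (case-insensitive).
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+BEGIN_SRC[ \t]+(\S+)[^\n]*\n(.*?)^[ \t]*#\+END_SRC",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def source_blocks(org_text):
    """Return a list of (language, code) pairs found in an Org document."""
    return [(m.group(1), m.group(2)) for m in SRC_BLOCK.finditer(org_text)]


for lang, code in source_blocks(ORG_TEXT):
    print(lang, "->", code.strip())
```

    Inside Emacs you never need anything like this: Org mode can execute the blocks in place and insert their results into the document.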

    Terminal/remote editing

    Sometimes you need to remote into a server. Or perhaps you are working on a cluster with no GUI and you need to interactively debug your scripts.

    Figure 8: Works in the terminal just as well

    Interacting with the shell

    Is there a terminal command you wish you could run? In Emacs you can run terminal commands easily. But what makes this feature super cool is that it can operate on your text. You can select a region of code, send it to a terminal command and have its stdout replace the text in your buffer!

    Figure 9: Using sed to find and replace text in the buffer
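    In Emacs the command behind this is shell-command-on-region (M-|); with a prefix argument (C-u M-|) the command's output replaces the region. The Python sketch below mimics the mechanics so you can see what happens to the text. It is an illustration only: filter_region and the sample buffer are made up, and it assumes a POSIX sed is available on your PATH.

```python
import subprocess


def filter_region(text, start, end, command):
    """Replace text[start:end] with the stdout of `command` fed that slice on stdin."""
    result = subprocess.run(
        command,
        input=text[start:end],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with an error
    )
    return text[:start] + result.stdout + text[end:]


buffer = "keep this line\nfoo = load('foo.csv')\nkeep this too\n"

# The "region" is just the middle line.
start = buffer.index("foo")
end = buffer.index("keep this too")

# Assumes `sed` is installed; only the region is rewritten.
new_buffer = filter_region(buffer, start, end, ["sed", "s/foo/bar/g"])
print(new_buffer)
```

    The lines outside the region come back untouched, which is exactly why this is handy for surgical find-and-replace.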

    Rectangle Editing

    Data scientists often work with tabular data. Sometimes you may want to delete or move a column around. Or perhaps there is a block of white space you need to change.

    Figure 10: Using rectangle mode to alter blocks of text
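    If rectangles are new to you, the idea is that the selection is a block of columns across several lines rather than a run of characters. Emacs' kill-rectangle (C-x r k) deletes such a block. Here is a small Python sketch of the same operation; delete_rectangle and the sample table are invented for the example, and it assumes fixed-width, space-aligned text.

```python
def delete_rectangle(lines, top, bottom, left, right):
    """Remove columns [left, right) from lines top..bottom (inclusive)."""
    out = list(lines)  # leave the caller's list untouched
    for i in range(top, bottom + 1):
        out[i] = out[i][:left] + out[i][right:]
    return out


table = [
    "id  name   score",
    "1   ann    0.91",
    "2   bob    0.47",
]

# Drop the "name" column: character columns 4 up to (not including) 11.
trimmed = delete_rectangle(table, 0, 2, 4, 11)
for row in trimmed:
    print(row)
```

    In Emacs you would simply mark the two corners of the block and press the key; no column arithmetic required.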

    Everything at your fingertips

    Emacs has numerous packages that allow you to search and find files, functions and anything else that you can imagine. But by far the best is helm. With just a few keys you can instantly find what you are looking for. I couldn't do it justice, but this demo gives you a taste for the amazing things it can do: http://tuhdo.github.io/helm-intro.html

    Any feature you want

    Perhaps you're wedded to Sublime's multiple cursors? You can get it: http://emacsrocks.com/. Or perhaps you're a long time vim user? Evil Mode gives you the editing power of Vim with the utility of Emacs. If you're a git user, Emacs has magit, which makes working with git a joy. If there is something that it doesn't have, check for packages; failing that, Emacs is the most customizable editor you will find. Almost everything about it can be made to fit your particular workflow.

    30+ years old and a large user base

    Emacs has been around a long time. Code that was written a decade ago mostly still works. And every year it's getting better. Emacs 24 in particular is amazing. If you tried Emacs years ago, you should give it another try. It now has package management built in, so you can easily add testing packages. Importantly, there is no sign that Emacs is going away anytime soon and it's free. It will likely be around for at least another decade if not more.

    So what are the downsides?

    Legacy code on the intertubes confuses people

    Emacs has been around a long time. Emacs 24 was a huge improvement, but it also broke a lot of things. Same goes for Org-mode between versions 7 and 8. A lot of stuff on the intertubes will lead you astray and leave you frustrated if you're not aware of this.

    Emacs-lisp for customization

    I actually enjoy working with lisp because it is so different from other languages I work with. However, many others would prefer using a language like Python.

    Not noob friendly

    Emacs is not for the faint of heart. Depending on where you install it, you may have no GUI to guide you at all. And even if you do, it's likely to be spartan. Moreover, while it can be customized quickly to meet your needs, people who are starting off with Emacs may fail to see its appeal.

    To help ease this process though, there are several starter packages to enable useful features out-of-the-box. For scientists, Kieran Healy's starter package might be useful: http://kieranhealy.org/resources/emacs-starter-kit/

    Another useful package is prelude: https://github.com/bbatsov/prelude

    If you're on a Mac, I've heard Aquamacs will keep you warm and comfy: http://aquamacs.org/

    Most of these will give you the power of Emacs, quickly. Personally, I prefer to build my Emacs up from scratch so it does what I want it to do and no more. But these packages are great ways to get a feel for its power.

    Multiple packages

    For data scientists, Emacs comes with many tools out of the box, but there are a variety of packages that are focused on specific languages. For those working in R, Stata, Julia or SAS, ESS (http://ess.r-project.org/) is essential. It provides a whole framework for working with statistical applications.

    Unfortunately, if you decide you want to work with Python or Scala, be prepared to experiment with several different packages.

    For instance, while Emacs has basic Python support, you probably want linting, refactoring or other useful features. Many packages have tried to implement these features, some better than others. Personally, I like elpy, https://github.com/jorgenschaefer/elpy, but it's not perfect. For IPython, there is the IPython Notebook for Emacs, https://github.com/millejoh/emacs-ipython-notebook/ or IPython for Org-mode, https://github.com/gregsexton/ob-ipython.

    So while there is likely a package, or several, for any language you want, the downside of options is that you have to wade through them. It can be painful sometimes.

    What am I missing?

    While I tried to include most of the features that I think would appeal to data scientists, let me know if I missed any killer feature and I'll try to include it here. https://twitter.com/robertvesco

    2015-01-05

    Footnotes:

    [1] Like many other features, this will depend on the package you install. That said, it's easy to implement this feature for your favorite language.
