DESCRIPTION
Codementor Office Hours: https://www.codementor.io

Pup is a flexible command line tool written in Go for parsing HTML. It reads from stdin, prints to stdout, and allows the user to filter parts of the page using CSS selectors. Inspired by jq, pup aims to be a fast and flexible way of exploring HTML from the terminal. Pup was at the top of Hacker News when it debuted.

On 10/15 at 11am PDT / 2pm EDT, Pup's creator Eric Chiang hosted a Codementor Office Hours session on Go and command line programming: an intro to command line programming and building tools for it in Go. The session runs through some basic command line tools (grep, awk, sed, and jq), covers curl, wget, and pup, then wraps up with a conversation about Go.

Eric Chiang is a software engineer and founding member at Yhat, a NYC startup building products for enterprise data science teams. Eric enjoys Go, data analysis, JavaScript, network programming, Docker, and grilled cheese sandwiches.
stdin, stdout
pup, Go & life at the command-line
$ cd ~/talks/codementor
$ cat hello.txt
Hello, Code Mentor!
$
CLI life: Data
data
[LOG] some data
[LOG] more data
[LOG] even more

col1,col2,col3
some,data,and
even,more,data

{
  "some": "data",
  "more": "data"
}

<div>
  <h1>Some</h1>
  <p>data</p>
</div>
grep & nl
grep
cat
pipes!
$ cat Shakespeare.txt | \
    sed -E -e 's/\s+/\n/g' | \
    tr -d ' ' | \
    grep -e '^$' -v | \
    tr '[:upper:]' '[:lower:]' | \
    sort | uniq -c | sort -nr | \
    head -n 50
curl & wget
$ wget -O Shakespeare.txt \
    http://gutenberg.org/cache/epub/100/pg100.txt
$ wget --load-cookies cookies.txt
$ wget -O Shakespeare.txt \
    http://gutenberg.org/cache/epub/100/pg100.txt
$ curl -o Shakespeare.txt \
    http://gutenberg.org/cache/epub/100/pg100.txt

$ cat Shakespeare.txt | \
    sed -E -e 's/\s+/\n/g' | \
    tr -d ' ' | \
    grep -e '^$' -v | \
    tr '[:upper:]' '[:lower:]' | \
    sort | uniq -c | sort -nr | \
    head -n 50

$ curl http://gutenberg.org... | \
    sed -E -e 's/\s+/\n/g' | \
    tr -d ' ' | \
    grep -e '^$' -v | \
    tr '[:upper:]' '[:lower:]' | \
    sort | uniq -c | sort -nr | \
    head -n 50
I hate HTML
HTML is really hard
“Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp.”
“Have you tried using an XML parser instead?”
But it gets worse
Yes, this is valid HTML:
<tbody>
  <tr><img src="foo"></tr>
  <tr><img/><br>
</tbody></table>
NEVER TRY TO WRITE AN HTML PARSER
Nokogiri 鋸
pup
Still HTML
$ curl -L -s reddit.com/r/programming/ | \
    pup p.title a[href^=http] attr{href}
$ curl -s https://news.ycombinator.com/ | \
    pup td.title a[href^=http] attr{href}
$ curl -s https://news.ycombinator.com/ | \
    pup td.title a[href^=http] json{}
[
 {
  "attrs": {
   "href": "https://hacks.mozilla.org/2014/10/passwordless-authentication-secure-simple-and-fast-to-deploy/"
  },
  ...
]

$ curl -s https://news.ycombinator.com/ | \
    pup td.title a[href^=http] json{}
[
 {
  "attrs": {
   "href": "https:.../"
  },
  "tag": "a",
  "text": "SHOW HN: pup"
 },
 ...
]
github.com/EricChiang/pup
Part II: Building CLI tools in Go
import java.util.Scanner;

class Hello {
    public static void main(String[] args) {
        // Read one line from stdin and greet the user by name.
        Scanner reader = new Scanner(System.in);
        System.out.print("Enter your name: ");
        String name = reader.nextLine();
        // Pass the name as an argument so a '%' in the input is not
        // interpreted as a format specifier.
        System.out.printf("Hello, %s!%n", name);
    }
}
Why Go?
Why not?
Taken from Rob Pike’s talk "public static void main" (2012)
“dear god make it stop”
I suck at this -->
Go
package main

import "fmt"

func main() {
	fmt.Println("Hello, world!")
}
line
package main

import (
	"io"
	"os"
)

func main() {
	// Stream everything from stdin to stdout unchanged.
	io.Copy(os.Stdout, os.Stdin)
	// Finish with a trailing newline.
	io.WriteString(os.Stdout, "\n")
}
$ echo "Hello, World"
Hello, World
$ go get github.com/ericchiang/line
$ echo "Hello, World" | line
Hello, World
$
url-encode
Live demo!
gox
Messing with zip
Thanks!