Nerd talk: regexes

1. Regexes: It's magic!
2. Some people, when confronted with a problem, think 'I know, I'll use regular expressions!' Now they have two problems.
3. *
4. Perl style regex: It's magic done right!
5. Metacharacters
     ^   beginning
     $   end
     .   anything
     \   escape
   /^....G..AA$/
6. Escaped characters
     \s  whitespace
     \S  not-whitespace
     \w  word
     \d  digit
     \.  dot
     \\  backslash
   /^\w\w\w\wG\w\wAA$/
   /^\d\d\d\d\d\d\d\d$/
7. Repetition
     ?       0 or 1 time
     *       0 or more times
     +       1 or more times
     *?      ungreedy *
     +?      ungreedy +
     {m}     m times
     {m,n}   m up to n times
     {m,n}?  ungreedy {m,n}
   /^\w{4}G\w{2}AA$/
   /^\d{1,2}\d{1,2}\d{2,4}$/
8. Grouping
     [ABC]        any of these characters
     (AB|BC|CA)   any of these expressions
     (THIS!)      save this
     [A-Za-z0-9]  ranges
   /^[ACTG]{4}G[ACTG]{2}AA$/
   /^(0?[1-9]|[0-2]\d|3[01])(0?\d|1[0-2])(\d{2}|\d{4})$/
9. OVERKILL: http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb
10. In Python (sigh...)
11. E.g.: finding files
12. E.g.: finding files
   ls -la | grep -e '->' | grep -v 'bubo' | grep -v 'Daniel'
13. E.g.: demultiplexing fastq
     (1) Barcode
     (2) Primer
     (3) Random nucleotides
   grep -P '1:N:0:ACTGGTT' -A3 --no-group-separator multiplex_R1.fastq | grep -P '^[ACTGN]{4}CCC[ACGT]T[GC]AGATA' -A2 -B1 --no-group-separator > deplexed_R1.fq
14. E.g.: paper figures! From the subset of unique sequences that span the entire region under study, how many unique sequences are matched by each primer combination?
15. Sed: find & replace
   Are you gonna talk about vim regexes?
   Sed regexes are weird.
   My workaround: use ranges [0-9] [A-Z] [a-z] [A-Za-z]
16. Sed: find & replace (continued)
   E.g.:
   Oh noes, Americans don't know how to separate decimals!
     sed 's/\./,/g' hisfile.tab > myfile.tab
   Oh noes, this bloody file was edited in Windows!
     sed 's/\r$//' theirfile.tab > decentfile.tab
   Oh noes, Cassava 1.6 has a slash in it!
     sed 's,/1, 1:N:0:NNNNNN,' oldfile.fq > newfile.fq
17. Other neat stuff
     grep (-c)
     sort (-n, -r, -k, -t)
     uniq -c
18. LMGTFY:
     sed     http://www.tutorialspoint.com/unix/unix-regular-expressions.htm
     grep    http://linux.about.com/od/commands/l/blcmdl1_grep.htm
     Perl    http://www.cs.tut.fi/~jkorpela/perl/regexp.html
     Python  http://docs.python.org/2/howto/regex.html
     Vim     http://vimregex.com/
19. sed 's/fear of regex/love of regex/g'
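
Slide 10 mentions Python, but its code did not survive in this transcript. As a minimal sketch, assuming the Perl-style patterns from slides 5 to 8 carry over unchanged to Python's `re` module, the examples could be tried out like this. The helper name `parse_date` and all sample strings are made up for illustration and are not from the talk.

```python
import re

# Patterns taken from the slides; Python's re accepts the same Perl-style syntax.
DNA_DOTS = re.compile(r"^....G..AA$")                # slide 5: "." matches anything
DNA_CLASS = re.compile(r"^[ACTG]{4}G[ACTG]{2}AA$")   # slide 8: only nucleotides allowed
DATE = re.compile(r"^(0?[1-9]|[0-2]\d|3[01])(0?\d|1[0-2])(\d{2}|\d{4})$")


def parse_date(text):
    """Hypothetical helper: return the saved (day, month, year) groups, or None."""
    m = DATE.match(text)
    return m.groups() if m else None


# "." is too permissive: digits pass the dotted pattern but not the character class.
assert DNA_DOTS.match("1234G56AA") is not None
assert DNA_CLASS.match("1234G56AA") is None
assert DNA_CLASS.match("ACTGGTTAA") is not None

# Parentheses save what they matched (slide 8, "save this").
print(parse_date("31122014"))   # ('31', '12', '2014')
print(parse_date("010214"))     # ('01', '02', '14')

# Greedy vs ungreedy repetition (slide 7).
print(re.findall(r"G.*A", "GTTAGCA"))    # ['GTTAGCA']
print(re.findall(r"G.*?A", "GTTAGCA"))   # ['GTTA', 'GCA']

# Why the dot needs escaping in the decimal-separator example (slide 16).
print(re.sub(r"\.", ",", "3.14"))   # 3,14
print(re.sub(r".", ",", "3.14"))    # ,,,,
```

The last two lines show the pitfall behind the decimal-separator one-liner: an unescaped `.` replaces every character, not just the dot.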