Bioinformatics Programming
EE, NCKU
Tien-Hao Chang (Darby Chang)
2
In the last slide
More Unix features worth mentioning
– job control
– I/O redirection and piping
– text processing (vi, grep, sed, awk, …)
Programming vs. language
3
Programming
4
Before
Learning advanced data structures and the associated algorithms
5
struct
A building block for constructing advanced data structures in C
6
struct
struct is similar to an array in that both aggregate a set of objects into a single object (not an object in the object-oriented sense)
– array: aggregates objects of the same type
– struct: aggregates objects of different types
struct is short for ‘structure’
Each entry in a struct declaration is usually called a ‘field’ or ‘member’
7
struct
Declaration
A struct declaration consists of a list of fields, each of which can have any type
– struct mydata { // declare the structure of mydata
      char name[8];
      char id[10];
      int math;
      int eng;
  };
– defines a type, referred to as struct mydata
To create a new variable of this type
– // define a variable ‘student’ of the type ‘struct mydata’
  struct mydata student;
8
struct
The Memory Space
[Figure: memory layout of the variable student, showing its fields name, id, math, and eng]
9
struct
Test Memory Space
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    struct data {
        char name[10];
        char sex[2];
        int math;
    };
    struct data student;
    // sizeof yields a size_t, so print it with %zu
    printf("sizeof(student)=%zu\n", sizeof(student));
    return 0;
}
Result: 16 (with a 4-byte int)
10
struct
Access Fields
The dot (.) operator
– struct_variable.field_name
For example
– student.math = 90;
– student.eng = 20;
– printf("%s's Math score is %d\n", student.name, student.math);
A convenient shortcut for initializing the members of a struct is shown below
– struct data student = {"Mary Wang", 74};
11
struct
Array of Structures
You may define an array of structures
– struct student { // declare the structure of student
      char name[8];
      char id[10];
      int math;
      int eng;
  };
  // define an array of 3 variables of the type ‘struct student’
  struct student stu[3];
[Figure: stu[0], stu[1], stu[2], … each hold the fields name ([0] … [7]), id ([0] … [9]), math, and eng]
12
struct
Pointer to Structure
Pointers can be used to refer to a struct by its address
– struct mydata { // declare the structure of mydata
      char name[8];
      char id[10];
      int math;
      int eng;
  } student; // define a mydata variable, student
  struct mydata *ptr; // define a pointer to mydata
  ptr = &student; // point ptr to the variable student
Access fields through struct pointers
– the arrow (->) operator
– struct_pointer_variable->field_name
– ptr->math = 90;
13
struct
Nested Structures
Since a struct declaration constructs a new type, struct types can be used for fields just like normal types such as int, double, …
– #include <stdio.h>
  #include <stdlib.h>
  int main(void) {
      struct date { // declare date
          int month;
          int day;
      };
      struct student { // declare nested structure, student
          char name[10];
          int math;
          struct date birthday;
      } s1 = {"David Li", 80, {2, 10}}; // define a student variable, s1
      printf("student name:%s\n", s1.name);
      printf("birthday:%d month, %d day\n", s1.birthday.month, s1.birthday.day);
      printf("math grade:%d\n", s1.math);
      return 0;
  }
14
struct
Self-referential Structure
Fields are not allowed to be defined as the same type as the declaration they belong to
But fields can be defined as pointers to the same type as the declaration they belong to
Such a struct, with pointer fields referencing the same struct type, is called a self-referential structure
– struct PERSON {
      char name[8];
      int age;
      struct PERSON *son; // self-referential pointer
  };
[Figure: a PERSON node with the fields name, age, and son]
15
Any Questions?
16
Why
Fields are not allowed to be defined as the same type as the declaration they belong to?
But fields can be defined as pointers to the same type as the declaration they belong to?
Hint: think from the perspective of memory
17
The Closeness
between C and the actual machine representation is the reason both a) why C programs are so fast and b) why C is suitable for teaching
18
Languages Comparison
Since the 1950s, computer scientists have devised thousands of programming languages. Many are obscure, perhaps created for a Ph.D. thesis and never heard of since.
Compiling to machine code
– some languages transform programs directly into machine code, the instructions that a CPU understands directly
– this transformation process is called compilation
– assembly, C, and C++
Interpreted languages
– other languages are interpreted, such as Basic, Perl, and JavaScript
– or are a mixture of both, being compiled to an intermediate language, including Java and C#
19
Languages Comparison
Compile vs. Interpret
An interpreted language is processed at runtime. Every line is read, analyzed, and executed. Having to reprocess a line every time in a loop is what makes interpreted languages so slow.
– because of this overhead, interpreted code runs between 5–10 times slower than compiled code
– their advantage is that they do not need to be recompiled after changes, which is handy when you're learning to program
Because compiled programs almost always run faster than interpreted ones, languages such as C and C++ tend to be the most popular for writing games.
Java and C# both compile to an intermediate language that is interpreted very efficiently. Because the Virtual Machine that interprets Java and the .NET framework that runs C# are heavily optimized, it's claimed that applications in those languages are as fast as, if not faster than, compiled C++.
20
Languages Comparison
Level of Abstraction
How close is a particular language to the hardware?
Machine code is the lowest level, followed by assembly.
C++ is higher than C because C++ offers greater abstraction.
Java and C# are higher than C++ because they compile to an intermediate language called bytecode.
When computers first became popular in the 1950s, programs were written in machine code. Programmers had to physically flip switches to enter values. This is such a tedious and slow way of creating an application that higher level computer languages had to be created.
21
Super coder!
http://www.evula.org/dragoon/pics/supercoder.jpg
22
Assembler: Fast to run, slow to write
– The readable version of machine code
  • Mov A,$45
– Because it is tied to a particular CPU, assembly is not very portable.
– Languages like C have reduced the need for assembly except where memory is limited or time-critical code is needed. This is typically in kernel code or in a driver.
Basic: For beginners
– Basic is an acronym for Beginner's All-purpose Symbolic Instruction Code and was created to teach programming in the 1960s.
– Microsoft has made the language its own with many different versions, including VBScript for websites and the very successful Visual Basic.
– It is an interpreted language whose only advantage is being easy to learn. But now it is more like a syntax alternative to C because most programmers are lazy.
Pascal: Conscientious programming
– Pascal was devised as a teaching language a few years before C but had limited usage.
– It was not until Borland's Turbo Pascal (for DOS) and Delphi (for Windows) appeared that it became suitable for commercial development.
– However, Borland was up against Microsoft and lost the battle.
23
C: System programming
– C was devised in the early 1970s by Dennis Ritchie. It can be thought of as a general-purpose tool: very useful and powerful, but it very easily lets bugs through that can make systems insecure.
– C has been described as portable assembly.
– The syntax of many scripting languages is based on C.
C++: A classy language
– C++ (or C with Classes, as it was originally known) came about ten years after C and successfully introduced Object Oriented Programming to C, as well as features like exceptions and templates.
– Learning all of C++ is a big task. It is by far the most complicated of the programming languages here, but once you have mastered it, you'll have no difficulty with any other language.
C#: Microsoft's big bet
– C# was created by Delphi's architect Anders Hejlsberg after he moved to Microsoft, and Delphi developers will feel at home with features such as Windows Forms.
– C# syntax is very similar to Java, which is not surprising as Hejlsberg also worked on J++ after he moved to Microsoft.
– Learn C# and you are well on the way to knowing Java. Both languages are semi-compiled: instead of compiling to machine code, they compile to bytecode, which is then interpreted.
24
Perl: Websites and utilities
– Very popular in the Linux world, Perl was one of the first web languages and remains very popular today.
– For doing ‘quick and dirty’ programming on the web it remains unrivalled and drives many websites.
– It has, though, been somewhat eclipsed by PHP as a web scripting language.
PHP: Websites coding
– PHP was designed as a language for web servers and is very popular in conjunction with Linux, Apache, and MySQL (together with PHP, LAMP for short).
– It is interpreted, but pre-compiled so code executes reasonably quickly.
– It can be run on desktop computers but is not as widely used for developing desktop applications.
– Based on C syntax, it also includes Objects and Classes.
JavaScript: Programs in your browser
– JavaScript is nothing like Java; instead it's a scripting language based on C syntax but with the addition of Objects, and it is used mainly in browsers.
– JavaScript is interpreted and a lot slower than compiled code but works well within a browser.
– Invented by Netscape and in the doldrums for years. Popular again because of AJAX (Asynchronous JavaScript and XML), which allows parts of web pages to update from the server without redrawing the entire page.
25
Position 2010  Position 2009  Delta in Position  Language        Ratings 2010  Delta 2009  Status
1              1              =                  Java            17.509%       -2.29%      A
2              2              =                  C               17.279%       +1.42%      A
3              4              ↑                  PHP             9.908%        +0.42%      A
4              3              ↓                  C++             9.610%        -0.75%      A
5              5              =                  (Visual) Basic  6.574%        -1.71%      A
6              7              ↑                  C#              4.264%        -0.06%      A
7              6              ↓                  Python          4.230%        -0.95%      A
8              9              ↑                  Perl            3.821%        +0.40%      A
9              10             ↑                  Delphi          2.684%        -0.03%      A
10             8              ↓↓                 JavaScript      2.651%        -0.96%      A
11             11             =                  Ruby            2.327%        -0.27%      A
12             32             ↑↑↑↑↑↑↑↑↑↑         Objective-C     1.970%        +1.79%      A
13             -              ↑↑↑↑↑↑↑↑↑↑         Go              0.921%        +0.92%      A
14             15             ↑                  SAS             0.769%        -0.03%      A
15             13             ↓↓                 PL/SQL          0.737%        -0.31%      A
16             22             ↑↑↑↑↑↑             MATLAB          0.661%        +0.20%      B
17             17             =                  ABAP            0.639%        +0.00%      B
18             16             ↓↓                 Pascal          0.603%        -0.13%      B
19             19             =                  ActionScript    0.594%        +0.11%      B
20             27             ↑↑↑↑↑↑↑            Fortran         0.563%        +0.24%      B
26
http://www.simplyhired.com/a/jobtrends/graph/q-Perl%2C+Ruby%2C+Python%2C+Php%2C+Javascript%2C+Flex%2C+Groovy/t-line
27
Languages Comparison
Summary
Other noteworthy programming languages
– Java, Python, Ruby, Go, …
Popularity forms for many reasons
– history (programmers are lazy), business, and functionality
Lasting wars
– Java vs. .NET (C will, in some form, live forever)
– Perl vs. PHP vs. Ruby (web programming)
– Perl vs. Python (scripting)
– There might be one dominant system language and one dominant scripting language in the future, but things will probably converge to a world of coexistence.
Higher Level
» more readable
» faster to develop
» more coding sugar
» avoid careless mistakes
Lower Level
» easy to debug
» faster program
» general purpose
» powerful to do evil
28
Any Questions?
29
Algorithm
30
Algorithm
Specification
– a finite set of instructions that accomplishes a particular task
– criteria
  • input: zero or more quantities that are externally supplied
  • output: at least one quantity is produced
  • definiteness: clear and unambiguous
  • finiteness: terminate after a finite number of steps
  • effectiveness: instruction is basic enough to be carried out
Representation
– a natural language, like English or Chinese
– a graphic, like flowcharts
– a computer language, like C
31
Algorithm
Selection Sort
From those integers that are currently unsorted, find the smallest and place it next in the sorted list
i   [0]  [1]  [2]  [3]  [4]
-   30   10   50   40   20
0   10   30   50   40   20
1   10   20   50   40   30
2   10   20   30   40   50
3   10   20   30   40   50
32
33
Algorithm
Binary Search
[0]  [1]  [2]  [3]  [4]  [5]  [6]
8    14   26   30   43   50   52
Searching for 43
left  right  middle  list[middle] : target
0     6      3       30 < 43
4     6      5       50 > 43
4     4      4       43 == 43 (found)
Searching for 18
left  right  middle  list[middle] : target
0     6      3       30 > 18
0     2      1       14 < 18
2     2      2       26 > 18
2     1      -       (not found)
Searching a sorted list
while (there are more integers to check) {
    middle = (left + right) / 2;
    if (target < list[middle])
        right = middle - 1;
    else if (target == list[middle])
        return middle;
    else
        left = middle + 1;
}
34
int binsearch(
int list[], int target,
int left, int right)
{
int middle;
while (left <= right) {
middle = (left + right) / 2;
switch (COMPARE(list[middle], target)) {
case -1: left = middle + 1;
break;
case 0: return middle;
case 1: right = middle - 1;
}
}
return -1;
}
» Program 1.6: Searching an ordered list
35
Algorithm
Recursive Algorithms
Beginning programmers view a function as something that is invoked (called) by another function
– it executes its code and then returns control to the calling function
This perspective ignores the fact that functions can call themselves (direct recursion)
They may call other functions that invoke the calling function again (indirect recursion)
– extremely powerful
– frequently allows us to express an otherwise complex process in very clear terms
We should express an algorithm recursively when the problem itself is defined recursively
36
int binsearch(
int list[], int target,
int left, int right)
{
int middle;
if (left <= right) {
middle = (left + right) / 2;
switch (COMPARE(list[middle], target)) {
case -1: return
binsearch(list,target,middle+1,right);
case 0: return middle;
case 1 : return
binsearch(list,target,left,middle-1);
}
}
return -1;
}
» Program 1.7: Recursive implementation of binary search
37
Any Questions?
38
Data Abstraction
39
Data Abstraction
Data type
– A data type is a collection of objects and a set of operations that act on those objects
– For example, the data type int consists of the objects {0, +1, -1, +2, -2, …, INT_MAX, INT_MIN} and the operations +, -, *, /, and %
The data types of C
– basic data types: char, int, float, and double
– group data types: array and struct
– pointer data type
– user-defined types
Abstract data type
– An abstract data type (ADT) is a data type that is organized in such a way that the specification of the objects and the operations on the objects is separated from the representation of the objects and the implementation of the operations.
– We know what it does, but not necessarily how it will do it.
40
41
The array as an ADT
42
To
Evaluate which algorithm is better
43
Algorithm
Performance Analysis
Criteria
– Is it correct?
– Is it readable?
– …
Performance analysis (machine independent)
– space complexity: storage requirement
– time complexity: computing time
Performance measurement (machine dependent)
44
Performance Analysis
Space Complexity
S(P) = C + S_P(I)
Fixed space requirements (C)
– independent of the inputs and outputs
– instructions, constants, simple variables
Variable space requirements (S_P(I))
– depend on the instance characteristic I
– number, size, values of inputs and outputs associated with I
– recursive stack space, including formal parameters, local variables, and return address
45
Any Questions?
46
Analyze
Someone’s exercise
47
The recursion stack space needed is 6(n+1),
since the depth of recursion is n+1.
48
Performance Analysis
Time Complexity
T(P) = C + T_P(I)
The time, T(P), taken by a program, P, is the sum of its compile time C and its run (or execution) time, T_P(I)
T_P(I) = c_a ADD(I) + c_s SUB(I) + …
– Program step: a syntactically or semantically meaningful program segment whose execution time is independent of the instance characteristics.
– Introduce a new variable, count, into the program
– Tabular method
49
Time Complexity
Iterative Summation
float sum(float list[], int n) {
    float tmp = 0; ++count; // for assignment
    int i;
    for (i = 0; i < n; ++i) {
        ++count; // for the for loop
        tmp += list[i];
        ++count; // for assignment
    }
    ++count; // last execution of for
    ++count; // for return
    return tmp;
}
2n+3 steps
50
Time Complexity
Tabular Method
Statement                        s/e  Frequency  Total Steps
float sum(float list[], int n)   0    0          0
{                                0    0          0
float tmp=0;                     1    1          1
int i;                           0    0          0
for (i=0; i<n; ++i)              1    n+1        n+1
tmp+=list[i];                    1    n          n
return tmp;                      1    1          1
}                                0    0          0
Total                                            2n+3
51
Any Questions?
52
Asymptotic notation
53
Asymptotic Notation
Basic Concepts
There are two programs, one with complexity c1n^2+c2n and the other with complexity c3n
– for sufficiently large values of n, c3n will be faster than c1n^2+c2n
– for small values of n, either could be faster
  • c1=1, c2=2, c3=100 → c1n^2+c2n ≤ c3n for n ≤ 98
  • c1=1, c2=2, c3=1000 → c1n^2+c2n ≤ c3n for n ≤ 998
54
Asymptotic Notation
O, Ω, Θ
O [big “oh”]
– f(n) = O(g(n)) iff there exist positive constants c and n0 such that f(n) ≤ cg(n) for all n, n ≥ n0
– upper bound, worst case
Ω [big omega]
– f(n) = Ω(g(n)) (read as “f of n is big omega of g of n”) iff there exist positive constants c and n0 such that f(n) ≥ cg(n) for all n, n ≥ n0
– lower bound, best case
Θ [big theta]
– f(n) = Θ(g(n)) iff there exist positive constants c1, c2, and n0 such that c1g(n) ≤ f(n) ≤ c2g(n) for all n, n ≥ n0
– upper and lower bound
Note the relationship between analyses and notations. For example, sometimes we would analyze the big theta of the worst case of an algorithm.
55
Asymptotic Notation
Theorems
If f(n) = a_m n^m + … + a_1 n + a_0, then f(n) = O(n^m)
If f(n) = a_m n^m + … + a_1 n + a_0 and a_m > 0, then f(n) = Ω(n^m)
If f(n) = a_m n^m + … + a_1 n + a_0 and a_m > 0, then f(n) = Θ(n^m)
Examples
– f(n) = 3n+2
  3n+2 ≤ 4n, for all n ≥ 2, ∴ 3n+2 = O(n)
  3n+2 ≥ 3n, for all n ≥ 1, ∴ 3n+2 = Ω(n)
  3n ≤ 3n+2 ≤ 4n, for all n ≥ 2, ∴ 3n+2 = Θ(n)
– f(n) = 10n^2+4n+2
  10n^2+4n+2 ≤ 11n^2, for all n ≥ 5, ∴ 10n^2+4n+2 = O(n^2)
  10n^2+4n+2 ≥ n^2, for all n ≥ 1, ∴ 10n^2+4n+2 = Ω(n^2)
  n^2 ≤ 10n^2+4n+2 ≤ 11n^2, for all n ≥ 5, ∴ 10n^2+4n+2 = Θ(n^2)
– 10n^2+4n+2 = O(n^2) // 10n^2+4n+2 ≤ 11n^2 for n ≥ 5
– 6*2^n+n^2 = O(2^n) // 6*2^n+n^2 ≤ 7*2^n for n ≥ 4
56
Practical Complexity
To get a feel for how the various functions grow with n, you are advised to study the following three figures
57
58
59
60
Performance Measurement
Although performance analysis gives us a powerful tool for assessing an algorithm’s space and time complexity, at some point we also must consider how the algorithm executes on our machine
61
Any Questions?
62
Fibonacci
In: n
Out: the n-th Fibonacci number
Requirement
- a recursive version and an iterative version
- report
  - time/space complexity
  - practical time
  - code size (less meaningful in C)
- using C would be the best
Bonus
- an algorithm of O(n) time and O(1) space complexity
- the best time complexity is O(1)
- use Makefile to automate the report
63
Fibonacci
A Reference
Kenji Mikawa and Ichiro Semba (2005). "An O(1) time algorithm for generating Fibonacci strings." Electronics and Communications in Japan (Part II: Electronics) 88(9): 67-72.
Provided by 陳偉銘
– “However, the majority in this course is male, so…”
64
Deadline
2010/3/23 23:59
Zip your code, a step-by-step README explaining how to execute the code, and anything worthy of extra credit. Email to [email protected].
65
Recall that
http://www.dianadepasquale.com/ThinkingMonkey.jpg
66
gcc
Multiple Source Files
If there are multiple source files
– $ gcc file1.c file2.c -o myprog
Or
– $ gcc -c file1.c
  $ gcc -c file2.c
  $ gcc file1.o file2.o -o myprog
The second one compiles source files separately. If only file1.c was modified
– $ gcc -c file1.c
  $ gcc file1.o file2.o -o myprog
Notice that file2.c does not need to be recompiled.
– significant time savings when there are numerous source files
This process, though somewhat complicated, is generally handled automatically by a makefile.
67
But how do you know
which files should be re-compiled?
http://faculty.northseattle.edu/tfurutani/che140/labbook_files/image005.jpg
68
Don’t reinvent the wheel
http://www.morphcoaching.com/mypics/Wheel_invention.jpg
69
Makefile
70
Makefile
A Makefile is the configuration file used by a standard program called “make”
make is like a project manager in a graphical development environment, but includes many extra features
Allows an entire project to be intelligently built with one command on the command line
– make avoids re-building targets that are up to date, thus saving a lot of typing and compiling time
– Makefiles are largely similar to the Project and Workspace files you might be used to from Visual C++, JBuilder, Eclipse, etc.
71
Makefile
Filenames
When you key in make, make looks for the default filenames in the current directory. For GNU make these are
– GNUmakefile
– makefile
– Makefile
If more than one of the above exists in the current directory, the first one in the above order is chosen
It is possible to name the Makefile any way you want and then tell make to interpret it
– $ make -f <your-filename>
72
Makefile
Dependencies
Sometimes one file depends on another file
– e.g. a C file depends on its header files
If a header file changes, the C files that #include that header file should be recompiled to take into account the changes to the header
[Figure: dependency graph. main.c and interface.h → main.o; interface.c and interface.h → interface.o; main.o and interface.o → final executable file (my_project)]
73
Makefile
A Simple Makefile “Rule”
hello: hello.c
<tab>gcc hello.c -o hello
Save this text in a file named “Makefile” in the same directory as the source code
To build the project, type “make”
The result is an executable named hello
If the hello file exists and its modification time is newer than that of hello.c, what should “make” do?
– nothing
74
Makefile
Generic Form of a Rule
target1 target2 ...: prerequisite1 prerequisite2 ...
<tab>command1
<tab>command2
Target is the output file
Prerequisites are the files that are needed by the target (and that can cause the target to be rebuilt if they change).
Command (or action) is the actual command that turns the prerequisites into the target.
Characters after “#” are regarded as comments
Line oriented
– If the dependencies or commands are too long and you would like to span them across several lines for clarity and convenience, escape the end of line with “\”.
– Make sure NOT to use tabs for such lines.
75
Makefile
Target
make performs the actions of the specified targets
A target can be a filename that you want to generate or a phony target; the latter is especially useful for automating actions
Suggested phony targets from GNU
– all: default action (build/compile the executable)
– install: install the previously built executable
– clean: remove temporary files generated during the build process, usually the .o or .obj files
The first target listed in the file is used if no target is explicitly specified
76
Makefile
Multiple Targets
MyProject: main.o interface.o
<tab>gcc main.o interface.o -o MyProject
main.o: main.c interface.h
<tab>gcc -c main.c -o main.o
interface.o: interface.c interface.h
<tab>gcc -c interface.c -o interface.o
Build MyProject
– $ make
– $ make MyProject
– make will figure out the appropriate order from the prerequisites
Compile a non-default target
– $ make main.o
[Figure: dependency graph. main.c and interface.h → main.o; interface.c and interface.h → interface.o; main.o and interface.o → final executable file (my_project)]
77
Makefile
Command
A list of actions needed to generate the rule’s target
May be empty (just indicate dependencies)
Every action is usually a typical shell command you would normally type to do the same thing
You can suppress the echoing of a command with a preceding ‘@’ symbol
Every command MUST be preceded by a tab!
– This is how make identifies actions as opposed to variable assignments and targets. Do not indent actions with spaces!
Each action line invokes a sub shell to execute the commands
– The sub shell ends after that line
– Some changes (such as cd to another directory or setting shell variables) are not passed to the next line
– Use the ‘;’ symbol to execute multiple commands in one line
78
Makefile
Variables
In a large Makefile, a good idea is to use variables to make later changes easy
For example, rather than typing ‘gcc’ in the command part of every rule, create a variable at the top of the Makefile
– CC = gcc
Commands can then be
– ${CC} source_file.c -o executable_file
Case sensitive
Use only letters, digits, and ‘_’
Both $(VAR) and ${VAR} are okay
79
Makefile
Other Features
Implicit rules
– GNU make provides some implicit rules for common practices, such as building the object file foo.o from foo.c. For example, the following rule is unnecessary
  • foo.o: foo.c
    <tab>gcc -c -o foo.o foo.c
Phony target
– The target is always considered out of date and thus the actions are always performed
– e.g. ‘.PHONY: clean’
Automatic variables (internal macros)
– $@ the filename of the target of the rule
– $< the name of the first prerequisite
– $? the names of all the prerequisites that are newer than the target
– $^ the names of all the prerequisites
– $* the stem of the target filename (the target name without its suffix)
Flow control
– ifeq, ifneq, ifdef, ifndef, for, if-then-else, …
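The pieces above (variables, automatic variables, and phony targets) can be combined into one Makefile for the MyProject example. This is a sketch under stated assumptions: the file names follow the earlier slides, while the CFLAGS value and the exact layout of the all and clean targets are illustrative choices, not part of the original deck. Command lines must begin with a real tab character.

```makefile
# Sketch only: file names follow the MyProject example;
# the CFLAGS value is an assumption.
CC = gcc
CFLAGS = -Wall

.PHONY: all clean

# First target in the file, so plain "make" builds it.
all: MyProject

# $^ = all prerequisites, $@ = the target
MyProject: main.o interface.o
	$(CC) $^ -o $@

# $< = the first prerequisite (the .c file)
main.o: main.c interface.h
	$(CC) $(CFLAGS) -c $< -o $@

interface.o: interface.c interface.h
	$(CC) $(CFLAGS) -c $< -o $@

# Phony target: always runs when requested with "make clean".
clean:
	rm -f MyProject main.o interface.o
```

With this file, editing only main.c causes make to rebuild main.o and relink MyProject, while interface.o is left untouched; editing interface.h rebuilds both object files because both rules list it as a prerequisite.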