MapReduce Programming
Waue Chen
Why?
Moore's law? CPU clock speed doubled every 18 months; this stopped holding around 2005.
The era of multi-core and parallel computing has arrived.
What is Hadoop
Hadoop is an open-source framework for parallel, distributed programs running on large-scale clusters.
It provides a distributed file system, HDFS, to store data across the nodes.
Highly fault-tolerant: failed nodes are handled automatically. It implements Google's MapReduce algorithm.
What is MapReduce
MapReduce splits an application into many small work units; each unit can be executed or computed on any node.
MapReduce: Example
MapReduce in Parallel: Example
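The figures for the two example slides above are not reproduced here. As a hypothetical stand-in (not from the slides), the same parallel word-count flow can be sketched in plain Java with parallel streams, where each line is an independent map-side work unit and the grouping step plays the role of the reduce phase:

```java
import java.util.*;
import java.util.stream.*;

public class ParallelWordCount {
    // Each input line is processed as an independent work unit ("map"),
    // then identical words are grouped and counted ("reduce").
    public static Map<String, Long> count(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("one fish", "two fish");
        System.out.println(count(lines));
    }
}
```

This is only a single-machine analogy: Hadoop additionally distributes the units across cluster nodes and shuffles intermediate pairs between them.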
Thinking in Hadoop: MapReduce
HDFS Map Class Reduce Class Overall Configuration
Program prototype

class MR {
    class Map ... { }
    class Reduce ... { }
    public static void main(String[] args) {
        JobConf conf = new JobConf(MR.class);
        conf.setInputPath(new Path("the_path_of_HDFS"));
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        JobClient.runJob(conf);
    }
}
Word Count Sample

class WordCount {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        // set paths
        conf.setInputPath(new Path("/user/waue/input"));
        conf.setOutputPath(new Path("counts"));
        FileSystem.get(conf).delete(new Path("counts")); // clear old output
        // set map reduce
        conf.setOutputKeyClass(Text.class);          // set every word as key
        conf.setOutputValueClass(IntWritable.class); // set 1 as value
        conf.setMapperClass(MapClass.class);
        conf.setReducerClass(ReduceClass.class);
        conf.setNumMapTasks(1);
        conf.setNumReduceTasks(1);
        // run
        JobClient.runJob(conf);
    }
}
Word Count Sample

class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
        }
    }
}
Word Count Sample

class ReduceClass extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    IntWritable SumValue = new IntWritable();

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext())
            sum += values.next().get();
        SumValue.set(sum);
        output.collect(key, SumValue);
    }
}
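Before deploying to a cluster, the tokenize-then-sum logic of MapClass and ReduceClass above can be checked locally. Below is a hypothetical plain-Java simulation with no Hadoop dependency; the names LocalWordCount, map, and reduce are illustrative, not Hadoop API:

```java
import java.util.*;

public class LocalWordCount {
    // "map" phase: emit one (word, 1) pair per token, as MapClass does
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // "reduce" phase: sum the values grouped by key, as ReduceClass does
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] { "hello hadoop", "hello world" }) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {hadoop=1, hello=2, world=1}
    }
}
```

The real framework inserts a shuffle/sort between the two phases so that all pairs with the same key reach the same reducer; here the TreeMap grouping plays that role.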
Result
MapReduce with HBase prototype

class MR_HBase {
    class Map extends ... { }
    class Reduce extends ... { }
    public static void main(String[] args) {
        JobConf conf = new ...;
        conf.setInputPath(...);
        conf.setMapperClass(...);
        conf.setReducerClass(...);
        JobClient.runJob(conf);
    }
}
HBase API:
TableMap
TableReduce
WordCountIntoHbase Sample

class WordCountIntoHbase {
    public static void main(String[] args) throws IOException {
        BuildHTable build_table = new BuildHTable(Table_Name, ColumnF);
        if (!build_table.checkTableExist(Table_Name)) {
            if (!build_table.createTable())
                System.err.println("create table error !");
        } else
            System.out.println("Table existed !");
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setInputPath(new Path("/user/waue/input"));
        // conf.setOutputPath(new Path("counts"));
        // FileSystem.get(conf).delete(new Path(wc.outputPath));
        // conf.setOutputKeyClass(Text.class);          // set every word as key
        // conf.setOutputValueClass(IntWritable.class); // set 1 as value
        // conf.setMapperClass(MapClass.class);
        conf.setReducerClass(ReduceClass.class);
        conf.setNumMapTasks(0);
        conf.setNumReduceTasks(1);
        JobClient.runJob(conf);
    }
}
class ReduceClass extends TableReduce<LongWritable, Text> {
    Text col = new Text("word:text");
    private MapWritable map = new MapWritable();

    public void reduce(LongWritable key, Iterator<Text> values,
            OutputCollector<Text, MapWritable> output, Reporter reporter)
            throws IOException {
        ImmutableBytesWritable bytes
            = new ImmutableBytesWritable(values.next().getBytes());
        map.clear();
        map.put(col, bytes);
        output.collect(new Text(key.toString()), map);
    }
}
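To visualize what this TableReduce emits, here is a hypothetical in-memory stand-in for the target HBase table (FakeHTable is illustrative, not an HBase class): each row key maps to columns named "family:qualifier", mirroring the "word:text" column above.

```java
import java.util.*;

public class FakeHTable {
    // row key -> (column "family:qualifier" -> cell value)
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    public void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    public String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    public static void main(String[] args) {
        FakeHTable table = new FakeHTable();
        // mirrors output.collect(new Text(key.toString()), map)
        // where the row key is the input offset and the column is "word:text"
        table.put("0", "word:text", "hello hadoop");
        table.put("13", "word:text", "hello hbase");
        System.out.println(table.get("0", "word:text")); // hello hadoop
    }
}
```

The real table persists these cells across the cluster with versioning; this sketch only shows the row key / column family / qualifier data model.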
WordCountIntoHbase Sample
Result
WordCountFromHbase
Counts the words stored in HBase after WordCountIntoHbase has run.
In Trac: http://trac.nchc.org.tw/cloud/browser/sample/hadoop-0.16/tw/org/nchc/code/WordCountFromHBase.java
What's HBaseRecordPro
It parses your record, creates the HBase table, uses the first line as the column qualifiers, and stores the data in HBase, automatically and locally.
http://trac.nchc.org.tw/cloud/wiki/HBaseRecordPro
HBaseRecordPro
name:locate:years
waue:taiwan:1981
rock:taiwan:1981
aso:taiwan:1981
jazz:taiwan:1982
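Assuming the behavior described above (the first line supplies the column qualifiers, every later line is one row), a hypothetical sketch of the parsing step might look like the following; RecordParser is illustrative, not part of HBaseRecordPro:

```java
import java.util.*;

public class RecordParser {
    // First line: colon-separated column qualifiers (e.g. name, locate, years).
    // Remaining lines: colon-separated field values, one row each.
    public static List<Map<String, String>> parse(List<String> lines) {
        String[] columns = lines.get(0).split(":");
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(":");
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < columns.length; i++) {
                row.put(columns[i], fields[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "name:locate:years",
            "waue:taiwan:1981",
            "rock:taiwan:1981");
        System.out.println(parse(input));
        // [{name=waue, locate=taiwan, years=1981}, {name=rock, locate=taiwan, years=1981}]
    }
}
```

In the real tool each parsed row would then be written into HBase under the corresponding column qualifiers.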
Run HBaseRecordPro.java
hql> Select * from Table;
Detailed Code Explanation
Apache log parser: http://trac.nchc.org.tw/cloud/wiki/LogParser
More .. ? Enjoy http://trac.nchc.org.tw/cloud/
How to code Hadoop in Eclipse
http://trac.nchc.org.tw/cloud/browser/hadoop-eclipse.pdf
Map Reduce in Hadoop/HBase Manual
http://trac.nchc.org.tw/cloud/wiki/MR_manual
My Code sources
http://trac.nchc.org.tw/cloud/browser/sample/hadoop-0.16/tw/org/nchc/code
Then..? An Intrusion-Detection-System log parser
Count => the last; Format => 6 lines / 1 cell
Apache Pig
Pig is a platform for analyzing large data sets that consists of a high-level language.
[**] [1:2189:3] BAD-TRAFFIC IP Proto 103 PIM [**]
[Classification: Detection of a non-standard protocol or event] [Priority: 2]
07/08-14:57:56.500718 140.110.138.253 -> 224.0.0.13
PIM TTL:1 TOS:0xC0 ID:11078 IpLen:20 DgmLen:54
[Xref => http://cve.mitre.org/cgi-bin/cvename.cgi?name=2003-0567]
[Xref => http://www.securityfocus.com/bid/8211]
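A hypothetical first step for such a parser (SnortAlertParser and its regexes are illustrative assumptions, not from the slides): pull the priority and the source/destination addresses out of one raw alert string like the sample above.

```java
import java.util.regex.*;

public class SnortAlertParser {
    // matches e.g. "[Priority: 2]"
    static final Pattern PRIORITY = Pattern.compile("\\[Priority: (\\d+)\\]");
    // matches e.g. "140.110.138.253 -> 224.0.0.13"
    static final Pattern ADDRS = Pattern.compile(
        "(\\d{1,3}(?:\\.\\d{1,3}){3}) -> (\\d{1,3}(?:\\.\\d{1,3}){3})");

    // returns {priority, sourceIp, destinationIp}, or null if not found
    public static String[] parse(String alert) {
        Matcher p = PRIORITY.matcher(alert);
        Matcher a = ADDRS.matcher(alert);
        if (p.find() && a.find()) {
            return new String[] { p.group(1), a.group(1), a.group(2) };
        }
        return null;
    }

    public static void main(String[] args) {
        String alert = "[**] [1:2189:3] BAD-TRAFFIC IP Proto 103 PIM [**] "
            + "[Classification: Detection of a non-standard protocol or event] "
            + "[Priority: 2] 07/08-14:57:56.500718 140.110.138.253 -> 224.0.0.13 "
            + "PIM TTL:1 TOS:0xC0 ID:11078 IpLen:20 DgmLen:54";
        String[] r = parse(alert);
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
        // prints: 2 140.110.138.253 224.0.0.13
    }
}
```

In a MapReduce setting, this extraction would sit in the map phase, emitting (source IP, 1) or similar pairs so the reduce phase can count alerts per host.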
References
API:
http://hadoop.apache.org/hbase/docs/current/api/index.html
http://hadoop.apache.org/core/docs/r0.16.4/api/index.html
Distributed parallel programming with Hadoop (用 Hadoop 進行分佈式並行編程):
http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/index.html
Recommended