Big Java, Big Data

  • ..Big Java, Big Data / Monster Su

  • .. Big Java, Big Data

    monster.kcsu@gmail.com

    http://www.facebook.com/monster.kcsu

    July 20, 2012

  • .. Profile

    Java

    XML/Web Services, Design Patterns, EJB/JPA, Java EE, Struts/Spring/Hibernate, Open Source Frameworks, JBoss AS, GlassFish Application Server

    Apache Hadoop, Google App Engine, Microsoft Azure Cloud Platform, iOS, Android, Windows Phone, Smart Handheld Devices

  • .. Outline

    ...1 Introduction

    ...2 Big Java

    ...3 Big Data

    ...4 Tool

    ...5 Summary

  • ...1 Introduction

    ...2 Big Java: Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform

    ...3 Big Data: RDBMS Data, NoSQL Data, Social Network Data

    ...4 Tool

    ...5 Summary

  • ..Big Data http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/

  • ..Big Data -

  • ..Big Data -

  • .. Big Data

    RDBMS Log

  • ..Big Data V http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2011_Issue2/BigData/

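  • ..A Very Short History of Big Data http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/

    1941: the term "information explosion" comes into use.
    1944: university library holdings are estimated to double every 16 years; by 2040 the Yale library would hold about 2 × 10^8 volumes, needing some 6,000 miles of shelving and a staff of 6,000.
    2005: "Data is the next Intel Inside." "SQL is the new HTML." (Tim O'Reilly)
    2007: IDC estimates 161 EB of data created in 2006, growing to 988 EB in 2010. Its 2010 and 2011 studies put the figures at 1,200 EB for 2010 and 1,800 EB for 2011.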

  • .. Big Data

  • .. Big Data

  • ..Big Data - IBM http://www-01.ibm.com/software/tw/data/bigdata/cases.html

  • ..Big Data - Microsoft http://www.windowsazure.com/en-us/home/case-studies/

  • .. Big Data

    Big Java: Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform

    Big Data: RDBMS Multi-Tenancy Support, NoSQL Database Support, Framework for RDBMS/NoSQL/Social

  • .. 32-Bit vs. 64-Bit CPU

    32-Bit CPU
      Address Space: 4 GB
      Windows: Kernel Mode 2 GB, User Mode 2 GB
      JVM Heap Address Space: 1.2-1.8 GB
    64-Bit CPU
      Register, Address Space, Heap, Thread, Performance, Memory

  • ..Hotspot VM FAQ http://www.oracle.com/technetwork/java/hotspotfaq-138619.html

    It is a common misconception that a 64-bit implementation means that many of the built-in Java types are doubled in size from 32 to 64. This is not true. All existing 100% pure Java programs would continue running just as they do under a 32-bit VM. There's no public API that allows you to distinguish between 32 and 64-bit operation. However, if you'd like to write code which is platform specific (shame on you),

    Sunny Chan - Java 7 Hotspot VM

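  • ..64-Bit JVM Optimization: 32-Bit Speed + 64-Bit Space

    64-bit addresses with 8-byte alignment: every object address is a multiple of 8, so its 3 lower-order bits are always 0 and only the higher-order bits carry information. Storing just those bits recovers 32-bit pointer performance.

    64-bit execution with a 32-bit pointer length (address >> 3) allows a 32 GB heap size (4 GB × 2^3).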
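
    The shift arithmetic on this slide can be sketched in plain Java. This is only an illustration of the idea, assuming 8-byte alignment and a heap that starts at address 0; the class and method names are hypothetical, and a real JVM performs this encoding inside generated machine code rather than through any Java API.

```java
/** Illustrative arithmetic only; names are hypothetical, not a JVM API. */
class CompressedPointerSketch {
    // Assumption: objects are 8-byte aligned, so the low 3 address bits are always 0.
    static final int ALIGNMENT_SHIFT = 3;

    /** Encode an 8-byte-aligned 64-bit heap offset as a 32-bit value. */
    static int compress(long address) {
        if ((address & 0x7L) != 0) {
            throw new IllegalArgumentException("address is not 8-byte aligned");
        }
        if ((address >>> ALIGNMENT_SHIFT) > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("address is beyond the 32 GB limit");
        }
        return (int) (address >>> ALIGNMENT_SHIFT);
    }

    /** Decode: read the int as unsigned, then restore the three zero bits. */
    static long decompress(int compressed) {
        return (compressed & 0xFFFFFFFFL) << ALIGNMENT_SHIFT;
    }

    /** Largest addressable heap: 2^32 slots of 8 bytes each. */
    static long maxHeapBytes() {
        return (1L << 32) << ALIGNMENT_SHIFT;
    }

    public static void main(String[] args) {
        long address = 24L * 1024 * 1024 * 1024; // an offset 24 GB into the heap
        int narrow = compress(address);
        System.out.println(decompress(narrow) == address);                  // prints true
        System.out.println(maxHeapBytes() / (1024L * 1024 * 1024) + " GB"); // prints 32 GB
    }
}
```

    The 24 GB offset does not fit in 32 bits on its own, yet it survives the round trip because the three always-zero bits are never stored.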

  • ..64-Bit JVM Optimization: 32-Bit Speed + 64-Bit Space

    Sun: Compressed Oops (Oops = Ordinary Object Pointers), JDK 6u23
    IBM: Pointer Compression/Compressed Refs
    Oracle/BEA JRockit: -XX command-line options

  • .. 32-Bit / 64-Bit

    System Property
      os.arch
      sun.arch.data.model

    Java Interpreter
      Request a 4 GB heap: java -Xms4g -Xmx4g
      java -d32 -version
      java -d64 -version
      java -version (JDK 1.6)
      file /usr/bin/java (Unix)

  • .. 32-Bit / 64-Bit: System Property

    public class OSArchitecture
    {
        public static void main(String[] args)
        {
            // CPU architecture as reported to the JVM
            System.out.print("os.arch=");
            System.out.println(System.getProperty("os.arch"));
            // HotSpot-specific data model: "32", "64", or "unknown"
            System.out.print("sun.arch.data.model=");
            System.out.println(System.getProperty("sun.arch.data.model"));
        }
    }

  • .. 32-Bit 64-Bit Java Interpreter

  • .. Multi-Core/Thread: J2SE 1.0 - J2SE 1.4

    Thread, Runnable, synchronized, wait, notify
    JVM Thread, OS Thread
    Thread, OS, Workload, Core
    JVM Heap Size, Thread
    Immutable Object: final
    Mutable Object: volatile
    Concurrency
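
    As a reminder of what these early primitives look like in use, here is a minimal sketch of a one-slot hand-off between two threads built only from Thread, Runnable, synchronized, wait and notifyAll. The class name OneSlotMailbox and the runDemo helper are hypothetical and exist only for this illustration.

```java
/** One-slot hand-off between threads using only pre-Java SE 5 primitives. */
class OneSlotMailbox {
    private Object item; // guarded by the monitor of "this"

    /** Blocks while the slot is full, then stores the value. */
    synchronized void put(Object value) throws InterruptedException {
        while (item != null) {
            wait(); // releases the monitor until another thread calls notifyAll()
        }
        item = value;
        notifyAll();
    }

    /** Blocks while the slot is empty, then removes and returns the value. */
    synchronized Object take() throws InterruptedException {
        while (item == null) {
            wait();
        }
        Object value = item;
        item = null;
        notifyAll();
        return value;
    }

    /** Hypothetical demo: a producer thread sends 1, 2, 3 and the caller sums them. */
    static int runDemo() {
        final OneSlotMailbox box = new OneSlotMailbox();
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 1; i <= 3; i++) {
                        box.put(Integer.valueOf(i));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();
        try {
            int sum = 0;
            for (int i = 0; i < 3; i++) {
                sum += ((Integer) box.take()).intValue();
            }
            producer.join();
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints 6
    }
}
```

    The wait calls sit inside while loops so that a thread re-checks the slot after every wake-up, which is the usual guard against spurious wake-ups.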

  • ..JSR 166: Concurrency Utilities, Java SE 5 - Java SE 7

    Doug Lea: Spec Lead, author of "Concurrent Programming in Java"
    Java SE 5: java.util.concurrent; Multi-Core, Threading/Hardware Level API, Parallelism
    Java SE 7: Fork/Join Framework
    JDK7u4: G1 Garbage Collector, Multi-Core

    Actor Model
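
    A minimal sketch of the Java SE 7 Fork/Join Framework mentioned above: a RecursiveTask that sums an array by splitting it in half until the pieces are small. The class name, the parallelSum helper and the threshold value are assumptions chosen for illustration, not tuned settings.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Divide-and-conquer array sum; names and threshold are illustrative. */
class ForkJoinSumSketch extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this size, sum sequentially

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSumSketch(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSumSketch left = new ForkJoinSumSketch(data, from, mid);
        ForkJoinSumSketch right = new ForkJoinSumSketch(data, mid, to);
        left.fork();                     // run the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    /** Hypothetical convenience wrapper around a default-sized pool. */
    static long parallelSum(long[] data) {
        return new ForkJoinPool().invoke(new ForkJoinSumSketch(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data)); // prints 500000500000
    }
}
```

    A ForkJoinPool created with the no-argument constructor sizes itself to the number of available processors, so the same code spreads across however many cores the machine has.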

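  • .. Processor Count: Runtime.getRuntime().availableProcessors()

    Returns the number of CPUs available to the Java application; on a Hyper-Threading CPU each hardware thread is counted as a CPU.

    public class AvailableProcessors
    {
        public static void main(String[] args)
        {
            // Number of logical processors visible to the JVM
            System.out.println(Runtime.getRuntime().availableProcessors());
        }
    }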

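  • .. Thread Pool Size

    Use Runtime.getRuntime().availableProcessors() to size the thread pool: for a CPU-bound application, a pool with one thread per CPU gives good performance.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    ExecutorService e =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    e.execute(new Runnable() {
        public void run() {
            // do one task
        }
    });
    e.shutdown(); // accept no new tasks; lets the JVM exit once submitted tasks finish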

  • ..Ateji PX: Java Parallel Programming Made Simple - http://www.ateji.com/px/

    Patrick Viry, 2010/7
    Thread: Hardware Concept
    Natural Construct, Language Level Parallelism
    Shared/Distributed Memory

  • .. JOMP http://www2.epcc.ed.ac.uk/computing/research_activities/jomp/

    OpenMP-Like Shared-Memory Parallel Programming in Java
    Compiler Directive, Library Routine, System Property
    JOMP Preprocessor: Source Code -> Java
    Performance: Hand-Coded Multi-Thread

    //omp parallel shared(a,b,n)
    {
        //omp for
        for (i = 1 ; i < n ; i++) {
            b[i] = (a[i] + a[i-1]) * 0.5;
        }
    }
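
    For comparison, the same loop written as hand-coded multi-thread Java might look like the sketch below. It is not the code the JOMP preprocessor generates; the class name, the average method and the contiguous-chunk split are assumptions made for this illustration.

```java
/** Hand-coded parallel version of b[i] = (a[i] + a[i-1]) * 0.5; names are hypothetical. */
class HandCodedParallelLoop {

    static double[] average(final double[] a, int threadCount) {
        final int n = a.length;
        final double[] b = new double[n];
        // Iterations 1..n-1 are split into contiguous chunks, one per thread.
        int iterations = Math.max(n - 1, 0);
        final int chunk = (iterations + threadCount - 1) / threadCount;
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int start = 1 + t * chunk;
            final int end = Math.min(start + chunk, n);
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    // Each thread writes a disjoint range of b, so no locking is needed.
                    for (int i = start; i < end; i++) {
                        b[i] = (a[i] + a[i - 1]) * 0.5;
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() also makes the worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return b;
    }

    public static void main(String[] args) {
        double[] b = average(new double[] {0, 2, 4, 6, 8}, 2);
        System.out.println(java.util.Arrays.toString(b)); // prints [0.0, 1.0, 3.0, 5.0, 7.0]
    }
}
```

    The two directive comments in the JOMP version replace all of the thread creation, range splitting and joining shown here.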

  • .. Apache Hadoop

    Hadoop Node Scalability Failure

  • .. Hadoop and Google

    2003 SOSP: The Google File System
    2004 OSDI: MapReduce: Simplified Data Processing on Large Clusters
    2006 OSDI: Bigtable: A Distributed Storage System for Structured Data

  • ..Hadoop http://hadoop.apache.org/common/releases.html

    0.20.X: Legacy
    0.20.20X.Y: Legacy Stable, Security Branch
    0.20.205.0 -> 1.0.X: Current Stable
    0.23.X: Current Alpha, MapReduce 2
    1.1.X: Current Beta
    0.22.X: Feature, Security

  • .. Hadoop 1.0.X

    From 0.20.205.0 (0.20 Security Branch)
    Bug Fix, Performance Enhancement
    Enterprise-Ready Hadoop: MapReduce, HDFS, HBase
    Kerberos
    HDFS RESTful API (WebHDFS)

  • .. MapReduce

    Input Splitting: input -> (k1, v1)
    Mapper: (n, "XXX") -> (XXX, 1); in general (k1, v1) -> Mapper -> (k2, v2)
    Shuffling: n × (XXX, 1) -> (XXX, n); grouped by k2, List(k2, v2) -> (k2, sum(v2))
    Reducer: (XXX, n) -> (XXX, m); Reducer -> List(k3, v3)
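
    The map, shuffle and reduce stages can be traced in plain Java without a cluster. The sketch below is a single-process illustration only: it uses java.util collections rather than the Hadoop API, and the class and method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** In-memory word count that mirrors the three stages; not Hadoop code. */
class InMemoryWordCount {

    /** Map: one line of text -> a list of (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<String, Integer>(itr.nextToken(), 1));
        }
        return out;
    }

    /** Shuffle: group all values by key, with keys kept in sorted order. */
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce: (word, [1, 1, ...]) -> total count for that word. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Runs the three stages in sequence over the given lines. */
    static SortedMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        SortedMap<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapped).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("to be or not to be", "to do")));
        // prints {be=2, do=1, not=1, or=1, to=3}
    }
}
```

    In Hadoop the map and reduce calls run on different nodes and the shuffle moves data across the network between them; here all three stages simply run one after another in a single thread.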

  • ..Hello MapReduce - Word Count http://www.rabidgremlin.com/data20/

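  • ..Mapper (k1, v1) -> (k2, v2)

    package com.wordcount;

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // (k1, v1) -> (k2, v2)
    public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable>
    {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // k1=key (position of the line in the input), v1=value (one line of text)
        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException
        {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens())
            {
                word.set(itr.nextToken());
                // k2=word (one token), v2=1 (a single occurrence)
                context.write(word, one);
            }
        }
    }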

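  • .. Reducer: values grouped by k2, (k2, v2) -> (k3, v3)

    package com.wordcount;

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // (k2, v2) -> (k3, v3)
    public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        private IntWritable result = new IntWritable();

        // k2=key (a word), v2=values (all counts emitted for that word)
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable val : values)
                sum += val.get();
            result.set(sum);
            // k3=key (the word), v3=result (its total count)
            context.write(key, result);
        }
    }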

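  • .. MapReduce Job/Driver

    Job: the unit of work that a client wants performed
    Hadoop runs a MapReduce Job by dividing it into Tasks: Map Tasks and Reduce Tasks
    Job execution is controlled by one JobTracker and a number of TaskTracker nodes
    The JobTracker schedules Tasks to run on TaskTrackers
    TaskTrackers run the Tasks and report progress to the JobTracker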

  • ..MapReduce Hadoop: The Definitive Guide 3rd Ed.

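  • .. Job

    package com.wordcount;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountJob
    {
        public static void main(String[] args) throws Exception
        {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "word count");

            // Classes that make up the job
            job.setJarByClass(com.wordcount.WordCountJob.class);
            job.setMapperClass(com.wordcount.WordCountMapper.class);
            job.setReducerClass(com.wordcount.WordCountReducer.class);

            // Input and output locations; the "file" scheme selects the local file system
            FileInputFormat.addInputPath(job, new Path("file", "", "input"));
            FileOutputFormat.setOutputPath(job, new Path("file", "", "output"));

            // Types of the final (k3, v3) pairs
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Submit the job, wait for it to finish, and exit with its status
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }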

  • ..Spring Data http://www.springsource.org/spring-data

    JPA, MongoDB, Hadoop

  • .. Spring Data Hadoop

    [XML configuration listing; content not preserved]

  • .. Spring Data Hadoop

    public class WordCountJobSpringData
    {
        public static void main(String[] args) throws Exception
        {
            Abstrac