Big Java, Big Data

  • Published on
    11-Jul-2015

Transcript

<ul><li><p>Big Java, Big Data / Monster Su</p></li>
<li><p>Big Java, Big Data</p><p>monster.kcsu@gmail.com</p><p>http://www.facebook.com/monster.kcsu</p><p>July 20, 2012</p></li>
<li><p>Profile</p><p> Java </p><p> XML/Web Services, Design Patterns, EJB/JPA Java EE Struts/Spring/Hibernate Open Source Framework JBoss AS, GlassFish Application Server</p><p> Apache Hadoop, Google App Engine, Microsoft Azure Cloud Platform, iOS, Android, Windows Phone Smart Handheld Device </p></li>
<li><p>Outline</p><p>1 Introduction</p><p>2 Big Java</p><p>3 Big Data</p><p>4 Tool</p><p>5 Summary</p></li>
<li><p>1 Introduction</p><p>2 Big Java: Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform</p><p>3 Big Data: RDBMS Data, NoSQL Data, Social Network Data</p><p>4 Tool</p><p>5 Summary</p></li>
<li><p>Big Data http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/</p><p> 12 12 </p></li>
<li><p>Big Data - </p></li>
<li><p>Big Data - </p></li>
<li><p>Big Data </p><p> RDBMS Log</p></li>
<li><p>Big Data V http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2011_Issue2/BigData/</p></li>
<li><p>A Very Short History of Big Data http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/</p><p>1941 Information Explosion 1944 16 </p><p>2040 Yale 2 × 10^8 6,000 6,000 </p><p>2005 Data is the next Intel inside. SQL is the new HTML. (Tim O'Reilly)</p><p>2007 IDC 2006 161 EB 2010 988 EB 2010 2011 2010 1,200 EB 2011 1,800 EB</p></li>
<li><p>Big Data </p></li>
<li><p>Big Data </p></li>
<li><p>Big Data - IBM http://www-01.ibm.com/software/tw/data/bigdata/cases.html</p></li>
<li><p>Big Data - Microsoft http://www.windowsazure.com/en-us/home/case-studies/</p></li>
<li><p>
Big Data Big Java - </p><p>Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform</p><p>Big Data - RDBMS Multi-Tenancy Support, NoSQL Database Support, Framework for RDBMS/NoSQL/Social</p></li>
<li><p>1 Introduction</p><p>2 Big Java: Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform</p><p>3 Big Data: RDBMS Data, NoSQL Data, Social Network Data</p><p>4 Tool</p><p>5 Summary</p></li>
<li><p>32-Bit vs. 64-Bit CPU</p><p>32-Bit CPU Address Space 4 GB Windows Kernel Mode 2 GB User Mode 2 GB</p><p>JVM Heap Address Space 1.2-1.8 GB</p><p>64-Bit CPU Register Address Space</p><p>Heap Thread Performance Memory </p></li>
<li><p>Hotspot VM FAQ http://www.oracle.com/technetwork/java/hotspotfaq-138619.html</p><p>A 64-bit implementation means that many of the built-in Java types are doubled in size from 32 to 64. This is not true. All existing 100% pure Java programs would continue running just as they do under a 32-bit VM. There's no public API that allows you to distinguish between 32 and 64-bit operation. However, if you'd like to write code which is platform specific (shame on you), </p><p> Sunny Chan - Java 7 Hotspot VM </p></li>
<li><p>64-Bit JVM Optimization: 32-Bit Speed + 64-Bit Space</p><p>64-Bit Address 8-Byte Alignment</p><p>Performance Address 8 3 Bit 0 Higher-Order Bits Lower-Order Zero Bits </p><p>64-Bit Execution</p><p>32-Bit Pointer Length (Address &gt;&gt; 3)</p><p>32 GB Heap Size (4 GB × 2^3)</p></li>
<li><p>64-Bit JVM Optimization: 32-Bit Speed + 64-Bit Space</p><p>Sun: Compressed Oops (Oops = Ordinary Object Pointers) (JDK6u23 )</p><p>IBM: Pointer Compression/Compressed Refs</p><p>Oracle/BEA JRockit: -XX Command-line Options</p></li>
<li><p>
32-Bit 64-Bit System Property </p><p>os.arch</p><p>sun.arch.data.model</p><p> Java Interpreter 4 GB (java -Xms4g -Xmx4g) java -d32 -version java -d64 -version java -version (JDK 1.6 ) file /usr/bin/java (Unix )</p></li>
<li><p>32-Bit 64-Bit System Property</p><p>public class OSArchitecture
{
    public static void main(String[] args)
    {
        // CPU architecture as reported by the JVM
        System.out.print("os.arch=");
        System.out.println(System.getProperty("os.arch"));
        // "32" or "64" on HotSpot; vendor-specific, may be null on other JVMs
        System.out.print("sun.arch.data.model=");
        System.out.println(System.getProperty("sun.arch.data.model"));
    }
}</p></li>
<li><p>32-Bit 64-Bit Java Interpreter</p></li>
<li><p>1 Introduction</p><p>2 Big Java: Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform</p><p>3 Big Data: RDBMS Data, NoSQL Data, Social Network Data</p><p>4 Tool</p><p>5 Summary</p></li>
<li><p>Multi-Core/Thread J2SE 1.0 - J2SE 1.4</p><p> Thread Runnable synchronized, wait, notify JVM Thread OS Thread Thread OS Workload Core JVM Heap Size Thread Immutable Object final</p><p>Mutable Object volatile Concurrency </p></li>
<li><p>JSR 166: Concurrency Utilities</p><p>Java SE 5 - Java SE 7</p><p>Doug Lea Spec Lead</p><p>Concurrent Programming in Java Java SE 5 java.util.concurrent Multi-Core Threading/Hardware Level API Parallelism</p><p>Java SE 7 Fork/Join Framework</p><p>JDK7u4 G1 Garbage Collector Multi-Core </p><p> -Actor Model </p></li>
<li><p>Processor Runtime.getRuntime().availableProcessors()</p><p>Java Application CPU CPU Hyper-Threading CPU </p><p>public class AvailableProcessors
{
    public static void main(String[] args)
    {
        // Number of processors available to the JVM
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}</p></li>
<li><p>
Thread Pool Size</p><p>Thread Pool </p><p> Runtime.getRuntime().availableProcessors() CPU-Bound Application CPU Thread Pool Performance</p><p>// Size the pool to the number of available processors
ExecutorService e =
    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
e.execute(new Runnable() {
    public void run() {
        // do one task
    }
});</p></li>
<li><p>1 Introduction</p><p>2 Big Java: Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform</p><p>3 Big Data: RDBMS Data, NoSQL Data, Social Network Data</p><p>4 Tool</p><p>5 Summary</p></li>
<li><p>Ateji PX</p><p>Java Parallel Programming Made Simple - http://www.ateji.com/px/</p><p> Patrick Viry 2010 7 Thread Hardware Concept Natural Construct Language Level Parallelism Shared/Distributed Memory </p></li>
<li><p>JOMP http://www2.epcc.ed.ac.uk/computing/research_activities/jomp/</p><p>OpenMP-Like Shared-Memory Parallel Programming in Java</p><p> Compiler Directive, Library Routine, System Property JOMP Preprocessor Source Code Java</p><p>Performance Hand-Coded Multi-Thread</p><p>//omp parallel shared(a,b,n)
{
    //omp for
    for (i = 1 ; i &lt; n ; i++) {
        b[i] = (a[i] + a[i-1]) * 0.5;
    }
}</p></li>
<li><p>1 Introduction</p><p>2 Big Java: Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform</p><p>3 Big Data: RDBMS Data, NoSQL Data, Social Network Data</p><p>4 Tool</p><p>5 Summary</p></li>
<li><p>Apache Hadoop</p><p> Hadoop Node Scalability Failure </p></li>
<li><p>Hadoop Hadoop Google </p><p>2003 SOSP The Google File System</p><p>2004 OSDI MapReduce: Simplified Data Processing on Large Clusters</p><p>2006 OSDI Bigtable: A Distributed Storage System for Structured Data</p></li>
<li><p>Hadoop http://hadoop.apache.org/common/releases.html</p><p>0.20 0.20.X 0.20 0.20.20X.Y Legacy Stable Security Branch 0.20.205.0 1.0.X Current Stable 0.23.X Current Alpha MapReduce 2 1.1.X Current Beta 0.22.X Feature Security </p></li>
<li><p>
Hadoop 1.0.X 0.20.205.0 0.20 Security Branch Bug Fix Performance Enhancement Enterprise-Ready Hadoop MapReduce HDFS HBase Kerberos HDFS RESTful API (WebHDFS)</p></li>
<li><p>MapReduce</p><p>Input Splitting -&gt; -&gt; -&gt; (k1, v1)</p><p>Mapper ( n , XXX) -&gt; (XXX, 1) (k1, v1) Mapper (k2, v2)</p><p>Shuffling n (XXX, 1) -&gt; (XXX, n) k2 List(k2, v2) (k2, sum(v2))</p><p>Reducer (XXX, n) -&gt; (XXX, m) Reducer List(k3, v3)</p></li>
<li><p>Hello MapReduce - Word Count http://www.rabidgremlin.com/data20/</p></li>
<li><p>Mapper (k1, v1) -&gt; (k2, v2)</p><p>import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// (k1, v1) -&gt; (k2, v2)
public class WordCountMapper extends Mapper&lt;Object, Text, Text, IntWritable&gt;
{
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // k1 = key, v1 = value
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException
    {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens())
        {
            word.set(itr.nextToken());
            // k2 = word, v2 = 1
            context.write(word, one);
        }
    }
}</p></li>
<li><p>Reducer k2 (k2, v2) -&gt; (k3, v3)</p><p>import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// (k2, v2) -&gt; (k3, v3)
public class WordCountReducer
    extends Reducer&lt;Text, IntWritable, Text, IntWritable&gt;
{
    private IntWritable result = new IntWritable();

    // k2 = key, v2 = values
    public void reduce(Text key, Iterable&lt;IntWritable&gt; values,
        Context context) throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable val : values)
            sum += val.get();
        result.set(sum);
        // k3 = key, v3 = result
        context.write(key, result);
    }
}</p></li>
<li><p>MapReduce Job/Driver Client Unit of Work</p><p>MapReduce Hadoop Job Task</p><p>Map Task Reduce Task</p><p>Job JobTracker TaskTracker Node JobTracker Task TaskTracker TaskTracker Task JobTracker</p></li>
<li><p>MapReduce Hadoop: The Definitive Guide 3rd Ed.</p></li>
<li><p>
Job</p><p>import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJob
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");

        job.setJarByClass(com.wordcount.WordCountJob.class);
        job.setMapperClass(com.wordcount.WordCountMapper.class);
        job.setReducerClass(com.wordcount.WordCountReducer.class);

        // Path(scheme, authority, path): local "input" and "output" directories
        FileInputFormat.addInputPath(job, new Path("file", "", "input"));
        FileOutputFormat.setOutputPath(job, new Path("file", "", "output"));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Submit the job and block until it finishes
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}</p></li>
<li><p>Spring Data http://www.springsource.org/spring-data</p><p> JPA, MongoDB, Hadoop</p></li>
<li><p>Spring Data Hadoop</p></li>
<li><p>Spring Data Hadoop</p><p>import org.springframework.context.support.AbstractApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class WordCountJobSpringData
{
    public static void main(String[] args) throws Exception
    {
        // Loading the context defined in springdata.xml
        AbstractApplicationContext ctx =
            new ClassPathXmlApplicationContext("springdata.xml");
        ctx.registerShutdownHook();
    }
}</p></li>
<li><p>Word Count </p></li>
<li><p>1 Introduction</p><p>2 Big Java: Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform</p><p>3 Big Data: RDBMS Data, NoSQL Data, Social Network Data</p><p>4 Tool</p><p>5 Summary</p></li>
<li><p>Multitenancy</p><p>Tenant Cloud Computing Service/Application Client</p><p>Multitenancy Application Tenant Tenant Multitenancy </p><p>Application Service Provider </p></li>
<li><p>
Hibernate ORM</p><p>Hibernate -&gt; Hibernate Core -&gt; Hibernate Distribution -&gt; Hibernate ORM</p><p>December 2011: 4.0 JDK 1.6 JDBC 4.0 Hibernate Multitenant Database JBoss Logging</p><p>Cache OSGi </p><p>July 2012: 4.1.5.SP1 </p></li>
<li><p>Hibernate ORM - Multitenancy</p><p>DATABASE - Separate Database</p><p>http://msdn.microsoft.com/en-us/library/aa479086.aspx</p></li>
<li><p>Hibernate ORM - Multitenancy</p><p>SCHEMA - Shared Database, Separate Schema</p><p>http://msdn.microsoft.com/en-us/library/aa479086.aspx</p></li>
<li><p>Hibernate ORM - Multitenancy</p><p>DISCRIMINATOR - Shared Database, Shared Schema</p><p>http://msdn.microsoft.com/en-us/library/aa479086.aspx</p></li>
<li><p>Hibernate ORM - Multitenancy http://docs.jboss.org/hibernate/orm/4.1/devguide/en-US/html/ch16.html</p><p> hibernate.multiTenancy </p><p>NONE: Multitenancy </p><p>DATABASE: Separate Database</p><p>SCHEMA: Shared Database, Separate Schema</p><p>DISCRIMINATOR: Shared Database, Shared Schema, Discriminator Data ( 5.0 )</p></li>
<li><p>MultiTenantConnectionProvider</p><p>public class MySQLMultiTenantConnectionProvider
    extends AbstractMultiTenantConnectionProvider
{
    // Provider used when no tenant is known yet (e.g. at startup)
    @Override
    protected ConnectionProvider getAnyConnectionProvider()
    {
        return MySQLConnectionProviderBuilder.MONSTER_CONNECTION_PROVIDER;
    }

    // Map each tenant identifier to its own database connection provider
    @Override
    protected ConnectionProvider selectConnectionProvider(String tenantId)
    {
        if (tenantId.equals("monster"))
            return MySQLConnectionProviderBuilder.MONSTER_CONNECTION_PROVIDER;
        else if (tenantId.equals("supreme"))
            return MySQLConnectionProviderBuilder.SUPREME_CONNECTION_PROVIDER;
        throw new HibernateException("Unknown tenantId");
    }
}</p></li>
<li><p>Configuration</p><p>com.mysql.jdbc.Driver</p><p>root</p><p>password</p><p>org.hibernate.dialect.MySQL5InnoDBDialect</p><p>DATABASE</p><p>com.javatwo.helper.MySQLMultiTenantConnectionProvider</p></li>
<li><p>
Session</p><p>public class MultiTenancy
{
    public static void insertReader(String tenantId)
    {
        Reader reader = new Reader(tenantId, tenantId, 1, tenantId+"@iii");
        SessionFactory factory = HibernateUtil.getSessionFactory();
        // Open a session bound to the given tenant
        Session session =
            factory.withOptions().tenantIdentifier(tenantId).openSession();
        try
        {
            session.beginTransaction();
            session.save(reader);
            session.getTransaction().commit();
        }
        catch (Exception ex) { session.getTransaction().rollback(); }
        finally { session.close(); }
    }

    public static void main(String[] args)
    {
        insertReader("monster"); insertReader("supreme");
    }
}</p></li>
<li><p>Connection Pool</p></li>
<li><p>Database</p></li>
<li><p>Java Persistence API</p><p>JSR 317 - Java Persistence 2.0</p><p>RI: EclipseLink 2.3</p><p>GlassFish 3.1.1 Tenant Discriminator Column Shared Multitenant Table</p><p>JSR 338 - Java Persistence 2.1 Early Draft Review</p><p>RI: EclipseLink 2.4</p><p>June 2012: Eclipse Juno (4.2) Tenant Isolation NoSQL</p></li>
<li><p>1 Introduction</p><p>2 Big Java: Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform</p><p>3 Big Data: RDBMS Data, NoSQL Data, Social Network Data</p><p>4 Tool</p><p>5 Summary</p></li>
<li><p>RDBMS Schema </p><p>Twitter Alter Table Schema </p><p>Join Storage Normalization Join Join </p><p>Consistency Guaranteed Consistency Immediate Consistency Eventual Consistency </p></li>
<li><p>NoSQL Wikipedia</p><p>It does not use SQL as its query language. It may not give full ACID guarantees. It has a distributed, fault-tolerant architecture.</p><p>BigData Diary</p><p>NoSQL is a movement promoting a loosely defined class of non-relational data stores. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally.</p></li>
<li><p>NoSQL NoSQL </p><p>Raw Performance, Transparent Scalability</p><p> RDBMS Cluster Sharding SQL NoSQL </p></li>
<li><p>
NoSQL Database NoSQL </p><p>Key/Value: HBase, Dynamo, Cassandra</p><p>In-Memory: memcached</p><p>Document: CouchDB, MongoDB</p><p>Graph: Neo4j</p><p> - Apache Cassandra</p></li>
<li><p>Spring Data http://www.springsource.org/spring-data</p><p> JPA, MongoDB, Hadoop</p></li>
<li><p>Spring Data Spring </p><p> Spring RDBMS, NoSQL, MapReduce Framework RDBMS/NoSQL </p><p> - Spring MVC RequireJS &amp; Backbone.js &amp; Spring Data JPA </p></li>
<li><p>MongoDB Java Driver</p><p>import com.google.gson.Gson;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;

public class MongoDB
{
    public static void main(String[] args) throws Exception
    {
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("JavaTwo");
        DBCollection collection = db.getCollection("zips");

        // Query by example: find the document whose "zip" field is "90210"
        BasicDBObject doc = new BasicDBObject();
        doc.put("zip", "90210");
        doc = (BasicDBObject) collection.findOne(doc);

        // Map the JSON form of the document onto the City class by hand
        Gson gson = new Gson();
        City city = gson.fromJson(doc.toString(), City.class);
        Location loc = city.getLoc();

        System.out.println("City = " + city.getCity());
        System.out.println("Location = " + loc.getY() + ", " + loc.getX());
    }
}</p></li>
<li><p>Spring Data MongoDB</p><p>import com.mongodb.Mongo;
import org.springframework.data.mongodb.core.MongoOperations;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

public class SpringData
{
    public static void main(String[] args) throws Exception
    {
        Mongo mongo = new Mongo("localhost", 27017);
        MongoOperations mongoOps = new MongoTemplate(mongo, "JavaTwo");

        Query query = new Query(Criteria.where("zip").is("90210"));
        System.out.println("Found = " + mongoOps.count(query, "zips"));

        // The template maps the matching document onto City directly
        City city = mongoOps.findOne(query, City.class, "zips");
        Location loc = city.getLoc();

        System.out.println("City = " + city.getCity());
        System.out.println("Location = " + loc.getY() + ", " + loc.getX());
    }
}</p></li>
<li><p>1 Introduction</p><p>2 Big Java: Virtual Machine, Multi-Core/Multi-Thread Support, Language Extension, Hadoop Platform</p><p>3 Big Data: RDBMS Data, NoSQL Data, Social Network Data</p><p>4 Tool</p><p>5 Summary</p></li>
<li><p>Spring Social http://www.springsource.org/spring-social</p><p> Facebook, Twitter, LinkedIn</p></li>
<li><p>
Spring Social SaaS ( Community ) Authentication Authorization SaaS Provider OAuth 1.0 1.0a 2.0 Twitter Public Data Service </p></li>
<li><p>Spring Social Twitter</p><p>public class Timeline
{
    public static void main(String[] args)
    {
        TwitterTemplate twitterTemplate = new TwitterTemplate();
        TimelineOperations timelineOps = twitterTemplate.timeli...</p></li></ul>
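The MapReduce slides in this transcript describe a four-stage flow: Input Splitting, Mapper, Shuffling, Reducer. That flow can be tried out without a Hadoop cluster. What follows is a minimal sketch in plain Java, not the Hadoop API: the class `LocalWordCount` and its methods `map`, `shuffle`, `reduce` and `wordCount` are hypothetical names chosen for illustration, each input line stands in for one input split, and everything runs in a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount
{
    // Mapper: one input line (v1) -> list of words, each standing for the pair (word, 1)
    static List<String> map(String line)
    {
        List<String> words = new ArrayList<String>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens())
            words.add(itr.nextToken());
        return words;
    }

    // Shuffling: group the emitted pairs by key, giving (k2, List(v2))
    static Map<String, List<Integer>> shuffle(List<String> emittedWords)
    {
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (String word : emittedWords)
        {
            List<Integer> ones = grouped.get(word);
            if (ones == null)
            {
                ones = new ArrayList<Integer>();
                grouped.put(word, ones);
            }
            ones.add(1);
        }
        return grouped;
    }

    // Reducer: (k2, List(v2)) -> v3, here the sum of the counts for one word
    static int reduce(String key, List<Integer> values)
    {
        int sum = 0;
        for (int val : values)
            sum += val;
        return sum;
    }

    // Driver: run map over every line, shuffle the results, then reduce each group
    static Map<String, Integer> wordCount(String[] lines)
    {
        List<String> emitted = new ArrayList<String>();
        for (String line : lines)
            emitted.addAll(map(line));

        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet())
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        return result;
    }

    public static void main(String[] args)
    {
        String[] lines = { "big java big data", "big data" };
        System.out.println(wordCount(lines)); // prints {big=3, data=2, java=1}
    }
}
```

In a real Hadoop job the three stages run on different nodes and the framework performs the shuffle; the sketch only mirrors the shape of the data at each step.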