Installation and setup spark published

DIPENDRA KUSI https://www.linkedin.com/in/er-dipendra-kusi-b3674193

2/11/17 SPARK SETUP

Installation and Setup Spark


Step 1: First, set up the Cloudera environment (for example, the Cloudera QuickStart VM).

Step 2: Open a terminal in Cloudera and start the Spark shell:

/usr/bin/spark-shell

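Step 3: Once the shell has started, you can type Scala commands that Spark executes through the SparkContext, which spark-shell provides as the variable sc.

Now read the input file from HDFS. This example assumes the input is already stored at /user/cloudera/project_data/input:

val dt = sc.textFile("/user/cloudera/project_data/input")

If the file is not in HDFS yet, copy a local file (here file0) to that location with:

hadoop fs -put file0 /user/cloudera/project_data/input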


Step 4: Now split each line into words on spaces, pair each word with 1, and add up the pairs for each word:

val wordcount = dt.flatMap(x => x.split(" ")).map(x => (x, 1)).reduceByKey((a, b) => a + b)
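To see what each stage of the Step 4 pipeline does without a cluster, here is a minimal sketch in plain Scala collections, assuming a small in-memory list of lines instead of an RDD. The object name LocalWordCount is hypothetical, and groupBy followed by a sum stands in for reduceByKey: Spark's reduceByKey yields the same per-word totals but combines the pairs in parallel across partitions.

```scala
// Hypothetical plain-Scala illustration of flatMap -> map -> reduceByKey (no Spark needed)
object LocalWordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))                            // one element per word
      .map(word => (word, 1))                                      // pair each word with 1
      .groupBy(_._1)                                               // gather the pairs for each word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }   // add up the 1s, like reduceByKey

  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or", "not to be")
    // Sort by word only to make the printed order stable
    count(lines).toSeq.sortBy(_._1).foreach(println)
  }
}
```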

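Step 5: Now print the result. collect() brings the (word, count) pairs back to the driver, so they are printed in the shell even when Spark runs on a cluster (fine for small inputs):

wordcount.collect().foreach(println)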


Integrate Spark with Eclipse:

Step 1: First, install the Scala plugin in Eclipse.

Go to Help -> Eclipse Marketplace

Step 2: Search for the Scala plugin and install it.


Click Install.

Click Confirm.

Then accept the terms and install.

Step 3: Check whether the Scala plugin is installed in Eclipse.

Go to New -> Other -> type scala


If Scala App appears in the list, the Scala plugin is installed.

Step 4: Now create a Maven project.

Go to New -> Other -> type Maven Project -> Next -> Next -> Next


Step 5: Now enter the following:

Group Id: edu.sparkproject

Artifact Id: WordCount

Click Finish

Step 6: Now open the pom.xml file to add the Spark dependency.


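Step 7: Now copy the code below and paste it into pom.xml, replacing its contents. Note that this pom declares groupId com.scalaproject and artifactId scalaproject, which replace the values entered in Step 5.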

Link: http://pastebin.com/V5n0hM5P

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.scalaproject</groupId>
  <artifactId>scalaproject</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <repositories>
    <repository>
      <id>pele.farmbio.uu.se</id>
      <url>http://pele.farmbio.uu.se/artifactory/libs-snapshot</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <id>compile</id>
            <goals>
              <goal>compile</goal>
            </goals>
            <phase>compile</phase>
          </execution>
          <execution>
            <id>test-compile</id>
            <goals>
              <goal>testCompile</goal>
            </goals>
            <phase>test-compile</phase>
          </execution>
          <execution>
            <phase>process-resources</phase>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.4</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>assemble-all</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <!-- Replace with the fully qualified name of your main object, e.g. com.scalaproject.scalaproject.WordCount -->
              <mainClass>fully.qualified.MainClass</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.eclipse.m2e</groupId>
          <artifactId>lifecycle-mapping</artifactId>
          <version>1.0.0</version>
          <configuration>
            <lifecycleMappingMetadata>
              <pluginExecutions>
                <pluginExecution>
                  <pluginExecutionFilter>
                    <groupId>org.scala-tools</groupId>
                    <artifactId>maven-scala-plugin</artifactId>
                    <versionRange>[2.15.2,)</versionRange>
                    <goals>
                      <goal>compile</goal>
                      <goal>testCompile</goal>
                    </goals>
                  </pluginExecutionFilter>
                  <action>
                    <execute></execute>
                  </action>
                </pluginExecution>
              </pluginExecutions>
            </lifecycleMappingMetadata>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>

Now save the file. Eclipse will then download all the dependencies.

Step 8: Now prepare to convert the project into a Scala project.

First, delete the src/test/java folder.

Then fix the resulting error by clicking Quick Fix and then OK.

The error will disappear.


Step 9: Now add the Scala nature to the project (right-click the project -> Configure -> Add Scala Nature).

Step 10: Right-click the project -> Properties


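Step 11: Go to Scala Compiler -> tick Use Project Settings -> select Fixed Scala Installation 2.10.6 -> Apply

(The spark-core_2.10 dependency in the pom is built for Scala 2.10, so the project must compile against the same Scala 2.10.x line; Scala versions such as 2.10 and 2.11 are not binary compatible with each other.)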
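The _2.10 suffix in the artifact id spark-core_2.10 is the Scala binary version the library was compiled for. The sketch below, with hypothetical names, compares that suffix against the Scala version the program is actually running on, using scala.util.Properties.versionNumberString from the standard library. It is an illustration of the matching rule, not a check that Spark itself performs.

```scala
// Hypothetical helper: compare the Scala binary version of an artifact id with the running Scala
object ScalaVersionCheck {
  // "spark-core_2.10" -> "2.10" (returns the whole string if there is no underscore)
  def binaryVersionOf(artifactId: String): String =
    artifactId.substring(artifactId.lastIndexOf('_') + 1)

  // "2.10.6" -> "2.10": only major.minor matters for binary compatibility
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    val running = scala.util.Properties.versionNumberString
    val expected = binaryVersionOf("spark-core_2.10")
    if (binaryVersion(running) == expected)
      println(s"OK: Scala $running matches the _$expected artifact")
    else
      println(s"Mismatch: running Scala $running, but the artifact was built for Scala $expected")
  }
}
```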


Step 12: Then go to Java Build Path -> remove Scala Library Container

(spark-core_2.10 already brings in the Scala library as a Maven dependency, so a second copy from the container is not needed.)

Now rename the package to scala.
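After removing the container, you may want to confirm where the Scala library is actually loaded from. The sketch below (object name hypothetical) asks the JVM for the code source of scala.None, which lives in scala-library; when run from this Maven project you would expect the location to point at the scala-library jar pulled in through spark-core, though the exact path depends on your setup.

```scala
// Hypothetical diagnostic: report which jar supplies the Scala standard library at run time
object ScalaLibrarySource {
  // Location of the jar (or directory) defining scala.None, if the JVM reports one
  def location(): Option[String] =
    Option(scala.None.getClass.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)

  def main(args: Array[String]): Unit = {
    println("Scala version: " + scala.util.Properties.versionNumberString)
    println("scala-library loaded from: " + location().getOrElse("<unknown>"))
  }
}
```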


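Step 13: Now add a new Scala Object file.

Name the Scala object Count. (The code in Step 14 declares an object named WordCount; Scala does not require file and object names to match, but keeping them the same avoids confusion.)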


Step 14: Now copy the code from the link below and paste it into the Scala file created in Step 13:

Link: http://pastebin.com/XNpbcJ2z

package com.scalaproject.scalaproject

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    // Start the Spark context; "local" runs Spark inside this JVM with one worker thread
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("local")
    val sc = new SparkContext(conf)

    // Read input.txt (relative to the working directory), count the words,
    // and write the (word, count) pairs into the "output" directory
    val test = sc.textFile("input.txt")
    test.flatMap(x => x.split("\\s+"))
      .map(x => (x, 1))
      .reduceByKey((a, b) => a + b)
      .saveAsTextFile("output")

    // Stop the Spark context
    sc.stop()
  }

  // Helper that splits a line on single spaces (not called by main)
  def splitting(v: String): Array[String] = {
    v.split(" ")
  }
}
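The Eclipse version splits on the regular expression "\\s+" rather than on a single space as in the spark-shell Step 4. The plain-Scala sketch below (object and method names are hypothetical) shows the difference: splitting on " " turns consecutive spaces into empty-string tokens, which Spark would then count as a word, while "\\s+" treats any run of spaces or tabs as one separator. A line that starts with whitespace, or an empty line, still yields one empty token with "\\s+", so the sketch also filters out empty strings.

```scala
// Hypothetical comparison of the two splitting strategies used in this tutorial
object SplitComparison {
  // Split on runs of whitespace and drop empty tokens from leading whitespace or blank lines
  def words(line: String): List[String] =
    line.split("\\s+").filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    val line = " to  be\tor"
    // Single-space split: keeps an empty token for the leading space and for the double space
    println(line.split(" ").map(w => "[" + w + "]").mkString(" "))
    // Whitespace-run split with empty tokens removed
    println(words(line).map(w => "[" + w + "]").mkString(" "))
  }
}
```

In the Spark job, the same filtering could be applied as test.flatMap(x => x.split("\\s+")).filter(_.nonEmpty) before the map step.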

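Step 15: Now create an input.txt file in the project folder as the input to be processed. Because the code calls sc.textFile("input.txt") with a relative path, the file must be in the directory the program runs from, which for an Eclipse run configuration is the project folder by default.

Add some text to input.txt so that there is something to process.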
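If you prefer to generate the sample file from code rather than by hand, a minimal sketch follows. The object name and the sample sentences are hypothetical; any plain-text content works. It writes input.txt relative to the current working directory, which matches the relative path used by sc.textFile("input.txt").

```scala
import java.io.{File, PrintWriter}

// Hypothetical helper: create the input.txt file processed by WordCount
object CreateInputFile {
  // Write one line per element, overwriting the file if it already exists
  def writeLines(file: File, lines: Seq[String]): Unit = {
    val out = new PrintWriter(file, "UTF-8")
    try lines.foreach(line => out.println(line))
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    // Sample text (hypothetical)
    writeLines(new File("input.txt"), Seq("spark counts words", "spark splits lines into words"))
  }
}
```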


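Step 16: Now run the code (right-click the Scala file -> Run As -> Scala Application).

Step 17: Refresh the project.

You will see an output folder in the project. Inside it, the file part-00000 contains the result, one (word,count) pair per line. Spark does not overwrite an existing output directory, so delete the output folder before running the program again.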
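saveAsTextFile writes each (word, count) tuple using its toString, so every line of part-00000 looks like (spark,3). The sketch below (names hypothetical) reads all part-* files in the output directory and parses those lines back into pairs, for example to print the most frequent words first. It splits on the last comma so that a word containing a comma is kept intact, and skips other files such as marker or checksum files that Hadoop may write alongside the parts.

```scala
import java.io.File
import scala.io.Source

// Hypothetical reader for the text output written by saveAsTextFile
object ReadWordCountOutput {
  // Parse a line such as "(spark,3)" into ("spark", 3); returns None for anything else
  def parseLine(line: String): Option[(String, Int)] = {
    val trimmed = line.trim
    val comma = trimmed.lastIndexOf(',')
    if (trimmed.startsWith("(") && trimmed.endsWith(")") && comma > 0) {
      val word = trimmed.substring(1, comma)
      val count = trimmed.substring(comma + 1, trimmed.length - 1)
      if (count.nonEmpty && count.forall(_.isDigit)) Some((word, count.toInt)) else None
    } else None
  }

  // Read every part-* file in the directory, in file-name order
  def readDir(dir: File): Seq[(String, Int)] = {
    val parts = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(_.getName.startsWith("part-"))
      .sortBy(_.getName)
    parts.toSeq.flatMap { f =>
      val src = Source.fromFile(f, "UTF-8")
      try src.getLines().toList.flatMap(line => parseLine(line).toList)
      finally src.close()
    }
  }

  def main(args: Array[String]): Unit =
    readDir(new File("output"))
      .sortBy { case (_, n) => -n }                  // most frequent words first
      .foreach { case (w, n) => println(s"$w\t$n") }
}
```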
