Packaging Scala code in IntelliJ IDEA and running it on Spark
1. After creating the Maven project (remember to add the Scala framework support to it), edit the pom.xml file and add the following:
<properties>
<spark.version>2.1.1</spark.version>
<scala.version>2.11</scala.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<args>
<arg>-target:jvm-1.5</arg>
</args>
</configuration>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.19</version>
<configuration>
<skip>true</skip>
</configuration>
</plugin>
</plugins>
</build>
After saving pom.xml, click the "Import Changes" prompt that appears at the bottom; this is what actually downloads the jar dependencies.
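If the prompt does not appear, the dependencies can also be resolved and the Scala sources compiled from a terminal in the project root. A minimal check, assuming Maven is installed and on the PATH:
# download the declared dependencies and compile the Scala sources
mvn clean compile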
2. Write a Scala program that counts words
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
object WordCount {
def main(args: Array[String]) {
// exit early if the input and output paths are not supplied
if (args.length < 2) {
  System.err.println("Usage: WordCount <input_path> <output_path>")
  sys.exit(1)
}
val input_path = args(0)
val output_path = args(1)
val conf = new SparkConf().setAppName("WordCount")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)
val inputFile = sc.textFile(input_path)
// split each line into words, count each word, and write "word<TAB>count" lines
inputFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
.map(x => x._1 + "\t" + x._2)
.saveAsTextFile(output_path)
sc.stop()
}
}
3. Packaging
File -> Project Structure -> Artifacts -> the green plus sign -> JAR -> From modules with dependencies...
Then fill in the main class you defined and select the "copy to the output directory and link via manifest" option (so only this class is packaged).
After clicking OK, go to Build -> Build Artifacts... -> Build and wait for the build to finish. The freshly packaged jar can then be found in the project's artifact output directory (by default under out/artifacts/).
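Before copying the jar to the cluster, it can optionally be smoke-tested on the local machine. A minimal sketch, assuming a local Spark installation at /usr/local/spark; the jar path and the file:// input/output paths are placeholders:
# the output directory must not already exist
/usr/local/spark/bin/spark-submit \
--class WordCount \
--master "local[*]" \
/path/to/WordCount.jar \
file:///tmp/test.data \
file:///tmp/wordcount_output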
4. Running on the Spark cluster
1. Copy the jar to a machine that can access the Spark cluster.
2. Run:
/usr/local/spark/bin/spark-submit \
--class WordCount \
--master spark://master:7077 \
--executor-memory 1G \
--executor-cores 1 \
--num-executors 10 \
/data/wangzai/package/WordCount.jar \
hdfs://master:9000/spark/test.data \
hdfs://master:9000/spark_output/spark_wordcount
3. Results
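saveAsTextFile writes the result as part-* files inside the output directory. One way to inspect them, assuming the HDFS paths used above and an hdfs client on the PATH:
# list the output files (part-00000, part-00001, ..., plus a _SUCCESS marker)
hdfs dfs -ls hdfs://master:9000/spark_output/spark_wordcount
# print the word counts
hdfs dfs -cat hdfs://master:9000/spark_output/spark_wordcount/part-*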