distributed-computing-assignements / 3-bigdata-4-bigdata-spark-batch / Commits

Commit c63cd1ff, authored Nov 14, 2018 by DANIEL DIAZ SANCHEZ
Upload New File
parent 26930664 · Pipeline #49 canceled with stages
Changes: 1 · Pipelines: 1

JavaWordCount.java  0 → 100644
package cdist;

import java.util.Arrays;
import java.util.Iterator;

import scala.Tuple2;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

/* This example has been taken from the O'Reilly examples */
public class JavaWordCount {
    public static void main(String[] args) throws Exception {
        // Set the default input and output files.
        // The input can be a file in HDFS -> hdfs://server:port/path,
        // a local file -> file:///path,
        // or a relative local path (e.g. "out") under the working directory.
        String inputFile = "file:///var/home/lab/asig/labgcd/workspace-cdist-spark-and-streaming/spark-aptel/in.txt";
        String outputFile = "out";

        // Optionally let the user override the input and output files
        // (the original check was `args.length > 2`, which required three
        // arguments to set two; `>= 2` accepts exactly two).
        if (args.length >= 2) {
            inputFile = args[0];
            outputFile = args[1];
        }

        // Create a Java Spark Context for an application named "wordcount" on a local cluster.
        // (To use an existing cluster, replace "local" with the cluster's master URL.)
        JavaSparkContext sc = new JavaSparkContext("local", "wordcount",
                System.getenv("SPARK_HOME"), System.getenv("JARS"));

        // Load the input data.
        // This creates an immutable distributed dataset (RDD) of strings, one element per line.
        JavaRDD<String> input = sc.textFile(inputFile);

        // Split each line into words.
        // flatMap maps each line to the words it contains and flattens the result,
        // producing a single sequence of words irrespective of their line.
        JavaRDD<String> words = input.flatMap(x -> Arrays.asList(x.split(" ")).iterator());

        // Transform into (word, count) pairs:
        // associate a 1 with each word, then reduce by key (the word),
        // summing all the 1s for each word.
        JavaPairRDD<String, Integer> counts = words
                .mapToPair(s -> new Tuple2<String, Integer>(s, 1))
                .reduceByKey((x, y) -> x + y);

        // Save the word counts back out to a text file; this action triggers evaluation.
        counts.saveAsTextFile(outputFile);
    }
}
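The flatMap → mapToPair → reduceByKey pipeline above can be hard to picture without a cluster at hand. As a rough analogy, the same word-count logic can be expressed on a plain in-memory list with Java Streams; `WordCountSketch` and its `count` method are hypothetical names introduced here for illustration, not part of the original assignment:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// A local, non-distributed sketch of the Spark job's logic:
// flatMap lines into words, then group by word and sum a 1 per occurrence
// (the Streams counterpart of mapToPair + reduceByKey).
public class WordCountSketch {
    public static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // flatMap: one stream of words, irrespective of their line
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // map each word to 1 and merge counts per key with Integer::sum
                .collect(Collectors.toMap(Function.identity(), w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("a b a", "b a")));
    }
}
```

The key difference is that in Spark the input is partitioned across the cluster and `reduceByKey` combines partial sums per partition before shuffling, whereas the stream here runs on a single machine.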