Publication date: 09/30/2025

Course: Spark Java, developing applications for Big Data

Practical course - 3 days - 21 hours - Ref. SPK
Price: 2010 € excl. tax





Often presented as the successor to Hadoop, Spark simplifies the programming of big data processing and can be used from Scala, Python or Java. This course teaches programmers how to process data streams in real time and how to run batch processing, from SQL to machine learning.


INTER
IN-HOUSE
CUSTOM

Practical course, in person or as a remote class
Available in English on request







Teaching objectives
At the end of the training, the participant will be able to:
Master the fundamental concepts of Spark
Develop applications with Spark Streaming
Set up a Spark cluster
Exploit data with Spark SQL
Take a first approach to machine learning

Intended audience
Project managers, data scientists, developers, architects.

Prerequisites
Good knowledge of Java. Knowledge of big data.

Practical details
Hands-on work
Practical application of the concepts covered in the course using the Java language.

Course schedule

1
Introducing Apache Spark

  • History of the framework.
  • The different language APIs of Spark (Scala, Python and Java).
  • Comparison with the Apache Hadoop environment.
  • The different Spark modules.
Hands-on work
Install and configure Spark. Run a first word-count example.

2
Programming with Resilient Distributed Datasets (RDD)

  • Introduction to RDDs.
  • Create, manipulate and reuse RDDs.
  • Accumulators and broadcast variables.
  • Use partitions.
Hands-on work
Manipulate various datasets with RDDs using the API provided by Spark.

3
Handling structured data with Spark SQL

  • SQL, DataFrames and Datasets.
  • The different types of data sources.
  • Interoperability with RDDs.
  • Spark SQL performance.
  • JDBC/ODBC server and Spark SQL CLI.
Hands-on work
Manipulate datasets with SQL queries. Connect to an external database via Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC).

4
Spark on a cluster

  • The different types of architecture: standalone, Apache Mesos or Hadoop YARN.
  • Set up a cluster in standalone mode.
  • Package an application with its dependencies.
  • Deploy applications with spark-submit.
  • Size a cluster.
Hands-on work
Setting up a Spark cluster.

5
Real-time analysis with Spark Streaming

  • Operating principle.
  • Introducing Discretized Streams (DStreams).
  • The different types of source.
  • Working with the API.
  • Comparison with Apache Storm.
Hands-on work
Consume logs with Spark Streaming.

6
Graph manipulation with GraphX

  • Introducing GraphX.
  • The different operations.
  • Create graphs.
  • Vertex and edge RDDs.
  • Presentation of different algorithms.
Hands-on work
Working with the GraphX API through various examples.

7
Machine learning with Spark

  • Introduction to machine learning.
  • The different classes of algorithms.
  • Introducing SparkML and MLlib.
  • Implementing different algorithms in MLlib.
Hands-on work
Using SparkML and MLlib.


Customer reviews
4.7 / 5
Customer reviews are based on end-of-course evaluations. The score is calculated from all evaluations within the past year. Only reviews with a textual comment are displayed.
RAPHAËL R.
08/10/25
4 / 5

Hello, I had difficulty collating the information from the exercises. I work slowly and perhaps I didn't have the required level.
CLEMENT L.
08/10/25
5 / 5

The course is very interesting and corresponds well to the announced plan. I would have liked to see a bit more about how to design a Spark architecture and how to define the number of cores and partitions depending on the problem.
MOUNIR B.
09/04/25
5 / 5

The trainer was a very good teacher and set a lot of practical exercises, which was very much appreciated.



Dates and locations

REMOTE CLASS
2026: 15 June, 14 Sep., 23 Nov.

PARIS LA DÉFENSE
2026: 15 June, 14 Sep., 23 Nov.