Course: Spark Java, developing applications for Big Data

Practical course - 3 days (21 hours) - Ref. SPK
Price: 2,360 CHF excl. tax


Often presented as the successor to Hadoop, Spark simplifies the programming of big data processing and can be used from Scala, Python or Java. This training course teaches programmers how to process data streams in real time and how to run batch processing, from SQL to machine learning.


Available as an inter-company, in-house or custom course.

In person or remote class
Available in English on request



Teaching objectives
At the end of the training, the participant will be able to:
Master the fundamental concepts of Spark
Develop applications with Spark Streaming
Set up a Spark cluster
Exploit data with Spark SQL
Take a first approach to machine learning

Intended audience
Project managers, data scientists, developers, architects.

Prerequisites
Good knowledge of Java. Knowledge of big data.

Practical details
Hands-on work
Practical application of the concepts covered in the course using the Java language.

Course schedule

1. Introducing Apache Spark

  • History of the framework.
  • The different Spark language APIs (Scala, Python and Java).
  • Comparison with the Apache Hadoop environment.
  • The different Spark modules.
Hands-on work
Install and configure Spark. Run a first word-count example.
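
The word-count exercise follows a split-into-words, pair-each-word, sum-by-key pattern; in Spark's Java API this is typically written with JavaRDD.flatMap, mapToPair and JavaPairRDD.reduceByKey. The sketch below is deliberately not Spark code: it is a plain-Java, single-machine version of the same logic using java.util.stream, so it runs without a Spark installation. It assumes whitespace-separated, case-insensitive words, and the names WordCountSketch and countWords are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical class: a local, non-distributed analogue of the Spark word count.
public class WordCountSketch {

    // Splits each line into lower-case words (Spark's "flatMap" step), then
    // groups identical words and counts them (Spark's "mapToPair" + "reduceByKey").
    public static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase(Locale.ROOT).split("\\s+")))
                .filter(word -> !word.isEmpty()) // drop empty tokens left by leading spaces
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "To see or not");
        // TreeMap keeps the words sorted: prints {be=2, not=2, or=2, see=1, to=3}
        System.out.println(countWords(lines));
    }
}
```

In Spark, the same three steps run in parallel across the partitions of an RDD instead of over a local List.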

2. Programming with Resilient Distributed Datasets (RDD)

  • Introduction to RDDs.
  • Create, manipulate and reuse RDDs.
  • Accumulators and broadcast variables.
  • Use partitions.
Hands-on work
Handling different datasets with RDDs using the Spark API.
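
Key-value RDDs are spread across partitions by a partitioner. Spark's default HashPartitioner sends a key to its hashCode modulo the number of partitions, adjusted to be non-negative, and sends null keys to partition 0. The plain-Java sketch below reproduces that rule outside Spark to show why equal keys always end up in the same partition; it does not use the Spark API, and the names PartitionSketch, partitionFor and partitionAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mirroring the assignment rule of Spark's HashPartitioner.
public class PartitionSketch {

    // Returns a partition index in [0, numPartitions): null keys go to 0,
    // other keys to a non-negative hashCode modulo numPartitions.
    public static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (key == null) {
            return 0;
        }
        // Math.floorMod never returns a negative value for a positive divisor,
        // unlike the % operator when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Distributes keys into numPartitions buckets, as a partitioner would.
    public static List<List<Object>> partitionAll(List<?> keys, int numPartitions) {
        List<List<Object>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Object key : keys) {
            buckets.get(partitionFor(key, numPartitions)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the value itself, so -7 and -1 both map to 2.
        // prints [[0, 3], [4, 10], [-7, -1]]
        System.out.println(partitionAll(List.of(-7, -1, 0, 3, 4, 10), 3));
    }
}
```

Because the partition depends only on the key's hash, operations such as reduceByKey can combine all values for a key within a single partition.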

3. Handling structured data with Spark SQL

  • SQL, DataFrames and Datasets.
  • The different types of data sources.
  • Interoperability with RDDs.
  • Spark SQL performance.
  • JDBC/ODBC server and Spark SQL CLI.
Hands-on work
Dataset manipulation via SQL queries. Connecting to an external database via Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC).

4. Spark on a cluster

  • The different types of architecture: standalone, Apache Mesos or Hadoop YARN.
  • Set up a cluster in standalone mode.
  • Package an application with its dependencies.
  • Deploy applications with spark-submit.
  • Size a cluster.
Hands-on work
Setting up a Spark cluster.

5. Real-time analysis with Spark Streaming

  • Operating principle.
  • Introducing Discretized Streams (DStreams).
  • The different types of sources.
  • Using the API.
  • Comparison with Apache Storm.
Hands-on work
Log consumption with Spark Streaming.

6. Graph manipulation with GraphX

  • Introducing GraphX.
  • The different operations.
  • Create graphs.
  • VertexRDD and EdgeRDD.
  • Overview of the available algorithms.
Hands-on work
Handling the GraphX API through various examples.

7. Machine learning with Spark

  • Introduction to machine learning.
  • The different classes of algorithms.
  • Introducing SparkML and MLlib.
  • Implementing different algorithms in MLlib.
Hands-on work
Using SparkML and MLlib.


Customer reviews
4.7 / 5
Customer reviews are based on end-of-course evaluations. The score is calculated from all evaluations within the past year. Only reviews with a textual comment are displayed.
RAPHAËL R.
08/10/25
4 / 5

Hello, I had difficulty collating the information from the exercises. I work slowly and perhaps I didn't have the required level.
CLEMENT L.
08/10/25
5 / 5

The course is very interesting and corresponds well to the plan announced. I would have liked to see a bit more about how to design a spark architecture, how to define the number of cores and partitions depending on the problem.
MOUNIR B.
09/04/25
5 / 5

The trainer was very educational and put a lot of practical exercises into practice, which was very much appreciated.



Publication date: 09/30/2025


Dates and locations

Remote class, in French:
  • From 15 to 17 June 2026 (guaranteed session)
  • From 14 to 16 September 2026
  • From 23 to 25 November 2026
