Course: Real-time Big Data processing with Spark and Storm

Practical course - 3 days (21 hours) - Ref. DSS
Price: 2010 € excl. tax





Big Data, known for processing massive volumes of data, now includes a real-time component thanks to tools such as Spark and Storm. You will discover the advantages of these tools, their real-time distributed computing systems and the notion of streaming Big Data.



Practical course in person or remote class
Available in English on request







Teaching objectives
At the end of the training, the participant will be able to:
Understand the fundamentals of real-time big data application development
Evaluate Spark and Storm
Apply the Storm and Spark real-time distributed computing systems
Process large quantities of data in real time

Intended audience
Designers, developers, architects.

Prerequisites
Good knowledge of software development. An understanding of big data issues is a plus.

Course schedule

1
Introduction to real-time architecture

  • Real-time processing.
  • Lambda architectures.
  • Kappa architectures.
  • SMACK architectures.
Hands-on work
Study of the implementation of a Kappa architecture for Spark and Storm.

2
Kafka architecture

  • Overview of Kafka producers, brokers and consumers.
  • Kafka's log files.
  • Avro schemas. Using ZooKeeper.
Hands-on work
Study of Kafka configuration in a Kappa architecture.

3
Apache Storm architecture

  • Definition of the development environment.
  • Creation of Storm-based projects.
  • Definition of Storm components (Spout and Bolt).
  • Storm flow definition.
  • Data model (key, value).
  • Nimbus and ZooKeeper roles.
Case study
Kappa architecture implementation study for Storm.

4
Handling Storm messages

  • Service programming with Clojure, Java, Python.
  • Message life cycle.
  • The Storm API for defining reliability.
  • Reliability implementation strategy for a Big Data application.
Hands-on work
Implementation of a real-time social network processing project in the Kappa architecture.

5
Apache Spark architecture

  • The different Spark language APIs (Scala, Python, R and Java).
  • Comparison with the Storm environment.
  • The different Spark modules.
  • The different deployment modes: Standalone, Apache Mesos or Hadoop YARN.
Hands-on work
Study of the implementation of the SMACK architecture for Spark.

6
Real time with Spark Streaming

  • Introduction to Resilient Distributed Datasets (RDDs).
  • Create, manipulate and reuse RDDs.
  • Accumulators and broadcast variables.
  • Operating principle of Spark Streaming.
  • The different types of data sources.
  • Comparison with Apache Storm.
Hands-on work
Implementation of a real-time social network processing project.

7
Other market players

  • Comparison of all streaming tools in the ecosystem (Storm, Spark Streaming, Flink, Samza).
  • Focus on Samza architecture.
Hands-on work
Study of the implementation of Kappa architecture with Samza.


Dates and locations

Last places available
Guaranteed date, in person or remotely
Guaranteed session
From 21 to 23 September 2026
FR
Remote class