Publication date : 09/29/2025

Course : Big Data - Python for data analysis

Practical course - 3d - 21h00 - Ref. PBD
Price : 2010 € E.T.





The Python language provides a scientific ecosystem for statistical processing: from the construction of analysis models to their evaluation and representation. This training course will enable you to analyze data from a variety of sources using Python libraries.



Practical course in person or remote class
Available in English on request







Teaching objectives
At the end of the training, the participant will be able to:
Understand the principles of statistical modeling
Use the main Python tools for data processing and analysis
Apply best practices for cleaning and preparing data before analysis
Choose between regression and classification according to the type of data
Set up a simple machine learning model
Extract data from a file

Intended audience
Python developers, data center managers, software developers, programmers, data analysts, data scientists.

Prerequisites
Mastery of Python programming. Basic knowledge of statistics or completion of the course "Statistics, mastering the fundamentals" (Ref. STA).

Practical details
Developing and running analyses with Python, using the Pandas, NumPy and SciPy modules.

Course schedule

1
Introduction to the scientific Python ecosystem

  • Overview of Python's scientific ecosystem: must-have libraries.
  • Know where to find new libraries and how to assess their long-term viability.
  • The main open source tools and software for data science.
Hands-on work
Installation of Python 3, Anaconda and Jupyter Notebook.

2
Working with data in Python

  • Python's scientific foundation: the SciPy Stack.
  • Best practices for getting your data science project off to a good start with Python.
  • Scientific file formats and libraries for manipulating them.
  • Pandas: analysis of tabular data (CSV files, Excel, etc.), statistics, pivots, filters, searches, etc.
  • NumPy: numerical calculation and linear algebra (vectors, matrices, images).
  • Data extraction, preparation, cleaning.
Hands-on work
Write Python scripts to work with data from files, to apply filters, formatting and cleaning processes.
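
As an illustration of this kind of exercise, here is a minimal sketch of loading, cleaning and filtering tabular data with Pandas. The CSV content, the column names and the `load_and_clean` helper are hypothetical and chosen only for the example; they are not taken from the course material.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
RAW_CSV = """city,temperature_c,humidity
Paris,21.5,40
Lyon,,55
 paris ,19.0,42
Nice,27.1,
Lyon,23.4,51
"""


def load_and_clean(source) -> pd.DataFrame:
    """Read a CSV source, normalize labels and fill missing numeric values."""
    df = pd.read_csv(source)
    # Normalize the text column: strip whitespace and unify capitalization.
    df["city"] = df["city"].str.strip().str.title()
    # Replace missing numeric values with the median of each column.
    numeric_cols = ["temperature_c", "humidity"]
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    return df


df = load_and_clean(io.StringIO(RAW_CSV))

# Filter: keep rows warmer than 20 degrees Celsius.
warm = df[df["temperature_c"] > 20.0]

# Summary statistic: mean temperature per city.
mean_temp = df.groupby("city")["temperature_c"].mean()
```

Filling gaps with the median is only one possible cleaning policy; dropping incomplete rows with `dropna()` may be more appropriate depending on the analysis.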

3
Introduction to modeling

  • Steps in building a model.
  • Supervised and unsupervised algorithms.
  • The choice between regression and classification.
Hands-on work
Integrate Python scripts into the installed environment to run analyses.

4
Model evaluation procedures

  • Resampling techniques for building training, validation and test sets.
  • Testing the representativeness of training data.
  • Performance measurements for predictive models.
  • Confusion matrix and cost matrix, ROC curve and AUC.
Hands-on work
Set up dataset sampling. Perform evaluation tests on several supplied models.
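
The evaluation notions listed above can be sketched with NumPy alone. The function names, the labels and the scores below are assumptions made for illustration; in practice a dedicated library would usually provide these metrics.

```python
import numpy as np


def train_val_test_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Return shuffled index arrays for the training, validation and test sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx


def confusion_matrix(y_true, y_pred):
    """2x2 matrix for binary labels: rows are true classes, columns are predictions."""
    matrix = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


# Hypothetical true labels and classifier scores.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])
y_pred = (scores >= 0.5).astype(int)  # decision threshold of 0.5

cm = confusion_matrix(y_true, y_pred)
auc = roc_auc(y_true, scores)
train_idx, val_idx, test_idx = train_val_test_split(10)
```

The pairwise formulation of the AUC is quadratic in the number of samples, which is acceptable for a small teaching example but not for large datasets.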

5
Supervised algorithms

  • The principle of univariate linear regression.
  • Multivariate regression.
  • Polynomial regression.
  • Regularized regression.
  • Naive Bayes.
  • Logistic regression.
Hands-on work
Implement regressions and classifications on several types of data.
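
A minimal sketch of three of the regression variants listed above (univariate linear, regularized and polynomial), written with NumPy only. The synthetic data, the `ridge_fit` helper and the value of `alpha` are assumptions chosen for the example; classification methods such as Naive Bayes and logistic regression are not covered here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise.
x = np.linspace(0.0, 10.0, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.shape)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimize ||X w - y||^2.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution (X^T X + alpha * I)^-1 X^T y, intercept unpenalized."""
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


# Regularized regression: the slope is shrunk toward zero.
w_ridge = ridge_fit(X, y, alpha=10.0)

# Polynomial regression of degree 2 on the same data.
poly_coeffs = np.polyfit(x, y, deg=2)
```

With `alpha=0.0`, `ridge_fit` reduces to ordinary least squares, which makes it easy to check the effect of the penalty by varying a single parameter.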

6
Unsupervised algorithms

  • Hierarchical clustering.
  • Non-hierarchical clustering.
  • Mixed approaches.
Hands-on work
Unsupervised clustering on multiple datasets.
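
The two families of clustering named above can be sketched with SciPy. The synthetic two-blob dataset is an assumption for illustration, Ward linkage and k-means are one possible choice for each family, and the `seed` argument of `kmeans2` requires a reasonably recent SciPy release.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)

# Two synthetic, well-separated 2-D blobs (an assumption for illustration).
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(20, 2))
points = np.vstack([blob_a, blob_b])

# Hierarchical (agglomerative) clustering with Ward linkage, cut into 2 clusters.
tree = linkage(points, method="ward")
hier_labels = fcluster(tree, t=2, criterion="maxclust")

# Non-hierarchical clustering: k-means with k = 2 and k-means++ initialization.
centroids, kmeans_labels = kmeans2(points, 2, minit="++", seed=0)
```

On data this clearly separated both methods should recover the same two groups; on real datasets the results can differ, and the number of clusters has to be chosen rather than assumed.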


Customer reviews
4.6 / 5
Customer reviews are based on end-of-course evaluations. The score is calculated from all evaluations within the past year. Only reviews with a textual comment are displayed.
CAMILLE L.
03/11/25
5 / 5

The content was very good, the pace was fast and the speaker was educational and funny.
NICOLAS M.
03/11/25
4 / 5

Interesting and passionate speaker, who took the time to explain and open up the subject. A great time.
GAËL R.
03/11/25
4 / 5

I had misjudged the content of the training offered by my employer following my request to progress in Python. According to the title, I was expecting more of a Data Analyst than a Data Scientist, using Python to load and transform data (ETL). I probably wouldn't be able to use the statistics part, but it was very interesting, and it seems to me that it would be possible to put together an 'operational' package for this ETL part in 3 days.



Dates and locations

REMOTE CLASS
2026 : 22 June, 21 Sep., 30 Nov.

PARIS LA DÉFENSE
2026 : 22 June, 21 Sep., 30 Nov.