Computer-based technology for Big Data

Goals

In many scientific fields, such as biology and the environmental sciences, the rapid evolution of scientific instruments and the intensive use of computer simulation have produced very large volumes of data in recent years. Scientific applications now face new problems in storing and exploiting these data. Much the same problem arises in managing the data collected by social networks, where the objective is commercial optimization.

This course introduces students to three major technologies emblematic of big-data processing (MongoDB, Hadoop and Spark), which are widely used by companies and institutions that have to manage such volumes of data.

Programme

3 sessions of 2 hours each on MongoDB, Hadoop and Spark.
4 sessions of practical work on MongoDB, Hadoop and Spark.
1 practical work session of 2 hours on Spark MLlib.

Study

14h

Course

Code

24_I_G_S09_MSO_INFO_3_1

Course leaders

Alexandre SAIDI
Daniel MULLER
Mohsen ARDABILIAN
Stéphane DERRODE

Language

French

Keywords

Big Data, NoSQL, MongoDB, Hadoop, Spark, Python
