Management and Analysis of Physics Datasets

Period: Second semester

Course unit contents: 

Part 1) Data Management
Introduction to data structures
Storage Models
Reliability
Authentication, Authorization
Local and Distributed File Systems
Databases

Part 2) Data Processing
Introduction to parallel processing
Distributed Computing Systems
Containerization
Hadoop as a paradigm for big data processing
Data processing with Spark
Data processing with Dask
Kafka as a distributed streaming platform

Planned learning activities and teaching methods: 

Classroom lectures for the introductory topics.
Hands-on sessions with live-coding examples run by the lecturers.
Exercises and examples to be done in the IT lab.

Last modified: Wednesday, 8 June 2022, 10:41