The motivation is to establish a foundation for analysing the data access behaviour of popular, modern data-intensive systems over modern persistent storage (e.g., NVMe SSDs). Unlike the memory access behaviour of such systems, their storage access behaviour has not been studied thoroughly. This type of analysis is necessary to understand the requirements of data-intensive systems and how to adapt their data access patterns to take better advantage of fast persistent storage. Therefore, the goal is to identify a set of popular, state-of-the-art data-intensive systems and to establish a methodology for conducting disk access trace analysis on them.

The intended learning outcomes for this research project are:

- Survey related work on data access analysis of data-intensive systems.
- Identify a set of popular, state-of-the-art data-intensive systems for disk access trace analysis over modern persistent storage.
- Identify a set of workloads that match these data-intensive systems.
- Investigate profiling tools to trace disk accesses.
- Design an experimental method and setup for analysing the disk access behaviour of these data-intensive systems over modern persistent storage.

The method will be experimental.