A machine learning pipeline can be defined as a sequence of predefined stages that automate the workflow of a machine learning task. Pipelines differ depending on the task at hand and are highly customisable. Although pipelines are inherently flexible, three stages are commonplace in virtually any pipeline: data preparation, model training, and model evaluation.
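As a minimal illustration of these three stages, the sketch below uses scikit-learn's Pipeline API; the dataset and model are arbitrary placeholders, not choices made by this project.

```python
from sklearn.datasets import load_iris              # placeholder dataset
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Data preparation and model training bundled into one pipeline object.
pipe = Pipeline([
    ("prepare", StandardScaler()),                   # data preparation
    ("train", LogisticRegression(max_iter=200)),     # model training
])
pipe.fit(X_train, y_train)

# Model evaluation.
print("test accuracy:", pipe.score(X_test, y_test))
```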

In what Hennessy and Patterson (2019) have termed a new golden age of computer architecture, one must acknowledge the end of Dennard scaling and Moore's law and instead turn to hardware accelerators for performance gains. This has certainly been acknowledged within the field of machine learning. One concrete example is Google's development of the Tensor Processing Unit (Jouppi et al. 2017). Another is the wide array of machine learning researchers and practitioners targeting GPUs for training complex models. This trend is intuitive: neural networks perform many operations that are independent in nature, which is precisely what the GPU is specialised for. But as models become more complex, so does the data that feeds them, both in its complexity and in its sheer volume. Since processing large volumes of data is also something the GPU is designed for, why is the preprocessing stage seemingly executed on the CPU? To be precise, the research question at hand is: "What trade-offs, if any, exist between executing data preprocessing on the GPU and on the CPU with regard to throughput, latency, and power consumption?"

Although there are libraries designed for moving this stage to the GPU as well, such as NVIDIA's DALI (sketched below), there seems to be little to no guidance as to when it is feasible, both computationally and monetarily, to do so.
The primary aim of this project is to assess whether, for commonplace arithmetic operations in the preprocessing step, there are benefits to offloading these to the GPU. Assessing this reveals the two-fold aim of the project. Following García-Martín et al. (2019), it is agreed that the machine learning community lacks investigations into energy consumption, and that improvements to metrics such as accuracy should take energy consumption into account. Energy consumption has been studied extensively, both within electrical engineering and within computer architecture, which raises the question of why the field of machine learning, which builds on top of these, lacks such investigations.
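To illustrate the kind of offloading DALI enables, the sketch below moves JPEG decoding, resizing, and normalisation to the GPU. The file path, image size, batch size, and normalisation constants are placeholders chosen only for the example.

```python
from nvidia.dali import pipeline_def, fn, types

@pipeline_def
def gpu_preprocessing():
    # Read encoded JPEGs from disk on the CPU; "mixed" decoding starts on
    # the CPU and finishes on the GPU.
    jpegs, labels = fn.readers.file(file_root="data/images")   # placeholder path
    images = fn.decoders.image(jpegs, device="mixed")
    # Resize and normalise on the GPU.
    images = fn.resize(images, resize_x=224, resize_y=224)
    images = fn.crop_mirror_normalize(
        images, dtype=types.FLOAT,
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    )
    return images, labels

pipe = gpu_preprocessing(batch_size=32, num_threads=4, device_id=0)
pipe.build()
images, labels = pipe.run()   # outputs already reside in GPU memory
```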

The intended learning outcomes of the project are primarily to become familiar with the field of energy estimation, both in machine learning and in computer architecture. This includes investigating metrics, pitfalls, and tools for precisely measuring power consumption, as well as the limitations of doing so. This naturally leads to another learning outcome: being able to reflect and reason on the differences in power consumption patterns between different types of hardware, namely the CPU and the GPU. This includes investigating software-level and hardware-level differences between the two.

More concrete and practical learning outcomes are using power measurement and profiling tools that report energy consumption. These tools are presented in the section below. This learning outcome will help narrow in on how different tools report energy consumption, and map out differences and similarities between them. Last but not least, an intended learning outcome is setting up the same testing setup in two very different environments.
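As a preview of the kind of measurement involved, and not of the specific tools chosen later, the sketch below polls GPU board power through NVML (via the pynvml bindings) and reads the cumulative CPU package energy counter exposed by Intel RAPL on Linux. The sysfs path and sampling interval are assumptions that depend on the platform; the Jetson, for instance, exposes power through different on-board sensors.

```python
import time
import pynvml  # NVIDIA Management Library bindings (nvidia-ml-py)

# Cumulative CPU package energy in microjoules (Intel RAPL, Linux; may need permissions).
RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_cpu_energy_uj():
    # The counter is cumulative (and wraps eventually); subtracting two
    # readings gives the energy consumed in between. Wraparound is ignored here.
    with open(RAPL_ENERGY) as f:
        return int(f.read())

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

cpu_start = read_cpu_energy_uj()
samples = []
for _ in range(10):                                   # sample GPU power for ~1 second
    samples.append(pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0)  # milliwatts -> watts
    time.sleep(0.1)
cpu_end = read_cpu_energy_uj()

print(f"avg GPU power: {sum(samples) / len(samples):.1f} W")
print(f"CPU package energy: {(cpu_end - cpu_start) / 1e6:.2f} J")
pynvml.nvmlShutdown()
```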

The project revolves around conducting experiments on DASYA's RebelRig and on an NVIDIA Jetson device, whose SoC integrates the CPU and GPU on a single chip. These two platforms represent a resource-rich and a resource-constrained environment, respectively, on which identical experiments will be conducted.

The experiments will consist of a subset of popular data preprocessing techniques, found by surveying popular machine learning libraries such as NumPy, TensorFlow, PyTorch and scikit-learn. By analysing the functions these libraries offer, the most commonplace arithmetic expressions will be extracted and implemented in Python and/or C++ in order to run them on both the CPU and the GPU.
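Which operations end up in the final subset is itself part of the project. As a hedged sketch, standardisation (subtract the mean, divide by the standard deviation) is one likely candidate; below it is implemented on the CPU with NumPy and on the GPU with PyTorch, since both libraries are already part of the survey. The array shape is a placeholder.

```python
import numpy as np
import torch

def standardise_cpu(x: np.ndarray) -> np.ndarray:
    # Column-wise standardisation on the CPU with NumPy.
    return (x - x.mean(axis=0)) / x.std(axis=0)

def standardise_gpu(x: np.ndarray) -> torch.Tensor:
    # Same operation on the GPU via PyTorch; the host-to-device copy is
    # included on purpose, since it is part of the cost of offloading.
    t = torch.from_numpy(x).to("cuda")
    # unbiased=False matches NumPy's population standard deviation.
    return (t - t.mean(dim=0)) / t.std(dim=0, unbiased=False)

x = np.random.rand(1_000_000, 32).astype(np.float32)  # placeholder data size
cpu_out = standardise_cpu(x)
if torch.cuda.is_available():
    gpu_out = standardise_gpu(x)
    torch.cuda.synchronize()  # make sure the GPU work has actually finished
```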

The results will be obtained by running these experiments and analysing the behaviour of the chosen functions as a function of data size.
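A minimal sketch of such a sweep is shown below, reusing the hypothetical standardise_cpu and standardise_gpu functions from the previous sketch; the data sizes and repetition count are placeholders, and the actual experiments will additionally record power as discussed above.

```python
import time
import numpy as np
import torch

def median_latency(fn, x, repeats=10, sync_cuda=False):
    # Median wall-clock latency of fn(x) over several repetitions.
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(x)
        if sync_cuda:
            torch.cuda.synchronize()  # wait for queued GPU kernels to finish
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

# Sweep over data sizes and report latency and throughput for CPU and GPU.
for n_rows in [10_000, 100_000, 1_000_000, 10_000_000]:
    x = np.random.rand(n_rows, 32).astype(np.float32)
    cpu_lat = median_latency(standardise_cpu, x)
    print(f"n={n_rows:>10}  CPU latency {cpu_lat * 1e3:8.2f} ms  "
          f"throughput {n_rows / cpu_lat:12.0f} rows/s")
    if torch.cuda.is_available():
        gpu_lat = median_latency(standardise_gpu, x, sync_cuda=True)
        print(f"{'':13}GPU latency {gpu_lat * 1e3:8.2f} ms  "
              f"throughput {n_rows / gpu_lat:12.0f} rows/s")
```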