While hardware accelerators such as GPUs are nowadays commonplace, it remains challenging to provide a programming model that allows domain experts to exploit their full potential. Previous work developed dOpenCL, which allows users to distribute OpenCL applications across a compute cluster, as well as CloudCL, which builds on dOpenCL to simplify application development and further ease cluster distribution and job scheduling. However, the relatively low bandwidth of the networks available on commodity cloud services, compared with the compute capability of modern GPU devices, often limits the scalability of these approaches. To achieve better scalability while maintaining an accessible programming model, this thesis integrates transparent I/O Link Compression into dOpenCL and CloudCL based on the 842 compression algorithm, employing both the dedicated NX842 hardware compressor available on the POWER architecture and an optimized 842 implementation for OpenCL. In the course of this work, a performance-oriented design for a compression system for OpenCL data transfers is presented, an implementation of this design is developed, and an evaluation of the system is given, demonstrating that the final system accelerates data transfers over high-performance networks across a range of tested set-ups and benchmarks.
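The core idea of transparent I/O link compression can be illustrated with a minimal sketch. This is not the thesis's implementation: Python's `zlib` stands in for the 842 codec (which has no standard Python binding), and the function names are hypothetical. The point is only the pipeline shape: compress on the sender side before the buffer enters the link, decompress on the receiver side before the buffer reaches the runtime, so that compressible data occupies less link bandwidth.

```python
import zlib

def send_compressed(payload: bytes) -> bytes:
    # Sender side: compress the buffer before it enters the I/O link.
    # zlib is a stand-in here for the 842 codec used in the thesis.
    return zlib.compress(payload)

def receive_compressed(wire_data: bytes) -> bytes:
    # Receiver side: decompress transparently before handing the
    # buffer back to the application/runtime.
    return zlib.decompress(wire_data)

# A compressible buffer (e.g. sparse numeric data) shrinks on the
# wire, raising the effective bandwidth of a slow link.
buffer = bytes(64 * 1024)          # 64 KiB of zeros, highly compressible
wire = send_compressed(buffer)
assert receive_compressed(wire) == buffer   # lossless round trip
assert len(wire) < len(buffer)              # fewer bytes on the wire
```

Transparency here means neither endpoint of the application changes: compression and decompression happen inside the transfer path, which is why the scheme can be added to dOpenCL/CloudCL without altering the programming model.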

Title Improved Data Transfer Efficiency for Scale-Out GPU Workloads using On-the-Fly I/O Link Compression
Author Joan Bruguera Micó
Series detail Master's thesis
Publisher Hasso-Plattner-Institut an der Universität Potsdam
Date 21 July 2020
Pages 82