BeeGFS - Architecture, Implementation Examples, and Future Development
Authors: Mr. Sven Breuner (ThinkParQ GmbH)
Abstract: BeeGFS, an open-source parallel file system, has been gaining acceptance in HPC and related areas such as deep learning, life sciences, and financial services, and is in use on several TOP500 systems. It is designed to combine high scalability, flexibility, and usability with support for mixed ISAs, including ARM and OpenPOWER. The BoF will provide an architectural overview, interesting implementation examples, features of the upcoming version 7, and further roadmap details. The organizers encourage participation from everyone interested in high-performance parallel storage and welcome feedback from users and solution providers.
Long Description: BeeGFS is one of the leading free, open-source file systems for HPC and beyond. Since its start in 2005 as an in-house development called FhGFS at the Fraunhofer Center for HPC in Germany, it has evolved into a parallel file system valued worldwide, with a steadily growing user base, thanks to its focus on three key aspects: maximum scalability, high flexibility, and easy usability. This combination of priorities, together with its support for mixed architectures, including ARM and OpenPOWER, has made BeeGFS widely popular and brought it to several TOP500 systems, ranging from the Tsubame 3.0 hybrid AI & HPC system at the Tokyo Institute of Technology to the VSC-3 system of the Vienna Scientific Cluster with several thousand nodes, among others.
The BeeGFS source code has been publicly available since 2016, in response to rising customer demand and an ever-expanding, increasingly active community. To move BeeGFS further toward an open model, ThinkParQ (the company behind BeeGFS) has joined the European Open File System (EOFS) community as of ISC’17 and will be supported by the EOFS community at the BoF. The BeeGFS BoF session at SC’17, one of the world’s biggest HPC events, therefore presents a fantastic opportunity to meet a diverse global community of customers, users, and solution providers with high demands for parallel storage, to present and discuss recent and upcoming feature developments, and to gather requirements for the future of BeeGFS. To keep users up to date, there will be a short presentation on the new BeeGFS version 7, which will be made available before SC’17. In a second presentation, we will give a first glimpse of BeeGFS’s new and long-awaited Windows client, before discussing the BeeGFS roadmap and gathering participants’ insights on the next steps to make BeeGFS even more useful for the global community.
Last year’s presentation about BeeGFS at the Meeting of the SIGHPC - Big Data Chapter at SC was very well attended, so we expect 80 to 100 attendees for the first BeeGFS BoF session at SC'17.
To get a quick survey of what BeeGFS users really want, and to help us prioritize, our questions to the audience will cover the following topics:
- System description: number of compute nodes, storage servers, capacity, performance.
- Most demanding I/O workloads: job sizes, run times, domain (e.g. AI, HEP, CFD, …), typical data set size per project.
- Current storage features that are most critical.
- Additional features desired (priority ordered).
Conference Presentation: pdf