Abstract

To serve our users' home directories to whatever machine they log onto, we use DRBD to replicate the underlying storage and NFS to make it available for mounting. We want to upgrade to using Ceph for our storage replication. The problem? Convincing ourselves that this is actually an upgrade.

DRBD replicates data across multiple nodes for reliability. In our current setup we create an XFS filesystem on top of a block device replicated by DRBD, mount the filesystem, and serve it across the network using the NFS protocol. Ceph is an open-source system for distributed storage, designed to scale to large data storage needs while remaining fast and flexible. It replicates data, monitors status, and self-heals. It already supports NFS exports in addition to a few other options for serving data over the network. Sounds smoother and easier than our existing strategy of NFS on top of XFS on top of DRBD, right?

We evaluated both systems on their ability to cope with typical workloads, their ability to recover from failures, and their ease of administration. We studied Ceph's native export options as well as its performance over NFSv4. The first few benchmarks we ran were surprising in several ways.

This talk is about the performance of distributed filesystems and some of the pitfalls where you can end up measuring everything except the data you actually wanted. I'll talk about what makes each of the two solutions easier or harder to use, about the mistakes and misconfigurations I made along the way, and about why it's not enough just to run a test; you need to think about what's happening underneath. A rough sketch of both setups follows the links below.

YouTube: https://www.youtube.com/watch?v=nv82Z-Ls0wg
LA Archive: http://mirror.linux.org.au/pub/everythingopen/2023/clarendon_auditorium/Thursday/Ceph_and_NFS_Comparing_Distributed_Filesystems.webm
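
As a rough illustration of the two setups the talk compares, here is a minimal sketch of the current DRBD-based stack. The resource name, device, and export path are placeholders, not the actual configuration described in the talk:

    # Bring up the DRBD resource (here called "home") and promote this node to primary.
    drbdadm up home
    drbdadm primary home

    # Create an XFS filesystem on the replicated block device and mount it.
    mkfs.xfs /dev/drbd0
    mount /dev/drbd0 /srv/home

    # Export the mount over NFS: add an entry to /etc/exports, then reload the export table.
    echo '/srv/home *(rw,sync,no_subtree_check)' >> /etc/exports
    exportfs -ra

On the Ceph side, an NFS export of CephFS can be created through the manager's nfs module; the exact flags vary by Ceph release, and the cluster ID, pseudo path, filesystem name, and hostname below are assumptions for illustration only:

    # Deploy an NFS-Ganesha service managed by Ceph, then export a CephFS volume through it.
    ceph nfs cluster create homenfs
    ceph nfs export create cephfs --cluster-id homenfs --pseudo-path /home --fsname cephfs

    # Clients mount the export like any other NFSv4 share.
    mount -t nfs4 ganesha-host:/home /mnt/home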