PVC Storage Cluster
In a "PVC-based cluster", the Ceph persistent data is stored on volumes requested from a storage class of your choice. This type of cluster is recommended in a cloud environment where volumes can be dynamically created and also in clusters where a local PV provisioner is available.
AWS Storage Example
In this example, the mon and OSD volumes are provisioned from the AWS gp2
storage class. This storage class can be replaced by any storage class that provides file
mode (for mons) and block
mode (for OSDs).
| apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
cephVersion:
image: quay.io/ceph/ceph:v17.2.6
dataDirHostPath: /var/lib/rook
mon:
count: 3
allowMultiplePerNode: false
volumeClaimTemplate:
spec:
storageClassName: gp2
resources:
requests:
storage: 10Gi
storage:
storageClassDeviceSets:
- name: set1
count: 3
portable: false
encrypted: false
volumeClaimTemplates:
- metadata:
name: data
spec:
resources:
requests:
storage: 10Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, gp2)
storageClassName: gp2
volumeMode: Block
accessModes:
- ReadWriteOnce
onlyApplyOSDPlacement: false
|
Local Storage Example
In the CRD specification below, 3 OSDs (having specific placement and resource values) and 3 mons with each using a 10Gi PVC, are created by Rook using the local-storage
storage class.
| apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
dataDirHostPath: /var/lib/rook
mon:
count: 3
allowMultiplePerNode: false
volumeClaimTemplate:
spec:
storageClassName: local-storage
resources:
requests:
storage: 10Gi
cephVersion:
image: quay.io/ceph/ceph:v17.2.6
allowUnsupported: false
dashboard:
enabled: true
network:
hostNetwork: false
storage:
storageClassDeviceSets:
- name: set1
count: 3
portable: false
resources:
limits:
cpu: "500m"
memory: "4Gi"
requests:
cpu: "500m"
memory: "4Gi"
placement:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: "rook.io/cluster"
operator: In
values:
- cluster1
topologyKey: "topology.kubernetes.io/zone"
volumeClaimTemplates:
- metadata:
name: data
spec:
resources:
requests:
storage: 10Gi
storageClassName: local-storage
volumeMode: Block
accessModes:
- ReadWriteOnce
|
PVC storage only for monitors
In the CRD specification below three monitors are created each using a 10Gi PVC created by Rook using the local-storage
storage class. Even while the mons consume PVCs, the OSDs in this example will still consume raw devices on the host.
| apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
cephVersion:
image: quay.io/ceph/ceph:v17.2.6
dataDirHostPath: /var/lib/rook
mon:
count: 3
allowMultiplePerNode: false
volumeClaimTemplate:
spec:
storageClassName: local-storage
resources:
requests:
storage: 10Gi
dashboard:
enabled: true
storage:
useAllNodes: true
useAllDevices: true
|
In the simplest case, Ceph OSD BlueStore consumes a single (primary) storage device. BlueStore is the engine used by the OSD to store data.
The storage device is normally used as a whole, occupying the full device that is managed directly by BlueStore. It is also possible to deploy BlueStore across additional devices such as a DB device. This device can be used for storing BlueStore’s internal metadata. BlueStore (or rather, the embedded RocksDB) will put as much metadata as it can on the DB device to improve performance. If the DB device fills up, metadata will spill back onto the primary device (where it would have been otherwise). Again, it is only helpful to provision a DB device if it is faster than the primary device.
You can have multiple volumeClaimTemplates
where each might either represent a device or a metadata device. An example of the storage
section when specifying the metadata device is:
| storage:
storageClassDeviceSets:
- name: set1
count: 3
portable: false
volumeClaimTemplates:
- metadata:
name: data
spec:
resources:
requests:
storage: 10Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, gp2)
storageClassName: gp2
volumeMode: Block
accessModes:
- ReadWriteOnce
- metadata:
name: metadata
spec:
resources:
requests:
# Find the right size https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
storage: 5Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, io1)
storageClassName: io1
volumeMode: Block
accessModes:
- ReadWriteOnce
|
Note
Note that Rook only supports three naming convention for a given template:
- "data": represents the main OSD block device, where your data is being stored.
- "metadata": represents the metadata (including block.db and block.wal) device used to store the Ceph Bluestore database for an OSD.
- "wal": represents the block.wal device used to store the Ceph Bluestore database for an OSD. If this device is set, "metadata" device will refer specifically to block.db device. It is recommended to use a faster storage class for the metadata or wal device, with a slower device for the data. Otherwise, having a separate metadata device will not improve the performance.
The bluestore partition has the following reference combinations supported by the ceph-volume utility:
| storage:
storageClassDeviceSets:
- name: set1
count: 3
portable: false
volumeClaimTemplates:
- metadata:
name: data
spec:
resources:
requests:
storage: 10Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, gp2)
storageClassName: gp2
volumeMode: Block
accessModes:
- ReadWriteOnce
|
- A "data" device and a "metadata" device.
| storage:
storageClassDeviceSets:
- name: set1
count: 3
portable: false
volumeClaimTemplates:
- metadata:
name: data
spec:
resources:
requests:
storage: 10Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, gp2)
storageClassName: gp2
volumeMode: Block
accessModes:
- ReadWriteOnce
- metadata:
name: metadata
spec:
resources:
requests:
# Find the right size https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
storage: 5Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, io1)
storageClassName: io1
volumeMode: Block
accessModes:
- ReadWriteOnce
|
- A "data" device and a "wal" device. A WAL device can be used for BlueStore’s internal journal or write-ahead log (block.wal), it is only useful to use a WAL device if the device is faster than the primary device (data device). There is no separate "metadata" device in this case, the data of main OSD block and block.db located in "data" device.
| storage:
storageClassDeviceSets:
- name: set1
count: 3
portable: false
volumeClaimTemplates:
- metadata:
name: data
spec:
resources:
requests:
storage: 10Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, gp2)
storageClassName: gp2
volumeMode: Block
accessModes:
- ReadWriteOnce
- metadata:
name: wal
spec:
resources:
requests:
# Find the right size https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
storage: 5Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, io1)
storageClassName: io1
volumeMode: Block
accessModes:
- ReadWriteOnce
|
- A "data" device, a "metadata" device and a "wal" device.
| storage:
storageClassDeviceSets:
- name: set1
count: 3
portable: false
volumeClaimTemplates:
- metadata:
name: data
spec:
resources:
requests:
storage: 10Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, gp2)
storageClassName: gp2
volumeMode: Block
accessModes:
- ReadWriteOnce
- metadata:
name: metadata
spec:
resources:
requests:
# Find the right size https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
storage: 5Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, io1)
storageClassName: io1
volumeMode: Block
accessModes:
- ReadWriteOnce
- metadata:
name: wal
spec:
resources:
requests:
# Find the right size https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
storage: 5Gi
# IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, io1)
storageClassName: io1
volumeMode: Block
accessModes:
- ReadWriteOnce
|
To determine the size of the metadata block follow the official Ceph sizing guide.
With the present configuration, each OSD will have its main block allocated a 10GB device as well a 5GB device to act as a bluestore database.