NVMe-oF Block Storage¶
This feature is experimental
NVMe over Fabrics (NVMe-oF) allows RBD volumes to be exposed and accessed via the NVMe/TCP protocol. This enables both Kubernetes pods within the cluster and external clients outside the cluster to connect to Ceph block storage using standard NVMe-oF initiators, providing high-performance block storage access over the network.
Goals¶
The NVMe-oF integration in Rook serves two primary purposes:
- External Client Access: Rook serves as a backend for external clients outside the cluster, enabling non-Kubernetes workloads to access Ceph block storage through standard NVMe-oF initiators. This allows organizations to leverage their Ceph storage infrastructure for both containerized and traditional workloads.
- In-Cluster Consumption: Pods inside the Kubernetes cluster can consume storage via the NVMe-oF protocol, providing an alternative to traditional RBD mounts with potential performance benefits for certain workloads.
Both use cases are supported, allowing you to choose the appropriate access method based on your specific requirements and deployment scenarios.
For more background and design details, see the NVMe-oF gateway design doc. For the Ceph-CSI NVMe-oF design proposal, see the ceph-csi NVMe-oF proposal.
Prerequisites¶
This guide assumes a Rook cluster as explained in the Quickstart Guide.
Requirements¶
- Ceph Version: Ceph v20 (Tentacle) or later
- Disable the Ceph CSI operator: NVMe-oF support is still being added to the Ceph CSI operator, so the CSI operator must currently be disabled to test NVMe-oF. In operator.yaml, set `ROOK_USE_CSI_OPERATOR: "false"`.
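If you prefer not to edit operator.yaml directly, a patch along these lines should also work. This is a sketch that assumes the setting is read from the rook-ceph-operator-config ConfigMap defined in the example operator.yaml:

```console
# Assumes ROOK_USE_CSI_OPERATOR is read from the rook-ceph-operator-config
# ConfigMap in the rook-ceph namespace.
kubectl -n rook-ceph patch configmap rook-ceph-operator-config \
  --type merge -p '{"data":{"ROOK_USE_CSI_OPERATOR":"false"}}'

# Restart the operator if the change is not picked up automatically.
kubectl -n rook-ceph rollout restart deployment/rook-ceph-operator
```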
Step 1: Create a Ceph Block Pool¶
Before creating the NVMe-oF gateway, you need to create a CephBlockPool that will be used by the gateway:
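A minimal pool definition, following the standard Rook block pool examples (the name nvmeof-pool is only an illustration; reuse whatever pool name fits your cluster):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: nvmeof-pool   # illustrative name; referenced again by the gateway and StorageClass
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
```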
Create the pool:
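Assuming the manifest above was saved as pool.yaml:

```console
kubectl create -f pool.yaml
```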
Step 2: Create the NVMe-oF Gateway¶
The CephNVMeOFGateway CRD manages the NVMe-oF gateway infrastructure. The operator will automatically create the following resources:
- Service: One per gateway instance for service discovery
- Deployment: One per gateway instance running the NVMe-oF gateway daemon
Create the gateway:
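The authoritative spec comes from the CephNVMeOFGateway example and CRD reference in the Rook repository; the sketch below only shows the general shape of the resource, and the field names rbdPoolName and instances are assumptions:

```yaml
# Sketch only; check the CephNVMeOFGateway example in the Rook repository
# for the exact field names.
apiVersion: ceph.rook.io/v1
kind: CephNVMeOFGateway
metadata:
  name: my-nvmeof
  namespace: rook-ceph
spec:
  rbdPoolName: nvmeof-pool   # assumed field: the pool created in Step 1
  instances: 1               # assumed field: number of gateway instances
```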
Apply the gateway configuration:
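Assuming the gateway manifest above was saved as nvmeof-gateway.yaml:

```console
kubectl create -f nvmeof-gateway.yaml
```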
Verify the gateway is running:
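The exact pod names and labels come from the operator; a simple check is to filter on the gateway name:

```console
kubectl -n rook-ceph get pod | grep nvmeof
```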
Step 4: Deploy the NVMe-oF CSI Driver¶
The NVMe-oF CSI driver handles dynamic provisioning of volumes. Deploy the CSI provisioner with the NVMe-oF driver.
Deploy the NVMe-oF CSI provisioner from the example manifest:
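The manifest path below is a placeholder; use the NVMe-oF CSI provisioner example shipped with the Rook release you are running:

```console
# Placeholder path; substitute the actual provisioner example manifest.
kubectl create -f deploy/examples/<nvmeof-csi-provisioner>.yaml
```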
Verify the CSI provisioner pod is running:
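The pod name depends on the manifest; filtering on the driver name is usually enough (the gateway pods will also appear in the output):

```console
kubectl -n rook-ceph get pod | grep -i nvmeof
```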
Step 5: Create the StorageClass¶
Create a StorageClass that uses the NVMe-oF CSI driver. You'll need to gather the following information from the gateway:
- nvmeofGatewayAddress: A stable address for the gateway management API
- nvmeofGatewayPort: The gateway port (default: 5500)
- listeners: A JSON array containing listener information for each gateway instance
Discover the values to use in the StorageClass:
- `nvmeofGatewayAddress`: Use the gateway Service CLUSTER-IP.
- `listeners.address`: Use the gateway pod IP.
- `listeners.hostname`: Use the gateway deployment name.
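The following lookups are one way to surface these values, assuming the gateway resources created earlier carry nvmeof in their names:

```console
# Service CLUSTER-IP -> nvmeofGatewayAddress
kubectl -n rook-ceph get service | grep nvmeof

# Pod IP -> listeners.address
kubectl -n rook-ceph get pod -o wide | grep nvmeof

# Deployment name -> listeners.hostname
kubectl -n rook-ceph get deployment | grep nvmeof
```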
Create the StorageClass:
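The provisioner name and the complete parameter set must come from the NVMe-oF StorageClass example in the Rook repository; the sketch below only illustrates the parameters called out above plus subsystemNQN (referenced later in this guide), and the driver name and listener schema are assumptions. Additional parameters (for example clusterID, pool, and CSI secret references) are likely required.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-nvmeof
# Assumed driver name; use the name registered by the NVMe-oF CSI driver.
provisioner: rook-ceph.nvmeof.csi.ceph.com
parameters:
  nvmeofGatewayAddress: "<gateway-service-cluster-ip>"
  nvmeofGatewayPort: "5500"
  # One entry per gateway instance; schema assumed from the fields described above.
  listeners: '[{"address":"<gateway-pod-ip>","hostname":"<gateway-deployment-name>"}]'
  # Example NQN taken from the external client section below.
  subsystemNQN: nqn.2016-06.io.spdk:cnode1.rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
```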
Apply the StorageClass manifest:
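Assuming the manifest above was saved as storageclass.yaml:

```console
kubectl create -f storageclass.yaml
```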
Step 6: Create a PersistentVolumeClaim¶
Create a PVC using the NVMe-oF storage class:
Note
This PVC is created for CSI driver provisioning. The volume will be accessible via NVMe-oF protocol by both Kubernetes pods within the cluster and external clients outside the cluster using standard NVMe-oF initiators.
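A minimal claim against the storage class above (names are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvmeof-pvc
spec:
  storageClassName: rook-nvmeof   # must match the StorageClass created in Step 5
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```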
Create the PVC:
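Assuming the claim above was saved as pvc.yaml:

```console
kubectl create -f pvc.yaml
```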
Verify the PVC is bound:
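The STATUS column should report Bound once provisioning completes (the PVC name here matches the illustrative claim above):

```console
kubectl get pvc nvmeof-pvc
```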
Step 7: Deploy the NVMe-oF CSI Node Plugin¶
Deploy the NVMe-oF CSI node plugin:
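As with the provisioner, the path below is a placeholder for the node plugin example manifest shipped with your Rook release:

```console
# Placeholder path; substitute the actual node plugin example manifest.
kubectl create -f deploy/examples/<nvmeof-csi-nodeplugin>.yaml
```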
Verify the node plugin pod is running:
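The node plugin typically runs as a DaemonSet, so expect one pod per node; exact names depend on the manifest:

```console
kubectl -n rook-ceph get pod -o wide | grep -i nvmeof
```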
Step 8: Accessing Volumes via NVMe-oF¶
Once the PVC is created and bound, the volume is available via NVMe-oF. The volume can be accessed by both Kubernetes pods within the cluster and external clients outside the cluster.
Access from Kubernetes Pods¶
Kubernetes pods can consume NVMe-oF volumes by mounting the PVC directly. The CSI driver handles the NVMe-oF connection automatically when the pod mounts the volume.
Create a sample pod that mounts the PVC:
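A minimal example pod (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvmeof-test-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: nvmeof-pvc   # the PVC created in Step 6
```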
Verify the pod is running:
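Using the illustrative pod name above:

```console
kubectl get pod nvmeof-test-pod
```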
Access from External Clients¶
External clients outside the Kubernetes cluster can connect to the gateway using standard NVMe-oF procedures.
Prerequisites for External Clients¶
- NVMe-oF Initiator: The client must have the `nvme-tcp` kernel module loaded and `nvme-cli` installed
- Network Access: The client must be able to reach the gateway service IP and ports
Discover Subsystems¶
From the external client, discover available NVMe-oF subsystems:
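A standard nvme-cli discovery; 8009 is the conventional NVMe/TCP discovery port, so adjust it if the gateway service exposes a different one:

```console
nvme discover -t tcp -a <gateway-service-ip> -s 8009
```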
Replace <gateway-service-ip> with the gateway service ClusterIP or an accessible endpoint.
Connect to Subsystem¶
Connect to the discovered subsystem:
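A standard nvme-cli connect; 4420 is the conventional NVMe/TCP data port, so use whatever listener port the gateway advertises during discovery:

```console
nvme connect -t tcp -n <subsystem-nqn> -a <gateway-ip> -s 4420
```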
Replace:
- `<subsystem-nqn>` with the `subsystemNQN` value from your StorageClass (e.g., `nqn.2016-06.io.spdk:cnode1.rook-ceph`)
- `<gateway-ip>` with the gateway service IP or pod IP
Access the Volume¶
Once connected, the NVMe namespace will appear as a block device on the client:
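Standard tools list the new controller and namespace:

```console
nvme list
lsblk
```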
The device will typically appear as /dev/nvmeXnY where X is the controller number and Y is the namespace ID.
Format and Mount (Optional)¶
If you want to format and mount the device:
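Using the device name pattern described above:

```console
# WARNING: mkfs destroys any existing data; double-check the device name first.
mkfs.ext4 /dev/nvmeXnY
mkdir -p /mnt/nvmeof
mount /dev/nvmeXnY /mnt/nvmeof
```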
High Availability¶
For production deployments, configure multiple gateway instances for high availability:
- Increase Gateway Instances: Set `instances: 2` or higher in the `CephNVMeOFGateway` spec
- Update StorageClass Listeners: Add all gateway instance addresses and instance names to the `listeners` array
- Load Balancing: Each gateway instance has its own Service; list all of them to support multipath/HA
Example with multiple instances:
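Building on the sketch from Step 2 (the field names remain assumptions; check the CRD reference for the real spec):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephNVMeOFGateway
metadata:
  name: my-nvmeof
  namespace: rook-ceph
spec:
  rbdPoolName: nvmeof-pool   # assumed field, as in Step 2
  instances: 2               # two gateway instances for HA
```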
Then update the StorageClass listeners to include all gateway instances/services:
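Following the `listeners.address` and `listeners.hostname` fields described in Step 5, a two-instance array might look like the excerpt below; the exact schema should be taken from the StorageClass example:

```yaml
parameters:
  # One entry per gateway instance (schema assumed).
  listeners: '[{"address":"<gateway-0-pod-ip>","hostname":"<gateway-0-deployment-name>"},{"address":"<gateway-1-pod-ip>","hostname":"<gateway-1-deployment-name>"}]'
```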
Troubleshooting¶
Check Gateway Pod Logs¶
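Using the gateway deployment name discovered in Step 5:

```console
kubectl -n rook-ceph logs deployment/<gateway-deployment-name>
```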
Check CSI Provisioner Logs¶
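Pod and container names depend on the provisioner manifest; list the pods first, then fetch logs from all containers:

```console
kubectl -n rook-ceph get pod | grep -i nvmeof
kubectl -n rook-ceph logs <csi-nvmeof-provisioner-pod> --all-containers
```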
Verify Gateway Service¶
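Confirm the Service exists and its CLUSTER-IP matches the StorageClass `nvmeofGatewayAddress`:

```console
kubectl -n rook-ceph get service | grep nvmeof
kubectl -n rook-ceph describe service <gateway-service-name>
```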
Check PVC Events¶
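Provisioning failures usually show up in the claim's events (using the illustrative PVC name from Step 6):

```console
kubectl describe pvc nvmeof-pvc
```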
Verify Ceph CSI Config¶
Ensure the rook-ceph-csi-config ConfigMap exists and contains the cluster configuration:
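For example:

```console
kubectl -n rook-ceph get configmap rook-ceph-csi-config -o yaml
```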
Teardown¶
Warning
Deleting the PVC will also delete the underlying RBD image and NVMe namespace. Ensure you have backups if needed.
To clean up all the artifacts created:
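A rough teardown in reverse order of creation, using the illustrative names from this guide (the CSI manifest paths are placeholders):

```console
# Sample pod and PVC (deleting the PVC removes the RBD image and NVMe namespace)
kubectl delete pod nvmeof-test-pod
kubectl delete pvc nvmeof-pvc
kubectl delete storageclass rook-nvmeof

# CSI driver components (placeholder manifest paths)
kubectl delete -f deploy/examples/<nvmeof-csi-nodeplugin>.yaml
kubectl delete -f deploy/examples/<nvmeof-csi-provisioner>.yaml

# Gateway and pool
kubectl -n rook-ceph delete cephnvmeofgateway my-nvmeof
kubectl -n rook-ceph delete cephblockpool nvmeof-pool
```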