
NVMe-oF Block Storage

This feature is experimental

NVMe over Fabrics (NVMe-oF) allows RBD volumes to be exposed and accessed via the NVMe/TCP protocol. This enables both Kubernetes pods within the cluster and external clients outside the cluster to connect to Ceph block storage using standard NVMe-oF initiators, providing high-performance block storage access over the network.

Goals

The NVMe-oF integration in Rook serves two primary purposes:

  1. External Client Access: Rook serves as a backend for external clients outside the cluster, enabling non-Kubernetes workloads to access Ceph block storage through standard NVMe-oF initiators. This allows organizations to leverage their Ceph storage infrastructure for both containerized and traditional workloads.

  2. In-Cluster Consumption: Pods inside the Kubernetes cluster can consume storage via the NVMe-oF protocol, providing an alternative to traditional RBD mounts with potential performance benefits for certain workloads.

Both use cases are supported; choose the access method that fits your requirements and deployment scenario.

For more background and design details, see the NVMe-oF gateway design doc. For the Ceph-CSI NVMe-oF design proposal, see the ceph-csi NVMe-oF proposal.

Prerequisites

This guide assumes a Rook cluster as explained in the Quickstart Guide.

Requirements

  • Ceph Version: Ceph v20 (Tentacle) or later

Step 1: Create a Ceph Block Pool

Before creating the NVMe-oF gateway, you need to create a CephBlockPool that will be used by the gateway:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: nvmeof
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3

Create the pool:

kubectl create -f deploy/examples/csi/nvmeof/nvmeof-pool.yaml

Step 2: Create the NVMe-oF Gateway

The CephNVMeOFGateway CRD manages the NVMe-oF gateway infrastructure. The operator will automatically create the following resources:

  • Service: One per gateway instance for service discovery
  • Deployment: One per gateway instance running the NVMe-oF gateway daemon
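As a sanity check, the per-instance resource names appear to follow a `rook-ceph-nvmeof-<CR name>-<instance letter>` pattern (inferred from the example output later in this guide; treat the pattern as an assumption and verify against your cluster). A minimal sketch:

```shell
# Sketch of the assumed per-instance naming pattern
# "rook-ceph-nvmeof-<CR name>-<instance letter>".
# The pattern is inferred from example output, not guaranteed.
CR_NAME=nvmeof
INSTANCES=2
letters=(a b c d e f)
for ((i = 0; i < INSTANCES; i++)); do
  echo "rook-ceph-nvmeof-${CR_NAME}-${letters[i]}"
done
```

With `kubectl get svc,deploy -n rook-ceph` you can confirm the actual names the operator created.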

Create the gateway:

apiVersion: ceph.rook.io/v1
kind: CephNVMeOFGateway
metadata:
  name: nvmeof
  namespace: rook-ceph
spec:
  # Container image for the NVMe-oF gateway daemon
  image: quay.io/ceph/nvmeof:1.5
  # Pool name that will be used by the NVMe-oF gateway
  pool: nvmeof
  # ANA (Asymmetric Namespace Access) group name
  group: group-a
  # Number of gateway instances to run
  instances: 1
  hostNetwork: false

Apply the gateway configuration:

kubectl create -f deploy/examples/nvmeof-test.yaml

Verify the gateway is running:

kubectl get pod -n rook-ceph -l app=rook-ceph-nvmeof

Example Output

NAME                                         READY   STATUS    RESTARTS   AGE
rook-ceph-nvmeof-nvmeof-a-85844ff6b8-4r8gj   1/1     Running   0          91s

Step 3: Deploy the NVMe-oF CSI Driver via CSI Operator

The NVMe-oF CSI driver is deployed via the ceph-csi operator.

Apply the NVMe-oF Driver CR, which triggers creation of the Ceph-CSI NVMe-oF deployment and daemonset:

kubectl create -f deploy/examples/csi/nvmeof/driver.yaml

Verify the CSI operator created the controller and node plugins:

kubectl get pods -n rook-ceph | grep nvmeof

Example Output

rook-ceph.nvmeof.csi.ceph.com-ctrlplugin-d9d77fb7c-kkl28   5/5     Running   0          60s
rook-ceph.nvmeof.csi.ceph.com-nodeplugin-xvt5g             2/2     Running   0          60s

Step 4: Create the StorageClass

Create a StorageClass that uses the NVMe-oF CSI driver.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-nvmeof
parameters:
  clusterID: rook-ceph
  pool: nvmeof
  subsystemNQN: nqn.2016-06.io.spdk:cnode1.rook-ceph
  nvmeofGatewayAddress: "rook-ceph-nvmeof-nvmeof-a.rook-ceph.svc.cluster.local"
  nvmeofGatewayPort: "5500"
  listeners: |
    [
      {
        "hostname": "rook-ceph-nvmeof-nvmeof-a"
      }
    ]
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-modify-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-modify-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-expand-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-expand-secret-namespace: rook-ceph
  imageFormat: "2"
  imageFeatures: layering,deep-flatten,exclusive-lock,object-map,fast-diff
provisioner: rook-ceph.nvmeof.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true

Note

The provisioner name rook-ceph.nvmeof.csi.ceph.com is prefixed with the operator namespace.
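The `nvmeofGatewayAddress` above is the in-cluster DNS name of the gateway Service. As a sketch, it can be assembled from the CR name, instance letter, and namespace (the `rook-ceph-nvmeof-<CR name>-<instance letter>` Service name is assumed from the example output in Step 2; confirm with `kubectl get svc`):

```shell
# Build the gateway Service DNS name used for nvmeofGatewayAddress.
# The Service name pattern is an assumption; verify it in your cluster.
CR_NAME=nvmeof
INSTANCE=a
NAMESPACE=rook-ceph
GATEWAY_ADDR="rook-ceph-nvmeof-${CR_NAME}-${INSTANCE}.${NAMESPACE}.svc.cluster.local"
echo "$GATEWAY_ADDR"
```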

kubectl create -f deploy/examples/csi/nvmeof/storageclass.yaml

Step 5: Create a PersistentVolumeClaim

Create a PVC using the NVMe-oF storage class:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvmeof-external-volume
  namespace: default
spec:
  storageClassName: ceph-nvmeof
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi

Note

This PVC is created for CSI driver provisioning. The volume will be accessible via NVMe-oF protocol by both Kubernetes pods within the cluster and external clients outside the cluster using standard NVMe-oF initiators.

Create the PVC:

kubectl create -f deploy/examples/csi/nvmeof/pvc.yaml

Verify the PVC is bound:

kubectl get pvc nvmeof-external-volume

Example Output

NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nvmeof-external-volume   Bound    pvc-b4108580-5cfa-46d3-beff-320088a5bf3c   128Mi      RWO            ceph-nvmeof    20m

Step 6: Create a Pod

Create a pod that consumes the NVMe-oF volume:

kubectl create -f deploy/examples/csi/nvmeof/pod.yaml

Verify the pod is running:

kubectl get pods -n default nvmeof-test-pod

Example Output

NAME              READY   STATUS    RESTARTS   AGE
nvmeof-test-pod   1/1     Running   0          60s

Step 7: Accessing Volumes via NVMe-oF

Once the PVC is created and bound, the volume is available via NVMe-oF. The volume can be accessed by both Kubernetes pods within the cluster and external clients outside the cluster.

Access from External Clients

External clients outside the Kubernetes cluster can connect to the gateway using standard NVMe-oF procedures.

Prerequisites for External Clients

  • NVMe-oF Initiator: The client must have the nvme-tcp kernel module loaded and nvme-cli installed
  • Network Access: The client must be able to reach the gateway service IP and ports

Discover Subsystems

From the external client, discover available NVMe-oF subsystems:

nvme discover -t tcp -a <gateway-service-ip> -s 5500

Replace <gateway-service-ip> with the gateway service ClusterIP or an accessible endpoint.

Connect to Subsystem

Connect to the discovered subsystem:

nvme connect -t tcp -n <subsystem-nqn> -a <gateway-ip> -s 5500

Replace:

  • <subsystem-nqn> with the subsystemNQN value from your StorageClass (e.g., nqn.2016-06.io.spdk:cnode1.rook-ceph)
  • <gateway-ip> with the gateway service IP or pod IP
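As a sketch, the full connect invocation can be composed and reviewed before running it on the client (the IP below is a placeholder example, not a value from this guide):

```shell
# Dry run: compose the connect command, inspect it, then execute it
# manually once the values are confirmed. GATEWAY_IP is a placeholder.
SUBSYSTEM_NQN="nqn.2016-06.io.spdk:cnode1.rook-ceph"
GATEWAY_IP="192.0.2.10"   # placeholder; substitute your gateway Service IP
PORT=5500
CMD="nvme connect -t tcp -n ${SUBSYSTEM_NQN} -a ${GATEWAY_IP} -s ${PORT}"
echo "$CMD"
```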

Access the Volume

Once connected, the NVMe namespace will appear as a block device on the client:

lsblk | grep nvme

The device will typically appear as /dev/nvmeXnY where X is the controller number and Y is the namespace ID.
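For scripting on the client, the controller and namespace numbers can be split out of the device name with plain parameter expansion (a sketch; the device name below is an example):

```shell
# Split /dev/nvmeXnY into controller (X) and namespace (Y) numbers.
DEV=/dev/nvme1n1            # example device name
ctrl="${DEV#/dev/nvme}"     # strip the /dev/nvme prefix -> "1n1"
ctrl="${ctrl%%n*}"          # keep everything before the first "n" -> "1"
ns="${DEV##*n}"             # keep everything after the last "n" -> "1"
echo "controller=${ctrl} namespace=${ns}"
```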

Format and Mount (Optional)

If you want to format and mount the device:

# Format the device
sudo mkfs.ext4 /dev/nvmeXnY

# Mount the device
sudo mkdir /mnt/nvmeof
sudo mount /dev/nvmeXnY /mnt/nvmeof

High Availability

For production deployments, configure multiple gateway instances for high availability:

  1. Increase Gateway Instances: Set instances: 2 or higher in the CephNVMeOFGateway spec
  2. Update StorageClass Listeners: Add all gateway deployment hostnames to the listeners array
  3. Load Balancing: Each gateway instance has its own Service; list all of them to support multipath/HA

Example with multiple instances:

spec:
  instances: 2
  # ... other settings

Then update the StorageClass listeners to include all gateway hostnames:

listeners: |
  [
    {
      "hostname": "rook-ceph-nvmeof-nvmeof-a"
    },
    {
      "hostname": "rook-ceph-nvmeof-nvmeof-b"
    }
  ]
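When the instance count changes, the listeners JSON can be generated from the hostname list rather than edited by hand (a sketch; the hostnames follow the single-instance example above):

```shell
# Generate the StorageClass "listeners" JSON from gateway hostnames.
hosts=(rook-ceph-nvmeof-nvmeof-a rook-ceph-nvmeof-nvmeof-b)
json="["
for h in "${hosts[@]}"; do
  json+="{\"hostname\": \"${h}\"},"
done
json="${json%,}]"           # drop the trailing comma, close the array
echo "$json"
```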

Troubleshooting

Check Gateway Pod Logs

kubectl logs -n rook-ceph -l app=rook-ceph-nvmeof --tail=100

Check CSI Controller Plugin Logs

kubectl logs -n rook-ceph deploy/rook-ceph.nvmeof.csi.ceph.com-ctrlplugin --tail=100

Verify Gateway Service

kubectl describe service -n rook-ceph rook-ceph-nvmeof-nvmeof-a

Check PVC Events

kubectl describe pvc nvmeof-external-volume

Verify Ceph CSI Config

Ensure the rook-ceph-csi-config ConfigMap exists and contains the cluster configuration:

kubectl get configmap -n rook-ceph rook-ceph-csi-config -o yaml

Teardown

Warning

Deleting the PVC will also delete the underlying RBD image and NVMe namespace. Ensure you have backups if needed.

To clean up all the artifacts created:

# Delete the test pod
kubectl delete -f deploy/examples/csi/nvmeof/pod.yaml

# Delete the PVC
kubectl delete pvc nvmeof-external-volume

# Delete the StorageClass
kubectl delete storageclass ceph-nvmeof

# Delete the NVMe-oF CSI driver
kubectl delete -f deploy/examples/csi/nvmeof/driver.yaml

# Delete the NVMe-oF gateway
kubectl delete -f deploy/examples/nvmeof-test.yaml

# Delete the block pool (optional)
kubectl delete -f deploy/examples/csi/nvmeof/nvmeof-pool.yaml
