ceph docs: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/
lifecycle: https://access.redhat.com/articles/1372203
upstream docs: http://docs.ceph.com/docs/luminous/
supported configurations: https://access.redhat.com/articles/1548993


Data Distribution and Organization in Ceph


Pools are logical partitions of the Ceph storage cluster, used to store objects under a common name tag. Each pool is assigned a specific number of hash buckets to group objects together for storage. These hash buckets are called Placement Groups (PGs). The number of placement groups configured when the pool is created can be increased dynamically, but can never be decreased. The CRUSH algorithm is used to select the OSDs hosting the data for a pool. Each pool is assigned a single CRUSH rule for its placement strategy. The CRUSH rule determines which OSDs store the data for all the pools assigned that rule. A pool name must be specified for each request, and permissions are granted for each Ceph user, either to all the pools in the cluster or to one or more specific pools. These permissions can be read, write, or execute.

Placement Groups

A Placement Group (PG) aggregates a series of objects into a hash bucket, or group, and is mapped to a set of OSDs. An object belongs to only one PG, and all objects that belong to the same PG hash to the same PG ID.

An object is mapped to its PG by hashing the object's name; the CRUSH algorithm then maps each PG to a set of OSDs. The placement strategy is known as the CRUSH placement rule. The placement rule identifies the failure domain that is to be chosen within the CRUSH topology to receive each replica or erasure code chunk.

When a client writes an object to a pool, it hashes the object's name to determine the object's placement group. The client then uses its copy of the cluster map, the placement group, and the pool's CRUSH placement rule to calculate which OSDs a copy of the object (or its erasure-coded chunks) should be written to.

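The layer of indirection provided by the placement group is important when new OSDs become available to the Ceph cluster. When OSDs are added to or removed from a cluster, placement groups are automatically rebalanced between operational OSDs. Data moves a whole placement group at a time, so Ceph does not need to track and remap the location of each object individually.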

Mapping an Object to Its Associated OSDs

A Ceph client gets the latest copy of the cluster map from a monitor. This tells it about all the MONs, OSDs, and MDSs in the cluster. This does not tell it where objects are stored; the client must use CRUSH to compute the location of objects it needs to access.

To calculate the Placement Group ID (PG ID) for an object, the Ceph client needs the object ID and the name of the object's storage pool. The client hashes the object ID, and then calculates the hash modulo the number of PGs to get a PG ID. It then looks up the numeric ID for the pool, based on the pool's name, and prepends the pool ID to the PG ID.
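The PG ID calculation described above can be sketched in Python. This is a simplified model, not Ceph's code: it assumes zlib.crc32 as a stand-in for the rjenkins hash Ceph actually uses, and the pool table and function names (pools, pg_id) are hypothetical. The modulo step uses Ceph's "stable mod", which behaves like hash % pg_num but moves fewer objects when pg_num is increased.

```python
import zlib


def ceph_stable_mod(x, b, bmask):
    """Reduce hash x to a PG number in [0, b); bmask = next power of two >= b, minus 1."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_id(object_name, pool_name, pools, pg_num):
    """Return a PG ID such as "1.2a" for object_name in pool_name (simplified model)."""
    # ASSUMPTION: crc32 stands in for Ceph's real object-name hash (rjenkins).
    h = zlib.crc32(object_name.encode())
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    pg = ceph_stable_mod(h, pg_num, bmask)
    pool_id = pools[pool_name]      # numeric pool ID looked up by pool name
    return f"{pool_id}.{pg:x}"      # pool ID prepended to the PG number (hex)


# Hypothetical pool table: pool name -> pool ID.
pools = {"myfirstpool": 1}
print(pg_id("obj1", "myfirstpool", pools, 50))
```

Because the result depends only on the object name, the pool, and pg_num, every client computes the same PG ID without asking a central lookup service.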

The CRUSH algorithm is then used to determine which OSDs are responsible for a placement group (the Acting Set). The OSDs in the Acting Set that are currently up are in the Up Set. The first OSD in the Up Set is the current primary OSD for the object's placement group, and all other OSDs in the Up Set are secondary OSDs.

The Ceph client can then directly work with the primary OSD in order to access the object.
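A small Python sketch of the Acting Set, Up Set, and primary relationship described above. The OSD IDs are hypothetical, and the model ignores refinements such as primary affinity and temporary PG mappings (pg_temp) that a real cluster may apply.

```python
def up_set(acting_set, up_osds):
    """OSDs from the Acting Set that are currently up, in Acting Set order."""
    return [osd for osd in acting_set if osd in up_osds]


def primary_osd(acting_set, up_osds):
    """The first OSD in the Up Set is the primary; None if no OSD is up."""
    members = up_set(acting_set, up_osds)
    return members[0] if members else None


# Hypothetical PG whose Acting Set is [3, 7, 1]; osd.3 is down.
acting = [3, 7, 1]
up = {1, 2, 7}
print(up_set(acting, up))       # [7, 1]
print(primary_osd(acting, up))  # 7
```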

Operating Ceph


deploy: /usr/share/ceph-ansible/site.yml
variables: /usr/share/ceph-ansible/group_vars

for i in a b c d e
do
  ssh-copy-id student@server$i
  ssh-copy-id ceph@server$i
done

sudo systemctl {start|stop} {ceph.target|ceph-mon.target|ceph-osd.target|ceph-mgr.target}


Deploy additional Ceph clusters

To deploy another cluster, copy /usr/share/ceph-ansible to another location and edit ansible.cfg in the copy so that it points to a separate hosts (inventory) file.

Creating replicated pools



ceph osd pool create pool-name pg-num [pgp-num] [replicated] [crush-ruleset-name] [expected-num-objects]

pool-name is the name of the new pool.

pg-num is the total number of Placement Groups (PGs) for this pool.

pgp-num is the effective number of placement groups for this pool. Normally, this should be equal to the total number of placement groups.

replicated specifies that this is a replicated pool, and is normally the default if not included in the command.

crush-ruleset-name is the name of the CRUSH rule set you want to use for this pool. The osd_pool_default_crush_replicated_ruleset configuration parameter sets the default value.

expected-num-objects is the expected number of objects in the pool. If you know this number in advance, Ceph can prepare a folder structure on the OSD's XFS file system at pool creation time. Otherwise, Ceph reorganizes this directory structure at runtime as the number of objects increases. This reorganization has a latency impact.
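Choosing pg-num is a common stumbling block. The upstream Ceph documentation suggests a rule of thumb of roughly 100 PGs per OSD: (number of OSDs x 100) / pool size, rounded up to a power of two, where pool size is the replica count (or k+m for an erasure-coded pool). The sketch below applies that formula for a single pool; treat the result as a starting point only, since the PG budget is shared by all pools on the same OSDs. The function name is hypothetical.

```python
def suggested_pg_num(num_osds, pool_size, target_pgs_per_osd=100):
    """Rule-of-thumb pg-num for one pool, rounded up to the next power of two."""
    raw = num_osds * target_pgs_per_osd / pool_size
    pg_num = 1
    while pg_num < raw:
        pg_num *= 2
    return pg_num


print(suggested_pg_num(9, 3))   # 9 * 100 / 3 = 300 -> 512
```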

ceph osd pool set pool-name size number-of-replicas
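Because a replicated pool stores size full copies of every object, the raw space consumed is the logical data size multiplied by size. A minimal arithmetic sketch (function names hypothetical; it ignores metadata and other overhead):

```python
def raw_bytes_used(data_bytes, size):
    """Raw OSD space consumed when data_bytes are written to a pool with `size` replicas."""
    return data_bytes * size


def usable_bytes(raw_bytes, size):
    """Logical data that fits in raw_bytes of OSD space at `size` replicas."""
    return raw_bytes // size


TiB = 2**40
print(raw_bytes_used(1 * TiB, 3) // TiB)   # 3
print(usable_bytes(30 * TiB, 3) // TiB)    # 10
```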

Manage Ceph

Set pool type

After creating a pool, administrators must explicitly indicate the type of Ceph applications that will be able to use it.

ceph osd pool application enable pool-name app

Where app is:

cephfs for the Ceph File System.
rbd for the Ceph Block Device.
rgw for the Ceph Object Gateway. 

ceph osd pool application enable myfirstpool rbd

List pools

ceph osd lspools
ceph osd pool ls detail


ceph df
ceph osd pool stats

The ceph df command displays pool usage statistics. The ceph osd df command displays disk usage statistics on OSDs.

Pool quotas

ceph osd pool set-quota pool-name max_objects obj-count max_bytes bytes

For example, to limit the myfirstpool pool to 1000 objects, use the following command:

$ ceph osd pool set-quota myfirstpool max_objects 1000

You can remove a quota by setting its value to 0. Remember that you can review the usage statistics of the pools with the ceph df command.
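The quota rules above can be summarized with a small model: each limit is checked independently, and a value of 0 disables that limit. When a pool reaches a quota, Ceph flags it as full and blocks further writes. This is a simplified sketch for reasoning about quotas, not Ceph's implementation; the function name is hypothetical.

```python
def quota_reached(num_objects, num_bytes, max_objects=0, max_bytes=0):
    """True if the pool has hit either quota; a limit of 0 means "no quota"."""
    if max_objects > 0 and num_objects >= max_objects:
        return True
    if max_bytes > 0 and num_bytes >= max_bytes:
        return True
    return False


# myfirstpool with max_objects 1000, as in the example above.
print(quota_reached(999, 0, max_objects=1000))    # False
print(quota_reached(1000, 0, max_objects=1000))   # True
```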


ceph osd pool rename current-name new-name


ceph osd pool mksnap pool-name snap-name
ceph osd pool rmsnap pool-name snap-name

rados -p pool-name -s snap-name get object-name file
rados -p pool-name rollback object-name snap-name
rados --id name -p pool -N namespace put name /local/file/path

Modify pools

Delete pool

In Red Hat Ceph Storage 3, for extra protection, Ceph sets the mon_allow_pool_delete configuration parameter to false. With this setting, even with the --yes-i-really-really-mean-it option, the ceph osd pool delete command does not delete the pool.

You can set the mon_allow_pool_delete parameter to true and restart the mon services to allow pool deletion.

Even with mon_allow_pool_delete set to true, you can still protect a pool from deletion by setting the nodelete option to true at the pool level:

ceph osd pool set pool-name nodelete true

Configuring Namespaces in a Pool

A namespace is a logical group of objects in a pool. A user's access to a pool can be limited so that the user can only store or retrieve objects in that namespace.

By default, each pool contains a namespace with an empty name, known as the default namespace. Consult the Ceph API documentation for instructions on how to pass both the pool and namespace parameters at http://docs.ceph.com/docs/luminous/rados/api/librados/

Erasure Coding Pools

When you store an object in an erasure coded pool, the object is divided into a number of data chunks, and the data chunks are stored in separate OSDs. In addition, a number of coding chunks are calculated based on the data chunks, and are also stored in different OSDs. The coding chunks can be used to reconstruct the object's data if an OSD containing a chunk fails.


ceph osd pool create pool-name pg-num [pgp-num] erasure [erasure-code-profile] [crush-ruleset-name] [expected_num_objects]
ceph osd pool create mysecondpool 50 50 erasure

Get erasure profile

ceph osd erasure-code-profile get default

Set new erasure profile

ceph osd erasure-code-profile set profile-name arguments...

For example, the following command creates a profile that divides objects into three data chunks (k=3) and protects them with two coding chunks (m=2). The crush-failure-domain=rack parameter ensures that Ceph will not store two chunks in the same rack.

ceph osd erasure-code-profile set myprofile k=3 m=2 crush-failure-domain=rack
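With k=3 and m=2, each object is stored as five chunks, so the pool uses (k+m)/k, about 1.67 times, the raw space of the data while tolerating the loss of any two chunks; a replicated pool with size 3 also tolerates two lost copies but uses 3 times the space. The sketch below computes that overhead and, to show how coding chunks allow reconstruction, uses a single XOR parity chunk (m=1). That XOR scheme is a deliberate simplification: Ceph's default jerasure plugin uses Reed-Solomon codes that support m greater than 1. All function names are hypothetical.

```python
def storage_overhead(k, m):
    """Raw bytes stored per byte of user data in a k+m erasure-coded pool."""
    return (k + m) / k


def split(data, k):
    """Split data into k equal-length data chunks, zero-padding the last one."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    return [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]


def xor_chunks(chunks):
    """Byte-wise XOR of equal-length chunks (used here as the single parity chunk)."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def rebuild(chunks, parity):
    """Recover one lost data chunk (marked None) from the surviving chunks and the parity."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("one XOR parity chunk can repair only one lost chunk")
    restored = list(chunks)
    if missing:
        survivors = [c for c in chunks if c is not None]
        restored[missing[0]] = xor_chunks(survivors + [parity])
    return restored


print(round(storage_overhead(3, 2), 2))   # 1.67
data = b"hello erasure coding"
chunks = split(data, 3)
parity = xor_chunks(chunks)
damaged = [chunks[0], None, chunks[2]]    # simulate losing the OSD holding chunk 1
print(b"".join(rebuild(damaged, parity)).rstrip(b"\0") == data)   # True
```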


ceph osd erasure-code-profile ls


ceph osd erasure-code-profile rm profile-name


ceph osd erasure-code-profile get profile-name


View config

ceph daemon type.id config show
ceph daemon osd.0 config show

ceph daemon type.id config get parameter
ceph daemon mds.servera config get mds_data
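ceph daemon talks to the daemon's admin socket, so it must run on the host where that daemon lives, and config show prints a JSON object that maps parameter names to values. A Python sketch that captures that output and compares two daemons' settings could look like the following; daemon_config and diff_config are hypothetical helpers, and the parsing assumes the JSON output format just described.

```python
import json
import subprocess


def daemon_config(daemon):
    """Return the running configuration of a local daemon, e.g. daemon_config("osd.0")."""
    result = subprocess.run(["ceph", "daemon", daemon, "config", "show"],
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def diff_config(a, b):
    """Parameters whose values differ between two config dumps, as {name: (a_value, b_value)}."""
    return {name: (a.get(name), b.get(name))
            for name in a.keys() | b.keys()
            if a.get(name) != b.get(name)}


# Usage on a host running both daemons:
# print(diff_config(daemon_config("osd.0"), daemon_config("osd.1")))
```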

Managing the Ceph Configuration File with Ansible

Starting and stopping Ceph daemons

Stop a specific daemon: systemctl stop ceph-$type@$id
Stop all OSD daemons: systemctl stop ceph-osd.target
Stop all daemons: systemctl stop ceph.target
Start a specific daemon: systemctl start ceph-$type@$id
Start all OSD daemons: systemctl start ceph-osd.target
Start all daemons: systemctl start ceph.target
Restart a specific daemon: systemctl restart ceph-$type@$id
Restart all OSD daemons: systemctl restart ceph-osd.target
Restart all daemons: systemctl restart ceph.target

Managing Ceph Authentication


Authentication with Cephx

In a Red Hat Ceph Storage cluster, users are used to authorize communication between clients, applications, and daemons in the cluster. To securely authenticate these users, Ceph uses the cephx authentication protocol, based on shared secret keys.

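Accounts used by Ceph daemons have names that match the associated daemon, such as osd.1 or mgr.serverc. Accounts used by client applications that use librados have names that start with client. When you pass a client account to the ceph command with --id, omit the client. prefix; for example, the following command runs as client.operator3: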

$ ceph --id operator3 osd lspools

Authorization with Cephx

When you create a new user account, you need to grant it permissions to control what that user is authorized to do in the Red Hat Ceph Storage cluster. Permissions within cephx are known as capabilities, and you grant them by daemon type (mon, osd, mgr, or mds).

Within cephx, and for each daemon type, several capabilities are available:

r grants read access. Each user account should have at least read access on the Monitors to be able to retrieve the CRUSH map.
w grants write access. Clients need write access to store and modify objects on OSDs. For Managers (MGRs), w grants the right to enable or disable modules.
x grants authorization to execute extended object classes. This allows clients to perform extra operations on objects, such as setting locks with rados lock get or listing RBD images with rbd list.
* grants full access.
class-read and class-write are subsets of x . You mostly use them on pools used for RBD. 

For example, the following command creates the formyapp1 user account, and gives it the capability to store and retrieve objects from any pool:

$ ceph auth get-or-create client.formyapp1 mon 'allow r' osd 'allow rw'

cephx offers predefined capability profiles. For example, the following command uses the rbd profile on both the Monitors and the OSDs:

$ ceph auth get-or-create client.forrbd mon 'profile rbd' osd 'profile rbd'

You should restrict user OSD permissions such that users can only access the pools they need. The following command creates the formyapp2 user and limits their access to read and write on the myapp pool:

$ ceph auth get-or-create client.formyapp2 mon 'allow r' osd 'allow rw pool=myapp'

cephx can also restrict access to objects by other means:

By object name prefix
  $ ceph auth get-or-create client.formyapp3 mon 'allow r' osd 'allow rw object_prefix pref'

By namespace
  $ ceph auth get-or-create client.designer mon 'allow r' osd 'allow rw namespace=photos'

By path (CephFS)
  $ ceph fs authorize cephfs client.webdesigner /webcontent rw
  $ ceph auth get client.webdesigner

By Monitor command. This method restricts administrators to a specific list of commands. The following example creates the operator1 user account and limits its access to two commands:
  $ ceph auth get-or-create client.operator1 mon 'allow r, allow command "auth get-or-create", allow command "auth list"'
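Capability strings for the restrictions above follow a regular pattern, so scripts often build them rather than typing them by hand. The sketch below composes an OSD capability from an optional pool, namespace, and object prefix in the same form as the examples above, and returns an argument list for ceph auth get-or-create; passing a list to subprocess.run avoids shell-quoting problems with the spaces inside capabilities. The helper names are hypothetical, and combining several restrictions in one capability assumes the pool, namespace, object_prefix order used here.

```python
def osd_cap(perm, pool=None, namespace=None, object_prefix=None):
    """Build an OSD capability such as "allow rw pool=myapp"."""
    parts = ["allow", perm]
    if pool:
        parts.append(f"pool={pool}")
    if namespace:
        parts.append(f"namespace={namespace}")
    if object_prefix:
        parts.append(f"object_prefix {object_prefix}")
    return " ".join(parts)


def get_or_create_argv(user, mon_cap, osd_cap_str):
    """Argument list for `ceph auth get-or-create client.<user> mon ... osd ...`."""
    return ["ceph", "auth", "get-or-create", f"client.{user}",
            "mon", mon_cap, "osd", osd_cap_str]


argv = get_or_create_argv("designer", "allow r", osd_cap("rw", namespace="photos"))
print(argv)
# On a node with an admin keyring, run it with: subprocess.run(argv, check=True)
```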

User Management

Creating New User Accounts

$ ceph auth get-or-create client.application1 mon 'allow r' osd 'allow rw' -o /etc/ceph/ceph.client.application1.keyring

Modifying User Capabilities

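$ ceph auth caps client.application1 mon 'allow r' osd 'allow rw pool=myapp'

ceph auth caps replaces all of the user's existing capabilities, so repeat any capabilities you want to keep. Remove a capability by setting it to an empty string:

$ ceph auth caps client.application1 osd ''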

Delete user

$ ceph auth del client.application1


create pool and user

Create a snapshot

Cloning rbd