Cloudian : Rubrik Archiving and Cloud Tiering



Background

One way to save on expanding a Rubrik cluster is to offload it. You can do that with an on-prem object store, or in the cloud. But if on-prem is still your way to go, you can consider an object store solution. You already know about OpenIO, but there are other players : Cloudian is one of them. Its architecture is web scale, the same way as Rubrik, Nutanix and most modern platforms these days. The good thing with Cloudian is cloud tiering : if some archives exceed a certain lifetime, you can decide to offload your Cloudian appliance and start moving them to the cloud of your choice. Very clever !

Cloudian

Firstly, you need a running Cloudian platform. It comes in 3 different flavours :
  • Turnkey solution : Cloudian provides the hardware and the software;
    • Good option when you need to be up and running quickly without any pre-existing kit.
  • Bare metal solution : you provide the hardware, Cloudian provides the software;
    • Good option if you want to reuse existing hardware !
  • Virtual appliance : Cloudian provides a virtual appliance to run on your own hypervisor;
    • This one is ideal for a proof of concept, not recommended for production.
If you are curious about how to install the PoC version, Niels Roetert from Cloudian made a great video about the entire process. However, you still need to contact Cloudian to get a trial license.



Once the product is installed, open the UI in a browser. I personally lost 30 minutes trying to figure out the default credentials. Don't waste your time : the admin password is "public".

The next step is to create an account so Rubrik can connect to Cloudian. S3 (or on-prem S3-like) credentials are not simply a username and a password : they work with keys.



Select the user who will access the platform, go to Security Credentials and create a key.


We will use the access key and the secret key on the Rubrik side.
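
Before heading over to Rubrik, you can sanity-check the new key pair with any S3 client. Here is a minimal sketch with the AWS CLI, assuming a hypothetical Cloudian S3 endpoint of http://s3.cloudian.lab (replace it with your own hostname) :

$ export AWS_ACCESS_KEY_ID=<access key>
$ export AWS_SECRET_ACCESS_KEY=<secret key>
$ aws s3 ls --endpoint-url http://s3.cloudian.lab

If the command returns the bucket list (even an empty one) without an error, the keys are working.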

Rubrik

Now, in the Rubrik UI, we need to add a new archival location from the gear menu (top right).



Quick Notes

- The host name is always given as a URL; do not forget http (or https if you have enabled SSL on Cloudian). A quick reachability check is sketched after this list;
- Bucket prefix is the name of the bucket;
- Number of Buckets defines how many buckets Rubrik will create on the Cloudian side. I set up one here for testing purposes, but in production you may want 5 or 10 buckets : this speeds up search operations. If you archive many objects for a long time, you will end up with billions of files in a single bucket, which drastically increases access time to the objects. There are sizing guides from Cloudian and/or Rubrik to help you configure this.
- Archival location is the name that will reference your Cloudian instance; this name will be used in the SLAs.
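
About that host name : before saving the archival location, it is worth confirming that the endpoint answers over the protocol you chose. A simple probe, still assuming the hypothetical hostname s3.cloudian.lab :

$ curl -sI http://s3.cloudian.lab

Any HTTP response (even a 403, since the request is unauthenticated) proves basic connectivity; a connection error means DNS, routing or the http/https choice needs to be fixed first.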

The last step is to create an SLA that actually uses this archival target. On an existing SLA, you can go to the second screen to configure archiving.

The "instant archive" option means archiving will start immediately, so you can see the immediate effect of that setting.

And then, after a couple of days, our objective is achieved : platform off-loading ! 



You should be able to see the bucket filling up on the Cloudian side.
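
If you prefer the command line over the CMC, the same AWS CLI and keys from earlier can report the content of a Rubrik bucket (in my lab Rubrik created rubrik-edge-0, as visible in the tiering log later in this post; yours will follow your bucket prefix) :

$ aws s3 ls s3://rubrik-edge-0 --recursive --summarize --human-readable --endpoint-url http://s3.cloudian.lab | tail -2

The --summarize flag appends the total object count and size at the end of the listing, which is all we keep with tail.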

Next, we need another offloading mechanism. Indeed, you are moving data out of Rubrik to Cloudian, but what about moving data out of Cloudian to the cloud ? This is what we call Cloud Tiering.


Configure Cloud Tiering


Disclaimer : at this stage, this configuration is not officially supported by Rubrik, and discussions are taking place to get a clear understanding of the pros and cons of such a setup. The main pain point is high-latency buckets with certain cloud providers. If you have a low-latency bucket, it is definitely workable !

Cloud tiering can be part of your strategy : data older than 6 months is moved away from your premises to the cloud. In this example, we tier to GCP. The tiering is configured at the bucket level on Cloudian :


In the above configuration, any object created more than 7 days ago (this is only for testing purposes; this is where you would configure 6 months) will be moved to a bucket on GCP with the configured name. Of course, you need to have the target bucket created upfront on GCP, and you will have to provide the credentials to access that bucket.
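
Creating and checking that target bucket takes two commands in the GCP Cloud Shell; the bucket name cloudian-data matches the examples later in this post, and the location is just an example :

$ gsutil mb -l europe-west1 gs://cloudian-data
$ gsutil ls -b gs://cloudian-data
gs://cloudian-data/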

Note about VPN and GCP : if you have a VPN established between GCP and your organization, the storage.googleapis.com endpoint used by default by Cloudian won't work. Instead, you need to use the restricted GCP endpoints : 199.36.153.8, 199.36.153.9, 199.36.153.10 and 199.36.153.11 (see Google's Private Google Access documentation for the reference).

I ended up with a hosts file similar to this one : 

# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
[...]
199.36.153.8 storage.googleapis.com
199.36.153.9 storage.googleapis.com
199.36.153.10 storage.googleapis.com
199.36.153.11 storage.googleapis.com
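
To verify the override from the Cloudian node :

# getent hosts storage.googleapis.com
# curl -sI https://storage.googleapis.com

The first command should return one of the 199.36.153.x addresses configured above; any HTTP response to the second proves the route through the VPN works.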

Once in place, you can check the progress in the Cloudian log file : 

# tail -f cloudian-tiering-request-info.log
2020-08-06 10:26:45,923|CREATEBUCKET|S3|rubrik-edge-0|null|cloudian-data|null|0|16|OK|618536
2020-08-06 10:26:46,269|HEADBUCKET|S3|rubrik-edge-0|null|cloudian-data|null|0|16|OK|346256
[...]
2020-08-06 11:39:23,140|MOVEOBJECT|S3|rubrik-edge-0/9e2a32fb-193d-4a49-9fd3-f80d4bada2d1|null|cloudian-data|null|48|562|OK|529405
2020-08-06 11:39:25,802|STREAMOBJECT|S3|rubrik-edge-0/9e2a32fb-193d-4a49-9fd3-f80d4bada2d1|null|cloudian-data|null|48|8192|OK|277369
2020-08-06 11:39:26,573|STREAMOBJECT|S3|rubrik-edge-0/9e2a32fb-193d-4a49-9fd3-f80d4bada2d1|null|cloudian-data|null|48|8192|OK|218392
2020-08-06 11:39:26,883|DELETEOBJECT|S3|rubrik-edge-0/9e2a32fb-193d-4a49-9fd3-f80d4bada2d1|null|cloudian-data|null|48|8192|OK|70766
2020-08-06 11:45:42,972|MOVEOBJECT|S3|rubrik-edge-0/ecc0a3c8-cb21-4441-91bd-a4715a2029eb|null|cloudian-data|null|48|562|OK|289530
2020-08-06 11:45:46,083|STREAMOBJECT|S3|rubrik-edge-0/ecc0a3c8-cb21-4441-91bd-a4715a2029eb|null|cloudian-data|null|48|8192|OK|273261
2020-08-06 11:45:46,845|STREAMOBJECT|S3|rubrik-edge-0/ecc0a3c8-cb21-4441-91bd-a4715a2029eb|null|cloudian-data|null|48|8192|OK|188577
2020-08-06 11:45:47,251|DELETEOBJECT|S3|rubrik-edge-0/ecc0a3c8-cb21-4441-91bd-a4715a2029eb|null|cloudian-data|null|48|8192|OK|181485
2020-08-06 11:51:50,872|MOVEOBJECT|S3|rubrik-edge-0/6eb11b09-134b-473d-8b8c-7dbb4da2351e|null|cloudian-data|null|48|562|OK|447644
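
The same log gives a rough count of objects already tiered, since each successful move logs one MOVEOBJECT line :

# grep -c MOVEOBJECT cloudian-tiering-request-info.log
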
Then, the content starts moving to GCP : 


Quick tip to get the size of a bucket from the GCP Cloudshell : 

$ gsutil du -s -h gs://cloudian-data
1.94 GiB gs://cloudian-data

[..]

$ gsutil du -s -h gs://cloudian-data
2.21 GiB gs://cloudian-data
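
To follow the growth without retyping the command, you can simply wrap it in watch (here every 10 minutes) :

$ watch -n 600 gsutil du -s -h gs://cloudian-data
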
Now, when restoring an element from Rubrik that has been archived to Cloudian and then tiered to the cloud, Rubrik will wait for Cloudian to retrieve the data from the cloud. This obviously increases the restore time by a significant amount. Remember, the goal here is to save local storage on both Rubrik and Cloudian; you should rarely perform this type of activity (restoring an archived item from the cloud). But now, you have an amazing setup !

This is the official Rubrik video on cloud archiving : 



I hope this helps; it took me a couple of days to understand and configure the whole chain.
