HPE Nimble Storage SFA. What's that about then? Part 5 – A closer look at the deduplication.
Nimble SFA – A closer look at deduplication.
Following on from part 4 of this blog series (creating a Veeam backup job and what to look out for), we will take a look at how the Nimble deduplication works and some of the gotchas that I came across whilst evaluating the unit.
Let's have a look at some of the high-level specifications again for the Nimble SFA devices. As I mentioned at the start of this blog series, I have an SF100 at my disposal.
Up-to-date information about the SFA technical specifications can be found here. An 18:1 reduction in data is being touted at the time of writing.
When sizing the arrays with Nimble, we have been starting with a conservative 5:1 reduction in data on the array. We will see my real-world figures later on in this blog post.
The Nimble SFA arrays offer global dedupe for any volumes created within a storage pool. The deduplication works at a 4KB block size and is inline, which means that the data is deduplicated as it hits the array rather than being stored in a landing zone and processed later. There is a great post here from Nimble Storage which covers in more detail how the deduplication works.
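To make the idea of inline, global, block-level dedupe a little more concrete, here is a minimal Python sketch. It is purely illustrative and is not how the Nimble array is implemented internally: the `ToyDedupeStore` class, its method names and the use of SHA-256 fingerprints are all my own assumptions. It simply shows 4KB blocks being fingerprinted as they are written, with each unique block stored once no matter which volume it belongs to.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB blocks, matching the block size quoted above


class ToyDedupeStore:
    """Hypothetical inline block store: each unique 4KB block is kept once."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes (stored once, shared by all volumes)
        self.volumes = {}  # volume name -> ordered list of fingerprints

    def write(self, volume, data):
        """Split data into 4KB blocks and deduplicate them as they arrive."""
        refs = self.volumes.setdefault(volume, [])
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            # Store the block only if this fingerprint has not been seen before.
            self.blocks.setdefault(fingerprint, block)
            refs.append(fingerprint)

    def read(self, volume):
        """Rebuild a volume's data from its list of block references."""
        return b"".join(self.blocks[fp] for fp in self.volumes[volume])

    def logical_bytes(self):
        """Bytes as seen by the hosts writing to the volumes."""
        return sum(len(self.blocks[fp]) for refs in self.volumes.values() for fp in refs)

    def physical_bytes(self):
        """Bytes actually held by the store after deduplication."""
        return sum(len(block) for block in self.blocks.values())


store = ToyDedupeStore()
full_backup = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
store.write("backup", full_backup)
store.write("backup-copy", full_backup)  # the same data landing on a second volume
print(store.logical_bytes(), store.physical_bytes())
```

The second write adds references but no new blocks, which is the behaviour that lets repeated full backups consume far less space on the array than they appear to consume on the host.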
How to make use of the deduplication.
Thin provisioning is the name of the game with the SFA. What does that mean? Let's take another look at the storage specifications above. The SF100 starts with a usable capacity of 16TB. If we were to thick provision that storage, we would allocate the 16TB of storage up front and that would be all that is available in terms of storage space. Thin provisioning allows you to over-allocate the 16TB of usable space. Let's assume that between compression and deduplication we can make space savings of 50%, or a 2:1 reduction in data. This would allow us to provision 32TB of thin provisioned storage from the 16TB available.
As I have mentioned in previous blog posts in the series, the Nimble SFA is block-based storage; it does not present a file system of any kind like traditional deduplication devices. The deduplication storage savings are seen on the array, i.e. they do not present themselves through a file system to an operating system. With this in mind, to make use of the space savings the SFA has to offer, the capacity has to be provisioned up front! With the conservative guideline of a 5:1 data reduction, we can thin provision the 16TB capacity above 5 times over, which gives us an effective capacity of 80TB. If we use Nimble's latest deduplication ratio suggestion of 18:1, that 16TB suddenly becomes 288TB of effective capacity. Yeah, things got interesting real quick with that!
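The sizing arithmetic above is simple enough to sketch in a few lines of Python. The function names here are my own and purely illustrative; the only inputs are the 16TB usable capacity and the 2:1, 5:1 and 18:1 ratios already mentioned.

```python
def effective_capacity_tb(usable_tb, reduction_ratio):
    """Thin provisioned capacity that a given data reduction ratio would support."""
    return usable_tb * reduction_ratio


def physical_needed_tb(logical_tb, reduction_ratio):
    """Usable capacity consumed on the array for a given amount of logical data."""
    return logical_tb / reduction_ratio


usable_tb = 16  # SF100 usable capacity
for ratio in (2, 5, 18):
    print(f"{ratio}:1 -> {effective_capacity_tb(usable_tb, ratio)}TB effective capacity")
```

The second function is the one to keep an eye on in practice: if the ratio you actually achieve is lower than the one you sized for, the usable capacity fills up before the thin provisioned volumes do.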
What you should not do
I am going to use an extreme example here of thin provisioning the storage. There is nothing to stop you doing this, but at some point the usable capacity will fill up. This is why it is important to know what kind of deduplication ratios to expect, and why in the screenshot above I have started with a conservative 5:1 reduction ratio.
I thought it would be a good idea to deploy 1PB of storage from the array and create a Scale Out Backup Repository in Veeam. The maximum volume size for a Nimble volume is 128TB. To achieve 1PB, I created nine 128TB volumes and added them to the Veeam Scale Out Backup Repository.
Wow, look at those space savings on the Nimble: 3606:1.
Of course, this is never going to be the case in practice. All the volumes are empty. Most of those savings came from the thin provisioned space rather than deduplication. There is nothing to deduplicate on the volumes other than the file system itself.
Real world space savings
Let's look at this properly. I have two volumes on the SF100. One is for Veeam backup data and the other is for Veeam backup copy job data. You probably would not have both sets of data on the same array in practice, but this is a test unit.
The backups have been running for about 2 months. The backup data is set to a 31-day retention, the job runs daily and a synthetic full backup is created weekly. There is a finite amount of data that will ever reside in this backup repository, so the deduplication can only ever reach a certain level.
The backup set contains 1.5TB of source VMs.
On disk after 31 days this consumes 5.39TB. Not too bad considering all of Veeam's own deduplication technology is disabled.
The backup copy job consists of 4.44TB of existing backup data from various source repositories.
I have opted to keep 7 restore points, a weekly full for 4 weeks and a month end for 12 months.
After 6 weeks, this consumes 14.2TB of space within the operating system.
Now remember, the space savings do not present themselves to the operating system, they can only be seen on the array. Within Windows, between the 2 repositories, there is 19.59TB of space consumed.
What does that look like on the SFA?
All of the data manages to fit into 2 TiB (2.2TB) of space. Impressive.
A breakdown of where the space savings have been made is shown below.
That’s a decent figure considering the backup chains are not that long. With longer-term retention, the space-saving figure is only going to go up. If space efficiency is the priority, then making use of a backup copy job with weekly, monthly and year-end backups is where the SFA will make the most sense.
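For a rough sense of the overall reduction those numbers imply, the figures quoted above can be plugged into a quick Python calculation. This is only a back-of-the-envelope sketch: it assumes the Windows figures are decimal terabytes (as the 2 TiB to 2.2TB conversion above implies) and that the array figure is exactly 2 TiB, which is almost certainly rounded. If the Windows figures are really binary units, the ratio would come out a little higher.

```python
TIB_IN_BYTES = 2**40   # binary tebibyte, the unit used for the array figure
TB_IN_BYTES = 10**12   # decimal terabyte, assumed for the Windows figures

backup_tb = 5.39        # backup repository, as seen in Windows
backup_copy_tb = 14.2   # backup copy repository, as seen in Windows
logical_tb = backup_tb + backup_copy_tb

# Convert the array's 2 TiB into decimal TB so both sides use the same unit.
physical_tb = 2 * TIB_IN_BYTES / TB_IN_BYTES

ratio = logical_tb / physical_tb
savings_pct = (1 - physical_tb / logical_tb) * 100

print(f"Logical: {logical_tb:.2f}TB, physical: {physical_tb:.2f}TB")
print(f"Overall reduction: {ratio:.1f}:1 ({savings_pct:.0f}% saved)")
```

Under those assumptions the overall reduction works out at roughly 8.9:1, comfortably above the conservative 5:1 sizing guideline.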
The SFA is, however, not a one-trick pony. Check out PART 6 – Putting your backup data to work, to see why the SFA shines as a primary backup target as well.
- PART 1 – What is the Nimble SFA?
- PART 2 – Initial setup of the SFA.
- PART 3 – Creating a Volume and presenting it to Veeam.
- PART 4 – Creating a Veeam backup job and what to look out for.
- PART 6 – Putting your backup data to work.