The Economic Value of Data Domain

White Paper

The Economic Value of Data Domain and Integrated Data Protection Appliances (IDPA)

Validating the Cost Efficiency of Dell EMC Backup Appliance Solutions

By Vinny Choinski, Senior ESG Lab Analyst; and Christophe Bertrand, Senior Analyst

June 2018


To out-innovate and out-pace their competition, organizations must be on a consistent path to keep their infrastructure modern. IT is under constant pressure to deliver optimized infrastructure for new business initiatives and supporting applications all while trying to contain or even reduce costs. In fact, respondents to ESG’s ongoing research consistently cite cost reduction as one of the top business drivers affecting their IT spending. When asked in a research survey how their organizations intended to contain costs in 2017, 27% of respondents said that they would be purchasing new technologies with better ROI.1

To drill down on one specific group, 35% of IT managers tasked with implementing data protection processes and technologies routinely cite cost as one of their top challenges (see Figure 1). This group also finds workload-centric issues especially problematic, particularly protecting virtualized environments and remote offices. However, the overall rankings of data protection challenges tell a bigger story. Costs and virtualization are top concerns, but the next three reported challenges are operational and functional in nature: performance, backup and recovery for distributed architectures, and database and application protection.2

These challenges are also consistent with the top mandates from IT leaders, which relate to better speed, agility, reliability, and cost control, all of which have been consistently reported in previous ESG data protection modernization reports. ESG analysis confirms that gaps between what implementers struggled with and what leaders mandated often resulted in little actual modernization or transformation: implementers were not able to act on leaders’ mandates due to technical impediments in their data protection environments.3

This paper discusses how Dell EMC Data Domain systems, and converged solutions built on the Data Domain architecture, such as the Dell EMC Integrated Data Protection Appliance (IDPA), help deliver the agility IT implementers need to transform their infrastructures to meet senior IT leader mandates, solve today’s data protection challenges, and, most importantly, reduce the cost of storing protection data. ESG’s findings are based on an audit and analysis of key performance indicators (KPIs): real-world data from deployed systems, including the original purchase price, environmental costs, capacity/utilization, and performance.

Data Domain and IDPA Architectural Benefits

In 2006, EMC made the decision to avoid bolting data deduplication code onto its existing data protection solutions, marking a strategic shift in its data protection strategy. This shift began by leveraging technology from two key acquisitions. The first purchase delivered source-based deduplication technology, and the second was Data Domain, with target dedupe technology. The technologies from these two acquisitions became fundamental components in EMC’s data protection solutions. Now under the Dell EMC brand, Data Domain systems, including the IDPA, are flash-enabled, fully integrated, purpose-built data protection appliances designed to reduce the amount of disk storage needed to retain and protect data. With both source- and target-based data deduplication natively integrated into the architecture, these systems make it possible to complete more backups in less time, provide faster and more reliable restores, and reduce the amount of storage capacity needed for data protection. A recent generation of Data Domain and IDPA systems added flash for metadata, enabling performance at scale and 20x faster instant access and restore of virtual machines directly from Data Domain and IDPA compared with previous generations.

Key Data Domain and IDPA architectural features include:

  • Stream-informed Segment Layout (SISL): SISL enables Data Domain and IDPA systems to perform 99% of deduplication processing in CPU and RAM. This means the systems do not rely on the number of disks to increase performance.
  • Variable-length Segmentation: Data Domain and IDPA systems leverage variable-length segmentation to break up streams based on the natural structure of the data, for optimal deduplication rates. This allows the system to determine whether each segment is unique before compressing and storing it.
  • Inline Deduplication: Data Domain and IDPA systems perform deduplication in CPU and memory as the backup stream is received by the system. This means only unique data is ever sent to and stored on disk. This eliminates the need for a disk staging area and compute resources for post-process deduplication.
  • Data Domain Boost Software: The Data Domain Boost (DD Boost) software distributes parts of the deduplication process to the application clients or the backup server. With DD Boost, only unique data traverses the connection between the backup server or clients and the Data Domain or IDPA system. This also frees up resources on the customer’s network and on the Data Domain or IDPA system for improved target-side deduplication performance.
  • Data Invulnerability Architecture: Data Domain and IDPA are built to ensure data can be reliably recovered. The Data Invulnerability Architecture provides inline write and read verification, which protects against and automatically recovers from data integrity issues during data ingest and retrieval. Continuous fault detection and self-healing ensure data remains recoverable throughout its lifecycle on the Data Domain and IDPA systems.
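The inline deduplication and variable-length segmentation features above share one core idea: each incoming segment is fingerprinted in memory, and only segments not already in the index are compressed and written. The Python sketch below is a toy model of that idea, not Data Domain's implementation. The class name `InlineDedupStore`, SHA-256 fingerprints, `zlib` compression, and the read-time hash check (a loose stand-in for the read verification described above) are all assumptions for illustration, and segmentation is taken as already done.

```python
import hashlib
import zlib
from dataclasses import dataclass, field


@dataclass
class InlineDedupStore:
    """Toy inline-deduplication target (hypothetical; not Data Domain's design).

    Segments are fingerprinted as they arrive; only unseen segments are
    compressed and "written", so duplicates never reach disk.
    """
    index: dict[str, bytes] = field(default_factory=dict)  # fingerprint -> compressed segment
    logical_bytes: int = 0    # bytes received from backup streams
    physical_bytes: int = 0   # compressed bytes actually stored

    def ingest(self, segments: list[bytes]) -> list[str]:
        """Ingest one backup stream (already segmented) and return its recipe:
        the ordered fingerprints needed to rebuild it."""
        recipe = []
        for seg in segments:
            fp = hashlib.sha256(seg).hexdigest()
            self.logical_bytes += len(seg)
            if fp not in self.index:            # unique: compress, then store
                stored = zlib.compress(seg)
                self.index[fp] = stored
                self.physical_bytes += len(stored)
            recipe.append(fp)                   # duplicate: keep a reference only
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Rebuild a stream, re-checking each segment's fingerprint on read."""
        out = []
        for fp in recipe:
            seg = zlib.decompress(self.index[fp])
            if hashlib.sha256(seg).hexdigest() != fp:
                raise ValueError(f"integrity check failed for segment {fp[:12]}")
            out.append(seg)
        return b"".join(out)


if __name__ == "__main__":
    store = InlineDedupStore()
    store.ingest([b"A" * 4096, b"B" * 4096])                         # first backup
    tuesday = store.ingest([b"A" * 4096, b"B" * 4096, b"C" * 4096])  # only C is new
    print(len(store.index), store.logical_bytes, store.physical_bytes)
    assert store.restore(tuesday) == b"A" * 4096 + b"B" * 4096 + b"C" * 4096
```

The second backup adds 12 KiB of logical data but only one new entry to the index, which is the mechanism behind the capacity savings discussed in the rest of this paper.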

Data Domain and IDPA Economic Benefits

Let’s take a closer look at how the Data Domain and IDPA architectures translate into economic benefits for business stakeholders. Data deduplication enables customers to store more data on the same amount of physical disk space. This reduces storage capacity requirements and drives down cost. Data deduplication implemented at the source or client side also helps with capacity savings, but with the added benefit of improving backup performance. With source-side deduplication, only unique data blocks are sent from the source to the target during the backup operation, which significantly reduces network traffic. This improved network efficiency allows for backup data growth using existing network infrastructure and possibly eliminates or postpones the need for expensive network upgrades. Obviously, the less data that needs to be transferred, the faster the backup performance. Shorter backup durations also allow customers to increase the frequency of backups, reducing the risk of data loss, which can be extremely costly to an organization. With DD Boost software, Data Domain and IDPA systems support both source- and target-side deduplication, giving customers the flexibility to deploy deduplication where it makes the most sense for their environments.
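Source-side deduplication changes what crosses the network rather than what is stored. The sketch below models the exchange under a simple assumed protocol: the source fingerprints its segments, asks the target which fingerprints it lacks, and sends only those segments. The `Target` class, its `missing` and `put` methods, and SHA-256 fingerprints are hypothetical; this paper does not describe DD Boost's actual wire protocol, and the sketch ignores the small cost of sending the fingerprints themselves.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


class Target:
    """Hypothetical protection-storage target holding unique segments by fingerprint."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}

    def missing(self, fps: list[str]) -> set[str]:
        # Target reports which fingerprints it does not already hold
        return {fp for fp in fps if fp not in self.segments}

    def put(self, batch: dict[str, bytes]) -> None:
        # Target stores only the newly received unique segments
        self.segments.update(batch)


def source_side_backup(segments: list[bytes], target: Target) -> int:
    """Back up one stream with source-side dedup; return payload bytes sent."""
    fps = [fingerprint(s) for s in segments]    # 1. fingerprint at the source
    need = target.missing(fps)                  # 2. ask the target what it lacks
    batch: dict[str, bytes] = {}
    for fp, seg in zip(fps, segments):
        if fp in need and fp not in batch:      # 3. send each unique segment once
            batch[fp] = seg
    target.put(batch)                           # 4. target stores the new segments
    return sum(len(s) for s in batch.values())


if __name__ == "__main__":
    t = Target()
    print(source_side_backup([b"x" * 1000, b"y" * 1000], t))               # full first backup
    print(source_side_backup([b"x" * 1000, b"y" * 1000, b"z" * 1000], t))  # only z is sent
```

The second call sends 1,000 bytes instead of 3,000, illustrating why source-side deduplication reduces network traffic as well as stored capacity.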

As shown in Figure 3, ESG’s analysis of real-world data, including hardware, software, power, cooling, and deduplication, demonstrates that Data Domain and IDPA systems are easily capable of serving storage to data protection environments for fractions of a penny per GB per month.

Note how the cost to protect ranges between 0.062 and 1.2 cents per gigabyte per month for the twelve customers that ESG examined. This relatively wide range is primarily due to differences in the deduplication capacity savings that Dell EMC customers achieve in production environments. The balance of this report takes a closer look at the cost to protect, with a focus on how deduplication savings can be magnified with an end-to-end combination of Dell EMC Data Protection Suite software and Data Domain and IDPA hardware.
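The cost-to-protect metric divides a system's total cost by the logical (pre-deduplication) data it protects, so the same hardware costs less per protected GB as the deduplication rate rises. A minimal sketch, assuming the system is filled to its physical capacity and that power and cooling are a flat monthly cost; the function name and every input figure are hypothetical and are not taken from ESG's audit.

```python
def cost_to_protect_cents(hardware_usd: float, software_usd: float,
                          power_cooling_usd_per_month: float,
                          physical_capacity_gb: float, dedup_ratio: float,
                          months: int) -> float:
    """Cents per protected (logical) GB per month over `months` of service.

    Assumes the system is full, so protected data = physical capacity x dedup ratio.
    """
    total_usd = hardware_usd + software_usd + power_cooling_usd_per_month * months
    protected_gb = physical_capacity_gb * dedup_ratio
    return total_usd * 100 / (protected_gb * months)


if __name__ == "__main__":
    # Same hypothetical 500 TB system over 36 months at two deduplication rates
    for ratio in (9, 41):
        cents = cost_to_protect_cents(200_000, 50_000, 1_000, 500_000, ratio, 36)
        print(f"{ratio}:1 -> {cents:.3f} cents per GB per month")
```

With these made-up inputs, the 9:1 case works out to about 0.177 cents and the 41:1 case to about 0.039 cents per GB per month. The two differ by exactly a factor of 41/9, which illustrates why the deduplication rate dominates the spread described above.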

Deduplication Efficiency Does Matter

ESG began its exploration of the economic value of Data Domain and IDPA deduplication capacity savings by auditing and analyzing call-home support data from 12 active Dell EMC customers. As shown in Figure 4, the customer environments ranged from approximately 270 TB to more than 90 PB of data to be protected. The customers spanned multiple industries, including technology, manufacturing, insurance, and health care. The selection of customers in different industries was designed to capture deduplication results across different types of data sets. The gray bars in Figure 4 show the amount of data being protected for each environment. The green bars show the amount of data stored after deduplication.

  • Deduplication: The highest deduplication rate observed in the analyzed data was 126:1. This result came from the customer environment identified by the first data point on the left side of Figure 4. The average deduplication rate for all analyzed customers was 41:1.
  • Protection: Even the customer with the lowest observed deduplication rate, approximately 9:1, could protect almost 92 PB of data using just 10 PB of capacity. This customer is identified by the last data point on the right side of Figure 4. The customer is part of the manufacturing industry and has a highly distributed environment.
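Figures 4 and 5 express the same effect two ways: as a ratio of protected data to stored data, and as the percentage of capacity saved. The conversion between them is simple arithmetic; the helper names below are illustrative only.

```python
def ratio_to_percent(ratio: float) -> float:
    """Deduplication ratio (protected:stored) -> percent of capacity saved."""
    return (1 - 1 / ratio) * 100


def percent_to_ratio(percent: float) -> float:
    """Percent of capacity saved -> deduplication ratio (protected:stored)."""
    return 1 / (1 - percent / 100)


if __name__ == "__main__":
    print(f"92 PB on 10 PB: {92 / 10:.1f}:1 = {ratio_to_percent(92 / 10):.1f}% saved")
    print(f"126:1 = {ratio_to_percent(126):.1f}% saved")
    print(f"90% = {percent_to_ratio(90):.0f}:1, 99% = {percent_to_ratio(99):.0f}:1")
```

The scale is nonlinear: moving from 90% to 99% savings means going from 10:1 to 100:1, a tenfold reduction in stored data. Note also that an average of per-customer percentages and an average of per-customer ratios are different statistics, so the two averages reported in this paper need not convert exactly into each other.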

Figure 5 provides another lens on the Data Domain and IDPA deduplication effect, showing the percentage of deduplication achieved for each of the same 12 customer environments. Deduplication rates range from 85% to 99%. Typically, the longer a Data Domain or IDPA solution has been receiving data in an environment, the higher its deduplication rate, because the system is more likely to have already seen the same data patterns and needs to store only the new, unique segments.

The variable-length segmentation feature of the Data Domain and IDPA architectures is a key component in achieving this level of deduplication. With variable-length segmentation, the Data Domain and IDPA solutions can more easily align with natural patterns, including database timestamp markers, in the data structures being sent to the device for protection. Variable-length segmentation produces significantly higher levels of deduplication than fixed-length architectures. It also enables more effective scalability within a single storage pool, which results in fewer devices to manage and lower operational costs.
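Why variable-length segmentation beats fixed-length segmentation is easiest to see when a few bytes are inserted near the start of a file: fixed-length boundaries all shift, so every downstream block looks new, while content-defined boundaries resynchronize shortly after the edit. The sketch below swaps in a gear-style rolling hash, as used in published content-defined chunking schemes such as FastCDC, purely for illustration; Data Domain's actual segmentation algorithm, window, and average segment size are not described in this paper, and the constants here (64-byte effective window, roughly 2 KiB average chunks, no minimum or maximum chunk size) are assumptions.

```python
import hashlib
import random

_rng = random.Random(7)
GEAR = [_rng.getrandbits(64) for _ in range(256)]   # per-byte random table (assumed)
MASK64 = (1 << 64) - 1
AVG_BITS = 11                                       # ~2**11 = 2 KiB average chunk (assumed)
CUT_MASK = ((1 << AVG_BITS) - 1) << (64 - AVG_BITS)  # test the top bits of the hash


def fixed_chunks(data: bytes, size: int = 2048) -> list[bytes]:
    """Fixed-length segmentation: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes) -> list[bytes]:
    """Content-defined segmentation: cut wherever the gear hash, which depends
    only on the last 64 bytes, has its top AVG_BITS bits equal to zero."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & MASK64
        if h & CUT_MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of the new stream's segments already stored from the old stream."""
    stored = {hashlib.sha256(c).digest() for c in old}
    return sum(hashlib.sha256(c).digest() in stored for c in new) / len(new)


if __name__ == "__main__":
    original = random.Random(42).randbytes(128 * 1024)
    edited = b"!" + original                       # one byte inserted at the front
    fixed = shared_fraction(fixed_chunks(original), fixed_chunks(edited))
    variable = shared_fraction(variable_chunks(original), variable_chunks(edited))
    print(f"fixed-length reuse: {fixed:.0%}, variable-length reuse: {variable:.0%}")
```

After the one-byte insertion, essentially none of the fixed-length blocks match, while nearly all variable-length segments do, because each cut point depends only on nearby content rather than on its offset in the stream.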

Another benefit of the high deduplication rates enabled by variable-length segmentation comes when data is replicated between Data Domain and IDPA devices: less stored data means less data to replicate. This means not only better replication performance, but also lower WAN bandwidth requirements for the replication process.

  • Deduplication Rates: The audited and analyzed deduplication rates ranged from 85% to 99%, with an average of 96%. This means that only a fraction (in the best cases, 1% or less) of the production data needed to be stored on the Data Domain and IDPA systems for protection.
  • Under 90%: Only one customer from the analyzed data had a deduplication rate under 90%. Again, this customer represents the highly distributed environment with distributed protection pools.
  • 90% and Above: The remainder of the clients analyzed achieved deduplication rates above 90%, and most customer environments were running a deduplication rate of 99%.

Speed and Scalability

Increasing storage capacity through deduplication does not always go hand in hand with high performance. Data Domain and IDPA address this challenge by leveraging two key performance elements: Stream-informed Segment Layout (SISL) and DD Boost software.

SISL is an architectural element of the Data Domain and IDPA platforms that enables 99% of the deduplication process to occur in CPU and RAM. This means the solution does not rely on a high disk spindle count for performance. As a result, smaller-footprint solutions can achieve the same performance as configurations with higher spindle counts. Also, because the Data Domain and IDPA systems use Intel CPUs, they benefit from the performance enhancements in each new generation of Intel processors.

DD Boost software also helps deliver performance by distributing parts of the deduplication process to the application clients or the backup server. With DD Boost, only unique data needs to be moved from the backup server or clients to the Data Domain or IDPA system, reducing the amount of data moved by up to 99%. ESG audited the resulting benefit, faster backup job completion: as shown in Figure 6, most customers routinely complete backup jobs in an hour or less.

  • Under 15 Minutes: For one customer in the technology industry, 76% of all backup jobs finished in less than 15 minutes.
  • Under 1 Hour: For eight of the twelve customers analyzed, 70% to 98% of all backup jobs completed in less than an hour (as shown in Figure 6).
  • Under 4 Hours: For one customer in the insurance industry, all backup jobs completed in less than four hours.
  • More than 4 Hours: For two of the largest environments (in terms of the number and size of applications), it took more than four hours to complete all backup jobs.
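To see why moving less data shortens backup jobs, consider raw transfer time over the network. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the link is the only bottleneck and runs at full line rate, and fingerprint exchange and protocol overhead are ignored. The 10 TB and 10 Gb/s figures are hypothetical, and the 99% reduction is the best case cited above, not a typical value.

```python
def transfer_seconds(data_tb: float, link_gbps: float, reduction_pct: float = 0.0) -> float:
    """Seconds to move `data_tb` terabytes (decimal) over a `link_gbps` link
    after `reduction_pct` percent of the data is eliminated at the source."""
    bits = data_tb * 1e12 * 8 * (1 - reduction_pct / 100)
    return bits / (link_gbps * 1e9)


if __name__ == "__main__":
    # Hypothetical 10 TB backup over a 10 Gb/s link
    for pct in (0, 90, 99):
        minutes = transfer_seconds(10, 10, pct) / 60
        print(f"{pct:>2}% reduction: {minutes:.1f} min")
```

With these inputs, about 133 minutes of raw transfer drops to roughly 1.3 minutes at a 99% reduction. Real backup jobs also spend time reading and processing source data, which this arithmetic ignores.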

Cost Savings Considerations

ESG completed its exploration of the economic value of Data Domain and IDPA deduplication capacity savings by auditing and analyzing call-home support data from over 15,000 Dell EMC backup appliances deployed worldwide. This sample was collected from systems that also recorded the backup software in use and the deduplication rates achieved. The data shows that Dell EMC backup appliance deduplication efficiency can be taken even further with an end-to-end Dell EMC data protection solution. As shown in Figure 7, pairing Dell EMC appliance hardware with Data Protection Suite software magnified deduplication savings and reduced three-year storage capacity costs by 57% to 81% compared with Dell EMC backup appliances running backup software from other vendors.
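When two configurations pay the same price per physical TB, their capacity costs differ only through the deduplication ratio, so a percentage saving maps directly to a ratio improvement. A minimal sketch under that simplifying assumption; the function names, the 5,000 TB workload, the $300 per TB price, and the 10:1 and 40:1 ratios are hypothetical and do not come from ESG's data set.

```python
def capacity_cost(logical_tb: float, dedup_ratio: float, usd_per_physical_tb: float) -> float:
    """Cost of the physical capacity needed to hold `logical_tb` of protected data."""
    return logical_tb / dedup_ratio * usd_per_physical_tb


def savings_pct(cost: float, baseline_cost: float) -> float:
    """Percent saved relative to a baseline configuration."""
    return (1 - cost / baseline_cost) * 100


def required_ratio_gain(savings: float) -> float:
    """Dedup-ratio improvement implied by a % saving when $/TB is unchanged."""
    return 1 / (1 - savings / 100)


if __name__ == "__main__":
    base = capacity_cost(5000, 10, 300)   # hypothetical baseline: 10:1
    e2e = capacity_cost(5000, 40, 300)    # hypothetical end-to-end: 40:1
    print(f"baseline ${base:,.0f}, end-to-end ${e2e:,.0f}, "
          f"savings {savings_pct(e2e, base):.0f}%")
    for s in (57, 81):
        print(f"{s}% savings -> {required_ratio_gain(s):.2f}x higher dedup ratio")
```

Under the same equal-price assumption, the 57% to 81% savings in Figure 7 would correspond to the end-to-end configuration storing roughly 2.3x to 5.3x less physical data. Actual pricing differences between configurations would change this mapping.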

The Data Domain and IDPA solutions deliver other economic benefits beyond the storage capacity cost savings shown in Figure 7. ESG’s analysis of real-world customer data also revealed the following benefits:

  • Network Bandwidth Reduction: Like most enterprise applications, backup and recovery solutions rely on network resources to function. This means both LAN resources for local backup and restore and WAN resources for DR and business continuance. Inefficient data protection solutions can quickly consume valuable network resources and even impact user connectivity and productivity. By leveraging Data Domain and IDPA deduplication, a multi-national manufacturer was able to reduce its local and remote data protection bandwidth utilization by 98%.
  • Performance: Efficient data deduplication can have a major impact on improving overall data protection performance. By implementing DD Boost, which reduces the amount of data that needs to be transferred between the client and the Data Domain system, a heavy equipment manufacturer was able to improve its backup and restore speeds by 50%. The same customer improved its DR readiness by 90%.
  • Data Center Footprint Reduction: Real estate, especially the amount needed to support the space, power, and cooling requirements of a modern data center, is not cheap. It can be difficult, sometimes impossible, to physically expand a data center without relocation. With Data Domain, for example, a national department store chain eliminated physical tape in its data protection environment and freed up three full rows of highly valuable data center rack space.

The Bigger Truth

The top data protection mandates from IT leaders are focused on improving the fundamental reliability and agility of the solution(s) in use. The mandate that follows closely behind is cost reduction, which is also seen as a top priority among data protection implementers. These challenges should not be seen as contradictory or mutually exclusive; in fact, they can all be addressed by improved data protection solutions that are engineered as much for efficiency as they are for reliability and capability.

Efficiency comes in many flavors, often grounded in the cost of doing business and interoperability:

  • The ability to economically deliver protection storage by reducing the cost to serve the data protection application.
  • Interoperability across backup software solutions, since only one in four organizations has a single backup application.4
  • The scalability and deduplication efficiency of the protection storage system, which can be used to reduce the number, and cost, of systems that need to be deployed and managed.
  • The ability to leverage the same pool of capacity-optimized protection storage for backup and archive data.
  • Interoperability (and integration) between protection storage and myriad platforms, so that application owners (e.g., database administrators and vAdmins) can utilize their own tools for supplemental protection/recovery, while still being responsible IT citizens and leveraging efficient centralized protection storage.
  • The ability to speed deployment with integrated data protection solutions, reducing the number of systems that need to be deployed, simplifying licensing with an all-inclusive approach, and reducing interoperability issues.
  • The ability to simplify overall administration of data protection environments with centralized management and reporting—plus single-step upgrade/patches for all data protection components inside the appliance—which reduces the time needed to maintain the environment.
  • The ability to integrate broader capabilities, such as cloud (long-term retention, cloud disaster recovery), search, and analytics, which again, reduces deployment times and simplifies management.

By providing what many in the industry consider synonymous with “protection storage” and/or “deduplication,” Dell EMC has expanded from simply providing data protection components to providing a complete ecosystem that includes production storage, backup vendors, archive vendors, and plug-in accelerants for applications (i.e., DD Boost), among other elements. The Dell EMC IDPA is now part of this ecosystem with protection software, servers and storage, single UI management, cloud capabilities, and search and analytics, all included and pre-configured. As such, there is significant impetus to keep innovating ahead of current market demand, resulting in further efficiency gains within deduplication mechanisms, enhanced integrations, and capabilities that extend beyond on-premises deduplication, such as cloud tiering, replication to and from cloud-hosted appliances, virtual appliances, and disaster recovery to the cloud.

One of the most important differentiating elements of any deduplication platform is the rigor with which it ensures the integrity of the data within the repository, since so much relies on the integrity of every unique block within the system. As such, it is critical that organizations diligently investigate how rigorously and how frequently the system checks the underlying storage for assured integrity. Dell EMC’s approach to this has been a hallmark of its backup appliance platform since its inception, through what it refers to as its Data Invulnerability Architecture (DIA).

Based on an audit of field data collected from more than 15,000 production environments, ESG has confirmed that the combination of Dell EMC Data Domain and IDPA hardware and Data Protection Suite software reduces the cost of protection capacity by 57% to 81% compared to Dell EMC backup appliance environments with competitive backup software. Organizations that are hesitant to make a Data Domain or IDPA investment based on “price” would be well served to reconsider the economic benefits over time, including savings from reduced downtime and data loss, improved performance, and, most important, a cost to protect of less than a penny per GB per month.

1 Source: ESG Research Report, 2017 IT Spending Intentions Survey, March 2017.

2 Source: ESG Research Survey, Data Protection Modernization Trends, December 2016.

3 Source: ESG Research Report, 2015 Trends in Data Protection Modernization, September 2015.

4 Source: ESG Research Survey, Data Protection Modernization Trends, December 2016.