The importance of metadata to prevent data leaks
A new IDC report highlights how the widespread use of collaborative content technologies is fueling the aggressive growth of unstructured and semi-structured data. While collaboration produces highly valuable information, it also introduces significant risk due to increasingly complex and dynamic access control requirements.
“Digital integrity is a critical business differentiator for any organization. The high-profile data breaches in the last three years demonstrate that organizations who fail to protect sensitive data will incur serious regulatory and legal liabilities, along with revenue and market share declines,” said Vivian Tero, program director, Governance, Risk & Compliance Infrastructure at IDC. “Visibility, actionable intelligence and automation are critical to managing the explosion of unstructured and semi-structured content in distributed systems.”
IDC forecasts that the total digital universe volume will increase by a factor of 44 by 2020. According to the report, unstructured data and metadata have an average annual growth rate of 62 percent. More importantly, the share of high-value information is also skyrocketing. In 2008, IDC found that 22 to 33 percent of the digital universe was high-value information (data and content governed by security, compliance and preservation obligations).
Today, IDC forecasts that high-value information will comprise close to 50 percent of the digital universe by the end of 2020.
Drivers for a metadata framework include:
Data loss is rising: IDC research notes that organizations average 14.4 unintentional data losses a year, mostly through employee negligence. Organizations need to ensure that controls are in place to mitigate the risks of data leakage, theft, loss and compromised integrity that arise from excessive access rights and permissions and from non-existent audit trails. Excessive or out-of-date privileges and access rights were cited as having the greatest financial impact on the organization.
IT is drowning in the data deluge: IT budgets are, on average, growing at less than one-fifth the forecasted annual growth rates of digital information, according to IDC. At the same time, manual approaches to managing and protecting information have become unwieldy, error-prone and ineffective. IT needs automated analysis of the permissions structure to determine which containers require ownership, and analysis of actual access activity to identify likely data owners.
Stale data impacts the bottom line: Inactive and orphaned folders can account for as much as 70 to 85 percent of the data in distributed systems. The majority of organizations have no process to identify the owners of files, and many are unable to determine which individuals and roles are authorized to access the data.
Impact on the cloud: Without adequate information on the security and compliance profile of the data – including data ownership, access controls, audits and classification – cloud computing initiatives are amorphous and imprecise. CFOs and CIOs will be hesitant to move critical data and processes into the cloud without visibility into access and ownership, traceability and data segregation.
Automation is key to success: Too often, users have access to significant amounts of data that is not relevant to them. Organizations therefore need to ensure that users and roles are aligned to the correct groups, and that these groups enable access only to the appropriate data containers.
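To make the automated analysis described in the drivers above more concrete, here is a minimal Python sketch. It is illustrative only and not taken from the IDC report: the access-log and permissions structures, the sample users and folders, the function names, and the 365-day staleness threshold are all assumptions. It shows three checks: inferring a likely data owner from actual access activity, flagging access rights that are never exercised, and listing folders with no recent activity.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical access-log records: (user, folder, date of access).
ACCESS_LOG = [
    ("alice", "/finance/q1", date(2011, 3, 1)),
    ("alice", "/finance/q1", date(2011, 3, 4)),
    ("bob", "/finance/q1", date(2011, 3, 2)),
    ("carol", "/hr/reviews", date(2009, 6, 15)),
]

# Hypothetical permissions: folder -> users who are allowed to access it.
PERMISSIONS = {
    "/finance/q1": {"alice", "bob", "dave"},
    "/hr/reviews": {"carol", "bob"},
    "/archive/2005": {"dave"},
}


def likely_owners(log):
    """Guess each folder's owner as its most frequent user (a heuristic)."""
    counts = defaultdict(Counter)
    for user, folder, _when in log:
        counts[folder][user] += 1
    return {folder: c.most_common(1)[0][0] for folder, c in counts.items()}


def unused_grants(permissions, log):
    """Return, per folder, the users who hold access but never used it."""
    used = defaultdict(set)
    for user, folder, _when in log:
        used[folder].add(user)
    result = {}
    for folder, users in permissions.items():
        idle = users - used.get(folder, set())
        if idle:
            result[folder] = idle
    return result


def stale_folders(permissions, log, today, max_idle_days=365):
    """List folders with no recorded access within max_idle_days."""
    last_access = {}
    for _user, folder, when in log:
        if folder not in last_access or when > last_access[folder]:
            last_access[folder] = when
    return sorted(
        folder
        for folder in permissions
        if folder not in last_access
        or (today - last_access[folder]).days > max_idle_days
    )


if __name__ == "__main__":
    print(likely_owners(ACCESS_LOG))
    print(unused_grants(PERMISSIONS, ACCESS_LOG))
    print(stale_folders(PERMISSIONS, ACCESS_LOG, today=date(2011, 4, 1)))
```

In practice the inputs would come from file-system audit trails and directory-service group memberships rather than in-memory literals, and a "most frequent user" guess would only be a starting point for confirming ownership with the business.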