ChaosSearch shared key findings from a survey of 1,020 U.S. IT professionals on data retention, data usage, and investments in data lake and cloud data platforms. The findings show that IT talent may be wasted on moving, migrating, pipelining, and transforming data — a process that can, and should, be automated to enable faster, more insightful analytics.
The volume and velocity of data have grown exponentially in the last two years as a direct result of the pandemic shifting the working world to be almost entirely digital. While this increase in data should present new opportunities for organizations, current infrastructures make it challenging to retain and analyze vast amounts of data at scale. That, coupled with the growing shortage of data scientists, is leaving enterprises unable to derive the insights they need to power business decisions.
Data analytics challenges and trends
Data lake investment is on the rise. 69% of respondents indicated that their organizations have implemented a data lake, while 23% of respondents have not implemented a data lake but are planning to deploy one.
Respondents would prioritize investing in technology above investing in people to solve analytics challenges. When asked to choose between more time, budget, people, or technology to better solve existing analytics challenges, 59% of respondents selected technology, whereas time, budget, and people were each chosen by no more than 13% of respondents.
IT talent is wasted on data prep. Respondents spend almost as much time prepping data (6.6 hours per week) as they do analyzing it (7.2 hours per week).
Traditional data lake platforms require too much data transformation upfront. Time spent moving, migrating, pipelining, or transforming data increased to 7.1 hours per week for respondents who have a data lake. Additionally, 30% of respondents indicate that their end-user consumption/visualization tools aren’t directly connected to the lake, resulting in data duplication and data movement challenges that limit insights and time to value.
However, some platforms are helping to expedite data analysis. Thirty-eight percent of respondents with data lakes are able to respond to data requests within an hour, compared to 24% without data lakes.
There’s a disconnect on whether organizations are using all of the data available to them. Eighty-seven percent of respondents agree to some extent with the following statement: “My department is using all of the data at its disposal to make informed business decisions.” However, those who agreed with the statement are retaining less log data (only 1-6 months) than those who disagreed (7-12 months).
There is very little consistency in log data retention. Thirty percent of respondents retain log data for a month or less, while 24% retain it for years or for unlimited timeframes.
Many businesses aren’t saving enough log data to be useful in analyzing and preventing cybersecurity breaches. Forty-seven percent of respondents who retain fewer than 7 months of log data have experienced a breach in the last year. Meanwhile, only 24% of respondents who retain 7+ months of log data experienced a breach in the last year.
“Our research supports that by showcasing the challenges many IT teams are facing — and the limitations they experience with tools that aren’t able to scale with the amount of data being produced today. Without an easy, cost-effective, and reliable way to access and analyze all of the data at your disposal, your business is at risk.”