Volume and quality of training data are the largest barriers to applying machine learning
IDC predicts worldwide spending on artificial intelligence (AI) systems will reach $35.8 billion in 2019, and 84% of enterprises believe investing in AI will lead to greater competitive advantages (Statista). However, nearly eight out of 10 enterprise organizations currently engaged in AI and machine learning (ML) report that projects have stalled, and 96% of these companies have run into problems with data quality, data labeling required to train AI, and building model confidence, according to Alegion.
Data issues are causing enterprises to burn through AI project budgets quickly and run into project hurdles. The new report, “Artificial Intelligence and Machine Learning Projects Obstructed by Data Issues,” was conducted by Dimensional Research. Its findings draw on feedback from 227 participants, among them data scientists and business stakeholders involved in active enterprise AI and ML projects. The report addresses the maturity of ML in the enterprise, today’s ML project challenges, and the tools and resources used in these projects.
“The single largest obstacle to implementing machine learning models into production is the volume and quality of the training data,” said Nathaniel Gates, CEO and co-founder of Alegion, a training data platform for AI and ML initiatives. “This research reinforces our own experience, that data science teams new to building ROI-driven systems try to tackle training data preparation in house, and get overwhelmed.”
Large businesses with more than 100,000 employees are the most likely to have an AI strategy, but only 50% of them currently have one, according to MIT Sloan Management Review. Alegion’s survey reinforces the view that AI is still nascent in the enterprise:
- 70% report that their first AI/ML investment was within the last 24 months
- Over half of enterprises report they have undertaken fewer than four AI and ML projects
- Only half of enterprises have released AI/ML projects into production
To get AI systems off the ground, teams need training data that is voluminous and accurately labeled and annotated. With AI becoming a growing enterprise priority, data science teams are under tremendous pressure to deliver projects but frequently struggle to produce training data at the required scale and quality.
Survey respondents echoed these observations:
- 78% say their AI/ML projects have stalled at some stage before deployment
- 81% admit the process of training AI with data is more difficult than they expected
- 76% combat this challenge by attempting to label and annotate training data on their own
- 63% go so far as to try to build their own labeling and annotation automation technology
- 71% of teams report that they ultimately outsource training data and other ML project activities