Companies getting serious about AI and analytics: 58% are building or evaluating data science platforms

New O’Reilly research found that 58 percent of today’s companies are either building or evaluating data science platforms, which are essential for companies keen on growing their data science teams and machine learning capabilities. The research also found that 85 percent of companies already have data infrastructure in the cloud.

Companies are building or evaluating solutions in foundational technologies needed to sustain success in analytics and AI. These include data integration and Extract, Transform and Load (ETL) (60 percent), data preparation and cleaning (52 percent), data governance (31 percent), metadata analysis and management (28 percent) and data lineage management (21 percent).

Companies are building data infrastructure in the cloud. Eighty-five percent indicated that they had data infrastructure in at least one of the seven top cloud providers, and nearly two-thirds (63 percent) used Amazon Web Services (AWS). The results also showed that users of AWS, Microsoft Azure or Google Cloud Platform (GCP) tended to use multiple cloud providers.

The use of durable cloud storage is prevalent. Sixty-two percent of all respondents indicated they used at least one of the following: Amazon S3 or Glacier, Azure Storage, or Google Cloud Storage.

Data scientists and data engineers are in demand. When asked what skills their teams needed to strengthen, 44 percent said data science and 41 percent said data engineering.

Respondents used a variety of streaming and data processing technologies. Nearly half of the respondents (49 percent) used either Apache Spark or Spark Streaming, while other popular tools included open source projects (Apache Kafka, Apache Hadoop) and their related managed services in the cloud (Elastic MapReduce, AWS Kinesis).

Business intelligence uses a mix of open source and managed services. When it comes to SQL, respondents favored open source tools (Spark SQL, Apache Hive) and managed services in the cloud (Amazon Redshift, Google BigQuery).

Although 60 percent aren’t using serverless technologies, 30 percent are already using AWS Lambda. In fact, 38 percent indicated that they were using at least one serverless technology – a pattern that remained consistent across geographic regions.

“It is clear that in 2019 companies are planning to invest in implementing analytics, AI and automation tools,” said Ben Lorica, O’Reilly’s chief data scientist and chair of the Strata Data Conference. “However, in order to do so successfully, initial investments must be made in the foundational technologies and infrastructure needed to sustain success. Our research shows that a majority of companies understand this and are already building – or at the very least evaluating – platform solutions and tools to make this possible.”