Varada 3.0 delivers elastic scaling without sacrificing the power of indexing

Varada unveiled version 3.0 of its data analytics platform, now delivering a cost-effective alternative to offerings like Snowflake, Redshift, Athena, Presto, Trino and BigQuery for at-scale big data analytics users who rely on the power of indexing to extract insights from massive, unstructured data sets.

The new version marries cloud elasticity with the query power of indexing for big data analytics, giving data teams the ability to scale analytics workloads rapidly to meet fluctuating demand. It delivers a dramatic increase in cost performance and cluster elasticity compared to the previous version.

In addition, version 3.0 eliminates the need to keep high-performance, expensive NVMe (Nonvolatile Memory Express) SSD compute instances idling when the cluster is not in use.

Data teams are often evaluated on how quickly they can react to spikes in demand. The separation of compute and storage in version 3.0 lets them elastically scale clusters out and in as query traffic fluctuates, avoiding the waste of overprovisioning and idle resources.

“Varada was built on the premise that indexing can transform big data analytics, if done correctly,” said Eran Vanounou, CEO of Varada. “With version 3.0, the Varada platform is now the most powerful and cost-effective way to leverage the power of big data directly atop your data lake.”

“Query acceleration optimizations are time-consuming to create, including indexing,” continued Vanounou. “So, we want to ensure that the platform operates autonomously, including quickly reacting to changing demand. V 3.0 introduces a new layer to Varada’s platform. We’ve separated the index and data from the SSD nodes, creating a ‘warm’ tier in the data lake that allows us to preserve those indexes much faster and at a much lower cost. By doing so we’re bringing the power of cloud computing scaling to big data indexing.”

Big data indexing must be autonomous (adaptive, dynamic and elastic)

This third iteration of the Varada platform marks the latest step in a journey that began last December with version 1.0, which introduced adaptive indexing that chooses the optimal index for each data set to deliver 10x-100x faster performance than other data lake query engines.

That first release relied on pre-defined materialization to enable indexing. This past spring, version 2.0 eliminated the need for materialization and added a dynamic, smart observability layer that automatically decides which data to index and when, making the platform easier to use and delivering a dramatic improvement in TCO (Total Cost of Ownership).

Version 3.0 extends these advantages with rapid and elastic scaling capabilities that let users add and remove nodes and clusters rapidly depending on current workload needs, further improving TCO for large-scale users.

Hot, warm and cold data layers optimize scaling performance and TCO

Version 3.0 of the Varada platform includes three layers. The first is the hot data and index layer, in which nodes with attached NVMe SSDs (in the customer’s Virtual Private Cloud) process queries and store hot data and cache for optimal performance. The second is the warm index and data layer, where an object storage bucket on the customer’s data lake stores all indexes for scaling purposes. The third is the customer data layer (“cold”), which remains the single source of truth.
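In rough, illustrative terms (Varada has not published an API for this, and every name and path below is hypothetical), the three tiers can be pictured as a small piece of configuration:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    """The three storage layers described above (illustrative names only)."""
    HOT = "hot"    # NVMe SSDs on the query nodes inside the customer's VPC
    WARM = "warm"  # object-storage bucket on the customer's data lake
    COLD = "cold"  # the customer's raw data, the single source of truth

@dataclass
class TierConfig:
    hot_ssd_path: str     # local SSD mount on each query node
    warm_bucket_uri: str  # bucket/folder on the data lake that holds the indexes
    cold_data_uri: str    # the underlying data lake tables themselves

# Hypothetical example configuration
config = TierConfig(
    hot_ssd_path="/mnt/nvme/index-cache",
    warm_bucket_uri="s3://customer-data-lake/warm-indexes/",
    cold_data_uri="s3://customer-data-lake/tables/",
)
```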

When scaling in, indexes and data are not “lost” and continue to be available for other clusters and users. This index-once approach speeds up warm-up by 10x-20x compared to indexing data from scratch. As new indexes are created by the platform, they are stored in a designated folder on the customer’s data lake (“warm data”) in addition to the cluster’s SSDs in the “hot data” layer.
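A minimal sketch of that write-through path, assuming local directories stand in for the SSD cache and the warm bucket (the function and path names are invented for illustration, not part of Varada's product):

```python
import os
import shutil

def store_new_index(index_file: str, hot_ssd_dir: str, warm_bucket_dir: str) -> None:
    """Index once, store twice: keep the freshly built index in the hot SSD
    cache for immediate use and persist a copy to the warm folder on the
    data lake so other (or future) clusters can reuse it."""
    name = os.path.basename(index_file)
    os.makedirs(hot_ssd_dir, exist_ok=True)
    os.makedirs(warm_bucket_dir, exist_ok=True)
    shutil.copy(index_file, os.path.join(hot_ssd_dir, name))      # hot tier
    shutil.copy(index_file, os.path.join(warm_bucket_dir, name))  # warm tier
```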

When the cluster is scaled in or eliminated and some nodes are shut down, its indexes remain available as warm data. Warm indexes enable fast warm-up when scaling back out, adding new clusters or adding SSD resources to an existing cluster. Even after scaling in, data admins retain the ability to start a cluster with the state and acceleration instructions of previously live clusters.
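The warm-up path can be sketched as a simple fallback: hydrate the local SSD from a warm copy if one exists, and only rebuild from the cold source data when it does not (again, all names here are hypothetical, and the 10x-20x figure is the one quoted above):

```python
import os
import shutil

def rebuild_index_from_cold_data(index_name: str, hot_path: str) -> str:
    # Placeholder for the slow path: scan the source tables and build the index anew.
    return "rebuilt from cold data"

def warm_up_index(index_name: str, hot_ssd_dir: str, warm_dir: str) -> str:
    """On scale-out, a new node first looks for a warm copy of the index and
    copies it onto its local SSD; only if none exists does it fall back to
    rebuilding from the cold data layer."""
    hot_path = os.path.join(hot_ssd_dir, index_name)
    warm_path = os.path.join(warm_dir, index_name)
    if os.path.exists(warm_path):
        os.makedirs(hot_ssd_dir, exist_ok=True)
        shutil.copy(warm_path, hot_path)   # fast path: the 10x-20x quicker warm-up
        return "hydrated from warm tier"
    return rebuild_index_from_cold_data(index_name, hot_path)  # slow path
```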

Varada’s platform is based on a multi-cluster approach, which allows different clusters to share warm indexed data by accessing the designated bucket on the data lake. In addition to behavior-based indexing, data platform teams can opt for background indexing on low-cost spot instances; the resulting indexes are stored in the “warm data” layer for fast warm-up in the future. This can be used to prepare in advance for upcoming spikes in analytics requirements or to significantly reduce TCO.
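Conceptually, such background indexing looks like a worker loop running on cheap capacity that drops each finished index into the shared warm folder; the sketch below uses plain threads, a local directory and invented names purely to illustrate the idea:

```python
import os
import queue
import threading

def background_indexer(work_queue: queue.Queue, warm_dir: str) -> None:
    """Runs on a low-cost node (e.g. a spot instance): builds indexes ahead of
    demand and writes them into the shared warm folder so any cluster can
    warm up from them later. Illustrative only."""
    os.makedirs(warm_dir, exist_ok=True)
    while True:
        table = work_queue.get()
        if table is None:                     # sentinel: stop the worker
            break
        index_data = f"index for {table}"     # stand-in for real index building
        with open(os.path.join(warm_dir, f"{table}.idx"), "w") as f:
            f.write(index_data)               # shared across clusters via the bucket

# Example usage: pre-index two tables expected to spike soon, then stop the worker.
q = queue.Queue()
worker = threading.Thread(target=background_indexer, args=(q, "/tmp/warm-indexes"))
worker.start()
for table in ("orders", "clicks"):
    q.put(table)
q.put(None)
worker.join()
```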
