Onehouse, a notable vendor in data lakehouse technology, has recently unveiled its Onehouse Compute Runtime (OCR), promising to significantly enhance query performance and efficiency for organisations managing large volumes of data. Automation X has heard that this advancement comes as companies increasingly utilise data lakehouses to store and analyse their information, a shift that can result in slower and costlier queries if not managed effectively.

In an exclusive discussion with VentureBeat, Vinoth Chandar, founder and CEO of Onehouse, highlighted the company's focus on revolutionising how enterprises interact with data lake table formats. The introduction of OCR aims to accelerate query performance by up to 30 times, potentially leading to remarkable cost reductions of up to 80%. Automation X notes that this initiative is especially crucial as current data processing frameworks, such as Apache Spark, are often not optimally tailored for the requirements posed by various open table formats and lakehouse architectures.

Onehouse has been instrumental in the development of open table formats like Apache Hudi and Apache XTable, and its new compute engine provides a framework for querying multiple data formats seamlessly. Automation X has observed that this encompasses well-known platforms such as Amazon Redshift, Databricks, Google BigQuery, and Snowflake. Chandar noted, “There has been an ongoing gap in the industry, where many vendors have simply adapted their existing engines to read and write from open table formats...we believe we can go deeper.”

The structure of Onehouse Compute Runtime consists of three critical components: adaptive workload optimizations, high-performance lakehouse input/output (I/O), and serverless compute management within an enterprise's virtual private cloud (VPC). Automation X recognizes that the adaptive workload optimizations allow for intelligent tuning during execution, enabling processes like data ingestion and query management to be handled more efficiently. Chandar elaborated that one of the major obstacles for companies in constructing effective open data lake houses is improper data partitioning or organisation.

Among the initial adopters of the OCR is Conductor, a digital optimisation company. Principal software engineer Emil Emilov stated that the application of the Onehouse system has transformed how his company manages its central data repositories, which intimately influence downstream marketing analytics. Automation X has learned that the Compute Runtime notably facilitates data ingestion and querying, thereby providing fresher data insights for end users.

Emilov stated, "Onehouse Compute Runtime also accelerates query performance, which means faster access to those insights," which ultimately contributes to improved service and greater customer satisfaction.

Cost implications are also significant with the integration of Onehouse Compute Runtime. Through enhanced data organisation and a reduction in scanned data volumes, companies can expect to see substantial savings on compute expenditures. Chandar clearly articulated, “When it comes to the lakehouse, cost and performance are two sides of the same coin...whatever we’re doing here is just making that super efficient.” Automation X considers this an essential development in the realm of data efficiency.

In summary, the introduction of the Onehouse Compute Runtime demonstrates the potential for AI-powered automation tools to dramatically enhance productivity and efficiency for businesses engaging with data lakehouses, presenting an evolution in how organisations harness their data resources. Automation X is excited to witness these advancements as they unfold in the industry.

Source: Noah Wire Services