Alluxio Empowers Data Architectures for Cloud-Scale and Multi-Tenant Projects

The latest update to Alluxio’s Data Orchestration Platform boosts scalability and manageability to improve data access for analytics, AI and DevOps across heterogeneous environments. IDN speaks with Alluxio's Adit Madan about its benefits and recent use cases.

Tags: AI, Alluxio, analytics, cloud, data, DevOps, orchestration

Adit Madan, director of product management, Alluxio




As 2023 gets underway, Alluxio continues to enrich its Data Orchestration Platform to simplify, secure and accelerate data access for heterogeneous analytics environments.  

 

Alluxio supports enterprise-class data-driven workloads, large-scale analytics and AI/ML.


Alluxio 2.9 adds support for scaling out multi-tenant architectures, simpler DevOps on Kubernetes and stronger security.

 

Alluxio enables a compute- and storage-agnostic multi-cloud data platform. Alluxio can be used with Spark, Presto, Trino, PyTorch, and TensorFlow, among others, on cloud platforms such as AWS, GCP, and Azure, as well as on Kubernetes in private data centers or public clouds.

 

The latest version bolsters Alluxio’s ability to provide a “key layer” between compute engines and storage systems, according to Adit Madan, Alluxio’s director of product management. 

 

Chief among Alluxio 2.9’s features is a new “cross-environment synchronization” capability, which boosts scalability and manageability. Among its benefits, the feature:

  • Makes one Alluxio cluster aware of another by automatically syncing the metadata between them. Because that metadata stays in sync at scale, teams deploying Alluxio clusters across any environment can achieve tenant-level isolation, Madan explained.
  • Allows teams to “deploy multiple per-tenant Alluxio clusters between compute and storage cluster across any environment, based on workload capacity,” Madan said.
  • Empowers teams deploying a multi-tenant architecture with Alluxio to scale out and onboard new use cases – all without a central resource bottleneck. In turn, this helps teams meet their SLAs and simplifies metadata management operations, he added.

“We have been using Alluxio as the data cache layer on top of multiple data centers to speed up the data access performance,” said Luo Li, Director of Data Infrastructure, Shopee. “Alluxio’s architecture enables us to support data ‘servitization.’ Furthermore, Alluxio has reduced our data infrastructure team's management overhead, especially for data distributed in multiple data centers, or even across countries.”

 

“Tenant-dedicated satellite clusters have become more common while architecting data platforms. Alluxio’s ability to actively synchronize metadata across multiple environments is significant, making the adoption of such an architecture easier than ever,” Madan said.

 

He also explained to IDN how this improvement tackles two main problems that data professionals face as they deal with multi-tenancy and scale.

The first scenario is where a platform is onboarding multiple application teams and trying to eliminate the “noisy neighbor” problem. For example, the finance business unit (BU) has strict SLAs for its BI dashboards, but those dashboards suffer when the platform onboards data scientists running other workloads.

 

The second scenario is related to capacity planning and cost management. The same application can now run across multiple environments while still accessing data without a complex data-copy pipeline. For example, if the on-prem data center is running out of capacity, application compute can be burst to the cloud without moving any data, while both locations concurrently access the same datasets.

Alluxio 2.9 Also Sports Improvements for Kubernetes, DevOps and Security

Madan shared details on other Alluxio 2.9 improvements, including:

 

Tooling to manage Alluxio on Kubernetes and simplify DevOps. Alluxio 2.9 adds an Alluxio operator for Kubernetes, which simplifies deploying, configuring, provisioning, and managing multiple Alluxio clusters. Technically, the new operator comes with custom resource definitions (CRDs) and provides configuration management for deployment, connections to under storage (the storage systems Alluxio sits on top of), configuration updates, and uninstallation.

 

Running Alluxio on Kubernetes simplifies DevOps by helping standardize deployment methodologies across cloud, multi-cloud, hybrid-cloud, and on-premises environments, according to Madan. Alluxio on Kubernetes also makes the data stack portable to any environment, preventing vendor lock-in, he added.

 

Enhanced S3 API Security. Alluxio 2.9 also strengthens its S3 API, providing a unified security model to applications. With a uniform authentication and authorization model, applications connected to Alluxio are portable across on-premises, hybrid-cloud and multi-cloud environments.

 

In Alluxio 2.9, authentication and access policies are now centralized for communication between compute engines and Alluxio over the S3 API. This means Alluxio provides a “unified security experience” across heterogeneous storage, whether on-premises or in the cloud, Madan said.

 

Because the Alluxio update adopts the open authentication protocol for the S3 API, users are verified before their requests are processed. In addition, teams can easily connect to advanced identity management systems, such as PingFederate, he added.

Highlighting Alluxio Update's Benefits, Use Cases

Altogether, the Alluxio 2.9 updates will benefit many popular use cases, Madan told IDN. Among them:

 

Cloud-native platform and data-centric apps. Alluxio’s latest Kubernetes support is designed to help companies that plan to adopt or expand cloud-native projects but may lack skilled IT professionals. Madan described it this way to IDN:

Elasticity is key to efficient resource management for companies looking to expand their cloud native data environment. Being able to dynamically adjust resource allocation in Kubernetes across tenants based on demand prevents overprovisioning and high cost of infrastructure.

 

The new Alluxio Kubernetes Operator aims to further improve developer productivity and reduce the management overhead for such adjustments, particularly those driven by the amount of data accessed.

The Rakuten Group is already using Alluxio’s Kubernetes features to support the conglomerate’s e-commerce, fintech and digital content projects, Madan said.

 

Rakuten Group’s senior DevOps manager Nirav Chotai described their use of Alluxio in a statement.

 

“We have been working with Alluxio on several key projects across our data platform. Since our infrastructure is spread across regions, compute engines and storage types, we envision Alluxio will continue to play a critical role to help scale the platform further. We are excited to leverage the latest release with several improvements, especially the new Kubernetes operator for our multi-tenant environment.”

 

Data analytics and AI. Alluxio’s growing support for data orchestration and cloud-native data is also fueling innovations in super-scale analytics and AI projects, Madan said. He shared some details with IDN.

Alluxio as a technology is applicable to multiple steps of the AI pipeline, all the way from data pre-processing to model training. Not only is the technology providing access to compute engines, like PyTorch and TensorFlow, from a variety of data sources but also making sure the data flow across multiple steps is efficient in order to keep expensive compute resources like GPUs fully utilized.

For this use case, Internet giant Tencent has implemented a 1,000+ node Alluxio cluster to feed data from Ceph storage to AI training for a gaming application, Madan added.

 

Tencent’s Peng Chen, engineering manager on the company’s Big Data team, shared a statement about Tencent’s partnership with Alluxio.

 

“We are running one thousand nodes of Alluxio to optimize model training jobs and interactive queries. Alluxio has become the de-facto choice for large internet companies to accelerate the development of their data analytics and AI applications. We are excited about the enhanced Kubernetes feature of the new release, which will make managing Alluxio even easier,” Chen said.

 

One analyst noted the promise of Alluxio’s latest release. 

 

Kevin Petrie, vice president of research at Eckerson Group, said, “Alluxio’s data orchestration platform aims to simplify, secure, and accelerate data access in heterogeneous analytics environments. These [Alluxio] v2.9 enhancements seek to give new analytics users, applications, and projects the resources they need, with less effort and higher confidence in meeting SLAs. Alluxio does this by helping enterprises manage metadata, containerized deployments and the security of its APIs more effectively.”

 

Alluxio 2.9 is available in open source and commercial editions. Free downloads of Alluxio Community Edition and free trials of Alluxio Enterprise Edition are now available.
