How to architect a large, multi-tenant, enterprise-grade Hadoop Data Lake

My Background

I was lucky enough to start working with Hadoop 12 years ago, when it first hit the scene. I was in the right place at the right time and participated in architecting and engineering some of the largest Hadoop clusters in the US. I have built multi-tenant Hadoop clusters in the banking, telecommunications, utility, and retail sectors. For the most part I have used the same architecture, and I thought I would share it with you so that perhaps you can learn something or pick up some tips and tricks.

Use Case

A company has a very large RDBMS-based architecture containing thousands of databases, hundreds of servers, and many users and applications. Licensing costs have grown exponentially, and it is difficult, if not impossible, to serve all the new analytics use cases, which include machine learning. Data scientists cannot run large machine learning jobs on big data because of the resources required; past attempts have even caused outages. The cost of licenses, the need to continually add servers to the database cluster, and the inability to provide a platform for deep learning lead the company to seek a true big data solution and begin a multi-year strategy to build and migrate to a Hadoop cluster.

Corporate Datacenter Architecture


Hardware

The Hadoop cluster had 10 data nodes, each with 128 cores, 8TB of RAM, and ten 2TB SSD drives. So the total capacity of the cluster was very large: 1,280 cores, 80TB of RAM, and 200TB of raw disk storage. This is considered a large Hadoop cluster.

The YARN configuration used containers of 1 core and 2GB of RAM, so we could support 1,280 concurrent containers.
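
As a sanity check, here is a minimal sketch of the sizing math, assuming the per-node specs above. The container count is bounded by whichever resource runs out first; with 1-core/2GB containers, cores are the binding constraint.

```python
# Hypothetical sizing calculator for the cluster described above.
NODES = 10
CORES_PER_NODE = 128
RAM_TB_PER_NODE = 8
DISK_TB_PER_NODE = 10 * 2          # ten 2TB SSDs per node

CONTAINER_CORES = 1                # YARN container size: 1 core
CONTAINER_RAM_GB = 2               # and 2GB of memory

total_cores = NODES * CORES_PER_NODE                # 1,280 cores
total_ram_gb = NODES * RAM_TB_PER_NODE * 1024       # 81,920 GB (80TB)
total_disk_tb = NODES * DISK_TB_PER_NODE            # 200TB raw

# Containers are limited by whichever resource is exhausted first.
containers = min(total_cores // CONTAINER_CORES,
                 total_ram_gb // CONTAINER_RAM_GB)

print(f"cores={total_cores}, ram={total_ram_gb}GB, disk={total_disk_tb}TB")
print(f"max concurrent containers: {containers}")   # 1,280 (CPU-bound)
```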

Services

Multi-tenant Configuration

We had approximately 30 business units, called tenants, on the platform. Each tenant was given a YARN queue with quotas set based on the funding and requests from the business unit. We also had a generic adhoc queue for adhoc and admin workloads that didn't fit into a business unit. The adhoc queue could use the entire cluster, but it had the lowest priority, so any incoming workload would preempt the adhoc workload.

Yarn Queues

Each queue is allocated a guaranteed minimum share of cluster resources, with a maximum of 100% of cluster resources when the cluster is otherwise idle.

Preemption is enabled on the queues, so running workloads will be interrupted if a queue cannot get its guaranteed minimum share of cluster resources.

You can also use the minimum share to give a larger portion of the cluster to a business unit that provides more funding than another.

Queue Name        Minimum Resources   Maximum Resources   Preempt-able
finance           10%                 100%                True
advertising       20%                 100%                True
human resources   10%                 100%                True
adhoc             20%                 100%                True
administration    20%                 100%                False
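
As a concrete illustration, here is a minimal sketch of how the queue table might be expressed as Capacity Scheduler properties. The generator itself is hypothetical; the yarn.scheduler.capacity.* property keys and the preemption monitor switch are the standard Capacity Scheduler knobs, but verify them against your Hadoop version.

```python
# Hypothetical generator for capacity-scheduler.xml entries matching the
# queue table above. Queue names cannot contain spaces, so "human resources"
# becomes "human_resources".
queues = {
    # name: (minimum %, maximum %, preemptable)
    "finance":         (10, 100, True),
    "advertising":     (20, 100, True),
    "human_resources": (10, 100, True),
    "adhoc":           (20, 100, True),
    "administration":  (20, 100, False),
}

props = {
    "yarn.scheduler.capacity.root.queues": ",".join(queues),
    # This one lives in yarn-site.xml: it enables the preemption monitor.
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
}
for name, (minimum, maximum, preemptable) in queues.items():
    prefix = f"yarn.scheduler.capacity.root.{name}"
    props[f"{prefix}.capacity"] = str(minimum)           # guaranteed share
    props[f"{prefix}.maximum-capacity"] = str(maximum)   # burst ceiling
    props[f"{prefix}.disable_preemption"] = str(not preemptable).lower()

# Note: in classic percentage mode, sibling queue capacities must sum to
# 100%; the table's minimums sum to 80%, so the remainder would need to be
# assigned before this configuration would load.
for key, value in sorted(props.items()):
    print(f"<property><name>{key}</name><value>{value}</value></property>")
```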

Tenant Onboarding

Typically you will ask each new tenant to purchase a node to contribute to the cluster. The new tenant can start using the cluster immediately, since procuring hardware takes some time, and as each subsequent tenant joins and contributes a node, all tenants benefit.

You'll want to talk with the new tenant to understand their workloads. A new tenant might, for example, run only adhoc queries and have no plans to execute scheduled jobs. Such a tenant would still need its own queue, as the shared adhoc queue is intended more for single-user exploration of the data.

The administration queue is for maintenance jobs, usage reports, and similar work that doesn't really fit into a business unit queue. As the table above shows, it is the one queue that is not preempt-able, so that maintenance jobs always run to completion.

After you understand the workload, you can create the resources needed by the business unit.

Standard Resources

HDFS: directories for application data and code
  • hdfs://data/finance
  • hdfs://app/finance
  • hdfs://database/finance

Gateway Node: Linux file system locations for application data and code
  • file://data/finance
  • file://app/finance

Apache Hive: business unit databases
  • finance_inc
  • finance_wrk
  • finance_pub
  • finance_grp

Active Directory: service account to run applications with Kerberos
  • finance_srv_app
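
To make onboarding repeatable, the standard resources can be provisioned with a small script. This is a minimal sketch, assuming the finance tenant above; the paths, database names, and JDBC URL are placeholders, it shells out to the standard hdfs and beeline CLIs, and creating the Active Directory service account is left to your directory tooling.

```python
# Hypothetical onboarding script for a new tenant; run it from a gateway
# node as an account with HDFS superuser and Hive admin rights.
import subprocess

TENANT = "finance"
SERVICE_ACCOUNT = f"{TENANT}_srv_app"
HIVE_JDBC_URL = "jdbc:hive2://hiveserver2.example.com:10000/default"  # placeholder

def run(cmd):
    """Echo a command and run it, failing fast on errors."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# HDFS directories for data, application code, and Hive databases.
for area in ("data", "app", "database"):
    path = f"/{area}/{TENANT}"
    run(["hdfs", "dfs", "-mkdir", "-p", path])
    run(["hdfs", "dfs", "-chown", "-R", f"{SERVICE_ACCOUNT}:{TENANT}", path])
    run(["hdfs", "dfs", "-chmod", "770", path])

# Standard Hive databases: incoming, work, published, and group scratch.
for suffix in ("inc", "wrk", "pub", "grp"):
    db = f"{TENANT}_{suffix}"
    run(["beeline", "-u", HIVE_JDBC_URL, "-e",
         f"CREATE DATABASE IF NOT EXISTS {db} "
         f"LOCATION '/database/{TENANT}/{db}.db'"])
```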

Gateway Nodes (Edge Nodes)

Gateway nodes are the entry point to the cluster. The Hadoop client commands are available there and can be executed from the edge node. Enterprise scheduling software like Autosys, Tidal, etc. is installed on the gateway nodes for job execution.
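
For example, a job launched by Autosys or Tidal on a gateway node typically authenticates with the tenant's service account keytab before touching the cluster, then submits work to the tenant's YARN queue. This is a minimal sketch under those assumptions; the keytab path, Kerberos realm, JDBC URL, and the script being run are all placeholders, and the queue property shown (tez.queue.name) applies to Hive on Tez.

```python
# Hypothetical wrapper a scheduler would invoke on a gateway node.
import subprocess

TENANT = "finance"
PRINCIPAL = f"{TENANT}_srv_app@EXAMPLE.COM"                 # placeholder realm
KEYTAB = f"/etc/security/keytabs/{TENANT}_srv_app.keytab"   # placeholder path

# Obtain a Kerberos ticket for the tenant's service account.
subprocess.run(["kinit", "-kt", KEYTAB, PRINCIPAL], check=True)

# Run the tenant's job in its own YARN queue, e.g. a Hive script via beeline.
subprocess.run([
    "beeline",
    "-u", "jdbc:hive2://hiveserver2.example.com:10000/default;"
          "principal=hive/_HOST@EXAMPLE.COM",               # placeholder URL
    "--hiveconf", f"tez.queue.name={TENANT}",                # target tenant queue
    "-f", f"/app/{TENANT}/daily_load.hql",                   # placeholder script
], check=True)
```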

Gateway Node Architecture

HDFS Architecture

Hive Architecture

Related Articles

Hadoop Migration