About Mesos in the INDIGO-DataCloud Project

Mesos Cluster

The present document describes how Apache Mesos is used by the INDIGO-DataCloud PaaS layer. INDIGO-DataCloud (start date: 01/04/2015, end date: 30/09/2017) was a project funded under the Horizon 2020 framework programme of the European Union and led by the National Institute for Nuclear Physics (INFN). It developed a data and computing platform targeting scientific communities, deployable on multiple hardware platforms and provisioned over hybrid (private or public) e-infrastructures. The INDIGO solutions are being evolved in the context of other European projects such as DEEP Hybrid-DataCloud, eXtreme-DataCloud and EOSC-Hub.

The INDIGO-DataCloud PaaS relies on Apache Mesos for:

  • managed service deployment

  • user application execution

The instantiation of the highly available Mesos cluster is managed by the INDIGO Orchestrator in a fully automated way as soon as a user request, described by a TOSCA template, is submitted. Once the cluster is up and running, it can be reused for subsequent requests.
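
A request of this kind is a TOSCA document submitted to the Orchestrator. The sketch below is a minimal, indicative example only: the node type names (tosca.nodes.indigo.MesosMaster, tosca.nodes.indigo.Compute) and the import URL follow the indigo-dc custom-types convention but may differ from the published definitions.

    tosca_definitions_version: tosca_simple_yaml_1_0

    imports:
      # Assumed location of the INDIGO custom types; check the indigo-dc
      # tosca-types repository for the authoritative URL.
      - indigo_custom_types: https://raw.githubusercontent.com/indigo-dc/tosca-types/master/custom_types.yaml

    topology_template:
      node_templates:

        mesos_master:
          type: tosca.nodes.indigo.MesosMaster   # assumed custom type name
          requirements:
            - host: master_server

        master_server:
          type: tosca.nodes.indigo.Compute       # assumed custom type name
          capabilities:
            scalable:
              properties:
                count: 3        # three masters for a highly available cluster
            host:
              properties:
                num_cpus: 2
                mem_size: 4 GB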

Mesos manages cluster resources (CPU, memory), providing isolation and sharing across distributed applications (frameworks).

Marathon and Chronos are two powerful frameworks that can be deployed on top of a Mesos Cluster.

Sophisticated two-level scheduling and efficient resource isolation are the key features of the Mesos middleware exploited in the INDIGO PaaS in order to run different workloads (long-running services, batch jobs, etc.) on the same resources while preserving isolation and prioritizing their execution.

INDIGO PaaS uses:

  • Marathon to deploy, monitor and scale long-running services, ensuring that they are always up and running.

  • Chronos to run user applications (jobs), taking care of fetching input data, handling dependencies among jobs and rescheduling failed jobs.

On top of these frameworks, the INDIGO solution provides:

  • Automatic deployment through Ansible recipes embedded in TOSCA and HOT templates

    • All the services run in Docker containers;

  • High availability of the cluster components:

    • Leader election among master nodes managed by ZooKeeper;

    • HA load-balancing;

  • Service discovery through Consul, which also provides DNS functionality and health checks;

    • Services are automatically registered in Consul as soon as they are deployed on the cluster

  • External access to the deployed services is ensured through load-balancers in HA (single entry point: the cluster Virtual IP)

  • Cluster elasticity and application auto-scaling through the EC3/CLUES plugin

  • GPU support

  • Ansible roles and TOSCA templates for cluster setup featuring high availability, service discovery and load-balancing;

  • Integration with the INDIGO Orchestrator

    • Job submission and service deployment requests are described through TOSCA templates

  • Definition of custom TOSCA types for describing Chronos jobs and Marathon applications (see the sketch after this list)

  • Zabbix monitoring probes for Mesos, Marathon and Chronos;

  • The Ansible roles and TOSCA templates have been extended in order to support the usage of GPUs.
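
As an illustration of these custom types, a Chronos job can be described as a containerized task. The following sketch is indicative only: the type and property names (tosca.nodes.indigo.Container, retries, the Docker image artifact) are assumptions modeled on the indigo-dc custom types, and the command and image are hypothetical.

    topology_template:
      node_templates:

        processing_job:
          type: tosca.nodes.indigo.Container     # assumed custom type name
          properties:
            command: /opt/run_analysis.sh        # hypothetical job script
            retries: 3                           # reschedule a failed job up to 3 times
          capabilities:
            host:
              properties:
                num_cpus: 1
                mem_size: 512 MB
          artifacts:
            image:
              file: indigodatacloud/algae-bloom  # hypothetical Docker image
              type: tosca.artifacts.Deployment.Image.Container.Docker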

The INDIGO components developed for Mesos (Ansible roles, Docker images, TOSCA custom types and templates) have been used to support different use cases:

  • Lifewatch-Algaebloom for water quality modeling and analysis:

    • this TOSCA template can be used to run processing jobs on a Mesos cluster through the Chronos framework;

  • Compact Muon Solenoid (CMS) analysis cluster on-demand:

    • this TOSCA template can be used to deploy a complete cluster running the HTCondor workload management system;

  • Dariah Zenodo-based repository in the cloud using Marathon:

    • this TOSCA template can be used to deploy the DARIAH Zenodo-based repository in the cloud: all the services are run as Marathon apps.

The core components are:

  • Mesos cluster manager for efficient resource isolation and sharing across distributed services

  • Chronos, a distributed task scheduler

  • Marathon for cluster management of long-running containerized services

  • ZooKeeper, used for leader election among the Mesos masters, for leader detection by masters, agents and scheduler drivers, and for persisting Marathon/Chronos state information

  • Consul for service discovery

  • Docker container runtime

  • marathon-lb for managing HAProxy by consuming Marathon's app state (see the sketch after this list)

  • Keepalived, used for the high availability of the cluster load-balancers
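
To give an idea of how marathon-lb decides what to publish on the load-balancers: it watches Marathon's app state and exposes the applications whose labels include HAPROXY_GROUP. The application below is a hypothetical example (shown in YAML for readability; Marathon's REST API accepts the equivalent JSON).

    id: /web-frontend              # hypothetical Marathon application
    container:
      type: DOCKER
      docker:
        image: nginx:stable        # hypothetical image
    instances: 2
    cpus: 0.5
    mem: 256
    labels:
      HAPROXY_GROUP: external      # picked up by marathon-lb and exposed via HAProxy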

These components are distributed on the cluster nodes as follows.

  • Master nodes

    • On every master node the following (dockerized) components run: ZooKeeper, Mesos master, Consul server, Marathon and Chronos (a compose-style sketch follows this list)

  • Slave nodes

    • On every slave node the following (dockerized) components run: Mesos slave and Consul agent

  • Load-balancers

    • On the two load-balancers the following (dockerized) components run: Keepalived and marathon-lb. Keepalived ensures the high availability of the load-balancers, managing the cluster Virtual IP.
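
The following docker-compose sketch gives a rough picture of a single master node. It is not the configuration produced by the INDIGO Ansible roles: image names, tags and the three-master topology (master1..master3) are assumptions, and Chronos and the Consul server would be added along the same lines.

    version: "2"
    services:

      zookeeper:
        image: zookeeper:3.4                  # assumed image/tag
        network_mode: host
        environment:
          ZOO_MY_ID: 1                        # unique id per master node
          ZOO_SERVERS: server.1=master1:2888:3888 server.2=master2:2888:3888 server.3=master3:2888:3888

      mesos-master:
        image: mesosphere/mesos-master:1.5.0  # assumed image/tag
        network_mode: host
        environment:
          MESOS_ZK: zk://master1:2181,master2:2181,master3:2181/mesos
          MESOS_QUORUM: "2"                   # majority of the three masters
          MESOS_HOSTNAME: master1

      marathon:
        image: mesosphere/marathon:v1.5.6     # assumed image/tag
        network_mode: host
        environment:
          MARATHON_MASTER: zk://master1:2181,master2:2181,master3:2181/mesos
          MARATHON_ZK: zk://master1:2181,master2:2181,master3:2181/marathon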

Release     Component versions

indigo_1    Mesos 0.28.0, Marathon 1.1.1, Chronos 2.4.0
indigo_2    Mesos 1.1.0,  Marathon 1.4.1, Chronos 3.0.2
deep_1      Mesos 1.5.0,  Marathon 1.5.6, Chronos 3.0.2 (patched for GPU support)

You can use this guide to deploy a Mesos cluster on a set of hosts using the indigo-dc Ansible roles.

These Ansible roles are published on Ansible Galaxy and can be installed through the ansible-galaxy command: ansible-galaxy install indigo-dc.rolename
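
A minimal playbook using such roles might look like the sketch below; the role names are assumptions based on the indigo-dc.<component> naming convention, and the host group names (mesos_masters, mesos_slaves) are placeholders.

    # site.yml -- indicative sketch, not the official playbook
    - hosts: mesos_masters
      become: true
      roles:
        - indigo-dc.zookeeper    # assumed role name
        - indigo-dc.mesos        # assumed role name
        - indigo-dc.marathon     # assumed role name
        - indigo-dc.chronos      # assumed role name

    - hosts: mesos_slaves
      become: true
      roles:
        - indigo-dc.mesos        # assumed role name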

You can use this TOSCA template for setting up a complete Mesos cluster on Cloud resources.
