![kubernetes](enix/kubernetes.svg) Dealing with complex Kubernetes scenarios
Alexandre Buisine [@alexbuisine](https://twitter.com/alexbuisine)
![enix](enix/enix.svg) ![ovh](resources/OVHcloud_stacked_logo_fullcolor_RGB.png)
# Introduction
Sometimes, an **off-the-shelf** Managed Kubernetes service **is not enough**. In these scenarios, you need solid Kubernetes **expertise and experience**.
This presentation covers fully managed clusters, **application included**, as well as tailor-made clusters built to meet **advanced & specific needs**. We will use **3 real-life use cases** to illustrate the usual challenges of implementing custom Kubernetes, and the best practices to address them.
# Usual caveats of Kubernetes implementations
**Budget** & capacity planning, on-prem vs **cloud** vs hybrid, **pitfalls**, integration, **security**, shift in CI/CD paradigm, cloud native dev methodology, **observability**
# 3 use cases
We will walk you through three use cases to illustrate **how we address** a wide range of associated **challenges**:
1. MIGRATION FROM ON-PREM TO OVHCLOUD
2. RUNNING A PRIVATE AND FULLY MANAGED ELASTICSEARCH
3. DAY 2 OPERATIONS IN A MISSION-CRITICAL COMPLEX KUBERNETES CLUSTER
# USE CASE #1 : Migration from On-Prem to OVHcloud
# UC1 : The client
* Is already comfortable with containers but **learning Kubernetes** in Dev.
* Runs multiple instances of its applications on **dedicated bare metal** machines.
* Envisions Kubernetes as a way to **cut costs** and modernize its infrastructure.
* Wants to eventually accelerate app deployment to improve **time-to-market**.
# UC1 : Our approach
* Empower the customer to be **autonomous** from a Kubernetes standpoint.
* **Transition** existing software **to Cloud Native** best practices: generic containers, simplified configuration, observability.
* Define the **hardware capacity plan** on OVHcloud to match production needs.
* Ensure **connectivity** between **on-prem and OVHcloud** platforms.
# UC1 challenge : Kubernetes proficiency
Customer staff needed to **ramp up their skills** on Kubernetes to meet **production** needs. We **trained** the whole team. We favored **instant communication** (i.e. Slack) to **stay close** during the migration phase and the following months.
# UC1 challenge : Network extension
Some of the **assets** were **not migrated** to the Cloud (IAM + critical databases). To make it work, we:
- **interconnected** On-Prem & OVHcloud, extending the network with secured VPNs
- implemented an **ambassador pattern** by leveraging "ExternalName" Services
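
As a minimal sketch of the "ExternalName" approach mentioned above: a Service of type `ExternalName` gives an on-prem host a stable in-cluster DNS name, so applications keep a Kubernetes-style address while the asset stays outside the cluster. The service name, namespace, and hostname below are hypothetical placeholders, not values from the actual project.

```python
import json


def external_name_service(name: str, namespace: str, target_host: str) -> dict:
    """Build a Service manifest of type ExternalName.

    Cluster DNS answers lookups for <name>.<namespace> with a CNAME
    pointing at target_host (reachable here through the VPN).
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"type": "ExternalName", "externalName": target_host},
    }


# Hypothetical example: an on-prem database that was not migrated.
manifest = external_name_service("billing-db", "prod", "db01.onprem.example.com")
print(json.dumps(manifest, indent=2))
```

The printed JSON is also valid YAML, so it can be applied as-is. If the database is migrated later, only this Service changes; application configuration stays untouched.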
# UC1 challenge : Cloud native principles
Most of the **applications** were already based on containers. However, they were **not ready** for Kubernetes in **production**. We reviewed Dockerfiles to ensure **generic and configurable** artefacts. We reviewed Helm charts in depth to ensure compliance with **best practices**.
# UC1 : Results
* Infrastructure footprint reduced by 80%
* 9 application instances consolidated
* 50+ Deployments
* 5 persons trained
* POC delivered in one week
* Migration completed in one month
# USE CASE #2 : A private and fully managed ElasticSearch
# UC2 : The client
* Already runs a large ElasticSearch cluster.
* Must address **additional needs**: SIEM, Machine Learning, Wazuh.
* Can't afford to take the time to hire and train additional engineers just for that task.
* Wants to **outsource management** while keeping **ownership and control** over data (sovereignty requirements).
# UC2 : Our approach
* Review **existing** setup + **new** needs
* Select the right architecture: **ECK**
* Size the right infrastructure: **OVHcloud bare metal** HG-1U
  * VMs for the k8s control plane
  * **large physical machines** for k8s worker nodes (easy to scale up)
* Design, implement, then **operate**
# UC2 challenge : Robust ingest pipeline
Requirement: **never drop messages** between Beats and ElasticSearch (some messages refer to financial transactions).
Solution: **always-on buffering** implemented by highly available Kafka + Zookeeper. Allows maintenance without losing messages. Messages can be replayed if needed.
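
To illustrate why a durable log in front of ElasticSearch allows maintenance and replay, here is a toy in-memory model of the idea. It is not Kafka and does not use any Kafka API; the class and method names are hypothetical. The point is only that messages stay in the buffer until the consumer commits, and that a consumer can seek back to re-read them.

```python
class ReplayableLog:
    """Toy append-only buffer with per-consumer offsets (conceptual only)."""

    def __init__(self):
        self._records = []      # messages are never removed in this sketch
        self._next_offset = {}  # consumer name -> next offset to read

    def append(self, message):
        """Producer side (the Beats): store a message, return its offset."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, consumer, max_records=100):
        """Return (offset, message) pairs without advancing the consumer."""
        start = self._next_offset.get(consumer, 0)
        batch = self._records[start:start + max_records]
        return list(enumerate(batch, start=start))

    def commit(self, consumer, offset):
        """Mark everything up to and including offset as safely indexed."""
        self._next_offset[consumer] = offset + 1

    def seek(self, consumer, offset):
        """Rewind the consumer so older messages are delivered again."""
        self._next_offset[consumer] = offset


log = ReplayableLog()
for msg in ["tx-1", "tx-2", "tx-3"]:
    log.append(msg)

# Indexer is down for maintenance: it polls but never commits, so nothing is lost.
first_try = log.poll("indexer")
second_try = log.poll("indexer")

# Indexer is back: it commits after a successful bulk write.
log.commit("indexer", second_try[-1][0])
after_commit = log.poll("indexer")

# Replay from the beginning if the index must be rebuilt.
log.seek("indexer", 0)
replayed = log.poll("indexer")
```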
# UC2 challenge : Performance and capacity
The ingest volume is modest (3-4K messages per second). However, **data retention** and **query volume** are **very high**. We need a solution with both high performance and high capacity!
# UC2 storage solution
To maximize ElasticSearch performance, **local storage** on dedicated bare metal with massive **IOPS** & disk **capacity** is preferred. We chose **TopoLVM** as a local persistent volume provisioner in order to guarantee **native** NVMe performance and disk **quota** management.
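
A minimal sketch of what a TopoLVM-backed volume request can look like, expressed as manifests built in Python. The object names and the 2 TiB size are hypothetical. The provisioner name `topolvm.io` is an assumption that matches recent TopoLVM releases (older releases used a different name), so check it against the installed version.

```python
import json

# StorageClass delegating provisioning to the TopoLVM CSI driver.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "topolvm-nvme"},  # hypothetical name
    "provisioner": "topolvm.io",           # assumption: recent TopoLVM releases
    # Delay binding until the Pod is scheduled, so the volume is carved
    # out of the local disks of the node that will actually run the Pod.
    "volumeBindingMode": "WaitForFirstConsumer",
    "allowVolumeExpansion": True,
}

# A claim sized like a disk quota for one ElasticSearch data node.
volume_claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "es-data-0"},     # hypothetical name
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": storage_class["metadata"]["name"],
        "resources": {"requests": {"storage": "2Ti"}},
    },
}

print(json.dumps([storage_class, volume_claim], indent=2))
```

With ECK, such a claim would normally be declared as a volume claim template on the ElasticSearch resource rather than created by hand.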
# Storage matters
If your software stack requires very high I/O throughput and very low I/O latency, you may prefer physical bare metal disks. OVHcloud offers a **wide range of bare metal solutions** with customizable NVMe disks. OVHcloud also offers **“IOPS cloud instances”** with 1 to 4 dedicated physical NVMe disks for **flexibility** and ultimate **performance**. ![ovh](resources/OVHcloud_stacked_logo_fullcolor_RGB.png)
# UC2 : Results
* 200 hosts pushing messages
* 4 Beats per host (filebeat, metricbeat, auditbeat, wazuh)
* 100 GB of logs per day
* 3-4K messages per second (with room to grow)
* 30 TB capacity
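
The figures above can be cross-checked with some back-of-the-envelope arithmetic. This sketch assumes decimal units (1 TB = 1000 GB), treats the 100 GB per day as the on-disk size of one copy of the data, and ignores index overhead and compression. None of these assumptions come from the slides.

```python
SECONDS_PER_DAY = 86_400


def retention_days(capacity_tb: float, daily_gb: float, copies: int = 1) -> float:
    """Days of data that fit in capacity_tb when daily_gb is stored `copies` times."""
    return capacity_tb * 1000 / (daily_gb * copies)


def avg_message_bytes(daily_gb: float, msgs_per_second: float) -> float:
    """Average message size implied by a daily volume and a sustained rate."""
    return daily_gb * 1e9 / (msgs_per_second * SECONDS_PER_DAY)


print(retention_days(30, 100))               # one copy: 300.0 days
print(retention_days(30, 100, copies=2))     # primary + one replica: 150.0 days
print(round(avg_message_bytes(100, 3_000)))  # about 386 bytes at 3K msg/s
print(round(avg_message_bytes(100, 4_000)))  # about 289 bytes at 4K msg/s
```

Under these assumptions, 30 TB holds several months of data, which is consistent with a workload driven by retention and query volume rather than by ingest rate.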
## Fine, y'all can do **ElasticSearch** ... what about other ones?
We know how to deploy, scale, and manage:
* SQL: PostgreSQL, TiDB, MariaDB, Vitess
* NoSQL: MongoDB, Couchbase, CouchDB
* Buses: RabbitMQ, Kafka, Nats.io
# USE CASE #3 : DAY 2 OPERATIONS IN A MISSION-CRITICAL COMPLEX KUBERNETES CLUSTER
# UC3 : The client
* Recently moved **.NET workloads** to containers, and **already operates Kubernetes**.
* Applications connect to third-party data hubs and are very **sensitive to network latency**. This requires multi-region deployment.
* High availability and 24/7 operations are mandatory, because outages cause significant **loss of revenue**.
# UC3 : The applications
Some components require **ad-hoc rollout strategies**:
* **Singleton** components (must never run two at the same time)
* **Stateful** components (state must be saved and carried over)
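
One common way to approach the singleton constraint is a Deployment with a single replica and the `Recreate` strategy, which stops the old Pod before starting the new one instead of overlapping them as a rolling update does. The sketch below builds such a manifest. The component name and image are hypothetical, and this is not necessarily the strategy used in this project.

```python
import json


def singleton_deployment(name: str, image: str) -> dict:
    """Deployment whose rollouts never run the old and new Pod side by side."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            # Recreate: terminate existing Pods first, then create new ones.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


deployment = singleton_deployment("feed-connector", "registry.example.com/feed:1.0")
print(json.dumps(deployment, indent=2))
```

`Recreate` only covers rollouts. It does not by itself guarantee exclusivity during node failures, and stateful components usually need more, for example a StatefulSet with persistent volumes or an application-level handover of state.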
# UC3 : Our approach
**Review** existing containers, CI/CD, and Kubernetes resources.
**Implement** changes and add **missing bricks** for Day 2 operations:
* Unified authentication
* Centralized logging
* Federated metrology
* Layer 7 reverse proxy with PKI management
# UC3 challenge : geographical location
Most applications **rely** on minimal **network latency**, requiring deployment in several OVHcloud locations. We favored **remote worker nodes** in each PoP to reduce **operational costs**. We chose the **kube-router** CNI, with IP-in-IP tunnels when traffic is routed through VPNs.
# Network matters
Benchmark your provider's network performance and pricing to avoid surprises. OVHcloud offers a **worldwide multidatacenter private network** operated on its own dark fiber. OVHcloud offers **free and unlimited ingress and egress traffic** for private & public transit (between nodes and to/from the internet). ![ovh](resources/OVHcloud_stacked_logo_fullcolor_RGB.png)
# UC3 challenge : 12 environments
To consolidate monitoring, **alerting rules** and dashboards, we **federated logs and metrics**.
- **Graylog** with a dedicated ElasticSearch cluster and its remote Beats.
- Each environment has its own **non-persistent** Prometheus ...
- ... sending to a Prometheus **federator** with **Thanos** archiving.
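
The slides do not say how metrics travel from the per-environment Prometheus servers to the federator. One standard option is Prometheus federation, where the central server scrapes each environment's `/federate` endpoint. The sketch below generates such a scrape job under that assumption; the environment names, targets, and the catch-all `match[]` selector are hypothetical.

```python
import json


def federation_job(environments: dict) -> dict:
    """Build a scrape job that pulls series from each environment's Prometheus.

    environments maps an environment name to the host:port of its Prometheus.
    """
    return {
        "job_name": "federate",
        "honor_labels": True,            # keep the labels set by the source Prometheus
        "metrics_path": "/federate",
        "params": {"match[]": ['{job=~".+"}']},  # hypothetical: pull every series
        "static_configs": [
            {"targets": [target], "labels": {"environment": env}}
            for env, target in sorted(environments.items())
        ],
    }


job = federation_job({
    "prod-eu": "prometheus.prod-eu.example.com:9090",
    "staging-eu": "prometheus.staging-eu.example.com:9090",
})
print(json.dumps({"scrape_configs": [job]}, indent=2))
```

In practice the `match[]` selector is usually narrowed to the series needed for central alerting, since pulling everything from 12 environments can be expensive.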
# UC3 challenge : Deployment pipeline
High security requirements mandate the deployment of separate **Harbor** registries **per environment**. Devs **cannot push** container images directly to production registries. **Production** registries **pull from staging** registries.
# UC3 : Results
* 6 regions
* 12 secured zones (6 staging + 6 prod)
* Up to 40 critical applications per zone
* All logging and metrology federated for centralized alerting
# Enix + OVHcloud : A one-stop shop K8s service provider
Mix and match the right **OVHcloud solutions** ...
* **Managed** Kubernetes, Registry, and Logs DataPlatform services
* The **largest IaaS choice** (Bare metal, public cloud instances, and storage)
* Reversible **opensource standards** & **predictable pricing** in 12 geographies
# Enix + OVHcloud : A one-stop shop K8s service provider
... with **Enix expertise** & **E2E custom approach**
* Cloud Native **strategy** & **360 audit**
* Premium Kubernetes **training**
* Architecture, **integration** & best practices
* Implementation & **day 2 operations**