The KubernetesPodOperator is a very powerful operator in Airflow. This story explains the different ways we can leverage the KubernetesPodOperator in Airflow.
The idea behind writing this story is to test and understand Sealed Secrets using a minikube setup.
We all know that default Secrets in Kubernetes are only base64-encoded, which can be decoded trivially, so we need strong encryption for our secrets. Sealed Secrets provide a mechanism that adds a layer of real encryption on top of the base64 encoding, making secrets much more secure.
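To see why base64 alone offers no protection, note that anyone who can read the Secret manifest can recover the plaintext in one line (the value below is purely illustrative):

```python
import base64

# A value as it would appear in a Kubernetes Secret manifest
encoded = base64.b64encode(b"s3cr3t-password").decode()
print(encoded)  # this is all `kubectl get secret -o yaml` hides

# Anyone with read access can reverse it instantly
print(base64.b64decode(encoded).decode())
```

This is encoding, not encryption, which is exactly the gap Sealed Secrets closes.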
Before we dive into Sealed Secrets, we need to set up some prerequisites:
Let’s get started.
Once you’ve installed minikube, let’s kick off a local cluster:
minikube start --cpus 4 --memory 8192
The goal behind writing this post is to have a working local Airflow 2.0 environment on Kubernetes. It covers:

- A working setup of Airflow 2.0 on Kubernetes locally
- Steps to set up the environment for Airflow on minikube
- How to customize Airflow for your needs
Primarily intended for development use, the basic Airflow architecture with the Local and Sequential executors is an excellent starting point for understanding the architecture of Apache Airflow.
In order to run apache-airflow 2.0 locally on Kubernetes, the following prerequisites need to be installed. I used a Mac, but you can do this on any Linux-based OS.
We all know Spark is powerful at processing massive amounts of data. But how do we leverage its power when legacy data lives in relational databases?
Apache Spark provides a DataFrame API for reading from and writing to relational databases over JDBC, just like its other sources (Kafka, Cassandra, etc.).
Note: Make sure that you include the JDBC driver dependency for your database in your project.
E.g. for MySQL with sbt: libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.18"
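With the driver on the classpath, a basic JDBC read looks like the sketch below. The connection URL, table name, and credentials are placeholders; adjust them for your database:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("jdbc-read-example")
  .getOrCreate()

// Read a whole table over JDBC; Spark infers the schema from database metadata
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/testdb") // placeholder URL
  .option("driver", "com.mysql.cj.jdbc.Driver")
  .option("dbtable", "employees")                      // placeholder table
  .option("user", "root")
  .option("password", "password")
  .load()

df.printSchema()
```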
Let’s say that we have a cluster of 3 executors, each with 2 cores.
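With 3 executors of 2 cores each, 6 tasks can run in parallel, so reading into at least 6 partitions keeps every core busy. A sketch of a partitioned JDBC read, assuming the table has a numeric `id` column whose bounds you have already looked up (all connection details remain placeholders):

```scala
// Partitioned JDBC read: Spark issues one range query per partition in parallel
val partitioned = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/testdb")
  .option("dbtable", "employees")
  .option("user", "root")
  .option("password", "password")
  .option("partitionColumn", "id") // numeric column used to split the read
  .option("lowerBound", "1")       // min(id), looked up beforehand
  .option("upperBound", "1000000") // max(id), looked up beforehand
  .option("numPartitions", "6")    // one partition per core: 3 executors x 2 cores
  .load()
```

Without `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`, Spark reads the entire table through a single connection on one core.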
You can easily test this from spark-shell too if you don’t feel like setting…