K8s-pod-operator is a very powerful operator in Airflow. This story explains the different ways we can leverage the k8s-pod-operator in Airflow.

k8s-pod-operator

Using k8s python api

Using pod_template_file

Using both (k8s python api and pod_template_file); a short sketch of all three options follows below
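
To make the three options concrete, here is a minimal sketch, assuming the cncf.kubernetes provider is installed (the exact import path differs slightly across provider versions) and that these tasks sit inside a DAG definition; the task names, image, and template path are placeholders.

from kubernetes.client import models as k8s
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

# 1. Pure k8s python api: every pod field is set through operator arguments / k8s models
pod_via_python_api = KubernetesPodOperator(
    task_id="pod_via_python_api",
    name="pod-via-python-api",
    namespace="default",
    image="python:3.9-slim",
    cmds=["python", "-c"],
    arguments=["print('hello from k8s-pod-operator')"],
    env_vars=[k8s.V1EnvVar(name="ENV", value="dev")],
    get_logs=True,
)

# 2. pod_template_file: the pod spec lives in a plain pod YAML (hypothetical path)
pod_via_template = KubernetesPodOperator(
    task_id="pod_via_template",
    name="pod-via-template",
    namespace="default",
    pod_template_file="/opt/airflow/pod_templates/basic_pod.yaml",
    get_logs=True,
)

# 3. Both: start from the template and override selected fields via operator arguments
pod_via_both = KubernetesPodOperator(
    task_id="pod_via_both",
    name="pod-via-both",
    namespace="default",
    pod_template_file="/opt/airflow/pod_templates/basic_pod.yaml",
    image="python:3.9-slim",
    env_vars=[k8s.V1EnvVar(name="ENV", value="dev")],
    get_logs=True,
)

When both are supplied, the arguments passed to the operator generally take precedence over the template, which is what makes the third option handy: keep a common base pod and tweak only a few fields per task.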


K8s-Sealed-Secrets

The idea behind writing this story is to test and understand sealed secrets using a minikube setup.

We all know that default Secrets in K8s are just base64 encoded, which can be decoded easily, so we need strong encryption for our secrets. Sealed Secrets provides a mechanism to add a layer of encryption on top of the base64 encoding, making secrets much more secure.
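
To see how weak plain base64 is, here is a tiny Python sketch (the secret value is made up) that round-trips a value exactly as it would appear under the data field of a regular Secret manifest:

import base64

# what kubectl would show under .data in a plain k8s Secret
encoded = base64.b64encode(b"super-secret-password").decode()
print(encoded)                             # c3VwZXItc2VjcmV0LXBhc3N3b3Jk
print(base64.b64decode(encoded).decode())  # super-secret-password

Anyone who can read the manifest can recover the value; that is the gap Sealed Secrets closes.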

Pre-requisite

Before we dive into sealed secrets, we need to set up some pre-requisites:

  1. minikube
  2. base64

Let’s get started.

Once you’ve installed minikube, let’s kick things off:

minikube start --cpus 4 --memory 8192

Once…


The goal behind writing this post is to have a working local Airflow 2.0 environment on Kubernetes.

This story is for you if you’re looking for:

A working setup of Airflow 2.0 on Kubernetes, locally

Steps to set up an environment for Airflow on minikube

How to customize Airflow for your needs

This story is not for you if you’re looking for:

Airflow concepts

Kubernetes concepts

Basic Airflow architecture

Primarily intended for development use, the basic Airflow architecture with the Local and Sequential executors is an excellent starting point for understanding the architecture of Apache Airflow.

Pre-requisite

In order to run apache-airflow-2.0 locally on Kubernetes, the following pre-requisites need to be installed. I was using a Mac, but you can do this on any Linux-based OS.

  1. Docker


We all know Spark is powerful at processing massive amounts of data. But how do we leverage its power when legacy data lives in relational databases?

Apache Spark provides a DataFrame API for reading from and writing to relational databases using JDBC, just like other sources (Kafka, Cassandra, etc.).

Reading from a relational database using the Spark DataFrame API

Note: Make sure that you include the JDBC driver dependency for your database in your project.

E.g. in this case: libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.18"
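
As a rough sketch in PySpark (the host, database, table name, and credentials below are placeholders; the same options apply to the Scala API), a JDBC read looks like this:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("jdbc-read")
    # pull in the mysql jdbc driver at runtime
    .config("spark.jars.packages", "mysql:mysql-connector-java:8.0.18")
    .getOrCreate()
)

orders_df = (
    spark.read
    .format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/shop")  # placeholder host and database
    .option("dbtable", "orders")                      # placeholder table
    .option("user", "spark_user")
    .option("password", "spark_password")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .load()
)

orders_df.printSchema()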

Let’s say we have a cluster of 3 executors, each with 2 cores.
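
With 3 executors x 2 cores there are 6 task slots, so the read can be split into 6 parallel JDBC queries. Here is a sketch of a partitioned read, where partitionColumn and its bounds are assumptions about the table:

partitioned_df = (
    spark.read
    .format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/shop")
    .option("dbtable", "orders")
    .option("user", "spark_user")
    .option("password", "spark_password")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    # 6 partitions to match 3 executors x 2 cores
    .option("numPartitions", 6)
    .option("partitionColumn", "order_id")  # assumed numeric primary key
    .option("lowerBound", 1)
    .option("upperBound", 600000)
    .load()
)

print(partitioned_df.rdd.getNumPartitions())  # 6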

You can easily test this from spark-shell too if you don’t feel like setting…

sharad mishra
