Running Airflow-2.0 on Kubernetes locally using Minikube

sharad mishra
Feb 8, 2021

This post walks through setting up a working local Airflow 2.0 environment on Kubernetes.

This story is for you if you’re looking for:

A working local setup of Airflow 2.0 on Kubernetes

Steps to set up an environment for Airflow on minikube

How to customize Airflow for your needs

This story is not for you if you’re looking for:

Airflow concepts

Kubernetes concepts

Basic Airflow architecture

Primarily intended for development use, the basic Airflow architecture with the Local and Sequential executors is an excellent starting point for understanding the architecture of Apache Airflow.

Prerequisites

To run Apache Airflow 2.0 locally on Kubernetes, the following prerequisites need to be installed. I was using a Mac, but the same steps should work on any Linux-based OS.

  1. Docker
  2. Minikube
  3. Helm
  4. Git

Note: Once Docker is installed, make sure to increase its resource limits to at least 4 CPUs and 4–8 GB of memory.
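Before going further, it can help to confirm that all four tools are on your PATH. The snippet below is a minimal sketch of such a check; the `missing_tools` helper is a name made up for illustration and is not part of any of these tools.

```shell
# Hypothetical helper: print the name of every given tool that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools docker minikube helm git)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names print on one line.
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found."
fi
```

Adding `kubectl` to the list is reasonable too, since it is used later to create the namespace.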

Start-minikube

Once minikube is installed, start it with the following command:

minikube start --cpus 4 --memory 8192
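After the start command returns, you may want to confirm that the cluster is actually ready before installing anything. A minimal sketch, assuming `kubectl` is installed and pointed at the minikube context; `wait_for_cluster` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only when minikube reports a running cluster
# and every node reaches the Ready condition within the timeout (default 120s).
wait_for_cluster() {
  minikube status >/dev/null &&
    kubectl wait --for=condition=Ready node --all --timeout="${1:-120s}"
}
```

Run `wait_for_cluster` (or `wait_for_cluster 300s` on a slower machine) and continue once it returns successfully.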

Fork-repository

Clone the repository to your local machine:

git clone https://github.com/imsharadmishra/airflow-2.0.git

Build-airflow-2.0-docker-image

I have built a custom Airflow 2.0 image with basic tools, e.g. procps and vim. These are handy for investigating issues in your container, but make sure to remove them when you build a production image. You can also build a custom image for your own needs; here is a very good document on how to build a custom image: Airflow-documentation.

cd airflow-2.0/chart/dockerfiles/customairflow-2.0
docker build -t airflowcustom:2.0.1rc2 .
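An image built against your host's Docker daemon is generally not visible inside minikube, and the `helm install` command in the next section uses the repository `sharadmishra/airflowcustom` rather than this local tag. If you want the cluster to use your own local build, one common approach is to build directly against minikube's Docker daemon. The sketch below assumes minikube runs with the Docker container runtime; `build_in_minikube` is a hypothetical helper name.

```shell
# Hypothetical helper: build an image directly inside minikube's Docker daemon.
# `minikube docker-env` prints export statements (DOCKER_HOST etc.) for that daemon.
# The body runs in a subshell so the environment change does not leak into your session.
# $1 = image tag, $2 = optional build context (default: current directory).
build_in_minikube() (
  eval "$(minikube docker-env)" &&
    docker build -t "$1" "${2:-.}"
)
```

From the Dockerfile directory this would be `build_in_minikube airflowcustom:2.0.1rc2`. The chart would then presumably need `--set images.airflow.repository=airflowcustom` so that the repository matches the local tag; treat that as an assumption to verify against the chart.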

Install-chart for airflow-2.0

I have borrowed most of the components of this chart from the official Airflow repository. I override the image used by the chart with my custom Airflow image, and I also override the default user that is created in the Airflow backend database. We will use this user later to log in to the Airflow web UI.

cd ../../
kubectl create namespace airflow
helm install airflow . \
--set images.airflow.repository=sharadmishra/airflowcustom \
--set images.airflow.tag=2.0.1rc2 \
--set webserver.defaultUser.enabled=true \
--set webserver.defaultUser.role=Admin \
--set webserver.defaultUser.username=airflow \
--set webserver.defaultUser.firstName=abc \
--set webserver.defaultUser.lastName=xyz \
--set webserver.defaultUser.email=abc@xyz.com \
--set webserver.defaultUser.password=airflow \
--set executor=KubernetesExecutor \
--namespace airflow
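The same overrides can be kept in a values file instead of a long list of `--set` flags, which is easier to version and tweak. The sketch below writes such a file; the name `values-local.yaml` is an arbitrary choice, and the keys simply mirror the `--set` paths used above.

```shell
# Write the overrides from the helm command to a values file.
# The file name is arbitrary; the keys mirror the --set paths one to one.
cat > values-local.yaml <<'EOF'
images:
  airflow:
    repository: sharadmishra/airflowcustom
    tag: 2.0.1rc2
webserver:
  defaultUser:
    enabled: true
    role: Admin
    username: airflow
    firstName: abc
    lastName: xyz
    email: abc@xyz.com
    password: airflow
executor: KubernetesExecutor
EOF
```

The install then becomes `helm install airflow . --values values-local.yaml --namespace airflow`.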

Dashboard-minikube

Start the minikube dashboard with the following command:

minikube dashboard

Airflow-WebUI

To log in to the Airflow web UI, use the credentials that we set when installing the Helm chart. In my example these are:

user: airflow

password: airflow

Open the web UI with:

minikube service airflow-webserver --namespace airflow

Peek at Backend Database

Let’s take a look at the metadata that Airflow maintains in its database. Airflow supports several relational databases as its backend, e.g. MySQL and PostgreSQL. In this case I use PostgreSQL.

To access the database, exec into the PostgreSQL pod and run:

psql -Upostgres

User: postgres

Password: postgres
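If you prefer not to open an interactive session, a single query can be run through `kubectl exec`. This is a sketch under assumptions: the pod name `airflow-postgresql-0` is a guess for a release named `airflow` and is not confirmed by this post (check `kubectl get pods --namespace airflow`), and `airflow_db_query` is a hypothetical helper name. It connects the same way as `psql -Upostgres` above, with the password supplied through `PGPASSWORD`.

```shell
NAMESPACE="${NAMESPACE:-airflow}"
# Assumed pod name; verify with: kubectl get pods --namespace airflow
POSTGRES_POD="${POSTGRES_POD:-airflow-postgresql-0}"

# Hypothetical helper: run one SQL statement as the postgres user inside the pod.
# $1 = SQL statement or psql meta-command.
airflow_db_query() {
  kubectl exec --namespace "$NAMESPACE" "$POSTGRES_POD" -- \
    env PGPASSWORD=postgres psql -U postgres -c "$1"
}
```

For example, `airflow_db_query '\dt'` lists the tables, and `airflow_db_query 'SELECT dag_id, state, execution_date FROM dag_run ORDER BY execution_date DESC LIMIT 5;'` should show the most recent DAG runs.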

Trigger a DAG

Once a triggered DAG run is complete, you can inspect its logs and details, e.g. task duration and the status of each task and of the DAG run.
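A DAG can also be triggered from the Airflow 2.0 command line inside the scheduler pod. A minimal sketch, assuming the chart creates a deployment named `airflow-scheduler` with a container named `scheduler` (neither is confirmed by this post); `airflow_cli` is a hypothetical helper name.

```shell
NAMESPACE="${NAMESPACE:-airflow}"

# Hypothetical helper: run an `airflow` CLI command inside the scheduler container.
# Deployment and container names are assumptions; check with
# `kubectl get deploy --namespace airflow`.
airflow_cli() {
  kubectl exec --namespace "$NAMESPACE" deploy/airflow-scheduler -c scheduler -- \
    airflow "$@"
}
```

`airflow_cli dags list` shows the available DAG ids; `airflow_cli dags unpause <dag_id>` followed by `airflow_cli dags trigger <dag_id>` starts a run.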

Issues I encountered

  1. Unable to fetch logs from worker pods.
*** Trying to get logs (last 100 lines) from worker pod  ***
*** Unable to fetch logs from worker pod  ***
(400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Sun, 07 Feb 2021 06:50:24 GMT', 'Content-Length': '136'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"name must be provided","reason":"BadRequest","code":400}\n'

To resolve this issue, I granted additional verbs to the Airflow Pod Reader Role. The changes can be found here: https://github.com/imsharadmishra/airflow-2.0/blob/ade81c777597cf577b7d44bb471f1fd79adcfed6/chart/templates/rbac/pod-log-reader-role.yaml#L53

  2. DAGs stay in the running state indefinitely and do not progress. One possible reason for this issue is that the Airflow image is not compatible with the chart. To troubleshoot it, I suggest keeping an eye on the logs of the scheduler and webserver containers.
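Both issues can be investigated from the command line. The sketch below is written under assumptions: the service account, deployment and container names (`airflow-webserver`, `airflow-scheduler`, `scheduler`, `webserver`) are guesses for a release named `airflow` and should be confirmed with `kubectl get serviceaccounts,deploy --namespace airflow`; `can_read_pod_logs` and `component_logs` are hypothetical helper names.

```shell
NAMESPACE="${NAMESPACE:-airflow}"

# Hypothetical helper: ask the API server whether a service account in
# $NAMESPACE may read pod logs. $1 = service account name. Prints "yes" or "no".
can_read_pod_logs() {
  kubectl auth can-i get pods --subresource=log \
    --namespace "$NAMESPACE" --as "system:serviceaccount:$NAMESPACE:$1"
}

# Hypothetical helper: print the last N log lines (default 100) of one component.
# $1 = component (scheduler or webserver), $2 = optional line count.
component_logs() {
  kubectl logs --namespace "$NAMESPACE" "deploy/airflow-$1" -c "$1" --tail="${2:-100}"
}
```

For example, `can_read_pod_logs airflow-webserver` checks the log permission, and `component_logs scheduler` or `component_logs webserver 500` prints recent log lines for each component.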

References

  1. Airflow-youtube
  2. Airflow-summit-Presentation
  3. Airflow-documentation
  4. https://medium.com/@ipeluffo/running-apache-airflow-locally-on-kubernetes-minikube-31f308e3247a
  5. https://github.com/imsharadmishra/airflow-2.0
