Running Airflow-2.0 on Kubernetes locally using Minikube
The goal of this post is to get a working local Airflow 2.0 environment running on Kubernetes.
This story is for you if you’re looking for:
A working setup of airflow 2.0 on kubernetes locally
Steps to set up an environment for Airflow on Minikube
How to customize Airflow for your needs
This story is not for you if you’re looking for:
Airflow concepts
Kubernetes concepts
Basic Airflow architecture
Primarily intended for development use, the basic Airflow architecture with the Local and Sequential executors is an excellent starting point for understanding the architecture of Apache Airflow.
Pre-requisite
In order to run apache-airflow-2.0 locally on Kubernetes, the following prerequisites need to be installed. I used a Mac, but any Linux-based OS will work.
Note: Once Docker is installed, please make sure to increase its resources to at least 4–8 GB of memory and 4 CPUs.
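You can check what Docker currently has allocated with a quick `docker info` query (the template fields below are standard `docker info` output):

```shell
# Show Docker's current CPU and memory allocation
docker info --format 'CPUs: {{.NCPU}}  Memory: {{.MemTotal}} bytes'
```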
Start-minikube
Once Minikube is installed, start it with the following command:
minikube start --cpus 4 --memory 8192
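Before moving on, it's worth confirming the cluster actually came up:

```shell
minikube status      # host, kubelet, and apiserver should all report Running
kubectl get nodes    # the minikube node should report Ready
```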
Fork-repository
Clone the repository to your local machine:
git clone https://github.com/imsharadmishra/airflow-2.0.git
Build-airflow-2.0-docker-image
I have built a custom airflow-2.0 image with basic tools, e.g. procps, vim, etc. These are handy for investigating issues in your container. Please make sure to remove them when you build a production image. You can also build a custom image for your needs; here is a very good document on how to build a custom image: Airflow-documentation.
cd airflow-2.0/chart/dockerfiles/customairflow-2.0
docker build -t airflowcustom:2.0.1rc2 .
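One thing to keep in mind: Minikube runs its own Docker daemon, so an image built against your host daemon is not automatically visible inside the cluster. If you build the image yourself instead of pulling mine from Docker Hub, either build against Minikube's daemon or load the already-built image into the cluster:

```shell
# Option 1: point this shell at minikube's Docker daemon, then build there
eval $(minikube docker-env)
docker build -t airflowcustom:2.0.1rc2 .

# Option 2: load an image already built on the host into minikube
minikube cache add airflowcustom:2.0.1rc2
```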
Install-chart for airflow-2.0
I have borrowed most of the components of this chart from the official Airflow repository. I have overridden the image used by the chart with my custom-made Airflow image. I am also overriding the default user that is created in the Airflow backend database; we'll use this user later when logging into the Airflow web UI.
cd ../../
kubectl create namespace airflow
helm install airflow . \
--set images.airflow.repository=sharadmishra/airflowcustom \
--set images.airflow.tag=2.0.1rc2 \
--set webserver.defaultUser.enabled=true \
--set webserver.defaultUser.role=Admin \
--set webserver.defaultUser.username=airflow \
--set webserver.defaultUser.firstName=abc \
--set webserver.defaultUser.lastName=xyz \
--set webserver.defaultUser.email=abc@xyz.com \
--set webserver.defaultUser.password=airflow \
--set executor=KubernetesExecutor \
--namespace airflow
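Once the release is installed, watch the pods come up; with the KubernetesExecutor there is no standing worker, so you should mainly see the scheduler, webserver, and Postgres pods reach the Running state:

```shell
# Watch the airflow namespace until all pods are Running
kubectl get pods --namespace airflow --watch
```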
Dashboard-minikube
Start the Minikube dashboard with the following command:
minikube dashboard
Airflow-WebUI
In order to log into the Airflow web UI, use the same credentials that we set while installing the Helm chart. In my example I used:
user: airflow
password: airflow
minikube service airflow-webserver --namespace airflow
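Alternatively, instead of `minikube service` you can port-forward the webserver service (the service name `airflow-webserver` comes from the chart) and open http://localhost:8080 in your browser:

```shell
# Forward local port 8080 to the webserver service inside the cluster
kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow
```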
Peek at Backend Database
Let’s take a look at the metadata that Airflow maintains in its backend database. You can choose any relational database as the backend, e.g. MySQL, Postgres, etc. In this case I have Postgres as my backend database.
In order to access this database, exec into the Postgres pod and connect using:
psql -Upostgres
User: postgres
Password: postgres
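As a sketch (the Postgres pod name `airflow-postgresql-0` is an assumption; run `kubectl get pods` to find yours), exec-ing in and querying one of Airflow's metadata tables looks like this:

```shell
# Find the Postgres pod, then open a psql session inside it
kubectl get pods --namespace airflow
kubectl exec -it airflow-postgresql-0 --namespace airflow -- psql -U postgres

# Or run a one-off query against the dag_run metadata table
kubectl exec -it airflow-postgresql-0 --namespace airflow -- \
  psql -U postgres -c 'SELECT dag_id, state FROM dag_run LIMIT 5;'
```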
Trigger a DAG
Once the DAG completes, you can also take a look at the logs and details, e.g. task duration, log output, and the status of each task and of the DAG.
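You can trigger a DAG from the web UI, or via the Airflow CLI inside the scheduler pod. A sketch, assuming the chart named the scheduler deployment `airflow-scheduler` and that the example DAGs are enabled in your image:

```shell
# Trigger one of the bundled example DAGs from inside the scheduler pod
kubectl exec -it deploy/airflow-scheduler --namespace airflow -- \
  airflow dags trigger example_bash_operator
```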
Issues I encountered
1. Unable to fetch logs from the worker pod. The log output looked like this:
*** Trying to get logs (last 100 lines) from worker pod ***
*** Unable to fetch logs from worker pod ***
(400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Sun, 07 Feb 2021 06:50:24 GMT', 'Content-Length': '136'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"name must be provided","reason":"BadRequest","code":400}\n'
In order to resolve this issue I granted additional verbs to the Airflow Pod Reader Role. The change can be found here: https://github.com/imsharadmishra/airflow-2.0/blob/ade81c777597cf577b7d44bb471f1fd79adcfed6/chart/templates/rbac/pod-log-reader-role.yaml#L53
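The fix amounts to letting the role read the pods themselves, not just `pods/log`. The resulting rules look roughly like this (a sketch; see the linked commit for the exact change):

```yaml
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
```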
2. DAGs are in the running state indefinitely and not progressing. One of the reasons for this issue is that the Airflow image is not compatible with the chart. To troubleshoot, I suggest keeping an eye on the logs of the scheduler and webserver containers.
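Tailing those logs is a one-liner each (the deployment names assume the chart defaults):

```shell
# Follow the scheduler and webserver logs while a DAG is stuck
kubectl logs -f deploy/airflow-scheduler --namespace airflow
kubectl logs -f deploy/airflow-webserver --namespace airflow
```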