Installing Apache Airflow on Ubuntu 24.04: A Step-by-Step Guide

Apache Airflow is an open-source platform for managing workflows, particularly useful for automating data pipelines and processes like Extract, Transform, Load (ETL). It leverages Python-based Directed Acyclic Graphs (DAGs) to schedule tasks, handling dependencies to ensure seamless execution. This guide details how to install and configure Apache Airflow on Ubuntu 24.04, secure your setup, and verify functionality using a sample DAG.

Prerequisites

  • Access to an Ubuntu 24.04 server with a minimum of 4 GB RAM
  • An A record for your domain pointing to the server’s IP address

Installing Apache Airflow on Ubuntu 24.04

Airflow is distributed as a Python package and installed with pip. Follow these steps to set up Python, create a virtual environment, and install Airflow.

Step 1: Update Package Index

$ sudo apt update

Step 2: Verify Python Installation

$ python3 --version

Expected output:

Python 3.12.3

Step 3: Install Python (If Needed)

$ sudo apt install python3

Step 4: Install Virtual Environment and Dependencies

$ sudo apt install python3-venv libpq-dev -y

Step 5: Create and Activate Virtual Environment

$ python3 -m venv airflow_env
$ source ~/airflow_env/bin/activate

The shell prompt should now indicate that the virtual environment is active:

(airflow_env) linuxuser@example:~$

Step 6: Install Apache Airflow with PostgreSQL Support

$ pip install "apache-airflow[postgres]" psycopg2

Quoting the extras specifier prevents shells such as zsh from expanding the square brackets.

Step 7: Install PostgreSQL

$ sudo apt install postgresql postgresql-contrib
$ sudo systemctl start postgresql

Step 8: Configure PostgreSQL for Airflow

Open a PostgreSQL shell as the postgres user:

$ sudo -u postgres psql

Example output:

psql (16.6 (Ubuntu 16.6-0ubuntu0.24.04.1))
Type "help" for help.
postgres=#

postgres=# CREATE USER airflow PASSWORD 'YourStrongPassword';
postgres=# CREATE DATABASE airflowdb;
postgres=# ALTER DATABASE airflowdb OWNER TO airflow;
postgres=# \c airflowdb
airflowdb=# GRANT ALL ON SCHEMA public TO airflow;
airflowdb=# \q

Step 9: Configure Airflow

$ nano ~/airflow/airflow.cfg

If the ~/airflow directory doesn’t exist yet, run any Airflow command once to generate the default configuration, for example:

$ airflow db init

This creates ~/airflow with a default airflow.cfg (and a temporary SQLite database, which the following steps replace with PostgreSQL).

In airflow.cfg, update these values:

executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:YourStrongPassword@localhost/airflowdb
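Before initializing the database, it can help to sanity-check the connection URI. A minimal sketch using only the Python standard library (the `airflow_db_uri` helper and the password are placeholders for illustration, not part of Airflow):

```python
from urllib.parse import urlsplit

# Hypothetical helper: build the SQLAlchemy URI used for sql_alchemy_conn
# from the PostgreSQL credentials created earlier.
def airflow_db_uri(user, password, host, database):
    return f"postgresql+psycopg2://{user}:{password}@{host}/{database}"

uri = airflow_db_uri("airflow", "YourStrongPassword", "localhost", "airflowdb")

# Parse the URI back apart to confirm each component is what we expect.
parts = urlsplit(uri)
print(parts.scheme)    # postgresql+psycopg2
print(parts.hostname)  # localhost
print(parts.path)      # /airflowdb
```

If the parsed scheme, host, and database name match the values configured in PostgreSQL, the string can be pasted into airflow.cfg as-is.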

Step 10: Initialize the Airflow Database

$ airflow db init

You should see output similar to:

DB: postgresql+psycopg2://airflow:***@localhost/airflowdb

[2025-01-05T23:58:36.808+0000] {migration.py:207} INFO - Context impl PostgresqlImpl.
[2025-01-05T23:58:36.809+0000] {migration.py:210} INFO - Will assume transactional DDL.
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running stamp_revision -> 5f2621c13b39
WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
Initialization done

Step 11: Create Admin User

$ airflow users create \
  --username admin \
  --password yourSuperSecretPassword \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com

Step 12: Start Airflow Services

Launch the web server on port 8080 and direct output to webserver.log:

$ nohup airflow webserver -p 8080 > webserver.log 2>&1 &

Start the scheduler and log output to scheduler.log:

$ nohup airflow scheduler > scheduler.log 2>&1 &


Set Up Nginx as a Reverse Proxy for Apache Airflow

Apache Airflow runs on port 8080 by default. To secure and expose it over HTTP or HTTPS, follow these steps using Nginx as a reverse proxy.

Step 1: Install Nginx

$ sudo apt install -y nginx

Step 2: Create Nginx Configuration for Airflow

$ sudo nano /etc/nginx/sites-available/airflow

Add this content to the configuration file (replace airflow.example.com with your actual domain):

server {
  listen 80;
  server_name airflow.example.com;

  location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}

This configuration routes incoming requests on port 80 to Airflow’s default port 8080 via the specified domain.
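To illustrate what the proxy_set_header lines do, here is a small Python sketch (standard library only; the client IP is a made-up documentation address) of the headers Airflow ends up receiving for a request to http://airflow.example.com:

```python
client_ip = "203.0.113.7"  # example client address (TEST-NET documentation range)
incoming_xff = ""          # any X-Forwarded-For header the client itself sent

# $proxy_add_x_forwarded_for appends the client address to an existing
# X-Forwarded-For list, or starts a new list if none was sent.
forwarded_for = f"{incoming_xff}, {client_ip}" if incoming_xff else client_ip

headers = {
    "Host": "airflow.example.com",  # $host: preserves the requested domain
    "X-Real-IP": client_ip,         # $remote_addr: the direct client address
    "X-Forwarded-For": forwarded_for,
    "X-Forwarded-Proto": "http",    # $scheme: becomes "https" once SSL is enabled
}
print(headers["X-Forwarded-For"])  # 203.0.113.7
```

These headers let Airflow log real client addresses and generate correct redirect URLs even though every connection it sees arrives from 127.0.0.1.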

Step 3: Enable the Virtual Host

$ sudo ln -s /etc/nginx/sites-available/airflow /etc/nginx/sites-enabled/

Step 4: Test and Reload Nginx

$ sudo nginx -t

Expected output:

nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

$ sudo systemctl reload nginx

Step 5: Open Port 80 via Firewall

$ sudo ufw allow 80/tcp
$ sudo ufw reload

You can now open http://airflow.example.com in your browser to see the Airflow login page.

Use SSL to Secure Apache Airflow with HTTPS

To protect connections with SSL encryption, generate Let’s Encrypt certificates using Certbot and configure Nginx to use them.

Step 1: Install Certbot

$ sudo snap install --classic certbot

If Snap is missing, install it with the following command:

$ sudo apt install snapd -y

Step 2: Enable Certbot Globally

$ sudo ln -s /snap/bin/certbot /usr/bin/certbot

Step 3: Request an SSL Certificate

Replace airflow.example.com and admin@example.com with your domain and email.

$ sudo certbot --nginx --redirect -d airflow.example.com -m admin@example.com --agree-tos

Expected result:


Account registered.
Requesting a certificate for airflow.example.com
Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/airflow.example.com/fullchain.pem
Key is saved at: /etc/letsencrypt/live/airflow.example.com/privkey.pem
This certificate expires on 2025-04-21.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate.
Deploying certificate
Successfully deployed certificate for airflow.example.com to /etc/nginx/sites-enabled/airflow
Congratulations! You have successfully enabled HTTPS on https://airflow.example.com

Step 4: Test Automatic Renewal

$ sudo certbot renew --dry-run
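Certbot’s scheduled task renews a certificate only when it is close to expiry (by default, within 30 days of the expiration date). A quick stdlib illustration using the expiry date from the sample certbot output above:

```python
from datetime import date, timedelta

expiry = date(2025, 4, 21)           # expiry date from the certbot output above
renewal_window = timedelta(days=30)  # certbot's default renewal threshold

# The earliest date on which the scheduled task would actually renew.
renew_after = expiry - renewal_window
print(renew_after)  # 2025-03-22
```

Until that date, `certbot renew` runs are no-ops, which is why the dry run is the right way to verify the renewal machinery without waiting.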

Step 5: Restart Nginx

$ sudo systemctl restart nginx


Accessing the Apache Airflow Dashboard

To open the Airflow interface and begin working with DAGs on your server, follow the instructions below.

Launch your browser and go to:

https://airflow.example.com

Use the admin credentials you created earlier to sign in:

  • Username: admin
  • Password: yourSuperSecretPassword

Create and Execute a DAG in Apache Airflow

Use the following steps to build a simple Directed Acyclic Graph (DAG) and run it using the Apache Airflow interface.

Step 1: Set Up the DAG Directory

$ mkdir -p ~/airflow/dags

Step 2: Create a Sample Python DAG File

$ nano ~/airflow/dags/my_first_dag.py

Step 3: Add Code to Define the DAG

from airflow import DAG
# In Airflow 2.x the PythonOperator lives in airflow.operators.python;
# the older airflow.operators.python_operator path is deprecated.
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

def print_hello():
    print('Greetings from centron')

with DAG(
    'my_first_dag',
    start_date=datetime(2024, 1, 1),
    schedule_interval=timedelta(days=1),  # run once per day
    catchup=False                         # don't backfill missed intervals
) as dag:
    hello_task = PythonOperator(
        task_id='hello_task',
        python_callable=print_hello
    )

The code above defines a simple DAG named my_first_dag that executes once per day and outputs a greeting message.
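The scheduling behavior can be sketched with plain datetime arithmetic (standard library only, no Airflow required): with a daily interval, Airflow lines up one run per interval from start_date onward, and catchup=False skips backfilling intervals that have already passed.

```python
from datetime import datetime, timedelta

start_date = datetime(2024, 1, 1)
interval = timedelta(days=1)

# Logical dates of the first three daily runs, one interval apart.
logical_dates = [start_date + i * interval for i in range(3)]
print(logical_dates[-1])  # 2024-01-03 00:00:00
```

Note that Airflow actually triggers each run once its interval has elapsed, so the run with logical date 2024-01-01 executes on 2024-01-02.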

Step 4: Activate the DAG via the Web Interface

Open the Apache Airflow dashboard, go to the DAGs section, locate my_first_dag, enable it, and then manually trigger its execution.

Step 5: Monitor DAG Execution

Use the Graph View and Event Log tools in the Airflow UI to track task execution and diagnose issues.

Conclusion

You’ve successfully deployed Apache Airflow on Ubuntu 24.04, secured it with an Nginx reverse proxy and a Let’s Encrypt SSL certificate, and created a sample DAG to get started. Apache Airflow offers powerful workflow orchestration and can be customized to manage a wide range of automation needs in your environment.

For deeper insights and additional features, refer to the official Airflow documentation.

Source: vultr.com
