Airflow Cron Management UI
1. Comprehensive Guide: Setting Up Apache Airflow on Ubuntu 24.04
This guide provides a full, step-by-step process for installing and configuring a stable Apache Airflow instance on an Ubuntu 24.04 AWS EC2 server. It uses a locally installed PostgreSQL database for the metadata backend and includes common troubleshooting solutions and best practices for a robust setup.
2. Prerequisites
- An AWS account.
- A terminal or SSH client to connect to the EC2 instance.
- An SSH key pair configured in your AWS account.
3. Step 1: Launch and Configure AWS EC2 Instance
- Launch Instance: Navigate to the AWS EC2 console and launch a new instance.
- AMI: Select Ubuntu Server 24.04 LTS.
- Instance Type: Choose an instance with at least 4 GB of RAM. A t3.medium or larger is highly recommended.
- Key Pair: Assign your SSH key pair to the instance.
- Security Group: Create a new security group with the following inbound rules:
  - SSH (port 22): Source set to “My IP” for secure shell access.
  - Custom TCP (port 8080): Source set to “My IP” to access the Airflow Web UI.
4. Step 2: Prepare the Ubuntu System
Update the system packages and install the Python and PostgreSQL dependencies.
5. Step 3: Install and Configure PostgreSQL
5.1 Install PostgreSQL Server
5.2 Create Airflow Database and User
Switch to the postgres user, then create the Airflow database and a dedicated database user.
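The database setup in this step usually comes down to a couple of SQL statements run as the postgres superuser (for example inside `sudo -u postgres psql`). The sketch below only builds those statements as strings so they can be reviewed before use. The names `airflow_db` and `airflow_user` and the password are placeholders chosen for illustration, not values taken from this guide.

```python
def airflow_db_setup_sql(db_name: str, user: str, password: str) -> list:
    """Return SQL statements that create an Airflow metadata database and its owner.

    All three arguments are hypothetical placeholders chosen by the reader.
    Identifiers are not quoted, so keep them to simple lowercase names.
    """
    # Double any single quotes so the password is a valid SQL string literal.
    safe_password = password.replace("'", "''")
    return [
        f"CREATE USER {user} WITH PASSWORD '{safe_password}';",
        # Making the user the database owner also lets it create tables in
        # the public schema on PostgreSQL 15 and later.
        f"CREATE DATABASE {db_name} OWNER {user};",
    ]


if __name__ == "__main__":
    for statement in airflow_db_setup_sql("airflow_db", "airflow_user", "change-me"):
        print(statement)
```

Creating the database with the Airflow user as its owner is one way to avoid the "permission denied for schema public" error that can appear on recent PostgreSQL releases.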
6. Step 4: Set Up and Install Apache Airflow
6.1 Set AIRFLOW_HOME Environment Variable
6.2 Create and Activate Python Virtual Environment
6.3 Install Airflow with Constraints
Using constraints prevents dependency conflicts.
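Airflow publishes one constraints file per release and Python version, and pip can be pointed at it with `--constraint`. As a rough illustration, the helper below builds that URL. The Airflow version `2.9.3` is an assumption for the example only, since this guide does not pin a version; pick the release you actually intend to install.

```python
import sys
from typing import Optional


def constraints_url(airflow_version: str, python_version: Optional[str] = None) -> str:
    """Build the URL of the published constraints file for an Airflow release."""
    if python_version is None:
        # Default to the interpreter running this script, e.g. "3.12".
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


if __name__ == "__main__":
    version = "2.9.3"  # assumed version, for illustration only
    url = constraints_url(version)
    # Print the pip command rather than running it, so it can be reviewed first.
    print(f'pip install "apache-airflow=={version}" --constraint "{url}"')
```

Run it inside the activated virtual environment so the detected Python version matches the interpreter that will run Airflow.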
7. Step 5: Configure and Initialize Airflow
7.1 Run db upgrade to generate airflow.cfg
7.2 Edit airflow.cfg
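Since this guide uses PostgreSQL as the metadata backend, the key edit is most likely the `sql_alchemy_conn` setting in the `[database]` section. The sketch below shows one way to build that connection string and apply it to the text of a config file. The user, password, and database name are placeholders, and switching the executor to `LocalExecutor` is an assumption that is common with a PostgreSQL backend rather than something this guide states.

```python
import configparser
import io
from urllib.parse import quote_plus


def postgres_conn_uri(user: str, password: str, host: str, port: int, db_name: str) -> str:
    """Build a SQLAlchemy URI for PostgreSQL, URL-encoding the credentials."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db_name}"
    )


def apply_airflow_settings(cfg_text: str, conn_uri: str) -> str:
    """Return config text with the metadata DB URI and executor set."""
    # Interpolation is disabled because Airflow values may contain '%'.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    for section in ("core", "database"):
        if not parser.has_section(section):
            parser.add_section(section)
    parser.set("database", "sql_alchemy_conn", conn_uri)
    parser.set("core", "executor", "LocalExecutor")  # assumed choice, see lead-in
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    uri = postgres_conn_uri("airflow_user", "change-me", "localhost", 5432, "airflow_db")
    print(apply_airflow_settings("[core]\nexecutor = SequentialExecutor\n", uri))
```

Note that `configparser` drops comments when it rewrites a file, so for a real `airflow.cfg` it is usually simpler to edit these keys by hand and use the sketch only to see what the values should look like.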
7.3 Run db upgrade again
8. Step 6: Create Admin User and Start Services
8.1 Create Admin User
8.2 Start Webserver and Scheduler
9. Step 7: Access the Airflow UI
Open your browser and go to http://<your-ec2-public-ip>:8080, the port opened in the security group.
10. Optional Production Setup: Using systemd
10.1 Create Webserver Service
10.2 Create Scheduler Service
10.3 Enable and Start Services
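The two service files differ only in the Airflow subcommand they start, so they can be rendered from one template. The following is a minimal sketch of what such unit files might contain. The user `ubuntu`, the virtual environment path `/home/ubuntu/airflow_venv`, and the unit contents themselves are assumptions for illustration; adjust them to match your installation.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=Airflow {component}
After=network.target postgresql.service

[Service]
User={user}
Environment=AIRFLOW_HOME={airflow_home}
ExecStart={venv}/bin/airflow {component}
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
"""


def render_unit(component: str, user: str, airflow_home: str, venv: str) -> str:
    """Render a systemd unit for an Airflow component ("webserver" or "scheduler")."""
    return UNIT_TEMPLATE.format(
        component=component, user=user, airflow_home=airflow_home, venv=venv
    )


if __name__ == "__main__":
    for component in ("webserver", "scheduler"):
        # Hypothetical paths; ExecStart must point at the airflow binary in your venv.
        print(render_unit(component, "ubuntu", "/home/ubuntu/airflow", "/home/ubuntu/airflow_venv"))
```

After saving the rendered text under /etc/systemd/system/, run `sudo systemctl daemon-reload` and then enable and start each unit with `sudo systemctl enable --now <unit-name>`.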
11. Part 4: Creating Your First DAGs
- Place DAG files in ~/airflow/dags/.
- The scheduler rescans this folder periodically; restart the services after creating or updating DAGs if the changes do not appear.
Example 1: Simple Cron Job DAG
File: ~/airflow/dags/my_first_cron_dag.py
Example 2: DAG for Debugging Failures
File: ~/airflow/dags/chaotic_testing_dag.py
12. Part 5: Troubleshooting Guide
Common Issues
- DAG not appearing in UI
  - Restart the Airflow services.
  - Check for “Import Errors” in the UI.
- systemd status shows Active: activating (auto-restart)
  - Cause: Wrong ExecStart path or User in the service file.
  - Solution: Correct the paths, then run sudo systemctl daemon-reload.
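- Scheduler does not appear to be running
  - Cause: Orphaned process on port 8793.
  - Solution: Find the process holding port 8793 (for example with sudo lsof -i :8793), stop it, and restart the scheduler.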
- PostgreSQL permission denied for schema public
  - Ensure the Airflow database user owns the Airflow database, for example by running ALTER DATABASE <database> OWNER TO <user>; as the postgres superuser.
- Health check URL
  - Endpoint: /health
  - Returns 200 OK with JSON confirming metadatabase and scheduler health. Useful for an AWS Target Group health check.
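The health endpoint can also be polled from a script. The sketch below inspects the JSON body rather than relying on the status code alone; it assumes the response contains `metadatabase` and `scheduler` entries that each carry a `status` field, which is the shape Airflow 2 reports. The base URL in the usage note is an assumption for a local install.

```python
import json
from urllib.request import urlopen


def is_healthy(payload: dict) -> bool:
    """Return True only if both the metadatabase and the scheduler report healthy."""
    return all(
        payload.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Fetch <base_url>/health and evaluate the JSON body."""
    with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as response:
        return response.status == 200 and is_healthy(json.load(response))


if __name__ == "__main__":
    # Offline demonstration with a sample payload; no network call is made here.
    sample = {"metadatabase": {"status": "healthy"}, "scheduler": {"status": "unhealthy"}}
    print(is_healthy(sample))
```

On the server itself, `check_health("http://localhost:8080")` performs the live check, assuming the webserver listens on port 8080 as configured earlier in this guide.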
