Apache Airflow is a platform for programmatically authoring, scheduling, and monitoring workflows. It is widely used to orchestrate complex workflows and manage data pipelines. This guide walks you through setting up Airflow with the Celery Executor and configuring authentication.
## Prerequisites

- Python 3.9 or higher installed on your system.
- pip (the Python package manager) installed.
Install Apache Airflow with the Celery Executor using the following command:
```shell
pip install "apache-airflow[celery]==3.0.0" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.0.0/constraints-3.9.txt"
```

## Set the AIRFLOW_HOME Environment Variable

Before running Airflow, set the `AIRFLOW_HOME` environment variable to specify the directory where Airflow will store its configuration and data:

```shell
export AIRFLOW_HOME=~/airflow
```

You can run this line in your shell/terminal.
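Airflow creates this directory (and a default `airflow.cfg`) on first run, but you can create it yourself up front and persist the variable for future sessions. A minimal sketch, assuming a bash-compatible shell:

```shell
# Set AIRFLOW_HOME for this session and create the directory explicitly
export AIRFLOW_HOME=~/airflow
mkdir -p "$AIRFLOW_HOME"

# Optionally persist it across sessions (profile file varies by shell):
# echo 'export AIRFLOW_HOME=~/airflow' >> ~/.bashrc
```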
- Update the `auth_manager` setting in your `airflow.cfg` file to use the FAB (Flask AppBuilder) authentication manager (this may require installing the `apache-airflow-providers-fab` provider package):

  ```
  auth_manager = airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
  ```

  Note: This configuration is a work in progress and may require additional setup.
- Alternatively, for Simple Auth, you can enable all users as admins by adding the following line to `airflow.cfg`:

  ```
  simple_auth_manager_all_admins = True
  ```
- Add the following `dag_bundle_config_list` configuration to `airflow.cfg` to include multiple DAG folders:

  ```
  dag_bundle_config_list = [
      { "name": "dags-folder", "classpath": "airflow.dag_processing.bundles.local.LocalDagBundle", "kwargs": {} },
      { "name": "data-engineering-dags-folder", "classpath": "airflow.dag_processing.bundles.local.LocalDagBundle", "kwargs": { "path": "<REPOSITORY>/Data-Engineering/" } }
  ]
  ```

  Replace `<REPOSITORY>` with the absolute path to your repository.
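The `dag_bundle_config_list` value is expected to be valid JSON, and a malformed entry will prevent bundles from loading. One quick way to sanity-check the value before restarting Airflow is to paste it into a small Python script (the repository path below is a stand-in, not a real path):

```python
import json

# The value assigned to dag_bundle_config_list in airflow.cfg,
# copied here as a string so it can be validated before use.
bundle_config = '''
[
    { "name": "dags-folder", "classpath": "airflow.dag_processing.bundles.local.LocalDagBundle", "kwargs": {} },
    { "name": "data-engineering-dags-folder", "classpath": "airflow.dag_processing.bundles.local.LocalDagBundle", "kwargs": { "path": "/absolute/path/to/repo/Data-Engineering/" } }
]
'''

bundles = json.loads(bundle_config)  # raises ValueError if the JSON is malformed
print([b["name"] for b in bundles])  # → ['dags-folder', 'data-engineering-dags-folder']
```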
To create an admin user, run the following command:

```shell
airflow users create --role Admin --username admin --email admin@example.com --firstname Admin --lastname User --password admin
```

If you are using Simple Auth, you can find the generated passwords in the following file:

```shell
less $AIRFLOW_HOME/simple_auth_manager_passwords.json.generated
```
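If you prefer to read the generated credentials programmatically, a small helper can print them. The exact structure of this file is an assumption here (a flat JSON object mapping usernames to passwords); adjust if your file differs:

```python
import json
import os
from pathlib import Path

def read_generated_passwords(path):
    """Load the generated-passwords file, assumed to be a flat
    JSON object mapping usernames to passwords."""
    with open(path) as f:
        return json.load(f)

# The file only exists after the Simple Auth manager has run at least once.
passwords_file = (
    Path(os.environ.get("AIRFLOW_HOME", Path.home() / "airflow"))
    / "simple_auth_manager_passwords.json.generated"
)
if passwords_file.exists():
    for user, password in read_generated_passwords(passwords_file).items():
        print(f"{user}: {password}")
```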
- Start the Airflow standalone server:

  ```shell
  airflow standalone
  ```

- Open your browser and navigate to:

  http://localhost:8080

- Log in using the credentials you created (e.g., username: `admin`, password: `admin`).
## Airflow Syntax Overview

This section provides a quick overview of Airflow syntax, based on the `airflow_etl.py` example.
Airflow requires importing the DAG class and operators for defining tasks:

```python
from airflow import DAG
from airflow.providers.standard.operators.bash import BashOperator
from airflow.providers.standard.operators.python import PythonOperator
from datetime import datetime
```

The `DAG` object defines the workflow. Key parameters include:

- `dag_id`: A unique identifier for the DAG.
- `start_date`: The date when the DAG starts running.
- `schedule`: The schedule for running the DAG (e.g., daily, hourly, or `None` for manual runs only).
- `catchup`: Whether to backfill missed runs.
Example:

```python
with DAG(
    'etl_dag',
    start_date=datetime(2025, 5, 1),
    schedule=None,
    catchup=False
) as dag:
    ...
```

Tasks are defined using operators. For example:
- `BashOperator`: Executes shell commands.
- `PythonOperator`: Executes Python functions.
```python
ensure_directories_op = BashOperator(
    task_id='ensure_directories_task',
    bash_command='mkdir -p /path/to/output /path/to/raw /path/to/processed'
)

def extract_task():
    extract(SOURCE_URL, RAW_FILE)

extract_op = PythonOperator(
    task_id='extract_task',
    python_callable=extract_task
)
```

Define the order of execution using the `>>` operator:
```python
ensure_directories_op >> extract_op >> transform_op >> load_op
```

To trigger the DAG manually:

```shell
airflow dags trigger etl_dag
```

- If you encounter issues with authentication, ensure that the `auth_manager` setting in `airflow.cfg` matches your desired authentication method.
- For advanced setups, consider integrating Airflow with LDAP, OAuth, or other authentication providers.
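As a footnote to the syntax overview: the `extract` helper (and the `SOURCE_URL`/`RAW_FILE` constants) called by `extract_task` live in `airflow_etl.py` and are not shown in this guide. Purely as a hypothetical illustration of what such ETL callables might look like (these stdlib-only stand-ins are not the actual implementations), a CSV-to-JSON pipeline could be:

```python
import csv
import json
import urllib.request

def extract(source_url, raw_file):
    # Hypothetical stand-in: download the source data and save it unmodified.
    with urllib.request.urlopen(source_url) as resp, open(raw_file, "wb") as f:
        f.write(resp.read())

def transform(raw_file, processed_file):
    # Hypothetical stand-in: convert raw CSV rows into a JSON list of records.
    with open(raw_file, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(processed_file, "w") as f:
        json.dump(rows, f)

def load(processed_file, output_file):
    # Hypothetical stand-in: "load" by copying the processed records
    # to their final location.
    with open(processed_file) as src, open(output_file, "w") as dst:
        dst.write(src.read())
```

Each function takes only paths/URLs as arguments, which keeps the callables easy to wire into `PythonOperator` tasks and to test outside Airflow.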