What is Apache Airflow?
Apache Airflow is an open-source tool that helps you automate, schedule, and monitor workflows. A workflow is a set of tasks that need to run in a specific order.
Think of it like this:
- You define tasks (e.g., sending an email, cleaning data).
- You schedule them (e.g., run every day at 6 PM).
- Airflow makes sure they run in order, retries them if they fail, and shows their logs and status.
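To make "run in order" concrete, here is a minimal sketch in plain Python (standard library only, Python 3.9+). It is not how Airflow is implemented; it only illustrates the idea of running tasks after the tasks they depend on. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# These task names are made up for illustration.
dependencies = {
    "extract_data": set(),
    "clean_data": {"extract_data"},
    "send_email": {"clean_data"},
}


def run_in_order(dependencies, actions):
    """Run every task after its prerequisites and return the execution order."""
    executed = []
    for task in TopologicalSorter(dependencies).static_order():
        actions[task]()
        executed.append(task)
    return executed


# One trivial action per task; a real task would do actual work here.
actions = {name: (lambda name=name: print(f"running {name}")) for name in dependencies}

order = run_in_order(dependencies, actions)
print(order)
```

In Airflow you declare the same kind of dependencies inside a DAG, and Airflow works out the order for you.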
What is the Airflow Scheduler?
The Scheduler is the Airflow component that:
- Reads your workflows (called DAGs, short for directed acyclic graphs)
- Checks whether it’s time to run any task
- Sends due tasks to workers for execution
You don’t write code to create the Scheduler, but you write DAGs that the Scheduler reads.
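The Scheduler's "check what is due, then hand it off" loop can be pictured with a toy sketch like the one below. This is only an illustration in plain Python, not Airflow's actual code; the function name, the DAG ids, and the run times are assumptions made up for the example.

```python
from datetime import datetime
from queue import Queue


def dispatch_due_dags(next_runs, now, worker_queue):
    """Queue every DAG whose next run time has arrived and return their ids."""
    due = sorted(dag_id for dag_id, run_at in next_runs.items() if run_at <= now)
    for dag_id in due:
        worker_queue.put(dag_id)  # stand-in for "send to a worker"
    return due


# Hypothetical DAG ids and next run times, for illustration only.
next_runs = {
    "daily_email_sender": datetime(2024, 1, 1, 18, 0),
    "weekly_report": datetime(2024, 1, 8, 0, 0),
}

worker_queue = Queue()
due = dispatch_due_dags(next_runs, datetime(2024, 1, 1, 18, 0, 30), worker_queue)
print(due)
```

The real Scheduler repeats a check like this continuously, which is why it has to keep running in its own terminal.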
Step-by-Step: How to Use Apache Airflow
1. Install Airflow
Install Airflow with pip. The commands in this guide use the Airflow 2.x command-line interface. Run this in your terminal:
pip install apache-airflow
Initialize the Airflow metadata database (Airflow 2.7 and later prefer airflow db migrate for this step):
airflow db init
Create a user:
airflow users create \
    --username PB_Divyansh \
    --firstname Divyansh \
    --lastname Gupta \
    --role Admin \
    --email [email protected] \
    --password PBDivyansh@123
Start the web server:
airflow webserver --port 8080
In a new terminal, start the Scheduler:
airflow scheduler
Now go to http://localhost:8080 and log in with the user you just created. This is your Airflow UI.
2. Create Your First DAG (Workflow)
Go to your DAGs folder (~/airflow/dags) and create a file named daily_email_dag.py with the following content:
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

# Function to be scheduled
def send_email():
    print("✅ Email has been sent!")

# DAG settings
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# Create the DAG
with DAG(
    dag_id='daily_email_sender',
    default_args=default_args,
    schedule_interval='0 18 * * *',  # Every day at 6 PM
    catchup=False,
    description='A DAG to send daily emails',
    tags=['example'],
) as dag:
    email_task = PythonOperator(
        task_id='send_email_task',
        python_callable=send_email
    )
    email_task
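The 'retries' and 'retry_delay' settings in default_args mean "if the task fails, wait and try again". The sketch below shows that behaviour in plain Python so you can see what retries=1 amounts to: one extra attempt, two in total. It is an illustration, not Airflow's implementation; run_with_retries and flaky_send_email are hypothetical helpers, and the sleep is swapped for a no-op so the example finishes instantly.

```python
import time
from datetime import timedelta


def run_with_retries(task, retries=1, retry_delay=timedelta(minutes=5), sleep=time.sleep):
    """Call task(); on failure wait retry_delay and try again, up to `retries` extra times."""
    attempts = retries + 1  # the first try plus the allowed retries
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of retries: let the failure surface
            print(f"Attempt {attempt} failed ({exc}); retrying in {retry_delay}")
            sleep(retry_delay.total_seconds())


calls = []


def flaky_send_email():
    """Hypothetical task that fails on its first call and succeeds afterwards."""
    calls.append(1)
    if len(calls) == 1:
        raise RuntimeError("SMTP server unavailable")
    return "sent"


# Replace the real sleep with a no-op so the demo does not wait five minutes.
result = run_with_retries(flaky_send_email, sleep=lambda seconds: None)
print(result, len(calls))
```

With Airflow you only set the two values in default_args; the waiting and re-running are handled for you and show up as separate attempts in the task logs.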
3. Understand the Code
| Section | What It Does |
| --- | --- |
| send_email() | A function that will run as your task |
| PythonOperator | Runs your function as an Airflow task |
| schedule_interval | Tells Airflow to run this every day at 6 PM (UTC unless a time zone is configured) |
| dag_id | Unique ID for your workflow |
| start_date | The date from which Airflow may start scheduling runs |
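To get a feel for what "every day at 6 PM" means for the next run, here is a small plain-Python sketch that computes the next 18:00 after a given moment. It is a simplification: it ignores time zones and the way Airflow ties each run to a data interval, and next_daily_run is a hypothetical helper, not an Airflow function.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hour=18, minute=0):
    """Return the first daily run time strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


morning = next_daily_run(datetime(2024, 1, 1, 9, 30))
evening = next_daily_run(datetime(2024, 1, 1, 18, 0))
print(morning)
print(evening)
```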
4. See It in Action
- Go to http://localhost:8080.
- Find the DAG named daily_email_sender.
- Turn it ON (toggle switch).
- You can click "Trigger DAG" to run it manually or wait for the schedule.
- View logs to see the print output.
Common Schedule Examples
| Schedule | schedule_interval Value |
| --- | --- |
| Every day at midnight | '@daily' |
| Every hour | '@hourly' |
| Every 10 minutes | '*/10 * * * *' |
| Every Monday at midnight | '0 0 * * 1' |
| No schedule, manual only | None |
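If cron strings are new to you, it may help to see the five fields spelled out. The sketch below splits a schedule value into named fields and expands the two presets from the table into their cron equivalents. It is only a reading aid in plain Python; describe_schedule is a hypothetical helper and does not validate the values inside each field.

```python
# The five positions of a standard cron expression, left to right.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")

# Cron equivalents of the presets used in the table above.
PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
}


def describe_schedule(value):
    """Return a dict of cron fields for a schedule value, or None for manual-only."""
    if value is None:
        return None  # no schedule: the DAG only runs when triggered manually
    expression = PRESETS.get(value, value)
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}: {value!r}")
    return dict(zip(CRON_FIELDS, parts))


six_pm = describe_schedule("0 18 * * *")
daily = describe_schedule("@daily")
print(six_pm)
print(daily)
```

Reading '0 18 * * *' this way gives minute 0, hour 18, and "any" for the remaining fields, which is why it means every day at 6 PM.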
Conclusion
- Airflow makes automation simple.
- The Scheduler runs your tasks on time.
- You define everything in Python using DAGs.
- Airflow shows logs, retries on failure, and monitors workflows.
Happy coding!