
How to Schedule Automated Workflows Using Apache Airflow

What is Apache Airflow?

Apache Airflow is an open-source tool that helps you automate, schedule, and monitor workflows. A workflow is a set of tasks that need to run in a specific order.

Think of it like this:

  • You define tasks (e.g., sending an email, cleaning data).
  • You schedule them (e.g., run every day at 6 PM).
  • Airflow makes sure they run in order, retries if they fail, and shows logs and status.
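
To make "run in order and retry on failure" concrete, here is a minimal plain-Python sketch of the idea. It is not how Airflow is implemented; the helper names (run_with_retries, run_workflow) and the two example tasks are hypothetical and exist only for illustration.

```python
import time
from typing import Callable, List, Tuple


def run_with_retries(task: Callable[[], None], retries: int, retry_delay: float) -> int:
    """Run `task`, retrying up to `retries` extra times. Returns the attempts used."""
    attempt = 0
    while True:
        attempt += 1
        try:
            task()
            return attempt
        except Exception as exc:
            if attempt > retries:
                raise  # out of retries: surface the failure
            print(f"attempt {attempt} failed ({exc}); retrying in {retry_delay}s")
            time.sleep(retry_delay)


def run_workflow(
    tasks: List[Tuple[str, Callable[[], None]]],
    retries: int = 1,
    retry_delay: float = 0.0,
) -> List[str]:
    """Run tasks strictly in list order; stop at the first task that exhausts its retries."""
    completed = []
    for name, task in tasks:
        run_with_retries(task, retries, retry_delay)
        completed.append(name)
    return completed


# Hypothetical tasks: the second one fails once, then succeeds on the retry.
calls = {"send_email": 0}


def clean_data() -> None:
    print("cleaning data")


def send_email() -> None:
    calls["send_email"] += 1
    if calls["send_email"] == 1:
        raise RuntimeError("SMTP server unavailable")
    print("email sent")


finished = run_workflow([("clean_data", clean_data), ("send_email", send_email)])
print(finished)
```

Airflow adds scheduling, logging, and a UI on top of this basic pattern, so you only describe the tasks and their settings.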

What is the Airflow Scheduler?

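The Scheduler is the Airflow component that decides when work runs. It:

  • Reads your workflows (called DAGs, short for Directed Acyclic Graphs)
  • Checks whether it is time to run any task
  • Sends due tasks to workers for execution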

You don’t write code to create the Scheduler, but you write DAGs that the Scheduler reads.
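
The three Scheduler duties above can be pictured as a simple loop. The sketch below is a rough plain-Python illustration under simplifying assumptions (fixed intervals, no workers, no database); the Workflow class and scheduler_tick function are hypothetical names, not part of Airflow.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Workflow:
    """Hypothetical stand-in for a DAG: a name, a fixed interval, and the next due time."""
    name: str
    interval: timedelta
    next_run: datetime


def scheduler_tick(workflows: List[Workflow], now: datetime) -> List[str]:
    """One pass of the loop: find due workflows, 'dispatch' them, advance their next run."""
    dispatched = []
    for wf in workflows:
        if now >= wf.next_run:
            # A real scheduler would hand the tasks to workers at this point.
            dispatched.append(wf.name)
            wf.next_run += wf.interval
    return dispatched


workflows = [
    Workflow("daily_email_sender", timedelta(days=1), datetime(2024, 1, 1, 18, 0)),
    Workflow("hourly_cleanup", timedelta(hours=1), datetime(2024, 1, 1, 17, 0)),
]

first = scheduler_tick(workflows, now=datetime(2024, 1, 1, 17, 30))
second = scheduler_tick(workflows, now=datetime(2024, 1, 1, 18, 0))
print(first)   # only the hourly workflow is due at 17:30
print(second)  # both workflows are due at 18:00
```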

Step-by-Step: How to Use Apache Airflow

1. Install Airflow

Install Airflow with pip (the official installation guide also recommends pinning dependencies with a constraints file). Run this in your terminal:

pip install apache-airflow

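Initialize the Airflow metadata database (on Airflow 2.7 and later, this command is deprecated in favor of airflow db migrate):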

airflow db init

Create a user:

airflow users create \
    --username PB_Divyansh \
    --firstname Divyansh \
    --lastname Gupta \
    --role Admin \
    --email [email protected] \
    --password PBDivyansh@123

Start the services:

airflow webserver --port 8080

In a new terminal:

airflow scheduler

Now go to http://localhost:8080. This is your Airflow UI.

2. Create Your First DAG (Workflow)

Go to your DAGs folder (~/airflow/dags) and create a file:

daily_email_dag.py

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

# Task logic: a placeholder that only prints a message.
# Replace the body with real email-sending code.
def send_email():
    print("✅ Email has been sent!")

# DAG settings
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# Create the DAG
with DAG(
    dag_id='daily_email_sender',
    default_args=default_args,
    schedule_interval='0 18 * * *',  # Every day at 6 PM
    catchup=False,
    description='A DAG to send daily emails',
    tags=['example'],
) as dag:

    email_task = PythonOperator(
        task_id='send_email_task',
        python_callable=send_email
    )

    email_task
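
The default_args dictionary in the file above is applied to every task in the DAG, and an argument passed directly to an operator takes precedence over the default. The following plain-Python sketch illustrates only that precedence rule; it is not Airflow's actual implementation, and effective_task_args and critical_task are hypothetical names.

```python
from datetime import timedelta

# DAG-level defaults, similar to the ones in daily_email_dag.py
default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}


def effective_task_args(default_args: dict, task_args: dict) -> dict:
    """Merge DAG-level defaults with task-level arguments; task-level values win."""
    return {**default_args, **task_args}


# A task that sets nothing extra inherits every default.
email_args = effective_task_args(default_args, {"task_id": "send_email_task"})

# A hypothetical task that overrides only the retry count.
critical_args = effective_task_args(
    default_args, {"task_id": "critical_task", "retries": 3}
)

print(email_args["retries"])
print(critical_args["retries"])
```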

3. Understand the Code

Section             What It Does
send_email()        The function that runs as your task
PythonOperator      Runs your function as an Airflow task
schedule_interval   Tells Airflow to run this every day at 6 PM
dag_id              Unique ID for your workflow
start_date          The date from which Airflow starts scheduling runs

4. See It in Action

  • Go to http://localhost:8080.
  • Find the DAG named daily_email_sender.
  • Turn it ON (toggle switch).
  • You can click "Trigger DAG" to run it manually or wait for the schedule.
  • View logs to see the print output.

Common Schedule Examples

Schedule                    schedule_interval Value
Every day at midnight       '@daily'
Every hour                  '@hourly'
Every 10 minutes            '*/10 * * * *'
Every Monday at midnight    '0 0 * * 1'
No schedule, manual only    None
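
To see how these values are read, the sketch below checks whether a given moment matches a five-field cron expression (minute, hour, day of month, month, day of week). It is a simplified illustration, not Airflow's parser: it supports only '*', a single number, and '*/n' on the minute and hour fields. The function names are hypothetical.

```python
from datetime import datetime

# Presets from the table, written out as the cron expressions they stand for.
PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
}


def field_matches(field: str, value: int) -> bool:
    """Match one cron field. Supports only '*', '*/n', and a single integer.

    The '*/n' rule here is only accurate for zero-based fields (minute, hour).
    """
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) satisfies the cron expression."""
    expr = PRESETS.get(expr, expr)
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7  # cron counts Sunday as 0, Monday as 1
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


monday_midnight = datetime(2024, 1, 1, 0, 0)  # 1 January 2024 was a Monday
print(cron_matches("0 0 * * 1", monday_midnight))
print(cron_matches("@daily", monday_midnight))
print(cron_matches("*/10 * * * *", datetime(2024, 1, 1, 9, 20)))
print(cron_matches("0 18 * * *", datetime(2024, 1, 1, 9, 20)))
```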

Conclusion

  • Airflow makes automation simple.
  • The Scheduler runs your tasks on time.
  • You define everything in Python using DAGs.
  • Airflow shows logs, retries on failure, and monitors workflows.

Happy coding!