Understanding the Airflow metadata database

The metadata database is a core component of Airflow. It stores crucial information such as the configuration of your Airflow environment's roles and permissions, as well as all metadata for past and present DAG and task runs.

A healthy metadata database is critical for your Airflow environment. Losing data stored in the metadata database can both interfere with running DAGs and prevent you from accessing data for past DAG runs. As with any core Airflow component, having a backup and disaster recovery plan in place for the metadata database is essential.

In this guide, you'll learn everything you need to know about the Airflow metadata database to ensure a healthy Airflow environment, including:

- Important content stored in the database.
- Best practices for using the metadata database.
- How to use the Airflow REST API to access the metadata database.

To get the most out of this guide, you should have an understanding of Airflow's components.

Airflow uses SQLAlchemy and Object Relational Mapping (ORM) in Python to connect with the metadata database from the application layer. See Airflow's components. Any database supported by SQLAlchemy can theoretically be configured to host Airflow's metadata. While SQLite is the default in Apache Airflow, Postgres is by far the most common choice and is recommended for most use cases by the Airflow community. Astronomer uses Postgres for all of its Airflow environments, including local environments running with the Astro CLI and deployed environments in the cloud.

You should also consider the size of your metadata database when setting up your Airflow environment. The size you need will depend heavily on the workloads running in your Airflow instance. Production environments typically use a managed database service, which includes features like autoscaling and automatic backups. For reference, Apache Airflow uses a 2GB SQLite database by default, but this is intended for development purposes only. The Astro CLI starts Airflow environments with a 1GB Postgres database.

Changes to the Airflow metadata database configuration and its schema are very common and happen with almost every minor update. For this reason, prior to Airflow 2.3 you should not downgrade your Airflow instance in place. With Airflow 2.3, the db downgrade command was added, providing an option to downgrade Airflow.

There are several types of metadata stored in the metadata database:

- User login information and permissions.
- Information used in DAGs, like variables, connections, and XComs.
- Data about DAG and task runs, which is generated by the scheduler.
- Other minor tables, such as tables which store DAG code in different formats or information about import errors.

For many use cases, you can access content from the metadata database in the Airflow UI or via the stable REST API.
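Because any SQLAlchemy-supported database can back the metadata store, switching from the default SQLite to Postgres is largely a matter of pointing Airflow at a different connection string. A minimal sketch, assuming Airflow 2.3+ (where the option lives in the `[database]` config section) and a local Postgres instance; the host, port, credentials, and database name are placeholders for your own setup:

```shell
# Point Airflow at a Postgres metadata database instead of the default SQLite.
# All connection details below are placeholders -- substitute your own.
# On Airflow versions before 2.3, the equivalent option lived under [core]
# (AIRFLOW__CORE__SQL_ALCHEMY_CONN).
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow@localhost:5432/airflow"
```

After setting the connection string, running the schema-creation command for your Airflow version (`airflow db migrate` on recent releases, `airflow db init` on older ones) creates the metadata tables in the new database.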
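The db downgrade command added in Airflow 2.3 supports a dry-run mode, which is worth using before any schema change. A hedged sketch of its use; the target version shown is an arbitrary example, and the flags assume Airflow 2.3 or later:

```shell
# Dry run: print the SQL a downgrade would execute, without applying it.
# "2.4.3" is an example target version, not a recommendation.
airflow db downgrade --to-version "2.4.3" --show-sql-only

# Apply the downgrade only after reviewing the SQL and backing up the database.
airflow db downgrade --to-version "2.4.3"
```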
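Reading from the metadata database through the stable REST API, rather than querying its tables directly, keeps you insulated from schema changes between Airflow versions. A minimal sketch using only the Python standard library; the base URL, username, and password are placeholders for your own deployment, and it assumes basic auth is enabled on the webserver:

```python
import base64
import json
import urllib.request


def build_request(base_url: str, endpoint: str, user: str, password: str) -> urllib.request.Request:
    """Build a basic-auth request against Airflow's stable REST API (/api/v1)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/api/v1/{endpoint}",
        headers={"Authorization": f"Basic {token}"},
    )


def list_dags(base_url: str = "http://localhost:8080",
              user: str = "admin", password: str = "admin") -> dict:
    """Return the DAGs registered in the metadata database, as parsed JSON.

    The URL and credentials are placeholders for your own environment.
    """
    with urllib.request.urlopen(build_request(base_url, "dags", user, password)) as resp:
        return json.load(resp)
```

The same pattern works for other metadata exposed by the API, such as the `variables`, `pools`, and `dags/{dag_id}/dagRuns` endpoints.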