setup airflow on MySQL
SQLite Database
https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-sqlite-database
用于开发环境,有一些限制,只支持 序列执行器, 不能用作产品环境。
SQLite database can be used to run Airflow for development purpose as it does not require any database server (the database is stored in a local file). There are a few limitations of using the SQLite database (for example it only works with Sequential Executor) and it should NEVER be used for production.
MySQL Database
https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-mysql-database
先建立数据库:
You need to create a database and a database user that Airflow will use to access this database. In the example below, a database
airflow_db
and user with usernameairflow_user
with passwordairflow_pass
will be createdCREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; CREATE USER 'airflow_user' IDENTIFIED BY 'airflow_pass'; GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user';
设置数据库配置文件 explicit_defaults_for_timestamp=1
We rely on more strict ANSI SQL settings for MySQL in order to have sane defaults. Make sure to have specified
explicit_defaults_for_timestamp=1
option under[mysqld]
section in yourmy.cnf
file. You can also activate these options with the--explicit-defaults-for-timestamp
switch passed tomysqld
executable
安装python的客户端驱动库
pip install mysqlclient
We recommend using the
mysqlclient
driver and specifying it in your SqlAlchemy connection string.
设置DATABASE_URI
https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#database-uri
修改 airflow.cfg配置文件中的 sql_alchemy_conn
参数,或者配置环境变量AIRFLOW__CORE__SQL_ALCHEMY_CONN
。
Airflow uses SQLAlchemy to connect to the database, which requires you to configure the Database URL. You can do this in option
sql_alchemy_conn
in section[core]
. It is also common to configure this option withAIRFLOW__CORE__SQL_ALCHEMY_CONN
environment variable.
mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>
初始化数据库
https://stackoverflow.com/questions/61663681/configure-sql-server-in-airflow-with-sql-alchemy-conn
airflow initdb
另外一种客户端驱动
支持SSL免证书选项。
But we also support the
mysql-connector-python
driver, which lets you connect through SSL without any cert options provided.mysql+mysqlconnector://<user>:<password>@<host>[:<port>]/<dbname>
客户端驱动issue
如果驱动未安装或者安装不正确, 执行 airflow initdb 会报错
ModuleNotFoundError: No module named 'MySQLdb'
解决办法就是使用pip安装相应的驱动。
https://stackoverflow.com/questions/51245121/mysqldb-modulenotfounderror
Database Urls of SQLAlchemy
https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls
The
create_engine()
function produces anEngine
object based on a URL. These URLs follow RFC-1738, and usually can include username, password, hostname, database name as well as optional keyword arguments for additional configuration. In some cases a file path is accepted, and in others a “data source name” replaces the “host” and “database” portions. The typical form of a database URL is:dialect+driver://username:password@host:port/database
Dialect names include the identifying name of the SQLAlchemy dialect, a name such as
sqlite
,mysql
,postgresql
,oracle
, ormssql
. The drivername is the name of the DBAPI to be used to connect to the database using all lowercase letters. If not specified, a “default” DBAPI will be imported if available - this default is typically the most widely known driver available for that backend.
The MySQL dialect uses mysql-python as the default DBAPI. There are many MySQL DBAPIs available, including MySQL-connector-python and OurSQL:
# default engine = create_engine('mysql://scott:tiger@localhost/foo') # mysqlclient (a maintained fork of MySQL-Python) engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo') # PyMySQL engine = create_engine('mysql+pymysql://scott:tiger@localhost/foo')
Excercise
https://github.com/fanqingsong/machine_learning_workflow_on_airflow
基于本地的调测环境,将此项目改造为基于MySQL数据库后台, 并在readme中记录过程。
相关文章
- MySQL触发器 trigger学习
- MySQL case when then 的使用方法
- MySQL 大数据量快速插入方法和语句优化
- 详解 Spotlight on MySQL监控MySQL服务器
- mysql的binlog恢复数据
- MySQL定义条件和处理程序
- cant connect to mysql server on
- mysql update select 从查询结果中更新数据
- Can't connect to MySQL server on localhost (10061)解决方法
- MySQL死锁分析一例
- MySQL中间件之ProxySQL(2):初试读写分离
- 转 MySQL active threads more than 40 on db3.***.com
- Docker容器启动时初始化Mysql数据库
- MySQL连接失败:10061错误,Can't connect to MySQL server on '127.0.0.1' (10061) (2003)
- MySQL TIMESTAMP(时间戳)详细解释
- (5.4)mysql高可用系列——MySQL异步复制(实践)
- (2.7)Mysql之SQL基础——表的操作与查看
- MySQL启动报错“Bind on TCP/IP port: Address already in use”
- 【mysql报错】Unsupported redo log format. The redo log was created with MariaDB 10.2.43.