Project author: ahmadamoudi

Project description:
Data pipeline with Apache Airflow - Data Engineering Nanodegree (DEND) 5th Project
Language: Python
Project URL: git://github.com/ahmadamoudi/Data-Pipeline-with-Airflow-DEND.git


Data Pipeline with Airflow

A music streaming company, Sparkify, has decided to introduce more automation and monitoring to its data warehouse ETL pipelines, and has concluded that the best tool for this is Apache Airflow.

Apache Airflow

Airflow is a platform to programmatically author, schedule, and monitor workflows.

Datasets

Both datasets are available in S3 in JSON format:

  • Log data: s3://udacity-dend/log_data
  • Song data: s3://udacity-dend/song_data
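
Most S3 clients and operators ask for a bucket name and a key prefix rather than a full URI, so the two locations above usually need to be split first. The following is a minimal sketch using only the Python standard library. The names DATASETS and split_s3_uri are hypothetical helpers introduced here for illustration and are not taken from the project code.

```python
from urllib.parse import urlparse

# Dataset locations as listed above.
DATASETS = {
    "log_data": "s3://udacity-dend/log_data",
    "song_data": "s3://udacity-dend/song_data",
}


def split_s3_uri(uri):
    """Split an s3:// URI into a (bucket, key_prefix) tuple.

    Hypothetical helper for illustration; not part of the project.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the network location; the key prefix is the path
    # without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


# Map each dataset name to its (bucket, key_prefix) pair.
locations = {name: split_s3_uri(uri) for name, uri in DATASETS.items()}

if __name__ == "__main__":
    for name, (bucket, prefix) in locations.items():
        print(f"{name}: bucket={bucket} prefix={prefix}")
```

Under these assumptions, both datasets resolve to the same bucket (udacity-dend) with different key prefixes, which is the shape a staging task would typically need when copying the JSON files into the warehouse.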