Solutions

Manage Datasets from Multiple External APIs


It is one thing to perform ETL on data originating from a single source, but it is quite another to work with data coming in from different sources that have different APIs, different data formats, different error-handling policies...and that's just the start. Check out this example of how to manage data coming in from separate origins, in this case different social media platforms. View on Github
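
One way to tame this kind of fan-in is an adapter per source that maps each platform's payload into one shared schema, with per-source error handling. Here is a minimal Python sketch of that idea; the payload shapes and field names are assumptions for illustration, not the actual platform APIs.

    # Normalize records from two hypothetical social APIs into one schema.
    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class Post:
        source: str
        author: str
        text: str
        posted_at: datetime

    def from_twitter(raw: dict) -> Post:
        # Assumed Twitter-style payload: epoch seconds, nested "user" object.
        return Post(
            source="twitter",
            author=raw["user"]["screen_name"],
            text=raw["text"],
            posted_at=datetime.fromtimestamp(raw["created_at"], tz=timezone.utc),
        )

    def from_reddit(raw: dict) -> Post:
        # Assumed Reddit-style payload: flat fields, ISO-8601 timestamp.
        return Post(
            source="reddit",
            author=raw["author"],
            text=raw["selftext"],
            posted_at=datetime.fromisoformat(raw["created_iso"]),
        )

    ADAPTERS = {"twitter": from_twitter, "reddit": from_reddit}

    def normalize(source: str, payloads: list[dict]) -> list[Post]:
        posts = []
        for raw in payloads:
            try:
                posts.append(ADAPTERS[source](raw))
            except (KeyError, ValueError) as exc:
                # Each source fails in its own way; log and keep going.
                print(f"skipping malformed {source} record: {exc}")
        return posts

Downstream ETL then only ever sees Post objects, so adding a new platform means writing one more adapter rather than touching the pipeline.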

ETL MATLAB Data into PySpark


Read .mat files into PySpark, then transform and display the data, all in Python within Zeppelin. Once the ETL is done in Spark, the data is usable for drawing conclusions and spotting patterns and anomalies. View on Github
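
For a sense of the plumbing, here is a minimal sketch of the .mat-to-Spark hand-off using SciPy's loadmat. The file name, the variable name inside it, and the column layout are assumptions for illustration.

    # Load a MATLAB .mat file with SciPy, then hand the rows to Spark.
    from scipy.io import loadmat
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mat-etl").getOrCreate()

    mat = loadmat("measurements.mat")   # dict of variable name -> numpy array
    matrix = mat["measurements"]        # assume an N x 2 numeric matrix

    # Convert numpy rows to plain Python tuples so Spark can infer a schema.
    rows = [(float(t), float(v)) for t, v in matrix]
    df = spark.createDataFrame(rows, ["timestamp", "value"])

    # Transform: drop obvious outliers, then summarize the result.
    df.filter(df.value.between(-1e6, 1e6)).describe().show()

From there the DataFrame behaves like any other Spark table, so the rest of the pipeline doesn't need to know the data started life in MATLAB.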

Create Data Dashboards in React using Chart.js


It can be hard for clients to care about your data unless they can see it. Chart.js provides an easy-to-use yet powerful API for displaying your data quickly and elegantly. Check out an example of how to use it to display Google Analytics data. View on Github
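
Keeping to Python like the other sketches here, this shows the backend half of such a dashboard: a small Flask endpoint serving counts in the {labels, data} shape that a Chart.js line or bar chart typically consumes on the React side. The route and the numbers are placeholders, not the project's actual code or real analytics data.

    # Hypothetical Flask endpoint feeding a Chart.js dashboard.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/pageviews")
    def pageviews():
        # In the real project this would come from the Google Analytics API.
        weekly = {"Mon": 120, "Tue": 98, "Wed": 143, "Thu": 110, "Fri": 87}
        return jsonify(labels=list(weekly), data=list(weekly.values()))

    if __name__ == "__main__":
        app.run(port=5000)

The React component then just fetches /api/pageviews and passes labels and data straight into the chart's config.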

Run Distributed Tasks Using Airflow


Airflow makes it easy to manage complicated data pipelines and run them on a distributed cluster. It does this by letting users define DAGs that schedule and track batch jobs across multiple stages. In this demo, I hook up Airflow running in a Docker container to my podcast analysis tool. View on Github
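
For flavor, here is a minimal Airflow 2.x-style DAG with two dependent stages. The task bodies are placeholders standing in for the actual podcast-analysis steps, and the DAG id and schedule are assumptions.

    # Minimal two-stage Airflow DAG (Airflow 2.x imports and schedule syntax).
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def download_episodes():
        print("downloading new podcast episodes")   # placeholder stage

    def analyze_transcripts():
        print("running text analysis")              # placeholder stage

    with DAG(
        dag_id="podcast_analysis",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        download = PythonOperator(task_id="download", python_callable=download_episodes)
        analyze = PythonOperator(task_id="analyze", python_callable=analyze_transcripts)

        download >> analyze   # analyze runs only after download succeeds

The >> operator is how Airflow expresses the stage ordering; the scheduler then tracks each task's state independently across the cluster.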