Solutions using Batch ETL

Below are solutions that use Batch ETL. Interested in something else? See the full list of technologies.

Build a Data-Intensive, Full-Stack App from Start to Finish

The strength of a full-stack data engineer is the ability to bring the entire stack together, coordinating microservices and features into a single product. Any developer can bolt a new feature onto a project, but features that aren't integrated with the system as a whole tend to run inefficiently and slow down future development. This project shows how everything works together, from the data pipeline to the web app, data visualizations, and user-facing search. View on GitHub

Extract Actionable Steps from Your Data

After collecting your data, you will want to put it to use. In this project, I demonstrate how to run batch jobs that audit content marketing websites for broken links, missed opportunities, and room for growth, all derived from Google Analytics data and displayed in easy-to-use UIs. View on GitHub
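The project's own jobs are not reproduced here, but the broken-link part of such an audit can be sketched in standard-library Python. This is an illustration under assumptions: `pages` (a hypothetical mapping from each page to the URLs it links to, which might come from a crawl or an analytics export) and the helpers `http_status` and `find_broken_links` are invented for this example, and a status of 0 is used here to mean "unreachable".

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if it could not be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urlopen raises HTTPError for 4xx/5xx responses
    except OSError:
        return 0  # DNS failure, refused connection, or timeout


def find_broken_links(
    pages: Dict[str, Iterable[str]],
    status: Callable[[str], int] = http_status,
) -> List[Tuple[str, str, int]]:
    """Return (page, link, status) for every link that failed or returned >= 400."""
    cache: Dict[str, int] = {}  # check each distinct URL only once
    broken: List[Tuple[str, str, int]] = []
    for page, links in pages.items():
        for link in links:
            if link not in cache:
                cache[link] = status(link)
            code = cache[link]
            if code == 0 or code >= 400:
                broken.append((page, link, code))
    return broken
```

Passing the status checker in as a parameter keeps the batch logic testable without network access, and the cache means a link shared by many pages is requested only once. Some servers reject HEAD requests (for example with 405), so a real audit might retry those with GET before reporting them.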

ETL MATLAB Data into PySpark

Read .mat files into PySpark, then transform and display the data, all in Python within an Apache Zeppelin notebook. Once Spark has performed the ETL, the data is usable for drawing conclusions and spotting patterns and anomalies. View on GitHub
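A rough sketch of the conversion step, assuming the file is loaded with SciPy's `scipy.io.loadmat`, which returns a dict mapping variable names to arrays plus `__header__`, `__version__`, and `__globals__` metadata keys. The helper `mat_to_rows` is hypothetical and assumes every selected variable is a vector of the same length; it only flattens and zips the columns, so it has no Spark or SciPy dependency of its own.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def mat_to_rows(mat: Dict[str, Any],
                columns: Optional[Sequence[str]] = None
                ) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Convert a loadmat-style dict of equal-length vectors into rows.

    Returns (column_names, rows), ready for spark.createDataFrame(rows, names).
    """
    if columns is None:
        # Skip the __header__, __version__ and __globals__ metadata keys.
        columns = [k for k in mat if not k.startswith("__")]
    names = list(columns)
    flat: List[List[Any]] = []
    for name in names:
        values = mat[name]
        # MATLAB vectors load as 2-D arrays of shape (1, N) or (N, 1);
        # ravel() flattens a NumPy array and tolist() yields plain Python
        # scalars that Spark can infer types from. Plain lists pass through.
        if hasattr(values, "ravel"):
            values = values.ravel().tolist()
        flat.append(list(values))
    lengths = {len(v) for v in flat}
    if len(lengths) > 1:
        raise ValueError(f"variables have different lengths: {sorted(lengths)}")
    return names, list(zip(*flat))


# Hypothetical usage (needs scipy and pyspark; "signals.mat" is a placeholder):
# from scipy.io import loadmat
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# names, rows = mat_to_rows(loadmat("signals.mat"))
# df = spark.createDataFrame(rows, names)
# df.show()
```

Matrices, cell arrays, and structs would need their own handling; flattening them with `ravel()` would silently mix up rows, which is why the sketch restricts itself to vectors.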

Run Distributed Tasks Using Airflow

Airflow makes it easy to manage complex data pipelines and run them on a distributed cluster. It does this by letting users define DAGs (directed acyclic graphs) of tasks, which Airflow schedules and tracks as each batch job moves through multiple stages. In this demo, I hook up Airflow running in a Docker container to my podcast analysis tool. View on GitHub
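Running real Airflow requires its scheduler and metadata database, so the sketch below illustrates only the core idea behind a DAG using Python's standard-library `graphlib` (3.9+): each task runs once all of its upstream tasks have finished. It is not Airflow's API; in an Airflow DAG file the same dependencies would be declared between operators with `>>`. The function `run_dag` and all task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, Iterable, List


def run_dag(tasks: Dict[str, Callable[[], None]],
            upstream: Dict[str, Iterable[str]]) -> List[str]:
    """Run every task after all of its upstream tasks have finished.

    `upstream` maps a task name to the names it depends on; every name that
    appears must also be a key of `tasks`. Returns the names in run order.
    """
    sorter = TopologicalSorter(upstream)
    for name in tasks:
        sorter.add(name)  # also register tasks that have no dependencies
    try:
        sorter.prepare()  # detects cycles before anything runs
    except CycleError as err:
        # graphlib stores the offending cycle as the second element of args.
        raise CycleError(f"tasks form a cycle: {err.args[1]}") from err
    ran: List[str] = []
    while sorter.is_active():
        # Everything returned here has no unfinished dependencies, so a real
        # scheduler could dispatch these tasks in parallel; we run them in order.
        for name in sorted(sorter.get_ready()):
            tasks[name]()
            ran.append(name)
            sorter.done(name)
    return ran
```

For a podcast pipeline, hypothetical `download`, `transcribe`, and `analyze` tasks could be wired as `{"transcribe": ["download"], "analyze": ["transcribe"]}`. Each batch from `get_ready()` is the set of tasks whose dependencies are all done, which is what a distributed scheduler can hand to separate workers at once; this sketch simply runs them one after another.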