Solutions

To better demonstrate the kind of work I have done and what I can do for your project, I have pulled together example solutions drawn from projects in my portfolio. You can also check out the source code for any of these projects on my GitHub.

Build a Data-Intensive, Full-Stack App from Start to Finish

Project screenshot

The strength of a full-stack data engineer is the ability to bring the entire stack together, coordinating all of your microservices and features into a single product. Any developer can slap a new feature onto your project, but if new features aren't seamlessly integrated into the project as a whole, they will run inefficiently and slow down future development. See how everything can work together, from the data pipeline to the web app to data visualization and user-facing search functionality. View on GitHub

ETL from Cassandra using Spark

Project screenshot

Cassandra is optimized for fast writes and leaves read-heavy work to third-party integrations. For example, Elassandra solves this with Elasticsearch, and DataStax solves this with Solr and Spark (or even Graph, depending on the use case). Of course, we could also integrate Cassandra with these same tools using open source connectors and drivers. Check out an example of how to extract your Cassandra data into Spark for an ETL pipeline. View on GitHub

Manage your Cassandra Logs using the ELK Stack

Project screenshot

Distributed apps quickly reach the point where debugging with tail -f becomes untenable. However, ignoring your logs isn't an option: to quote Jay Kreps' book I Heart Logs, "the humble log is an abstraction that lies at the heart of many systems, from NoSQL databases to cryptocurrencies. Even though most engineers don’t think much about them,...logs are worthy of your attention." The ELK Stack (Elasticsearch, Logstash, and Kibana) is a go-to tool for managing your logs and making them work for you rather than just take up hard drive space. Unfortunately, it does not yet have out-of-the-box log processing or dashboards for Cassandra. Check out a way to extract meaningful information from your Cassandra logs here. View on GitHub
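To give a feel for what "extracting meaningful information" from a Cassandra log involves, here is a minimal Python sketch of the parsing step. It is not the code from the project, which relies on the ELK Stack; it only illustrates the kind of field extraction a log pipeline has to do. It assumes log lines follow the layout of Cassandra's default system.log pattern (level, thread, timestamp, source file and line, message). The function name parse_cassandra_log_line and the sample line are hypothetical, made up for illustration.

```python
import re
from typing import Optional

# Assumed line layout (Cassandra's default system.log pattern):
#   LEVEL [thread] yyyy-MM-dd HH:mm:ss,SSS File.java:line - message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+):(?P<line>\d+)\s+-\s+"
    r"(?P<message>.*)$"
)


def parse_cassandra_log_line(raw: str) -> Optional[dict]:
    """Split one log line into named fields, or return None if it does not match.

    Lines that do not match (for example, stack trace continuation lines)
    are left for the caller to attach to the previous event or to skip.
    """
    match = LOG_LINE.match(raw.rstrip("\n"))
    if match is None:
        return None
    event = match.groupdict()
    event["line"] = int(event["line"])  # source line number as an integer
    return event


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    sample = (
        "WARN  [CompactionExecutor:1] 2017-05-08 12:00:00,123 "
        "CompactionTask.java:120 - sample message"
    )
    print(parse_cassandra_log_line(sample))
```

Once each line is a structured event like this, fields such as level and thread can be indexed and charted instead of scrolled past.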

Extract Actionable Steps from your Data

Project screenshot

After collecting your data, you are going to want to use it. In this project, I demonstrate how to run batch jobs that audit content marketing websites for broken links, missed opportunities, and room for growth, all derived from Google Analytics and displayed in easy-to-use UIs. View on GitHub