Solutions

To show the kind of work I have done and what I can do for your project, I have pulled together example solutions from projects in my portfolio. The source code for every project is also available on my GitHub.

Explore Relationships in Distributed Data Using DSE Graph

Project screenshot

For an enterprise-level problem, you need an enterprise-level solution. DataStax Enterprise (DSE) Graph lets you traverse the data stored in your Cassandra database, search it with Solr, visualize it in DSE Studio, and run analytics with Spark, all integrated within a single platform. View on GitHub

Build a Data-Intensive, Full Stack App from Start to Finish

Project screenshot

The strength of a full-stack data engineer is the ability to bring the entire stack together, coordinating all of your microservices and features into a single product. Any developer can slap a new feature onto your project, but features that are not integrated with the project as a whole run inefficiently and slow down future development. See how everything can work together, from data pipeline to web app to data visualization and user-facing search. View on GitHub

ETL from Cassandra using Spark

Project screenshot

Cassandra handles writes quickly and leaves read-heavy work to third-party integrations. For example, Elassandra addresses this with Elasticsearch, and DataStax addresses it with Solr and Spark (or even Graph, depending on the use case). Of course, you can also integrate Cassandra with these same tools using open source connectors and drivers. Check out an example of how to extract your Cassandra data into Spark for an ETL pipeline. View on GitHub
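As a rough idea of what the extract step can look like, here is a minimal sketch that reads one Cassandra table into a Spark DataFrame through the open source Spark Cassandra Connector. It assumes you already have a SparkSession started with the connector package available and `spark.cassandra.connection.host` pointing at your cluster. The helper names and the keyspace and table in the usage note are hypothetical, not taken from the linked project.

```python
from typing import Any, Dict

# Data source name registered by the open source Spark Cassandra Connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def cassandra_options(keyspace: str, table: str) -> Dict[str, str]:
    """Build the read options the connector expects for a single table."""
    return {"keyspace": keyspace, "table": table}


def extract_table(spark: Any, keyspace: str, table: str) -> Any:
    """Load one Cassandra table as a Spark DataFrame.

    `spark` is assumed to be an existing SparkSession that was created with
    the connector on its classpath; this helper does not create one.
    """
    return (
        spark.read.format(CASSANDRA_FORMAT)
        .options(**cassandra_options(keyspace, table))
        .load()
    )
```

With a session in hand, something like `orders = extract_table(spark, "shop", "orders")` would give you a DataFrame to transform and write out in the rest of the pipeline.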

Traverse Your Data Using Graph

Project screenshot

Graph databases use the connections that already exist in your data to reveal what that data actually means. JanusGraph in particular can use Cassandra as its storage backend and Elasticsearch for indexing, making it an obvious choice when dealing with distributed data pipelines. View on GitHub
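To make the idea of traversing connections concrete, here is a small in-memory sketch in plain Python. It is a stand-in for the kind of multi-hop query you would normally hand to a graph database, not JanusGraph or Gremlin code, and the `follows` data and the `within_hops` helper are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical adjacency list: each vertex maps to the vertices
# that its outgoing edges point to.
Graph = Dict[str, List[str]]


def within_hops(graph: Graph, start: str, max_hops: int) -> Set[str]:
    """Return every vertex reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(vertex, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    seen.discard(start)  # report only the vertices that were reached
    return seen


follows = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"]}
print(sorted(within_hops(follows, "alice", 2)))  # ['bob', 'carol']
```

A graph database answers the same kind of question, but over data that is distributed across a cluster and indexed for lookup rather than held in a single dictionary.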