Solutions using Spark

Check out solutions that use Spark here. Interested in something else? See the full list of technologies here.

Explore Relationships in Distributed Data Using DSE Graph

An enterprise-level problem calls for an enterprise-level solution. DataStax Enterprise (DSE) Graph lets you traverse the data stored in your Cassandra database, search it with Solr, visualize it in DSE Studio, and run analytics on it with Spark, all integrated within a single platform. View on GitHub

Build a Data-Intensive, Full Stack App from Start to Finish

The strength of a full-stack data engineer is the ability to bring the entire stack together, coordinating all of your microservices and features into a single product. Any developer can slap a new feature onto a project, but features that aren't integrated into the project as a whole run inefficiently and slow down future development. See how everything can work together, from data pipeline to web app to data visualization and user-facing search. View on GitHub

ETL from Cassandra using Spark

Cassandra is built for fast writes and leaves read-heavy work, such as search and analytics, to third-party integrations. For example, Elassandra handles this with Elasticsearch, and DataStax handles it with Solr and Spark (or even Graph, depending on the use case). Of course, we could also integrate Cassandra with these same tools ourselves using open source connectors and drivers. Check out an example of how to extract your Cassandra data into Spark for an ETL pipeline. View on GitHub
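As a rough illustration of the extract step described above, here is a minimal PySpark sketch. It assumes the open source Spark Cassandra Connector is on the Spark classpath and that a Cassandra node is reachable on localhost. The keyspace `demo`, the table `events`, the column `user_id`, and the output path are hypothetical names chosen for the example and do not come from the linked project.

```python
def cassandra_read_options(keyspace, table):
    """Options for the Spark Cassandra Connector's DataFrame source."""
    return {"keyspace": keyspace, "table": table}


def extract_table(spark, keyspace, table):
    """Load one Cassandra table as a Spark DataFrame (lazy until an action runs)."""
    return (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(**cassandra_read_options(keyspace, table))
        .load()
    )


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("cassandra-etl-sketch")
        # Assumption: a Cassandra node is listening on localhost.
        .config("spark.cassandra.connection.host", "127.0.0.1")
        .getOrCreate()
    )
    # Extract: hypothetical keyspace and table names.
    events = extract_table(spark, "demo", "events")
    # Transform: count rows per (hypothetical) user_id column.
    counts = events.groupBy("user_id").count()
    # Load: write the result as Parquet to a hypothetical path.
    counts.write.mode("overwrite").parquet("/tmp/event_counts")
    spark.stop()
```

Calling `main()` from a script launched with `spark-submit` (with the connector supplied through `--packages` or `--jars`) would run the whole extract, transform, and load sequence. The read stays lazy, so nothing is pulled from Cassandra until the write triggers the job.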

ETL MATLAB data into PySpark

Read .mat files into PySpark, then transform and display the data, all in Python within Zeppelin. Once the ETL step is done in Spark, the data is usable for drawing conclusions and spotting patterns and anomalies. View on GitHub
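One possible way to get a .mat file into a Spark DataFrame is sketched below. It assumes SciPy's `scipy.io.loadmat` is used to parse the file (the linked project may do this differently) and that the chosen variables are numeric arrays of equal length. The helper names `mat_to_rows` and `load_mat_as_dataframe`, and the variable names in the usage note, are hypothetical.

```python
def mat_to_rows(mat, variables):
    """Turn equal-length variables from a loadmat-style dict into row tuples."""
    columns = []
    for name in variables:
        values = mat[name]
        # loadmat returns NumPy arrays (often shaped 1xN); flatten them and
        # convert to plain Python values so Spark can infer column types.
        if hasattr(values, "ravel"):
            values = values.ravel().tolist()
        columns.append(list(values))
    # Transpose the columns into one tuple per row.
    return list(zip(*columns))


def load_mat_as_dataframe(spark, path, variables):
    """Read the named variables from a .mat file into a Spark DataFrame."""
    # Imported here so mat_to_rows can be used without SciPy installed.
    from scipy.io import loadmat

    mat = loadmat(path)
    rows = mat_to_rows(mat, variables)
    # Column names are taken from the MATLAB variable names.
    return spark.createDataFrame(rows, schema=list(variables))
```

In a Zeppelin PySpark paragraph, where a `spark` session is already provided, something like `df = load_mat_as_dataframe(spark, "/data/run1.mat", ["time", "voltage"])` followed by `z.show(df)` would display the result. Note that `loadmat` does not read MATLAB v7.3 files, which are HDF5-based and need a different reader.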