Check out solutions that use Python here. Interested in something else? See the full list of technologies here.
The strength of a full-stack data engineer is the ability to bring the entire stack together, coordinating all of your microservices and features into a single product. Any developer can slap a new feature onto your project, but if new features aren't seamlessly integrated into the project as a whole, they will run inefficiently and slow down future development. See how everything can work together, from data pipeline to web app to data visualization and user-facing search. View on GitHub
Read .mat files into PySpark, then transform and display the data, all in Python within Zeppelin. After ETL with Spark, the data is usable for drawing conclusions and spotting patterns and anomalies. View on GitHub
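As a rough sketch of the first step, here is one way a .mat file could be turned into rows that Spark accepts. It assumes SciPy's `loadmat` is used to read the file and that each MATLAB variable is an N x 1 column vector of the same length; the helper names `mat_to_rows` and `load_mat_as_dataframe` are hypothetical and not taken from the linked project.

```python
def mat_to_rows(mat):
    """Turn the dict returned by scipy.io.loadmat into a list of row dicts.

    Keys starting with "__" (such as __header__) are loadmat metadata and are skipped.
    Assumption: every remaining variable is an N x 1 column vector of equal length.
    """
    columns = {}
    for name, value in mat.items():
        if name.startswith("__"):
            continue
        # NumPy arrays expose tolist(), which also converts to native Python types.
        values = value.tolist() if hasattr(value, "tolist") else list(value)
        # Unwrap the single-element inner lists of an N x 1 column vector.
        columns[name] = [
            v[0] if isinstance(v, list) and len(v) == 1 else v for v in values
        ]
    names = sorted(columns)
    return [dict(zip(names, row)) for row in zip(*(columns[n] for n in names))]


def load_mat_as_dataframe(spark, path):
    """Read a .mat file and hand its rows to an existing SparkSession."""
    # Imported lazily so mat_to_rows can be used without SciPy installed.
    from scipy.io import loadmat

    return spark.createDataFrame(mat_to_rows(loadmat(path)))
```

In a Zeppelin PySpark paragraph, where a `spark` session is already provided, something like `load_mat_as_dataframe(spark, "/path/to/file.mat").show()` would display the result. Nested structs and cell arrays would need extra handling that this sketch leaves out.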
Your data won't help you if you don't know how to use it. One way to put it to work is to let end users (or admins) search through it. Elassandra integrates Elasticsearch with your Cassandra DB for near-instant search results and a REST API. One way to access that REST API is through a React app, with a Flask app server in the middle to handle requests. Click here to get an idea of how I can build a similar solution for your app. View on GitHub
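The Flask layer in the middle can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: it assumes Elassandra exposes the standard Elasticsearch `_search` endpoint on `localhost:9200`, and the index name `podcasts`, the `/api/search` route, and the function names are all hypothetical.

```python
import json
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # assumption: Elasticsearch port exposed by Elassandra
INDEX = "podcasts"                # hypothetical index name


def build_search_request(term, size=10):
    """Build the POST request for Elasticsearch's _search endpoint."""
    body = {"query": {"query_string": {"query": term}}, "size": size}
    return Request(
        f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_app():
    """Create the Flask app that sits between the React client and Elassandra."""
    # Imported lazily so the request builder can be used without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/api/search")
    def search():
        term = request.args.get("q", "")
        with urlopen(build_search_request(term)) as resp:
            hits = json.load(resp)["hits"]["hits"]
        # Return only the stored documents, not Elasticsearch's scoring metadata.
        return jsonify([hit["_source"] for hit in hits])

    return app
```

A React component could then call `/api/search?q=...` and render the returned JSON list. Keeping the query construction on the server means the browser never talks to the database cluster directly.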
Airflow makes it easy to manage complicated data pipelines and run them in a distributed cluster. It does this by letting users define DAGs (directed acyclic graphs) that run and track batch jobs across multiple stages. In this demo, I hook up Airflow running in a Docker container to my podcast analysis tool. View on GitHub
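To give a feel for what a multi-stage DAG looks like, here is a minimal sketch assuming Airflow 2.x and its `PythonOperator`. The stage names, the `podcast_analysis` DAG id, and the placeholder `run_stage` function are hypothetical; the actual steps of the podcast analysis tool are not described here.

```python
from datetime import datetime

# Hypothetical stage names, run in this order.
STAGES = ["fetch_episodes", "analyze_episodes", "store_results"]


def run_stage(name):
    """Placeholder task body; a real task would call into the analysis tool."""
    print(f"running {name}")
    return name


def build_dag():
    """Chain one PythonOperator per stage so Airflow runs and tracks them in order."""
    # Imported lazily so this file can be read and tested without Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    dag = DAG(
        dag_id="podcast_analysis",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",  # Airflow 2.x argument name
        catchup=False,
    )
    previous = None
    for name in STAGES:
        task = PythonOperator(
            task_id=name, python_callable=run_stage, op_args=[name], dag=dag
        )
        if previous is not None:
            previous >> task  # each stage waits for the one before it
        previous = task
    return dag
```

In a real DAG file, the result would be assigned at module level (for example `dag = build_dag()`) so that the scheduler can discover it.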