Distributed apps quickly reach the point where debugging with tail -f becomes untenable. Ignoring your logs isn't an option, though. The ELK Stack (Elasticsearch, Logstash, and Kibana) is a go-to tool for managing your logs and making them work for you rather than just taking up hard drive space. Unfortunately, it does not yet have out-of-the-box log processing or dashboards for Cassandra. Check out a way to extract meaningful information from your Cassandra logs here. View on GitHub
Data visualization is particularly valuable when it comes to graph processing. If you are using graph technology because of the connectivity within your data, then you will want to visualize that connectivity and show it to your clients. Vega, which is built on D3.js, comes with a powerful API to do just that. View on GitHub
After collecting your data, you are going to want to use it. In this project, I demonstrate how to run batch jobs that audit content marketing websites for broken links, missed opportunities, and room for growth, all derived from Google Analytics and displayed in easy-to-use UIs. View on GitHub
Kafka coordinates your data pipelines as a message broker that sits in the middle of your distributed infrastructure. Adding Kafka to your project can help everything run smoothly and efficiently, with support for exactly-once semantics (when configured for it), event replay, and stream processing. View on GitHub