In this post, we will use the Ford GoBike Real-Time System, StreamSets Data Collector, Apache Kafka, and MapD to create a real-time data pipeline of bike availability in the Ford GoBike bikeshare ecosystem. We'll walk through the architecture and configuration that enable this data pipeline and share a simple auto-updating dashboard in MapD Immerse.
A summary of some of the software architecture at IFTTT.
A summary of my thoughts on Dremio and Apache Arrow.
Spanner is a globally distributed, synchronously replicated database created by Google that supports externally consistent transactions. Engineers and researchers at Google wrote a paper on the project, and this post covers that paper.
Back in July, I presented my work on building an open source version of Amazon Athena at Open West. You can find the slides at the end of this post, along with a link to the GitHub repo. The repo has all of the configuration you need and a walkthrough video showing how to use it. This blog post is a written account of my journey and some of the gotchas I ran into along the way.