Blog — Jowanza Joseph

Building a Real-Time Bike-Share Data Pipeline with StreamSets, Kafka and MapD

In this post, we will use the Ford GoBike Real-Time System, StreamSets Data Collector, Apache Kafka and MapD to create a real-time data pipeline of bike availability in the Ford GoBike bikeshare ecosystem. We’ll walk through the architecture and configuration that enable this data pipeline and share a simple auto-updating dashboard within MapD Immerse.

Data Engineering | Jowanza Joseph | September 10, 2018

Thoughts on IFTTT

A summary of some of the software architecture at IFTTT.

Data Engineering | Jowanza Joseph | November 13, 2017

Thoughts on Dremio

A summary of my thoughts on Dremio and Apache Arrow.

Data Engineering | Jowanza Joseph | November 6, 2017

The Spanner Paper

Spanner is a globally consistent, synchronously transactional database created by Google. Engineers and researchers at Google wrote a paper on the project, and this post covers that paper.

Data Engineering | Jowanza Joseph | October 2, 2017

Jathena: An Open Source Amazon Athena

Back in July, I presented at Open West the work I’ve been doing to create an open source version of Amazon Athena. You can find the slides at the end of this post, as well as a link to the GitHub repo. The GitHub repo has all of the configurations you need and a walkthrough video of how to use it. This blog post serves as a written record of my journey and some of the gotchas I experienced along the way.

Data Engineering | Jowanza Joseph | September 5, 2017