At the beginning of the year, I decided I wanted to speak at technical conferences and meetups. This decision was based on how much more I learn when I am tasked with explaining things to other people. It’s the same reason I like to blog and the same reason I liked to tutor when I was in college.
Choosing between Apache Kafka and Amazon Kinesis can be a difficult task. While some argue that they are inherently different systems, I think most of the differences are overstated. This post highlights the differences I found most notable between the two platforms and aims to help those who are choosing between them.
For the past few years, I’ve been an ardent believer in the promise of distributed in-memory data processing. The first time I took a Hive job, rewrote it in Spark, and saw it return 10 times faster, I was a believer.
This post is a combination of notes I’ve taken on the subject over the last year. It’s about tools that make the trade more comfortable, some that make it more uncomfortable, and stuff that the jury is still out on. I had a lot of fun writing this piece, and while there isn’t any code, it’s still technical.
Like Spark, Flink is fairly overwhelming to get started with. This is mostly because of installations and run-time configurations. After a bunch of searching around, I was able to put together a decent starter SBT config for Flink. I used IntelliJ to work with Flink; because of its complex API, the type hinting and other niceties come in pretty handy.