This post is about how I used Alluxio to reduce p99 and p50 query latencies and optimized the overall platform costs for a distributed querying application. I walk through the product and architecture decisions that lead to our final architecture, discuss the tradeoffs, share some statistics on the improvements, and discuss future improvements to the system.
Read MoreA walkthrough of migrating a streaming workflow from AWS Kinesis and Spark Streaming to Apache Pulsar and Pulsar Functions.
Read MoreSome books I haven’t been able to read yet.
Read MoreA tutorial on serving Spark MLlib models with Apache OpenWhisk and MLeap.
Read MoreMy set up and critique of quantified self.
Read More