HTML Tables in R
A few weeks ago I saw Hadley Wickham tweet about trying rvest. He mentioned that it was perfect for those who like Beautiful Soup in Python. That caught my attention because I spent a month over the summer on a project that required using Beautiful Soup for some large-scale web scraping automation. With rvest, the R ecosystem now has a robust toolset for web scraping.
This post isn’t about the entire package, but rather about one function: html_table(). This function turns an HTML table into an R data frame, which you can imagine is useful for any number of reasons. Sure, you could write your own function to do that, but I thought it was great that this functionality came out of the box. In this post I just grab data from the npm homepage, specifically the downloads of Node packaged modules over the last day, week, and month, and plot it using ggvis.
I grabbed the CSS selector for the table of interest and then called html_table() on it. From there I had strings rather than integers because of the way the download counter is implemented on npmjs.org. I removed the whitespace, converted the column to a numeric type, and was on my way. From there, plotting in ggvis was cake.
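A minimal sketch of the approach, using a tiny inline table in place of the npm homepage (the selector, column names, and numbers here are illustrative, not the actual ones from npmjs.org):

```r
library(rvest)

# A tiny inline table standing in for the npm downloads table (made-up values)
page <- minimal_html('
  <table>
    <tr><th>period</th><th>downloads</th></tr>
    <tr><td>last day</td><td> 1 234 567 </td></tr>
    <tr><td>last week</td><td> 8 901 234 </td></tr>
  </table>')

# Select the table and turn it into a data frame
tab <- page %>% html_node("table") %>% html_table()

# The counter renders counts as whitespace-padded strings;
# strip the whitespace and convert to numeric before plotting
tab$downloads <- as.numeric(gsub("\\s", "", tab$downloads))
```

On the real page you would call read_html() on the URL and pass the CSS selector you grabbed to html_node() instead.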
I’m looking forward to grabbing more table data in this manner. Dealing with whitespace and other string issues is mostly trivial in R, and this will make a good deal of data easily accessible when working in R. Furthermore, there is so much else in this package that I hope to cover in the coming weeks.
U.S. Data with Statebins
Statebins is one of the few visualization libraries I’m excited about these days. It took me a while to get around to trying it, but I can say confidently that I’ll be using it a lot in the future.
A few months ago I saw a post about statebins, an R package emulating these Washington Post charts. I wanted to put a quick blog post together using the framework to make a chart. I grabbed this union data from CNN.com and was on my way.
The code is straightforward. You can set colors using ColorBrewer palettes and control where the title goes as well. I think there is a lot of potential in the package, and like I said before, I’ll be presenting US state data this way more often.
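A minimal sketch of a statebins call, with made-up membership percentages standing in for the CNN union data (the values, column names, and option names follow the package version current at the time; newer releases may differ):

```r
library(statebins)

# Made-up state values; not the actual CNN union-membership data
union_df <- data.frame(
  state = c("UT", "CA", "NY", "TX", "WA"),
  value = c(3.9, 16.3, 24.6, 4.8, 18.9)
)

# brewer_pal picks a ColorBrewer palette; plot_title places the title
p <- statebins(union_df, state_col = "state", value_col = "value",
               brewer_pal = "Blues", plot_title = "Union Membership (%)")
p
```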
Riding The FrontRunner to Work
For over four weeks now I’ve been riding the FrontRunner to work. The FrontRunner is a train that travels from Provo, Utah to Ogden, Utah, making stops along the way. I’ve included a map for those who are not familiar with the area. I decided to take the FrontRunner for three reasons. (1) I have a free pass from work, so I don’t absorb any of the cost of taking it. (2) I live about 25 miles away from work, and driving there and back is non-trivial. (3) I wanted to use my commuting time for something better than staring at the road ahead of me and listening to new music. After driving to and from work for a week, I calculated that I spent about 70 minutes* commuting, and none of that time was spent writing, reading, or doing anything besides listening to music and NPR (I leave work at peak traffic hours, which adds a lot of time). I was willing to add 30 minutes overall to the commute if it guaranteed time for miscellaneous tasks that are important to me but that I seldom find time for outside of work: writing in my journal, reading inspirational material, reading the news, note taking, and personal email. The FrontRunner would theoretically accomplish this. It was 30 minutes on the train, 5-10 waiting for the train, plus 10 riding the bus from the station to campus (Adobe Technology Campus). I decided to give it a try for a week, and it’s extended to over a month now.
My Feelings So Far
The morning commute is perfect. I am alert after breakfast and exercise, and I write in my journal every day, check personal email, and read the news and other material. I pay full attention to what I am doing; there are no distractions. Every morning I’m happy I took the FrontRunner to work, because while it takes longer than driving, the morning feels much more productive.
My original plan for the evening commute was to write code. After a week of trying this I found that I’m not very good at writing code on the train. I often need the internet for downloading files or making an update, and the Wi-Fi is not reliable on the FrontRunner (though it is nice to have internet at all compared to other mass transit systems). Instead I have taken to writing. If my writing is work related, I’ll document code I’ve been writing so I can spend less time at my desk doing that. I also write blog posts on the train: about programming, general topics like this one, about Chromecast, and about hip-hop. It’s hard to find time to write, but the train ride home turns out to be ideal for it. Though it’s just 30 minutes, I spend the entire time writing because nothing else is going on. I tend not to write a ton in my blogs, so 30 minutes can easily get me a first draft.
Taking the FrontRunner has saved me a good deal of money. I save a full tank of gas per week by not driving to work; that’s $35-$40 a week. That may seem like a trivial amount of money, but I’ve funneled it back into eating better and saving more. I won’t lie and say I’m excited about the people on the train who sneeze and cough without covering their mouths. Nor am I always thrilled about delays or the limits on where I can go after work. But I am happy with how I’ve figured out a way to make the most of it. As the winter months come along I’ll see how I feel, but I think this is my commute for as long as I work where I do.
* The FrontRunner takes about 100 minutes round trip, door to door.
A Look into my music listening with dplyr and plot.ly
I listen to a good quantity and variety of music. I work in a field where listening to music on the job is OK, so it’s not uncommon for me to listen to 5 hours of music in a day while working. I don’t listen to music as much while driving, because I listen to a lot of audiobooks and like to catch the news when I can. In any event, I recently came across an IFTTT recipe that takes tracks scrobbled (recorded) by Last.fm and stores them in a Google Spreadsheet. I had forgotten that it was running over the last 6 months, and there were over 2,000 entries in there. I decided to take a look and see if there were any interesting trends in the data. The data is simple: the date, the name of the band, the song, a link to the album cover, and a link to the song’s Last.fm page.
The first thing I needed to do was handle the formatting in the date column. Just looking at it, I didn’t imagine that “March 23, 2014 at 03:30PM” was going to be a date format that would make R very happy. With some regular expressions in Google Sheets, I got that cleaned up relatively easily. Then I just wanted to do some quick magic with dplyr to look at this data a couple of different ways.
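A sketch of the dplyr side, using a few made-up rows in place of my spreadsheet (the column names and values here are assumptions, not the actual sheet’s):

```r
library(dplyr)

# The raw IFTTT timestamp can also be parsed directly in R, as an
# alternative to regexes in Sheets (locale-dependent: %B expects
# English month names)
as.POSIXct("March 23, 2014 at 03:30PM", format = "%B %d, %Y at %I:%M%p")

# A few made-up rows standing in for the scrobble sheet
scrobbles <- data.frame(
  date   = as.Date(c("2014-03-23", "2014-03-23", "2014-04-01")),
  artist = c("Canibus", "Canibus", "Nas")
)

# Plays per artist, most-played first
by_artist <- scrobbles %>%
  group_by(artist) %>%
  summarise(plays = n()) %>%
  arrange(desc(plays))

# Counts by month and by day of the week
by_month <- scrobbles %>% count(month = months(date))
by_wday  <- scrobbles %>% count(weekday = weekdays(date))
```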
I’m surprised Canibus is number one, but I was preparing for this podcast, and Canibus does have a hefty music catalog.
Count over months:
Day of the week:
Not surprisingly, I listen to the majority of my music on weekdays. I also do a bit of it on Sundays; I often spend Sunday mornings at my desk writing blog posts like this one.
Overall, it was fun to look at the data this way. It’s not sophisticated by any means, but it’s an interesting breakdown, and it doesn’t take very much code. I would recommend using the Last.fm API over this IFTTT method, just because it’s a cleaner way of doing it and you get a richer dataset. That being said, you can find out a lot from this method without much code. dplyr is also the best thing that has happened to R in a while.
Here’s my favorite album over the course of the last few months. Note: some of the music services I use don’t work with Last.fm on mobile, so this dataset is just my listening at the computer. I do spend the majority of my day there, so it’s not a bad representation of what I listen to.
Minimalism: Putting My Stuff To Use
In recent months I’ve seen a number of articles critiquing those who adopt a “minimalist” lifestyle. While the idea of minimalism is hardly well defined, you can sum it up by saying the minimalist lifestyle is one of reducing the stuff you own in an attempt to simplify life and focus less on material things. The two main criticisms I’ve seen are that (1) there is an element of privilege in the minimalist attitude, and (2) it’s expensive to implement minimalism in any meaningful way. This blog covers most of the thoughts that have been echoed by folks on the issue, as well as highlighting its positive aspects.
I have somewhat of a natural tendency towards this lifestyle. I don’t like clutter and I am all about the utility of the things I own. I think this comes in part from growing up poor (Grew up on the crime side, the New York Times side) and reading Utilitarianism by John Stuart Mill as a teenager.
That being said, I find great joy in many of the things I own. I love the wealth of pots and pans I have to cook in, I enjoy having diversity in the clothes and shoes I wear, I enjoy watching programs on my television, I enjoy playing my guitar (I might play it for 20 minutes a week these days), I love my 3 pairs of headphones, and I even love my two smartphones and two messenger bags. Each of them has a purpose, and I use them in their respective element. I’m not actively trying to reduce the stuff I have, or to reach some imaginary optimal amount of stuff. I just want to make sure I use the stuff I have.
Over the years I have taken care to buy nice things and take care of them. That being said, my interests have changed, and some things I own I will never use again. Why hold onto those things? Why hold onto the purple button-up I never wear? Why keep the 6 bottles of cologne I haven’t used in years? Why keep stuff I bought with the idea that “it’ll be cool to have some day”? I don’t want stuff just to have stuff; I want stuff I put to use. Stuff that I find aesthetically pleasing, stuff I’d file a police report over if it were stolen. That’s all I mean by minimalism, and maybe that is more utilitarianism than anything else.
Here’s to getting more stuff I love and giving stuff I don’t use to people who will love them.
As an R user I’ve used ggplot2 for years, so I was naturally reluctant to give ggvis the attention it deserves. ggvis is the new kid on the block, and while I’ve seen a few articles written about why it’s cool, I still haven’t seen many blog posts utilizing it.
I decided to redo a graph I created for a blog post about Freemason membership in order to highlight some features of ggvis and some struggles I’ve had with it.
The syntax is similar to ggplot2. I used the pipe operator (%>%) instead of the + operator because I wanted to try something new. The one syntactic novelty I found was the := operator, which sets a visual property to a fixed, unscaled value. I can’t remember seeing that operator in the context of creating graphics, but it took only a short time to get used to.
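A minimal sketch of the syntax, with made-up numbers standing in for the Freemason membership series:

```r
library(ggvis)

# Illustrative values, not the actual membership data
membership <- data.frame(
  year    = c(1960, 1980, 2000),
  members = c(4.1e6, 3.0e6, 1.8e6)
)

p <- membership %>%
  ggvis(~year, ~members) %>%
  layer_lines() %>%
  layer_points(fill := "steelblue")  # := sets a fixed property, unscaled
p
```

Compare fill := "steelblue" (a literal color) with fill = ~some_var, which would map a data variable through a scale, as in ggplot2’s aes().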
I think the output of ggvis is elegant in its current form and will become increasingly so with time. I love the ability to embed an SVG of a plot, and I also love the ability to add interactivity to the plots. There’s room to grow, but where it’s at now is impressive.
I had two unexpected hiccups when using ggvis. The first was the offset of the text on the y-axis: with larger numbers, the y-axis title was covered up by the value labels on the axis. I had to use the offset option (see below) to fix that.
Without offset:
With offset:
The second hiccup was with using years on the x-axis. When I tried to use a date format I got some pretty wonky results. I followed along with the documentation, but that threw an unknown type error. When I kept the years as integers it worked as expected, but the x-axis labels have commas in them instead of looking like years.
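Both hiccups come down to add_axis options. title_offset is the offset fix described above (the pixel value here is just a guess), and the "d" format string for the x-axis is an assumption on my part that ggvis passes d3-style format strings through to Vega, which would render plain integers without commas:

```r
library(ggvis)

# Illustrative values, not the actual membership data
membership <- data.frame(year    = c(1960, 1980, 2000),
                         members = c(4.1e6, 3.0e6, 1.8e6))

p <- membership %>%
  ggvis(~year, ~members) %>%
  layer_lines() %>%
  add_axis("y", title = "Members", title_offset = 80) %>%  # push title clear of labels
  add_axis("x", title = "Year", format = "d")              # "d": integer, no commas
p
```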
Overall I like ggvis. I think it’s a great addition to the graphics capabilities in R, and it is visualization for the next generation. I’m planning on using it in upcoming projects.