http://www.cs.berkeley.edu/~jey/ampcamp6/training/data-exploration-using-spark.html says "Recall from above when we described the format of the data set", but I don't see any description of the data format earlier on that page.