#phillyrb January 2012 Meeting
Awesome talk by Dustin about continuations in Ruby. Check out the mailing list for his slides, and don’t miss his talk at RedSnake Philly next month on Feb. 21st!
MailCatcher is a private SMTP server/client you can use to test your email without spamming people! Check it out if you are building custom email applications.
Ruby on Big Data
What is “Big Data”? It isn’t always about size. Sometimes it is about CPU-bound workloads, like Natural Language Processing.
NoSQL storage is all about BASE:
- Basically Available
- Soft state
- Eventual consistency
Cassandra
Cassandra combines ideas from Dynamo (Amazon’s distributed key-value store) with the data model of Google’s BigTable. It was developed at Facebook, which then released it as open source.
Cassandra’s Data Model
- Keyspaces
  - Column Families
    - Rows (Sorted by KEY!)
      - Columns {Key: Value}
This is a sparsely populated data model, which means each row can have its own set of column keys, and you are able to add new keys at will.
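To make the nesting concrete, here is a minimal sketch that mimics the model with plain Ruby hashes. It does not talk to Cassandra; the sample data and the helper names (`build_keyspace`, `add_column`, `sorted_row_keys`) are made up for illustration.

```ruby
# Hypothetical sample data, nested the same way as the model:
# keyspace -> column family -> row key -> columns {key: value}.
def build_keyspace
  {
    "users" => {
      "bob"   => { "email" => "bob@example.com" },
      "alice" => { "email" => "alice@example.com", "city" => "Philadelphia" }
    }
  }
end

# Sparse rows: set a column on one row; no other row gains that key.
def add_column(keyspace, column_family, row_key, name, value)
  keyspace.fetch(column_family)[row_key] ||= {}
  keyspace[column_family][row_key][name] = value
  keyspace
end

# Row keys in sorted order, mirroring "Rows (Sorted by KEY!)".
def sorted_row_keys(keyspace, column_family)
  keyspace.fetch(column_family).keys.sort
end

demo = add_column(build_keyspace, "users", "bob", "twitter", "@bob")
puts sorted_row_keys(demo, "users").inspect
```

Note that "bob" ends up with a column that "alice" does not have, and nothing had to be declared up front.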
Cassandra’s hash ring uses Dynamo-style consistent hashing (not Paxos). Each key is hashed to a position on the ring, and that position determines which nodes store it. This distributes keys across the nodes in the ring, handles data replication, and keeps lookups fast.
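The idea of a hash ring can be sketched in a few lines of Ruby. This is a toy illustration of consistent hashing, not Cassandra's actual implementation: the `HashRing` class, the node names, the use of MD5, and the number of virtual nodes are all assumptions made for the example.

```ruby
require "digest"

# Toy consistent-hash ring. Each node is placed on the ring several times
# (virtual nodes) so keys spread more evenly across nodes.
class HashRing
  def initialize(nodes, vnodes: 8)
    @ring = {}
    nodes.each do |node|
      vnodes.times { |i| @ring[position("#{node}-#{i}")] = node }
    end
    @positions = @ring.keys.sort
  end

  # Walk clockwise from the key's position and collect `replicas` distinct nodes.
  def nodes_for(key, replicas: 1)
    target = position(key)
    # First ring position at or after the key; wrap to 0 past the end.
    start = @positions.bsearch_index { |pos| pos >= target } || 0
    owners = []
    @positions.size.times do |offset|
      node = @ring[@positions[(start + offset) % @positions.size]]
      owners << node unless owners.include?(node)
      break if owners.size == replicas
    end
    owners
  end

  private

  # Map any string to a large integer position on the ring.
  def position(value)
    Digest::MD5.hexdigest(value).to_i(16)
  end
end

ring = HashRing.new(%w[node-a node-b node-c])
puts ring.nodes_for("user:42", replicas: 2).inspect
```

Because a key's owners depend only on its hash and the ring layout, any client can work out where a key lives without asking a central coordinator.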
You can choose between multiple consistency levels, including one, quorum, and all.
- one: This returns as soon as one replica has written your data. The other replicas catch up later.
- quorum: This waits until n/2+1 replicas have written your data, where n is the number of replicas of the row (the replication factor), not the total number of nodes in the cluster.
- all: This waits until ALL replicas have written the data. This is the slowest, but the most consistent.
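The arithmetic behind these levels is easy to check in Ruby. This sketch takes n to be the number of replicas of a row (the replication factor), which is an assumption on top of the notes; `required_acks` and `overlapping?` are hypothetical helper names, not part of any Cassandra client.

```ruby
# Number of replicas that must acknowledge a request at each consistency
# level, for a given replication factor (copies kept of each row).
def required_acks(level, replication_factor)
  case level
  when :one    then 1
  when :quorum then replication_factor / 2 + 1 # integer division
  when :all    then replication_factor
  else raise ArgumentError, "unknown consistency level: #{level}"
  end
end

# A read is guaranteed to touch at least one replica that saw the latest
# write when the write set and the read set together exceed the replica count.
def overlapping?(write_level, read_level, replication_factor)
  required_acks(write_level, replication_factor) +
    required_acks(read_level, replication_factor) > replication_factor
end

puts required_acks(:quorum, 3)
puts overlapping?(:quorum, :quorum, 3)
puts overlapping?(:one, :one, 3)
```

With three replicas, quorum writes plus quorum reads always overlap, while one plus one does not, which is why "one" is fast but only eventually consistent.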
You can store anything you want in your column values. That is nice, because you can define your own schemas there without major constraints.
Hadoop
Hadoop is the Apache implementation of Google’s MapReduce and distributed file system (the Apache counterpart to BigTable is HBase). To get info out of it, you have to write map and reduce functions.
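For anyone who has not written one, a map function and a reduce function look roughly like this. The sketch below is a word count that runs entirely in one Ruby process; it does not use Hadoop, and `map_words`, `reduce_counts`, and `word_count` are hypothetical names chosen for the example.

```ruby
# map: emit a [word, 1] pair for every word in one line of input.
def map_words(line)
  line.downcase.scan(/[a-z']+/).map { |word| [word, 1] }
end

# reduce: combine all the counts emitted for a single word.
def reduce_counts(word, counts)
  [word, counts.sum]
end

# Tiny in-process driver standing in for the framework's shuffle step:
# group the emitted pairs by word, then reduce each group.
def word_count(lines)
  emitted = lines.flat_map { |line| map_words(line) }
  grouped = emitted.group_by(&:first)
  grouped.map { |word, pairs| reduce_counts(word, pairs.map(&:last)) }.to_h
end

puts word_count(["Ruby loves data", "data loves Ruby"]).inspect
```

In a real cluster the framework runs the map step on many machines, groups the emitted pairs by key, and hands each group to a reducer; only the two small functions are yours to write.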
Solandra is a library that combines the Solr search library with Cassandra, so that your indexes are in Cassandra.
Why use Ruby for Big Data?
Because we LOVE Ruby!
Ruby is simple enough that you can give it to clients to write map/reduce jobs. This is NON-TRIVIAL in Java. A map/reduce job in Java is about 500 lines of code; in Ruby, it is 22 lines.
Virgil
Virgil is a REST interface for Cassandra! Virgil lets you create Cassandra models with HTTP PUT calls.
Virgil also has a GUI to allow you to look into your Cassandra DB with about 200 lines of ExtJS code.
With Virgil you get both CRUD functions and Map/Reduce in Cassandra for the first time.
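Here is a rough sketch of what creating data with HTTP PUT calls could look like from Ruby, using only the standard library. The base URL and the keyspace/column family/row path layout are assumptions made for illustration, not something confirmed in the talk, so check the Virgil documentation before relying on them. `virgil_put` and `send_request` are hypothetical helpers.

```ruby
require "net/http"
require "uri"
require "json"

# ASSUMPTION: this base URL and the path layout below are illustrative only.
VIRGIL_BASE = "http://localhost:8080/virgil/data"

# Build (but do not send) a PUT request for a keyspace, column family, or row.
# Passing `columns` attaches them to the request as a JSON body.
def virgil_put(*path_parts, columns: nil)
  uri = URI("#{VIRGIL_BASE}/#{path_parts.join('/')}")
  request = Net::HTTP::Put.new(uri)
  if columns
    request["Content-Type"] = "application/json"
    request.body = JSON.generate(columns)
  end
  request
end

# Send a prepared request; this needs a server listening at VIRGIL_BASE.
def send_request(request)
  Net::HTTP.start(request.uri.host, request.uri.port) do |http|
    http.request(request)
  end
end

requests = [
  virgil_put("demo"),                  # keyspace
  virgil_put("demo", "users"),         # column family
  virgil_put("demo", "users", "alice", columns: { "email" => "alice@example.com" })
]
requests.each { |req| puts "#{req.method} #{req.path}" }
```

Building the requests separately from sending them makes it easy to inspect what would go over the wire before pointing the script at a real server.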
“Use real-time systems for batch processing.”
Typhoeus is a concurrent HTTP client that runs really fast. This is a great gem to use for massive numbers of HTTP calls, like adding info to Cassandra through Virgil.
Bridging the gap between Java and Ruby
RedBridge is the JRuby implementation of JSR 223 (the Java scripting API), which is what bridges Ruby to Java. You can use it to hook into Java through JRuby.
Super Columns
Super columns are an old way to add meta-data to Cassandra, but they are deprecated. Don’t use them!
Storm
Storm is a way to do real-time processing with streams of data. Twitter uses this to push out all their data.
Thank you to Brian and the other speakers for the great info!

