Lisp in Ruby, my two favorite languages together at last!


@mdb, @ALarkinDesign, @hexinteractive, and me reinventing ping pong after hours at #CIM
Innovating Ping Pong (by JohnRiv)
Angelique Martin hits the nail on the head. This is a great article not only for managers trying to hire software developers, but also for software developers themselves.
TDD, pairing, short iterations, and CI are practices all modern developers should know and embrace.
Andrew Larkin (@ALarkinDesign), Comcast Interactive Media
NLP is a subset of artificial intelligence. We are trying to give an appearance of a machine that thinks.
NLP also mixes in cognitive science and linguistics to understand grammars.
NLP also needs statistics and probability to figure out which interpretation of a text is most likely.
Language does not always mean what we think it means. Human language has a lot of ambiguity, and NLP has to do its best to limit that ambiguity.
NLP is a key component for better human-computer interaction. We tend to forget that WIMP is not a “natural” or intuitive way of interacting with the world.
There are different levels of interpreting text when dealing with natural language processing.
The NLP pipeline goes: Phonology -> Morphology -> Syntax -> Semantics -> Reasoning.
NLTK is a Python library that you can use to interpret text!
NLTK has a huge collection of text corpora that have been organized and cut up for your use. Everything from presidential addresses to classical books. Amazing!
The Brown corpus is VERY diverse. Check it out!
import nltk
from nltk.corpus import brown
brown.categories() # List the categories of pre-processed text in the Brown corpus.
brown.words(categories='news') # A list of ALL the words in the news category.
brown.sents(categories='news') # A list of all the sentences in the news category.
genre_words = [(genre, word)
               for genre in ['news', 'romance']
               for word in brown.words(categories=genre)]
cfd = nltk.ConditionalFreqDist(genre_words) # Word frequencies, counted separately for the news and romance categories.
cfd.tabulate(samples=['Monday', 'Tuesday', 'Wednesday']) # A table of how often Monday, Tuesday, and Wednesday occur in the news and romance categories.
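If you don't have the corpus downloaded, here is a rough sketch of what a conditional frequency distribution does, using only the standard library. The (genre, word) pairs are made-up toy data, and conditional_freq is a hypothetical helper name, not part of NLTK.

```python
from collections import Counter, defaultdict

# Hypothetical toy data standing in for (genre, word) pairs from a corpus.
genre_words = [
    ("news", "Monday"), ("news", "Monday"), ("news", "Tuesday"),
    ("romance", "Monday"), ("romance", "Wednesday"),
]

def conditional_freq(pairs):
    """Count how often each sample occurs under each condition."""
    table = defaultdict(Counter)
    for condition, sample in pairs:
        table[condition][sample] += 1
    return table

cfd_sketch = conditional_freq(genre_words)
for genre in sorted(cfd_sketch):
    # A Counter returns 0 for samples it never saw under this condition.
    counts = [cfd_sketch[genre][day] for day in ("Monday", "Tuesday", "Wednesday")]
    print(genre, counts)
```

Each genre gets its own counter, which is the whole idea: the same word is counted separately per condition.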
WordNet is like a thesaurus. It has a list of synsets (synonym sets).
A synset sits in a tree of specificity for words (e.g. “motorcar” is more specific than “motor vehicle,” and “artifact” is less specific).
A more specific term is called a hyponym (e.g. motorcar).
A less specific term is called a hypernym (e.g. artifact).
Using the lowest_common_hypernyms() method, you can compare synsets to see how closely related two words are to each other.
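To make the tree idea concrete, here is a minimal sketch of finding the closest shared ancestor of two words. The hierarchy below is a tiny hand-written stand-in, not real WordNet data, and the function names are made up for this example.

```python
# Hypothetical child -> parent links; not real WordNet data.
HYPERNYM_OF = {
    "motorcar": "motor_vehicle",
    "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "hammer": "artifact",
}

def hypernym_path(word):
    """Return the chain from a word up to the root, most specific first."""
    path = [word]
    while path[-1] in HYPERNYM_OF:
        path.append(HYPERNYM_OF[path[-1]])
    return path

def lowest_common_hypernym(a, b):
    """Return the most specific ancestor shared by both words, or None."""
    ancestors_of_b = set(hypernym_path(b))
    for candidate in hypernym_path(a):
        if candidate in ancestors_of_b:
            return candidate
    return None

print(lowest_common_hypernym("motorcar", "truck"))    # motor_vehicle
print(lowest_common_hypernym("motorcar", "bicycle"))  # vehicle
print(lowest_common_hypernym("motorcar", "hammer"))   # artifact
```

The deeper the shared ancestor sits in the tree, the more closely related the two words are.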
tagged_sents = brown.tagged_sents(categories='news') # A list of tagged sentences.
size = int(len(tagged_sents) * 0.9)
train_sents = tagged_sents[:size] # Make a sample set of training sentences.
test_sents = tagged_sents[size:] # Make the rest a test set.
unigram_tagger = nltk.UnigramTagger(train_sents) # A unigram tagger looks at a single word and assigns it the tag it most often had in training; it does not look at the words around it.
unigram_tagger.evaluate(test_sents) # This tells us how accurate our tagger is.
t0 = nltk.DefaultTagger('NN') # Default to nouns.
t1 = nltk.UnigramTagger(train_sents, backoff=t0) # Tell the tagger that if it doesn't know a word, guess that it is a noun.
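The backoff idea can be sketched without NLTK at all. This is a simplified illustration, not how NLTK implements its taggers: the training sentences are made up, and train_unigram and tag_with_backoff are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training data, not taken from the Brown corpus.
toy_train = [
    [("the", "AT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "AT"), ("run", "NN"), ("ends", "VBZ")],
    [("dogs", "NNS"), ("run", "VB")],
    [("they", "PPSS"), ("run", "VB")],
]

def train_unigram(sentences):
    """Map each word to the tag it carried most often in training."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_with_backoff(words, model, default_tag="NN"):
    """Use the unigram model when it knows the word, else the default tag."""
    return [(word, model.get(word, default_tag)) for word in words]

model = train_unigram(toy_train)
print(tag_with_backoff(["the", "cat", "runs"], model))
# [('the', 'AT'), ('cat', 'NN'), ('runs', 'VBZ')]
```

"cat" never appears in the training data, so it falls through to the default noun tag, which is the same job the DefaultTagger does as a backoff.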
There are noun phrases and verb phrases that you can chunk text into.
Grammars and parsers take sentences and turn them into noun parts and verb parts.
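As a very rough sketch of chunking, the function below groups runs of determiner, adjective, and noun tags into noun phrases. It is a deliberately crude stand-in for a real grammar-based chunker: the tag set and the chunk_noun_phrases name are assumptions for this example, and two noun phrases sitting side by side would be merged into one.

```python
def chunk_noun_phrases(tagged):
    """Group consecutive determiner/adjective/noun tokens into NP chunks."""
    np_tags = {"DT", "JJ", "NN", "NNS"}  # assumed tag set for this sketch
    chunks, current = [], []
    for word, tag in tagged:
        if tag in np_tags:
            current.append(word)
        else:
            if current:
                chunks.append(("NP", current))
                current = []
            # Tokens outside a noun phrase are kept as single-word chunks.
            chunks.append((tag, [word]))
    if current:
        chunks.append(("NP", current))
    return chunks

sentence = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(chunk_noun_phrases(sentence))
```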
NLP is a way to interpret language logically!
Check out the Stanford Online Course on Natural Language Processing. You can take it for free, and it starts January 23rd!
Thank you to Andrew for the great talk!
Awesome talk by Dustin about continuations in Ruby. Check out the mailing list for his slides, and don’t miss his talk at RedSnake Philly next month on Feb. 21st!
MailCatcher is a private SMTP server/client you can use to test your email without spamming people! Check it out if you are making custom email applications.
What is “Big Data?” It isn’t always about size. Sometimes it is about CPU-bound workloads, like Natural Language Processing.
NoSQL storage is all about BASE: Basically Available, Soft state, Eventual consistency.
Cassandra took ideas from Dynamo (Amazon’s distributed key-value store) and Google’s BigTable and mixed them together. Facebook then released it as open source.
Cassandra’s Data Model
This is a sparsely populated data model, which means that you are able to add keys at will.
Cassandra’s hash ring follows the Dynamo-style consistent hashing model. This allows you to distribute keys across the nodes in the ring, to support data replication and fast lookups.
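Here is a minimal sketch of how a hash ring can place keys on nodes and pick replicas. It is a toy model, not Cassandra's actual partitioner: the HashRing class, the node names, and the use of MD5 for tokens are all assumptions made for illustration.

```python
import bisect
import hashlib

class HashRing:
    """Toy hash ring: each node sits at one token, keys go to the next node clockwise."""

    def __init__(self, nodes):
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value):
        # MD5 is used only to spread values around the ring, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas_for(self, key, replication_factor=1):
        """Walk clockwise from the key's position and collect distinct nodes."""
        start = bisect.bisect_left(self._tokens, self._token(key))
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

ring = HashRing(["node-a", "node-b", "node-c"])  # hypothetical node names
print(ring.replicas_for("user:42", replication_factor=2))
```

Because a key's position depends only on its hash, any client can work out which nodes hold it without asking a central coordinator.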
You can have multiple consistency levels: one, quorum, and all.
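The arithmetic behind those levels is simple enough to sketch. The function names here are made up; the quorum size (more than half of the replicas) and the "reads plus writes must exceed the replica count" overlap rule are the standard ones for this style of system.

```python
def required_acks(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")

def overlaps(read_level, write_level, replication_factor):
    """True when every read must touch at least one replica that saw the write."""
    r = required_acks(read_level, replication_factor)
    w = required_acks(write_level, replication_factor)
    return r + w > replication_factor

print(required_acks("QUORUM", 3))        # 2
print(overlaps("QUORUM", "QUORUM", 3))   # True
print(overlaps("ONE", "ONE", 3))         # False
```

Lower levels answer faster but may return stale data; quorum reads paired with quorum writes trade some latency for overlap.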
You can store anything you want in your column values. That is nice, so you can define your own schemas there without major constraints.
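A sparse, schema-free column model can be pictured as a dictionary of dictionaries. This is only a mental model in plain Python, not Cassandra's storage engine, and every name and value below is made up.

```python
from collections import defaultdict

# Toy column family: row key -> {column name -> value}.
column_family = defaultdict(dict)

def put(row_key, column, value):
    """Set one column on one row; new rows and columns need no schema change."""
    column_family[row_key][column] = value

def get_row(row_key):
    """Return only the columns this row actually has."""
    return dict(column_family.get(row_key, {}))

put("alice", "email", "alice@example.com")
put("alice", "city", "Philadelphia")
put("bob", "last_login", {"ip": "10.0.0.7", "ok": True})  # any value shape

print(get_row("alice"))
print(get_row("bob"))
```

Rows "alice" and "bob" share no columns at all, and nothing is stored for the columns a row lacks.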
Hadoop is the Apache implementation of Google’s MapReduce. To get info out of it, you have to write map and reduce functions.
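For anyone who has not written one, here is a minimal single-process sketch of the map and reduce shape, using word counting. It does not use Hadoop; the function names and input lines are made up, and the shuffle step stands in for the grouping a real framework does between the two phases.

```python
from collections import defaultdict

def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)

lines = ["Ruby loves Lisp", "Lisp loves Ruby too"]
emitted = [pair for line in lines for pair in map_words(line)]
totals = dict(reduce_counts(key, values) for key, values in shuffle(emitted).items())
print(totals)
```

The map function only ever sees one line and the reduce function only ever sees one key, which is what lets a cluster run many copies of each in parallel.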
Solandra is a library that combines the Solr search library with Cassandra, so that your indexes are in Cassandra.
Because we LOVE Ruby!
Ruby is simple enough that you can give it to clients to write map/reduce jobs. This is NON-TRIVIAL in Java. A map/reduce job in Java is about 500 lines of code; in Ruby, it is 22 lines.
Virgil is a REST client for Cassandra! Virgil lets you create Cassandra models with HTTP PUT calls.
Virgil also has a GUI to allow you to look into your Cassandra DB with about 200 lines of ExtJS code.
With Virgil you get both CRUD functions and Map/Reduce in Cassandra for the first time.
“Use real-time systems for batch processing.”
Typhoeus is a concurrent HTTP client that runs really fast. This is a great gem to use for massive HTTP calls, like adding info to Cassandra through Virgil.
Redbridge is the JRuby implementation of JSR 223, which is what bridges Ruby to Java. You can use that to hook into Java through JRuby.
It is an old, deprecated way to add metadata to Cassandra. Don’t use it!
Storm is a way to do real-time processing with streams of data. Twitter uses this to push out all their data.
Thank you to Brian and the other speakers for the great info!
r38y:
In part one, I went over the essentials for looking professional on the internet. In this post, I want to help people who are new to the business world learn about some services that exist to make life easier. The best part? Most of these services are available for little to no cost.
…