Sentence probability and discounting surprises

I have recently begun experimenting with statistical models for representing the probability of a sentence or a phrase. As we know, the general idea is to have a model which – given a corpus – predicts how likely a test sentence or phrase is to appear. In training our model, we usually assume that words we have “seen” are more likely to appear in the language than words we haven’t “seen” yet. As a consequence, we would like our model to predict P(phrase containing seen words, therefore likely) > P(phrase containing unseen words, therefore unlikely) in most, if not all, cases.
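To make the idea concrete, here is a minimal sketch in Ruby of a unigram model with add-one (Laplace) smoothing; the corpus and test phrases are hypothetical examples, and real discounting schemes are more sophisticated than this:

```ruby
# Toy corpus and word counts.
corpus = "the cat sat on the mat the dog sat on the rug".split
counts = Hash.new(0)
corpus.each { |w| counts[w] += 1 }
vocab_size = counts.size
total = corpus.size

# Add-one smoothing: unseen words still receive a small, non-zero probability.
word_prob = ->(w) { (counts[w] + 1).to_f / (total + vocab_size) }

# Naive independence assumption: P(phrase) is the product of its word probabilities.
phrase_prob = ->(phrase) { phrase.split.map(&word_prob).reduce(:*) }

seen_score   = phrase_prob.call("the cat sat")    # every word appears in the corpus
unseen_score = phrase_prob.call("a unicorn flew") # no word appears in the corpus
```

With this sketch, `seen_score` comes out higher than `unseen_score`, which matches the desired P(seen) > P(unseen) ordering, while the smoothing keeps the unseen phrase from collapsing to probability zero.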

Doing NLP with Ruby – Generating n-grams from a corpus

So the first post is about Natural Language Processing with Ruby. Since Python is so deeply entrenched as the language of choice for NLP, thanks to such great toolkits as NLTK, you may ask why bother with something else. I don’t think this is the right place to engage in an ideological dispute over the superiority of one programming language over another – it’s clear that people work with whatever they like (or know) best.
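As a taste of what the post covers, here is a minimal sketch of n-gram generation in Ruby, using the standard library’s `each_cons`; the sample sentence is a hypothetical example:

```ruby
# Return the list of n-grams (consecutive n-token windows) from a token array.
def ngrams(tokens, n)
  tokens.each_cons(n).to_a
end

tokens = "to be or not to be".split
bigrams = ngrams(tokens, 2)
# Five overlapping windows: ["to","be"], ["be","or"], ["or","not"], ["not","to"], ["to","be"]
```

`each_cons(n)` slides a window of size n over the array, which is exactly the overlapping-window behaviour n-grams need, so no explicit index arithmetic is required.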

Hello, world!

I’ve decided to start this blog to share my experiences in AI and NLP. The idea is to help learners and practitioners like myself get through the elementary and slightly more advanced concepts in the coolest areas of computer science today. Let’s discover them together!