My previous post was about a problem we often encounter when calculating the probability of a sentence. With out-of-vocabulary words the probability of our sentence is zero and we use smoothing and discounting tricks to get around this issue. Continue reading “Good-Turing discounting – what they don’t tell you in NLP books”
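To make the zero-probability problem concrete, here is a minimal sketch in Ruby. It is not the Good-Turing method from the post: it swaps in plain add-one (Laplace) smoothing on a toy unigram model, just to show how a single unseen word zeroes out a sentence and how any smoothing scheme avoids that. The corpus, the test sentences, and the helper names (`mle_prob`, `laplace_prob`, `sentence_prob`) are all made up for illustration.

```ruby
# Toy unigram model. Corpus and helper names are hypothetical.
CORPUS = %w[the cat sat on the mat].freeze

# word => number of occurrences in the corpus
COUNTS = CORPUS.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }.freeze
TOTAL  = CORPUS.size      # number of tokens
VOCAB  = COUNTS.size + 1  # seen word types plus one slot for "unknown"

# Maximum-likelihood estimate: an unseen word gets probability 0.
def mle_prob(word)
  COUNTS.fetch(word, 0).to_f / TOTAL
end

# Add-one (Laplace) smoothing: every word, seen or not, gets a
# pseudo-count of 1, so no probability is ever exactly 0.
def laplace_prob(word)
  (COUNTS.fetch(word, 0) + 1).to_f / (TOTAL + VOCAB)
end

# Sentence probability as a product of unigram probabilities.
# `estimator` is the name of one of the two methods above.
def sentence_prob(words, estimator)
  words.reduce(1.0) { |prob, w| prob * send(estimator, w) }
end

seen   = %w[the cat sat]  # all words occur in the corpus
unseen = %w[the dog sat]  # "dog" is out of vocabulary

puts sentence_prob(unseen, :mle_prob)      # 0.0: one unseen word kills the product
puts sentence_prob(unseen, :laplace_prob)  # small, but greater than zero
puts sentence_prob(seen, :laplace_prob)    # larger than the unseen sentence
```

With smoothing, the sentence made of seen words still comes out as more likely than the one containing an unseen word, which is the behaviour we usually want from the model.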
I have recently begun experimenting with statistical models for representing the probability of a sentence or a phrase. As we know, the general idea is to have a model which, given a corpus, predicts how likely a test sentence/phrase is to appear. In training our model, we usually assume that words we have “seen” are more likely to appear in the language than words which we haven’t “seen” yet. As a consequence, we would like our model to predict P(phrase containing seen words, therefore likely) > P(phrase containing unseen words, therefore unlikely) in most, if not all, cases. Continue reading “Sentence probability and discounting surprises”
<UPDATED 18 Aug 2017>
I’ve uploaded the next version of Ruby Ngram Toolkit to my GitHub account. The RNGTK is basically a class that makes ngram operations easier. Just include it in your project with require_relative 'RNGTK.rb' and you’re good to go. Continue reading “Introducing RNGTK v. 0.94 – Ruby Ngram Toolkit”
So the first post is about Natural Language Processing with Ruby. Since Python is so deeply entrenched as the langue de préférence for NLP thanks to such great toolkits as NLTK, you may ask why bother with something else. I don’t think this is the right place to engage in an ideological dispute over the superiority of one programming language over another – it’s clear that people work with whatever they like (or know) best. Continue reading “Doing NLP with Ruby – Generating n-grams from a corpus”
I’ve decided to start this blog to share my experiences in AI and NLP. The idea is to help learners and practitioners like myself get through the elementary and slightly more advanced concepts in the coolest areas of computer science today. Let’s discover them together!