
N-Grams in Natural Language Processing

What are N-Grams?

An N-gram is a contiguous sequence of N words. If N is 1 it is called a unigram, if N is 2 it is called a bigram, if N is 3 a trigram, and so on.

Why N-Grams?

The N-gram model is one of the simplest models in NLP. It uses the previous N−1 words to predict the next word. It is mainly used in predictive texting, grammar correction, machine translation, and speech recognition, where the previous words improve the prediction of the next word.

How do we calculate N-Grams?

Let’s say you have to compute the probability of the word ‘proved’ occurring after “The earth is not round was”. How do you calculate that mathematically? With conditional probability, right?

P(proved|The earth is not round was)

To calculate this we first need count(The earth is not round was proved) and count(The earth is not round was). Imagine computing those counts over all the history books. No less than a nightmare, right? To resolve this we use the bigram model P(proved | was) or the trigram model P(proved | round was), where we condition on only the previous one word or the previous two words rather than the complete preceding sentence. This is accurate to a very good extent and is justified by the Markov assumption, which states that the probability of a word depends only on a limited history.

The general equation for the N-gram model, approximating the probability of the next word in a sequence, is:

P(wₙ | w₁ … wₙ₋₁) ≈ P(wₙ | wₙ₋N₊₁ … wₙ₋₁)

An intuitive way to estimate these probabilities is called maximum likelihood estimation, or MLE. We get the MLE estimate for the parameters of an n-gram model by taking counts from a corpus and normalizing them so that they lie between 0 and 1.

Let’s consider a small corpus and calculate unigram and bigram probabilities for it:

  • The boy ate an ice cream.

  • The girl bought an ice cream.

  • The girl then ate the ice cream.

  • The boy bought a toy.

Now let’s calculate the unigram probability for the sentence below:

  • The boy bought an ice cream

So we need to calculate the probability of each word from its count in the above corpus (24 words in total):

<The> <boy> <bought> <an> <ice> <cream>

=5/24 * 2/24 * 2/24 *2/24 *3/24 * 3/24
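The counting above can be sketched in a few lines of Python. This is a minimal illustration, assuming we lowercase the text and strip punctuation so that “The” and “the” count as the same token, which matches the counts used in the article (e.g. count(the) = 5 out of 24 tokens).

```python
from collections import Counter

# Toy corpus from the article.
corpus = [
    "The boy ate an ice cream.",
    "The girl bought an ice cream.",
    "The girl then ate the ice cream.",
    "The boy bought a toy.",
]

# Lowercase and strip the trailing period so "The" and "the" match.
tokens = [w.strip(".").lower() for line in corpus for w in line.split()]
counts = Counter(tokens)
total = len(tokens)  # 24 tokens in the corpus

def unigram_prob(sentence):
    """P(sentence) under a unigram model: the product of count(w)/total."""
    p = 1.0
    for w in sentence.lower().split():
        p *= counts[w] / total
    return p

p = unigram_prob("The boy bought an ice cream")
# Same product as above: 5/24 * 2/24 * 2/24 * 2/24 * 3/24 * 3/24
```

Note that a unigram model ignores word order entirely; every word is scored independently of its neighbours.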


Now let’s calculate the bigram probability for the same sentence:

<The boy> <boy bought> <bought an> <an ice> <ice cream>

=Cnt(The ∩ boy)/Cnt(The) * Cnt(boy ∩ bought)/Cnt(boy) * Cnt(bought ∩ an)/Cnt(bought) * Cnt(an ∩ ice)/Cnt(an) * Cnt(ice ∩ cream)/Cnt(ice)

=2/5 * 1/2 * 1/2 * 2/2 * 3/3


So, as we observed, while calculating the bigram probability of (W₁ W₂) we simply count how often (W₁ W₂) occurs together and divide by the frequency of W₁. The same concept extends to trigrams and so on.
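The bigram calculation above can be sketched the same way. This is a minimal illustration with the same lowercasing assumption; like the worked example, it does not use start-of-sentence markers, so the first word of the test sentence contributes no factor of its own.

```python
from collections import Counter

# Toy corpus from the article.
corpus = [
    "The boy ate an ice cream.",
    "The girl bought an ice cream.",
    "The girl then ate the ice cream.",
    "The boy bought a toy.",
]

sentences = [[w.strip(".").lower() for w in line.split()] for line in corpus]

unigram_counts = Counter(w for s in sentences for w in s)
bigram_counts = Counter(
    (s[i], s[i + 1]) for s in sentences for i in range(len(s) - 1)
)

def bigram_prob(sentence):
    """P(sentence) as a product of P(w | prev) = cnt(prev w) / cnt(prev)."""
    words = sentence.lower().split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigram_counts[(prev, cur)] / unigram_counts[prev]
    return p

p = bigram_prob("The boy bought an ice cream")
# Same product as above: 2/5 * 1/2 * 1/2 * 2/2 * 3/3 = 0.1
```

A sentence containing a bigram never seen in the corpus would get probability zero here; in practice this is handled with smoothing techniques, which are beyond the scope of this article.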

Author: Sahil Pahuja


