
Intuitively Implementing Levenshtein Distance

Mar 4, 2025

8 min

Have you ever wondered how tools like autocorrect can guess what word you meant to type, or how Google manages to correct your search queries when you make a typo? While it may seem like magic, these features are powered by simple yet effective algorithms. One popular example is the Levenshtein distance algorithm, which measures how similar two strings are to each other. In this article, we’re going to explore how to approach string similarity in an intuitive manner and how to implement Levenshtein distance from that intuition.

Intuition

The best way to start thinking about string similarity is with an example. Consider the word “aegn.” Chances are, if a user typed this into an application, they made a typo and intended to type “begin” rather than a completely unrelated word like “weaponized.” Why do we think this? Well, the word “aegn” doesn’t exist, so it must be some sort of typo, and the word “begin” is visually much more similar to “aegn” than “weaponized” is. While humans can discern such differences at a glance, the key question is how a computer can compute how similar two strings are to each other.

One simple approach is to compare how long the two strings are: the closer the lengths, the more similar the words might be. However, this approach quickly breaks down when comparing words of the same length but with entirely different letters. For instance, using only length as a measure, the word “test” would appear just as similar to “fair” as the word “fair” is to itself, which clearly isn’t correct.

A better approach involves going a step deeper and looking at the individual characters of the strings themselves. Specifically, we can count how many steps it takes to change one string into another. There are two major operations you need when changing a string: deleting a character and inserting a character. With these two operations you can change any string into any other string. Take the words “farm” and “fair” for example. You can change “farm” to “fair” by deleting the “m” and inserting an “i” between the “a” and the “r,” which results in 2 total steps. An additional common operation to consider is substitution, where you delete a character and insert a new one in the same position, which Levenshtein distance counts as a single operation.

To find which string in an array is most similar to a given string, you can simply calculate the minimum number of steps needed to transform the string into each string in the array and return the one that takes the fewest steps. We can quickly verify this with an example. Take the string “aegn” and compare it to the two strings “begin” and “weaponized.” To save you the time, the number of steps required to change “aegn” into “begin” is 2, while the number of steps required to change “aegn” into “weaponized” is 8. Intuitively we know that “begin” is much more similar, and algorithmically we’ve just verified it.
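
As a rough sketch of what that lookup looks like in code, here’s a small C++ helper that picks the closest candidate from a list. It assumes a levenshtein(a, b) function that returns the number of steps (we’ll implement one later in this article); the function name and the example words are just placeholders for illustration.

#include <limits>
#include <string>
#include <vector>

// Assumed helper: returns the number of edit steps between two strings.
// A full implementation is shown later in this article.
int levenshtein(const std::string& a, const std::string& b);

// Returns the candidate that takes the fewest edits to reach from `word`.
std::string closestMatch(const std::string& word,
                         const std::vector<std::string>& candidates) {
    std::string best;
    int bestDistance = std::numeric_limits<int>::max();
    for (const auto& candidate : candidates) {
        int distance = levenshtein(word, candidate);
        if (distance < bestDistance) {
            bestDistance = distance;
            best = candidate;
        }
    }
    return best;
}

// closestMatch("aegn", {"begin", "weaponized"}) would return "begin" (2 steps vs 8).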

Levenshtein distance

Now that we have an intuitive basis for understanding how to calculate similarity, the next step is creating an algorithm to implement it. Fortunately, some very smart people have already developed the Levenshtein distance algorithm, which does exactly what we need. As such, I’m going to skip over the process of developing the algorithm and dive straight into implementing it and understanding how Levenshtein distance works.

Levenshtein distance employs a technique called dynamic programming, a fancy term for breaking a problem into smaller subproblems and storing and reusing previously computed results. There are several ways to implement this algorithm, but we’ll use an iterative approach with a 2D matrix. The idea is to construct a matrix where each cell represents the number of operations required to convert a prefix of one word into a prefix of the other.

Consider the words “farm” and “fair.” The first row and column represent transforming to or from an empty string, so they simply count characters. The initial matrix setup is going to look like the following:

        f   a   i   r
    0   1   2   3   4
f   1
a   2
r   3
m   4

Next, we fill in the matrix based on the following rules:

  • If the characters at the row and column indices match, no additional operations are needed (copy the value from the top-left diagonal cell).
  • Otherwise, take the minimum of the values from the top, left, and top-left diagonal cells, then add 1 to account for the edit operation (this step takes the cheapest of the insertion, deletion, and substitution options that were previously calculated and stored in the matrix; see the recurrence written out after this list).
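
The two rules above can be written more compactly as a single update per cell, using the same dp naming and indexing as the C++ implementation later in this article (dp[i][j] is the number of edits needed to turn the first i characters of one word into the first j characters of the other):

dp[i][0] = i
dp[0][j] = j
dp[i][j] = dp[i-1][j-1]                                      if s1[i-1] == s2[j-1]
dp[i][j] = 1 + min(dp[i-1][j-1], dp[i-1][j], dp[i][j-1])     otherwise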

Filling in the matrix, we get:

        f   a   i   r
    0   1   2   3   4
f   1   0   1   2   3
a   2   1   0   1   2
r   3   2   1   1   1
m   4   3   2   2   2

The bottom-right cell (value 2) represents the minimum number of edits needed to transform “farm” into “fair.”

Now that we understand the algorithm, implementing it is trivial. Here’s a simple C++ implementation of the Levenshtein distance in less than 50 lines of code:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char* argv[]) {
    if (argc < 3) {
        std::cerr << "Usage: " << argv[0] << " <string> <string>\n";
        return 1;
    }
    // initializing variables and 2d matrix
    std::string s1 = argv[1], s2 = argv[2];
    int m = s1.length(), n = s2.length();
    std::vector<std::vector<int>> dp(m+1, std::vector<int>(n+1, 0));
    for (int i = 0; i <= m; i++) {
        dp[i][0] = i;
    }
    for (int i = 0; i <= n; i++) {
        dp[0][i] = i;
    }
    // filling the matrix table
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (s1[i-1] == s2[j-1]) {
                dp[i][j] = dp[i-1][j-1];
            } else {
                dp[i][j] = 1 + std::min({
                    dp[i-1][j-1],
                    dp[i-1][j],
                    dp[i][j-1]
                });
            }
        }
    }
    // outputting results
    std::cout << "Similarity Score: " << dp[m][n] << "\n";
    return 0;
}
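
If you’d like to try it out, you can compile and run it with two words as arguments; for “farm” and “fair” it prints 2, matching the bottom-right cell of the matrix above. The file and executable names below are just placeholders.

g++ levenshtein.cpp -o levenshtein
./levenshtein farm fair
Similarity Score: 2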

Use Cases

Now that we’ve covered how the Levenshtein distance algorithm works, there are a ton of ways you can use it in your applications:

  • Search engines: Enhancing search queries by suggesting the most relevant matches.
  • Autocorrect and spell-checking: Identifying and correcting typos in real-time.
  • Product search: Helping users find products on a website even with typos.
  • File search tools: Quickly locating files on a system using fuzzy search techniques.

One popular tool that I highly recommend checking out is fzf, which takes a similar fuzzy search approach. It uses some slightly different algorithms, but it’s nevertheless an incredibly useful tool that I use daily. You can also build your own variation of it using your own algorithms; mine is on my GitHub here (shameless plug LMAO).



Thanks for reading. Cya 👋.