Post-Bigram Word Finder

Ackshaey Singh
Difficulty: Easy
Time Complexity: O(n)
Space Complexity: O(n)
Tags: String, Google

Picture yourself in a Google interview. The engineer asks: "Given a text and two specific words, how would you find every word that appears after this word pair?"

At first, this might sound like a natural language processing problem, something you'd need a library for. But here's the thing: it's actually a straightforward string parsing exercise. No fancy algorithms, no complex data structures. Just arrays, loops, and careful boundary checking. Let's walk through it.

TL;DR

Split the text into a word array, then slide a three-word window across it. At each position, check if the first two words match the target bigram; if they do, capture the third word. The loop bound i < words.length - 2 prevents index-out-of-bounds errors. This runs in O(n) time and O(n) space, collecting all matches including duplicates. The algorithm is a direct application of the sliding window technique that transfers to dozens of string and array problems in 2026 coding interviews.

Why This Problem Matters

Text pattern matching shows up constantly in real software systems. Search engines need to understand query context. Autocomplete predicts your next word based on the previous two. Log parsers hunt for specific error sequences. All of these rely on recognizing word patterns, and bigrams (two-word sequences) are the building block.

I've watched candidates breeze through this problem in five minutes, then struggle for an hour on more complex string problems. The difference? They understood the sliding window technique deeply, not just memorized a solution. Get comfortable with this pattern, and you'll have a mental model that transfers to dozens of string and array problems.

Understanding Bigrams

A bigram is simply a sequence of two adjacent words. In the sentence "alice is a good girl", the bigrams are:

  • "alice is"
  • "is a"
  • "a good"
  • "good girl"

In this problem, we're looking for every word that follows a specific bigram. Think of it like a "what comes next?" pattern detector.
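A quick sketch makes the definition concrete (the class and method names here are my own, for illustration only). It lists every adjacent word pair in a sentence:

```java
import java.util.ArrayList;
import java.util.List;

public class BigramDemo {
  // Returns every adjacent word pair ("bigram") in the text.
  static List<String> bigrams(String text) {
    String[] words = text.split(" ");
    List<String> pairs = new ArrayList<>();
    for (int i = 0; i < words.length - 1; i++) {
      pairs.add(words[i] + " " + words[i + 1]);
    }
    return pairs;
  }

  public static void main(String[] args) {
    // "alice is a good girl" → [alice is, is a, a good, good girl]
    System.out.println(bigrams("alice is a good girl"));
  }
}
```

Note the loop bound length - 1 here: a bigram needs two words, so the last pair starts at the second-to-last index. The same style of boundary reasoning returns later when we need three words.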

Problem Understanding

Here's what we need to accomplish:

Given:

  • text: A string of lowercase words separated by single spaces
  • first: The first word of our target bigram
  • second: The second word of our target bigram

Find: Every word that immediately follows the pattern "first second"

Example 1:

text = "alice is a good girl she is a good student"
first = "a"
second = "good"
Output: ["girl", "student"]

Why? The bigram "a good" appears twice:

  • "a good girl" → capture "girl"
  • "a good student" → capture "student"

Example 2:

text = "we will we will rock you"
first = "we"
second = "will"
Output: ["we", "rock"]

Notice that "we" appears in the results! The bigram "we will" occurs twice:

  • "we will we" → capture "we"
  • "we will rock" → capture "rock"

This shows an important insight: the same word can appear multiple times in our results if the pattern repeats.

Edge Cases to Consider

What if the pattern never appears? Return an empty array. What if "first second" is the last two words? There's no third word to capture, so we move on. Can patterns overlap? Absolutely. "we will we" contains "we will" starting at position 0, and we should check every position. The pattern might appear once, twice, or not at all. Our algorithm needs to handle all these scenarios gracefully.

Solution Approach: The Three-Window Technique

Think of a three-word window sliding across the text: at each step the window covers words[i], words[i+1], and words[i+2], then moves one word to the right.

At each position, we ask two questions: does the first word match, and does the second? If both say yes, we've found our bigram. Grab the third word.

Let's trace through Example 1 step by step:

text = "alice is a good girl she is a good student"
words = ["alice", "is", "a", "good", "girl", "she", "is", "a", "good", "student"]

  i=0: alice, is → no match
  i=1: is, a → no match
  i=2: a, good → MATCH → capture words[4] = "girl"
  i=3: good, girl → no match
  i=4: girl, she → no match
  i=5: she, is → no match
  i=6: is, a → no match
  i=7: a, good → MATCH → capture words[9] = "student"

Result: ["girl", "student"]

Why We Loop to length - 2

This boundary condition is where most candidates stumble. We need to access three consecutive words: words[i], words[i+1], and words[i+2].


If our array has length 5:

  • Valid indices: 0 through 4
  • Last valid i: 2 (so we can access i+2 = 4)
  • Loop condition: i < words.length - 2 (which is i < 3)

This pattern holds for any array length. If we looped to length - 1 or length, we'd crash when checking words[i+1] or words[i+2]. In my interviews, this is the detail that separates candidates who truly understand the algorithm from those who memorized a solution. Draw it out - two minutes sketching saves ten minutes debugging.
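The boundary is easy to verify in code. This sketch (illustrative names, not part of the solution) collects every window the loop visits for a five-word sentence:

```java
import java.util.ArrayList;
import java.util.List;

public class BoundaryDemo {
  // Collects every three-word window the loop visits.
  static List<String> windows(String[] words) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < words.length - 2; i++) {
      out.add(words[i] + " " + words[i + 1] + " " + words[i + 2]);
    }
    return out;
  }

  public static void main(String[] args) {
    // Length 5: the last valid i is 2, so exactly three windows appear
    System.out.println(windows("alice is a good girl".split(" ")));
    // prints [alice is a, is a good, a good girl]
  }
}
```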

Implementation

Prefer a different language? Jump to solutions in Python, JavaScript, C++, Go, and Scala.

Here's our solution with detailed explanations:

import java.util.*;

public class Solution {
  /**
   * Finds all occurrences of the word that follows a given bigram in the text.
   *
   * @param text the input text containing words separated by single spaces
   * @param first the first word of the bigram
   * @param second the second word of the bigram
   * @return an array of words that immediately follow the specified bigram
   */
  public String[] findOcurrences(String text, String first, String second) {
    // Step 1: Split the text into words using space as delimiter
    String[] words = text.split(" ");

    // Step 2: Create a dynamic list to collect matching words
    List<String> result = new ArrayList<>();

    // Step 3: Iterate through the array with our three-word window
    // We stop at length - 2 to safely access words[i], words[i+1], and words[i+2]
    for (int i = 0; i < words.length - 2; i++) {
      // Step 4: Check if current position matches our target bigram
      if (words[i].equals(first) && words[i + 1].equals(second)) {
        // Step 5: Capture the word that follows the bigram
        result.add(words[i + 2]);
      }
    }

    // Step 6: Convert our dynamic list to a fixed-size array
    return result.toArray(new String[result.size()]);
  }
}

Walking Through the Code

The first thing we do is break the text apart. Java's split() method is perfect for this. Give it a space, and it chops our string into an array of words:

String[] words = text.split(" ");
// "alice is a good girl" → ["alice", "is", "a", "good", "girl"]

Why an array? Because we need random access. We're going to be checking words[i], words[i+1], and words[i+2] repeatedly, and arrays make that instant.

Now we need somewhere to collect our matches. We don't know how many we'll find, so a fixed array won't work:

List<String> result = new ArrayList<>();

This gives us the flexibility to add matches as we discover them. Think of it like a basket where we'll toss in each word that follows our bigram pattern.

Here's where the magic happens. Our loop condition looks unusual at first glance:

for (int i = 0; i < words.length - 2; i++)

Why length - 2? Because we're about to access three positions: i, i+1, and i+2. If we went all the way to length - 1, we'd crash when checking the third position. This boundary keeps us safe.

Inside the loop, we check if we've found our target bigram:

if (words[i].equals(first) && words[i + 1].equals(second))

Notice we use .equals(), not ==. In Java, == compares object references, but we want to compare the actual text content. Both conditions must be true (first word matches and second word matches) before we capture anything.

When we find a match, we grab the word that follows:

result.add(words[i + 2]);

That third position is guaranteed to exist because of our loop boundary. No index errors possible.

Finally, we convert our dynamic list to a fixed-size array for the return type:

return result.toArray(new String[result.size()]);

If no matches were found, this naturally returns an empty array String[0], which is exactly what we want.
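Putting it all together, here is a quick self-contained check of the method against Example 1 (the wrapper class name is mine; the method body restates the solution above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SolutionDemo {
  static String[] findOcurrences(String text, String first, String second) {
    String[] words = text.split(" ");
    List<String> result = new ArrayList<>();
    for (int i = 0; i < words.length - 2; i++) {
      if (words[i].equals(first) && words[i + 1].equals(second)) {
        result.add(words[i + 2]);
      }
    }
    return result.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // Example 1: the bigram "a good" is followed by "girl" and "student"
    System.out.println(Arrays.toString(
        findOcurrences("alice is a good girl she is a good student", "a", "good")));
    // prints [girl, student]
  }
}
```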

Complexity Analysis

Time Complexity: O(n)

We visit each word exactly once, performing the same checks regardless of whether we find a match. Split takes one pass through the string. The loop takes another pass through the array. Even the final conversion to an array is linear in the result size (which can't exceed n).

What about the string comparisons? Some candidates worry these might be expensive. In practice, word lengths in natural language are bounded (typically under 20 characters). We can treat each comparison as constant time for complexity analysis.

Add it all up: O(n) + O(n) + O(k) where k ≤ n. That simplifies to O(n) overall.

Space Complexity: O(n)

The dominant factor here is the words array, which holds every word from the input. That's unavoidable because we need to access these words randomly. Our result list could theoretically grow to size n if every triplet matches (imagine "a b c a b c a b c..."). The return array is the same size as the result list, so no additional asymptotic space there.

Bottom line: we're using O(n) space, and there's no way around it with this approach. In real-world scenarios, the result is typically much smaller than the input, but we analyze worst-case.

Common Pitfalls

Having reviewed hundreds of solutions to this problem, here are the mistakes I see most often:

1. Off-by-One Errors in Loop Boundary

// ❌ WRONG - ArrayIndexOutOfBoundsException!
for (int i = 0; i < words.length; i++) {
  if (words[i].equals(first) && words[i + 1].equals(second)) {
    result.add(words[i + 2]);  // Crash when i = length - 2 or length - 1
  }
}

// ✅ CORRECT
for (int i = 0; i < words.length - 2; i++) {
  // ...
}

Why it fails: When i = words.length - 1, accessing words[i + 1] throws an exception. When i = words.length - 2, accessing words[i + 2] throws an exception.

2. Using == Instead of .equals() for Strings

// ❌ WRONG - compares object references, not content
if (words[i] == first && words[i + 1] == second) {
  // ...
}

// ✅ CORRECT - compares string content
if (words[i].equals(first) && words[i + 1].equals(second)) {
  // ...
}

In Java, == checks if two variables reference the same object in memory. For strings, we almost always want .equals(), which compares the actual character sequences. I can spot candidates who've really practiced this problem versus those who memorized a solution. The difference? They can explain why we use .equals() without hesitation.
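You can see the difference directly. On a typical JVM, split() returns freshly allocated String objects, so reference comparison against a literal fails even when the characters match (string interning is an implementation detail, so treat the false result as the usual case rather than a guarantee):

```java
public class EqualsDemo {
  public static void main(String[] args) {
    String[] words = "a good girl".split(" ");

    // Same characters, but split() created a fresh String object,
    // so reference comparison typically fails:
    System.out.println(words[0] == "a");        // false on a typical JVM
    // Content comparison does what we want:
    System.out.println(words[0].equals("a"));   // true
  }
}
```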

3. Forgetting to Handle Empty Results

// ❌ WRONG - returns null instead of empty array
if (result.isEmpty()) {
  return null;
}

// ✅ CORRECT - returns empty array
return result.toArray(new String[result.size()]);
// This naturally returns String[0] if result is empty

The problem specification expects an array, not null. An empty array [] correctly represents "no matches found."

4. Incorrect toArray() Usage

// ❌ WRONG - returns Object[] instead of String[]
return result.toArray();

// ✅ CORRECT - explicitly specifies String[] type
return result.toArray(new String[result.size()]);

Without the type parameter, toArray() returns Object[], which won't compile when the method signature expects String[].

5. Not Considering Duplicate Results

Some candidates try to use a HashSet thinking duplicates should be removed. But the problem explicitly wants all occurrences, including duplicates:

"we will we will rock you" → ["we", "rock"]
// "we" appears once in results because "we will" occurs twice

Interview Tips

When solving this problem in an interview:

1. Start with Clarifying Questions

  • "Should the result preserve the order of occurrences?" (Yes)
  • "Can the same word appear multiple times if the pattern repeats?" (Yes)
  • "What should I return if there are no matches?" (Empty array)
  • "Are the words guaranteed to be lowercase?" (Check constraints)
  • "Could first and second be the same word?" (Not explicitly forbidden)

💡 Pro tip: Always clarify the boundary conditions before coding. It shows you're thinking about edge cases, and you might learn something that simplifies your solution.

2. Explain Your Approach

"I'll split the text into words, then use a sliding window of three words. At each position, I'll check if the first two words match our target bigram, and if so, capture the third word. This gives us O(n) time complexity."

3. Walk Through an Example

Always trace through a simple example on paper or whiteboard:

"a b c d" with first="a", second="b"
↓
["a", "b", "c", "d"]
  i=0: a, b, c → MATCH → capture "c"
  i=1: b, c, d → no match
Result: ["c"]

4. Discuss Edge Cases

  • "If the bigram appears at the end, there's no third word to capture, which is why I loop to length - 2"
  • "If the text is shorter than three words, the loop won't execute and we'll return an empty array"

5. Mention Possible Follow-Ups

Show breadth by suggesting extensions:

  • "We could generalize this to n-grams (sequences of n words)"
  • "For very large texts, we might want to use a streaming approach"
  • "If we needed to search for many bigrams, we could preprocess with a HashMap"

Real-World Applications

You might wonder when you'd actually need this algorithm outside of interviews. Here are scenarios where bigram pattern matching shows up:

Log File Analysis: Your production logs show "error connecting to database connection timeout" repeatedly. You need to find what happens immediately after "connection timeout". Does it retry? Does it fail silently? Bigram matching gives you that answer.

Chatbot Context: When a user says "I want to order a pizza," the system needs to understand intent. By finding words that follow "order a", you can predict likely completions: "pizza", "burger", "salad". This powers contextual suggestions.

Code Analysis: Static analysis tools search for patterns like "static void" to find all static methods in a codebase. The word that follows tells you the method name.

Text Prediction: Your phone keyboard uses bigrams constantly. Type "thank you" and watch: it suggests "for", "very", or "so" based on millions of examples of what people typically write after "thank you".

Key Takeaways

  • Split the text into an array of words for O(1) random access, then slide a three-word window across it checking for bigram matches at each position.
  • The loop bound i < words.length - 2 is the critical detail; it guarantees safe access to words[i], words[i+1], and words[i+2] without index-out-of-bounds errors.
  • Use .equals() (not ==) for string comparison in Java; == compares object references while .equals() compares actual content.
  • Collect all matches including duplicates in an ArrayList, then convert to an array; do not use a HashSet because the problem requires preserving all occurrences.
  • This three-word window pattern generalizes to n-gram matching, making it a foundational technique for text processing and NLP problems in 2026 coding interviews.

Practice Makes Perfect

Once you're comfortable with this problem, push yourself with these variations:

  • N-gram extraction: Can you generalize this to find sequences of N consecutive words, not just two?
  • Bigram frequency: Instead of finding what follows, count how often each bigram appears in the text.
  • Trigram completion: Given two words, predict the most common third word based on historical patterns.
  • Pattern matching with wildcards: Find bigrams where you only know one of the two words.

Each of these builds on the sliding window technique you've learned here. The pattern stays the same, only the details change.
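As a taste of the frequency variation, counting bigrams reuses the same split-and-slide skeleton with a map (a sketch with illustrative names):

```java
import java.util.HashMap;
import java.util.Map;

public class BigramFrequency {
  // Counts how often each bigram appears in the text.
  static Map<String, Integer> bigramCounts(String text) {
    String[] words = text.split(" ");
    Map<String, Integer> counts = new HashMap<>();
    for (int i = 0; i < words.length - 1; i++) {
      String bigram = words[i] + " " + words[i + 1];
      counts.merge(bigram, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // "we will" occurs twice in the lyric from Example 2
    System.out.println(bigramCounts("we will we will rock you").get("we will"));  // 2
  }
}
```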

Ready to level up your coding interview skills? This problem and over 1,500 others are waiting for you on Firecode, where 50,000+ engineers have successfully prepared for technical interviews at top companies. Whether you're targeting Google, Amazon, or any other tech company, consistent practice with problems like these will build the pattern recognition and problem-solving skills that interviewers look for.

Key Takeaway

💡 The sliding window technique transforms pattern matching from complex to elegant. Remember: split the input, slide a fixed-size window across it, and watch your boundary conditions. This pattern shows up in dozens of string and array problems - master it once, use it everywhere.

The next time you encounter a sequence-matching problem, think "three-word window" and you'll be halfway to the solution. Happy coding! 🔍


Solutions in Other Languages

The sliding window technique works identically across all languages. Below are clean, production-ready implementations in Python, JavaScript, C++, Go, and Scala.

Python Solution

Python's clean syntax makes the sliding window pattern especially readable:

from typing import List

class Solution:
    def find_occurrences(self, text: str, first: str, second: str) -> List[str]:
        # Split text into words
        words = text.split()

        # Initialize result list
        result = []

        # Sliding window: check each consecutive triple
        for i in range(len(words) - 2):
            if words[i] == first and words[i + 1] == second:
                result.append(words[i + 2])

        return result

Key Python patterns:

  • split() with no argument splits on whitespace
  • List comprehension alternative: [words[i + 2] for i in range(len(words) - 2) if words[i] == first and words[i + 1] == second]
  • Type hints (List[str]) for better code documentation

JavaScript Solution

JavaScript offers concise array methods for this pattern:

class Solution {
  findOcurrences(text, first, second) {
    // Split text into array of words
    const words = text.split(' ');

    // Initialize result array
    const result = [];

    // Sliding window: check each consecutive triple
    for (let i = 0; i < words.length - 2; i++) {
      if (words[i] === first && words[i + 1] === second) {
        result.push(words[i + 2]);
      }
    }

    return result;
  }
}

Key JavaScript patterns:

  • Use === for strict equality (not ==)
  • const for immutable references, let for loop variables
  • Array push() method for adding elements
  • Functional alternative: words.slice(0, -2).flatMap((w, i) => w === first && words[i + 1] === second ? [words[i + 2]] : [])

C++ Solution

C++ requires explicit memory management but offers strong type safety:

#include <vector>
#include <string>
#include <sstream>

class Solution {
public:
  std::vector<std::string> findOcurrences(std::string text, std::string first, std::string second) {
    // Initialize result vector
    std::vector<std::string> result;

    // Split text into words using stringstream
    std::vector<std::string> words;
    std::istringstream iss(text);
    std::string word;

    while (iss >> word) {
      words.push_back(word);
    }

    // Sliding window: check each consecutive triple
    for (size_t i = 0; i + 2 < words.size(); ++i) {
      if (words[i] == first && words[i+1] == second) {
        result.push_back(words[i+2]);
      }
    }

    return result;
  }
};

Key C++ patterns:

  • std::istringstream for tokenizing strings by whitespace
  • size_t for array indices (unsigned type)
  • Loop condition i + 2 < words.size() avoids unsigned underflow
  • push_back() for vector append operation

Go Solution

Go's simple syntax and built-in string utilities make this straightforward:

package solution

import "strings"

func (s *Solution) FindOcurrences(text string, first string, second string) []string {
    // Split text into slice of words
    words := strings.Split(text, " ")

    // Initialize result slice
    res := make([]string, 0)

    // Sliding window: check each consecutive triple
    for i := 0; i < len(words)-2; i++ {
        if words[i] == first && words[i+1] == second {
            res = append(res, words[i+2])
        }
    }

    return res
}

Key Go patterns:

  • := for short variable declaration with type inference
  • strings.Split() from standard library
  • make([]string, 0) creates empty slice with zero length
  • append() for adding elements to slices

Scala Solution

Scala combines functional and object-oriented paradigms:

class Solution {
  def findOcurrences(text: String, first: String, second: String): Array[String] = {
    // Split text into array of words
    val words = text.split(" ")

    // Initialize mutable result buffer
    val result = scala.collection.mutable.ListBuffer[String]()

    // Sliding window: check each consecutive triple
    for (i <- 0 until words.length - 2) {
      if (words(i) == first && words(i + 1) == second) {
        result += words(i + 2)
      }
    }

    // Convert to array for return type
    result.toArray
  }
}

Key Scala patterns:

  • val for immutable values (like final in Java)
  • until for exclusive range (0 until 5 is 0,1,2,3,4)
  • ListBuffer for efficient mutable collection building
  • Functional alternative: words.sliding(3).collect { case Array(w1, w2, w3) if w1 == first && w2 == second => w3 }.toArray

All implementations share the same core algorithm:

  1. Split text into words/tokens
  2. Loop from i=0 to length-2 (three-word window)
  3. Check if words[i] and words[i+1] match the bigram
  4. If match, capture words[i+2]
  5. Return collected results

The key difference is language-specific idioms for string splitting, array/list operations, and type systems. Choose the language that best fits your project's requirements!

Frequently Asked Questions

What is a bigram in the context of text processing?
A bigram is a sequence of two adjacent words in a text. For example, in the sentence 'the quick brown fox', the bigrams are 'the quick', 'quick brown', and 'brown fox'. Bigrams are the simplest form of n-grams and are widely used in natural language processing for text prediction, spell checking, and search relevance ranking.
How does the sliding window approach work for bigram matching?
The algorithm splits the text into an array of words, then slides a three-word window across the array. At each position i, it checks if words[i] and words[i+1] match the target bigram. If both match, it captures words[i+2] as a result. The window advances one position at a time until it reaches the end of the array.
What is the time complexity of the post-bigram word finder?
The time complexity is O(n) where n is the length of the input text. The string split operation takes one pass through the text, and the sliding window loop takes one pass through the resulting word array. String comparisons are bounded by word length, which is effectively constant for natural language text.
What edge cases should you handle in the bigram word finder?
Handle these cases: the input text has fewer than three words (loop does not execute, return empty array), the bigram appears as the last two words with no following word (prevented by the loop bound i < length - 2), the bigram never appears (return empty array), and the bigram appears multiple times (collect all following words, including duplicates).
Why does the algorithm split the string into an array first?
Splitting into an array enables O(1) random access to any word by index, which is necessary for checking three consecutive words (words[i], words[i+1], words[i+2]) at each position. Without splitting, you would need to scan through the string character by character to find word boundaries, making the code more complex with no performance benefit.
How are bigrams used in natural language processing?
Bigrams power text prediction (your phone keyboard predicts the next word based on the previous two), machine translation (preserving word-pair context during translation), spam detection (flagging suspicious word patterns), sentiment analysis (capturing phrases like 'not good' vs 'very good'), and search engine ranking (matching multi-word query intent).
What problems are similar to post-bigram word finding?
Similar problems include n-gram extraction (generalizing to sequences of n words), substring pattern matching (KMP or Rabin-Karp algorithms), sliding window maximum/minimum on arrays, and finding repeated word sequences in text. All of these share the core pattern of scanning a fixed-size window across a sequence and checking conditions at each position.
What implementation tips help avoid common mistakes?
Always use the loop bound i < words.length - 2 to prevent array index out of bounds. In Java, use .equals() instead of == for string comparison. Return an empty array (not null) when no matches are found. Use a dynamic list (ArrayList) to collect results since the match count is unknown in advance, then convert to an array at the end.
What are real-world applications of bigram pattern matching?
Log file analysis uses bigram matching to find what happens after specific error sequences. Autocomplete systems predict your next word based on the previous two. Static code analysis tools search for patterns like 'static void' to catalog method signatures. Chatbots use bigram context to understand user intent and generate relevant responses.
What is the best interview strategy for solving the bigram word finder?
Clarify the requirements first: ask about duplicate handling, return order, and empty input. Explain the three-word sliding window approach and its O(n) time complexity before coding. Trace through a small example on the whiteboard showing the window position at each step. After coding, test with edge cases: text shorter than three words, no matches, and overlapping bigram occurrences.