Post-Bigram Word Finder
Picture yourself in a Google interview. The engineer asks: "Given a text and two specific words, how would you find every word that appears after this word pair?"
At first, this might sound like a natural language processing problem, something you'd need a library for. But here's the thing: it's actually a straightforward string parsing exercise. No fancy algorithms, no complex data structures. Just arrays, loops, and careful boundary checking. Let's walk through it.
TL;DR
Split the text into a word array, then slide a three-word window across it. At each position, check if the first two words match the target bigram; if they do, capture the third word. The loop bound i < words.length - 2 prevents index-out-of-bounds errors. This runs in O(n) time and O(n) space, collecting all matches including duplicates. The algorithm is a direct application of the sliding window technique that transfers to dozens of string and array problems in 2026 coding interviews.
Why This Problem Matters
Text pattern matching shows up constantly in real software systems. Search engines need to understand query context. Autocomplete predicts your next word based on the previous two. Log parsers hunt for specific error sequences. All of these rely on recognizing word patterns, and bigrams (two-word sequences) are the building block.
I've watched candidates breeze through this problem in five minutes, then struggle for an hour on more complex string problems. The difference? They understood the sliding window technique deeply, not just memorized a solution. Get comfortable with this pattern, and you'll have a mental model that transfers to dozens of string and array problems.
Understanding Bigrams
A bigram is simply a sequence of two adjacent words. In the sentence "alice is a good girl", the bigrams are:
- "alice is"
- "is a"
- "a good"
- "good girl"
In this problem, we're looking for every word that follows a specific bigram. Think of it like a "what comes next?" pattern detector.
Problem Understanding
Here's what we need to accomplish:
Given:
text: A string of lowercase words separated by single spacesfirst: The first word of our target bigramsecond: The second word of our target bigram
Find: Every word that immediately follows the pattern "first second"
Example 1:
text = "alice is a good girl she is a good student"
first = "a"
second = "good"
Output: ["girl", "student"]
Why? The bigram "a good" appears twice:
- "a good girl" → capture "girl"
- "a good student" → capture "student"
Example 2:
text = "we will we will rock you"
first = "we"
second = "will"
Output: ["we", "rock"]
Notice that "we" appears in the results! The bigram "we will" occurs twice:
- "we will we" → capture "we"
- "we will rock" → capture "rock"
This shows an important insight: the same word can appear multiple times in our results if the pattern repeats.
Edge Cases to Consider
What if the pattern never appears? Return an empty array. What if "first second" is the last two words? There's no third word to capture, so we move on. Can patterns overlap? Absolutely. "we will we" contains "we will" starting at position 0, and we should check every position. The pattern might appear once, twice, or not at all. Our algorithm needs to handle all these scenarios gracefully.
Solution Approach: The Three-Window Technique
Think of a three-word window sliding across the text:
Loading visualization...
At each position, we ask three questions. Does the first word match? Does the second? If both say yes, we've found our bigram. Grab the third word.
Let's trace through Example 1 step by step with our interactive visualization:
Loading visualization...
In the animation above, you can see the three-word window sliding through the array:
- Blue window: Currently checking this position (no match)
- Green window: Found a match! The first two words match our bigram
- Green ring: The captured word (result)
Use the "Next" and "Previous" buttons to step through each iteration and see exactly how the algorithm finds both instances of the bigram "a good".
Result: ["girl", "student"]
Why We Loop to length - 2
This boundary condition is where most candidates stumble. We need to access three consecutive words: words[i], words[i+1], and words[i+2].
Loading visualization...
If our array has length 5 (as shown above):
- Valid indices: 0 through 4
- Last valid
i: 2 (so we can access i+2 = 4) - Loop condition:
i < words.length - 2(which isi < 3)
This pattern holds for any array length. If we looped to length - 1 or length, we'd crash when checking words[i+1] or words[i+2]. In my interviews, this is the detail that separates candidates who truly understand the algorithm from those who memorized a solution. Draw it out - two minutes sketching saves ten minutes debugging.
Implementation
Prefer a different language? Jump to solutions in Python, JavaScript, C++, Go, and Scala.
Here's our solution with detailed explanations:
import java.util.*;
public class Solution {
/**
* Finds all occurrences of the word that follows a given bigram in the text.
*
* @param text the input text containing words separated by single spaces
* @param first the first word of the bigram
* @param second the second word of the bigram
* @return an array of words that immediately follow the specified bigram
*/
public String[] findOcurrences(String text, String first, String second) {
// Step 1: Split the text into words using space as delimiter
String[] words = text.split(" ");
// Step 2: Create a dynamic list to collect matching words
List<String> result = new ArrayList<>();
// Step 3: Iterate through the array with our three-word window
// We stop at length - 2 to safely access words[i], words[i+1], and words[i+2]
for (int i = 0; i < words.length - 2; i++) {
// Step 4: Check if current position matches our target bigram
if (words[i].equals(first) && words[i + 1].equals(second)) {
// Step 5: Capture the word that follows the bigram
result.add(words[i + 2]);
}
}
// Step 6: Convert our dynamic list to a fixed-size array
return result.toArray(new String[result.size()]);
}
}
Walking Through the Code
The first thing we do is break the text apart. Java's split() method is perfect for this. Give it a space, and it chops our string into an array of words:
String[] words = text.split(" ");
// "alice is a good girl" → ["alice", "is", "a", "good", "girl"]
Why an array? Because we need random access. We're going to be checking words[i], words[i+1], and words[i+2] repeatedly, and arrays make that instant.
Now we need somewhere to collect our matches. We don't know how many we'll find, so a fixed array won't work:
List<String> result = new ArrayList<>();
This gives us the flexibility to add matches as we discover them. Think of it like a basket where we'll toss in each word that follows our bigram pattern.
Here's where the magic happens. Our loop condition looks unusual at first glance:
for (int i = 0; i < words.length - 2; i++)
Why length - 2? Because we're about to access three positions: i, i+1, and i+2. If we went all the way to length - 1, we'd crash when checking the third position. This boundary keeps us safe.
Inside the loop, we check if we've found our target bigram:
if (words[i].equals(first) && words[i + 1].equals(second))
Notice we use .equals(), not ==. In Java, == compares object references, but we want to compare the actual text content. Both conditions must be true (first word matches and second word matches) before we capture anything.
When we find a match, we grab the word that follows:
result.add(words[i + 2]);
That third position is guaranteed to exist because of our loop boundary. No index errors possible.
Finally, we convert our dynamic list to a fixed-size array for the return type:
return result.toArray(new String[result.size()]);
If no matches were found, this naturally returns an empty array String[0], which is exactly what we want.
Complexity Analysis
Time Complexity: O(n)
We visit each word exactly once, performing the same checks regardless of whether we find a match. Split takes one pass through the string. The loop takes another pass through the array. Even the final conversion to an array is linear in the result size (which can't exceed n).
What about the string comparisons? Some candidates worry these might be expensive. In practice, word lengths in natural language are bounded (typically under 20 characters). We can treat each comparison as constant time for complexity analysis.
Add it all up: O(n) + O(n) + O(k) where k ≤ n. That simplifies to O(n) overall.
Space Complexity: O(n)
The dominant factor here is the words array, which holds every word from the input. That's unavoidable because we need to access these words randomly. Our result list could theoretically grow to size n if every triplet matches (imagine "a b c a b c a b c..."). The return array is the same size as the result list, so no additional asymptotic space there.
Bottom line: we're using O(n) space, and there's no way around it with this approach. In real-world scenarios, the result is typically much smaller than the input, but we analyze worst-case.
Common Pitfalls
Having reviewed hundreds of solutions to this problem, here are the mistakes I see most often:
1. Off-by-One Errors in Loop Boundary
// ❌ WRONG - ArrayIndexOutOfBoundsException!
for (int i = 0; i < words.length; i++) {
if (words[i].equals(first) && words[i + 1].equals(second)) {
result.add(words[i + 2]); // Crash when i = length - 2 or length - 1
}
}
// ✅ CORRECT
for (int i = 0; i < words.length - 2; i++) {
// ...
}
Why it fails: When i = words.length - 1, accessing words[i + 1] throws an exception. When i = words.length - 2, accessing words[i + 2] throws an exception.
2. Using == Instead of .equals() for Strings
// ❌ WRONG - compares object references, not content
if (words[i] == first && words[i + 1] == second) {
// ...
}
// ✅ CORRECT - compares string content
if (words[i].equals(first) && words[i + 1].equals(second)) {
// ...
}
In Java, == checks if two variables reference the same object in memory. For strings, we almost always want .equals(), which compares the actual character sequences. I can spot candidates who've really practiced this problem versus those who memorized a solution. The difference? They can explain why we use .equals() without hesitation.
3. Forgetting to Handle Empty Results
// ❌ WRONG - returns null instead of empty array
if (result.isEmpty()) {
return null;
}
// ✅ CORRECT - returns empty array
return result.toArray(new String[result.size()]);
// This naturally returns String[0] if result is empty
The problem specification expects an array, not null. An empty array [] correctly represents "no matches found."
4. Incorrect toArray() Usage
// ❌ WRONG - returns Object[] instead of String[]
return result.toArray();
// ✅ CORRECT - explicitly specifies String[] type
return result.toArray(new String[result.size()]);
Without the type parameter, toArray() returns Object[], which won't compile when the method signature expects String[].
5. Not Considering Duplicate Results
Some candidates try to use a HashSet thinking duplicates should be removed. But the problem explicitly wants all occurrences, including duplicates:
"we will we will rock you" → ["we", "rock"]
// "we" appears once in results because "we will" occurs twice
Interview Tips
When solving this problem in an interview:
1. Start with Clarifying Questions
- "Should the result preserve the order of occurrences?" (Yes)
- "Can the same word appear multiple times if the pattern repeats?" (Yes)
- "What should I return if there are no matches?" (Empty array)
- "Are the words guaranteed to be lowercase?" (Check constraints)
- "Could
firstandsecondbe the same word?" (Not explicitly forbidden)
💡 Pro tip: Always clarify the boundary conditions before coding. It shows you're thinking about edge cases, and you might learn something that simplifies your solution.
2. Explain Your Approach
"I'll split the text into words, then use a sliding window of three words. At each position, I'll check if the first two words match our target bigram, and if so, capture the third word. This gives us O(n) time complexity."
3. Walk Through an Example
Always trace through a simple example on paper or whiteboard:
"a b c d" with first="a", second="b"
↓
["a", "b", "c", "d"]
i=0: a, b, c → MATCH → capture "c"
i=1: b, c, d → no match
Result: ["c"]
4. Discuss Edge Cases
- "If the bigram appears at the end, there's no third word to capture, which is why I loop to
length - 2" - "If the text is shorter than three words, the loop won't execute and we'll return an empty array"
5. Mention Possible Follow-Ups
Show breadth by suggesting extensions:
- "We could generalize this to n-grams (sequences of n words)"
- "For very large texts, we might want to use a streaming approach"
- "If we needed to search for many bigrams, we could preprocess with a HashMap"
Real-World Applications
You might wonder when you'd actually need this algorithm outside of interviews. Here are scenarios where bigram pattern matching shows up:
Log File Analysis: Your production logs show "error connecting to database connection timeout" repeatedly. You need to find what happens immediately after "connection timeout". Does it retry? Does it fail silently? Bigram matching gives you that answer.
Chatbot Context: When a user says "I want to order a pizza," the system needs to understand intent. By finding words that follow "order a", you can predict likely completions: "pizza", "burger", "salad". This powers contextual suggestions.
Code Analysis: Static analysis tools search for patterns like "static void" to find all static methods in a codebase. The word that follows tells you the method name.
Text Prediction: Your phone keyboard uses bigrams constantly. Type "thank you" and watch: it suggests "for", "very", or "so" based on millions of examples of what people typically write after "thank you".
Key Takeaways
- Split the text into an array of words for O(1) random access, then slide a three-word window across it checking for bigram matches at each position.
- The loop bound
i < words.length - 2is the critical detail; it guarantees safe access towords[i],words[i+1], andwords[i+2]without index-out-of-bounds errors. - Use
.equals()(not==) for string comparison in Java;==compares object references while.equals()compares actual content. - Collect all matches including duplicates in an ArrayList, then convert to an array; do not use a HashSet because the problem requires preserving all occurrences.
- This three-word window pattern generalizes to n-gram matching, making it a foundational technique for text processing and NLP problems in 2026 coding interviews.
Practice Makes Perfect
Once you're comfortable with this problem, push yourself with these variations:
- N-gram extraction: Can you generalize this to find sequences of N consecutive words, not just two?
- Bigram frequency: Instead of finding what follows, count how often each bigram appears in the text.
- Trigram completion: Given two words, predict the most common third word based on historical patterns.
- Pattern matching with wildcards: Find bigrams where you only know one of the two words.
Each of these builds on the sliding window technique you've learned here. The pattern stays the same, only the details change.
Ready to level up your coding interview skills? This problem and over 1,500 others are waiting for you on Firecode, where 50,000+ engineers have successfully prepared for technical interviews at top companies. Whether you're targeting Google, Amazon, or any other tech company, consistent practice with problems like these will build the pattern recognition and problem-solving skills that interviewers look for.
Key Takeaway
💡 The sliding window technique transforms pattern matching from complex to elegant. Remember: split the input, slide a fixed-size window across it, and watch your boundary conditions. This pattern shows up in dozens of string and array problems - master it once, use it everywhere.
The next time you encounter a sequence-matching problem, think "three-word window" and you'll be halfway to the solution. Happy coding! 🔍
Solutions in Other Languages
The sliding window technique works identically across all languages. Below are clean, production-ready implementations in Python, JavaScript, C++, Go, and Scala.
Python Solution
Python's clean syntax makes the sliding window pattern especially readable:
from typing import List
class Solution:
def find_occurrences(self, text: str, first: str, second: str) -> List[str]:
# Split text into words
words = text.split()
# Initialize result list
result = []
# Sliding window: check each consecutive triple
for i in range(len(words) - 2):
if words[i] == first and words[i + 1] == second:
result.append(words[i + 2])
return result
Key Python patterns:
split()with no argument splits on whitespace- List comprehension alternative:
[words[i + 2] for i in range(len(words) - 2) if words[i] == first and words[i + 1] == second] - Type hints (
List[str]) for better code documentation
JavaScript Solution
JavaScript offers concise array methods for this pattern:
class Solution {
findOcurrences(text, first, second) {
// Split text into array of words
const words = text.split(' ');
// Initialize result array
const result = [];
// Sliding window: check each consecutive triple
for (let i = 0; i < words.length - 2; i++) {
if (words[i] === first && words[i + 1] === second) {
result.push(words[i + 2]);
}
}
return result;
}
}
Key JavaScript patterns:
- Use
===for strict equality (not==) constfor immutable references,letfor loop variables- Array
push()method for adding elements - Functional alternative:
words.slice(0, -2).filter((_, i) => words[i] === first && words[i + 1] === second).map((_, i) => words[i + 2])
C++ Solution
C++ requires explicit memory management but offers strong type safety:
#include <vector>
#include <string>
#include <sstream>
class Solution {
public:
std::vector<std::string> findOcurrences(std::string text, std::string first, std::string second) {
// Initialize result vector
std::vector<std::string> result;
// Split text into words using stringstream
std::vector<std::string> words;
std::istringstream iss(text);
std::string word;
while (iss >> word) {
words.push_back(word);
}
// Sliding window: check each consecutive triple
for (size_t i = 0; i + 2 < words.size(); ++i) {
if (words[i] == first && words[i+1] == second) {
result.push_back(words[i+2]);
}
}
return result;
}
};
Key C++ patterns:
std::istringstreamfor tokenizing strings by whitespacesize_tfor array indices (unsigned type)- Loop condition
i + 2 < words.size()avoids unsigned underflow push_back()for vector append operation
Go Solution
Go's simple syntax and built-in string utilities make this straightforward:
package solution
import "strings"
func (s *Solution) FindOcurrences(text string, first string, second string) []string {
// Split text into slice of words
words := strings.Split(text, " ")
// Initialize result slice
res := make([]string, 0)
// Sliding window: check each consecutive triple
for i := 0; i < len(words)-2; i++ {
if words[i] == first && words[i+1] == second {
res = append(res, words[i+2])
}
}
return res
}
Key Go patterns:
:=for short variable declaration with type inferencestrings.Split()from standard librarymake([]string, 0)creates empty slice with zero lengthappend()for adding elements to slices
Scala Solution
Scala combines functional and object-oriented paradigms:
class Solution {
def findOcurrences(text: String, first: String, second: String): Array[String] = {
// Split text into array of words
val words = text.split(" ")
// Initialize mutable result buffer
val result = scala.collection.mutable.ListBuffer[String]()
// Sliding window: check each consecutive triple
for (i <- 0 until words.length - 2) {
if (words(i) == first && words(i + 1) == second) {
result += words(i + 2)
}
}
// Convert to array for return type
result.toArray
}
}
Key Scala patterns:
valfor immutable values (likefinalin Java)untilfor exclusive range (0 until 5is 0,1,2,3,4)ListBufferfor efficient mutable collection building- Functional alternative:
words.sliding(3).collect { case Array(w1, w2, w3) if w1 == first && w2 == second => w3 }.toArray
All implementations share the same core algorithm:
- Split text into words/tokens
- Loop from
i=0tolength-2(three-word window) - Check if
words[i]andwords[i+1]match the bigram - If match, capture
words[i+2] - Return collected results
The key difference is language-specific idioms for string splitting, array/list operations, and type systems. Choose the language that best fits your project's requirements!