Find the first occurrence of a substring

AS
Ackshaey Singh
Difficulty Medium
Time Complexity
O(n * m)
Space Complexity
O(1)
Two pointers String matching String
Google Uber Adobe Yahoo Amazon Warnermedia Bloomberg Meta

You're in a Google phone screen and the interviewer asks, "Given two strings, find where the first one appears inside the second." It sounds almost too simple, but this problem has a surprising amount of depth. It tests your ability to handle string indices carefully, think about edge cases, and reason about loop boundaries. This problem is also known as "Find the Index of the First Occurrence in a String" on other platforms, and it comes up frequently at companies like Google, Uber, Adobe, and Amazon.

TL;DR

Slide a window of length m (needle length) across the haystack string, comparing the substring at each position to needle. If a match is found, return the starting index. If no position matches, return -1. This runs in O(n * m) time and O(1) space, where n is the haystack length and m is the needle length. The key detail is setting the correct loop boundary: iterate only while ihaystackLength - needleLength, since any position beyond that cannot contain a full match.

Why This Problem Matters

Substring search is one of the fundamental operations in computer science. Every text editor's "Find" feature, every search engine's query matcher, and every compiler's lexer relies on some form of pattern matching. While production systems use optimized algorithms like KMP or Rabin-Karp, the brute force approach taught here is the foundation they all build upon. Nail this, and you have a solid base for understanding more advanced string algorithms.

Understanding the Problem

Given two strings, needle and haystack, find the index where needle first appears in haystack. If needle is not present, return -1.

strStr("haystack", "stack") => 3
strStr("hello", "ll")       => 2
strStr("aaaaa", "bba")      => -1

Constraints:

  • Both strings contain only lowercase English letters
  • Lengths are between 1 and 10,000

Edge Cases

  1. Empty needle: Return 0 (the empty string is found at the start of any string)
  2. Needle longer than haystack: Return -1 immediately
  3. Exact match: needle == haystack returns 0
  4. Needle at the end: strStr("haystack", "ack") returns 5
  5. Partial matches before the real one: strStr("mississippi", "issip") returns 4

That last case is particularly tricky. The substring "issi" appears starting at index 1, but "issip" does not match there because the fifth character is 's', not 'p'. The real match starts at index 4.

Solution Approach

The strategy is straightforward: slide a comparison window across the haystack and check if the substring at each position matches the needle.

For each valid starting position i from 0 to haystackLength - needleLength:

  1. Extract the substring haystack[i..i+needleLength]
  2. Compare it to needle
  3. If they match, return i
  4. If no position produces a match, return -1

Walkthrough: Finding "ll" in "hello"

The needle has length 2, so we check positions 0 through 3 (since 5 - 2 = 3).

Loading visualization...

At position 0, we compare "he" against "ll". No match. At position 1, "el" versus "ll". Still no match. At position 2, "ll" versus "ll". That matches, so we return 2.

Walkthrough: "bba" Not Found in "aaaaa"

When the needle does not exist in the haystack, we exhaust all valid positions and return -1.

Loading visualization...

Every window of length 3 in "aaaaa" contains only 'a' characters, so "bba" never matches.

Handling Partial Matches

Finding "issip" in "mississippi" is a good test of correctness. Several positions produce partial matches where the first few characters align but the full substring does not.

Loading visualization...

The algorithm does not need special logic for partial matches. It simply compares the full substring at each position. If the comparison fails, it moves on to the next starting index.

Implementation

Prefer a different language? Jump to solutions in other languages.

public class Solution {
  public int strStr(String haystack, String needle) {
    // An empty needle is trivially found at index 0
    if (needle.isEmpty()) {
      return 0;
    }

    int haystackLength = haystack.length();
    int needleLength = needle.length();

    // Check each valid starting position
    for (int i = 0; i <= haystackLength - needleLength; i++) {
      // Compare the substring starting at i with needle
      if (haystack.substring(i, i + needleLength).equals(needle)) {
        return i;
      }
    }

    // No match found
    return -1;
  }
}

The loop condition i <= haystackLength - needleLength is the most important detail. If the remaining characters in haystack are fewer than needle's length, there is no point checking. This bounds the loop correctly and prevents index-out-of-bounds errors.

Why Not Use Built-in Methods?

In a real codebase, you would use haystack.indexOf(needle) in Java or haystack.find(needle) in Python. But in an interview, the whole point is to demonstrate that you understand how substring search works under the hood. Using built-in methods defeats the purpose of the question.

Complexity Analysis

Time Complexity: O(n * m)

For each of the n - m + 1 starting positions, we compare up to m characters. The worst case occurs with inputs like haystack = "aaaaaa" and needle = "aab", where every position requires comparing almost all characters of needle before the mismatch at the end.

In practice, mismatches tend to occur on the first or second character, so average performance is usually closer to O(n). But the worst case is O(n * m), and that is what you should report in an interview.

Space Complexity: O(1)

We only use a few integer variables. In Java, substring() does create a new String object of length m, but no auxiliary data structures are needed. The algorithm is an in-place scan.

Can We Do Better?

Yes. The KMP (Knuth-Morris-Pratt) algorithm preprocesses the needle into a failure function that allows it to skip redundant comparisons, bringing the time down to O(n + m). Rabin-Karp uses rolling hash functions for O(n) average-case performance. For most interviews, the brute force O(n * m) solution is expected first. Mention KMP or Rabin-Karp as follow-ups to show depth of knowledge.

Common Pitfalls

  1. Off-by-one in the loop bound: Using i < haystackLength - needleLength instead of i <= haystackLength - needleLength misses the case where needle appears at the very end. For strStr("abc", "c"), the correct answer is 2, which requires checking position 3 - 1 = 2.

  2. Forgetting the empty needle case: The expected behavior is to return 0 for an empty needle. Some candidates skip this check and get incorrect results or exceptions.

  3. Integer overflow on length subtraction: In C++ with size_t (unsigned), if haystack is shorter than needle, then haystack.size() - needle.size() wraps around to a huge positive number. Adding a guard if (haystack.size() < needle.size()) return -1; before the loop prevents this.

  4. Using == instead of .equals() in Java: The == operator compares object references, not string content. Always use .equals() for string comparison.

Interview Tips

When this problem comes up in an interview:

  1. Start with examples: Write out strStr("hello", "ll") and trace through it by hand. This helps you catch boundary issues before coding.

  2. State the loop bound explicitly: Say "I iterate from 0 to haystackLength - needleLength inclusive, because any later start position can't fit the full needle." This shows you have thought about correctness.

  3. Handle edge cases first: Check for empty needle and needle longer than haystack before entering the loop.

  4. Mention advanced algorithms: After presenting the brute force solution, briefly mention KMP or Rabin-Karp. You do not need to implement them unless asked, but knowing they exist demonstrates breadth.

  5. Discuss the complexity honestly: Do not claim O(n) for brute force. Acknowledge the O(n * m) worst case and explain why average performance is typically better.

Key Takeaways

  • The brute force substring search slides a window of length m across the haystack, comparing at each position. The loop bound ihaystackLength - needleLength is critical for correctness.
  • Time complexity is O(n * m) worst case, but average performance is often closer to O(n) because mismatches tend to occur early in the comparison.
  • Always handle the empty needle edge case (return 0) and guard against needle being longer than haystack (return -1).
  • In C++, watch out for unsigned integer underflow when computing haystack.size() - needle.size(). Add a length check before the loop.
  • For follow-up discussion, KMP achieves O(n + m) by preprocessing the pattern, and Rabin-Karp uses rolling hashes for O(n) average time. Mention these to show depth.

Related Problems and Practice

Once you are comfortable with basic substring search, try these related problems to strengthen your string matching skills:

  • Longest Substring Without Repeated Characters (sliding window variant)
  • Shared Prefix Finder (comparing multiple strings simultaneously)
  • Pattern-Driven String Matcher (regex-style matching)

This problem and hundreds of others are available on Firecode, where over 50,000 engineers have prepared for technical interviews at top companies. Consistent practice with problems like this builds the pattern recognition that makes interview day feel routine.

Solutions in Other Languages

Python

class Solution:
    def str_str(self, haystack: str, needle: str) -> int:
        if not needle:
            return 0

        needle_length = len(needle)
        haystack_length = len(haystack)

        for i in range(haystack_length - needle_length + 1):
            if haystack[i:i + needle_length] == needle:
                return i

        return -1

JavaScript

class Solution {
  strStr(haystack, needle) {
    if (needle === "") return 0;

    const needleLength = needle.length;
    const haystackLength = haystack.length;

    for (let i = 0; i <= haystackLength - needleLength; i++) {
      if (haystack.substring(i, i + needleLength) === needle) {
        return i;
      }
    }

    return -1;
  }
}

TypeScript

class Solution {
  strStr(haystack: string, needle: string): number {
    if (needle === "") return 0;

    const needleLength = needle.length;
    const haystackLength = haystack.length;

    for (let i = 0; i <= haystackLength - needleLength; i++) {
      if (haystack.substring(i, i + needleLength) === needle) {
        return i;
      }
    }

    return -1;
  }
}

C++

#include <string>

class Solution {
public:
  int strStr(std::string haystack, std::string needle) {
    if (needle.empty()) return 0;
    if (haystack.size() < needle.size()) return -1;

    for (size_t i = 0; i <= haystack.size() - needle.size(); ++i) {
      if (haystack.substr(i, needle.size()) == needle) {
        return i;
      }
    }

    return -1;
  }
};

Note the guard if (haystack.size() < needle.size()) return -1; before the loop. Since size() returns an unsigned type, subtracting a larger value from a smaller one wraps around instead of going negative.

Go

func strStr(haystack string, needle string) int {
    if len(needle) == 0 {
        return 0
    }
    if len(haystack) < len(needle) {
        return -1
    }

    for i := 0; i <= len(haystack)-len(needle); i++ {
        if haystack[i:i+len(needle)] == needle {
            return i
        }
    }

    return -1
}

Scala

class Solution {
  def strStr(haystack: String, needle: String): Int = {
    if (needle.isEmpty) return 0

    val haystackLength = haystack.length
    val needleLength = needle.length

    for (i <- 0 to (haystackLength - needleLength)) {
      if (haystack.substring(i, i + needleLength) == needle) {
        return i
      }
    }

    -1
  }
}

Kotlin

class Solution {
  fun strStr(haystack: String, needle: String): Int {
    if (needle.isEmpty()) return 0

    val haystackLength = haystack.length
    val needleLength = needle.length

    for (i in 0..(haystackLength - needleLength)) {
      if (haystack.substring(i, i + needleLength) == needle) {
        return i
      }
    }

    return -1
  }
}

Swift

class Solution {
    func strStr(_ haystack: String, _ needle: String) -> Int {
        if needle.isEmpty { return 0 }

        let haystackLength = haystack.count
        let needleLength = needle.count

        if needleLength > haystackLength { return -1 }

        for i in 0...(haystackLength - needleLength) {
            let startIdx = haystack.index(haystack.startIndex, offsetBy: i)
            let endIdx = haystack.index(startIdx, offsetBy: needleLength)
            if String(haystack[startIdx..<endIdx]) == needle {
                return i
            }
        }

        return -1
    }
}

Swift's string indexing uses String.Index rather than integers, which makes the code more verbose. The logic is identical to the other implementations.

Ruby

class Solution
  def str_str(haystack, needle)
    return 0 if needle.empty?

    haystack_length = haystack.length
    needle_length = needle.length

    (0..haystack_length - needle_length).each do |i|
      if haystack[i, needle_length] == needle
        return i
      end
    end

    -1
  end
end

Rust

pub fn str_str(haystack: String, needle: String) -> i32 {
    if needle.is_empty() {
        return 0;
    }

    let haystack_length = haystack.len();
    let needle_length = needle.len();

    for i in 0..=haystack_length - needle_length {
        if &haystack[i..i + needle_length] == needle.as_str() {
            return i as i32;
        }
    }

    -1
}

C#

public class Solution {
    public int strStr(string haystack, string needle) {
        if (needle.Length == 0) return 0;

        var haystackLength = haystack.Length;
        var needleLength = needle.Length;

        for (int i = 0; i <= haystackLength - needleLength; i++) {
            if (haystack.Substring(i, needleLength) == needle) {
                return i;
            }
        }

        return -1;
    }
}

Dart

class Solution {
  int strStr(String haystack, String needle) {
    if (needle.isEmpty) return 0;

    int haystackLength = haystack.length;
    int needleLength = needle.length;

    for (int i = 0; i <= haystackLength - needleLength; i++) {
      if (haystack.substring(i, i + needleLength) == needle) {
        return i;
      }
    }

    return -1;
  }
}

PHP

class Solution {
    public function strStr(string $haystack, string $needle): int {
        if (strlen($needle) === 0) return 0;

        $haystackLength = strlen($haystack);
        $needleLength = strlen($needle);

        for ($i = 0; $i <= $haystackLength - $needleLength; $i++) {
            if (substr($haystack, $i, $needleLength) === $needle) {
                return $i;
            }
        }

        return -1;
    }
}

Frequently Asked Questions

What is the time complexity of the brute force substring search?
The brute force approach runs in O(n * m) time where n is the length of haystack and m is the length of needle. For each of the n - m + 1 starting positions, we compare up to m characters. In practice, mismatches tend to occur early, so average-case performance is often much better than the worst case.
What is the space complexity of the sliding window substring search?
The approach uses O(1) extra space because we only store a few integer variables for indices and lengths. In languages where substring creates a new string (like Java), substring comparison itself may use O(m) space temporarily, but no auxiliary data structures are needed.
How does this problem differ from the KMP or Rabin-Karp algorithms?
The brute force approach re-examines characters after a mismatch, leading to O(n * m) worst-case time. KMP preprocesses the pattern into a failure function to skip redundant comparisons, achieving O(n + m) time. Rabin-Karp uses rolling hash functions to compare substrings in O(1) average time per position. For interview purposes, the brute force solution is usually sufficient unless the interviewer asks for optimization.
What happens when needle is an empty string?
By convention, an empty needle is found at index 0. This matches the behavior of Java's indexOf, Python's find, and most standard library implementations. The empty string is trivially a prefix of any string.
Why do we only iterate up to haystack.length - needle.length?
Once the remaining characters in haystack are fewer than the length of needle, a match is impossible. Iterating beyond that point would either produce index-out-of-bounds errors or waste comparisons. The upper bound i = haystackLength - needleLength ensures every comparison window fits entirely within haystack.
Can this problem be solved with two pointers?
Yes, but the approach is essentially equivalent to the sliding window. One pointer tracks the position in haystack, and you compare characters one by one against needle. On a mismatch, you reset the needle pointer and advance the haystack pointer. The brute force substring comparison is cleaner for interviews because it expresses the same logic with less bookkeeping.
What are common edge cases to test for substring search?
Key edge cases include: empty needle (return 0), needle longer than haystack (return -1), needle equals haystack (return 0), needle at the very end of haystack, single-character strings, and repeated patterns where partial matches occur before the real match (like finding 'issip' in 'mississippi').
How often does this problem appear in coding interviews?
This is one of the most frequently asked string problems, especially at Google, Uber, Adobe, and Amazon. It often appears as a warm-up question or as part of a larger string processing problem. Interviewers typically expect the O(n * m) solution first and may follow up by asking about KMP or Rabin-Karp.