Sentence Formation with Word Dictionary
You're in a technical interview at Microsoft. The interviewer gives you a string and a dictionary of words, then asks: "Find every valid sentence you can form from this string." This is Sentence Formation with Word Dictionary, also known as Word Break II on platforms like LeetCode and NeetCode. It looks manageable at first, but the search space can grow exponentially, and a brute-force approach will time out on anything beyond trivial inputs. The trick is combining backtracking with memoization to explore all possibilities without redundant work.
TL;DR
Given a string s and a dictionary of words, find all ways to segment s into space-separated dictionary words. Use backtracking to try each dictionary word as a prefix, recurse on the remaining substring, and combine the results into complete sentences. Memoization caches results for previously seen substrings so you never solve the same subproblem twice. The worst-case time complexity is O(2^n) because the number of valid segmentations can itself be exponential. Space complexity is O(n * m) where n is the string length and m is the number of valid sentences.
Why This Problem Matters
Word Break II is a hard problem that regularly appears at Microsoft, Amazon, TikTok, Bloomberg, and Meta. It tests your ability to enumerate all valid solutions (not just check existence or find one), handle recursive decomposition with memoization, and reason about exponential search spaces. If you can solve this cleanly, you've demonstrated a level of comfort with backtracking and caching that transfers directly to other combinatorial problems.
Understanding the Problem
Given a string s and a list of dictionary words, insert spaces into s to form sentences where every word belongs to the dictionary. Return all valid sentences. Words can be reused.
Here's the setup for our first example: s = "catsanddog" with dictionary ["cat", "cats", "and", "sand", "dog"].
From this dictionary, we can segment "catsanddog" in two ways:
- "cat sand dog" - splitting as cat|sand|dog
- "cats and dog" - splitting as cats|and|dog
Notice how the same characters participate in different word boundaries. "cats" overlaps with "cat" + "s", and "sand" overlaps with "s" + "and". This ambiguity is what makes the problem interesting.
A second example: s = "pineapplepenapple" with dictionary ["apple", "pen", "applepen", "pine", "pineapple"] produces three valid sentences:
- "pine apple pen apple"
- "pineapple pen apple"
- "pine applepen apple"
And when no valid segmentation exists? wordBreak("catsandog", ["cats", "dog", "sand", "and", "cat"]) returns an empty list because "og" at the end doesn't match any dictionary word.
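All three examples can be reproduced with a minimal Python sketch of the approach developed below (index-based backtracking with a memo dict, mirroring the Python solution later in this article):

```python
def word_break(s, word_dict):
    """Return every sentence formed by splitting s into dictionary words."""
    word_set = set(word_dict)   # O(1) membership checks
    memo = {}                   # start index -> list of sentences for s[start:]

    def backtrack(start):
        if start == len(s):
            return [""]         # consumed the whole input: valid segmentation
        if start in memo:
            return memo[start]
        sentences = []
        for end in range(start + 1, len(s) + 1):
            word = s[start:end]
            if word in word_set:
                for rest in backtrack(end):
                    sentences.append(word if not rest else word + " " + rest)
        memo[start] = sentences
        return sentences

    return backtrack(0)

print(sorted(word_break("catsanddog", ["cat", "cats", "and", "sand", "dog"])))
# ['cat sand dog', 'cats and dog']
print(word_break("catsandog", ["cats", "dog", "sand", "and", "cat"]))
# []
```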
Building Intuition
Thinking Recursively
Consider what happens when you look at the start of "catsanddog". Two dictionary words match as prefixes: "cat" and "cats". Each match gives you a different remaining string to segment. If you keep matching prefixes recursively, you build a tree of possibilities.
The left branch matches "cat" first, leaving "sanddog". From there, "sand" matches, leaving "dog". Finally "dog" matches, and the empty string signals a complete segmentation. The right branch matches "cats" first, leaving "anddog". Then "and" matches, leaving "dog", which was already computed by the left branch. That's where memoization kicks in.
Assembling Sentences Bottom-Up
Each recursive call returns a list of sentences that can be formed from the remaining string. The current word is prepended to each of those sentences.
The base case is the empty string, which returns [""]. This signals "I've used up the whole input, and the segmentation is valid." The word "dog" combines with "" to produce ["dog"]. Then "sand" combines with ["dog"] to produce ["sand dog"]. And "cat" combines with ["sand dog"] to produce ["cat sand dog"].
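This combining step can be isolated into a few lines (a sketch; `combine` is a hypothetical helper, not part of the final solution):

```python
def combine(word, sub_sentences):
    """Prepend word to each sentence from the remainder,
    adding a space only when the sub-sentence is non-empty."""
    return [word if not sub else word + " " + sub for sub in sub_sentences]

# Walking the chain from the base case upward:
print(combine("dog", [""]))           # ['dog']
print(combine("sand", ["dog"]))       # ['sand dog']
print(combine("cat", ["sand dog"]))   # ['cat sand dog']
```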
The Role of Memoization
Why Cache Results?
Without memoization, the algorithm recomputes results for the same substring every time it's encountered. In our "catsanddog" example, both branches eventually need to segment "dog". With memoization, "dog" is solved once and the result is stored.
After fully processing "catsanddog", the memo cache holds these entries:
- "dog" maps to ["dog"]
- "sanddog" maps to ["sand dog"]
- "anddog" maps to ["and dog"]
- "catsanddog" maps to ["cat sand dog", "cats and dog"]
Each entry maps a substring to its list of valid sentences. When the right branch reaches "dog", it finds the cached result immediately instead of recursing again.
Where Memoization Really Pays Off
The savings become dramatic with longer strings. In "pineapplepenapple", the substring "apple" appears as a subproblem multiple times (after matching "pine" + "apple..." and again after matching other combinations). The substring "penapple" is also reused.
Without caching, each of these subproblems would trigger its own recursive tree. With caching, they're computed once and returned instantly on subsequent lookups. For strings with many overlapping word boundaries, this turns an impractical algorithm into a fast one.
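To see the savings concretely, one can instrument the recursion with a call counter and toggle the cache on and off (a measurement sketch, not part of the interview solution):

```python
def count_calls(s, word_dict, use_memo):
    """Count recursive invocations needed to segment s, with or without memoization."""
    word_set = set(word_dict)
    memo = {}
    calls = 0

    def backtrack(start):
        nonlocal calls
        calls += 1
        if start == len(s):
            return [""]
        if use_memo and start in memo:
            return memo[start]
        sentences = []
        for end in range(start + 1, len(s) + 1):
            word = s[start:end]
            if word in word_set:
                for rest in backtrack(end):
                    sentences.append(word if not rest else word + " " + rest)
        if use_memo:
            memo[start] = sentences
        return sentences

    backtrack(0)
    return calls

s = "pineapplepenapple"
d = ["apple", "pen", "applepen", "pine", "pineapple"]
# The memoized run makes strictly fewer recursive calls than the uncached one.
print(count_calls(s, d, use_memo=True) < count_calls(s, d, use_memo=False))  # True
```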
The Algorithm at a Glance
- Build a HashSet from the dictionary for O(1) word lookup
- Try each dictionary word as a prefix of the current string
- Recurse on the remaining substring after removing the matched prefix
- Combine the matched word with each sentence returned by the recursive call
- Memoize the results before returning, so future calls with the same substring skip recomputation
Implementation
Prefer a different language? Jump to solutions in other languages.
import java.util.*;

public class Solution {
    public List<String> wordBreak(String s, List<String> wordDict) {
        Set<String> wordSet = new HashSet<>(wordDict);
        Map<String, List<String>> memo = new HashMap<>();
        return backtrack(s, wordSet, memo);
    }

    private List<String> backtrack(String s, Set<String> wordSet,
                                   Map<String, List<String>> memo) {
        if (memo.containsKey(s)) {
            return memo.get(s);
        }
        List<String> result = new ArrayList<>();
        if (s.isEmpty()) {
            result.add("");
            return result;
        }
        for (String word : wordSet) {
            if (s.startsWith(word)) {
                List<String> sublist = backtrack(s.substring(word.length()), wordSet, memo);
                for (String sub : sublist) {
                    result.add(word + (sub.isEmpty() ? "" : " ") + sub);
                }
            }
        }
        memo.put(s, result);
        return result;
    }
}
The wordBreak method converts the dictionary to a HashSet for fast lookups, initializes an empty memo HashMap, and kicks off the recursion. The backtrack method first checks the memo, then handles the empty-string base case, then iterates over every word in the set. For each word that matches as a prefix, it recurses on the remainder and combines results. Before returning, it stores the computed sentences in the memo.
The ternary sub.isEmpty() ? "" : " " handles the spacing: when the sub-sentence is empty (meaning the matched word consumed the entire remaining string), no trailing space is added.
Complexity Analysis
Time Complexity: O(2^n)
The worst-case time is exponential because the number of valid segmentations can be exponential. Consider s = "aaaa" with dict = ["a", "aa", "aaa", "aaaa"]. The string can be partitioned as "a a a a", "aa a a", "a aa a", "a a aa", "aa aa", "aaa a", "a aaa", and "aaaa" - 8 ways in total. With every prefix in the dictionary, each extra character doubles the count (2^(n-1) compositions); with a dictionary like ["a", "aa"], the count follows the Fibonacci sequence.
Memoization eliminates redundant subproblem computation but cannot reduce the total time below the output size. If there are m valid sentences, you must at minimum enumerate all of them.
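The growth is easy to check with a counting variant that tallies segmentations without materializing them (a bottom-up sketch for verification, not the interview solution itself):

```python
def count_segmentations(s, word_dict):
    """Bottom-up count of valid segmentations of s (the output size m)."""
    words = set(word_dict)
    n = len(s)
    ways = [0] * (n + 1)
    ways[n] = 1  # empty suffix: exactly one way to finish a sentence
    for start in range(n - 1, -1, -1):
        for end in range(start + 1, n + 1):
            if s[start:end] in words:
                ways[start] += ways[end]
    return ways[0]

print(count_segmentations("aaaa", ["a", "aa", "aaa", "aaaa"]))            # 8
print(count_segmentations("aaaaa", ["a", "aa", "aaa", "aaaa", "aaaaa"]))  # 16
```

The count doubles with each extra character when every prefix is a dictionary word, confirming that no algorithm can enumerate all sentences in sub-exponential time.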
Space Complexity: O(n * m)
The memo stores results for up to n unique substrings, where n is the string length. Each entry can hold a list of partial sentences. The total stored sentences across all memo entries can be proportional to m (the number of complete sentences), and each sentence can be up to n characters long. The recursion stack adds O(n) in the worst case but is dominated by the memo storage.
Edge Cases
No Valid Segmentation
When no combination of dictionary words covers the entire string, every recursive path hits a dead end.
In "catsandog", both "cat" + "sandog" and "cats" + "andog" eventually reach "og", which doesn't match any dictionary word. The algorithm returns an empty list. Memoization caches these empty results too, preventing repeated exploration of the same dead ends.
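A small sketch that exposes the memo (the `word_break_traced` helper is illustrative, not part of the solution) shows the dead end cached at the index where "og" begins:

```python
def word_break_traced(s, word_dict):
    """Like the index-based solution, but also returns the memo for inspection."""
    word_set = set(word_dict)
    memo = {}

    def backtrack(start):
        if start == len(s):
            return [""]
        if start in memo:
            return memo[start]
        sentences = []
        for end in range(start + 1, len(s) + 1):
            word = s[start:end]
            if word in word_set:
                for rest in backtrack(end):
                    sentences.append(word if not rest else word + " " + rest)
        memo[start] = sentences
        return sentences

    return backtrack(0), memo

result, memo = word_break_traced("catsandog", ["cats", "dog", "sand", "and", "cat"])
print(result)   # []
print(memo[7])  # [] -- "og" is cached as a dead end and never re-explored
```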
Other Edge Cases
- Single-word match: s = "firecode" with dict = ["fire", "code", "firecode"] returns both ["fire code", "firecode"]
- Repeated characters: s = "aaaaaaa" with dict = ["aaaa", "aaa"] has overlapping segmentations at multiple boundaries
- Single character: s = "a" with dict = ["a"] returns ["a"]
- Dictionary word reuse: Words can appear multiple times in a sentence, like "apple" appearing twice in "pine apple pen apple"
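These edge cases can be verified with the same index-based sketch used earlier (repeated here so the snippet is self-contained):

```python
def word_break(s, word_dict):
    """Return every sentence formed by splitting s into dictionary words."""
    word_set = set(word_dict)
    memo = {}

    def backtrack(start):
        if start == len(s):
            return [""]
        if start in memo:
            return memo[start]
        sentences = []
        for end in range(start + 1, len(s) + 1):
            word = s[start:end]
            if word in word_set:
                for rest in backtrack(end):
                    sentences.append(word if not rest else word + " " + rest)
        memo[start] = sentences
        return sentences

    return backtrack(0)

print(sorted(word_break("firecode", ["fire", "code", "firecode"])))  # ['fire code', 'firecode']
print(word_break("a", ["a"]))                                        # ['a']
print(sorted(word_break("aaaaaaa", ["aaaa", "aaa"])))                # ['aaa aaaa', 'aaaa aaa']
```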
Common Pitfalls
- Forgetting memoization: Pure backtracking without caching will time out on inputs where subproblems overlap. This is the single most common mistake.
- Using a List instead of a Set for the dictionary: Linear search through the dictionary at every recursion level multiplies the running time by the dictionary size. Always convert to a HashSet first.
- Off-by-one in space handling: The sentence combination step needs to conditionally add a space. Forgetting the empty-string check produces sentences with trailing spaces.
- Returning the wrong base case: The base case should return [""] (a list with one empty string), not [] (an empty list). An empty list means "no valid segmentation," while [""] means "the empty string is a valid completion."
- Modifying shared state: If your language passes collections by reference, be careful not to mutate a list that's already stored in the memo.
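The base-case pitfall is easy to demonstrate by parameterizing a sketch on what the empty-input case returns (the `base_case` parameter exists only for this demonstration):

```python
def word_break(s, word_dict, base_case):
    """Sketch where the empty-input base case is injectable, to show the bug."""
    word_set = set(word_dict)
    memo = {}

    def backtrack(start):
        if start == len(s):
            return list(base_case)   # correct: [""], buggy: []
        if start in memo:
            return memo[start]
        sentences = []
        for end in range(start + 1, len(s) + 1):
            word = s[start:end]
            if word in word_set:
                for rest in backtrack(end):
                    sentences.append(word if not rest else word + " " + rest)
        memo[start] = sentences
        return sentences

    return backtrack(0)

print(word_break("firecode", ["fire", "code"], [""]))  # ['fire code']
print(word_break("firecode", ["fire", "code"], []))    # [] -- every valid path silently dies
```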
Interview Tips
When you encounter this problem in an interview:
Start with Word Break I. Many interviewers will ask Word Break I first (just return true/false). Solve that cleanly with DP, then explain how Word Break II extends it to enumerate all solutions.
Explain the transition from DP to backtracking. Word Break I works with a simple boolean DP array. Word Break II needs to store actual sentences, which makes bottom-up DP awkward. Top-down backtracking with memoization is the natural fit here, and interviewers appreciate when you articulate why.
Talk through the memoization key. You can use either the remaining substring or the starting index as the memo key. Both work. Index-based memoization avoids creating substring copies in languages where that's expensive (like Java), while string-based memoization is more intuitive.
Mention the exponential output. Acknowledge that no algorithm can do better than exponential time in the worst case because the output itself can be that large. This shows you understand the fundamental limits of the problem, not just the implementation.
Key Takeaways
- Backtracking with memoization is the standard approach: try each dictionary word as a prefix, recurse on the remainder, and cache results to avoid redundant work.
- The base case [""] for the empty string is what signals a successful segmentation. Returning [] instead would silently break every valid path.
- Converting the dictionary to a HashSet is essential. Without O(1) lookups, the solution becomes impractical for large dictionaries.
- The worst-case time complexity is O(2^n) because the number of valid segmentations can be exponential. Memoization helps in practice but doesn't change the theoretical bound.
- The memo key can be either the remaining substring or the starting index. Both are valid; pick whichever feels natural in your language.
Practice on Firecode
Word Break II brings together recursion, memoization, and string manipulation into a single problem that's genuinely satisfying to solve. Once you've nailed the backtracking pattern here, you'll find it shows up in many other places: generating permutations, solving Sudoku, building expression evaluators.
This problem and hundreds of others are available on Firecode, where over 50,000 engineers have prepared for technical interviews and landed offers at top tech companies. If you're preparing for interviews at Microsoft, Amazon, or Meta, problems like this one are exactly the type of challenge you'll face.
Related problems to try next:
- Word Break I (check if a valid segmentation exists)
- Palindrome Partitioning (segment into palindromic substrings)
- Letter Combinations of a Phone Number (another backtracking enumeration problem)
Frequently Asked Questions
What is the difference between Word Break I and Word Break II?
What is the time complexity of Word Break II?
Why does memoization help if the worst-case time complexity is still exponential?
Can this problem be solved with bottom-up dynamic programming?
How does the HashSet improve the Word Break II solution?
What happens when no valid segmentation exists?
How does the base case work in the recursive approach?
What is the space complexity and what drives it?
How often does Word Break II appear in 2026 coding interviews?
What are real-world applications of word segmentation algorithms?
Solutions in Other Languages
Python
Python uses index-based memoization, iterating over possible end positions to extract substrings.
from typing import List

class Solution:
    def word_break(self, s: str, word_dict: List[str]) -> List[str]:
        word_set = set(word_dict)
        memo = {}

        def backtrack(start: int) -> List[str]:
            if start == len(s):
                return [""]
            if start in memo:
                return memo[start]
            sentences = []
            for end in range(start + 1, len(s) + 1):
                word = s[start:end]
                if word in word_set:
                    for subsentence in backtrack(end):
                        if subsentence:
                            sentences.append(word + " " + subsentence)
                        else:
                            sentences.append(word)
            memo[start] = sentences
            return sentences

        return backtrack(0)
JavaScript
class Solution {
    wordBreak(s, wordDict) {
        const wordSet = new Set(wordDict);
        const memo = new Map();
        const backtrack = (start) => {
            if (memo.has(start)) return memo.get(start);
            if (start === s.length) return [''];
            const sentences = [];
            for (let end = start + 1; end <= s.length; end++) {
                const word = s.substring(start, end);
                if (wordSet.has(word)) {
                    const restSentences = backtrack(end);
                    for (const sentence of restSentences) {
                        sentences.push(word + (sentence ? ' ' + sentence : ''));
                    }
                }
            }
            memo.set(start, sentences);
            return sentences;
        };
        return backtrack(0);
    }
}

module.exports = Solution;
TypeScript
class Solution {
    wordBreak(s: string, wordDict: string[]): string[] {
        const wordSet = new Set(wordDict);
        const memo = new Map<number, string[]>();
        const backtrack = (start: number): string[] => {
            if (memo.has(start)) return memo.get(start)!;
            if (start === s.length) return [''];
            const sentences: string[] = [];
            for (let end = start + 1; end <= s.length; end++) {
                const word = s.substring(start, end);
                if (wordSet.has(word)) {
                    const restSentences = backtrack(end);
                    for (const sentence of restSentences) {
                        sentences.push(word + (sentence ? ' ' + sentence : ''));
                    }
                }
            }
            memo.set(start, sentences);
            return sentences;
        };
        return backtrack(0);
    }
}

export { Solution };
C++
The C++ solution uses string-based memoization and also checks if the entire string is a valid word.
#include <vector>
#include <string>
#include <unordered_set>
#include <unordered_map>
class Solution {
public:
    std::vector<std::string> wordBreak(std::string s,
                                       std::vector<std::string> &wordDict) {
        std::unordered_set<std::string> wordSet(wordDict.begin(), wordDict.end());
        std::unordered_map<std::string, std::vector<std::string>> memo;
        return backtrack(s, wordSet, memo);
    }

private:
    std::vector<std::string> backtrack(const std::string &s,
            const std::unordered_set<std::string> &wordSet,
            std::unordered_map<std::string, std::vector<std::string>> &memo) {
        if (memo.find(s) != memo.end()) {
            return memo[s];
        }
        std::vector<std::string> result;
        if (wordSet.find(s) != wordSet.end()) {
            result.push_back(s);
        }
        for (size_t i = 1; i < s.size(); ++i) {
            std::string left = s.substr(0, i);
            if (wordSet.find(left) != wordSet.end()) {
                std::string right = s.substr(i);
                std::vector<std::string> subResult = backtrack(right, wordSet, memo);
                for (const std::string &sub : subResult) {
                    result.push_back(left + " " + sub);
                }
            }
        }
        memo[s] = result;
        return result;
    }
};
Go
package solution

type Solution struct{}

func (sol *Solution) WordBreak(s string, wordDict []string) []string {
	wordSet := make(map[string]bool)
	for _, word := range wordDict {
		wordSet[word] = true
	}
	memo := make(map[string][]string)
	return wordBreakHelper(s, wordSet, memo)
}

func wordBreakHelper(s string, wordSet map[string]bool,
	memo map[string][]string) []string {
	if res, found := memo[s]; found {
		return res
	}
	if len(s) == 0 {
		return []string{""}
	}
	var result []string
	for i := 1; i <= len(s); i++ {
		prefix := s[:i]
		if wordSet[prefix] {
			suffix := s[i:]
			suffixWays := wordBreakHelper(suffix, wordSet, memo)
			for _, way := range suffixWays {
				if way == "" {
					result = append(result, prefix)
				} else {
					result = append(result, prefix+" "+way)
				}
			}
		}
	}
	memo[s] = result
	return result
}
Scala
class Solution {
  def wordBreak(s: String, wordDict: List[String]): List[String] = {
    import scala.collection.mutable
    val wordSet = wordDict.toSet
    val memo = mutable.Map[String, List[String]]()

    def backtrack(remaining: String): List[String] = {
      if (memo.contains(remaining)) return memo(remaining)
      if (remaining.isEmpty) return List("")
      val result = mutable.ListBuffer[String]()
      for (end <- 1 to remaining.length) {
        val word = remaining.substring(0, end)
        if (wordSet.contains(word)) {
          val subSentences = backtrack(remaining.substring(end))
          for (subSentence <- subSentences) {
            val space = if (subSentence.isEmpty) "" else " "
            result += word + space + subSentence
          }
        }
      }
      memo(remaining) = result.toList
      result.toList
    }

    backtrack(s)
  }
}
Kotlin
class Solution {
    fun wordBreak(s: String, wordDict: List<String>): List<String> {
        val wordSet = wordDict.toSet()
        val memo = mutableMapOf<String, List<String>>()
        return backtrack(s, wordSet, memo)
    }

    private fun backtrack(s: String, wordSet: Set<String>,
                          memo: MutableMap<String, List<String>>): List<String> {
        if (s in memo) return memo[s]!!
        val result = mutableListOf<String>()
        if (s.isEmpty()) {
            result.add("")
            return result
        }
        for (word in wordSet) {
            if (s.startsWith(word)) {
                val sublist = backtrack(s.substring(word.length), wordSet, memo)
                for (sub in sublist) {
                    result.add(word + (if (sub.isEmpty()) "" else " ") + sub)
                }
            }
        }
        memo[s] = result
        return result
    }
}
Swift
import Foundation

class Solution {
    func wordBreak(_ s: String, _ wordDict: [String]) -> [String] {
        let wordSet = Set(wordDict)
        var memo = [String: [String]]()
        return backtrack(s, wordSet, &memo)
    }

    private func backtrack(_ s: String, _ wordSet: Set<String>,
                           _ memo: inout [String: [String]]) -> [String] {
        if let cached = memo[s] { return cached }
        if s.isEmpty { return [""] }
        var result = [String]()
        for word in wordSet {
            if s.hasPrefix(word) {
                let remaining = String(s.dropFirst(word.count))
                let sublist = backtrack(remaining, wordSet, &memo)
                for sub in sublist {
                    result.append(word + (sub.isEmpty ? "" : " ") + sub)
                }
            }
        }
        memo[s] = result
        return result
    }
}
Rust
use std::collections::{HashMap, HashSet};

pub struct Solution;

impl Solution {
    pub fn word_break(&self, s: String, word_dict: Vec<String>) -> Vec<String> {
        let word_set: HashSet<String> = word_dict.into_iter().collect();
        let mut memo: HashMap<String, Vec<String>> = HashMap::new();
        Self::backtrack(&s, &word_set, &mut memo)
    }

    fn backtrack(s: &str, word_set: &HashSet<String>,
                 memo: &mut HashMap<String, Vec<String>>) -> Vec<String> {
        if let Some(cached) = memo.get(s) {
            return cached.clone();
        }
        let mut result: Vec<String> = Vec::new();
        if s.is_empty() {
            result.push(String::new());
            return result;
        }
        for word in word_set {
            if s.starts_with(word.as_str()) {
                let sub_sentences = Self::backtrack(&s[word.len()..], word_set, memo);
                for sub in &sub_sentences {
                    if sub.is_empty() {
                        result.push(word.clone());
                    } else {
                        result.push(format!("{} {}", word, sub));
                    }
                }
            }
        }
        memo.insert(s.to_string(), result.clone());
        result
    }
}
C#
using System.Collections.Generic;

public class Solution {
    public List<string> WordBreak(string s, List<string> wordDict) {
        var wordSet = new HashSet<string>(wordDict);
        var memo = new Dictionary<string, List<string>>();
        return Backtrack(s, wordSet, memo);
    }

    private List<string> Backtrack(string s, HashSet<string> wordSet,
                                   Dictionary<string, List<string>> memo) {
        if (memo.ContainsKey(s)) return memo[s];
        var result = new List<string>();
        if (s.Length == 0) {
            result.Add("");
            return result;
        }
        foreach (var word in wordSet) {
            if (s.StartsWith(word)) {
                var sublist = Backtrack(s.Substring(word.Length), wordSet, memo);
                foreach (var sub in sublist) {
                    result.Add(word + (sub.Length == 0 ? "" : " ") + sub);
                }
            }
        }
        memo[s] = result;
        return result;
    }
}
Dart
class Solution {
  List<String> wordBreak(String s, List<String> wordDict) {
    Set<String> wordSet = wordDict.toSet();
    Map<String, List<String>> memo = {};
    return _backtrack(s, wordSet, memo);
  }

  List<String> _backtrack(String s, Set<String> wordSet,
      Map<String, List<String>> memo) {
    if (memo.containsKey(s)) return memo[s]!;
    List<String> result = [];
    if (s.isEmpty) {
      result.add("");
      return result;
    }
    for (String word in wordSet) {
      if (s.startsWith(word)) {
        List<String> sublist = _backtrack(
            s.substring(word.length), wordSet, memo);
        for (String sub in sublist) {
          result.add(word + (sub.isEmpty ? "" : " ") + sub);
        }
      }
    }
    memo[s] = result;
    return result;
  }
}
PHP
PHP uses index-based memoization with array_flip for O(1) dictionary lookups.
<?php

class Solution {
    public function wordBreak(string $s, array $wordDict): array {
        $wordSet = array_flip($wordDict);
        $memo = [];
        return $this->backtrack($s, $wordSet, 0, $memo);
    }

    private function backtrack(string $s, array $wordSet,
                               int $start, array &$memo): array {
        if (isset($memo[$start])) return $memo[$start];
        if ($start === strlen($s)) return [""];
        $sentences = [];
        for ($end = $start + 1; $end <= strlen($s); $end++) {
            $word = substr($s, $start, $end - $start);
            if (isset($wordSet[$word])) {
                $subSentences = $this->backtrack($s, $wordSet, $end, $memo);
                foreach ($subSentences as $subSentence) {
                    $sentences[] = $word . ($subSentence === "" ? "" : " " . $subSentence);
                }
            }
        }
        $memo[$start] = $sentences;
        return $sentences;
    }
}
Ruby
require 'set'

class Solution
  def word_break(s, word_dict)
    word_set = Set.new(word_dict)
    memo = {}
    backtrack(s, word_set, memo)
  end

  def backtrack(s, word_set, memo)
    return memo[s] if memo.key?(s)
    return [''] if s.empty?
    result = []
    word_set.each do |word|
      if s.start_with?(word)
        sublist = backtrack(s[word.length..-1], word_set, memo)
        sublist.each do |sub|
          result << (sub.empty? ? word : "#{word} #{sub}")
        end
      end
    end
    memo[s] = result
    result
  end
end