Anagram clustering challenge
Your Amazon interviewer slides a list of words across the table: "eat", "tea", "tan", "ate", "nat", "bat". "Group these so that anagrams sit together," they say. This problem, also known as Group Anagrams on other platforms, is one of the most frequently asked hash map questions in technical interviews. Companies from Yandex to Google to Apple rely on it to test whether you can design efficient lookup strategies with the right data structures.
TL;DR
Sort each string's characters to produce a canonical key. Two strings are anagrams if and only if they produce the same sorted key. Use a hash map where keys are sorted strings and values are lists of originals. Return all the lists. This runs in O(n * k * log(k)) time and O(n * k) space, where n is the number of strings and k is the maximum string length.
Why This Problem Matters
Group Anagrams sits at the intersection of hashing, string manipulation, and algorithmic design. It tests whether you can identify the right invariant (sorted characters) and use it as a hash key. The pattern of "transform input into a canonical form, then group by that form" appears in duplicate detection, data deduplication, and search indexing. Master this problem and you will have a reusable technique for any grouping task.
Understanding the Problem
Given an array of strings strs, group them so that all anagrams appear in the same sublist. Two strings are anagrams if one can be rearranged to form the other.
Example: ["eat","tea","tan","ate","nat","bat"] produces three groups:
["eat","tea","ate"](all rearrangements of the letters a, e, t)["tan","nat"](both contain a, n, t)["bat"](no other string shares b, a, t)
The order of groups and the order within each group do not matter.
Edge Cases
- Empty strings:
["",""]groups both into[["",""]]since empty is an anagram of empty. - Single characters:
["a","b","c"]produces three singleton groups. - All anagrams:
["abc","bca","cab"]produces one group with all three. - No anagrams: Each string is unique, so every group has one element.
The Key Insight: Canonical Form
The core observation is that two strings are anagrams when they contain the same characters in the same frequencies. We need a way to reduce each string to a single "fingerprint" that is identical for all anagrams.
Sorting the characters of a string does exactly this. "eat", "tea", and "ate" all sort to "aet":
Loading visualization...
This sorted string becomes the hash map key. Every string that produces the same sorted key belongs in the same group.
Walking Through the Algorithm
Let's trace the full example. For each string, we sort its characters and use the result as a map key:
Loading visualization...
After processing all six strings, the map contains three keys, each pointing to a list of anagrams:
Loading visualization...
We return the three lists: [["eat","tea","ate"], ["tan","nat"], ["bat"]].
Implementation
Prefer a different language? Jump to solutions in other languages.
import java.util.*;
public class Solution {
public List<List<String>> groupAnagrams(String[] strs) {
if (strs == null || strs.length == 0) {
return new ArrayList<>();
}
Map<String, List<String>> anagramMap = new HashMap<>();
for (String str : strs) {
char[] charArray = str.toCharArray();
Arrays.sort(charArray);
String sortedStr = new String(charArray);
if (!anagramMap.containsKey(sortedStr)) {
anagramMap.put(sortedStr, new ArrayList<>());
}
anagramMap.get(sortedStr).add(str);
}
return new ArrayList<>(anagramMap.values());
}
}
The loop processes each string once. For each string, it sorts the characters (O(k log k) where k is the string length), builds the key, and appends to the right bucket. At the end, we collect all buckets.
The Character-Count Alternative
Sorting works well, but if your strings are very long, the O(k log k) per-string cost adds up. An alternative is to count character frequencies and use the count array as the key:
Loading visualization...
For each string, create a length-26 array where index 0 counts 'a', index 1 counts 'b', and so on. Convert this array to a string like "1#0#0#0#1#...#1#" and use it as the hash key. This runs in O(k) per string instead of O(k log k), bringing the total to O(n * k).
In interviews, the sorting approach is usually sufficient and easier to explain. Mention the counting approach as an optimization when asked about follow-ups.
Handling Empty Strings
Empty strings are a common edge case that trips up candidates. An empty string sorted is still an empty string, so the algorithm groups them naturally:
Loading visualization...
No special handling required. The empty string is a valid hash key.
Complexity Analysis
Time Complexity: O(n * k * log(k))
For each of the n strings, sorting its k characters costs O(k log k). Hash map insertions and lookups are O(k) on average (for hashing the key). The dominant term is the sorting: O(n * k * log(k)).
With the character-count approach, this drops to O(n * k).
Space Complexity: O(n * k)
The hash map stores all n strings as values, and each key is at most k characters. The output is the same size. Total space is O(n * k).
Common Pitfalls
- Modifying the original string: Some languages have immutable strings. Make sure you sort a copy, not the original, or you will corrupt the input.
- Using the wrong key type: In languages like Python, lists are not hashable. Convert the sorted characters to a string or tuple before using as a dict key.
- Forgetting empty strings: An empty string is a valid anagram of another empty string. Do not skip or filter them out.
- Assuming fixed output order: The order of groups depends on hash map iteration order, which varies by language and runtime. The problem accepts any order.
Interview Tips
- Start with the invariant: "Anagrams have the same characters in the same frequencies. Sorting gives a canonical form."
- Draw the map: Sketch how "eat", "tea", "ate" all produce key "aet" and land in the same bucket.
- State complexity early: "O(n * k * log(k)) time, O(n * k) space."
- Mention the alternative: "We could use character counts instead of sorting for O(n * k) time, but sorting is simpler to implement."
- Test with edge cases: Empty strings, single characters, all identical strings.
Key Takeaways
- Sorting each string's characters produces a canonical key that is identical for all anagrams.
- A hash map with sorted keys as entries and lists of originals as values groups all anagrams in a single pass through the input.
- The sorting approach runs in O(n * k * log(k)) time and O(n * k) space, which is efficient for typical interview constraints.
- The character-count alternative avoids sorting and achieves O(n * k) time, useful when strings are very long.
- This "canonical form + hash map" technique applies to any problem where you need to group items by equivalence.
Related Problems
Once you have this grouping pattern down, try these related problems:
- Anagram Checker: The simpler version, checking if two specific strings are anagrams.
- Bracket Harmony Check: Another hash-based problem testing structural matching.
- Two Sum: The classic hash map lookup problem that pairs well with this one.
These problems and hundreds more are available on Firecode, where you can practice with instant feedback and spaced repetition. Whether you are targeting your first offer at Amazon or a senior role at Google, building fluency with hash map patterns like this one gives you a reliable toolkit for interview day.
Solutions in Other Languages
Python
from collections import defaultdict
class Solution:
def group_anagrams(self, strs):
anagrams = defaultdict(list)
for s in strs:
sorted_str = ''.join(sorted(s))
anagrams[sorted_str].append(s)
return list(anagrams.values())
Python's defaultdict(list) avoids the "check-then-create" pattern. The sorted() function returns a list of characters, which ''.join() converts back to a string.
JavaScript
groupAnagrams(strs) {
const map = new Map();
for (const str of strs) {
const sortedStr = str.split('').sort().join('');
if (!map.has(sortedStr)) {
map.set(sortedStr, []);
}
map.get(sortedStr).push(str);
}
return Array.from(map.values());
}
TypeScript
groupAnagrams(strs: string[]): string[][] {
const map = new Map<string, string[]>();
for (const str of strs) {
const sortedStr = str.split('').sort().join('');
if (!map.has(sortedStr)) {
map.set(sortedStr, []);
}
map.get(sortedStr)!.push(str);
}
return Array.from(map.values());
}
C++
std::vector<std::vector<std::string>> groupAnagrams(
std::vector<std::string> &strs) {
std::unordered_map<std::string, std::vector<std::string>> anagramGroups;
for (const auto &str : strs) {
std::string sortedStr = str;
std::sort(sortedStr.begin(), sortedStr.end());
anagramGroups[sortedStr].push_back(str);
}
std::vector<std::vector<std::string>> result;
for (auto &group : anagramGroups) {
result.push_back(std::move(group.second));
}
return result;
}
C++ std::sort modifies in place, so we copy the string first. The std::move avoids copying when building the result.
Go
func (s *Solution) GroupAnagrams(strs []string) [][]string {
anagramMap := make(map[string][]string)
for _, str := range strs {
runes := []rune(str)
sort.Slice(runes, func(i, j int) bool {
return runes[i] < runes[j]
})
sortedStr := string(runes)
anagramMap[sortedStr] = append(anagramMap[sortedStr], str)
}
result := make([][]string, 0, len(anagramMap))
for _, group := range anagramMap {
result = append(result, group)
}
return result
}
Go lacks a built-in string sort, so we convert to a rune slice and use sort.Slice.
Scala
def groupAnagrams(strs: Array[String]): List[List[String]] = {
strs.groupBy(str => str.sorted).values.map(_.toList).toList
}
Scala's groupBy and sorted make this a one-liner.
Kotlin
fun groupAnagrams(strs: Array<String>): List<List<String>> {
val anagramMap = mutableMapOf<String, MutableList<String>>()
for (str in strs) {
val sortedStr = str.toCharArray().sorted().joinToString("")
anagramMap.getOrPut(sortedStr) { mutableListOf() }.add(str)
}
return anagramMap.values.toList()
}
Kotlin's getOrPut handles the "create if missing" pattern cleanly.
Swift
func groupAnagrams(_ strs: [String]) -> [[String]] {
if strs.isEmpty { return [] }
var anagramMap: [String: [String]] = [:]
for str in strs {
let sortedStr = String(str.sorted())
anagramMap[sortedStr, default: []].append(str)
}
return Array(anagramMap.values)
}
Swift's default: subscript syntax avoids the nil check.
Rust
pub fn group_anagrams(&self, strs: Vec<String>) -> Vec<Vec<String>> {
if strs.is_empty() { return Vec::new(); }
let mut anagram_map: HashMap<String, Vec<String>> = HashMap::new();
for s in strs.iter() {
let mut chars: Vec<char> = s.chars().collect();
chars.sort();
let sorted_str: String = chars.into_iter().collect();
anagram_map.entry(sorted_str)
.or_insert_with(Vec::new)
.push(s.clone());
}
anagram_map.into_values().collect()
}
Rust's entry API handles the insert-or-update pattern without double lookups.
C#
public List<List<string>> GroupAnagrams(string[] strs) {
if (strs == null || strs.Length == 0)
return new List<List<string>>();
var anagramMap = new Dictionary<string, List<string>>();
foreach (var str in strs) {
var charArray = str.ToCharArray();
Array.Sort(charArray);
var sortedStr = new string(charArray);
if (!anagramMap.ContainsKey(sortedStr))
anagramMap[sortedStr] = new List<string>();
anagramMap[sortedStr].Add(str);
}
return new List<List<string>>(anagramMap.Values);
}
Dart
List<List<String>> groupAnagrams(List<String> strs) {
Map<String, List<String>> anagramMap = {};
for (String str in strs) {
String sortedStr = (str.split('')..sort()).join();
anagramMap.putIfAbsent(sortedStr, () => []);
anagramMap[sortedStr]!.add(str);
}
return anagramMap.values.toList();
}
Dart's cascade operator (..sort()) sorts the list in place and returns the list, keeping the chain readable.
PHP
public function groupAnagrams(array $strs): array {
if (empty($strs)) return [];
$anagramMap = [];
foreach ($strs as $str) {
$chars = str_split($str);
sort($chars);
$sortedStr = implode('', $chars);
if (!isset($anagramMap[$sortedStr])) {
$anagramMap[$sortedStr] = [];
}
$anagramMap[$sortedStr][] = $str;
}
return array_values($anagramMap);
}
Ruby
def group_anagrams(strs)
return [] if strs.nil? || strs.empty?
anagram_map = {}
strs.each do |str|
sorted_str = str.chars.sort.join
(anagram_map[sorted_str] ||= []) << str
end
anagram_map.values
end
Ruby's ||= operator initializes the array on first access, and << appends in place.