Anagram clustering challenge

AS
Ackshaey Singh
Difficulty Hard
Time Complexity
O(n * k * log(k))
Space Complexity
O(n * k)
Array Hash table String Sorting
Yandex Amazon Zoho Yahoo Google Apple ServiceNow EPAM Systems PayPal Microsoft BP Meta Oracle Nvidia Salesforce Anduril J.P. Morgan Visa

Your Amazon interviewer slides a list of words across the table: "eat", "tea", "tan", "ate", "nat", "bat". "Group these so that anagrams sit together," they say. This problem, also known as Group Anagrams on other platforms, is one of the most frequently asked hash map questions in technical interviews. Companies from Yandex to Google to Apple rely on it to test whether you can design efficient lookup strategies with the right data structures.

TL;DR

Sort each string's characters to produce a canonical key. Two strings are anagrams if and only if they produce the same sorted key. Use a hash map where keys are sorted strings and values are lists of originals. Return all the lists. This runs in O(n * k * log(k)) time and O(n * k) space, where n is the number of strings and k is the maximum string length.

Why This Problem Matters

Group Anagrams sits at the intersection of hashing, string manipulation, and algorithmic design. It tests whether you can identify the right invariant (sorted characters) and use it as a hash key. The pattern of "transform input into a canonical form, then group by that form" appears in duplicate detection, data deduplication, and search indexing. Master this problem and you will have a reusable technique for any grouping task.

Understanding the Problem

Given an array of strings strs, group them so that all anagrams appear in the same sublist. Two strings are anagrams if one can be rearranged to form the other.

Example: ["eat","tea","tan","ate","nat","bat"] produces three groups:

  • ["eat","tea","ate"] (all rearrangements of the letters a, e, t)
  • ["tan","nat"] (both contain a, n, t)
  • ["bat"] (no other string shares b, a, t)

The order of groups and the order within each group do not matter.

Edge Cases

  • Empty strings: ["",""] groups both into [["",""]] since empty is an anagram of empty.
  • Single characters: ["a","b","c"] produces three singleton groups.
  • All anagrams: ["abc","bca","cab"] produces one group with all three.
  • No anagrams: Each string is unique, so every group has one element.

The Key Insight: Canonical Form

The core observation is that two strings are anagrams when they contain the same characters in the same frequencies. We need a way to reduce each string to a single "fingerprint" that is identical for all anagrams.

Sorting the characters of a string does exactly this. "eat", "tea", and "ate" all sort to "aet":

Loading visualization...

This sorted string becomes the hash map key. Every string that produces the same sorted key belongs in the same group.

Walking Through the Algorithm

Let's trace the full example. For each string, we sort its characters and use the result as a map key:

Loading visualization...

After processing all six strings, the map contains three keys, each pointing to a list of anagrams:

Loading visualization...

We return the three lists: [["eat","tea","ate"], ["tan","nat"], ["bat"]].

Implementation

Prefer a different language? Jump to solutions in other languages.

import java.util.*;

public class Solution {
  public List<List<String>> groupAnagrams(String[] strs) {
    if (strs == null || strs.length == 0) {
      return new ArrayList<>();
    }

    Map<String, List<String>> anagramMap = new HashMap<>();

    for (String str : strs) {
      char[] charArray = str.toCharArray();
      Arrays.sort(charArray);
      String sortedStr = new String(charArray);

      if (!anagramMap.containsKey(sortedStr)) {
        anagramMap.put(sortedStr, new ArrayList<>());
      }
      anagramMap.get(sortedStr).add(str);
    }

    return new ArrayList<>(anagramMap.values());
  }
}

The loop processes each string once. For each string, it sorts the characters (O(k log k) where k is the string length), builds the key, and appends to the right bucket. At the end, we collect all buckets.

The Character-Count Alternative

Sorting works well, but if your strings are very long, the O(k log k) per-string cost adds up. An alternative is to count character frequencies and use the count array as the key:

Loading visualization...

For each string, create a length-26 array where index 0 counts 'a', index 1 counts 'b', and so on. Convert this array to a string like "1#0#0#0#1#...#1#" and use it as the hash key. This runs in O(k) per string instead of O(k log k), bringing the total to O(n * k).

In interviews, the sorting approach is usually sufficient and easier to explain. Mention the counting approach as an optimization when asked about follow-ups.

Handling Empty Strings

Empty strings are a common edge case that trips up candidates. An empty string sorted is still an empty string, so the algorithm groups them naturally:

Loading visualization...

No special handling required. The empty string is a valid hash key.

Complexity Analysis

Time Complexity: O(n * k * log(k))

For each of the n strings, sorting its k characters costs O(k log k). Hash map insertions and lookups are O(k) on average (for hashing the key). The dominant term is the sorting: O(n * k * log(k)).

With the character-count approach, this drops to O(n * k).

Space Complexity: O(n * k)

The hash map stores all n strings as values, and each key is at most k characters. The output is the same size. Total space is O(n * k).

Common Pitfalls

  1. Modifying the original string: Some languages have immutable strings. Make sure you sort a copy, not the original, or you will corrupt the input.
  2. Using the wrong key type: In languages like Python, lists are not hashable. Convert the sorted characters to a string or tuple before using as a dict key.
  3. Forgetting empty strings: An empty string is a valid anagram of another empty string. Do not skip or filter them out.
  4. Assuming fixed output order: The order of groups depends on hash map iteration order, which varies by language and runtime. The problem accepts any order.

Interview Tips

  • Start with the invariant: "Anagrams have the same characters in the same frequencies. Sorting gives a canonical form."
  • Draw the map: Sketch how "eat", "tea", "ate" all produce key "aet" and land in the same bucket.
  • State complexity early: "O(n * k * log(k)) time, O(n * k) space."
  • Mention the alternative: "We could use character counts instead of sorting for O(n * k) time, but sorting is simpler to implement."
  • Test with edge cases: Empty strings, single characters, all identical strings.

Key Takeaways

  • Sorting each string's characters produces a canonical key that is identical for all anagrams.
  • A hash map with sorted keys as entries and lists of originals as values groups all anagrams in a single pass through the input.
  • The sorting approach runs in O(n * k * log(k)) time and O(n * k) space, which is efficient for typical interview constraints.
  • The character-count alternative avoids sorting and achieves O(n * k) time, useful when strings are very long.
  • This "canonical form + hash map" technique applies to any problem where you need to group items by equivalence.

Related Problems

Once you have this grouping pattern down, try these related problems:

  • Anagram Checker: The simpler version, checking if two specific strings are anagrams.
  • Bracket Harmony Check: Another hash-based problem testing structural matching.
  • Two Sum: The classic hash map lookup problem that pairs well with this one.

These problems and hundreds more are available on Firecode, where you can practice with instant feedback and spaced repetition. Whether you are targeting your first offer at Amazon or a senior role at Google, building fluency with hash map patterns like this one gives you a reliable toolkit for interview day.

Solutions in Other Languages

Python

from collections import defaultdict

class Solution:
    def group_anagrams(self, strs):
        anagrams = defaultdict(list)
        for s in strs:
            sorted_str = ''.join(sorted(s))
            anagrams[sorted_str].append(s)
        return list(anagrams.values())

Python's defaultdict(list) avoids the "check-then-create" pattern. The sorted() function returns a list of characters, which ''.join() converts back to a string.

JavaScript

groupAnagrams(strs) {
  const map = new Map();
  for (const str of strs) {
    const sortedStr = str.split('').sort().join('');
    if (!map.has(sortedStr)) {
      map.set(sortedStr, []);
    }
    map.get(sortedStr).push(str);
  }
  return Array.from(map.values());
}

TypeScript

groupAnagrams(strs: string[]): string[][] {
  const map = new Map<string, string[]>();
  for (const str of strs) {
    const sortedStr = str.split('').sort().join('');
    if (!map.has(sortedStr)) {
      map.set(sortedStr, []);
    }
    map.get(sortedStr)!.push(str);
  }
  return Array.from(map.values());
}

C++

std::vector<std::vector<std::string>> groupAnagrams(
    std::vector<std::string> &strs) {
  std::unordered_map<std::string, std::vector<std::string>> anagramGroups;
  for (const auto &str : strs) {
    std::string sortedStr = str;
    std::sort(sortedStr.begin(), sortedStr.end());
    anagramGroups[sortedStr].push_back(str);
  }
  std::vector<std::vector<std::string>> result;
  for (auto &group : anagramGroups) {
    result.push_back(std::move(group.second));
  }
  return result;
}

C++ std::sort modifies in place, so we copy the string first. The std::move avoids copying when building the result.

Go

func (s *Solution) GroupAnagrams(strs []string) [][]string {
    anagramMap := make(map[string][]string)
    for _, str := range strs {
        runes := []rune(str)
        sort.Slice(runes, func(i, j int) bool {
            return runes[i] < runes[j]
        })
        sortedStr := string(runes)
        anagramMap[sortedStr] = append(anagramMap[sortedStr], str)
    }
    result := make([][]string, 0, len(anagramMap))
    for _, group := range anagramMap {
        result = append(result, group)
    }
    return result
}

Go lacks a built-in string sort, so we convert to a rune slice and use sort.Slice.

Scala

def groupAnagrams(strs: Array[String]): List[List[String]] = {
  strs.groupBy(str => str.sorted).values.map(_.toList).toList
}

Scala's groupBy and sorted make this a one-liner.

Kotlin

fun groupAnagrams(strs: Array<String>): List<List<String>> {
    val anagramMap = mutableMapOf<String, MutableList<String>>()
    for (str in strs) {
        val sortedStr = str.toCharArray().sorted().joinToString("")
        anagramMap.getOrPut(sortedStr) { mutableListOf() }.add(str)
    }
    return anagramMap.values.toList()
}

Kotlin's getOrPut handles the "create if missing" pattern cleanly.

Swift

func groupAnagrams(_ strs: [String]) -> [[String]] {
    if strs.isEmpty { return [] }
    var anagramMap: [String: [String]] = [:]
    for str in strs {
        let sortedStr = String(str.sorted())
        anagramMap[sortedStr, default: []].append(str)
    }
    return Array(anagramMap.values)
}

Swift's default: subscript syntax avoids the nil check.

Rust

pub fn group_anagrams(&self, strs: Vec<String>) -> Vec<Vec<String>> {
    if strs.is_empty() { return Vec::new(); }
    let mut anagram_map: HashMap<String, Vec<String>> = HashMap::new();
    for s in strs.iter() {
        let mut chars: Vec<char> = s.chars().collect();
        chars.sort();
        let sorted_str: String = chars.into_iter().collect();
        anagram_map.entry(sorted_str)
            .or_insert_with(Vec::new)
            .push(s.clone());
    }
    anagram_map.into_values().collect()
}

Rust's entry API handles the insert-or-update pattern without double lookups.

C#

public List<List<string>> GroupAnagrams(string[] strs) {
    if (strs == null || strs.Length == 0)
        return new List<List<string>>();
    var anagramMap = new Dictionary<string, List<string>>();
    foreach (var str in strs) {
        var charArray = str.ToCharArray();
        Array.Sort(charArray);
        var sortedStr = new string(charArray);
        if (!anagramMap.ContainsKey(sortedStr))
            anagramMap[sortedStr] = new List<string>();
        anagramMap[sortedStr].Add(str);
    }
    return new List<List<string>>(anagramMap.Values);
}

Dart

List<List<String>> groupAnagrams(List<String> strs) {
  Map<String, List<String>> anagramMap = {};
  for (String str in strs) {
    String sortedStr = (str.split('')..sort()).join();
    anagramMap.putIfAbsent(sortedStr, () => []);
    anagramMap[sortedStr]!.add(str);
  }
  return anagramMap.values.toList();
}

Dart's cascade operator (..sort()) sorts the list in place and returns the list, keeping the chain readable.

PHP

public function groupAnagrams(array $strs): array {
    if (empty($strs)) return [];
    $anagramMap = [];
    foreach ($strs as $str) {
        $chars = str_split($str);
        sort($chars);
        $sortedStr = implode('', $chars);
        if (!isset($anagramMap[$sortedStr])) {
            $anagramMap[$sortedStr] = [];
        }
        $anagramMap[$sortedStr][] = $str;
    }
    return array_values($anagramMap);
}

Ruby

def group_anagrams(strs)
  return [] if strs.nil? || strs.empty?
  anagram_map = {}
  strs.each do |str|
    sorted_str = str.chars.sort.join
    (anagram_map[sorted_str] ||= []) << str
  end
  anagram_map.values
end

Ruby's ||= operator initializes the array on first access, and << appends in place.

Frequently Asked Questions

What is the time complexity of the Group Anagrams solution?
The sort-based approach runs in O(n * k * log(k)) time, where n is the number of strings and k is the maximum string length. For each of the n strings, sorting its k characters takes O(k log k). The character-count approach runs in O(n * k) by counting letter frequencies instead of sorting.
What is the space complexity of the Group Anagrams solution?
The space complexity is O(n * k) where n is the number of strings and k is the maximum string length. The hash map stores all n strings as values, and each sorted key is at most k characters long. The output itself requires O(n * k) space as well.
How does the sorted-key approach identify anagrams?
Two strings are anagrams if and only if they contain the same characters in the same frequencies. Sorting the characters of each string produces a canonical form. For example, 'eat', 'tea', and 'ate' all sort to 'aet'. Strings that share the same sorted key are anagrams and get grouped together.
What is the character-count alternative and when should you use it?
Instead of sorting each string, count the frequency of each character (26 letters) and use that count as the hash key. This runs in O(k) per string instead of O(k log k), making the overall complexity O(n * k). Use this approach when strings are very long and the O(k log k) sorting cost matters.
How do you handle empty strings in Group Anagrams?
Empty strings are valid input. Sorting an empty string produces an empty string, so all empty strings in the input share the same key and get grouped together. The algorithm handles this naturally without special cases.
Can Group Anagrams be solved without a hash map?
You could sort each string, then sort the entire array of (sorted-key, original) pairs, and group consecutive entries with the same key. This avoids a hash map but runs in O(n * k * log(n)) time due to the outer sort, which is slower than the hash map approach for large n.
Why is the output order unspecified for Group Anagrams?
The problem asks you to group anagrams, not to order the groups. A hash map does not guarantee insertion order in all languages, so the order of groups in the output may vary. This is acceptable because the correctness criterion is that each group contains exactly the right set of anagrams.
How do you choose between sorting and character counting as the key strategy?
If strings are short (under 20-30 characters), sorting is simpler and fast enough. If strings can be very long (hundreds of characters), character counting avoids the O(k log k) factor. In interviews, implement the sorting approach first for clarity, then mention the counting optimization.
What are real-world applications of grouping anagrams?
Anagram grouping is used in spell checkers (suggesting words with the same letters), word games (finding valid rearrangements), plagiarism detection (identifying rearranged text), and search engines (matching queries that are permutations of each other). The underlying technique of canonical-form hashing applies broadly.
What is the best approach for solving Group Anagrams in a coding interview?
Start by explaining that sorting each string creates a canonical form for anagram comparison. Use a hash map with sorted strings as keys and lists of originals as values. Walk through the example to show how groups form. Mention the O(n * k * log(k)) complexity and the O(n * k) character-count alternative. Test with edge cases: empty strings, single characters, and no anagrams.