Find duplicate numbers

Ackshaey Singh
Difficulty Easy
Time Complexity
O(n * log(n))
Space Complexity
O(n)
Array Set
Bloomberg

You are in a Bloomberg phone screen and the interviewer starts with: "Given an array of integers, find all the numbers that appear more than once and return them sorted." This problem, also known as Contains Duplicate (with the twist of returning the actual duplicates), is a classic test of whether you can pick the right data structure for the job. It is straightforward enough to be a warm-up but reveals a lot about how you think through set operations and sorting.

TL;DR

Use two sets: a visited set to track numbers you have already seen, and a duplicate set to collect numbers that appear more than once. Iterate through the array once, checking membership in the visited set before adding each element. Convert the duplicate set to a sorted array and return it. This runs in O(n log n) time and O(n) space.

Why This Problem Matters

Duplicate detection shows up everywhere in real systems: deduplicating database records, validating data pipeline inputs, filtering spam. In interviews, this problem tests whether you reach for the right data structure (a set) instead of brute-forcing with nested loops. It also gives interviewers a natural follow-up: "What if you needed to count how many times each duplicate appears?" Nail the basic version, and that conversation flows naturally.

Understanding the Problem

Given an integer array, return a sorted array of all values that appear more than once. Each duplicate should appear exactly once in the result.

findDuplicateNumbers([4, 3, 1, 4, 2, 1, 4]) -> [1, 4]

Both 1 and 4 appear more than once, so they go into the result. The value 4 appears three times, but it only shows up once in the output. The result is sorted in ascending order.

Edge Cases

  1. Empty array: Return an empty array
  2. No duplicates: [4, 3, 1] returns []
  3. All duplicates: [1, 1, 1, 2, 2, 2, 3, 3, 3] returns [1, 2, 3]
  4. Single element: [100] returns []
  5. Negative numbers: [-5, -1, 0, 1, 5, 1, -1, 0] returns [-1, 0, 1]
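These edge cases are quick to check against a minimal Python version of the set-based approach described in this article (the function name here is illustrative):

```python
def find_duplicate_numbers(arr):
    """Return a sorted list of values that appear more than once."""
    visited, duplicates = set(), set()
    for n in arr:
        if n in visited:
            duplicates.add(n)   # second (or later) occurrence
        visited.add(n)
    return sorted(duplicates)

# Edge cases from the list above
print(find_duplicate_numbers([]))                           # []
print(find_duplicate_numbers([4, 3, 1]))                    # []
print(find_duplicate_numbers([1, 1, 1, 2, 2, 2, 3, 3, 3]))  # [1, 2, 3]
print(find_duplicate_numbers([100]))                        # []
print(find_duplicate_numbers([-5, -1, 0, 1, 5, 1, -1, 0]))  # [-1, 0, 1]
```

None of these cases need dedicated branches; the empty set and sorted() handle them uniformly.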

Solution Approach

The brute-force way is to compare every pair of elements with nested loops. That works but runs in O(n^2) time, which becomes slow as arrays grow into the thousands of elements: at 10,000 elements that is on the order of 50 million comparisons.

A better approach: use a HashSet as a "have I seen this before?" lookup table. For each number in the array, check if it is already in the visited set. If yes, it is a duplicate. If no, add it to the visited set and move on.

For collecting duplicates, you need a second set to avoid adding the same duplicate multiple times (since a value like 4 might appear three or more times). A TreeSet keeps duplicates sorted automatically, or you can use a regular set and sort at the end.

Here is how the algorithm processes [4, 3, 1, 4, 2, 1, 4]:

  n=4 → new, add to visited        visited={4}        duplicates={}
  n=3 → new, add to visited        visited={3,4}      duplicates={}
  n=1 → new, add to visited        visited={1,3,4}    duplicates={}
  n=4 → seen before                visited={1,3,4}    duplicates={4}
  n=2 → new, add to visited        visited={1,2,3,4}  duplicates={4}
  n=1 → seen before                visited={1,2,3,4}  duplicates={1,4}
  n=4 → seen before                visited={1,2,3,4}  duplicates={1,4}

When we encounter 4 the second time, it is already in the visited set, so it goes into duplicates. Same for 1. The third occurrence of 4 tries to add it to duplicates again, but the set ignores the redundant insertion. At the end, we sort and return [1, 4].

Compare with an array that has no duplicates, such as [4, 3, 1].

Every element is new, the duplicate set stays empty, and we return [].

Handling Negative Numbers

The algorithm works identically for negative values: sets hash and compare negative integers just like positive ones, so an input like [-5, -1, 0, 1, 5, 1, -1, 0] yields [-1, 0, 1] with no special-case code.

Implementation

Prefer a different language? Jump to solutions in other languages.

import java.util.*;

public class Solution {
  public int[] findDuplicateNumbers(int[] arr) {
    SortedSet<Integer> sortedSet = new TreeSet<>();
    Set<Integer> visitedSet = new HashSet<>();

    Arrays.stream(arr).forEach(n -> {
      if (visitedSet.contains(n)) {
        sortedSet.add(n);
      }
      visitedSet.add(n);
    });

    return sortedSet.stream().mapToInt(i -> i).toArray();
  }
}

Let's walk through the code:

  1. TreeSet for duplicates: sortedSet automatically maintains elements in ascending order, so we never need an explicit sort step
  2. HashSet for tracking: visitedSet provides O(1) average-time lookups to check if we have seen a number before
  3. Single pass: We iterate through the array once with Arrays.stream(arr).forEach()
  4. Membership check: If visitedSet.contains(n) returns true, the number is a duplicate and goes into sortedSet
  5. Always mark as visited: Whether or not it is a duplicate, we add the number to visitedSet
  6. Convert to array: Finally, we stream the TreeSet into a primitive int array

The TreeSet approach is elegant because it handles both deduplication and sorting in one data structure. You could also use a HashSet for duplicates and call Arrays.sort() at the end, which trades the O(log n) per-insertion cost of TreeSet for a single O(k log k) sort where k is the number of distinct duplicates.

Complexity Analysis

Time: O(n log n) where n is the array length. The iteration is O(n). Each TreeSet insertion is O(log n) in the worst case, and we may insert up to n elements. If you use a HashSet plus a final sort, the analysis changes to O(n) for detection plus O(k log k) for sorting k duplicates, but the worst case is still O(n log n) when k approaches n.

Space: O(n). The visited set holds up to n elements. The duplicate set holds at most n/2 elements, since each duplicate value must occur at least twice. Both scale linearly with input size.

Alternative Approaches

  1. Sort first, scan for adjacents: Sort the array in O(n log n), then scan for neighboring equal values in O(n). Uses O(1) extra space if you sort in-place, but mutates the input.
  2. HashMap counting: Count occurrences of each number, then filter for counts greater than 1. Same O(n) time and O(n) space, but more work than needed since we only care about "more than once," not exact counts.
  3. Brute force: Nested loops comparing all pairs. O(n^2) time, O(n) space for the result. Too slow for large inputs.

The two-set approach is the sweet spot: clean, efficient, and does not mutate the input.
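Approach 1 can be sketched in a few lines of Python; note that it mutates the caller's list, which is the price of O(1) extra space (the function name is illustrative):

```python
def find_duplicates_by_sorting(arr):
    """Sort in place, then collect values equal to their left neighbor."""
    arr.sort()  # O(n log n), mutates the input
    result = []
    for i in range(1, len(arr)):
        # A value equal to its left neighbor is a duplicate; skip it if it
        # was already recorded (handles runs of three or more equal values).
        if arr[i] == arr[i - 1] and (not result or result[-1] != arr[i]):
            result.append(arr[i])
    return result
```

Because the array is sorted before the scan, the result comes out in ascending order without a second sort.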

Common Pitfalls

  1. Forgetting to deduplicate the output: If you use a list instead of a set for duplicates, the value 4 in [4, 3, 1, 4, 2, 1, 4] gets added twice (on the second and third occurrence). Always use a set to collect duplicates.

  2. Not sorting the result: The problem requires the output in ascending order. A HashSet does not guarantee order. Use a TreeSet or sort the result explicitly before returning.

  3. Using the default sort on JavaScript arrays: JavaScript's default .sort() compares elements as strings, so [10, 1, 2] would sort as [1, 10, 2]. Always pass a numeric comparator: .sort((a, b) => a - b).

  4. Missing the empty array case: Most set-based solutions handle this naturally (empty input produces empty output), but it is worth confirming with a test.

Interview Tips

When presenting this solution:

  • Start by asking: "Should the result be sorted? Should duplicates appear once or as many times as they repeat?"
  • Mention the brute-force O(n^2) approach briefly, then explain why a set-based approach is better
  • Name the pattern: "I'll use two sets. One for tracking visited elements, one for collecting duplicates."
  • If the interviewer asks about space optimization, mention the sort-first approach that uses O(1) extra space at the cost of mutating the input
  • If asked about a follow-up counting occurrences, pivot to a HashMap
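For that counting follow-up, one possible Python sketch swaps the duplicate set for a count map; everything else about the single pass stays the same (the function name is illustrative):

```python
from collections import Counter

def count_duplicates(arr):
    """Map each value that appears more than once to its occurrence count."""
    counts = Counter(arr)  # single pass over the array
    return {n: c for n, c in sorted(counts.items()) if c > 1}

# count_duplicates([4, 3, 1, 4, 2, 1, 4]) → {1: 2, 4: 3}
```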

Key Takeaways

  • Two sets (visited + duplicates) solve this in a single pass through the array, giving O(n log n) time and O(n) space.
  • A TreeSet automatically maintains sorted order, eliminating the need for an explicit sort step. A HashSet plus a final sort achieves the same result.
  • The visited set provides O(1) average-time lookups, making the "have I seen this before?" check efficient.
  • This pattern generalizes: swap the duplicate set for a count map and you can answer "how many times does each element appear?" with minimal changes.
  • Edge cases (empty array, single element, all duplicates, negative numbers) are handled naturally by the set-based approach without special-case code.

Practice and Related Problems

Once you are comfortable with duplicate detection, try these progressions:

  • Find the first non-repeating character in a string (HashMap with insertion order)
  • Two Sum (HashMap for complement lookup)
  • Group Anagrams (HashMap with sorted-key grouping)
  • Intersection of two arrays (set intersection)

This problem and hundreds of others are available on Firecode, where consistent daily practice helps you build the pattern recognition that top tech companies look for. Whether you are warming up for phone screens or preparing for on-site interviews, starting with fundamentals like this sets you up well.

Solutions in Other Languages

Python

class Solution:
    def find_duplicate_numbers(self, arr):
        visited_set = set()
        output_set = set()

        for n in arr:
            if n in visited_set:
                output_set.add(n)
            visited_set.add(n)

        return sorted(output_set)

JavaScript

class Solution {
  findDuplicateNumbers(arr) {
    const visitedSet = new Set();
    const outSet = new Set();

    arr.forEach(n => {
      if (visitedSet.has(n)) {
        outSet.add(n);
      }
      visitedSet.add(n);
    });

    return Array.from(outSet).sort((a, b) => a - b);
  }
}

TypeScript

class Solution {
  findDuplicateNumbers(arr: number[]): number[] {
    const visitedSet = new Set<number>();
    const outSet = new Set<number>();

    arr.forEach(n => {
      if (visitedSet.has(n)) {
        outSet.add(n);
      }
      visitedSet.add(n);
    });

    return Array.from(outSet).sort((a, b) => a - b);
  }
}

C++

#include <set>
#include <unordered_set>
#include <vector>

class Solution {
public:
  std::vector<int> findDuplicateNumbers(const std::vector<int> &arr) {
    // unordered_set gives O(1) average lookups for the visited check;
    // std::set keeps the collected duplicates in sorted order
    std::unordered_set<int> visitedSet;
    std::set<int> outSet;

    for (auto n : arr) {
      if (visitedSet.count(n)) {
        outSet.insert(n);
      }
      visitedSet.insert(n);
    }

    return std::vector<int>(outSet.begin(), outSet.end());
  }
};

Go

package solution

import "sort"

// Solution is an empty receiver type so the method below compiles standalone.
type Solution struct{}

func (s *Solution) FindDuplicateNumbers(arr []int) []int {
    visitedSet := make(map[int]bool)
    duplicateSet := make(map[int]bool)

    for _, n := range arr {
        if visitedSet[n] {
            duplicateSet[n] = true
        }
        visitedSet[n] = true
    }

    var result []int
    for num := range duplicateSet {
        result = append(result, num)
    }

    sort.Ints(result)
    return result
}

Scala

import scala.collection.mutable

class Solution {
  def findDuplicateNumbers(arr: Array[Int]): Array[Int] = {
    val sortedSet = mutable.TreeSet[Int]()
    val visitedSet = mutable.HashSet[Int]()

    arr.foreach { n =>
      if (visitedSet.contains(n)) {
        sortedSet.add(n)
      }
      visitedSet.add(n)
    }

    sortedSet.toArray
  }
}

Kotlin

class Solution {
    fun findDuplicateNumbers(arr: IntArray): IntArray {
        val sortedSet = sortedSetOf<Int>()
        val visitedSet = hashSetOf<Int>()

        for (n in arr) {
            if (n in visitedSet) {
                sortedSet.add(n)
            }
            visitedSet.add(n)
        }

        return sortedSet.toIntArray()
    }
}

Swift

import Foundation

class Solution {
    func findDuplicateNumbers(_ arr: [Int]) -> [Int] {
        var duplicateSet = Set<Int>()
        var visitedSet = Set<Int>()

        for n in arr {
            if visitedSet.contains(n) {
                duplicateSet.insert(n)
            }
            visitedSet.insert(n)
        }

        return Array(duplicateSet).sorted()
    }
}

Rust

use std::collections::{BTreeSet, HashSet};

impl Solution {
    pub fn find_duplicate_numbers(&self, arr: Vec<i32>) -> Vec<i32> {
        let mut visited = HashSet::new();
        let mut duplicates = BTreeSet::new();

        for &num in &arr {
            if visited.contains(&num) {
                duplicates.insert(num);
            }
            visited.insert(num);
        }

        duplicates.into_iter().collect()
    }
}

C#

using System.Collections.Generic;
using System.Linq;

public class Solution {
    public int[] findDuplicateNumbers(int[] arr) {
        var sortedSet = new SortedSet<int>();
        var visitedSet = new HashSet<int>();

        foreach (var n in arr) {
            if (visitedSet.Contains(n)) {
                sortedSet.Add(n);
            }
            visitedSet.Add(n);
        }

        return sortedSet.ToArray();
    }
}

Dart

class Solution {
  List<int> findDuplicateNumbers(List<int> arr) {
    Set<int> duplicates = {};
    Set<int> visitedSet = {};

    for (int n in arr) {
      if (visitedSet.contains(n)) {
        duplicates.add(n);
      }
      visitedSet.add(n);
    }

    return duplicates.toList()..sort();
  }
}

PHP

class Solution {
    public function findDuplicateNumbers(array $arr): array {
        $visitedSet = [];
        $duplicateSet = [];

        foreach ($arr as $num) {
            if (isset($visitedSet[$num])) {
                $duplicateSet[$num] = true;
            }
            $visitedSet[$num] = true;
        }

        $result = array_keys($duplicateSet);
        sort($result);
        return $result;
    }
}

Ruby

require 'set'

class Solution
  def find_duplicate_numbers(arr)
    duplicates = Set.new
    visited_set = Set.new

    arr.each do |n|
      if visited_set.include?(n)
        duplicates.add(n)
      end
      visited_set.add(n)
    end

    duplicates.to_a.sort
  end
end

Frequently Asked Questions

What is the most efficient way to find duplicates in an array?
Use a HashSet to track seen elements and a second set to collect duplicates. For each element, check if it exists in the visited set. If yes, it is a duplicate. This runs in O(n) average time for detection, plus O(k log k) for sorting the k duplicates. The overall complexity is O(n log n) in the worst case when many elements are duplicates.
Why use a TreeSet instead of a HashSet for the duplicates?
A TreeSet automatically maintains elements in sorted order, which satisfies the requirement that the result be sorted. Using a HashSet for duplicates would require a separate sorting step at the end. In Java, TreeSet uses a red-black tree internally, giving O(log n) insertion and guaranteed sorted iteration.
What is the time complexity of this solution?
The time complexity is O(n log n) where n is the array length. Iterating through the array is O(n). Each insertion into the TreeSet is O(log n) in the worst case. If you use a HashSet for duplicates instead, detection is O(n) average but you need O(k log k) to sort the k duplicates at the end.
What is the space complexity of this solution?
The space complexity is O(n) because the visited set may store up to n elements (when all elements are unique). The duplicate set stores at most n/2 elements. Together, the auxiliary space is linear in the size of the input array.
How does this problem differ from checking if any duplicate exists?
The simpler 'contains duplicate' variant only needs a boolean answer and can return true as soon as a single duplicate is found. This problem requires collecting all duplicates, deduplicating them, and returning them sorted. You cannot short-circuit on the first duplicate because you need to find every duplicate in the array.
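The boolean variant is worth contrasting in code, since it can short-circuit where this problem cannot; a minimal Python sketch:

```python
def contains_duplicate(arr):
    """Return True as soon as any repeated value is found."""
    visited = set()
    for n in arr:
        if n in visited:
            return True  # short-circuit: no need to scan the rest
        visited.add(n)
    return False
```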
Can you solve this problem without extra space?
If the array can be modified, you can sort it in-place in O(n log n) time and then scan for adjacent equal elements in O(n) time, using O(1) extra space. However, sorting mutates the input, which may not be acceptable. The HashSet approach preserves the original array at the cost of O(n) space.
How do you handle negative numbers and duplicates?
HashSet and TreeSet handle negative integers the same as positive ones. The comparison and hashing operations work correctly for all integer values within the constraint range of -10^9 to 10^9. The sorted output will place negative duplicates before positive ones, as expected.
What are real-world applications of duplicate detection?
Duplicate detection is used in database deduplication (removing duplicate records), spam filtering (detecting repeated messages), file systems (identifying duplicate files via checksums), network packet deduplication, data pipeline validation, and ETL processes that need to ensure uniqueness before loading data into a warehouse.