Find duplicate numbers
You are in a Bloomberg phone screen and the interviewer starts with: "Given an array of integers, find all the numbers that appear more than once and return them sorted." This problem, also known as Contains Duplicate (with the twist of returning the actual duplicates), is a classic test of whether you can pick the right data structure for the job. It is straightforward enough to be a warm-up but reveals a lot about how you think through set operations and sorting.
TL;DR
Use two sets: a visited set to track numbers you have already seen, and a duplicate set to collect numbers that appear more than once. Iterate through the array once, checking membership in the visited set before adding each element. Convert the duplicate set to a sorted array and return it. This runs in O(n log n) time and O(n) space.
Why This Problem Matters
Duplicate detection shows up everywhere in real systems: deduplicating database records, validating data pipeline inputs, filtering spam. In interviews, this problem tests whether you reach for the right data structure (a set) instead of brute-forcing with nested loops. It also gives interviewers a natural follow-up: "What if you needed to count how many times each duplicate appears?" Nail the basic version, and that conversation flows naturally.
Understanding the Problem
Given an integer array, return a sorted array of all values that appear more than once. Each duplicate should appear exactly once in the result.
findDuplicateNumbers([4, 3, 1, 4, 2, 1, 4]) -> [1, 4]
Both 1 and 4 appear more than once, so they go into the result. The value 4 appears three times, but it only shows up once in the output. The result is sorted in ascending order.
Edge Cases
- Empty array: Return an empty array
- No duplicates:
[4, 3, 1]returns[] - All duplicates:
[1, 1, 1, 2, 2, 2, 3, 3, 3]returns[1, 2, 3] - Single element:
[100]returns[] - Negative numbers:
[-5, -1, 0, 1, 5, 1, -1, 0]returns[-1, 0, 1]
Solution Approach
The brute-force way is to compare every pair of elements with nested loops. That works but runs in O(n^2) time, which is too slow for arrays up to 10,000 elements.
A better approach: use a HashSet as a "have I seen this before?" lookup table. For each number in the array, check if it is already in the visited set. If yes, it is a duplicate. If no, add it to the visited set and move on.
For collecting duplicates, you need a second set to avoid adding the same duplicate multiple times (since a value like 4 might appear three or more times). A TreeSet keeps duplicates sorted automatically, or you can use a regular set and sort at the end.
Here is how the algorithm processes [4, 3, 1, 4, 2, 1, 4]:
Loading visualization...
When we encounter 4 the second time, it is already in the visited set, so it goes into duplicates. Same for 1. The third occurrence of 4 tries to add it to duplicates again, but the set ignores the redundant insertion. At the end, we sort and return [1, 4].
Compare with an array that has no duplicates:
Loading visualization...
Every element is new, the duplicate set stays empty, and we return [].
Handling Negative Numbers
The algorithm works identically for negative values. Sets handle negative integers the same as positive ones:
Loading visualization...
Implementation
Prefer a different language? Jump to solutions in other languages.
import java.util.*;
public class Solution {
public int[] findDuplicateNumbers(int[] arr) {
SortedSet<Integer> sortedSet = new TreeSet<>();
Set<Integer> visitedSet = new HashSet<>();
Arrays.stream(arr).forEach(n -> {
if (visitedSet.contains(n)) {
sortedSet.add(n);
}
visitedSet.add(n);
});
return sortedSet.stream().mapToInt(i -> i).toArray();
}
}
Let's walk through the code:
- TreeSet for duplicates:
sortedSetautomatically maintains elements in ascending order, so we never need an explicit sort step - HashSet for tracking:
visitedSetprovides O(1) average-time lookups to check if we have seen a number before - Single pass: We iterate through the array once with
Arrays.stream(arr).forEach() - Membership check: If
visitedSet.contains(n)returns true, the number is a duplicate and goes intosortedSet - Always mark as visited: Whether or not it is a duplicate, we add the number to
visitedSet - Convert to array: Finally, we stream the TreeSet into a primitive int array
The TreeSet approach is elegant because it handles both deduplication and sorting in one data structure. You could also use a HashSet for duplicates and call Arrays.sort() at the end, which trades the O(log n) per-insertion cost of TreeSet for a single O(k log k) sort where k is the number of distinct duplicates.
Complexity Analysis
Time: O(n log n) where n is the array length. The iteration is O(n). Each TreeSet insertion is O(log n) in the worst case, and we may insert up to n elements. If you use a HashSet plus a final sort, the analysis changes to O(n) for detection plus O(k log k) for sorting k duplicates, but the worst case is still O(n log n) when k approaches n.
Space: O(n). The visited set holds up to n elements. The duplicate set holds at most n/2 elements (half the array can be duplicates at most). Both scale linearly with input size.
Alternative Approaches
- Sort first, scan for adjacents: Sort the array in O(n log n), then scan for neighboring equal values in O(n). Uses O(1) extra space if you sort in-place, but mutates the input.
- HashMap counting: Count occurrences of each number, then filter for counts greater than 1. Same O(n) time and O(n) space, but more work than needed since we only care about "more than once," not exact counts.
- Brute force: Nested loops comparing all pairs. O(n^2) time, O(n) space for the result. Too slow for large inputs.
The two-set approach is the sweet spot: clean, efficient, and does not mutate the input.
Common Pitfalls
-
Forgetting to deduplicate the output: If you use a list instead of a set for duplicates, the value 4 in
[4, 3, 1, 4, 2, 1, 4]gets added twice (on the second and third occurrence). Always use a set to collect duplicates. -
Not sorting the result: The problem requires the output in ascending order. A HashSet does not guarantee order. Use a TreeSet or sort the result explicitly before returning.
-
Using
Arrays.sort()on JavaScript arrays: JavaScript's default.sort()uses lexicographic order, so[1, 10, 2]would sort as[1, 10, 2]. Always pass a numeric comparator:.sort((a, b) => a - b). -
Missing the empty array case: Most set-based solutions handle this naturally (empty input produces empty output), but it is worth confirming with a test.
Interview Tips
When presenting this solution:
- Start by asking: "Should the result be sorted? Should duplicates appear once or as many times as they repeat?"
- Mention the brute-force O(n^2) approach briefly, then explain why a set-based approach is better
- Name the pattern: "I'll use two sets. One for tracking visited elements, one for collecting duplicates."
- If the interviewer asks about space optimization, mention the sort-first approach that uses O(1) extra space at the cost of mutating the input
- If asked about a follow-up counting occurrences, pivot to a HashMap
Key Takeaways
- Two sets (visited + duplicates) solve this in a single pass through the array, giving O(n log n) time and O(n) space.
- A TreeSet automatically maintains sorted order, eliminating the need for an explicit sort step. A HashSet plus a final sort achieves the same result.
- The visited set provides O(1) average-time lookups, making the "have I seen this before?" check efficient.
- This pattern generalizes: swap the duplicate set for a count map and you can answer "how many times does each element appear?" with minimal changes.
- Edge cases (empty array, single element, all duplicates, negative numbers) are handled naturally by the set-based approach without special-case code.
Practice and Related Problems
Once you are comfortable with duplicate detection, try these progressions:
- Find the first non-repeating character in a string (HashMap with insertion order)
- Two Sum (HashMap for complement lookup)
- Group Anagrams (HashMap with sorted-key grouping)
- Intersection of two arrays (set intersection)
This problem and hundreds of others are available on Firecode, where consistent daily practice helps you build the pattern recognition that top tech companies look for. Whether you are warming up for phone screens or preparing for on-site interviews, starting with fundamentals like this sets you up well.
Solutions in Other Languages
Python
class Solution:
def find_duplicate_numbers(self, arr):
visited_set = set()
output_set = set()
for n in arr:
if n in visited_set:
output_set.add(n)
visited_set.add(n)
return sorted(list(output_set))
JavaScript
class Solution {
findDuplicateNumbers(arr) {
const visitedSet = new Set();
const outSet = new Set();
arr.forEach(n => {
if (visitedSet.has(n)) {
outSet.add(n);
}
visitedSet.add(n);
});
return Array.from(outSet).sort((a, b) => a - b);
}
}
TypeScript
class Solution {
findDuplicateNumbers(arr: number[]): number[] {
const visitedSet = new Set<number>();
const outSet = new Set<number>();
arr.forEach(n => {
if (visitedSet.has(n)) {
outSet.add(n);
}
visitedSet.add(n);
});
return Array.from(outSet).sort((a, b) => a - b);
}
}
C++
#include <set>
#include <vector>
class Solution {
public:
std::vector<int> findDuplicateNumbers(const std::vector<int> &arr) {
std::set<int> visitedSet;
std::set<int> outSet;
for (auto n : arr) {
if (visitedSet.find(n) != visitedSet.end()) {
outSet.insert(n);
} else {
visitedSet.insert(n);
}
}
return std::vector<int>(outSet.begin(), outSet.end());
}
};
Go
package solution
import "sort"
func (s *Solution) FindDuplicateNumbers(arr []int) []int {
visitedSet := make(map[int]bool)
duplicateSet := make(map[int]bool)
for _, n := range arr {
if visitedSet[n] {
duplicateSet[n] = true
}
visitedSet[n] = true
}
var result []int
for num := range duplicateSet {
result = append(result, num)
}
sort.Ints(result)
return result
}
Scala
import scala.collection.mutable
class Solution {
def findDuplicateNumbers(arr: Array[Int]): Array[Int] = {
val sortedSet = mutable.TreeSet[Int]()
val visitedSet = mutable.HashSet[Int]()
arr.foreach { n =>
if (visitedSet.contains(n)) {
sortedSet.add(n)
}
visitedSet.add(n)
}
sortedSet.toArray
}
}
Kotlin
class Solution {
fun findDuplicateNumbers(arr: IntArray): IntArray {
val sortedSet = sortedSetOf<Int>()
val visitedSet = hashSetOf<Int>()
for (n in arr) {
if (n in visitedSet) {
sortedSet.add(n)
}
visitedSet.add(n)
}
return sortedSet.toIntArray()
}
}
Swift
import Foundation
class Solution {
func findDuplicateNumbers(_ arr: [Int]) -> [Int] {
var duplicateSet = Set<Int>()
var visitedSet = Set<Int>()
for n in arr {
if visitedSet.contains(n) {
duplicateSet.insert(n)
}
visitedSet.insert(n)
}
return Array(duplicateSet).sorted()
}
}
Rust
impl Solution {
pub fn find_duplicate_numbers(&self, arr: Vec<i32>) -> Vec<i32> {
let mut visited = HashSet::new();
let mut duplicates = std::collections::BTreeSet::new();
for &num in &arr {
if visited.contains(&num) {
duplicates.insert(num);
}
visited.insert(num);
}
duplicates.into_iter().collect()
}
}
C#
using System.Collections.Generic;
using System.Linq;
public class Solution {
public int[] findDuplicateNumbers(int[] arr) {
var sortedSet = new SortedSet<int>();
var visitedSet = new HashSet<int>();
foreach (var n in arr) {
if (visitedSet.Contains(n)) {
sortedSet.Add(n);
}
visitedSet.Add(n);
}
return sortedSet.ToArray();
}
}
Dart
class Solution {
List<int> findDuplicateNumbers(List<int> arr) {
Set<int> duplicates = {};
Set<int> visitedSet = {};
for (int n in arr) {
if (visitedSet.contains(n)) {
duplicates.add(n);
}
visitedSet.add(n);
}
return duplicates.toList()..sort();
}
}
PHP
class Solution {
public function findDuplicateNumbers(array $arr): array {
$visitedSet = [];
$duplicateSet = [];
foreach ($arr as $num) {
if (isset($visitedSet[$num])) {
$duplicateSet[$num] = true;
}
$visitedSet[$num] = true;
}
$result = array_keys($duplicateSet);
sort($result);
return $result;
}
}
Ruby
require 'set'
class Solution
def find_duplicate_numbers(arr)
duplicates = Set.new
visited_set = Set.new
arr.each do |n|
if visited_set.include?(n)
duplicates.add(n)
end
visited_set.add(n)
end
duplicates.to_a.sort
end
end