Simple sorted string compression
You're sitting in your Microsoft interview and the interviewer slides a string problem across the table. "Given a sorted string, compress it by collapsing repeated characters into the character followed by its count." At first glance it feels straightforward, but the details matter: when do you append the count? How do you detect group boundaries? And can you do it in a single pass? This problem, also known as "String Compression" on other platforms, tests your ability to handle string manipulation cleanly and efficiently.
TL;DR
Scan the sorted string left to right with a counter that tracks consecutive identical characters. When the current character differs from the next (or you reach the end), append the character to a StringBuilder. If the count exceeds 1, also append the count. Reset the counter and continue. This runs in O(n) time with a single pass and uses O(n) space for the output.
Why This Problem Matters
String compression shows up regularly in interviews because it tests several core skills at once: loop boundary handling, conditional logic, and efficient string building. It is a building block for more advanced topics like run-length encoding, data serialization, and text processing pipelines. Getting comfortable with single-pass string algorithms pays off across a wide range of interview questions.
Understanding the Problem
Let's pin down exactly what we need to do.
Given: A sorted string (all identical characters are adjacent) Task: Replace runs of 2+ identical characters with the character followed by the count Return: The compressed string Key detail: Single occurrences are left as-is (no "a1", just "a")
Here are the examples from the problem:
compress("AAAAaaabbbbbcccc") --> "A4a3b5c4"
compress("aabbbbccc") --> "a2b4c3"
compress("abc") --> "abc"
Notice in the third example, no character repeats, so the output matches the input. The compression only helps when characters appear 3 or more times, since "aa" and "a2" consume the same space.
Visualizing Character Groups
The sorted input guarantees that identical characters cluster together. For the input "aabbbcddddeeeeef", the groups look like this:
Loading visualization...
Each group becomes either the character alone (if it appears once) or the character followed by its count. The output for this input is "a2b3cd4e5f".
Edge Cases to Consider
- Empty string: Return an empty string
- Single character: Return it unchanged
- No repeats ("abc"): Return the input as-is
- All same characters ("aaaa"): Return character + count ("a4")
- Case sensitivity: "A" and "a" are different characters
Solution Approach
Since the string is already sorted, we can solve this in a single left-to-right scan. The core idea is to count how many times each character repeats consecutively, then decide what to write to the output.
The Counting Strategy
Walk through the string one character at a time. Keep a counter (we'll call it accumulator) that tracks the current run length. At each position, ask two questions:
- Have I reached the end of the string?
- Is the next character different from the current one?
If either answer is yes, we've found a group boundary. Write the character to the output, and if the accumulator is greater than 1, also write the count. Then reset the accumulator for the next group.
The Decision at Each Boundary
Loading visualization...
When count == 1, we skip the number entirely. This is the detail that trips up many candidates. Appending "a1" would actually expand the string rather than compress it.
Building the Algorithm Step by Step
Let's trace through "aabbbcddddeeeeef" to see how the accumulator and output evolve:
Loading visualization...
At each group boundary (where the current character differs from the next), we flush the accumulated count to the output and start fresh. The full trace produces "a2b3cd4e5f".
Implementation
Prefer a different language? Jump to solutions in other languages.
Here's the Java solution with the single-pass approach:
public class Solution {
public String compress(String input) {
StringBuilder sb = new StringBuilder();
int accumulator = 0;
for (int i = 0; i < input.length(); i++) {
accumulator++;
if (i + 1 == input.length() || input.charAt(i) != input.charAt(i + 1)) {
sb.append(input.charAt(i));
if (accumulator > 1) {
sb.append(accumulator);
}
accumulator = 0;
}
}
return sb.toString();
}
}
Let's walk through the key decisions in this code:
-
StringBuilder over string concatenation: Java strings are immutable. Each
+=would create a new String object, turning our O(n) algorithm into O(n^2). StringBuilder's internal buffer avoids this. -
Boundary detection with
i + 1 == input.length(): This handles the last character in the string. Without this check, comparinginput.charAt(i)withinput.charAt(i + 1)would throw an IndexOutOfBoundsException. -
Conditional count appending: The
if (accumulator > 1)guard is what makes this "simple" compression work correctly. A single occurrence just gets the character, not "a1". -
Reset after flush: Setting
accumulator = 0(rather than 1) works because the loop immediately increments it at the top of the next iteration.
Dry Run with "abc"
| i | char | accumulator | boundary? | output |
|---|---|---|---|---|
| 0 | a | 1 | yes (a != b) | "a" |
| 1 | b | 1 | yes (b != c) | "ab" |
| 2 | c | 1 | yes (end) | "abc" |
No counts appended, so "abc" passes through unchanged.
Dry Run with "aabbbbccc"
| i | char | accumulator | boundary? | output |
|---|---|---|---|---|
| 0 | a | 1 | no | "" |
| 1 | a | 2 | yes | "a2" |
| 2 | b | 1 | no | "a2" |
| 3 | b | 2 | no | "a2" |
| 4 | b | 3 | no | "a2" |
| 5 | b | 4 | yes | "a2b4" |
| 6 | c | 1 | no | "a2b4" |
| 7 | c | 2 | no | "a2b4" |
| 8 | c | 3 | yes (end) | "a2b4c3" |
Complexity Analysis
Time Complexity: O(n)
We visit each character exactly once. The operations inside the loop (incrementing, comparing, appending) are all O(1). StringBuilder's append method runs in amortized O(1) time because it doubles its internal buffer when needed, similar to how ArrayList grows.
Space Complexity: O(n)
The StringBuilder stores the output string, which in the worst case (no repeats) is the same length as the input. In the best case (all identical characters), the output is O(log n) characters (the character plus the digits of the count), but we report worst-case space as O(n).
Why Not Constant Space?
The problem description mentions "shoot for constant space complexity." This is achievable if you're allowed to modify the input string in place (or use a character array), but in Java, strings are immutable. The StringBuilder is the minimum overhead needed to produce the output.
Common Pitfalls
-
Off-by-one on the boundary check: Forgetting
i + 1 == input.length()causes an ArrayIndexOutOfBoundsException on the last character. Always handle the end-of-string case. -
Appending count for single characters: Writing "a1" instead of "a" when the character appears only once. The
accumulator > 1guard prevents this. -
Using string concatenation in a loop: This silently degrades performance to O(n^2). In an interview, mentioning why you chose StringBuilder demonstrates awareness of performance characteristics.
-
Forgetting to reset the counter: If you skip
accumulator = 0, the counts will accumulate across groups, producing wildly wrong results. -
Ignoring case sensitivity: 'A' and 'a' have different ASCII values and must be treated as separate characters. The sorted input places uppercase before lowercase, so "AAAaaa" has two distinct groups.
Interview Tips
When you encounter this problem in a Microsoft or similar interview:
-
Clarify the constraints upfront: "Is the string already sorted? Should I handle empty input? Is case significant?" These questions show thoroughness.
-
Start with examples: Walk through "aabbbcddddeeeeef" on the whiteboard before writing any code. Identify the groups visually.
-
Mention the StringBuilder choice: Briefly explain why you're not using
+=with strings. This is a small detail that separates experienced Java developers from beginners. -
Test your boundary condition: The
i + 1 == input.length()check is the most common source of bugs. Trace through the last character explicitly. -
Discuss variations: If time allows, mention how you'd handle unsorted strings (sort first or use a frequency map) or how standard run-length encoding differs (always appends the count).
Related Variations
This problem opens the door to several follow-up questions an interviewer might ask:
- What if the string isn't sorted? You'd need a HashMap to track character frequencies, then iterate through the map. The output order might differ.
- What if you need to decompress too? Write the inverse function that reads character-count pairs and expands them back.
- What about run-length encoding for binary data? Same concept, but applied to byte streams where runs can be very long.
Solutions in Other Languages
Python
class Solution:
def compress(self, input_str: str) -> str:
builder = ""
accumulator = 0
for i in range(len(input_str)):
accumulator += 1
if i + 1 == len(input_str) or input_str[i] != input_str[i + 1]:
builder += input_str[i]
if accumulator > 1:
builder += str(accumulator)
accumulator = 0
return builder
Python strings are also immutable, so repeated concatenation is technically O(n^2). For interview purposes this is acceptable, but in production code you might use a list and ''.join() at the end.
JavaScript
class Solution {
compress(input) {
let builder = "";
let accumulator = 0;
for (let i = 0; i < input.length; i++) {
accumulator += 1;
if (i + 1 === input.length || input[i] !== input[i + 1]) {
builder += input[i];
if (accumulator > 1) {
builder += accumulator;
}
accumulator = 0;
}
}
return builder;
}
}
TypeScript
class Solution {
compress(input: string): string {
let builder = "";
let accumulator = 0;
for (let i = 0; i < input.length; i++) {
accumulator += 1;
if (i + 1 === input.length || input[i] !== input[i + 1]) {
builder += input[i];
if (accumulator > 1) {
builder += accumulator;
}
accumulator = 0;
}
}
return builder;
}
}
C++
#include <string>
class Solution {
public:
std::string compress(std::string input) {
std::string builder;
int accumulator = 0;
for (int i = 0; i < input.length(); i++) {
accumulator++;
if (i + 1 == input.length() || input.at(i) != input.at(i + 1)) {
builder.push_back(input.at(i));
if (accumulator > 1) {
builder.append(std::to_string(accumulator));
}
accumulator = 0;
}
}
return builder;
}
};
Go
import (
"strconv"
"strings"
)
func Compress(input string) string {
var builder strings.Builder
accumulator := 0
for i := range input {
accumulator++
if i+1 == len(input) || input[i] != input[i+1] {
builder.WriteByte(input[i])
if accumulator > 1 {
builder.WriteString(strconv.Itoa(accumulator))
}
accumulator = 0
}
}
return builder.String()
}
Go's strings.Builder is the idiomatic choice for efficient string construction, similar to Java's StringBuilder.
Scala
import scala.collection.mutable
class Solution {
def compress(input: String): String = {
val sb = new mutable.StringBuilder()
var accumulator = 0
for (i <- 0 until input.length) {
accumulator += 1
if (i + 1 == input.length || input(i) != input(i + 1)) {
sb.append(input(i))
if (accumulator > 1) {
sb.append(accumulator)
}
accumulator = 0
}
}
sb.toString
}
}
Kotlin
class Solution {
fun compress(input: String): String {
val sb = StringBuilder()
var accumulator = 0
for (i in 0 until input.length) {
accumulator++
if (i + 1 == input.length || input[i] != input[i + 1]) {
sb.append(input[i])
if (accumulator > 1) {
sb.append(accumulator)
}
accumulator = 0
}
}
return sb.toString()
}
}
Ruby
class Solution
def compress(input)
result = ""
accumulator = 0
(0...input.length).each do |i|
accumulator += 1
if i + 1 == input.length || input[i] != input[i + 1]
result += input[i]
result += accumulator.to_s if accumulator > 1
accumulator = 0
end
end
result
end
end
Rust
pub fn compress(input: String) -> String {
let mut result = String::new();
let mut accumulator = 0;
let chars: Vec<char> = input.chars().collect();
for i in 0..chars.len() {
accumulator += 1;
if i + 1 == chars.len() || chars[i] != chars[i + 1] {
result.push(chars[i]);
if accumulator > 1 {
result.push_str(&accumulator.to_string());
}
accumulator = 0;
}
}
result
}
Rust requires converting the string to a Vec<char> for indexed access since Rust strings are UTF-8 encoded and don't support O(1) indexing natively.
Swift
func compress(_ input: String) -> String {
let chars = Array(input)
var result = ""
var accumulator = 0
for i in 0..<chars.count {
accumulator += 1
if i + 1 == chars.count || chars[i] != chars[i + 1] {
result.append(chars[i])
if accumulator > 1 {
result += "\(accumulator)"
}
accumulator = 0
}
}
return result
}
Dart
class Solution {
String compress(String input) {
StringBuffer sb = StringBuffer();
int accumulator = 0;
for (int i = 0; i < input.length; i++) {
accumulator++;
if (i + 1 == input.length || input[i] != input[i + 1]) {
sb.write(input[i]);
if (accumulator > 1) {
sb.write(accumulator);
}
accumulator = 0;
}
}
return sb.toString();
}
}
C#
using System.Text;
public class Solution {
public string compress(string input) {
var sb = new StringBuilder();
int accumulator = 0;
for (int i = 0; i < input.Length; i++) {
accumulator++;
if (i + 1 == input.Length || input[i] != input[i + 1]) {
sb.Append(input[i]);
if (accumulator > 1) {
sb.Append(accumulator);
}
accumulator = 0;
}
}
return sb.ToString();
}
}
PHP
class Solution {
public function compress(string $input): string {
$result = "";
$accumulator = 0;
for ($i = 0; $i < strlen($input); $i++) {
$accumulator++;
if ($i + 1 === strlen($input) || $input[$i] !== $input[$i + 1]) {
$result .= $input[$i];
if ($accumulator > 1) {
$result .= $accumulator;
}
$accumulator = 0;
}
}
return $result;
}
}
Practice Makes Perfect
Once you're comfortable with sorted string compression, try these related problems to strengthen your string manipulation skills:
- Find the first non-duplicate character in a string
- Check if two strings are anagrams
- Reverse an integer (similar boundary-handling patterns)
This problem and thousands of others are available on Firecode, where over 50,000 users have prepared for technical interviews and landed roles at top tech companies. Whether you're targeting Microsoft or your next big opportunity, building a habit of daily practice on fundamentals like this will make the difference.