Mutate DNA pairs by inserting G between adjacent duplicates

AS
Ackshaey Singh
Difficulty Medium
Time Complexity
O(n)
Space Complexity
O(n)
Recursion String
Microsoft Amazon

You're in the middle of a Microsoft phone screen, and the interviewer poses what sounds like a biology question: "Given a DNA sequence, insert a G between every pair of adjacent identical nucleotides." Before you start worrying about double helixes, recognize that this is a string recursion problem wearing a lab coat. The DNA framing is just flavor. Underneath, you're walking through a string character by character and deciding whether to insert something between neighbors.

TL;DR

Process the string one character at a time using recursion. Compare the first two characters: if they match, prepend the first character plus "G" to the recursive result; if they differ, just prepend the first character. The base case handles strings of length 0 or 1. This runs in O(n) time and O(n) space.

Why This Problem Matters

This problem is a clean introduction to recursive string building. It tests whether you can identify a repeating subproblem, define a base case, and trust the recursion to handle the rest. Companies like Microsoft and Amazon use problems like this early in interview loops because the logic is approachable, but the recursive thinking transfers directly to harder problems like string permutations, palindrome partitioning, and tree traversals.

Prefer a different language? Jump to solutions in other languages.

Understanding the Problem

Given a string of DNA nucleotides (A, T, C, G), insert a "G" between every pair of adjacent identical characters.

Examples:

mutateDnaPairs("AA")     -> "AGA"
mutateDnaPairs("TTTT")   -> "TGTGTGT"
mutateDnaPairs("TATA")   -> "TATA"
mutateDnaPairs("")        -> ""

The first example has one pair of identical neighbors (A, A), so we get "AGA". The second has three consecutive T's, which means T-T, T-T, T-T, producing "TGTGTGT". The third has no adjacent duplicates, so nothing changes.

Here is what "TTTT" looks like after mutation:

Loading visualization...

Each original T is preserved, and a G is spliced in between every pair of matching neighbors.

Edge Cases

  1. Empty string: Return "" immediately
  2. Single character: Return it as-is, nothing to compare
  3. No adjacent matches: The string passes through unchanged (e.g., "ATCG")
  4. All identical characters: Maximum insertions (e.g., "GGGGGGGG" becomes "GGGGGGGGGGGGGGGG")

Solution Approach

The recursive insight is straightforward: look at the first two characters, make a decision, then hand the rest of the string back to the same function.

Breaking It Down

  1. Base case: If the string has 0 or 1 characters, return it. There is nothing to compare.
  2. Recursive case: Compare sequence[0] and sequence[1].
    • If they match, return sequence[0] + "G" + recurse(sequence[1:]).
    • If they differ, return sequence[0] + recurse(sequence[1:]).

Each call chops one character off the front and passes the remainder forward. The recursion bottoms out when only one character (or none) remains.

Tracing the Recursion

Let's trace mutateDnaPairs("AAGG") to see this in action:

Loading visualization...

The recursion walks down the string one character at a time. At each level, it checks whether the current character matches its neighbor and decides whether to insert a G.

How the Result Builds Up

As the recursion unwinds, each level prepends its contribution to the result from below:

Loading visualization...

Starting from the base case "G", the result grows rightward as each recursive frame adds its piece. The final output is "AGAGGG".

Implementation

public class Solution {
  public String mutateDnaPairs(String sequence) {
    // Base case: nothing to compare
    if (sequence.length() <= 1) {
      return sequence;
    }

    char firstChar = sequence.charAt(0);

    // If the first two characters match, insert G between them
    return firstChar +
             (firstChar == sequence.charAt(1) ? "G" : "") +
             mutateDnaPairs(sequence.substring(1));
  }
}

The ternary expression keeps the logic compact. When firstChar equals the next character, we concatenate "G"; otherwise we concatenate an empty string, effectively a no-op.

Why substring(1) Works

sequence.substring(1) returns everything after the first character. This guarantees each recursive call processes a strictly shorter string, so we always converge on the base case. For the input "AAGG":

  • Call 1: "AAGG" - processes A, recurses on "AGG"
  • Call 2: "AGG" - processes A, recurses on "GG"
  • Call 3: "GG" - processes G, recurses on "G"
  • Call 4: "G" - base case, returns "G"

Complexity Analysis

Time Complexity: O(n)

Each character in the string is visited exactly once across all recursive calls. The comparison and conditional insertion are O(1) per call, giving O(n) total.

A subtle note: in Java, substring() creates a new String object, and string concatenation with + also allocates new objects. In the worst case (all characters identical), the concatenation at each level copies an increasingly long string, making the true cost closer to O(n^2). For interview purposes, O(n) is the standard answer, but mentioning this nuance shows depth.

Space Complexity: O(n)

The call stack holds one frame per character, so the recursion depth is O(n). The output string also occupies O(n) space. Combined, space usage is O(n).

Iterative Alternative

An iterative version using StringBuilder would eliminate the call stack overhead:

public String mutateDnaPairsIterative(String sequence) {
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < sequence.length(); i++) {
        result.append(sequence.charAt(i));
        if (i + 1 < sequence.length() && sequence.charAt(i) == sequence.charAt(i + 1)) {
            result.append('G');
        }
    }
    return result.toString();
}

This achieves true O(n) time with O(n) space for the output only. Mention it as a follow-up to show engineering judgment.

Common Pitfalls

  1. Forgetting the base case: Without if (sequence.length() <= 1), the recursion never stops. An empty or single-character string has no pair to compare.

  2. Off-by-one on substring: Using substring(2) instead of substring(1) would skip characters. The recursive call always advances by exactly one character.

  3. Checking the wrong pair: The comparison must be between positions 0 and 1, not between positions 0 and the last character.

  4. Stack overflow on long strings: For inputs approaching 1000 characters, the recursive approach uses 1000 stack frames. Java's default stack is large enough, but in languages with smaller stacks you may need the iterative approach.

Interview Tips

When you encounter this problem:

  1. Clarify the alphabet: Confirm the input contains only A, T, C, G. Ask about empty strings and single characters.

  2. State the approach clearly: "I'll use recursion. At each call, I compare the first two characters, optionally insert a G, and recurse on the rest of the string."

  3. Draw a short trace: Walk through "AA" or "AAG" on the whiteboard. Three characters is enough to show the pattern.

  4. Mention trade-offs: "The recursive version is clean but uses O(n) stack space. An iterative version with StringBuilder avoids that."

  5. Watch for the follow-up: The interviewer might ask you to generalize the insertion character, handle different rules for different pairs, or process the string iteratively.

Key Takeaways

  • The recursive decomposition is "handle one character, recurse on the rest." This pattern shows up in dozens of string and list problems.
  • The base case is strings of length 0 or 1. Always identify the smallest input that requires no work.
  • String concatenation in Java creates new objects at every level. For production code, prefer StringBuilder; for interviews, the recursive version is fine unless the interviewer asks for optimization.
  • Tracing two or three calls by hand is the fastest way to verify your logic before writing code.
  • The iterative version is worth mentioning as a follow-up to demonstrate awareness of stack depth and performance trade-offs.

Practice and Related Problems

Once you're comfortable with this recursive string builder pattern, try these related challenges:

  • Palindrome Tester - another character-by-character string recursion
  • Recursive String Permutations - builds on the same shrinking-substring idea but with branching
  • Find and Replace CSV Delimiter - iterative string processing with conditional insertion

This problem and hundreds of others are available on Firecode, where over 50,000 engineers have prepared for technical interviews at Microsoft, Amazon, Google, and other top companies. Consistent practice with problems like this builds the recursive thinking muscle that pays off in harder interviews.

Solutions in Other Languages

Python

class Solution:
    def mutate_dna_pairs(self, sequence: str) -> str:
        if len(sequence) <= 1:
            return sequence

        first_char = sequence[0]
        mutator = "G" if first_char == sequence[1] else ""

        return first_char + mutator + self.mutate_dna_pairs(sequence[1:])

Python's string slicing with sequence[1:] is the equivalent of Java's substring(1). The conditional expression keeps the logic on one line.

JavaScript

class Solution {
  mutateDnaPairs(sequence) {
    if (sequence.length <= 1) {
      return sequence;
    }

    const firstChar = sequence.charAt(0);

    return firstChar +
      (firstChar === sequence[1] ? "G" : "") +
      this.mutateDnaPairs(sequence.substring(1));
  }
}

TypeScript

class Solution {
  mutateDnaPairs(sequence: string): string {
    if (sequence.length <= 1) {
      return sequence;
    }

    const firstChar = sequence.charAt(0);

    return firstChar +
      (firstChar === sequence[1] ? "G" : "") +
      this.mutateDnaPairs(sequence.substring(1));
  }
}

C++

class Solution {
public:
  std::string mutateDnaPairs(std::string sequence) {
    if (sequence.length() <= 1) {
      return sequence;
    }

    std::string result;
    result.push_back(sequence.at(0));

    if (sequence.at(0) == sequence.at(1)) {
      result.push_back('G');
    }

    result.append(mutateDnaPairs(sequence.substr(1)));
    return result;
  }
};

The C++ version builds the result with push_back and append rather than string concatenation, which is more idiomatic for performance-sensitive C++ code.

Go

func (s *Solution) MutateDnaPairs(sequence string) string {
    if len(sequence) <= 1 {
        return sequence
    }

    var builder strings.Builder
    builder.WriteByte(sequence[0])

    if sequence[0] == sequence[1] {
        builder.WriteByte('G')
    }

    builder.WriteString(s.MutateDnaPairs(sequence[1:]))
    return builder.String()
}

Go's strings.Builder avoids repeated string allocations. Slice syntax sequence[1:] handles the substring naturally.

Scala

class Solution {
  def mutateDnaPairs(sequence: String): String = {
    if (sequence.length <= 1) return sequence

    val firstChar = sequence(0)
    val mutator = if (firstChar == sequence(1)) "G" else ""

    firstChar + mutator + mutateDnaPairs(sequence.substring(1))
  }
}

Kotlin

class Solution {
    fun mutateDnaPairs(sequence: String): String {
        if (sequence.length <= 1) {
            return sequence
        }

        val firstChar = sequence[0]

        return firstChar +
                (if (firstChar == sequence[1]) "G" else "") +
                mutateDnaPairs(sequence.substring(1))
    }
}

Swift

func mutateDnaPairs(_ sequence: String) -> String {
    if sequence.count <= 1 {
        return sequence
    }

    let firstChar = sequence[sequence.startIndex]

    return String(firstChar) +
            (firstChar == sequence[sequence.index(after: sequence.startIndex)] ? "G" : "") +
            mutateDnaPairs(String(sequence.dropFirst()))
}

Swift's string indexing is more verbose than other languages since it uses String.Index rather than integer subscripts.

Rust

pub fn mutate_dna_pairs(&self, sequence: String) -> String {
    if sequence.len() <= 1 {
        return sequence;
    }

    let chars: Vec<char> = sequence.chars().collect();
    let first_char = chars[0];
    let rest = self.mutate_dna_pairs(sequence[1..].to_string());

    if first_char == chars[1] {
        format!("{}G{}", first_char, rest)
    } else {
        format!("{}{}", first_char, rest)
    }
}

Rust requires explicit conversion between string slices and owned Strings. The chars().collect() pattern handles Unicode-safe character access.

Ruby

class Solution
  def mutate_dna_pairs(sequence)
    return sequence if sequence.length <= 1

    first_char = sequence[0]
    mutator = first_char == sequence[1] ? "G" : ""

    first_char + mutator + mutate_dna_pairs(sequence[1..])
  end
end

Ruby's range syntax sequence[1..] returns everything from index 1 onward.

Dart

String mutateDnaPairs(String sequence) {
    if (sequence.length <= 1) {
        return sequence;
    }

    String firstChar = sequence[0];

    return firstChar +
        (firstChar == sequence[1] ? "G" : "") +
        mutateDnaPairs(sequence.substring(1));
}

C#

public string mutateDnaPairs(string sequence) {
    if (sequence.Length <= 1) {
        return sequence;
    }

    char firstChar = sequence[0];

    return firstChar +
            (firstChar == sequence[1] ? "G" : "") +
            mutateDnaPairs(sequence.Substring(1));
}

PHP

public function mutateDnaPairs(string $sequence): string {
    if (strlen($sequence) <= 1) {
        return $sequence;
    }

    $firstChar = $sequence[0];
    $mutator = ($firstChar === $sequence[1]) ? "G" : "";

    return $firstChar . $mutator . $this->mutateDnaPairs(substr($sequence, 1));
}

PHP uses the dot operator for string concatenation and substr() for slicing.

Frequently Asked Questions

Why is recursion a good fit for the mutate DNA pairs problem?
Recursion works well because the problem has a natural substructure: decide whether to insert a G between the first two characters, then solve the same problem for the remaining substring. Each recursive call shrinks the input by one character, and the base case (empty or single character) is trivial.
What is the time complexity of the recursive DNA mutation solution?
The time complexity is O(n) where n is the length of the input string. Each character is examined exactly once across all recursive calls. However, string concatenation in languages like Java creates new String objects at each level, so the effective cost can be O(n^2) without a StringBuilder. For interview purposes, O(n) is the accepted answer.
What is the space complexity of the mutate DNA pairs solution?
The space complexity is O(n) due to two factors: the recursive call stack uses O(n) frames (one per character), and the result string itself requires O(n) space. In languages where strings are immutable, intermediate string copies also contribute O(n) space per level.
Can this problem be solved iteratively instead of recursively?
Yes. You can iterate through the string with a loop, appending each character to a StringBuilder or list, and inserting a G whenever two consecutive characters match. The iterative approach avoids the O(n) call stack overhead and is often more efficient in practice, but interviewers typically ask for the recursive version to test your understanding of recursion.
How does the base case prevent infinite recursion?
The base case checks if the string has zero or one character. Since each recursive call passes sequence.substring(1), the string shrinks by exactly one character per call. Eventually the string reaches length 1 or 0, the base case returns immediately, and the recursion unwinds.
What happens when no adjacent characters match?
When no adjacent characters are identical, for example ATCG, no G is ever inserted. The recursion still processes every character, but the conditional check always evaluates to false, so the original string is reconstructed unchanged.
Why does the problem only insert G and not other nucleotides?
The G insertion is an artificial constraint designed for the coding exercise. In real molecular biology, mutations can involve any nucleotide (A, T, C, or G) and include substitutions, insertions, and deletions. The fixed G keeps the problem focused on recursion and string building rather than biology.
How would you handle very long DNA sequences efficiently?
For very long sequences, use an iterative approach with a mutable string builder (StringBuilder in Java, list joining in Python, strings.Builder in Go). This avoids stack overflow from deep recursion and eliminates the overhead of creating intermediate string objects at each recursive level.