Skip to content
On this page

Heap & Priority Queue: Top K Frequent

The Top K Frequent pattern is one of the most common heap applications in coding interviews. It involves finding the K most frequently occurring elements in a dataset, which naturally requires both counting frequencies and maintaining the top K elements efficiently.

This pattern demonstrates the power of heaps for "top K" problems and showcases different optimization strategies based on constraints.


Common Interview Questions

  1. Top K Frequent Elements (LC 347)

    • Problem: Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.
    • Heap Signal: This is a classic top K problem that requires tracking frequencies and maintaining the K most frequent elements.
  2. Top K Frequent Words (LC 692)

    • Problem: Given an array of strings words and an integer k, return the k most frequent strings. Return the answer sorted by frequency from highest to lowest. If two words have the same frequency, sort them lexicographically.
    • Heap Signal: Similar to top K frequent elements but with additional lexicographical ordering requirements.
  3. Sort Characters By Frequency (LC 451)

    • Problem: Given a string s, sort it in decreasing order based on the frequency of the characters. The frequency of a character is the number of times it appears in the string.
    • Heap Signal: Another frequency-based sorting problem that benefits from heap-based approaches.

Core Logic & Approach

The pattern typically involves two phases:

  1. Count frequencies of all elements
  2. Find top K using a heap

Why use a heap?

  • Direct sorting would be O(n log n)
  • Heap approach can be O(n log k) where k << n
  • Memory efficient for streaming data

Key insight: For top K problems, maintain a heap of size K rather than sorting all elements.


Approach 1: Min-Heap of Size K

Use a min-heap of size K. The heap stores the K most frequent elements, with the least frequent at the root.

Core Template:

python
def top_k_frequent_min_heap(nums, k):
    from collections import Counter
    import heapq
    
    # Step 1: Count frequencies
    count = Counter(nums)
    
    # Step 2: Use min-heap of size k
    heap = []
    for num, freq in count.items():
        # HEAP ORDERING EXPLANATION:
        # Python's heapq is always a MIN-HEAP (smallest element at root)
        # Default order: uses first element of tuple, then second, etc.
        # (freq, num) means: order by frequency first, then by number value
        
        heapq.heappush(heap, (freq, num))
        
        # Keep only k elements - remove the SMALLEST frequency
        # This ensures we retain the k LARGEST frequencies
        if len(heap) > k:
            heapq.heappop(heap)  # Remove least frequent (min-heap root)
    
    # Step 3: Extract results (heap contains k most frequent)
    return [num for freq, num in heap]

# Time: O(n + m log k) where m = unique elements, n = total elements
# Space: O(m + k)

🔄 Min vs Max Heap Variations:

python
# FOR TOP K MOST FREQUENT (what we usually want):
# Use min-heap of size k - keep removing smallest to retain largest
heapq.heappush(heap, (freq, num))  # Standard approach above

# FOR TOP K LEAST FREQUENT (opposite problem):
# Use max-heap of size k - keep removing largest to retain smallest
heapq.heappush(heap, (-freq, num))  # Negate frequency for max-heap behavior

# Alternative for least frequent: min-heap with all elements, take first k
heap = [(freq, num) for num, freq in count.items()]
heapq.heapify(heap)
result = [heapq.heappop(heap)[1] for _ in range(k)]  # k least frequent

🎯 Heap Ordering Control:

python
# IMPORTANT: Python heapq is ALWAYS a min-heap - you CANNOT make it a true max-heap!
# But you can simulate max-heap behavior using negation tricks

# DEFAULT: Python heapq uses lexicographic tuple comparison
# (a, b) < (c, d) if a < c, or (a == c and b < d)

# FREQUENCY FIRST, THEN VALUE:
heap.append((frequency, value))  # Order by freq, break ties by value

# VALUE FIRST, THEN FREQUENCY:
heap.append((value, frequency))  # Order by value, break ties by freq

# SIMULATE MAX-HEAP BEHAVIOR:
heap.append((-frequency, value))     # Max frequency first (most frequent at root)
heap.append((frequency, -value))     # Min frequency, but max value for ties

# WORKING WITH CHARACTERS/STRINGS:
# Characters use ASCII values for comparison: 'a' < 'b' < 'c' < ... < 'z'
heap.append((frequency, char))       # 'a' comes before 'z' (lexicographical)
heap.append((frequency, -ord(char))) # 'z' comes before 'a' (reverse lexicographical)

# STRING EXAMPLES:
heap.append((freq, "apple"))         # "apple" < "banana" < "cherry"
heap.append((freq, string[::-1]))    # Reverse string for different ordering

📝 Character Ordering Examples:

python
import heapq

# CHARACTERS - Default lexicographical (a-z)
char_heap = []
for char, freq in [('z', 1), ('a', 3), ('m', 2)]:
    heapq.heappush(char_heap, (freq, char))

print(char_heap)  # [(1, 'z'), (3, 'a'), (2, 'm')]
# Ordered by frequency first: 1 < 2 < 3

# When frequencies are equal, chars ordered: 'a' < 'm' < 'z'
equal_freq_heap = []
for char in ['z', 'a', 'm']:
    heapq.heappush(equal_freq_heap, (2, char))
print(equal_freq_heap)  # [(2, 'a'), (2, 'z'), (2, 'm')]

# REVERSE CHARACTER ORDER - Use negative ASCII values
reverse_char_heap = []
for char, freq in [('z', 1), ('a', 3), ('m', 2)]:
    heapq.heappush(reverse_char_heap, (freq, -ord(char)))
    
# To get back original character: chr(-ascii_val)
result = []
while reverse_char_heap:
    freq, neg_ascii = heapq.heappop(reverse_char_heap)
    original_char = chr(-neg_ascii)
    result.append(original_char)
print(result)  # Characters in reverse lexicographical order for same frequency

# STRINGS - Lexicographical comparison
string_heap = []
for word, freq in [("zebra", 1), ("apple", 1), ("banana", 1)]:
    heapq.heappush(string_heap, (freq, word))
print(string_heap)  # [(1, 'apple'), (1, 'zebra'), (1, 'banana')]
# "apple" < "banana" < "zebra"

Example Walkthrough:

python
nums = [1,1,1,2,2,3], k = 2

# Step 1: count = [1: 3, 2: 2, 3: 1]

# Step 2: Build heap
# Process (1, 3): heap = [(3, 1)]
# Process (2, 2): heap = [(2, 2), (3, 1)]
# Process (3, 1): heap = [(1, 3), (3, 1), (2, 2)]
#                 len(heap) > k, so pop (1, 3)
#                 heap = [(2, 2), (3, 1)]

# Step 3: Result = [2, 1] (any order acceptable)

Approach 2: Max-Heap with All Elements

Build a max-heap with all frequencies and extract the top K.

Core Template:

python
def top_k_frequent_max_heap(nums, k):
    from collections import Counter
    import heapq
    
    # Step 1: Count frequencies
    count = Counter(nums)
    
    # Step 2: Build max-heap (use negative frequencies)
    heap = [(-freq, num) for num, freq in count.items()]
    heapq.heapify(heap)
    
    # Step 3: Extract top k
    result = []
    for _ in range(k):
        neg_freq, num = heapq.heappop(heap)
        result.append(num)
    
    return result

# Time: O(n + m log m) where m = unique elements
# Space: O(m)

Approach 3: Using heapq.nlargest

Python provides a convenient built-in for this exact pattern.

Core Template:

python
def top_k_frequent_builtin(nums, k):
    from collections import Counter
    import heapq
    
    count = Counter(nums)
    
    # Extract k most frequent using built-in
    return heapq.nlargest(k, count.keys(), key=count.get)

# Time: O(n + m log k) - similar to min-heap approach
# Space: O(m)

Advanced: Top K Frequent Words (with Lexicographical Order)

When additional sorting criteria are needed:

Core Template:

python
def top_k_frequent_words(words, k):
    from collections import Counter
    import heapq
    
    count = Counter(words)
    
    # Use min-heap with custom comparison
    # For min-heap: we want to keep the "larger" elements
    # So we need to invert the comparison logic
    heap = []
    
    for word, freq in count.items():
        heapq.heappush(heap, (freq, word))
        if len(heap) > k:
            heapq.heappop(heap)
    
    # Extract and reverse for correct order
    result = []
    while heap:
        freq, word = heapq.heappop(heap)
        result.append(word)
    
    # Result is in ascending order of frequency, we want descending
    return result[::-1]

# Alternative: Use custom comparison for better handling
def top_k_frequent_words_custom(words, k):
    from collections import Counter
    import heapq
    
    count = Counter(words)
    
    # Use tuple comparison: (-freq, word)
    # This gives us max-heap behavior for frequency
    # and lexicographical order for ties
    heap = [(-freq, word) for word, freq in count.items()]
    heapq.heapify(heap)
    
    result = []
    for _ in range(k):
        neg_freq, word = heapq.heappop(heap)
        result.append(word)
    
    return result

Comparison Logic Explanation:

python
# For words with different frequencies: higher frequency wins
(-3, "apple") < (-2, "banana")  # True, "apple" has higher priority

# For words with same frequency: lexicographically smaller wins  
(-2, "apple") < (-2, "banana")  # True, "apple" comes first alphabetically

Approach Comparison

ApproachTime ComplexitySpace ComplexityBest Use Case
Min-heap of size KO(n + m log k)O(m + k)When k << m (few results needed)
Max-heap all elementsO(n + m log m)O(m)When k is close to m
heapq.nlargestO(n + m log k)O(m)Simple cases, clean code
Bucket sortO(n + m)O(n)When frequencies are bounded

Where n = total elements, m = unique elements, k = results needed.


Optimization: Bucket Sort Approach

When the frequency range is limited (≤ n), bucket sort can achieve O(n) time:

Core Template:

python
def top_k_frequent_bucket(nums, k):
    from collections import Counter
    
    count = Counter(nums)
    n = len(nums)
    
    # Create buckets indexed by frequency
    buckets = [[] for _ in range(n + 1)]
    for num, freq in count.items():
        buckets[freq].append(num)
    
    # Collect top k from highest frequency buckets
    result = []
    for freq in range(n, 0, -1):
        for num in buckets[freq]:
            result.append(num)
            if len(result) == k:
                return result
    
    return result

# Time: O(n)
# Space: O(n)

Example Walkthrough

Let's trace through Top K Frequent Elements with nums = [1,1,1,2,2,3], k = 2:

Min-Heap Approach:

  1. Count frequencies:

    • Counter: [1: 3, 2: 2, 3: 1]
  2. Build min-heap of size k=2:

    • Process (1, 3): heap = [(3, 1)]
    • Process (2, 2): heap = [(2, 2), (3, 1)]
    • Process (3, 1):
      • Push: heap = [(1, 3), (3, 1), (2, 2)]
      • Size > k, so pop minimum: (1, 3)
      • Final: heap = [(2, 2), (3, 1)]
  3. Extract results: [2, 1]

Why this works:

  • Heap maintains the k largest frequencies
  • When heap size exceeds k, we remove the smallest frequency
  • Final heap contains exactly the k most frequent elements

Max-Heap Approach:

  1. Count frequencies: [1: 3, 2: 2, 3: 1]

  2. Build max-heap with all elements:

    • heap = [(-3, 1), (-2, 2), (-1, 3)]
    • After heapify: [(-3, 1), (-2, 2), (-1, 3)]
  3. Extract top k=2:

    • Pop (-3, 1): result = [1]
    • Pop (-2, 2): result = [1, 2]

Result: [1, 2]


Tricky Parts & Decision Points

  1. Choosing the right approach based on k:
python
# If k is small compared to unique elements
if k << len(set(nums)):
    use_min_heap_of_size_k()
# If k is large
else:
    use_max_heap_all_elements()
  1. Handling ties in frequency:
python
# Problem might specify tie-breaking rules
# Use appropriate tuple ordering
heap.append((freq, -ord(word[0]), word))  # Custom tie-breaking
  1. Max-heap simulation in Python:
python
# Remember to negate for max-heap behavior
heapq.heappush(heap, (-frequency, element))
freq, elem = heapq.heappop(heap)
actual_freq = -freq  # Don't forget to negate back
  1. Result ordering:
python
# Min-heap gives ascending order, might need to reverse
result = [num for freq, num in heap]
# vs
result = []
while heap:
    result.append(heapq.heappop(heap)[1])
return result[::-1]  # Reverse for descending order
  1. Edge cases:
python
# Handle when k >= number of unique elements
if k >= len(count):
    return list(count.keys())

# Handle empty input
if not nums:
    return []
  1. Memory considerations for streaming data:
python
# For very large datasets, min-heap of size k is crucial
# It uses O(k) space instead of O(n) space
def top_k_frequent_streaming(stream, k):
    count = {}
    heap = []
    
    for num in stream:
        count[num] = count.get(num, 0) + 1
        
        # Rebuild heap periodically or use more sophisticated approach
        if len(count) % 1000 == 0:  # Rebuild every 1000 elements
            heap = rebuild_top_k_heap(count, k)