Heap & Priority Queue: Top K Frequent
The Top K Frequent pattern is one of the most common heap applications in coding interviews. It involves finding the K most frequently occurring elements in a dataset, which naturally requires both counting frequencies and maintaining the top K elements efficiently.
This pattern demonstrates the power of heaps for "top K" problems and showcases different optimization strategies based on constraints.
Common Interview Questions
Top K Frequent Elements (LC 347)
- Problem: Given an integer array
numsand an integerk, return thekmost frequent elements. You may return the answer in any order. - Heap Signal: This is a classic top K problem that requires tracking frequencies and maintaining the K most frequent elements.
- Problem: Given an integer array
Top K Frequent Words (LC 692)
- Problem: Given an array of strings
wordsand an integerk, return thekmost frequent strings. Return the answer sorted by frequency from highest to lowest. If two words have the same frequency, sort them lexicographically. - Heap Signal: Similar to top K frequent elements but with additional lexicographical ordering requirements.
- Problem: Given an array of strings
Sort Characters By Frequency (LC 451)
- Problem: Given a string
s, sort it in decreasing order based on the frequency of the characters. The frequency of a character is the number of times it appears in the string. - Heap Signal: Another frequency-based sorting problem that benefits from heap-based approaches.
- Problem: Given a string
Core Logic & Approach
The pattern typically involves two phases:
- Count frequencies of all elements
- Find top K using a heap
Why use a heap?
- Direct sorting would be O(n log n)
- Heap approach can be O(n log k) where k << n
- Memory efficient for streaming data
Key insight: For top K problems, maintain a heap of size K rather than sorting all elements.
Approach 1: Min-Heap of Size K
Use a min-heap of size K. The heap stores the K most frequent elements, with the least frequent at the root.
Core Template:
def top_k_frequent_min_heap(nums, k):
from collections import Counter
import heapq
# Step 1: Count frequencies
count = Counter(nums)
# Step 2: Use min-heap of size k
heap = []
for num, freq in count.items():
# HEAP ORDERING EXPLANATION:
# Python's heapq is always a MIN-HEAP (smallest element at root)
# Default order: uses first element of tuple, then second, etc.
# (freq, num) means: order by frequency first, then by number value
heapq.heappush(heap, (freq, num))
# Keep only k elements - remove the SMALLEST frequency
# This ensures we retain the k LARGEST frequencies
if len(heap) > k:
heapq.heappop(heap) # Remove least frequent (min-heap root)
# Step 3: Extract results (heap contains k most frequent)
return [num for freq, num in heap]
# Time: O(n + m log k) where m = unique elements, n = total elements
# Space: O(m + k)🔄 Min vs Max Heap Variations:
# FOR TOP K MOST FREQUENT (what we usually want):
# Use min-heap of size k - keep removing smallest to retain largest
heapq.heappush(heap, (freq, num)) # Standard approach above
# FOR TOP K LEAST FREQUENT (opposite problem):
# Use max-heap of size k - keep removing largest to retain smallest
heapq.heappush(heap, (-freq, num)) # Negate frequency for max-heap behavior
# Alternative for least frequent: min-heap with all elements, take first k
heap = [(freq, num) for num, freq in count.items()]
heapq.heapify(heap)
result = [heapq.heappop(heap)[1] for _ in range(k)] # k least frequent🎯 Heap Ordering Control:
# IMPORTANT: Python heapq is ALWAYS a min-heap - you CANNOT make it a true max-heap!
# But you can simulate max-heap behavior using negation tricks
# DEFAULT: Python heapq uses lexicographic tuple comparison
# (a, b) < (c, d) if a < c, or (a == c and b < d)
# FREQUENCY FIRST, THEN VALUE:
heap.append((frequency, value)) # Order by freq, break ties by value
# VALUE FIRST, THEN FREQUENCY:
heap.append((value, frequency)) # Order by value, break ties by freq
# SIMULATE MAX-HEAP BEHAVIOR:
heap.append((-frequency, value)) # Max frequency first (most frequent at root)
heap.append((frequency, -value)) # Min frequency, but max value for ties
# WORKING WITH CHARACTERS/STRINGS:
# Characters use ASCII values for comparison: 'a' < 'b' < 'c' < ... < 'z'
heap.append((frequency, char)) # 'a' comes before 'z' (lexicographical)
heap.append((frequency, -ord(char))) # 'z' comes before 'a' (reverse lexicographical)
# STRING EXAMPLES:
heap.append((freq, "apple")) # "apple" < "banana" < "cherry"
heap.append((freq, string[::-1])) # Reverse string for different ordering📝 Character Ordering Examples:
import heapq
# CHARACTERS - Default lexicographical (a-z)
char_heap = []
for char, freq in [('z', 1), ('a', 3), ('m', 2)]:
heapq.heappush(char_heap, (freq, char))
print(char_heap) # [(1, 'z'), (3, 'a'), (2, 'm')]
# Ordered by frequency first: 1 < 2 < 3
# When frequencies are equal, chars ordered: 'a' < 'm' < 'z'
equal_freq_heap = []
for char in ['z', 'a', 'm']:
heapq.heappush(equal_freq_heap, (2, char))
print(equal_freq_heap) # [(2, 'a'), (2, 'z'), (2, 'm')]
# REVERSE CHARACTER ORDER - Use negative ASCII values
reverse_char_heap = []
for char, freq in [('z', 1), ('a', 3), ('m', 2)]:
heapq.heappush(reverse_char_heap, (freq, -ord(char)))
# To get back original character: chr(-ascii_val)
result = []
while reverse_char_heap:
freq, neg_ascii = heapq.heappop(reverse_char_heap)
original_char = chr(-neg_ascii)
result.append(original_char)
print(result) # Characters in reverse lexicographical order for same frequency
# STRINGS - Lexicographical comparison
string_heap = []
for word, freq in [("zebra", 1), ("apple", 1), ("banana", 1)]:
heapq.heappush(string_heap, (freq, word))
print(string_heap) # [(1, 'apple'), (1, 'zebra'), (1, 'banana')]
# "apple" < "banana" < "zebra"Example Walkthrough:
nums = [1,1,1,2,2,3], k = 2
# Step 1: count = [1: 3, 2: 2, 3: 1]
# Step 2: Build heap
# Process (1, 3): heap = [(3, 1)]
# Process (2, 2): heap = [(2, 2), (3, 1)]
# Process (3, 1): heap = [(1, 3), (3, 1), (2, 2)]
# len(heap) > k, so pop (1, 3)
# heap = [(2, 2), (3, 1)]
# Step 3: Result = [2, 1] (any order acceptable)Approach 2: Max-Heap with All Elements
Build a max-heap with all frequencies and extract the top K.
Core Template:
def top_k_frequent_max_heap(nums, k):
from collections import Counter
import heapq
# Step 1: Count frequencies
count = Counter(nums)
# Step 2: Build max-heap (use negative frequencies)
heap = [(-freq, num) for num, freq in count.items()]
heapq.heapify(heap)
# Step 3: Extract top k
result = []
for _ in range(k):
neg_freq, num = heapq.heappop(heap)
result.append(num)
return result
# Time: O(n + m log m) where m = unique elements
# Space: O(m)Approach 3: Using heapq.nlargest
Python provides a convenient built-in for this exact pattern.
Core Template:
def top_k_frequent_builtin(nums, k):
from collections import Counter
import heapq
count = Counter(nums)
# Extract k most frequent using built-in
return heapq.nlargest(k, count.keys(), key=count.get)
# Time: O(n + m log k) - similar to min-heap approach
# Space: O(m)Advanced: Top K Frequent Words (with Lexicographical Order)
When additional sorting criteria are needed:
Core Template:
def top_k_frequent_words(words, k):
from collections import Counter
import heapq
count = Counter(words)
# Use min-heap with custom comparison
# For min-heap: we want to keep the "larger" elements
# So we need to invert the comparison logic
heap = []
for word, freq in count.items():
heapq.heappush(heap, (freq, word))
if len(heap) > k:
heapq.heappop(heap)
# Extract and reverse for correct order
result = []
while heap:
freq, word = heapq.heappop(heap)
result.append(word)
# Result is in ascending order of frequency, we want descending
return result[::-1]
# Alternative: Use custom comparison for better handling
def top_k_frequent_words_custom(words, k):
from collections import Counter
import heapq
count = Counter(words)
# Use tuple comparison: (-freq, word)
# This gives us max-heap behavior for frequency
# and lexicographical order for ties
heap = [(-freq, word) for word, freq in count.items()]
heapq.heapify(heap)
result = []
for _ in range(k):
neg_freq, word = heapq.heappop(heap)
result.append(word)
return resultComparison Logic Explanation:
# For words with different frequencies: higher frequency wins
(-3, "apple") < (-2, "banana") # True, "apple" has higher priority
# For words with same frequency: lexicographically smaller wins
(-2, "apple") < (-2, "banana") # True, "apple" comes first alphabeticallyApproach Comparison
| Approach | Time Complexity | Space Complexity | Best Use Case |
|---|---|---|---|
| Min-heap of size K | O(n + m log k) | O(m + k) | When k << m (few results needed) |
| Max-heap all elements | O(n + m log m) | O(m) | When k is close to m |
| heapq.nlargest | O(n + m log k) | O(m) | Simple cases, clean code |
| Bucket sort | O(n + m) | O(n) | When frequencies are bounded |
Where n = total elements, m = unique elements, k = results needed.
Optimization: Bucket Sort Approach
When the frequency range is limited (≤ n), bucket sort can achieve O(n) time:
Core Template:
def top_k_frequent_bucket(nums, k):
from collections import Counter
count = Counter(nums)
n = len(nums)
# Create buckets indexed by frequency
buckets = [[] for _ in range(n + 1)]
for num, freq in count.items():
buckets[freq].append(num)
# Collect top k from highest frequency buckets
result = []
for freq in range(n, 0, -1):
for num in buckets[freq]:
result.append(num)
if len(result) == k:
return result
return result
# Time: O(n)
# Space: O(n)Example Walkthrough
Let's trace through Top K Frequent Elements with nums = [1,1,1,2,2,3], k = 2:
Min-Heap Approach:
Count frequencies:
- Counter: [1: 3, 2: 2, 3: 1]
Build min-heap of size k=2:
- Process (1, 3): heap = [(3, 1)]
- Process (2, 2): heap = [(2, 2), (3, 1)]
- Process (3, 1):
- Push: heap = [(1, 3), (3, 1), (2, 2)]
- Size > k, so pop minimum: (1, 3)
- Final: heap = [(2, 2), (3, 1)]
Extract results: [2, 1]
Why this works:
- Heap maintains the k largest frequencies
- When heap size exceeds k, we remove the smallest frequency
- Final heap contains exactly the k most frequent elements
Max-Heap Approach:
Count frequencies: [1: 3, 2: 2, 3: 1]
Build max-heap with all elements:
- heap = [(-3, 1), (-2, 2), (-1, 3)]
- After heapify: [(-3, 1), (-2, 2), (-1, 3)]
Extract top k=2:
- Pop (-3, 1): result = [1]
- Pop (-2, 2): result = [1, 2]
Result: [1, 2]
Tricky Parts & Decision Points
- Choosing the right approach based on k:
# If k is small compared to unique elements
if k << len(set(nums)):
use_min_heap_of_size_k()
# If k is large
else:
use_max_heap_all_elements()- Handling ties in frequency:
# Problem might specify tie-breaking rules
# Use appropriate tuple ordering
heap.append((freq, -ord(word[0]), word)) # Custom tie-breaking- Max-heap simulation in Python:
# Remember to negate for max-heap behavior
heapq.heappush(heap, (-frequency, element))
freq, elem = heapq.heappop(heap)
actual_freq = -freq # Don't forget to negate back- Result ordering:
# Min-heap gives ascending order, might need to reverse
result = [num for freq, num in heap]
# vs
result = []
while heap:
result.append(heapq.heappop(heap)[1])
return result[::-1] # Reverse for descending order- Edge cases:
# Handle when k >= number of unique elements
if k >= len(count):
return list(count.keys())
# Handle empty input
if not nums:
return []- Memory considerations for streaming data:
# For very large datasets, min-heap of size k is crucial
# It uses O(k) space instead of O(n) space
def top_k_frequent_streaming(stream, k):
count = {}
heap = []
for num in stream:
count[num] = count.get(num, 0) + 1
# Rebuild heap periodically or use more sophisticated approach
if len(count) % 1000 == 0: # Rebuild every 1000 elements
heap = rebuild_top_k_heap(count, k)