Heap & Priority Queue: Basics
A heap is a specialized tree-based data structure that satisfies the heap property, and a priority queue is an abstract data type that heaps commonly implement. Heaps are essential for efficiently maintaining order among dynamically changing elements and are particularly useful for "top K" problems and scenarios requiring constant access to extremal values.
Understanding heaps is crucial for coding interviews as they appear in many optimization problems and provide elegant solutions to seemingly complex challenges.
What It Is & When To Use It
A heap is a complete binary tree where each node satisfies the heap property relative to its children. This property ensures efficient access to the minimum or maximum element.
Core Idea: Maintain a partially ordered structure that gives you instant access to the smallest (min-heap) or largest (max-heap) element while allowing efficient insertion and deletion.
Signals for Heap/Priority Queue: Look for these keywords and problem types. If you see one, immediately think "Can I use a heap?"
"Top K" Problems:
- Find the top K largest/smallest elements
- K closest points, K most frequent elements
- K largest/smallest in a stream
Dynamic Extremes:
- Running median (requires two heaps)
- Sliding window maximum/minimum
- Task scheduling with priorities
Merge Operations:
- Merge K sorted lists/arrays
- K-way merge problems
Graph Algorithms:
- Dijkstra's shortest path algorithm
- Prim's minimum spanning tree
- A search* algorithm
Optimization with Ordering:
- Problems requiring maintaining sorted order as elements change
- Greedy algorithms that need the next best choice
The Two Types of Heaps:
- Min-Heap: Parent ≤ children (root is minimum)
- Max-Heap: Parent ≥ children (root is maximum)
Heap Properties & Operations
Core Properties
- Complete Binary Tree: All levels filled except possibly the last, which fills left to right
- Heap Property: Each parent satisfies the ordering constraint with its children
- Array Representation: For node at index
i:- Left child:
2*i + 1 - Right child:
2*i + 2 - Parent:
(i-1) // 2
- Left child:
Essential Operations
| Operation | Time Complexity | Description |
|---|---|---|
| Insert (push) | O(log n) | Add element and restore heap property |
| Extract Min/Max (pop) | O(log n) | Remove root and restore heap property |
| Peek (top) | O(1) | Access root without removing |
| Heapify | O(n) | Convert arbitrary array to heap |
| Build Heap | O(n) | Create heap from array of elements |
Python Heap Implementation
Python's heapq module provides a min-heap implementation:
Core Template (Min-Heap):
import heapq
# Create empty heap
heap = []
# Insert elements
heapq.heappush(heap, 10)
heapq.heappush(heap, 5)
heapq.heappush(heap, 15)
# Peek at minimum (don't remove)
if heap:
min_val = heap[0]
# Extract minimum
if heap:
min_val = heapq.heappop(heap)
# Convert list to heap in-place
nums = [10, 5, 15, 20, 3]
heapq.heapify(nums) # O(n) operation
# Get K largest/smallest
heapq.nlargest(k, nums) # Returns list of K largest
heapq.nsmallest(k, nums) # Returns list of K smallestMax-Heap Simulation:
# Python heapq only supports min-heap
# For max-heap, negate values
max_heap = []
heapq.heappush(max_heap, -10) # Push negative values
heapq.heappush(max_heap, -5)
heapq.heappush(max_heap, -15)
# Extract maximum (remember to negate back)
if max_heap:
max_val = -heapq.heappop(max_heap)
# Alternative: Use custom comparison with tuples
# (priority, value) where lower priority = higher importanceHeap with Custom Objects:
import heapq
from dataclasses import dataclass
@dataclass
class Task:
priority: int
name: str
def __lt__(self, other):
return self.priority < other.priority
# Use directly with heapq
tasks = []
heapq.heappush(tasks, Task(1, "High priority"))
heapq.heappush(tasks, Task(5, "Low priority"))
heapq.heappush(tasks, Task(3, "Medium priority"))
# Or use tuples (priority, data)
heap = []
heapq.heappush(heap, (1, "High priority"))
heapq.heappush(heap, (5, "Low priority"))
heapq.heappush(heap, (3, "Medium priority"))Common Patterns & Templates
1. Top K Elements Pattern
def top_k_elements(nums, k):
# Method 1: Use heapq.nlargest/nsmallest (simple)
return heapq.nlargest(k, nums)
# Method 2: Use min-heap of size k (memory efficient for large arrays)
heap = []
for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap) # Remove smallest
return heap # Contains k largest elements
def top_k_frequent(nums, k):
"""Find k most frequent elements"""
from collections import Counter
count = Counter(nums)
# Use max-heap with frequencies (negate for max behavior)
heap = [(-freq, num) for num, freq in count.items()]
heapq.heapify(heap)
result = []
for _ in range(k):
freq, num = heapq.heappop(heap)
result.append(num)
return result2. Running Median (Two Heaps)
class MedianFinder:
def __init__(self):
self.small = [] # Max-heap (use negative values)
self.large = [] # Min-heap
def addNum(self, num):
# Add to appropriate heap
if not self.small or num <= -self.small[0]:
heapq.heappush(self.small, -num)
else:
heapq.heappush(self.large, num)
# Balance heaps (ensure sizes differ by at most 1)
if len(self.small) > len(self.large) + 1:
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
elif len(self.large) > len(self.small) + 1:
val = heapq.heappop(self.large)
heapq.heappush(self.small, -val)
def findMedian(self):
if len(self.small) > len(self.large):
return -self.small[0]
elif len(self.large) > len(self.small):
return self.large[0]
else:
return (-self.small[0] + self.large[0]) / 23. Merge K Sorted Lists
def merge_k_sorted_lists(lists):
"""Merge k sorted lists using min-heap"""
import heapq
heap = []
result = []
# Initialize heap with first element from each list
for i, lst in enumerate(lists):
if lst:
heapq.heappush(heap, (lst[0], i, 0)) # (value, list_index, element_index)
while heap:
val, list_idx, elem_idx = heapq.heappop(heap)
result.append(val)
# Add next element from same list if exists
if elem_idx + 1 < len(lists[list_idx]):
next_val = lists[list_idx][elem_idx + 1]
heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
return result4. Dijkstra's Algorithm Template
def dijkstra(graph, start):
"""Find shortest distances from start to all nodes"""
import heapq
from collections import defaultdict
# Distance dictionary
dist = defaultdict(lambda: float('inf'))
dist[start] = 0
# Priority queue: (distance, node)
pq = [(0, start)]
visited = set()
while pq:
d, node = heapq.heappop(pq)
if node in visited:
continue
visited.add(node)
# Explore neighbors
for neighbor, weight in graph[node]:
new_dist = d + weight
if new_dist < dist[neighbor]:
dist[neighbor] = new_dist
heapq.heappush(pq, (new_dist, neighbor))
return dict(dist)When to Choose Heap vs Alternatives
| Use Case | Heap | Alternative | Why Heap is Better |
|---|---|---|---|
| Top K elements | O(n log k) | Sorting: O(n log n) | More memory efficient, better for streaming |
| Running median | O(log n) insert | Sorted array: O(n) insert | Much faster insertions |
| Dynamic min/max | O(log n) | Linear scan: O(n) | Significant speedup for frequent queries |
| Merge K lists | O(n log k) | Naive merge: O(nk) | Better scalability with more lists |
| Priority scheduling | O(log n) | Array/list: O(n) | Efficient priority-based extraction |
When NOT to use Heap:
- Need to access elements other than min/max frequently
- All elements fit in memory and static (just sort once)
- Need to maintain strict ordering of all elements (use balanced BST)
- Simple problems where linear solution is sufficient
Common Pitfalls & Tips
- Python heapq is min-heap only: Use negative values for max-heap behavior.
# Wrong: Can't directly get maximum
max_val = max(heap) # O(n) operation!
# Correct: Use negative values for max-heap
heapq.heappush(max_heap, -value)
max_val = -heapq.heappop(max_heap)- Tuple comparison for custom objects: When using tuples, comparison happens element by element.
# Safe: Primary comparison by priority
heapq.heappush(heap, (priority, id, data))
# Risky: If priorities equal, might compare data objects
heapq.heappush(heap, (priority, data)) # May fail if data not comparable- Heap size management: For top-K problems, maintain heap size carefully.
# Correct: Maintain heap of size k
if len(heap) > k:
heapq.heappop(heap)
# Wrong: Let heap grow unbounded
# This defeats the memory efficiency purpose- Two-heap balance: In running median, maintain balance after each operation.
# Always balance after insertion
if len(small) > len(large) + 1:
# Move from small to large
elif len(large) > len(small) + 1:
# Move from large to small- Empty heap checks: Always check if heap is empty before accessing.
# Wrong: May raise IndexError
min_val = heap[0]
# Correct: Check first
if heap:
min_val = heap[0]- Heapify vs repeated pushes: Use heapify for better performance when building from array.
# Efficient: O(n)
heapq.heapify(nums)
# Less efficient: O(n log n)
heap = []
for num in nums:
heapq.heappush(heap, num)