Heap & Priority Queue: Summary
This summary provides a high-level overview of heap and priority queue patterns covered in this section. Heaps are fundamental data structures for efficiently maintaining ordered data and are essential for many optimization algorithms.
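As a quick orientation, the core operations are push, O(1) peek at the minimum, and pop-min. A minimal sketch using Python's heapq (the values are illustrative):

```python
import heapq

# Build a heap incrementally; the smallest element is always at index 0
heap = []
for value in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, value)

smallest = heap[0]  # O(1) peek at the minimum
drained = [heapq.heappop(heap) for _ in range(len(heap))]  # pops come out in sorted order
```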
Pattern Recognition Signals
When to use Heap/Priority Queue: Look for these keywords and problem types:
"Top K" problems:
- Find K largest/smallest elements
- K most/least frequent items
- K closest points/elements
Dynamic extremes:
- Running median or streaming statistics
- Sliding window maximum/minimum (heap with lazy deletion; a monotonic deque is the O(n) alternative)
- Kth largest/smallest element in a stream
Merge operations:
- Merge K sorted lists/arrays
- K-way merge problems
Scheduling and optimization:
- Task scheduling with priorities
- Meeting room allocation
- Shortest path algorithms (Dijkstra)
Real-time processing:
- Data streams requiring ordered processing
- Event simulation with time-based priorities
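The last signal above, event simulation, reduces to always popping the event with the earliest timestamp. A minimal sketch (event names are made up for illustration):

```python
import heapq

# Events as (time, name) tuples; tuples compare by time first
events = [(5.0, 'timeout'), (1.0, 'connect'), (3.0, 'retry')]
heapq.heapify(events)  # O(n) build

order = []
while events:
    t, name = heapq.heappop(events)  # earliest event first
    order.append(name)
```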
Decision Matrix
| Problem Type | Best Approach | Time Complexity | When to Use |
|---|---|---|---|
| Find top K elements | Min-heap of size K | O(n log k) | k << n, memory efficient |
| Running median | Two heaps (max + min) | O(log n) insert, O(1) median | Dynamic data, frequent queries |
| Merge K lists | Min-heap frontier | O(n log k) | K sorted sequences |
| Task scheduling | Priority queue | O(n log n) | Priority-based ordering |
| Shortest path | Dijkstra + heap | O((V+E) log V) | Weighted graphs, non-negative |
| K closest points | Max-heap of size K | O(n log k) | Geometric problems |
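To make the first row of the matrix concrete: a bounded min-heap of size k and the stdlib's heapq.nlargest produce the same k largest elements (the data here is illustrative):

```python
import heapq

data = [7, 2, 9, 4, 11, 3, 8]
k = 3

heap = []
for x in data:
    heapq.heappush(heap, x)
    if len(heap) > k:
        heapq.heappop(heap)  # evict the smallest, keeping the k largest so far
```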
Core Algorithms by Pattern
1. Top K Elements Pattern
# Min-heap of size k (most memory efficient: O(k) extra space)
import heapq

def k_largest(data, k):
    heap = []
    for element in data:
        heapq.heappush(heap, element)
        if len(heap) > k:
            heapq.heappop(heap)  # evict the smallest of the k+1
    return heap  # Contains the k largest elements (not sorted)
2. Running Median (Two Heaps)
import heapq

class MedianFinder:
    def __init__(self):
        self.small = []  # Max-heap (negated values), lower half
        self.large = []  # Min-heap, upper half

    def addNum(self, num):
        if not self.small or num <= -self.small[0]:
            heapq.heappush(self.small, -num)
        else:
            heapq.heappush(self.large, num)
        self._balance()

    def _balance(self):
        # Keep the heap sizes within one of each other
        if len(self.small) > len(self.large) + 1:
            heapq.heappush(self.large, -heapq.heappop(self.small))
        elif len(self.large) > len(self.small) + 1:
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]
        elif len(self.large) > len(self.small):
            return self.large[0]
        else:
            return (-self.small[0] + self.large[0]) / 2.0
3. Merge K Sorted Lists
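Before hand-rolling the merge, note that the stdlib's heapq.merge already performs a lazy k-way merge using the same heap-frontier idea; the manual version that follows is still worth knowing for interviews. A quick sketch with illustrative input:

```python
import heapq

lists = [[1, 4, 7], [2, 5], [3, 6, 8]]
merged = list(heapq.merge(*lists))  # lazy k-way merge, O(n log k)
```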
import heapq

def merge_k_lists(lists):
    heap = []
    result = []
    # Seed the heap with the first element of each non-empty list
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))
    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)
        # Push the next element from the same list, if any
        if elem_idx + 1 < len(lists[list_idx]):
            next_val = lists[list_idx][elem_idx + 1]
            heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
    return result
4. Dijkstra's Algorithm
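The implementation below assumes graph is an adjacency list mapping each node to (neighbor, weight) pairs. A sketch of building one from an edge list (the edge data is illustrative):

```python
from collections import defaultdict

edges = [('A', 'B', 1), ('A', 'C', 4), ('B', 'C', 2)]

graph = defaultdict(list)
for u, v, w in edges:
    graph[u].append((v, w))
    graph[v].append((u, w))  # undirected; drop this line for directed graphs
```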
import heapq
from collections import defaultdict

def dijkstra(graph, start):
    distances = defaultdict(lambda: float('inf'))
    distances[start] = 0
    pq = [(0, start)]
    visited = set()
    while pq:
        dist, node = heapq.heappop(pq)
        if node in visited:
            continue  # stale entry; a shorter path was already settled
        visited.add(node)
        for neighbor, weight in graph[node]:
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(pq, (new_dist, neighbor))
    return dict(distances)
Common Implementation Patterns
Python heapq Essentials
import heapq
# Basic operations
heap = []
heapq.heappush(heap, item) # O(log n)
min_item = heapq.heappop(heap) # O(log n)
min_item = heap[0] # O(1) peek
# Bulk operations
heapq.heapify(items) # O(n) - convert a list to a heap in place
heapq.nlargest(k, iterable) # O(n log k) - k largest
heapq.nsmallest(k, iterable) # O(n log k) - k smallest
# Max-heap simulation (Python only has min-heap)
max_heap = []
heapq.heappush(max_heap, -value) # Negate for max behavior
max_value = -heapq.heappop(max_heap)
Custom Comparison
# Method 1: Use tuples (comparison by first element, then second, etc.)
heapq.heappush(heap, (priority, secondary, data))
# Method 2: Custom class with __lt__
class Task:
    def __init__(self, priority, data):
        self.priority = priority
        self.data = data
    def __lt__(self, other):
        return self.priority < other.priority
Handling Duplicate Priorities
# Include unique identifier to break ties
import itertools
counter = itertools.count()
heapq.heappush(heap, (priority, next(counter), data))
When NOT to Use Heaps
- Need frequent access to arbitrary elements → Use balanced BST or sorted data structure
- All data fits in memory and is static → Just sort once O(n log n)
- Simple linear scan suffices → Don't over-engineer with heap
- Need to maintain total ordering → Heap only gives partial ordering
- Very small datasets → Overhead might not be worth it
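The second bullet is easy to verify: for static data, a one-off sort answers top-k directly and matches nlargest (illustrative data):

```python
import heapq

data = [5, 9, 1, 7, 3]
k = 2

top_k_sorted = sorted(data, reverse=True)[:k]  # one-off O(n log n) sort
top_k_heap = heapq.nlargest(k, data)           # O(n log k) heap-based
```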
Time Complexity Cheat Sheet
| Operation | Time Complexity | Notes |
|---|---|---|
| Insert | O(log n) | Single element insertion |
| Extract min/max | O(log n) | Remove root element |
| Peek min/max | O(1) | Access root without removal |
| Build heap | O(n) | From unsorted array |
| Find arbitrary element | O(n) | Heap doesn't maintain search structure |
| Delete arbitrary element | O(n) | Need to find first, then delete |
Space Complexity Patterns
| Pattern | Space Complexity | Explanation |
|---|---|---|
| Basic heap | O(n) | Store all elements |
| Top K with fixed-size heap | O(k) | Only store K elements |
| Two heaps (median) | O(n) | Split data between two heaps |
| Dijkstra | O(V) | Distance array + heap |
| K-way merge | O(k) | Heap size bounded by number of lists |
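The O(k) row can be checked directly: streaming n elements through a bounded heap never stores more than k of them at once (the stream here is synthetic):

```python
import heapq

k = 4
heap = []
max_size = 0
for x in range(1000):  # simulate a long stream
    heapq.heappush(heap, x)
    if len(heap) > k:
        heapq.heappop(heap)
    max_size = max(max_size, len(heap))  # track peak memory in elements
```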
Common Pitfalls & Best Practices
1. Max-heap in Python
# Wrong: Python doesn't have built-in max-heap
# Right: Negate values for max-heap behavior
heap = []
heapq.heappush(heap, -value) # For max-heap
max_val = -heapq.heappop(heap)
2. Heap Size Management
# Correct: Maintain bounded heap size
while len(heap) > k:
    heapq.heappop(heap)
# Wrong: let the heap grow unbounded (defeats the memory efficiency)
3. Handling Empty Heaps
# Always check before accessing the root
if heap:
    min_val = heap[0]
else:
    handle_empty_case()
4. Comparison of Non-comparable Objects
# Wrong: raises TypeError when priorities tie and the objects
# themselves can't be compared
heapq.heappush(heap, (priority, obj))
# Right: add a unique tiebreaker so obj is never compared
heapq.heappush(heap, (priority, unique_id, obj))
5. Two-Heap Balance
# Always rebalance after each operation
if len(small) > len(large) + 1:
    move_from_small_to_large()
elif len(large) > len(small) + 1:
    move_from_large_to_small()
Advanced Considerations
1. Lazy Deletion
For sliding window problems where elements need to be removed:
# Mark for deletion instead of immediate removal
removed = set()
while heap and heap[0] in removed:
    heapq.heappop(heap)
2. Heap with TTL (Time-to-Live)
# Include an expiry timestamp; skip entries past their expiry at pop time
import time
current_time = time.time()
heapq.heappush(heap, (priority, current_time + ttl, data))
3. Thread Safety
# For concurrent access
import threading
import heapq
class ThreadSafeHeap:
    def __init__(self):
        self.heap = []
        self.lock = threading.Lock()

    def push(self, item):
        with self.lock:
            heapq.heappush(self.heap, item)

    def pop(self):
        with self.lock:
            return heapq.heappop(self.heap) if self.heap else None
Interview Strategy
Pattern Recognition: Look for keywords like "top K", "merge", "median", "schedule", "shortest path"
Ask Clarifying Questions:
- Is K fixed or variable?
- Do we need to handle streaming data?
- Are there memory constraints?
- What are the tie-breaking rules?
Choose Optimal Approach:
- Small K: Min/max-heap of size K
- Large K: Different algorithm might be better
- Streaming: Heap is often ideal
- Static data: Consider if sorting once is better
Implementation Tips:
- Start with basic heap operations
- Handle edge cases (empty heap, ties)
- Consider space optimization for large datasets
- Use appropriate comparison for custom objects
Common Follow-ups:
- How would you handle updates/deletions?
- What if data doesn't fit in memory?
- How would you make it thread-safe?
- Can you optimize for the specific K value?
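For the first follow-up, the standard answer is lazy deletion with an entry map, as in the priority-queue recipe from the heapq documentation. A compact sketch (task names are illustrative):

```python
import heapq
import itertools

REMOVED = '<removed>'  # sentinel marking a cancelled entry
heap, entry_finder, counter = [], {}, itertools.count()

def add_task(task, priority=0):
    if task in entry_finder:  # update = remove old entry + re-add
        remove_task(task)
    entry = [priority, next(counter), task]
    entry_finder[task] = entry
    heapq.heappush(heap, entry)

def remove_task(task):
    entry_finder.pop(task)[-1] = REMOVED  # mark; the actual pop is deferred

def pop_task():
    while heap:
        priority, count, task = heapq.heappop(heap)
        if task is not REMOVED:
            del entry_finder[task]
            return task
    return None

add_task('write', 5)
add_task('test', 3)
add_task('write', 1)  # update: lowers the priority of 'write'
```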
The heap/priority queue pattern is powerful and appears frequently in interviews. Master these core templates and decision frameworks to tackle a wide variety of optimization problems efficiently.