Skip to content
On this page

Heap & Priority Queue: Summary

This summary provides a high-level overview of heap and priority queue patterns covered in this section. Heaps are fundamental data structures for efficiently maintaining ordered data and are essential for many optimization algorithms.


Pattern Recognition Signals

When to use Heap/Priority Queue: Look for these keywords and problem types:

  1. "Top K" problems:

    • Find K largest/smallest elements
    • K most/least frequent items
    • K closest points/elements
  2. Dynamic extremes:

    • Running median or streaming statistics
    • Sliding window maximum/minimum
    • Next/previous greater/smaller element
  3. Merge operations:

    • Merge K sorted lists/arrays
    • K-way merge problems
  4. Scheduling and optimization:

    • Task scheduling with priorities
    • Meeting room allocation
    • Shortest path algorithms (Dijkstra)
  5. Real-time processing:

    • Data streams requiring ordered processing
    • Event simulation with time-based priorities

Decision Matrix

Problem TypeBest ApproachTime ComplexityWhen to Use
Find top K elementsMin-heap of size KO(n log k)k << n, memory efficient
Running medianTwo heaps (max + min)O(log n) insert, O(1) medianDynamic data, frequent queries
Merge K listsMin-heap frontierO(n log k)K sorted sequences
Task schedulingPriority queueO(n log n)Priority-based ordering
Shortest pathDijkstra + heapO((V+E) log V)Weighted graphs, non-negative
K closest pointsMax-heap of size KO(n log k)Geometric problems

Core Algorithms by Pattern

1. Top K Elements Pattern

python
# Min-heap of size K (most memory efficient)
heap = []
for element in data:
    heapq.heappush(heap, element)
    if len(heap) > k:
        heapq.heappop(heap)
return heap  # Contains K largest elements

2. Running Median (Two Heaps)

python
class MedianFinder:
    def __init__(self):
        self.small = []  # Max-heap (negated values)
        self.large = []  # Min-heap
    
    def addNum(self, num):
        if not self.small or num <= -self.small[0]:
            heapq.heappush(self.small, -num)
        else:
            heapq.heappush(self.large, num)
        self._balance()
    
    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]
        elif len(self.large) > len(self.small):
            return self.large[0]
        else:
            return (-self.small[0] + self.large[0]) / 2.0

3. Merge K Sorted Lists

python
def merge_k_lists(lists):
    heap = []
    result = []
    
    # Initialize with first element from each list
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))
    
    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)
        
        # Add next element from same list
        if elem_idx + 1 < len(lists[list_idx]):
            next_val = lists[list_idx][elem_idx + 1]
            heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
    
    return result

4. Dijkstra's Algorithm

python
def dijkstra(graph, start):
    distances = defaultdict(lambda: float('inf'))
    distances[start] = 0
    pq = [(0, start)]
    visited = set()
    
    while pq:
        dist, node = heapq.heappop(pq)
        if node in visited:
            continue
        visited.add(node)
        
        for neighbor, weight in graph[node]:
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(pq, (new_dist, neighbor))
    
    return dict(distances)

Common Implementation Patterns

Python heapq Essentials

python
import heapq

# Basic operations
heap = []
heapq.heappush(heap, item)    # O(log n)
min_item = heapq.heappop(heap) # O(log n)
min_item = heap[0]             # O(1) peek

# Bulk operations
heapq.heapify(list)           # O(n) - convert list to heap
heapq.nlargest(k, iterable)   # O(n log k) - k largest
heapq.nsmallest(k, iterable)  # O(n log k) - k smallest

# Max-heap simulation (Python only has min-heap)
max_heap = []
heapq.heappush(max_heap, -value)  # Negate for max behavior
max_value = -heapq.heappop(max_heap)

Custom Comparison

python
# Method 1: Use tuples (comparison by first element, then second, etc.)
heapq.heappush(heap, (priority, secondary, data))

# Method 2: Custom class with __lt__
class Task:
    def __init__(self, priority, data):
        self.priority = priority
        self.data = data
    
    def __lt__(self, other):
        return self.priority < other.priority

Handling Duplicate Priorities

python
# Include unique identifier to break ties
import itertools
counter = itertools.count()
heapq.heappush(heap, (priority, next(counter), data))

When NOT to Use Heaps

  1. Need frequent access to arbitrary elements → Use balanced BST or sorted data structure
  2. All data fits in memory and is static → Just sort once O(n log n)
  3. Simple linear scan suffices → Don't over-engineer with heap
  4. Need to maintain total ordering → Heap only gives partial ordering
  5. Very small datasets → Overhead might not be worth it

Time Complexity Cheat Sheet

OperationTime ComplexityNotes
InsertO(log n)Single element insertion
Extract min/maxO(log n)Remove root element
Peek min/maxO(1)Access root without removal
Build heapO(n)From unsorted array
Find arbitrary elementO(n)Heap doesn't maintain search structure
Delete arbitrary elementO(n)Need to find first, then delete

Space Complexity Patterns

PatternSpace ComplexityExplanation
Basic heapO(n)Store all elements
Top K with fixed-size heapO(k)Only store K elements
Two heaps (median)O(n)Split data between two heaps
DijkstraO(V)Distance array + heap
K-way mergeO(k)Heap size bounded by number of lists

Common Pitfalls & Best Practices

1. Max-heap in Python

python
# Wrong: Python doesn't have built-in max-heap
# Right: Negate values for max-heap behavior
heap = []
heapq.heappush(heap, -value)  # For max-heap
max_val = -heapq.heappop(heap)

2. Heap Size Management

python
# Correct: Maintain bounded heap size
while len(heap) > k:
    heapq.heappop(heap)

# Wrong: Let heap grow unbounded (defeats memory efficiency)

3. Handling Empty Heaps

python
# Always check before accessing
if heap:
    min_val = heap[0]
else:
    handle_empty_case()

4. Comparison of Non-comparable Objects

python
# Wrong: May fail if objects can't be compared
heapq.heappush(heap, (priority, object))

# Right: Add tiebreaker
heapq.heappush(heap, (priority, unique_id, object))

5. Two-Heap Balance

python
# Always rebalance after each operation
if len(small) > len(large) + 1:
    move_from_small_to_large()
elif len(large) > len(small) + 1:
    move_from_large_to_small()

Advanced Considerations

1. Lazy Deletion

For sliding window problems where elements need to be removed:

python
# Mark for deletion instead of immediate removal
removed = set()
while heap and heap[0] in removed:
    heapq.heappop(heap)

2. Heap with TTL (Time-to-Live)

python
# Include timestamps for automatic expiration
import time
current_time = time.time()
heapq.heappush(heap, (priority, current_time + ttl, data))

3. Thread Safety

python
# For concurrent access
import threading
import heapq

class ThreadSafeHeap:
    def __init__(self):
        self.heap = []
        self.lock = threading.Lock()
    
    def push(self, item):
        with self.lock:
            heapq.heappush(self.heap, item)
    
    def pop(self):
        with self.lock:
            return heapq.heappop(self.heap) if self.heap else None

Interview Strategy

  1. Pattern Recognition: Look for keywords like "top K", "merge", "median", "schedule", "shortest path"

  2. Ask Clarifying Questions:

    • Is K fixed or variable?
    • Do we need to handle streaming data?
    • Are there memory constraints?
    • What are the tie-breaking rules?
  3. Choose Optimal Approach:

    • Small K: Min/max-heap of size K
    • Large K: Different algorithm might be better
    • Streaming: Heap is often ideal
    • Static data: Consider if sorting once is better
  4. Implementation Tips:

    • Start with basic heap operations
    • Handle edge cases (empty heap, ties)
    • Consider space optimization for large datasets
    • Use appropriate comparison for custom objects
  5. Common Follow-ups:

    • How would you handle updates/deletions?
    • What if data doesn't fit in memory?
    • How would you make it thread-safe?
    • Can you optimize for the specific K value?

The heap/priority queue pattern is powerful and appears frequently in interviews. Master these core templates and decision frameworks to tackle a wide variety of optimization problems efficiently.