Heap & Priority Queue: Summary
This summary provides a high-level overview of heap and priority queue patterns covered in this section. Heaps are fundamental data structures for efficiently maintaining ordered data and are essential for many optimization algorithms.
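As a quick orientation, the core operations are push, O(1) peek at the minimum, and pop-min. A minimal sketch using Python's heapq (the values are illustrative):

```python
import heapq

# Build a heap incrementally; the smallest element is always at index 0
heap = []
for value in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, value)

smallest = heap[0]  # O(1) peek at the minimum
drained = [heapq.heappop(heap) for _ in range(len(heap))]  # pops come out in sorted order
```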
Pattern Recognition Signals
When to use Heap/Priority Queue: Look for these keywords and problem types:
"Top K" problems:
- Find K largest/smallest elements
- K most/least frequent items
- K closest points/elements
Dynamic extremes:
- Running median or streaming statistics
- Sliding window maximum/minimum (heap with lazy deletion; a monotonic deque is the O(n) alternative)
- Kth largest/smallest element in a stream
Merge operations:
- Merge K sorted lists/arrays
- K-way merge problems
Scheduling and optimization:
- Task scheduling with priorities
- Meeting room allocation
- Shortest path algorithms (Dijkstra)
Real-time processing:
- Data streams requiring ordered processing
- Event simulation with time-based priorities
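The last signal above, event simulation, reduces to always popping the event with the earliest timestamp. A minimal sketch (event names are made up for illustration):

```python
import heapq

# Events as (time, name) tuples; tuples compare by time first
events = [(5.0, 'timeout'), (1.0, 'connect'), (3.0, 'retry')]
heapq.heapify(events)  # O(n) build

order = []
while events:
    t, name = heapq.heappop(events)  # earliest event first
    order.append(name)
```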
Decision Matrix
| Problem Type | Best Approach | Time Complexity | When to Use |
|---|---|---|---|
| Find top K elements | Min-heap of size K | O(n log k) | k << n, memory efficient |
| Running median | Two heaps (max + min) | O(log n) insert, O(1) median | Dynamic data, frequent queries |
| Merge K lists | Min-heap frontier | O(n log k) | K sorted sequences |
| Task scheduling | Priority queue | O(n log n) | Priority-based ordering |
| Shortest path | Dijkstra + heap | O((V+E) log V) | Weighted graphs, non-negative |
| K closest points | Max-heap of size K | O(n log k) | Geometric problems |
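To make the first row of the matrix concrete: a bounded min-heap of size k and the stdlib's heapq.nlargest produce the same k largest elements (the data here is illustrative):

```python
import heapq

data = [7, 2, 9, 4, 11, 3, 8]
k = 3

heap = []
for x in data:
    heapq.heappush(heap, x)
    if len(heap) > k:
        heapq.heappop(heap)  # evict the smallest, keeping the k largest so far
```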
Core Algorithms by Pattern
1. Top K Elements Pattern
# Min-heap of size k (most memory efficient: O(k) extra space)
import heapq

def k_largest(data, k):
    heap = []
    for element in data:
        heapq.heappush(heap, element)
        if len(heap) > k:
            heapq.heappop(heap)  # evict the smallest of the k+1
    return heap  # Contains the k largest elements (not sorted)
2. Running Median (Two Heaps)
import heapq

class MedianFinder:
    def __init__(self):
        self.small = []  # Max-heap (negated values), lower half
        self.large = []  # Min-heap, upper half

    def addNum(self, num):
        if not self.small or num <= -self.small[0]:
            heapq.heappush(self.small, -num)
        else:
            heapq.heappush(self.large, num)
        self._balance()

    def _balance(self):
        # Keep the heap sizes within one of each other
        if len(self.small) > len(self.large) + 1:
            heapq.heappush(self.large, -heapq.heappop(self.small))
        elif len(self.large) > len(self.small) + 1:
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]
        elif len(self.large) > len(self.small):
            return self.large[0]
        else:
            return (-self.small[0] + self.large[0]) / 2.0
3. Merge K Sorted Lists
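Before hand-rolling the merge, note that the stdlib's heapq.merge already performs a lazy k-way merge using the same heap-frontier idea; the manual version that follows is still worth knowing for interviews. A quick sketch with illustrative input:

```python
import heapq

lists = [[1, 4, 7], [2, 5], [3, 6, 8]]
merged = list(heapq.merge(*lists))  # lazy k-way merge, O(n log k)
```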
import heapq

def merge_k_lists(lists):
    heap = []
    result = []
    # Seed the heap with the first element of each non-empty list
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))
    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)
        # Push the next element from the same list, if any
        if elem_idx + 1 < len(lists[list_idx]):
            next_val = lists[list_idx][elem_idx + 1]
            heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
    return result
4. Dijkstra's Algorithm
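The implementation below assumes graph is an adjacency list mapping each node to (neighbor, weight) pairs. A sketch of building one from an edge list (the edge data is illustrative):

```python
from collections import defaultdict

edges = [('A', 'B', 1), ('A', 'C', 4), ('B', 'C', 2)]

graph = defaultdict(list)
for u, v, w in edges:
    graph[u].append((v, w))
    graph[v].append((u, w))  # undirected; drop this line for directed graphs
```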
import heapq
from collections import defaultdict

def dijkstra(graph, start):
    distances = defaultdict(lambda: float('inf'))
    distances[start] = 0
    pq = [(0, start)]
    visited = set()
    while pq:
        dist, node = heapq.heappop(pq)
        if node in visited:
            continue  # stale entry; a shorter path was already settled
        visited.add(node)
        for neighbor, weight in graph[node]:
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(pq, (new_dist, neighbor))
    return dict(distances)
Common Implementation Patterns
Python heapq Essentials
import heapq
# Basic operations
heap = []
heapq.heappush(heap, item) # O(log n)
min_item = heapq.heappop(heap) # O(log n)
min_item = heap[0] # O(1) peek
# Bulk operations
heapq.heapify(items) # O(n) - convert a list to a heap in place
heapq.nlargest(k, iterable) # O(n log k) - k largest
heapq.nsmallest(k, iterable) # O(n log k) - k smallest
# Max-heap simulation (Python only has min-heap)
max_heap = []
heapq.heappush(max_heap, -value) # Negate for max behavior
max_value = -heapq.heappop(max_heap)
Custom Comparison
# Method 1: Use tuples (comparison by first element, then second, etc.)
heapq.heappush(heap, (priority, secondary, data))
# Method 2: Custom class with __lt__
class Task:
    def __init__(self, priority, data):
        self.priority = priority
        self.data = data
    def __lt__(self, other):
        return self.priority < other.priority
Handling Duplicate Priorities
# Include unique identifier to break ties
import itertools
counter = itertools.count()
heapq.heappush(heap, (priority, next(counter), data))
When NOT to Use Heaps
- Need frequent access to arbitrary elements → Use balanced BST or sorted data structure
- All data fits in memory and is static → Just sort once O(n log n)
- Simple linear scan suffices → Don't over-engineer with heap
- Need to maintain total ordering → Heap only gives partial ordering
- Very small datasets → Overhead might not be worth it
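The second bullet is easy to verify: for static data, a one-off sort answers top-k directly and matches nlargest (illustrative data):

```python
import heapq

data = [5, 9, 1, 7, 3]
k = 2

top_k_sorted = sorted(data, reverse=True)[:k]  # one-off O(n log n) sort
top_k_heap = heapq.nlargest(k, data)           # O(n log k) heap-based
```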
Time Complexity Cheat Sheet
| Operation | Time Complexity | Notes |
|---|---|---|
| Insert | O(log n) | Single element insertion |
| Extract min/max | O(log n) | Remove root element |
| Peek min/max | O(1) | Access root without removal |
| Build heap | O(n) | From unsorted array |
| Find arbitrary element | O(n) | Heap doesn't maintain search structure |
| Delete arbitrary element | O(n) | Need to find first, then delete |
Space Complexity Patterns
| Pattern | Space Complexity | Explanation |
|---|---|---|
| Basic heap | O(n) | Store all elements |
| Top K with fixed-size heap | O(k) | Only store K elements |
| Two heaps (median) | O(n) | Split data between two heaps |
| Dijkstra | O(V) | Distance array + heap |
| K-way merge | O(k) | Heap size bounded by number of lists |
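The O(k) row can be checked directly: streaming n elements through a bounded heap never stores more than k of them at once (the stream here is synthetic):

```python
import heapq

k = 4
heap = []
max_size = 0
for x in range(1000):  # simulate a long stream
    heapq.heappush(heap, x)
    if len(heap) > k:
        heapq.heappop(heap)
    max_size = max(max_size, len(heap))  # track peak memory in elements
```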
Common Pitfalls & Best Practices
1. Max-heap in Python
# Wrong: Python doesn't have built-in max-heap
# Right: Negate values for max-heap behavior
heap = []
heapq.heappush(heap, -value) # For max-heap
max_val = -heapq.heappop(heap)
2. Heap Size Management
# Correct: Maintain bounded heap size
while len(heap) > k:
    heapq.heappop(heap)
# Wrong: let the heap grow unbounded (defeats the memory efficiency)
3. Handling Empty Heaps
# Always check before accessing the root
if heap:
    min_val = heap[0]
else:
    handle_empty_case()
4. Comparison of Non-comparable Objects
# Wrong: raises TypeError when priorities tie and the objects
# themselves can't be compared
heapq.heappush(heap, (priority, obj))
# Right: add a unique tiebreaker so obj is never compared
heapq.heappush(heap, (priority, unique_id, obj))
5. Two-Heap Balance
# Always rebalance after each operation
if len(small) > len(large) + 1:
    move_from_small_to_large()
elif len(large) > len(small) + 1:
    move_from_large_to_small()
Advanced Considerations
1. Lazy Deletion
For sliding window problems where elements need to be removed:
# Mark for deletion instead of immediate removal
removed = set()
while heap and heap[0] in removed:
    heapq.heappop(heap)
2. Heap with TTL (Time-to-Live)
# Include an expiry timestamp; skip entries past their expiry at pop time
import time
current_time = time.time()
heapq.heappush(heap, (priority, current_time + ttl, data))
3. Thread Safety
# For concurrent access
import threading
import heapq
class ThreadSafeHeap:
    def __init__(self):
        self.heap = []
        self.lock = threading.Lock()

    def push(self, item):
        with self.lock:
            heapq.heappush(self.heap, item)

    def pop(self):
        with self.lock:
            return heapq.heappop(self.heap) if self.heap else None
Interview Strategy
Pattern Recognition: Look for keywords like "top K", "merge", "median", "schedule", "shortest path"
Ask Clarifying Questions:
- Is K fixed or variable?
- Do we need to handle streaming data?
- Are there memory constraints?
- What are the tie-breaking rules?
Choose Optimal Approach:
- Small K: Min/max-heap of size K
- Large K: Different algorithm might be better
- Streaming: Heap is often ideal
- Static data: Consider if sorting once is better
Implementation Tips:
- Start with basic heap operations
- Handle edge cases (empty heap, ties)
- Consider space optimization for large datasets
- Use appropriate comparison for custom objects
Common Follow-ups:
- How would you handle updates/deletions?
- What if data doesn't fit in memory?
- How would you make it thread-safe?
- Can you optimize for the specific K value?
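For the first follow-up, the standard answer is lazy deletion with an entry map, as in the priority-queue recipe from the heapq documentation. A compact sketch (task names are illustrative):

```python
import heapq
import itertools

REMOVED = '<removed>'  # sentinel marking a cancelled entry
heap, entry_finder, counter = [], {}, itertools.count()

def add_task(task, priority=0):
    if task in entry_finder:  # update = remove old entry + re-add
        remove_task(task)
    entry = [priority, next(counter), task]
    entry_finder[task] = entry
    heapq.heappush(heap, entry)

def remove_task(task):
    entry_finder.pop(task)[-1] = REMOVED  # mark; the actual pop is deferred

def pop_task():
    while heap:
        priority, count, task = heapq.heappop(heap)
        if task is not REMOVED:
            del entry_finder[task]
            return task
    return None

add_task('write', 5)
add_task('test', 3)
add_task('write', 1)  # update: lowers the priority of 'write'
```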
The heap/priority queue pattern is powerful and appears frequently in interviews. Master these core templates and decision frameworks to tackle a wide variety of optimization problems efficiently.