Skip to content
On this page

Heap & Priority Queue: Basics

A heap is a specialized tree-based data structure that satisfies the heap property, and a priority queue is an abstract data type that heaps commonly implement. Heaps are essential for efficiently maintaining order among dynamically changing elements and are particularly useful for "top K" problems and scenarios requiring constant access to extremal values.

Understanding heaps is crucial for coding interviews as they appear in many optimization problems and provide elegant solutions to seemingly complex challenges.


What It Is & When To Use It

A heap is a complete binary tree where each node satisfies the heap property relative to its children. This property ensures efficient access to the minimum or maximum element.

Core Idea: Maintain a partially ordered structure that gives you instant access to the smallest (min-heap) or largest (max-heap) element while allowing efficient insertion and deletion.

Signals for Heap/Priority Queue: Look for these keywords and problem types. If you see one, immediately think "Can I use a heap?"

  1. "Top K" Problems:

    • Find the top K largest/smallest elements
    • K closest points, K most frequent elements
    • K largest/smallest in a stream
  2. Dynamic Extremes:

    • Running median (requires two heaps)
    • Sliding window maximum/minimum
    • Task scheduling with priorities
  3. Merge Operations:

    • Merge K sorted lists/arrays
    • K-way merge problems
  4. Graph Algorithms:

    • Dijkstra's shortest path algorithm
    • Prim's minimum spanning tree
    • A search* algorithm
  5. Optimization with Ordering:

    • Problems requiring maintaining sorted order as elements change
    • Greedy algorithms that need the next best choice

The Two Types of Heaps:

  1. Min-Heap: Parent ≤ children (root is minimum)
  2. Max-Heap: Parent ≥ children (root is maximum)

Heap Properties & Operations

Core Properties

  • Complete Binary Tree: All levels filled except possibly the last, which fills left to right
  • Heap Property: Each parent satisfies the ordering constraint with its children
  • Array Representation: For node at index i:
    • Left child: 2*i + 1
    • Right child: 2*i + 2
    • Parent: (i-1) // 2

Essential Operations

OperationTime ComplexityDescription
Insert (push)O(log n)Add element and restore heap property
Extract Min/Max (pop)O(log n)Remove root and restore heap property
Peek (top)O(1)Access root without removing
HeapifyO(n)Convert arbitrary array to heap
Build HeapO(n)Create heap from array of elements

Python Heap Implementation

Python's heapq module provides a min-heap implementation:

Core Template (Min-Heap):

python
import heapq

# Create empty heap
heap = []

# Insert elements
heapq.heappush(heap, 10)
heapq.heappush(heap, 5)
heapq.heappush(heap, 15)

# Peek at minimum (don't remove)
if heap:
    min_val = heap[0]

# Extract minimum
if heap:
    min_val = heapq.heappop(heap)

# Convert list to heap in-place
nums = [10, 5, 15, 20, 3]
heapq.heapify(nums)  # O(n) operation

# Get K largest/smallest
heapq.nlargest(k, nums)   # Returns list of K largest
heapq.nsmallest(k, nums)  # Returns list of K smallest

Max-Heap Simulation:

python
# Python heapq only supports min-heap
# For max-heap, negate values
max_heap = []
heapq.heappush(max_heap, -10)  # Push negative values
heapq.heappush(max_heap, -5)
heapq.heappush(max_heap, -15)

# Extract maximum (remember to negate back)
if max_heap:
    max_val = -heapq.heappop(max_heap)

# Alternative: Use custom comparison with tuples
# (priority, value) where lower priority = higher importance

Heap with Custom Objects:

python
import heapq
from dataclasses import dataclass

@dataclass
class Task:
    priority: int
    name: str
    
    def __lt__(self, other):
        return self.priority < other.priority

# Use directly with heapq
tasks = []
heapq.heappush(tasks, Task(1, "High priority"))
heapq.heappush(tasks, Task(5, "Low priority"))
heapq.heappush(tasks, Task(3, "Medium priority"))

# Or use tuples (priority, data)
heap = []
heapq.heappush(heap, (1, "High priority"))
heapq.heappush(heap, (5, "Low priority"))
heapq.heappush(heap, (3, "Medium priority"))

Common Patterns & Templates

1. Top K Elements Pattern

python
def top_k_elements(nums, k):
    # Method 1: Use heapq.nlargest/nsmallest (simple)
    return heapq.nlargest(k, nums)
    
    # Method 2: Use min-heap of size k (memory efficient for large arrays)
    heap = []
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)  # Remove smallest
    return heap  # Contains k largest elements

def top_k_frequent(nums, k):
    """Find k most frequent elements"""
    from collections import Counter
    count = Counter(nums)
    
    # Use max-heap with frequencies (negate for max behavior)
    heap = [(-freq, num) for num, freq in count.items()]
    heapq.heapify(heap)
    
    result = []
    for _ in range(k):
        freq, num = heapq.heappop(heap)
        result.append(num)
    
    return result

2. Running Median (Two Heaps)

python
class MedianFinder:
    def __init__(self):
        self.small = []  # Max-heap (use negative values)
        self.large = []  # Min-heap
    
    def addNum(self, num):
        # Add to appropriate heap
        if not self.small or num <= -self.small[0]:
            heapq.heappush(self.small, -num)
        else:
            heapq.heappush(self.large, num)
        
        # Balance heaps (ensure sizes differ by at most 1)
        if len(self.small) > len(self.large) + 1:
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        elif len(self.large) > len(self.small) + 1:
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)
    
    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]
        elif len(self.large) > len(self.small):
            return self.large[0]
        else:
            return (-self.small[0] + self.large[0]) / 2

3. Merge K Sorted Lists

python
def merge_k_sorted_lists(lists):
    """Merge k sorted lists using min-heap"""
    import heapq
    
    heap = []
    result = []
    
    # Initialize heap with first element from each list
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))  # (value, list_index, element_index)
    
    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)
        
        # Add next element from same list if exists
        if elem_idx + 1 < len(lists[list_idx]):
            next_val = lists[list_idx][elem_idx + 1]
            heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
    
    return result

4. Dijkstra's Algorithm Template

python
def dijkstra(graph, start):
    """Find shortest distances from start to all nodes"""
    import heapq
    from collections import defaultdict
    
    # Distance dictionary
    dist = defaultdict(lambda: float('inf'))
    dist[start] = 0
    
    # Priority queue: (distance, node)
    pq = [(0, start)]
    visited = set()
    
    while pq:
        d, node = heapq.heappop(pq)
        
        if node in visited:
            continue
        visited.add(node)
        
        # Explore neighbors
        for neighbor, weight in graph[node]:
            new_dist = d + weight
            if new_dist < dist[neighbor]:
                dist[neighbor] = new_dist
                heapq.heappush(pq, (new_dist, neighbor))
    
    return dict(dist)

When to Choose Heap vs Alternatives

Use CaseHeapAlternativeWhy Heap is Better
Top K elementsO(n log k)Sorting: O(n log n)More memory efficient, better for streaming
Running medianO(log n) insertSorted array: O(n) insertMuch faster insertions
Dynamic min/maxO(log n)Linear scan: O(n)Significant speedup for frequent queries
Merge K listsO(n log k)Naive merge: O(nk)Better scalability with more lists
Priority schedulingO(log n)Array/list: O(n)Efficient priority-based extraction

When NOT to use Heap:

  • Need to access elements other than min/max frequently
  • All elements fit in memory and static (just sort once)
  • Need to maintain strict ordering of all elements (use balanced BST)
  • Simple problems where linear solution is sufficient

Common Pitfalls & Tips

  1. Python heapq is min-heap only: Use negative values for max-heap behavior.
python
# Wrong: Can't directly get maximum
max_val = max(heap)  # O(n) operation!

# Correct: Use negative values for max-heap
heapq.heappush(max_heap, -value)
max_val = -heapq.heappop(max_heap)
  1. Tuple comparison for custom objects: When using tuples, comparison happens element by element.
python
# Safe: Primary comparison by priority
heapq.heappush(heap, (priority, id, data))

# Risky: If priorities equal, might compare data objects
heapq.heappush(heap, (priority, data))  # May fail if data not comparable
  1. Heap size management: For top-K problems, maintain heap size carefully.
python
# Correct: Maintain heap of size k
if len(heap) > k:
    heapq.heappop(heap)

# Wrong: Let heap grow unbounded
# This defeats the memory efficiency purpose
  1. Two-heap balance: In running median, maintain balance after each operation.
python
# Always balance after insertion
if len(small) > len(large) + 1:
    # Move from small to large
elif len(large) > len(small) + 1:
    # Move from large to small
  1. Empty heap checks: Always check if heap is empty before accessing.
python
# Wrong: May raise IndexError
min_val = heap[0]

# Correct: Check first
if heap:
    min_val = heap[0]
  1. Heapify vs repeated pushes: Use heapify for better performance when building from array.
python
# Efficient: O(n)
heapq.heapify(nums)

# Less efficient: O(n log n)
heap = []
for num in nums:
    heapq.heappush(heap, num)