Skip to content
On this page

Heap & Priority Queue: Running Median (Two Heaps)

The Running Median pattern uses two heaps to efficiently maintain the median of a dynamic dataset as elements are added or removed. This is a classic two-heap technique that provides O(log n) insertion and O(1) median retrieval.

This pattern is essential for streaming statistics, sliding window medians, and real-time data analysis.


Common Interview Questions

  1. Find Median from Data Stream (LC 295)

    • Problem: Design a data structure that supports adding integers and finding the median of all elements.
    • Heap Signal: This is the classic running median problem requiring efficient insertion and median access.
  2. Sliding Window Median (LC 480)

    • Problem: Given an array of integers and a window size k, find the median of each window of size k as the window slides.
    • Heap Signal: Extends running median to handle element removal from the window.
  3. IPO (LC 502)

    • Problem: Given projects with capital requirements and profits, maximize capital by selecting projects optimally.
    • Heap Signal: Uses two heaps to separate available vs unavailable projects.

Core Logic & Approach

The key insight is to split the data into two halves using two heaps:

Heap Structure:

  • small (max-heap): Contains smaller half of numbers (left side)
  • large (min-heap): Contains larger half of numbers (right side)

Invariants:

  1. Size constraint: len(small) - len(large) ∈ {0, 1}
  2. Ordering constraint: max(small) ≤ min(large)

Median calculation:

  • If sizes are equal: (max(small) + min(large)) / 2
  • If small is larger: max(small)

Why this works:

  • Two heaps divide the sorted sequence in the middle
  • Heap tops give us the median elements directly
  • Balancing maintains the "middle" position

Approach 1: Basic Two-Heap Median Finder

Core Template:

python
import heapq

class MedianFinder:
    def __init__(self):
        # small: max-heap (use negative values)
        # large: min-heap
        self.small = []  # Left half (smaller values)
        self.large = []  # Right half (larger values)
    
    def addNum(self, num):
        # Step 1: Add to appropriate heap
        if not self.small or num <= -self.small[0]:
            heapq.heappush(self.small, -num)  # Max-heap simulation
        else:
            heapq.heappush(self.large, num)   # Min-heap
        
        # Step 2: Balance heaps to maintain size constraint
        self._balance()
    
    def _balance(self):
        # Ensure sizes differ by at most 1
        if len(self.small) > len(self.large) + 1:
            # Move from small to large
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        elif len(self.large) > len(self.small) + 1:
            # Move from large to small
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)
    
    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]  # Max of smaller half
        elif len(self.large) > len(self.small):
            return self.large[0]   # Min of larger half
        else:
            # Equal sizes: average of middle two
            return (-self.small[0] + self.large[0]) / 2.0

# Time: O(log n) for addNum, O(1) for findMedian
# Space: O(n)

Example Walkthrough:

python
mf = MedianFinder()

# Add 1: small = [-1], large = []
mf.addNum(1)
print(mf.findMedian())  # 1.0

# Add 2: small = [-1], large = [2] (balanced)
mf.addNum(2) 
print(mf.findMedian())  # 1.5

# Add 3: small = [-1], large = [2, 3], rebalance to small = [-2, -1], large = [3]
mf.addNum(3)
print(mf.findMedian())  # 2.0

Tricky Parts & Decision Points

  1. Heap size balancing logic:
python
# Choose one consistent rule:
# Option 1: small can have at most 1 more element
if len(self.small) > len(self.large) + 1:
    move_from_small_to_large()

# Option 2: large can have at most 1 more element  
if len(self.large) > len(self.small) + 1:
    move_from_large_to_small()
  1. Max-heap simulation in Python:
python
# Remember to negate values for max-heap
heapq.heappush(small, -num)  # Push negative
max_val = -small[0]          # Negate to get actual max
  1. Initial heap assignment:
python
# Be consistent about which heap gets the first element
# One approach: always add to small first, then balance
# Another: compare with current median
  1. Edge cases:
python
# Handle empty heaps
if not self.small:
    # Only large has elements
    return self.large[0]
if not self.large:
    # Only small has elements  
    return -self.small[0]