Heap & Priority Queue: Running Median (Two Heaps)
The Running Median pattern uses two heaps to efficiently maintain the median of a dynamic dataset as elements are added or removed. This is a classic two-heap technique that provides O(log n) insertion and O(1) median retrieval.
This pattern is essential for streaming statistics, sliding window medians, and real-time data analysis.
Common Interview Questions
Find Median from Data Stream (LC 295)
- Problem: Design a data structure that supports adding integers and finding the median of all elements.
- Heap Signal: This is the classic running median problem requiring efficient insertion and median access.
Sliding Window Median (LC 480)
- Problem: Given an array of integers and a window size
k, find the median of each window of sizekas the window slides. - Heap Signal: Extends running median to handle element removal from the window.
- Problem: Given an array of integers and a window size
IPO (LC 502)
- Problem: Given projects with capital requirements and profits, maximize capital by selecting projects optimally.
- Heap Signal: Uses two heaps to separate available vs unavailable projects.
Core Logic & Approach
The key insight is to split the data into two halves using two heaps:
Heap Structure:
- small (max-heap): Contains smaller half of numbers (left side)
- large (min-heap): Contains larger half of numbers (right side)
Invariants:
- Size constraint:
len(small) - len(large) ∈ {0, 1} - Ordering constraint:
max(small) ≤ min(large)
Median calculation:
- If sizes are equal:
(max(small) + min(large)) / 2 - If small is larger:
max(small)
Why this works:
- Two heaps divide the sorted sequence in the middle
- Heap tops give us the median elements directly
- Balancing maintains the "middle" position
Approach 1: Basic Two-Heap Median Finder
Core Template:
python
import heapq
class MedianFinder:
def __init__(self):
# small: max-heap (use negative values)
# large: min-heap
self.small = [] # Left half (smaller values)
self.large = [] # Right half (larger values)
def addNum(self, num):
# Step 1: Add to appropriate heap
if not self.small or num <= -self.small[0]:
heapq.heappush(self.small, -num) # Max-heap simulation
else:
heapq.heappush(self.large, num) # Min-heap
# Step 2: Balance heaps to maintain size constraint
self._balance()
def _balance(self):
# Ensure sizes differ by at most 1
if len(self.small) > len(self.large) + 1:
# Move from small to large
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
elif len(self.large) > len(self.small) + 1:
# Move from large to small
val = heapq.heappop(self.large)
heapq.heappush(self.small, -val)
def findMedian(self):
if len(self.small) > len(self.large):
return -self.small[0] # Max of smaller half
elif len(self.large) > len(self.small):
return self.large[0] # Min of larger half
else:
# Equal sizes: average of middle two
return (-self.small[0] + self.large[0]) / 2.0
# Time: O(log n) for addNum, O(1) for findMedian
# Space: O(n)Example Walkthrough:
python
mf = MedianFinder()
# Add 1: small = [-1], large = []
mf.addNum(1)
print(mf.findMedian()) # 1.0
# Add 2: small = [-1], large = [2] (balanced)
mf.addNum(2)
print(mf.findMedian()) # 1.5
# Add 3: small = [-1], large = [2, 3], rebalance to small = [-2, -1], large = [3]
mf.addNum(3)
print(mf.findMedian()) # 2.0Tricky Parts & Decision Points
- Heap size balancing logic:
python
# Choose one consistent rule:
# Option 1: small can have at most 1 more element
if len(self.small) > len(self.large) + 1:
move_from_small_to_large()
# Option 2: large can have at most 1 more element
if len(self.large) > len(self.small) + 1:
move_from_large_to_small()- Max-heap simulation in Python:
python
# Remember to negate values for max-heap
heapq.heappush(small, -num) # Push negative
max_val = -small[0] # Negate to get actual max- Initial heap assignment:
python
# Be consistent about which heap gets the first element
# One approach: always add to small first, then balance
# Another: compare with current median- Edge cases:
python
# Handle empty heaps
if not self.small:
# Only large has elements
return self.large[0]
if not self.large:
# Only small has elements
return -self.small[0]