This repository explores a threshold-based approach to maximum finding.
While the theoretical lower bound for maximum search is Ω(n) (every element must be examined),
we investigate whether cost-aware filtering can reduce practical runtime
in scenarios where post-processing of candidate elements is expensive.
- Standard maximum finding requires scanning all elements → Θ(n).
- Our approach introduces a threshold filter:
  - Estimate an upper bound for the maximum.
  - Discard elements below a threshold (e.g., 50% of the bound or a learned quantile).
  - Apply expensive post-processing only to the reduced candidate set.
- This does not improve worst-case complexity, but it can reduce the number of expensive operations in practice. 👉 We don't beat Θ(n); we save cost when post-processing is expensive (see the sketch below).
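A minimal sketch of the idea in Python (function and parameter names here are illustrative, not the repository's actual API; see cost_aware_maximum_finding.py for the real implementation):

```python
def cost_aware_max(data, expensive_fn, threshold_ratio=0.5):
    """Apply expensive_fn only to elements likely to contain the maximum.

    Assumes non-negative values; if expensive_fn can promote low raw values
    above high ones, the filter may discard the true winner.
    """
    upper_bound = max(data)                        # cheap O(n) scan for an upper bound
    cutoff = threshold_ratio * upper_bound         # e.g., 50% of the bound
    candidates = [x for x in data if x >= cutoff]  # discard low-value elements
    # Expensive post-processing touches only the reduced candidate set.
    return max(expensive_fn(x) for x in candidates)
```

The filtering pass adds a constant-factor overhead, so the payoff appears only when `expensive_fn` costs far more per call than a comparison.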
Potential applications:
- Information retrieval (fewer re-ranking candidates)
- Machine learning (filtering before heavy model inference)
- Database queries (avoiding expensive lookups on low-value rows)
- Financial/streaming systems (focus on top signals only)
Run the provided Python file:

```bash
python cost_aware_maximum_finding.py
```

➡️ Full code is available here: cost_aware_maximum_finding.py
Example output:

```
Linear scan:    0.012 sec
Threshold scan: 0.007 sec
```

(Times depend on data distribution and threshold choice.)
You can try the code directly in Google Colab. Uncomment one scenario in the cell below to test different behaviors:
- Baseline comparison (no post-processing)
  - Linear vs Cost-Aware on plain max search
  - → Cost-Aware may be slightly slower due to filtering overhead
- Heavy post-processing comparison
  - Linear: applies expensive work to every element
  - Cost-Aware: applies expensive work only to candidates (~top 20%)
  - → Cost-Aware becomes much faster when post-processing dominates
- Known upper bound scenario
  - The true maximum possible value is already known
  - → Skip the initial scan and directly compute the cutoff
  - → Further improves Cost-Aware efficiency, especially for large n
- Sample-based upper bound estimation (see the sketch after this list)
  - Estimate the cutoff from a small random sample (1000 elements)
  - → Reduces overhead for large n
  - → But if the sample underestimates the max, the candidate set grows and the speedup decreases
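A sketch of the sample-based variant (the helper name and the 80th-percentile choice are assumptions for illustration):

```python
import random

def estimate_cutoff(data, sample_size=1000, quantile=0.8):
    """Estimate a filtering cutoff from a small random sample instead of a full scan."""
    sample = sorted(random.sample(data, min(sample_size, len(data))))
    # The sample's 80th percentile keeps roughly the top 20% of all elements.
    return sample[int(quantile * (len(sample) - 1))]
```

If the sample misses the upper tail, the cutoff lands too low, more elements survive the filter, and the speedup shrinks, which is exactly the failure mode noted above.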
We provide two complementary plots:
- Without post-processing: both Linear and Cost-Aware run in Θ(n) time, but Cost-Aware adds an extra filtering pass, so its constant factor is larger.
- With heavy post-processing: both methods still scan the input in O(n), but Cost-Aware applies the expensive step (e.g., DB lookups or model inference) only to a reduced subset of k candidates (k ≪ n) rather than all n elements, making it significantly faster whenever post-processing dominates runtime.
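A back-of-the-envelope cost model makes the second regime concrete (all numbers below are illustrative assumptions, not measurements from this repository):

```python
n, k = 1_000_000, 200_000      # input size; candidates surviving the filter (~top 20%)
c_scan, c_post = 10e-9, 10e-6  # assumed per-element scan cost vs. post-processing cost (sec)

linear_total     = n * (c_scan + c_post)    # expensive work on every element
cost_aware_total = n * c_scan + k * c_post  # expensive work on candidates only

print(f"linear ~ {linear_total:.2f}s, cost-aware ~ {cost_aware_total:.2f}s")
# linear ~ 10.01s, cost-aware ~ 2.01s; the gap widens as c_post / c_scan grows
```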
To reproduce:

```bash
pip install -r requirements.txt
python benchmark.py
```
Requirements:
- Python 3
- Matplotlib (for plotting)
- This is a prototype, not an optimized production algorithm.
- The goal is to demonstrate that cost-aware strategies can sometimes outperform naive linear scans in practice when post-processing dominates.
- Although this repository demonstrates maximum finding, the same cost-aware filtering idea applies to minimum finding as well: simply invert the logic and use a lower threshold instead of an upper bound.
  In both cases, the theoretical bound remains Θ(n), but practical savings arise when post-processing is costly (see the sketch below).
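A minimal inversion sketch (illustrative names, with the same caveats as the maximum version):

```python
def cost_aware_min(data, expensive_fn, threshold_ratio=0.5):
    """Mirror of the maximum version: keep the low end of the range, drop the rest."""
    lo, hi = min(data), max(data)              # cheap O(n) scans for range bounds
    cutoff = lo + threshold_ratio * (hi - lo)  # a lower threshold instead of an upper bound
    candidates = [x for x in data if x <= cutoff]
    return min(expensive_fn(x) for x in candidates)
```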
Maintained by SuDev. Feel free to open an issue or suggest an improvement!
This project covers an experimental algorithmic approach that uses threshold-based filtering to reduce the number of candidates and cut runtime in environments where post-processing is expensive.