Skip to content

SuhnDev/Beyond-Linear-Max

Repository files navigation

Beyond Linear Max

Cost-Aware Maximum Finding

License: MIT Made with Python

This repository explores a threshold-based approach to maximum finding.
While the theoretical lower bound for maximum search remains Θ(n),
we investigate whether cost-aware filtering can reduce practical runtime
in scenarios where post-processing of candidate elements is expensive.


🔍 Idea

  • Standard maximum finding requires scanning all elements → Θ(n).
  • Our approach introduces a threshold filter:
    1. Estimate an upper bound for the maximum.
    2. Discard elements below a threshold (e.g., 50% of the bound or a learned quantile).
    3. Only apply expensive post-processing to the reduced candidate set.
  • This does not improve worst-case complexity, but can reduce the number of expensive operations in practical settings. 👉 We don’t beat Θ(n), but we save cost when post-processing is expensive.

📊 Applications

  • Information retrieval (reduce re-ranking candidates)
  • Machine learning (filtering before heavy model inference)
  • Database queries (avoiding expensive lookups on low-value rows)
  • Financial/streaming systems (focus on top signals only)

⚙️ Example Code

Run the provided Python file:

python cost_aware_maximum_finding.py

➡️ Full code is available here: cost_aware_maximum_finding.py

Example output:

Linear scan: 0.012 sec
Threshold scan: 0.007 sec

(Times depend on data distribution and threshold choice.)


🚀 Run in Colab

You can try the code directly in Google Colab. Uncomment one scenario in the cell below to test different behaviors:

Open In Colab

args = "--compare"

  • Baseline comparison (no post-processing)
  • Linear vs Cost-Aware on plain max search
  • → Cost-Aware may be slightly slower due to filtering overhead

args = "--compare --post-iters 1500 --threshold 0.8 --seed 42"

  • Heavy post-processing comparison
  • Linear: applies expensive work to every element
  • Cost-Aware: applies expensive work only to candidates (~top 20%)
  • → Cost-Aware becomes much faster when post-processing dominates

args = "--compare --known-upper 1000000 --post-iters 1500 --threshold 0.8 --seed 42"

  • Known upper bound scenario
  • The true maximum possible value is already known
  • → Skip the initial scan and directly compute cutoff
  • → Further improves Cost-Aware efficiency, especially for large n

args = "--compare --sample-size 1000 --post-iters 1500 --threshold 0.8 --seed 42"

  • Sample-based upper bound estimation
  • Estimate cutoff from a small random sample (1000 elements)
  • → Reduces overhead for large n
  • → But if the sample underestimates the max, the candidate set grows and speedup decreases

📈 Performance Comparison

We provide two complementary plots:

  1. Without post-processing — both Linear and Cost-Aware runs in Ω(n) time, but Cost-Aware includes an extra filtering step, so the constant factor is larger.

    no-post

  2. With heavy post-processing — both methods still require O(n) to scan the input, but Cost-Aware applies the expensive step (e.g., DB lookups / model inference) only to a reduced subset O(k) (k ≪ n), instead of O(n) in the Linear case. This makes Cost-Aware significantly faster in scenarios where post-processing dominates runtime.

    with-post

To reproduce:

pip install -r requirements.txt
python benchmark.py

🛠️ Tech

  • Python 3
  • Matplotlib (for plotting)

📌 Notes

  • This is a prototype and not an optimized production algorithm.

  • The goal is to demonstrate that cost-aware strategies can sometimes outperform naive linear scans in practice when post-processing dominates.

  • Although this repository demonstrates maximum finding, the same cost-aware filtering idea applies to minimum finding as well. Simply invert the logic (use a lower threshold instead of an upper bound).

In both cases, the theoretical bound remains Θ(n), but practical savings arise when post-processing is costly.


📫 Contact

Maintained by SuDev feel free to open an issue or suggestion!


🇰🇷

이 프로젝트는 후처리 비용이 큰 환경에서 임계값 기반 필터링을 활용해 후보 개수를 줄여 실행 시간을 절감할 수 있는 실험적 알고리즘 접근을 다룹니다.

About

Cost-Aware Maximum Finding — a threshold-based approach to reduce expensive post-processing

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published