Merged
Commits
40 commits
309e934
OTHER: Add 'tck' to output params for all pspline methods
derb12 Oct 4, 2025
8c7fb1a
MAINT: Document tck in the output params dict
derb12 Oct 5, 2025
85e31dd
ENH: Add initial implementation of optimize_pls
derb12 Oct 12, 2025
cd69169
MAINT: Add banded to sparse conversion and lam updater
derb12 Oct 18, 2025
cc7cd36
BREAK: Change how optimize_extended_range selects variables
derb12 Oct 19, 2025
8503240
MAINT: Add factorize and factorized_solve to PenalizedSystem
derb12 Oct 19, 2025
817cdf7
MAINT: Add btwb calc for PSpline
derb12 Oct 19, 2025
673cec1
MAINT: Add effective dimension calc for PenalizedSystem and PSpline
derb12 Oct 19, 2025
eb99e5f
OTHER: Add stochastic trace estimation for faster evaluation
derb12 Oct 19, 2025
5448e57
TEST: Fix test logic for penalized system factorized_solve
derb12 Oct 20, 2025
eafbf70
OTHER: Return fidelity and penalty terms from beads
derb12 Oct 26, 2025
9797132
OTHER: Add effective dimension calc for 2D Whittaker and PSpline
derb12 Oct 29, 2025
8008c62
MAINT: Ensure banded to sparse calc work for 1d array
derb12 Nov 16, 2025
ae31efa
MAINT: Faster stochastic pspline trace calc
derb12 Nov 16, 2025
35bdad7
DOCS: Fix typo in pspline_iasls equation
derb12 Nov 16, 2025
ef2c1d1
MAINT: Fix sparse indexing for older scipy versions
derb12 Nov 29, 2025
7979050
ENH: Add GCV and BIC as options for optmize_pls
derb12 Jan 17, 2026
76c82b5
TEST: Add basic tests for optmize_pls
derb12 Jan 17, 2026
b8a3064
Merge branch 'development' into optimize_lam
derb12 Jan 28, 2026
7d915b9
MAINT: Fix sparse indexing test logic
derb12 Jan 28, 2026
bbdec2b
Merge branch 'development' into optimize_lam
derb12 Feb 1, 2026
99eef75
OTH: Create result objects for Whittaker and P-splines
derb12 Feb 1, 2026
c3209db
MAINT: Remove effective dimension calcs from penalty objects
derb12 Feb 1, 2026
b3cbcc4
DOCS: Add results module to docs
derb12 Feb 1, 2026
1dd9851
DOCS: Add extension to add cross references for param dict items
derb12 Feb 1, 2026
270b121
ENH: Return result objects for all relevant 1d methods
derb12 Feb 23, 2026
3b86376
ENH: Return result objects for all relevant 2D methods
derb12 Feb 23, 2026
408f858
MAINT: Move `tck` from params to PSplineResult[2D] property
derb12 Feb 28, 2026
72fbbe1
DOC: Add class attributes to toctree
derb12 Mar 1, 2026
7d226a7
MAINT: Ensure beads works with optimize_pls when using l-curve
derb12 Mar 1, 2026
d2dc247
DOC: Add discussion on beads simplification
derb12 Mar 1, 2026
7d8e72b
TST: Add tests for effective dimension limits in 1D
derb12 Mar 8, 2026
dca91f2
TST: Add tests for effective dimension limits in 2D
derb12 Mar 8, 2026
a0c27a0
TST: Add tests for algorithms with binary weighting
derb12 Mar 8, 2026
789ed2c
BREAK: iasls returns squared weights
derb12 Mar 8, 2026
c53b533
DOC: Simplify optimize_pls docs in algorithms section
derb12 Mar 15, 2026
ca7e424
MAINT: Allow iasls, drpls, and aspls for U-curve optimization
derb12 Mar 15, 2026
974e543
MAINT: Add docstring for optimize_pls functional interface
derb12 Mar 15, 2026
6ff38e4
TST: Fix flaky test failures
derb12 Mar 15, 2026
47d728d
MAINT: Fix docstring section ordering
derb12 Mar 15, 2026
33 changes: 33 additions & 0 deletions docs/_templates/autosummary/class.rst
@@ -0,0 +1,33 @@
{{ fullname | escape | underline}}

.. currentmodule:: {{ module }}

.. autoclass:: {{ objname }}

{% block attributes %}
{% if attributes %}
.. rubric:: {{ _('Attributes') }}

.. autosummary::
:toctree:
{% for item in attributes %}
~{{ name }}.{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}

{% block methods %}
{% if methods %}
.. rubric:: {{ _('Methods') }}

.. autosummary::
:template: autosummary/method.rst
:toctree:
:nosignatures:
{% for item in methods %}
{%- if not item.startswith('_') or not item in ['__init__'] %}
~{{ name }}.{{ item }}
{%- endif -%}
{%- endfor %}
{% endif %}
{% endblock %}
1 change: 1 addition & 0 deletions docs/_templates/autosummary/module.rst
@@ -19,6 +19,7 @@
.. rubric:: {{ _('Classes') }}

.. autosummary::
:template: autosummary/class.rst
:toctree:
:nosignatures:
{% for item in classes %}
135 changes: 118 additions & 17 deletions docs/algorithms/algorithms_1d/misc.rst
@@ -78,7 +78,29 @@ beads (Baseline Estimation And Denoising with Sparsity)

:meth:`~.Baseline.beads` decomposes the input data into baseline and pure, noise-free signal by
modeling the baseline as a low pass filter and by considering the signal and its derivatives
as sparse.
as sparse. The minimized equation for calculating the pure signal is given as:

.. math::

\frac{1}{2} ||H(y - s)||_2^2
+ \lambda_0 \sum\limits_{i}^{N} \theta(s_i, r)
+ \lambda_1 \sum\limits_{i}^{N - 1} \phi(\Delta^1 s_i)
+ \lambda_2 \sum\limits_{i}^{N - 2} \phi(\Delta^2 s_i)

where :math:`y` is the measured data, :math:`s` is the calculated pure signal,
:math:`H` is a high pass filter, :math:`\theta()` is a differentiable, symmetric
or asymmetric penalty function on the calculated signal, :math:`r` is the asymmetry
of :math:`\theta()` by which negative values are penalized more, :math:`\Delta^1` and :math:`\Delta^2`
are :ref:`finite-difference operators <difference-matrix-explanation>` of order 1
and 2, respectively, and :math:`\phi()` is a differentiable, symmetric penalty function
approximating the L1 loss (mean absolute error) applied to the first and second derivatives
of the calculated signal.

The calculated baseline, :math:`v`, upon convergence of calculating the pure signal is given by:

.. math::

v = y - s - H(y - s)

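The baseline recovery in the equation above can be sketched with NumPy. This is only an illustration: a moving average stands in for the low pass complement of :math:`H` (the actual beads implementation uses a different filter), and the pure signal :math:`s` is assumed known.

```python
import numpy as np

def moving_average(u, window=25):
    # crude low pass filter; its complement acts as the high pass H,
    # so u - moving_average(u) plays the role of H(u) here
    kernel = np.ones(window) / window
    return np.convolve(u, kernel, mode='same')

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 500)
baseline = 2 + x                              # slowly varying background
s = np.exp(-(x - 0.5)**2 / 0.001)             # "pure" sharp-peak signal
y = s + baseline + rng.normal(0, 0.01, x.size)

residual = y - s                              # baseline plus noise
# v = y - s - H(y - s): subtracting the high pass part keeps only the
# low-frequency content, which is the recovered baseline
v = residual - (residual - moving_average(residual))
```

Away from the edges, ``v`` closely tracks the true baseline, mirroring how beads separates the smooth baseline from the sparse signal.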
.. plot::
:align: center
@@ -189,22 +211,23 @@ as sparse.
x, data, baselines = create_data()
baseline_fitter = Baseline(x, check_finite=False)

fit_params = [(3, 3), (0.15, 8), (0.1, 6), (0.25, 8), (0.1, 0.6)]

fit_params = [
(500, 6, 0.01),
(0.01, 8, 0.08),
(80, 8, 0.01),
(0.2, 6, 0.04),
(100, 1, 0.01)
]
figure, axes, handles = create_plots(data, baselines)
for i, (ax, y) in enumerate(zip(axes, data)):
if i == 0:
freq_cutoff = 0.002
else:
freq_cutoff = 0.005
lam_0, asymmetry = fit_params[i]
alpha, asymmetry, freq_cutoff = fit_params[i]
baseline, params = baseline_fitter.beads(
y, freq_cutoff=freq_cutoff, lam_0=lam_0, lam_1=0.05, lam_2=0.2, asymmetry=asymmetry
y, freq_cutoff=freq_cutoff, alpha=alpha, asymmetry=asymmetry, tol=1e-3
)
ax.plot(baseline, 'g--')

The signal with both noise and baseline removed can also be obtained from the output
of the beads function.
of the beads method by accessing the 'signal' key in the output parameters.

.. plot::
:align: center
@@ -214,15 +237,10 @@

# to see contents of create_data function, look at the second-to-top-most algorithm's code
figure, axes, handles = create_plots(data, baselines)
fit_params = [(3, 3), (0.15, 8), (0.1, 6), (0.25, 8), (0.1, 0.6)]
for i, (ax, y) in enumerate(zip(axes, data)):
if i == 0:
freq_cutoff = 0.002
else:
freq_cutoff = 0.005
lam_0, asymmetry = fit_params[i]
alpha, asymmetry, freq_cutoff = fit_params[i]
baseline, params = baseline_fitter.beads(
y, freq_cutoff=freq_cutoff, lam_0=lam_0, lam_1=0.05, lam_2=0.2, asymmetry=asymmetry
y, freq_cutoff=freq_cutoff, alpha=alpha, asymmetry=asymmetry, tol=1e-3
)

ax.clear() # remove the old plots in the axis
@@ -234,3 +252,86 @@ of the beads function.
(data_handle[0], signal_handle[0]),
('data', 'signal from beads'), loc='center', frameon=False
)

pybaselines version 1.3.0 introduced an optional simplification of selecting the :math:`\lambda_0`,
:math:`\lambda_1`, and :math:`\lambda_2` regularization parameters, following the procedure
recommended by the BEADS manuscript, through the addition of the parameter :math:`\alpha`.
In detail, each :math:`\lambda_d` value is assumed to be approximately proportional to the
reciprocal of the L1 norm of the d-th derivative of the input data, with :math:`\alpha` as the
proportionality constant, such that:

.. math::
:label: lam_eq

\lambda_0 = \frac{\alpha}{||y||_1},
\lambda_1 = \frac{\alpha}{||y'||_1},
\lambda_2 = \frac{\alpha}{||y''||_1}

Such a parametrization allows varying just :math:`\alpha`, which simplifies basic usage and
allows for easier integration within optimization frameworks to find the best regularization
parameter, as demonstrated by `Bosten, et al. <https://doi.org/10.1016/j.chroma.2023.464360>`_

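Eq. :eq:`lam_eq` can be sketched directly with NumPy; this is a minimal illustration using ``np.diff`` for the derivatives, not the internal pybaselines implementation:

```python
import numpy as np

def estimate_lams(y, alpha):
    # lambda_d = alpha / ||d-th derivative of y||_1 for d = 0, 1, 2
    lam_0 = alpha / np.abs(y).sum()
    lam_1 = alpha / np.abs(np.diff(y, 1)).sum()
    lam_2 = alpha / np.abs(np.diff(y, 2)).sum()
    return lam_0, lam_1, lam_2

# smooth example data: the derivative norms shrink with order, so the
# estimated lam values grow with order
y = 5 + np.sin(np.linspace(0, 10, 200))
lam_0, lam_1, lam_2 = estimate_lams(y, alpha=1.0)
```
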
At first glance, Eq. :eq:`lam_eq` seems to have an issue in that the penalties within the BEADS
algorithm are applied to the calculated signal, while the estimated :math:`\lambda_d` values
will be based on the input data, which is composed of the signal, baseline, and noise.
Due to this, the estimate for :math:`\lambda_0` is affected by an overestimation of the
signal's L1 norm, while also not accounting for the asymmetry of the penalty function
:math:`\theta()`. Further, while the estimates for :math:`\lambda_1` and :math:`\lambda_2`
are less affected by the baseline since taking the derivatives eliminates some
contributions from the baseline, the influence of noise is amplified, as demonstrated
in the figure below, also resulting in an overestimation of their L1 norms. In practice,
however, this systematic norm overestimation can be accounted for by simply selecting a larger
:math:`\alpha`, and Eq. :eq:`lam_eq` ends up being a fairly good first approximation to
allow one value to determine all three regularization terms.

.. plot::
:align: center
:context: close-figs
:include-source: False
:show-source-link: True

from pybaselines.utils import make_data

x, y_no_noise, known_baseline = make_data(
return_baseline=True, noise_std=0.001, bkg_type='sine'
)
true_y = y_no_noise - known_baseline
x, y = make_data(noise_std=0.1, bkg_type='sine')
# subtract the minimum so it plots nicer with the pure signal
y -= y.min()

_, (ax1, ax2, ax3) = plt.subplots(nrows=3, sharex=True)
dy = np.gradient(y)
d2y = np.gradient(dy)
true_dy = np.gradient(true_y)
true_d2y = np.gradient(true_dy)
ax1.plot(x, y, label='data')
ax1.plot(x, true_y, '--', label='true signal')
ax2.plot(x, dy)
ax2.plot(x, true_dy, '--')
ax3.plot(x, d2y, label='data')
ax3.plot(x, true_d2y, '--', label='true signal')

ax1.legend()
ax1.set_ylabel('0th Derivative')
ax2.set_ylabel('1st Derivative')
ax3.set_ylabel('2nd Derivative')

plt.figure()
deriv_values = np.arange(3, dtype=int)
plt.plot(
deriv_values,
[abs(y).sum(), abs(dy).sum(), abs(d2y).sum()],
'o-', label='data'
)
plt.plot(
deriv_values,
[abs(true_y).sum(), abs(true_dy).sum(), abs(true_d2y).sum()],
'o--', label='true signal'
)

plt.xlabel('Derivative Order')
plt.xticks(deriv_values)
plt.ylabel('L1 Norm')
plt.semilogy()
plt.legend()
136 changes: 134 additions & 2 deletions docs/algorithms/algorithms_1d/optimizers.rst
@@ -13,13 +13,13 @@ Algorithms
optimize_extended_range
~~~~~~~~~~~~~~~~~~~~~~~

The :meth:`~.Baseline.optimize_extended_range` function is based on the `Extended Range
The :meth:`~.Baseline.optimize_extended_range` method is based on the `Extended Range
Penalized Least Squares (erPLS) method <https://doi.org/10.3390/s20072015>`_,
but extends its usage to all Whittaker-smoothing-based, polynomial, and spline algorithms.

In this algorithm, a linear baseline is extrapolated from the left and/or
right edges, Gaussian peaks are added to these baselines, and then the original
data plus the extensions are input into the indicated Whittaker or polynomial function.
data plus the extensions are input into the indicated Whittaker or polynomial method.
An example of data with added baseline and Gaussian peaks is shown below.

.. _extending-data-explanation:
@@ -313,3 +313,135 @@ reducing the number of data points in regions where higher stiffness
is required. There is no figure showing the fits for various baseline types for
this method since it is more suited for hard-to-fit data; however, :ref:`an
example <sphx_glr_generated_examples_optimizers_plot_custom_bc_1_whittaker.py>` showcases its use.


optimize_pls (Optimize Penalized Least Squares)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :meth:`~.Baseline.optimize_pls` method is based on the `method developed by the authors
of the arPLS algorithm <https://dx.doi.org/10.5573/ieie.2016.53.3.124>`_,
but extends its usage to all Whittaker-smoothing and penalized spline algorithms.

The method works by considering the general equation of penalized least squares given by

.. math::
F + \lambda P

where :math:`F` is the fidelity of the fit and :math:`P` is the penalty term whose contribution
is controlled by the regularization parameter :math:`\lambda`. In general, both Whittaker
smoothing and penalized splines have a fidelity given by:

.. math::

F = \sum\limits_{i}^N w_i (y_i - v_i)^2

where :math:`y_i` is the measured data, :math:`v_i` is the calculated baseline,
and :math:`w_i` is the weight. The penalty for Whittaker smoothing is generally:

.. math::

P = \sum\limits_{i}^{N - d} (\Delta^d v_i)^2

where :math:`\Delta^d` is the finite-difference operator of order d. For penalized
splines the penalty is generally:

.. math::

P = \sum\limits_{i}^{M - d} (\Delta^d c_i)^2

where :math:`c` are the calculated spline coefficients.

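The fidelity and Whittaker penalty terms above can be computed directly; a short NumPy sketch (unit weights assumed, with ``v`` standing in for a fitted baseline):

```python
import numpy as np

def fidelity(y, v, w=None):
    # F = sum_i w_i * (y_i - v_i)^2
    if w is None:
        w = np.ones_like(y)
    return np.sum(w * (y - v)**2)

def whittaker_penalty(v, d=2):
    # P = sum_i (Delta^d v_i)^2, with np.diff acting as the order-d
    # finite-difference operator
    return np.sum(np.diff(v, d)**2)

y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
v = np.array([1.0, 2.5, 4.0, 6.5, 11.0])
F = fidelity(y, v)            # 0.25 + 0.25 = 0.5
P = whittaker_penalty(v)      # second differences [0, 1, 2] -> 5.0
```
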
In either case, a range of ``lam`` values is tested, with the fidelity and penalty
values calculated for each fit. The arrays of fidelity and penalty values are then
normalized by their minimum and maximum values:

.. math::

F_{norm} = \frac{F - \min(F)}{\max(F) - \min(F)}

.. math::

P_{norm} = \frac{P - \min(P)}{\max(P) - \min(P)}


The ``lam`` value that produces a minimum of the sum of normalized penalty and fidelity
values is then selected as the optimal value. An example demonstrating this process is shown below.

.. note::
Fun fact: this method was the reason that all baseline correction methods in pybaselines
output a parameter dictionary. Since the conception of pybaselines, the author had tried to
implement this method, only realizing after ~5 years that the original publication contained a
typo that prevented replicating its results.


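The normalization and minimum selection described above can be sketched with synthetic fidelity and penalty curves (an illustration only, not the optimize_pls internals):

```python
import numpy as np

def normalize(values):
    # min-max normalize to the range [0, 1]
    return (values - values.min()) / (values.max() - values.min())

log_lams = np.linspace(2, 10, 33)     # candidate log10(lam) values
# synthetic stand-ins: fidelity worsens as lam grows, penalty shrinks
fidelity = np.exp(log_lams - 10)
penalty = np.exp(2 - log_lams)

metric = normalize(fidelity) + normalize(penalty)
optimal_log_lam = log_lams[np.argmin(metric)]   # minimum of the summed curves
```
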
.. plot::
:align: center
:context: close-figs
:include-source: False
:show-source-link: True

def normalize(values):
return (values - values.min()) / (values.max() - values.min())

x = np.linspace(1, 1000, 500)
signal = (
gaussian(x, 6, 180, 5)
+ gaussian(x, 6, 550, 5)
+ gaussian(x, 9, 800, 10)
+ gaussian(x, 9, 100, 12)
+ gaussian(x, 15, 400, 8)
+ gaussian(x, 13, 700, 12)
+ gaussian(x, 9, 880, 8)
)
baseline = 5 + 15 * np.exp(-x / 800) + gaussian(x, 5, 700, 300)
noise = np.random.default_rng(1).normal(0, 0.2, x.size)
y = signal * 0.5 + baseline + noise

min_value = 2
max_value = 10
step = 0.25
lam_range = np.arange(min_value, max_value, step)

fit, params = baseline_fitter.optimize_pls(
y, opt_method='u-curve', min_value=min_value, max_value=max_value, step=step
)

plt.figure()
plt.plot(lam_range, normalize(params['penalty']), label='Penalty, $P_{norm}$')
plt.plot(lam_range, normalize(params['fidelity']), label='Fidelity, $F_{norm}$')
plt.plot(lam_range, params['metric'], '--', label='$F_{norm} + P_{norm}$')
index = np.argmin(abs(lam_range - np.log10(params['optimal_parameter'])))
plt.plot(lam_range[index], params['metric'][index], 'o', label='Optimal Value')
plt.xlabel('log$_{10}$(lam)')
plt.ylabel('Normalized Value')
plt.legend()

plt.figure()
plt.plot(x, y)
plt.plot(
x, baseline_fitter.arpls(y, lam=10**min_value)[0], label=f'lam=10$^{{{min_value}}}$'
)
plt.plot(
x, baseline_fitter.arpls(y, lam=10**max_value)[0], label=f'lam=10$^{{{max_value}}}$'
)
plt.plot(x, fit, label=f'lam=10$^{{{np.log10(params['optimal_parameter']):.1f}}}$')
plt.legend()


In general, this method is more sensitive to the minimum and maximum ``lam`` values used for
the fits compared to :meth:`~.Baseline.optimize_extended_range`.

.. plot::
:align: center
:context: close-figs
:include-source: False
:show-source-link: True

# to see contents of create_data function, look at the top-most algorithm's code
figure, axes, handles = create_plots(data, baselines)
for i, (ax, y) in enumerate(zip(axes, data)):
baseline, params = baseline_fitter.optimize_pls(
y, method='arpls', opt_method='u-curve', min_value=2.5, max_value=5
)
ax.plot(baseline, 'g--')