SGDVector domain and example backreference #431

Open
Craigacp wants to merge 13 commits into oracle:main from Craigacp:vector-domain

Craigacp commented Feb 11, 2026

Description

Adds a reference to the ImmutableFeatureMap and Example that an SGDVector instance was created from. Also adds a new factory to SGDVector that dynamically chooses between DenseVector and SparseVector based on how sparse the example is over those feature indices, using a sparsity level set via a system property. I then migrated (almost) all the trainers and models over to that factory. The ones that still use SparseVector exclusively are MultinomialNaiveBayes and the ExternalModel subclasses; they will be migrated separately. After that, I relaxed most functions that operate on SparseVector so they accept SGDVector and also work on dense vectors.

The next changes move callers from vector.numActiveElements to vector.numNonZeroElements, because DenseVector.numActiveElements always returns the size of the dense vector, not the number of non-zero elements (for historical and also slightly technical reasons).

Finally, there's a slight update to how the ForkJoinPools are used in KNNModel to ensure they are actually closed properly, which I noticed and fixed along the way. That involved bumping to Java 21, which in turn means it's time to remove support for SecurityManager (the bump to 21 has happened in a number of other in-progress branches, but this is the first one to become ready for merging).
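A minimal, self-contained sketch of the dense/sparse decision described above, which also shows why numActiveElements and numNonZeroElements diverge for dense storage. Everything here is hypothetical: the Vec, Dense and Sparse types, the create method, the property name `example.vector.sparsityThreshold` and its 0.5 default are illustrative stand-ins, not the actual SGDVector.createFromExample, and the sketch takes raw index/value arrays instead of an Example.

```java
public class VectorFactorySketch {

    // Hypothetical property name and default; the PR only says the sparsity level comes from a system property.
    static final String SPARSITY_PROPERTY = "example.vector.sparsityThreshold";
    static final double DEFAULT_SPARSITY_THRESHOLD = 0.5;

    // Hypothetical minimal vector interface exposing the two counts discussed in the PR.
    interface Vec {
        int size();

        int numActiveElements();   // stored entries: every slot for dense storage

        int numNonZeroElements();  // entries whose value is not 0.0
    }

    record Dense(double[] values) implements Vec {
        public int size() { return values.length; }

        public int numActiveElements() { return values.length; }

        public int numNonZeroElements() { return countNonZero(values); }
    }

    record Sparse(int size, int[] indices, double[] values) implements Vec {
        public int numActiveElements() { return indices.length; }

        public int numNonZeroElements() { return countNonZero(values); }
    }

    static int countNonZero(double[] values) {
        int count = 0;
        for (double v : values) {
            if (v != 0.0) count++;
        }
        return count;
    }

    // Reads the (hypothetical) threshold, falling back to the default if unset or unparseable.
    static double sparsityThreshold() {
        String raw = System.getProperty(SPARSITY_PROPERTY);
        if (raw == null) return DEFAULT_SPARSITY_THRESHOLD;
        try {
            return Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            return DEFAULT_SPARSITY_THRESHOLD;
        }
    }

    // Uses sparse storage when the fraction of zero dimensions is at least the threshold, else dense.
    // Assumes indices are unique and within [0, dimension).
    static Vec create(int dimension, int[] indices, double[] values) {
        if (dimension == 0) return new Sparse(0, new int[0], new double[0]);
        double sparsity = 1.0 - (double) countNonZero(values) / dimension;
        if (sparsity >= sparsityThreshold()) {
            return new Sparse(dimension, indices.clone(), values.clone());
        }
        double[] dense = new double[dimension];
        for (int i = 0; i < indices.length; i++) {
            dense[indices[i]] = values[i];
        }
        return new Dense(dense);
    }

    public static void main(String[] args) {
        Vec mostlyFull = create(4, new int[]{0, 1, 2}, new double[]{1.0, 2.0, 3.0});
        Vec mostlyEmpty = create(10, new int[]{3}, new double[]{5.0});
        for (Vec v : new Vec[]{mostlyFull, mostlyEmpty}) {
            System.out.println(v.getClass().getSimpleName() + " active=" + v.numActiveElements()
                    + " nonZero=" + v.numNonZeroElements());
        }
    }
}
```

With the property unset, this prints `Dense active=4 nonZero=3` followed by `Sparse active=1 nonZero=1`. The dense vector reports four active elements but only three non-zero ones, which is the kind of mismatch that makes numNonZeroElements the safer count once callers may receive either representation.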

Motivation

The ImmutableFeatureMap and Example link will enable a fast validation check for optimizations like #417, in concert with #426. Once both this and #426 have landed I'll go in and fix the validation check in #417 and relax it to apply the vector creation optimization to most ensembles.

Separately, the sparse/dense vector factory will improve performance when working on dense data, as most models used to work only on sparse vectors and the sparse format is inefficient for dense data. Fixes #432.

- Fixed DenseVector.equals and SparseVector.equals so they compare
  correctly when one vector is sparse and the other is dense.
- Added references and accessors for the ImmutableFeatureMap and Example when creating
  SGDVectors from examples.
- Added SGDVector.createFromExample, which automatically switches between
  sparse and dense based on a sparsity level set via a system property.
- Added accessors for the fields in VectorTuple to make them easier to
  use in comparators as method references.
- Fixed a bug in the VectorTuple and MatrixTuple constructors, which
  accepted only integers, not doubles.
…ctiveElements, and moved over more things to SGDVector.createFromExample.
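The equals fix and the VectorTuple accessor change above can be illustrated with a sketch, assuming a design where equality is defined once, in a shared base class, on logical values rather than on storage layout, so a dense and a sparse vector holding the same values compare equal in both directions and share a hash code. The Vec, Dense, Sparse and Tuple types are hypothetical stand-ins, not Tribuo's DenseVector, SparseVector or VectorTuple, and the accessor names on Tuple are assumptions; the sketch only shows how accessors slot into a comparator as method references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class VectorEqualsSketch {

    // Hypothetical (index, value) pair; the record's accessors double as method references.
    record Tuple(int index, double value) {}

    // Hypothetical base class: equality is defined on logical values, not on storage layout.
    abstract static class Vec {
        abstract int size();

        abstract double get(int index);

        @Override
        public final boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Vec other) || other.size() != size()) return false;
            for (int i = 0; i < size(); i++) {
                // Double.compare follows Double.equals semantics: NaN equals NaN, 0.0 differs from -0.0
                if (Double.compare(get(i), other.get(i)) != 0) return false;
            }
            return true;
        }

        @Override
        public final int hashCode() {
            // Skip entries equal to 0.0 so implicit (sparse) and explicit (dense) zeros hash alike
            int h = Integer.hashCode(size());
            for (int i = 0; i < size(); i++) {
                double v = get(i);
                if (Double.compare(v, 0.0) != 0) {
                    h = 31 * (31 * h + i) + Double.hashCode(v);
                }
            }
            return h;
        }
    }

    static final class Dense extends Vec {
        private final double[] values;

        Dense(double... values) { this.values = values.clone(); }

        @Override
        int size() { return values.length; }

        @Override
        double get(int index) { return values[index]; }
    }

    static final class Sparse extends Vec {
        private final int size;
        private final int[] indices; // assumed sorted ascending, unique, within [0, size)
        private final double[] values;

        Sparse(int size, int[] indices, double[] values) {
            this.size = size;
            this.indices = indices.clone();
            this.values = values.clone();
        }

        @Override
        int size() { return size; }

        @Override
        double get(int index) {
            int pos = Arrays.binarySearch(indices, index);
            return pos >= 0 ? values[pos] : 0.0;
        }

        List<Tuple> tuples() {
            List<Tuple> out = new ArrayList<>();
            for (int i = 0; i < indices.length; i++) {
                out.add(new Tuple(indices[i], values[i]));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Dense dense = new Dense(0.0, 2.0, 0.0, 5.0);
        Sparse sparse = new Sparse(4, new int[]{1, 3}, new double[]{2.0, 5.0});
        System.out.println(dense.equals(sparse) + " " + sparse.equals(dense));
        System.out.println(dense.hashCode() == sparse.hashCode());

        // Record accessors used as method references: sort entries by value, largest first
        List<Tuple> byValue = sparse.tuples();
        byValue.sort(Comparator.comparingDouble(Tuple::value).reversed());
        System.out.println(byValue.get(0).index());
    }
}
```

Running main prints `true true`, then `true`, then `3`. Defining equals through size() and get(i) keeps the cross-representation case correct at the cost of a full O(n) scan even for two sparse inputs; a production version would likely add a fast path for the sparse/sparse case.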
oracle-contributor-agreement bot added the OCA Verified label (all contributors have signed the Oracle Contributor Agreement) Feb 11, 2026
Craigacp added the Oracle employee label (this PR is from an Oracle employee) Feb 11, 2026

Development

Successfully merging this pull request may close these issues.

Add a SGDVector creation factory that decides if the example should be sparse or dense.
