SGDVector domain and example backreference #431
Open
Craigacp wants to merge 13 commits into oracle:main
- Fixed DenseVector.equals and SparseVector.equals so they actually compare correctly if one is sparse and the other is dense.
- Added references and accessors for the ImmutableFeatureMap and Example when creating SGDVectors from examples.
- Added SGDVector.createFromExample which automatically switches between sparse and dense based on a system property for the sparsity.
- Added accessors for the fields in VectorTuple to make them easier to use in comparators as method references.
- Fixed a bug in the VectorTuple and MatrixTuple constructors which only accepted integers not doubles.
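To illustrate the first item, here is a minimal sketch of value-based equality across a dense and a sparse representation. The `Vec`, `Dense`, and `Sparse` types and the `valueEquals` helper are hypothetical stand-ins written for this example; they are not Tribuo's `DenseVector` or `SparseVector`, and the actual fix in this PR may be implemented differently.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for illustration only, not Tribuo classes.
public class VectorEqualitySketch {
    public interface Vec {
        int size();
        double get(int index);
    }

    // Stores every element, including zeros.
    public static final class Dense implements Vec {
        private final double[] values;
        public Dense(double... values) { this.values = values.clone(); }
        @Override public int size() { return values.length; }
        @Override public double get(int index) { return values[index]; }
    }

    // Stores only the explicitly set elements; everything else reads as 0.0.
    public static final class Sparse implements Vec {
        private final int size;
        private final Map<Integer, Double> nonZeros = new TreeMap<>();
        public Sparse(int size) { this.size = size; }
        public Sparse set(int index, double value) { nonZeros.put(index, value); return this; }
        @Override public int size() { return size; }
        @Override public double get(int index) { return nonZeros.getOrDefault(index, 0.0); }
    }

    // Compares by dimension and element values, ignoring the storage format.
    public static boolean valueEquals(Vec a, Vec b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (a.get(i) != b.get(i)) {
                return false;
            }
        }
        return true;
    }
}
```

With these stand-ins, `valueEquals(new Dense(0.0, 2.0, 0.0), new Sparse(3).set(1, 2.0))` returns `true`, whereas a comparison that only accepts operands of the same class would report the two as unequal.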
…reateFromExample.
…ctiveElements, and moved over more things to SGDVector.createFromExample.
…operate on SGDVector not SparseVector.
Description
Adds a reference to the ImmutableFeatureMap and Example that an SGDVector instance was created from. Added a new factory to SGDVector that dynamically chooses between DenseVector and SparseVector based on how sparse the example is over those feature indices, using a sparsity level set via a system property. Then migrated (almost) all the trainers and models to use that factory. The ones that still use SparseVector exclusively are MultinomialNaiveBayes and the ExternalModel subclasses; they will be migrated separately.

After that, most functions which operate on SparseVector were relaxed to accept SGDVector and to work on dense vectors. The next change is to move from vector.numActiveElements to vector.numNonZeroElements, as DenseVector.numActiveElements always returns the size of the dense vector, not the number of non-zero elements (for historical and also slightly technical reasons).

Finally there's a slight update to how the ForkJoinPools are used in KNNModel to ensure they are actually closed properly, which I noticed and fixed on the way through. That involved bumping to Java 21, which in turn means it's time to remove support for SecurityManager (the bump to 21 has happened in a number of other in-progress branches, but this is the first one to become ready for merging).

Motivation
The ImmutableFeatureMap and Example link will enable a fast validation check for optimizations like #417, in concert with #426. Once both this and #426 have landed I'll go in and fix the validation check in #417 and relax it to apply the vector creation optimization to most ensembles.

Separately, the sparse/dense vector factory will improve performance when working on dense data, as most models used to work only on sparse vectors and the sparse format is inefficient for dense data.

Fixes #432.
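As a rough illustration of the sparse/dense switch described above, the sketch below picks a representation from the fraction of features present in an example, with the cutoff read from a system property. The class name, the property name `sketch.vector.denseThreshold`, the default of 0.5, and the "at or above the cutoff means dense" rule are all assumptions made for this example; the property and rule used by SGDVector.createFromExample are not stated in this description. The `countNonZero` helper shows why a dense vector's stored length and its number of non-zero elements can differ.

```java
// Hypothetical sketch, not the factory added in this PR.
public class VectorFactorySketch {
    // Assumed property name and default, chosen for this example only.
    public static final String THRESHOLD_PROPERTY = "sketch.vector.denseThreshold";
    public static final double DEFAULT_THRESHOLD = 0.5;

    // Reads the cutoff from the system property, falling back to the default
    // when the property is unset or not a valid number.
    public static double threshold() {
        String raw = System.getProperty(THRESHOLD_PROPERTY);
        if (raw == null) {
            return DEFAULT_THRESHOLD;
        }
        try {
            return Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            return DEFAULT_THRESHOLD;
        }
    }

    // Returns true when the fraction of present features reaches the cutoff,
    // meaning a dense layout is likely the cheaper one.
    public static boolean useDense(int numPresentFeatures, int numFeatures) {
        if (numFeatures <= 0) {
            throw new IllegalArgumentException("numFeatures must be positive");
        }
        double density = (double) numPresentFeatures / numFeatures;
        return density >= threshold();
    }

    // Counts the non-zero entries of a dense array; its length (the number of
    // stored elements) can be larger than this count.
    public static int countNonZero(double[] values) {
        int count = 0;
        for (double v : values) {
            if (v != 0.0) {
                count++;
            }
        }
        return count;
    }
}
```

With the default cutoff, an example that has 8 of 10 features present would be stored densely and one with 1 of 10 would stay sparse; launching the JVM with `-Dsketch.vector.denseThreshold=0.05` would move the second case to dense as well. For the array `{0.0, 3.0, 0.0, 1.5}` the stored length is 4 while `countNonZero` returns 2, which is the distinction behind moving from numActiveElements to numNonZeroElements.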