Commit 3033764

Merge pull request #5289 from FlorentinD/fix-training-docs
Fix smaller issues in training docs
2 parents 964319d + 4b259a9 commit 3033764

2 files changed with 3 additions and 11 deletions, all under doc/asciidoc/machine-learning

doc/asciidoc/machine-learning/linkprediction-pipeline/training.adoc

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ These graphs are internally managed and exist only for the duration of the train
 . Split the training instances using stratified k-fold cross-validation.
 The number of folds `k` can be configured using `validationFolds` in `gds.beta.pipeline.linkPrediction.configureSplit`.
 . Train each model candidate given by the <<linkprediction-adding-model-candidates,parameter space>> for each of the folds and evaluate the model on the respective validation set.
-The training process uses a logistic regression or random forest algorithm, and the evaluation uses the specified <<linkprediction-pipelines-metrics,metric>>.
+The evaluation uses the specified <<linkprediction-pipelines-metrics,metric>>.
 . Declare as winner the model with the highest average metric across the folds.
 . Re-train the winning model on the whole training set and evaluate it on both the `train` and `test` sets.
 In order to evaluate on the `test` set, the feature pipeline is first applied again as for the `train` set.
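
For readers unfamiliar with the splitting step this hunk documents, here is a minimal sketch of stratified k-fold splitting in plain Python. It is an illustration only, not the GDS implementation: the function name `stratified_k_fold` and the sample `labels` are hypothetical, and shuffling, which real implementations usually perform, is left out to keep the result deterministic.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Instances of each class are dealt round-robin into the folds, so every
    fold keeps roughly the class proportions of the whole training set.
    Hypothetical helper for illustration; no shuffling is performed.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for fold_number in range(k):
        validation = sorted(folds[fold_number])
        train = sorted(
            index
            for number, fold in enumerate(folds)
            if number != fold_number
            for index in fold
        )
        yield train, validation


# Hypothetical binary labels: nine negative and three positive instances.
labels = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0]
splits = list(stratified_k_fold(labels, k=3))
```

With `k=3`, each validation fold in this example receives one of the three positive instances, which is the property that stratification is meant to preserve.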

doc/asciidoc/machine-learning/node-property-prediction/nodeclassification-pipeline/training.adoc

Lines changed: 2 additions & 10 deletions
@@ -14,7 +14,7 @@ More precisely, the training proceeds as follows:
 These graphs are internally managed and exist only for the duration of the training.
 . Split the nodes in the train graph using stratified k-fold cross-validation.
 The number of folds `k` can be configured as described in <<nodeclassification-pipelines-configure-splits, Configuring the node splits>>.
-. Each model candidate defined in the <<nodeclassification-pipelines-adding-model-candidates,parameter space>> is trained on each train set and evaluated on the respective validation set for every fold. The training process uses a logistic regression algorithm, and the evaluation uses the specified <<nodeclassification-pipeline-metrics,metric>>.
+. Each model candidate defined in the <<nodeclassification-pipelines-adding-model-candidates,parameter space>> is trained on each train set and evaluated on the respective validation set for every fold. The evaluation uses the specified primary <<nodeclassification-pipeline-metrics,metric>>.
 . Choose the best performing model according to the highest average score for the primary metric.
 . Retrain the winning model on the entire train graph.
 . Evaluate the performance of the winning model on the whole train graph as well as the test graph.
@@ -142,15 +142,7 @@ The structure of `modelInfo` is:
 max: Float,
 min: Float,
 params: Map
-},
-{
-avg: Float,
-max: Float,
-min: Float,
-params: Map
-},
-...
-]
+}
 }
 }
 }
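
The `avg`, `max`, `min` and `params` fields in this hunk correspond to the per-candidate fold scores described in the training steps. The sketch below shows one way such an entry could be derived and how a winner could be picked by highest average. It is an assumption-laden illustration, not GDS code: the helper `summarize`, the `penalty` values and the fold scores are all made up.

```python
from statistics import mean


def summarize(params, fold_scores):
    """Collapse one candidate's per-fold scores into a single stats entry.

    Hypothetical helper; the keys mirror the fields shown in the diff.
    """
    return {
        "avg": mean(fold_scores),
        "max": max(fold_scores),
        "min": min(fold_scores),
        "params": params,
    }


# Made-up candidates with made-up validation scores for three folds.
candidates = [
    summarize({"penalty": 0.0}, [0.5, 0.75, 0.25]),
    summarize({"penalty": 1.0}, [0.75, 0.75, 0.75]),
]

# The winner is the candidate with the highest average score across folds.
winner = max(candidates, key=lambda entry: entry["avg"])
```

Here the second candidate wins because its average over the folds is higher, even though both candidates share the same maximum score.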
