
add more algorithms  #53

@ljzzju


Sorry in advance, I wasn't sure where to file this issue.

By using Java reflection, DDF can easily wrap the MLlib algorithms.

I have found that the method name is restricted to MLClassMethods.DEFAULT_TRAIN_METHOD_NAME, which is defined as "train" in io.ddf.ml.MLClassMethods.

This is fine for many MLlib algorithms, because they provide a "train" method.

However, there are exceptions, e.g. RandomForest. So RF cannot simply be defined the way KMeans is, like:

public IModel decisionTree(args....) throws DDFException {
return this.train("decisionTree", args...);
}
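To make the problem concrete, here is a minimal sketch of a lookup that only ever searches for a method called "train". The classes FakeKMeans and FakeRandomForest are hypothetical stand-ins written for this example, not the real MLlib classes, and hasDefaultTrainMethod is an illustrative helper, not DDF's actual TrainMethod logic.

```java
import java.util.Arrays;

class FixedNameLookup {
    // Mirrors the constant described above; the value "train" is taken from the issue text.
    static final String DEFAULT_TRAIN_METHOD_NAME = "train";

    // Hypothetical stand-in for an algorithm class that exposes train(...).
    static class FakeKMeans {
        public static String train(int k) {
            return "kmeans(k=" + k + ")";
        }
    }

    // Hypothetical stand-in for an algorithm class whose entry point is NOT named "train".
    static class FakeRandomForest {
        public static String trainClassifier(int numTrees) {
            return "rf(numTrees=" + numTrees + ")";
        }
    }

    // Returns true if the class has a public method with the fixed default name.
    static boolean hasDefaultTrainMethod(Class<?> algorithmClass) {
        return Arrays.stream(algorithmClass.getMethods())
                .anyMatch(m -> m.getName().equals(DEFAULT_TRAIN_METHOD_NAME));
    }

    public static void main(String[] args) {
        System.out.println(hasDefaultTrainMethod(FakeKMeans.class));
        System.out.println(hasDefaultTrainMethod(FakeRandomForest.class));
    }
}
```

With a fixed name, the second class can never be resolved, no matter which arguments are passed.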

So I wonder whether this should be changed so that the actual training method name can be specified per MLlib algorithm.

Here is my suggestion:

(1) Extend the current training entry point with a parameter for the training method name, e.g.

current: public IModel train(String trainMethodName, Object... paramArgs) throws DDFException
modified: public IModel train(String trainMethodName, String runMethodName, Object... paramArgs) throws DDFException

(2) The API exposed to users should not include runMethodName, so the current DDF algorithm entry points stay unchanged, e.g.

modified: public IModel KMeans(int numCentroids, int maxIters, int runs) throws DDFException {
return this.train("kmeans", "train", numCentroids, maxIters, runs);
}
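As a rough illustration of suggestions (1) and (2), the sketch below dispatches on both an algorithm key and a run method name, while the user-facing wrappers hide the second name. Everything here is hypothetical: the ALGORITHMS map, the Fake* classes, and matching methods by name and argument count are simplifications for the example, and IllegalStateException stands in for DDFException.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class TrainDispatchSketch {
    // Hypothetical stand-ins for algorithm classes; not the real MLlib API.
    static class FakeKMeans {
        public static String train(int k, int maxIters) {
            return "kmeans(k=" + k + ", maxIters=" + maxIters + ")";
        }
    }

    static class FakeRandomForest {
        public static String trainClassifier(int numTrees) {
            return "rf(numTrees=" + numTrees + ")";
        }
    }

    // Hypothetical mapping from algorithm key to implementing class.
    static final Map<String, Class<?>> ALGORITHMS = new HashMap<>();
    static {
        ALGORITHMS.put("kmeans", FakeKMeans.class);
        ALGORITHMS.put("randomForest", FakeRandomForest.class);
    }

    // Proposed shape: the caller names both the algorithm and the method to run.
    static Object train(String trainMethodName, String runMethodName, Object... paramArgs) {
        if (paramArgs == null) paramArgs = new Object[0];
        Class<?> cls = ALGORITHMS.get(trainMethodName);
        if (cls == null) {
            throw new IllegalStateException("Unknown algorithm: " + trainMethodName);
        }
        // Simplified matching: first public method with the right name and arity.
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(runMethodName) && m.getParameterCount() == paramArgs.length) {
                try {
                    return m.invoke(null, paramArgs); // static method, so no receiver
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Training failed: " + runMethodName, e);
                }
            }
        }
        throw new IllegalStateException(
            String.format("Cannot locate method %s on %s", runMethodName, trainMethodName));
    }

    // User-facing wrappers keep their signatures; runMethodName stays internal.
    static Object kMeans(int numCentroids, int maxIters) {
        return train("kmeans", "train", numCentroids, maxIters);
    }

    static Object randomForest(int numTrees) {
        return train("randomForest", "trainClassifier", numTrees);
    }

    public static void main(String[] args) {
        System.out.println(kMeans(3, 10));
        System.out.println(randomForest(20));
    }
}
```

A real implementation would also need to match parameter types, not only the argument count, since several overloads can share the same arity.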

Any help would be appreciated, thanks.

// //// ISupportML //////

/**
 * Runs a training algorithm on the entire DDF dataset.
 *
 * @param trainMethodName
 * @param args
 * @return
 * @throws DDFException
 */
@Override
public IModel train(String trainMethodName, Object... paramArgs) throws DDFException {
  /*
   * Example signatures we must support:
   *
   * Unsupervised Training
   *
   * Kmeans.train(data: RDD[Array[Double]], k: Int, maxIterations: Int, runs: Int, initializationMode: String)
   *
   * Supervised Training
   *
   * LogisticRegressionWithSGD.train(input: RDD[LabeledPoint], numIterations: Int, stepSize: Double, miniBatchFraction:
   * Double, initialWeights: Array[Double])
   *
   * SVM.train(input: RDD[LabeledPoint], numIterations: Int, stepSize: Double, regParam: Double, miniBatchFraction:
   * Double)
   */

  // Build the argument type array
  if (paramArgs == null) paramArgs = new Object[0];

  // Locate the training method
  String mappedName = Config.getValueWithGlobalDefault(this.getEngine(), trainMethodName);
  if (!Strings.isNullOrEmpty(mappedName)) trainMethodName = mappedName;

  TrainMethod trainMethod = new TrainMethod(trainMethodName, MLClassMethods.DEFAULT_TRAIN_METHOD_NAME, paramArgs);
  if (trainMethod.getMethod() == null) {
    throw new DDFException(String.format("Cannot locate method specified by %s", trainMethodName));
  }
