PR to add accuracy verification to auto_scheduler by islavutin · Pull Request #21 · Deelvin/tvm

islavutin · 2022-08-23T16:13:29Z

@apeskov, @elvin-n, could you please review and provide your feedback?

…ation to auto scheduler process

apeskov · 2022-08-26T08:08:59Z

CMakeLists.txt

+        3rdparty/dmlc-core/src/io/line_split.cc
+        3rdparty/dmlc-core/src/io/local_filesys.cc
+        3rdparty/dmlc-core/src/io/recordio_split.cc
+        )


Could you please point out which function requires compilation of this set of files?

This is required for serialization of the reference tensors in measure_record.cc https://github.com/Deelvin/tvm/pull/21/files#diff-64f00aa6e5a03e1cee25959caacb7e33bf4fb4d829ab0f1825ccfdfc459add51

apeskov · 2022-08-26T08:15:18Z

include/tvm/auto_scheduler/measure.h


+
+  virtual Array<tvm::runtime::NDArray> GetOutput(const Array<MeasureInput>& inputs,
+                                   const Array<BuildResult>& build_results, int verbose) = 0;


A documentation string is required.
GetXXX sounds like a getter, but I suppose this is one more version of the Run method, one that also reports the outputs.

apeskov · 2022-08-26T08:18:05Z

include/tvm/auto_scheduler/search_task.h

  LayoutRewriteOption layout_rewrite_option;
  /*! \brief Names of some user defined input data used in program measuring. */
  Array<String> task_input_names;
+  /*! \brief keeping custom seed to reproduce randomly generated input values */


Comments should start with a capital letter.

apeskov · 2022-08-26T08:21:12Z

include/tvm/auto_scheduler/search_task.h


+  void SetReferenceTensors(Array<tvm::runtime::NDArray> arr);
+
+  void SetTarget(Target target, Target target_host);


Why do we need these setters? All of these fields are public.

apeskov · 2022-08-26T08:24:32Z

include/tvm/auto_scheduler/search_task.h

  /*! \brief Names of some user defined input data used in program measuring. */
  Array<String> task_input_names;
+  /*! \brief keeping custom seed to reproduce randomly generated input values */
+  int custom_seed;


I am still wondering about random generator behaviour on different platforms. I guess different standard library implementations may produce different random arrays from the same seed. I don't think that using a seed is acceptable in that case.
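One way to address this concern is to generate the inputs with an algorithm that is fully specified by the code itself rather than by the standard library. In C++ the engines such as std::mt19937 are specified exactly, but the distribution classes such as std::uniform_real_distribution are implementation-defined, so the same seed may yield different values. The sketch below is a minimal illustration in Python, not code from this PR: the function name lcg_floats is hypothetical, and the constants are the ones from Knuth's MMIX linear congruential generator.

```python
MASK64 = (1 << 64) - 1


def lcg_floats(seed: int, count: int) -> list:
    """Return `count` floats in [0, 1) from a 64-bit linear congruential generator.

    The state is advanced with integer arithmetic only, so a given seed
    produces the same sequence on every platform.
    """
    state = seed & MASK64
    values = []
    for _ in range(count):
        # Multiplier and increment from Knuth's MMIX generator.
        state = (state * 6364136223846793005 + 1442695040888963407) & MASK64
        # Keep the top 24 bits; they are exactly representable in float32.
        values.append((state >> 40) / float(1 << 24))
    return values
```

The same recurrence can be ported to C++ with uint64_t arithmetic, which would let the host and the device side fill inputs identically from one seed.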

apeskov · 2022-08-26T08:57:17Z

python/tvm/auto_scheduler/relay_integration.py

+
+    #faked ref to extract task
+    #TODO: without it, initial serialization crashes... how to solve it more elegantly? 
+    ref = [tvm.nd.empty((1, 1))]


Very suspicious comment. It looks like the constructor of the SearchTask object with default arguments produces an invalid object. I guess this should be clarified and improved.

It looks like a None value in a field of type Array<NDArray> leads to wrong serialization.
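A common way to avoid this kind of problem is to normalize a missing value to an empty container at construction time, so the serializer never sees None. The sketch below only illustrates the pattern; TaskRecord and to_json are hypothetical stand-ins, not the real SearchTask or its serialization code.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskRecord:
    """Hypothetical stand-in for a task with optional reference tensors."""

    workload_key: str
    ref_tensors: Optional[List[List[float]]] = None

    def __post_init__(self):
        # Normalize a missing value to an empty list so that serialization
        # does not have to special-case None.
        if self.ref_tensors is None:
            self.ref_tensors = []


def to_json(record: TaskRecord) -> str:
    """Serialize the record; works with or without reference tensors."""
    payload = {
        "workload_key": record.workload_key,
        "ref_tensors": [list(t) for t in record.ref_tensors],
    }
    return json.dumps(payload)
```

With this in place, a task created with default arguments serializes to an empty array, and no placeholder tensor is needed.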

apeskov · 2022-08-26T09:17:33Z

python/tvm/auto_scheduler/task_scheduler.py

+                original_target_host = task.target_host
+
+                ref_target = tvm.target.Target("llvm", host="llvm")
+                _ffi_api.SetTarget(task, ref_target, None)


Rewriting the target fields of the original task is not an elegant solution. Is it possible to make a copy of the search task object with ref_target instead?
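The copy-instead-of-mutate idea can be sketched as follows. TaskStub and with_reference_target are hypothetical names used only for illustration; the real SearchTask has more fields and would need its own copy logic.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TaskStub:
    """Hypothetical stand-in for a search task; not the real SearchTask."""

    workload_key: str
    target: str
    target_host: Optional[str] = None


def with_reference_target(task: TaskStub) -> TaskStub:
    """Return a copy of `task` retargeted to llvm, leaving the original untouched."""
    # replace() builds a new object, so the caller's task keeps its targets
    # and nothing has to be restored afterwards.
    return replace(task, target="llvm", target_host="llvm")
```

Because the original object is never modified, there is no need to save original_target_host and restore it after the reference run.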

apeskov · 2022-08-26T09:23:05Z

python/tvm/auto_scheduler/task_scheduler.py

+
+                results = local_runner.get_ouput(measure_inputs, build_results)
+
+                _ffi_api.SetReferenceTensors(task, results)


Using the function _ffi_api.SetReferenceTensors is a workaround. I'm personally sure that the FFI infrastructure allows setting and getting values of a native object without special setter methods.

apeskov · 2022-08-26T09:26:21Z

python/tvm/auto_scheduler/task_scheduler.py

 from . import _ffi_api

+
+import tvm


Where do you use the tvm module in this file? Do you really need it? I guess from tvm.target import Target would be better.

apeskov · 2022-08-26T09:35:00Z

src/runtime/contrib/random/random.cc

 TVM_REGISTER_GLOBAL("tvm.contrib.random.random_fill").set_body([](TVMArgs args, TVMRetValue* ret) {
  RandomThreadLocalEntry* entry = RandomThreadLocalEntry::ThreadLocal();
  DLTensor* out = args[0];
+  int seed = args[1];


There are a lot of usages of the random_fill function with only one argument. This change alters its semantics, and now all of them will crash. Please make the seed optional.
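The backward-compatible behaviour requested here can be sketched in Python as below. This random_fill is a hypothetical stand-in operating on a plain list, not the registered packed function; in the C++ body the analogous change would be to read the seed only when a second argument was actually passed.

```python
import random
from typing import List, Optional


def random_fill(out: List[float], seed: Optional[int] = None) -> None:
    """Fill `out` in place; `seed` is optional so one-argument calls keep working."""
    # Without a seed, use an unseeded generator, which matches the old
    # one-argument behaviour. With a seed, the fill is reproducible.
    rng = random.Random(seed) if seed is not None else random.Random()
    for i in range(len(out)):
        out[i] = rng.random()
```

Existing callers such as random_fill(buf) continue to work unchanged, while new callers can pass random_fill(buf, seed=3) to get a repeatable fill.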

apeskov · 2022-08-29T11:59:43Z

src/auto_scheduler/measure_record.cc

+        i.Save(fs);
+    }
+
+    delete fs;


Using an external temporary file for serialisation is not nice. It may confuse someone (including me). Moreover, it cannot be easily extended to the RPC case.

The first option is to serialise directly into JSON.

Example:

std::string bytes;
runtime::NDArray arr = data.ref_output_tensors[0];
dmlc::MemoryStringStream stream(&bytes);
arr.Save(&stream);  // serialise the tensor into the in-memory byte string
writer->WriteArrayItem(bytes);

This has a critical issue: the "bytes" array may contain special characters such as ", \t, \n and others. An alternative is to store it as a hex string.
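The hex alternative can be sketched in a few lines of Python. The helper names and the "ref_output_tensors" key layout are assumptions made for illustration; the point is only that hex text survives a JSON round trip even when the raw bytes contain quotes, control characters, or bytes that are not valid text.

```python
import json


def encode_tensor_bytes(raw: bytes) -> str:
    # Hex uses only the characters 0-9 and a-f, so the result needs no escaping.
    return raw.hex()


def decode_tensor_bytes(text: str) -> bytes:
    # Inverse of encode_tensor_bytes.
    return bytes.fromhex(text)


# Sample payload containing '"', '\t', '\n', a zero byte and 0xff.
raw = bytes([0, 34, 9, 10, 255])
record = json.dumps({"ref_output_tensors": [encode_tensor_bytes(raw)]})
restored = decode_tensor_bytes(json.loads(record)["ref_output_tensors"][0])
```

Hex doubles the size of the stored data. Base64 would be a more compact encoding with the same property, at the cost of a slightly less readable log.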

The second option is to change the place where we keep "ref_output_tensors". I'm not sure that the "SearchTask" object is the best choice.

Icemist · 2022-09-01T14:46:47Z