tree/dataframe/src/RDataFrame.cxx (36 additions, 1 deletion)
@@ -791,7 +791,6 @@ of the cluster schedulers supported by Dask (more information in the
~~~{.py}
import ROOT
from dask.distributed import Client
# In a Python script, the Dask client needs to be initialized in a context
# Jupyter notebooks / Python sessions don't need this
if __name__ == "__main__":
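    # A possible continuation of the snippet above (the scheduler address,
    # tree name and file name are assumptions, not part of the original):
    # Connect to an already running Dask scheduler
    client = Client("localhost:8786")
    # Build a distributed RDataFrame with the Dask backend
    RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame
    df = RDataFrame("mytree", "myfile.root", daskclient=client)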
@@ -839,6 +838,42 @@ if __name__ == "__main__":
Note that when processing a TTree or TChain dataset, the `npartitions` value should not exceed the number of clusters in
the dataset. The number of clusters in a TTree can be retrieved by typing `rootls -lt myfile.root` at a command line.
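As a sketch, the `npartitions` value can be passed as a keyword argument when constructing the distributed RDataFrame; the scheduler address and the tree and file names below are assumptions:

~~~{.py}
import ROOT
from dask.distributed import Client

if __name__ == "__main__":
    client = Client("localhost:8786")
    # For a TTree with e.g. 4 clusters, npartitions should not exceed 4
    df = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame(
        "mytree", "myfile.root", daskclient=client, npartitions=4)
~~~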
### Distributed FromSpec
RDataFrame can also be built from a JSON sample specification file using the FromSpec function. In distributed mode, two arguments need to be provided: the path to the specification `jsonFile` (same as in the local RDF case) and an additional executor argument. As with the RDataFrame constructors above, the executor can be either a Spark connection or a Dask client. If no second argument is given, the local version of FromSpec is run. Here is an example of FromSpec usage in distributed RDF with either the Spark or Dask backend. For more information on the FromSpec functionality itself, please refer to the [FromSpec](\ref rdf-from-spec) documentation.
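A minimal sketch of such usage with the Dask backend follows; the exact entry point, the `executor` keyword name, the scheduler address, and the spec file name are assumptions inferred from the description above:

~~~{.py}
import ROOT
from dask.distributed import Client

if __name__ == "__main__":
    # A Spark connection could be passed analogously
    client = Client("localhost:8786")
    # Passing the executor selects the distributed FromSpec;
    # without it, the local version is run
    df = ROOT.RDF.Experimental.FromSpec("spec.json", executor=client)
    print(df.Count().GetValue())
~~~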