Description
SciSpark 202: Algorithms for MCC Search and PDF Clustering using SciSpark
Abstract/Agenda:
We introduce a three-part course module on SciSpark, our AIST14-funded project for Highly Interactive and Scalable Climate Model Metrics and Analytics. The module consists of a 101, a 202, and a 303 class on using Spark for science.
SciSpark 202 is a 1.5-hour session teaching two algorithms representative of the motivation for SciSpark: iterative data-reuse algorithms that share information between multiple stages. We will build on SciSpark 101 and Scala for science programming as an entry course. The first algorithm is an iterative graph-based algorithm for identifying Mesoscale Convective Complexes (MCCs) in satellite infrared data:
- Whitehall, Kim, et al. "Exploring a graph theory based algorithm for automated identification and characterization of large mesoscale convective systems in satellite datasets." Earth Science Informatics 8.3 (2015): 663-675.
- Implementation of Grab Em', Tag Em', Graph Em' (GTG) algorithm in Python.
We will demonstrate its implementation in SciSpark and discuss future directions.
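To give a feel for the graph-based approach, here is a minimal toy sketch in plain Python. It is not the GTG implementation and not SciSpark code. It assumes that each time step is a small 2D grid of brightness temperatures, that a "cloud element" is a 4-connected region at or below a temperature threshold, and that two elements in consecutive frames are linked when they share at least one pixel. The function names, the threshold of 241, and the overlap rule are illustrative assumptions.

```python
from collections import deque


def label_cloud_elements(frame, threshold):
    """Return a list of pixel sets, one per 4-connected region whose
    values are <= threshold (hypothetical definition of a cloud element)."""
    rows, cols = len(frame), len(frame[0])
    seen = set()
    elements = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or frame[r][c] > threshold:
                continue
            # Breadth-first flood fill from this cold pixel.
            region = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and frame[ny][nx] <= threshold:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            elements.append(region)
    return elements


def build_graph(frames, threshold):
    """Label every frame, then link elements in consecutive frames that
    overlap in at least one pixel. Nodes are (frame_index, element_index)."""
    labeled = [label_cloud_elements(f, threshold) for f in frames]
    edges = []
    for t in range(len(labeled) - 1):
        for i, a in enumerate(labeled[t]):
            for j, b in enumerate(labeled[t + 1]):
                if a & b:  # shared pixels between the two time steps
                    edges.append(((t, i), (t + 1, j)))
    return labeled, edges


# Two tiny made-up frames: a cold patch that drifts one pixel to the right.
frame0 = [[300, 230, 300],
          [300, 230, 300],
          [300, 300, 300]]
frame1 = [[300, 300, 300],
          [300, 230, 230],
          [300, 300, 300]]
labeled, edges = build_graph([frame0, frame1], threshold=241)
```

In this sketch the frames are labeled independently of one another, so that step could in principle be distributed across workers, while the edge-building step needs pairs of adjacent frames.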
The second algorithm is a K-means clustering algorithm for identification of Probability Density Functions (PDFs) for Climate Extremes in the North American Regional Climate Change Assessment Program (NARCCAP) data:
- P. C. Loikith, J. Kim, H. Lee, B. Lintner, C. Mattmann, J. D. Neelin, D. E. Waliser, L. Mearns, S. McGinnis. Evaluation of Surface Temperature Probability Distribution Functions in the NARCCAP Hindcast Experiment. Journal of Climate, Vol. 28, No. 3, pp. 978-997, February 2015. doi:10.1175/JCLI-D-13-00457.1.
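As a rough illustration of clustering PDFs with K-means, the sketch below uses plain Python. It is not the method from the paper above and not SciSpark code. It assumes each grid point's PDF is approximated by a normalized histogram, and that those histograms are clustered with Lloyd's algorithm under squared Euclidean distance. The helper names, the bin edges, and the deterministic "first k points" initialization are illustrative assumptions.

```python
def histogram_pdf(samples, edges):
    """Normalized histogram over the given bin edges; the last bin is
    closed on the right. Samples outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for b in range(len(counts)):
            last = b == len(counts) - 1
            if edges[b] <= s < edges[b + 1] or (last and s == edges[b + 1]):
                counts[b] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm. Centroids start at the first k points
    (an assumption made here for reproducibility, not a recommendation)."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Four made-up PDFs over three bins: two skewed low, two skewed high.
pdfs = [
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.0],
    [0.0, 0.2, 0.8],
    [0.0, 0.3, 0.7],
]
assignments, centroids = kmeans(pdfs, k=2)
```

Each iteration reuses the same set of PDFs while only the centroids change, which is the data-reuse pattern that makes keeping the data in memory between iterations attractive.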