Draft curriculum for 201 #2

@wmburke

SciSpark 202: Algorithms for MCC Search and PDF Clustering using SciSpark

Abstract/Agenda:

We introduce a three-part course module on SciSpark, our AIST14-funded project for Highly Interactive and Scalable Climate Model Metrics and Analytics. The module consists of three classes (101, 202, and 303) on how to use Spark for science.

SciSpark 202 is a 1.5-hour session teaching two algorithms that represent the motivation for SciSpark: iterative, data-reuse algorithms that share information between multiple stages. We will build on SciSpark 101 and Scala for science programming as the entry-level material. The first algorithm is an iterative graph-based algorithm for identifying Mesoscale Convective Complexes (MCCs) in satellite infrared data:

  • Whitehall, Kim, et al. "Exploring a graph theory based algorithm for automated identification and characterization of large mesoscale convective systems in satellite datasets." Earth Science Informatics 8.3 (2015): 663-675.
  • Implementation of the Grab 'em, Tag 'em, Graph 'em (GTG) algorithm in Python.

We will demonstrate its implementation in SciSpark and discuss future directions.
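
As a rough illustration of the graph-based idea (not the GTG code and not the SciSpark implementation), the sketch below thresholds each time frame, groups adjacent cold pixels into cloud elements, and links elements in consecutive frames when they overlap. The function names, the 4-connectivity rule, the one-pixel overlap criterion, and the 241 K threshold are all assumptions made for this example.

```python
from collections import deque


def label_cloud_elements(frame, threshold):
    """Group 4-connected cells with value <= threshold into cloud elements.

    `frame` is a list of rows of brightness temperatures (hypothetical input).
    Returns a list of sets of (row, col) cells, in row-major discovery order.
    """
    rows, cols = len(frame), len(frame[0])
    seen = set()
    elements = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or frame[r][c] > threshold:
                continue
            # Breadth-first flood fill from the first unvisited cold cell.
            cells = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                cells.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and frame[ny][nx] <= threshold:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            elements.append(cells)
    return elements


def build_overlap_graph(frames, threshold):
    """Build nodes keyed by (time, element index) and edges between
    elements in consecutive frames that share at least one cell."""
    nodes = {}
    for t, frame in enumerate(frames):
        for i, cells in enumerate(label_cloud_elements(frame, threshold)):
            nodes[(t, i)] = cells
    edges = []
    for (t, i), cells in nodes.items():
        for (t2, j), other in nodes.items():
            if t2 == t + 1 and cells & other:
                edges.append(((t, i), (t2, j)))
    return nodes, sorted(edges)


# Two tiny 3x3 frames of made-up brightness temperatures in kelvin.
frames = [
    [[230, 230, 280], [280, 280, 280], [280, 280, 235]],
    [[280, 230, 280], [280, 280, 280], [280, 280, 280]],
]
nodes, edges = build_overlap_graph(frames, threshold=241)  # 241 K is illustrative
print(edges)
```

Each frame can be labeled independently, while the edge-building step needs results from neighboring frames, which is the kind of staged data reuse the session is meant to motivate.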

The second algorithm is a k-means clustering algorithm for identifying Probability Density Functions (PDFs) for climate extremes in North American Regional Climate Change Assessment Program (NARCCAP) data:

  • P. C. Loikith, J. Kim, H. Lee, B. Lintner, C. Mattmann, J. D. Neelin, D. E. Waliser, L. Mearns, S. McGinnis. Evaluation of Surface Temperature Probability Distribution Functions in the NARCCAP Hindcast Experiment. Journal of Climate, Vol. 28, No. 3, pp. 978-997, February 2015. doi:10.1175/JCLI-D-13-00457.1.
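
To make the clustering step concrete, here is a minimal sketch of k-means applied to binned PDFs, one per grid point. It is not the method of the paper above or the SciSpark code: the helper names, the bin edges, the sample values, and the deterministic "first k points" initialization are assumptions chosen to keep the example small and repeatable.

```python
import math


def empirical_pdf(samples, bin_edges):
    """Return the fraction of samples falling in each bin (a binned PDF).

    Bins are half-open [lo, hi); samples outside the edges are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for x in samples:
        for b in range(len(counts)):
            if bin_edges[b] <= x < bin_edges[b + 1]:
                counts[b] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on equal-length vectors.

    Assumption: centroids start at the first k points, so results are
    deterministic; practical implementations usually randomize this.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up temperature anomalies for four hypothetical grid points.
edges = [-3, -1, 1, 3]
grid_samples = [
    [-0.5, 0.2, 0.4, -0.1],   # narrow distribution
    [-2.0, 0.0, 2.0, 2.5],    # wide distribution
    [0.1, -0.3, 0.9, 0.0],    # narrow distribution
    [-2.5, -1.5, 0.5, 1.5],   # wide distribution
]
pdfs = [empirical_pdf(s, edges) for s in grid_samples]
labels, centroids = kmeans(pdfs, k=2)
print(labels)
```

The centroids are reused on every iteration, which is why this algorithm benefits from keeping the PDF vectors cached in memory between stages.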
