_data/grants.yml
64 additions & 1 deletion
@@ -610,4 +610,67 @@
In today’s software-centric world, ultra-large-scale software repositories, e.g. GitHub, with hundreds of thousands of projects each, are the new library of Alexandria. They contain an enormous corpus of software and information about software. Scientists and engineers alike are interested in analyzing this wealth of information both out of curiosity and for testing important research hypotheses. However, the current barrier to entry is prohibitive, and only a few with well-established infrastructure and deep expertise can attempt such ultra-large-scale analysis. Necessary expertise includes: programmatically accessing version control systems, data storage and retrieval, data mining, and parallelization. The need to have expertise in these four different areas significantly increases the cost of scientific research that attempts to answer research questions involving ultra-large-scale software repositories. As a result, experiments are often not replicable, and reusability of experimental infrastructure is low. Furthermore, data associated with and produced by such experiments is often lost and becomes inaccessible and obsolete, because there is no systematic curation. Last but not least, building analysis infrastructure to process ultra-large-scale data efficiently can be very hard.
This project will continue to enhance the CISE research infrastructure called Boa to aid and assist with such research. This next version of Boa will be called Boa 2.0, and it will continue to be globally disseminated. The project will further develop the programming language, also called Boa, that can hide the details of programmatically accessing version control systems, data storage and retrieval, data mining, and parallelization from scientists and engineers and allow them to focus on the program logic. The project will also enhance the data mining infrastructure for Boa, and a BIGDATA repository containing millions of open source projects for analyzing ultra-large-scale software repositories to help with such experiments. The project will integrate Boa 2.0 with the Center for Open Science's Open Science Framework (OSF) to improve reproducibility and with the national computing resource XSEDE to improve scalability. The broader impacts of Boa 2.0 stem from its potential to enable developers, designers, and researchers to build intuitive, multi-modal, user-centric, scientific applications that can aid and enable scientific research on individual, social, legal, policy, and technical aspects of open source software development. This advance will primarily be achieved by significantly lowering the barrier to entry and thus enabling a larger and more ambitious line of data-intensive scientific discovery in this area.
title: "Fairify: Fairness Verification of Neural Networks"
6
+
bib: |
7
+
@inproceedings{biswas23fairify,
8
+
author = {Sumon Biswas and Hridesh Rajan},
9
+
title = {Fairify: Fairness Verification of Neural Networks},
10
+
booktitle = {ICSE'23: The 45th International Conference on Software Engineering},
11
+
location = {Melbourne, Australia},
12
+
month = {May 14-May 20},
13
+
year = {2023},
14
+
entrysubtype = {conference},
15
+
abstract = {
16
+
+  Fairness of machine learning (ML) software has become a major concern in the recent past. Although recent research on testing and improving fairness has demonstrated impact on real-world software, providing fairness guarantees in practice is still lacking. Certification of ML models is challenging because of their complex decision-making process. In this paper, we propose Fairify, an SMT-based approach to verify the individual fairness property of neural network (NN) models. Individual fairness ensures that any two similar individuals get similar treatment irrespective of their protected attributes, e.g., race, sex, or age. Verifying this fairness property is hard because of the global checking and non-linear computation nodes in an NN. We propose a sound approach to make individual fairness verification tractable for developers. The key idea is that many neurons in the NN always remain inactive when a smaller part of the input domain is considered. So, Fairify leverages whitebox access to the models in production and then applies formal-analysis-based pruning. Our approach adopts input partitioning and then prunes the NN for each partition to provide a fairness certification or a counterexample. We leverage interval arithmetic and activation heuristics of the neurons to perform the pruning as necessary. We evaluated Fairify on 25 real-world neural networks collected from four different sources, and demonstrated its effectiveness, scalability, and performance over baselines and closely related work. Fairify is also configurable based on the domain and size of the NN. Our novel formulation of the problem can answer targeted verification queries with relaxations and counterexamples, which has practical implications.
+  }
+  }
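To make the kind of query this abstract describes concrete, here is a minimal sketch of an individual-fairness check posed to an SMT solver via Z3's Python bindings. Everything in it, the two-feature ReLU network, its weights, and the 0.5 decision threshold, is a hypothetical stand-in rather than a model or code from the paper; Fairify's actual pipeline additionally partitions the input domain and prunes inactive neurons with interval arithmetic before solving.

```python
# Hedged sketch: checking individual fairness of a tiny ReLU network with Z3.
# The 2-input, 2-hidden-neuron network and its weights are invented for
# illustration; they are NOT from the Fairify paper.
from z3 import Real, Solver, If, And, Or, sat

def relu(x):
    return If(x > 0, x, 0)

# Toy weights: one ordinary feature and one protected attribute (e.g., sex in {0,1}).
w = [[1.0, 2.0], [0.5, -1.5]]   # hidden-layer weights (hypothetical)
v = [1.0, 1.0]                  # output-layer weights (hypothetical)

def net(inc, sex):
    h0 = relu(w[0][0] * inc + w[0][1] * sex)
    h1 = relu(w[1][0] * inc + w[1][1] * sex)
    return v[0] * h0 + v[1] * h1

x1_inc, x1_sex = Real('x1_inc'), Real('x1_sex')
x2_inc, x2_sex = Real('x2_inc'), Real('x2_sex')

s = Solver()
# Two "similar" individuals: identical on every non-protected attribute...
s.add(x1_inc == x2_inc)
# ...but differing on the protected attribute.
s.add(Or(And(x1_sex == 0, x2_sex == 1), And(x1_sex == 1, x2_sex == 0)))
# Bound the input domain (one "partition" in Fairify's terms).
s.add(And(x1_inc >= 0, x1_inc <= 10))
# Violation: the thresholded decisions for the two individuals disagree.
s.add((net(x1_inc, x1_sex) >= 0.5) != (net(x2_inc, x2_sex) >= 0.5))

if s.check() == sat:
    print('Counterexample to individual fairness:', s.model())
else:
    print('No counterexample on this partition.')
```

If the solver returns sat, the model is a concrete pair of similar individuals treated differently, i.e., a counterexample of the kind Fairify reports; unsat on every partition would amount to a certification.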
title: "Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement"
6
+
bib: |
7
+
@inproceedings{imtiaz23rnn,
8
+
author = {Sayem Mohammad Imtiaz and Fraol Batole and Astha Singh and Rangeet Pan and Breno Dantas Cruz and Hridesh Rajan},
9
+
title = {Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement},
10
+
booktitle = {ICSE'23: The 45th International Conference on Software Engineering},
11
+
location = {Melbourne, Australia},
12
+
month = {May 14-May 20},
13
+
year = {2023},
14
+
entrysubtype = {conference},
15
+
abstract = {
16
+
+  Can we take a recurrent neural network (RNN) trained to translate between languages and augment it to support a new natural language without retraining the model from scratch? Can we fix the faulty behavior of an RNN by replacing the portions associated with that behavior? Recent works on decomposing fully connected neural networks (FCNNs) and convolutional neural networks (CNNs) into modules have shown the value of engineering deep models in this manner, which is standard in traditional SE but foreign to deep learning models. However, prior works focus on image-based multiclass classification problems and cannot be applied to RNNs due to (a) different layer structures, (b) loop structures, (c) different types of input-output architectures, and (d) the usage of both nonlinear and logistic activation functions. In this work, we propose the first approach to decompose an RNN into modules. We study different types of RNNs, i.e., Vanilla, LSTM, and GRU. Further, we show how such RNN modules can be reused and replaced in various scenarios. We evaluate our approach against 5 canonical datasets (i.e., Math QA, Brown Corpus, Wiki-toxicity, Clinc OOS, and Tatoeba) and 4 model variants for each dataset. We found that decomposing a trained model has a small cost (accuracy: -0.6%, BLEU score: +0.10%). Also, the decomposed modules can be reused and replaced without needing to retrain.
+  }
+  }
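For intuition about the decompose-reuse-replace workflow this abstract summarizes, the NumPy sketch below splits a toy many-to-one vanilla RNN classifier into per-class modules by slicing its output projection. This is a deliberately simplified illustration with made-up weights, not the paper's decomposition technique, which must contend with loop structures, gate types (LSTM/GRU), and varied input-output architectures.

```python
# Hedged toy sketch of decompose-and-reuse for a many-to-one vanilla RNN
# classifier, using only NumPy. NOT the paper's algorithm; all weights are
# random stand-ins, invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
H, D, N = 16, 8, 3                      # hidden size, input dim, classes
W_xh = rng.normal(size=(D, H)) * 0.1    # input-to-hidden weights (hypothetical)
W_hh = rng.normal(size=(H, H)) * 0.1    # hidden-to-hidden weights
W_hy = rng.normal(size=(H, N)) * 0.1    # hidden-to-output weights

def encode(xs):
    """Run the shared recurrent core over a sequence; return the final hidden state."""
    h = np.zeros(H)
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh)
    return h

# "Decompose": one module per class = shared core + that class's output slice.
def make_module(cls):
    w = W_hy[:, cls].copy()             # this module's private output weights
    return lambda xs: encode(xs) @ w    # scores a single concern only

modules = [make_module(c) for c in range(N)]

# "Reuse/replace": compose any set of modules; swapping one module's weights
# does not require retraining the others.
def predict(xs, mods):
    return int(np.argmax([m(xs) for m in mods]))

seq = rng.normal(size=(5, D))           # a dummy 5-step input sequence
print('predicted class:', predict(seq, modules))
```

The point of the sketch is the interface: because each module shares the recurrent core but owns its slice of the output computation, one concern can be swapped out or added without touching the rest, which is the reuse-and-replacement property the paper evaluates.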