
Commit 295bcfd

update member+project+paper
1 parent 463a192 commit 295bcfd

File tree

16 files changed: +287 -64 lines

.DS_Store

4 KB
Binary file not shown.

_data/grants.yml

Lines changed: 64 additions & 1 deletion
@@ -610,4 +610,67 @@
     In today's software-centric world, ultra-large-scale software repositories, e.g., GitHub, with hundreds of thousands of projects each, are the new Library of Alexandria. They contain an enormous corpus of software and information about software. Scientists and engineers alike are interested in analyzing this wealth of information, both out of curiosity and for testing important research hypotheses. However, the current barrier to entry is prohibitive, and only a few with well-established infrastructure and deep expertise can attempt such ultra-large-scale analysis. Necessary expertise includes: programmatically accessing version control systems, data storage and retrieval, data mining, and parallelization. The need for expertise in these four different areas significantly increases the cost of scientific research that attempts to answer research questions involving ultra-large-scale software repositories. As a result, experiments are often not replicable, and the reusability of experimental infrastructure is low. Furthermore, data associated with and produced by such experiments is often lost and becomes inaccessible and obsolete, because there is no systematic curation. Last but not least, building analysis infrastructure to process ultra-large-scale data efficiently can be very hard.

    This project will continue to enhance the CISE research infrastructure called Boa to aid and assist with such research. This next version of Boa will be called Boa 2.0, and it will continue to be globally disseminated. The project will further develop the programming language, also called Boa, that can hide the details of programmatically accessing version control systems, data storage and retrieval, data mining, and parallelization from scientists and engineers and allow them to focus on the program logic. The project will also enhance the data mining infrastructure for Boa and a BIGDATA repository containing millions of open source projects for analyzing ultra-large-scale software repositories to help with such experiments. The project will integrate Boa 2.0 with the Center for Open Science's Open Science Framework (OSF) to improve reproducibility and with the national computing resource XSEDE to improve scalability. The broader impacts of Boa 2.0 stem from its potential to enable developers, designers, and researchers to build intuitive, multi-modal, user-centric, scientific applications that can aid and enable scientific research on individual, social, legal, policy, and technical aspects of open source software development. This advance will primarily be achieved by significantly lowering the barrier to entry and thus enabling a larger and more ambitious line of data-intensive scientific discovery in this area.
-
+- key: grant-nsf-2223812
+  agency: NSF
+  primary: true
+  title: "SHF:Small: More Modular Deep Learning"
+  start_date: 2022-10-01 # roughly
+  url: "https://www.nsf.gov/awardsearch/showAward?AWD_ID=2223812&HistoricalAwards=false"
+  amount: $580,000.00
+  PI: Hridesh Rajan
+  coPIs:
+  end_date: 2025-09-30 # roughly
+  abstract: >
+    This project will study a class of machine learning algorithms known as deep learning that has received much attention in academia and industry. Deep learning has a large number of important societal applications, from self-driving cars to question-answering systems such as Siri and Alexa. A deep learning algorithm uses multiple layers of transformation functions to convert inputs to outputs, with each layer successively learning higher-level abstractions of the data. The availability of large datasets has made it feasible to train deep learning models. Since the layers are organized in the form of a network, such models are also referred to as deep neural networks (DNN). While the jury is still out on the impact of deep learning on the overall understanding of software's behavior, a significant uptick in its usage and applications in wide-ranging areas and safety-critical systems, e.g., autonomous driving, aviation systems, medical analysis, etc., combine to warrant research on software engineering practices in the presence of deep learning. One challenge is to enable the reuse and replacement of parts of a DNN, which has the potential to make DNN development more reliable. This project will develop a comprehensive approach to systematically investigate the decomposition of deep neural networks into modules to enable reuse, replacement, and independent evolution of those modules. A module is an independent part of a software system that can be tested, validated, or utilized without a major change to the rest of the system. Allowing the reuse of DNN modules is expected to reduce energy- and data-intensive training efforts to construct DNN models. Allowing replacement is expected to help replace faulty functionality in DNN models without needing costly retraining steps.
+
+    The preliminary work of the investigator has shown that it is possible to decompose fully connected neural networks and CNN models into modules and to conceptualize the notion of modules. The main goals and intellectual merits of this project are to further expand this decomposition approach along three dimensions: (1) Does the decomposition approach generalize to large Natural Language Processing (NLP) models, where a huge reduction in CO2e emission is expected? (2) What criteria should be used for decomposing a DNN into modules? A better understanding of the decomposition criteria can help inform the design and implementation of DNNs and reduce the impact of changes. (3) While coarse-grained decomposition has worked well for FCNNs and CNNs, does a finer-grained decomposition of DNNs into modules connected using AND-OR-NOT primitives, a la structured decomposition, have the potential to both enable more reuse (especially for larger DNNs) and provide deeper insights into the behavior of DNNs? The project also incorporates a rigorous evaluation plan using widely studied datasets. The project is expected to broadly impact society by informing the science and practice of deep learning. A serious problem facing the current software development workforce is that deep learning is widely utilized in our software systems, but scientists and practitioners do not yet have a clear handle on critical problems such as explainability of DNN models, DNN reuse, replacement, independent testing, and independent development. There was no apparent need to investigate notions of modularity earlier, because neural network models trained before the deep learning era were mostly small, trained on small datasets, and mostly used as experimental features. The notion of DNN modules developed by this project, if successful, could help make significant advances on a number of open challenges in this area. DNN modules could enable the reuse of already trained DNN modules in another context. Viewing a DNN as a composition of DNN modules instead of a black box could enhance the explainability of a DNN's behavior. This project, if successful, will thus have a large positive impact on the productivity of these programmers, the understandability and maintainability of the DNN models that they deploy, and the scalability and correctness of the software systems that they produce. Other impacts will include: research-based advanced training as well as enhancement of the experimental and system-building expertise of future computer scientists, incorporation of research results into courses at Iowa State University as well as facilitating the integration of modularity-related research topics, and increased opportunities for the participation of underrepresented groups in research-based training.
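
Since grants.yml is consumed as structured data by the Jekyll site, a quick sanity check before committing can catch problems like the empty coPIs field in the entry above. Below is a minimal, hypothetical Python sketch (not part of this repository) that loads the file with PyYAML and flags entries with missing or empty fields; the required-key set is an assumption based on the entry added in this commit, not the site's actual schema.

```python
# Hypothetical pre-commit check for _data/grants.yml (pip install pyyaml).
# REQUIRED_KEYS is an assumption based on the entry above, not a schema
# defined anywhere in this repository.
import yaml

REQUIRED_KEYS = {"key", "agency", "title", "start_date", "end_date",
                 "url", "amount", "PI", "abstract"}

def check_grants(path="_data/grants.yml"):
    with open(path) as f:
        grants = yaml.safe_load(f)  # the file is a list of grant dicts
    for i, grant in enumerate(grants):
        missing = REQUIRED_KEYS - grant.keys()
        empty = {k for k in REQUIRED_KEYS & grant.keys() if grant[k] is None}
        if missing or empty:
            print(f"{grant.get('key', f'entry {i}')}: "
                  f"missing={sorted(missing)} empty={sorted(empty)}")

if __name__ == "__main__":
    check_grants()
```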

_data/members.yml

Lines changed: 12 additions & 18 deletions
@@ -31,12 +31,6 @@
   site: https://www.cs.iastate.edu/people/shibbir-ahmed
   img: sahmed.jpg
 
-- name: Muhammad Arshad
-  status: PhD
-  email: arbab@iastate.edu
-  site: https://www.cs.iastate.edu/people/arbab
-  img: arbab.jpg
-
 - name: Fraol Batole
   status: PhD
   email: fraol@iastate.edu
@@ -51,21 +45,27 @@
 
 - name: Sayem Imtiaz
   status: PhD
-  email: liyp0095@iastate.edu
+  email: sayem@iastate.edu
   site: https://www.cs.iastate.edu/people/sayem-mohammad-imtiaz
   img: simtiaz.jpg
-
-- name: David OBrien
+
+- name: Ruchira Manke
   status: PhD
-  email: dobrien@iastate.edu
-  site: https://davidmobrien.github.io/
-  img: david.png
+  email: rmanke@iastate.edu
+  site: https://tads.research.iastate.edu/people/ruchira-manke
+  img: ruchira.jpg
 
 - name: Giang Nguyen
   status: PhD
   email: gnguyen@iastate.edu
   site: https://www.cs.iastate.edu/gnguyen
   img: giang.jpeg
+
+- name: David OBrien
+  status: PhD
+  email: dobrien@iastate.edu
+  site: https://davidmobrien.github.io/
+  img: david.png
 
 - name: Astha Singh
   status: PhD
@@ -87,10 +87,4 @@
 
 # Master's Students:
 
-- name: Ruchira Manke
-  status: MS
-  email: rmanke@iastate.edu
-  site: https://tads.research.iastate.edu/people/ruchira-manke
-  img: ruchira.jpg
-
 # Bachelor's Students:

_papers/ESEC-FSE-20b/index.md

Lines changed: 1 addition & 1 deletion
@@ -38,5 +38,5 @@ kind: conference
 download_link: modularity.pdf
 publication_year: 2020
 tags:
-  - boa
+  - mdl
 ---
-455 KB
Binary file not shown.

_papers/ICSE-22b/index.md

Lines changed: 1 addition & 1 deletion
@@ -20,5 +20,5 @@ kind: conference
 download_link: cnnmodularity.pdf
 publication_year: 2022
 tags:
-  - boa
+  - mdl
 ---

_papers/ICSE-23a/fairify.pdf

474 KB
Binary file not shown.

_papers/ICSE-23a/index.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+---
+key: ICSE-23a
+permalink: /papers/ICSE-23a/
+short_name: ICSE '23
+title: "Fairify: Fairness Verification of Neural Networks"
+bib: |
+  @inproceedings{biswas23fairify,
+    author = {Sumon Biswas and Hridesh Rajan},
+    title = {Fairify: Fairness Verification of Neural Networks},
+    booktitle = {ICSE'23: The 45th International Conference on Software Engineering},
+    location = {Melbourne, Australia},
+    month = {May 14-May 20},
+    year = {2023},
+    entrysubtype = {conference},
+    abstract = {
+      Fairness of machine learning (ML) software has become a major concern in the recent past. Although recent research on testing and improving fairness has demonstrated impact on real-world software, providing fairness guarantees in practice is still lacking. Certification of ML models is challenging because of the complex decision-making process of the models. In this paper, we proposed Fairify, an SMT-based approach to verify the individual fairness property in neural network (NN) models. Individual fairness ensures that any two similar individuals get similar treatment irrespective of their protected attributes, e.g., race, sex, age. Verifying this fairness property is hard because of the global checking and non-linear computation nodes in NNs. We proposed a sound approach to make individual fairness verification tractable for developers. The key idea is that many neurons in the NN always remain inactive when a smaller part of the input domain is considered. So, Fairify leverages whitebox access to the models in production and then applies formal-analysis-based pruning. Our approach adopts input partitioning and then prunes the NN for each partition to provide a fairness certification or a counterexample. We leveraged interval arithmetic and activation heuristics of the neurons to perform the pruning as necessary. We evaluated Fairify on 25 real-world neural networks collected from four different sources and demonstrated its effectiveness, scalability, and performance over baselines and closely related work. Fairify is also configurable based on the domain and size of the NN. Our novel formulation of the problem can answer targeted verification queries with relaxations and counterexamples, which have practical implications.
+    }
+  }
+kind: conference
+download_link: fairify.pdf
+publication_year: 2023
+tags:
+  - d4
+---
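
The SMT-based idea in the abstract can be illustrated at toy scale. The sketch below is not the authors' implementation: it encodes a made-up one-neuron ReLU "network" in the z3 Python API (assuming the z3-solver package) and asks the solver whether two individuals who differ only in a binary protected attribute can receive decision scores that differ by more than a tolerance.

```python
# Toy illustration of SMT-based individual fairness checking, in the
# spirit of Fairify but NOT the paper's implementation: the one-neuron
# ReLU "network" and its weights are made up for this example.
from z3 import And, If, Or, Real, Solver, sat  # pip install z3-solver

def score(income, protected):
    # Hand-picked weights; a fair model would ignore `protected`.
    h = 0.8 * income + 0.3 * protected - 0.5
    return If(h > 0, h, 0)  # ReLU

income = Real("income")
p1, p2 = Real("p1"), Real("p2")

s = Solver()
s.add(0 <= income, income <= 1)  # normalized input domain
# Two individuals identical except for the binary protected attribute.
s.add(Or(And(p1 == 0, p2 == 1), And(p1 == 1, p2 == 0)))
# Ask for a violation: outcomes differ by more than a 0.1 tolerance.
s.add(score(income, p1) - score(income, p2) > 0.1)

if s.check() == sat:
    print("counterexample:", s.model())  # individual fairness violated
else:
    print("fair within tolerance on this input domain")
```

On this toy model the solver returns a counterexample, since the protected attribute carries weight 0.3; Fairify's contribution is making this style of query tractable for real NNs via input partitioning and pruning.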

_papers/ICSE-23b/index.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+---
+key: ICSE-23b
+permalink: /papers/ICSE-23b/
+short_name: ICSE '23
+title: "Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement"
+bib: |
+  @inproceedings{imtiaz23rnn,
+    author = {Sayem Mohammad Imtiaz and Fraol Batole and Astha Singh and Rangeet Pan and Breno Dantas Cruz and Hridesh Rajan},
+    title = {Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement},
+    booktitle = {ICSE'23: The 45th International Conference on Software Engineering},
+    location = {Melbourne, Australia},
+    month = {May 14-May 20},
+    year = {2023},
+    entrysubtype = {conference},
+    abstract = {
+      Can we take a recurrent neural network (RNN) trained to translate between languages and augment it to support a new natural language without retraining the model from scratch? Can we fix the faulty behavior of an RNN by replacing the portions associated with the faulty behavior? Recent works on decomposing a fully connected neural network (FCNN) and a convolutional neural network (CNN) into modules have shown the value of engineering deep models in this manner, which is standard in traditional SE but foreign for deep learning models. However, prior works focus on image-based multiclass classification problems and cannot be applied to RNNs due to (a) different layer structures, (b) loop structures, (c) different types of input-output architectures, and (d) the usage of both nonlinear and logistic activation functions. In this work, we propose the first approach to decompose an RNN into modules. We study different types of RNNs, i.e., Vanilla, LSTM, and GRU. Further, we show how such RNN modules can be reused and replaced in various scenarios. We evaluate our approach against 5 canonical datasets (i.e., Math QA, Brown Corpus, Wiki-toxicity, Clinc OOS, and Tatoeba) and 4 model variants for each dataset. We found that decomposing a trained model has a small cost (Accuracy: -0.6%, BLEU score: +0.10%). Also, the decomposed modules can be reused and replaced without needing to retrain.
+    }
+  }
+kind: conference
+download_link: rnn.pdf
+publication_year: 2023
+tags:
+  - mdl
+---
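
The reuse-and-replacement idea the abstract describes can be sketched schematically. In the decomposition line of work this paper extends, each module handles one output concern and the recomposed model picks the best-scoring module; the stand-in lambda "modules" below are assumptions for illustration, not sub-networks actually extracted from a trained RNN.

```python
# Schematic sketch of recomposing per-concern modules: the composed
# classifier returns the name of the highest-scoring module. The
# "modules" are stand-in functions, not real decomposed RNN weights.
import numpy as np

def compose(modules):
    """Build a classifier from single-concern scoring modules."""
    def predict(x):
        scores = {name: fn(x) for name, fn in modules.items()}
        return max(scores, key=scores.get)
    return predict

# Hypothetical stand-in modules, one per output class.
modules = {
    "positive": lambda x: float(np.tanh(x.sum())),
    "negative": lambda x: float(np.tanh(-x.sum())),
}
classify = compose(modules)
print(classify(np.array([0.2, 0.7])))   # -> "positive"

# Replacement: swap one faulty module without retraining the others.
modules["negative"] = lambda x: float(np.tanh(-2.0 * x.sum()))
print(classify(np.array([-0.4, 0.1])))  # recomposed model, new module
```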

_papers/ICSE-23b/rnn.pdf

781 KB
Binary file not shown.
