_data/grants.yml
64 additions & 1 deletion
@@ -610,4 +610,67 @@
In today’s software-centric world, ultra-large-scale software repositories, e.g. GitHub, with hundreds of thousands of projects each, are the new library of Alexandria. They contain an enormous corpus of software and information about software. Scientists and engineers alike are interested in analyzing this wealth of information both out of curiosity and for testing important research hypotheses. However, the current barrier to entry is prohibitive, and only a few with well-established infrastructure and deep expertise can attempt such ultra-large-scale analysis. Necessary expertise includes: programmatically accessing version control systems, data storage and retrieval, data mining, and parallelization. The need to have expertise in these four different areas significantly increases the cost of scientific research that attempts to answer research questions involving ultra-large-scale software repositories. As a result, experiments are often not replicable, and reusability of experimental infrastructure is low. Furthermore, data associated with and produced by such experiments is often lost and becomes inaccessible and obsolete, because there is no systematic curation. Last but not least, building analysis infrastructure to process ultra-large-scale data efficiently can be very hard.
This project will continue to enhance the CISE research infrastructure called Boa to aid and assist with such research. This next version of Boa will be called Boa 2.0, and it will continue to be globally disseminated. The project will further develop the programming language, also called Boa, that can hide the details of programmatically accessing version control systems, data storage and retrieval, data mining, and parallelization from scientists and engineers and allow them to focus on the program logic. The project will also enhance the data mining infrastructure for Boa, and a BIGDATA repository containing millions of open source projects for analyzing ultra-large-scale software repositories to help with such experiments. The project will integrate Boa 2.0 with the Center for Open Science's Open Science Framework (OSF) to improve reproducibility and with the national computing resource XSEDE to improve scalability. The broader impacts of Boa 2.0 stem from its potential to enable developers, designers, and researchers to build intuitive, multi-modal, user-centric, scientific applications that can aid and enable scientific research on individual, social, legal, policy, and technical aspects of open source software development. This advance will primarily be achieved by significantly lowering the barrier to entry and thus enabling a larger and more ambitious line of data-intensive scientific discovery in this area.
title: "Fairify: Fairness Verification of Neural Networks"
6
+
bib: |
7
+
@inproceedings{biswas23fairify,
8
+
author = {Sumon Biswas and Hridesh Rajan},
9
+
title = {Fairify: Fairness Verification of Neural Networks},
10
+
booktitle = {ICSE'23: The 45th International Conference on Software Engineering},
11
+
location = {Melbourne, Australia},
12
+
month = {May 14-May 20},
13
+
year = {2023},
14
+
entrysubtype = {conference},
15
+
abstract = {
16
+
+  Fairness of machine learning (ML) software has become a major concern in the recent past. Although recent research on testing and improving fairness has demonstrated impact on real-world software, providing fairness guarantees in practice is still lacking. Certification of ML models is challenging because of their complex decision-making process. In this paper, we propose Fairify, an SMT-based approach to verify the individual fairness property of neural network (NN) models. Individual fairness ensures that any two similar individuals get similar treatment irrespective of their protected attributes, e.g., race, sex, or age. Verifying this fairness property is hard because of the global checking and non-linear computation nodes in an NN. We propose a sound approach to make individual fairness verification tractable for developers. The key idea is that many neurons in the NN always remain inactive when a smaller part of the input domain is considered. So, Fairify leverages whitebox access to the models in production and then applies formal-analysis-based pruning. Our approach adopts input partitioning and then prunes the NN for each partition to provide a fairness certification or a counterexample. We leverage interval arithmetic and activation heuristics of the neurons to perform the pruning as necessary. We evaluated Fairify on 25 real-world neural networks collected from four different sources, and demonstrated its effectiveness, scalability, and performance over baselines and closely related work. Fairify is also configurable based on the domain and size of the NN. Our novel formulation of the problem can answer targeted verification queries with relaxations and counterexamples, which has practical implications.
+  }
+  }
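To make the kind of query this abstract describes concrete, here is a minimal sketch of an individual-fairness check posed to an SMT solver via Z3's Python bindings. Everything in it, the two-feature ReLU network, its weights, and the 0.5 decision threshold, is a hypothetical stand-in rather than a model or code from the paper; Fairify's actual pipeline additionally partitions the input domain and prunes inactive neurons with interval arithmetic before solving.

```python
# Hedged sketch: checking individual fairness of a tiny ReLU network with Z3.
# The 2-input, 2-hidden-neuron network and its weights are invented for
# illustration; they are NOT from the Fairify paper.
from z3 import Real, Solver, If, And, Or, sat

def relu(x):
    return If(x > 0, x, 0)

# Toy weights: one ordinary feature and one protected attribute (e.g., sex in {0,1}).
w = [[1.0, 2.0], [0.5, -1.5]]   # hidden-layer weights (hypothetical)
v = [1.0, 1.0]                  # output-layer weights (hypothetical)

def net(inc, sex):
    h0 = relu(w[0][0] * inc + w[0][1] * sex)
    h1 = relu(w[1][0] * inc + w[1][1] * sex)
    return v[0] * h0 + v[1] * h1

x1_inc, x1_sex = Real('x1_inc'), Real('x1_sex')
x2_inc, x2_sex = Real('x2_inc'), Real('x2_sex')

s = Solver()
# Two "similar" individuals: identical on every non-protected attribute...
s.add(x1_inc == x2_inc)
# ...but differing on the protected attribute.
s.add(Or(And(x1_sex == 0, x2_sex == 1), And(x1_sex == 1, x2_sex == 0)))
# Bound the input domain (one "partition" in Fairify's terms).
s.add(And(x1_inc >= 0, x1_inc <= 10))
# Violation: the thresholded decisions for the two individuals disagree.
s.add((net(x1_inc, x1_sex) >= 0.5) != (net(x2_inc, x2_sex) >= 0.5))

if s.check() == sat:
    print('Counterexample to individual fairness:', s.model())
else:
    print('No counterexample on this partition.')
```

If the solver returns sat, the model is a concrete pair of similar individuals treated differently, i.e., a counterexample of the kind Fairify reports; unsat on every partition would amount to a certification.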
title: "Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement"
6
+
bib: |
7
+
@inproceedings{imtiaz23rnn,
8
+
author = {Sayem Mohammad Imtiaz and Fraol Batole and Astha Singh and Rangeet Pan and Breno Dantas Cruz and Hridesh Rajan},
9
+
title = {Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement},
10
+
booktitle = {ICSE'23: The 45th International Conference on Software Engineering},
11
+
location = {Melbourne, Australia},
12
+
month = {May 14-May 20},
13
+
year = {2023},
14
+
entrysubtype = {conference},
15
+
abstract = {
16
+
+  Can we take a recurrent neural network (RNN) trained to translate between languages and augment it to support a new natural language without retraining the model from scratch? Can we fix the faulty behavior of an RNN by replacing the portions associated with that behavior? Recent works on decomposing fully connected neural networks (FCNNs) and convolutional neural networks (CNNs) into modules have shown the value of engineering deep models in this manner, which is standard in traditional SE but foreign to deep learning models. However, prior works focus on image-based multiclass classification problems and cannot be applied to RNNs due to (a) different layer structures, (b) loop structures, (c) different types of input-output architectures, and (d) the usage of both nonlinear and logistic activation functions. In this work, we propose the first approach to decompose an RNN into modules. We study different types of RNNs, i.e., Vanilla, LSTM, and GRU. Further, we show how such RNN modules can be reused and replaced in various scenarios. We evaluate our approach against 5 canonical datasets (i.e., Math QA, Brown Corpus, Wiki-toxicity, Clinc OOS, and Tatoeba) and 4 model variants for each dataset. We found that decomposing a trained model has a small cost (accuracy: -0.6%, BLEU score: +0.10%). Also, the decomposed modules can be reused and replaced without needing to retrain.
+  }
+  }
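For intuition about the decompose-reuse-replace workflow this abstract summarizes, the NumPy sketch below splits a toy many-to-one vanilla RNN classifier into per-class modules by slicing its output projection. This is a deliberately simplified illustration with made-up weights, not the paper's decomposition technique, which must contend with loop structures, gate types (LSTM/GRU), and varied input-output architectures.

```python
# Hedged toy sketch of decompose-and-reuse for a many-to-one vanilla RNN
# classifier, using only NumPy. NOT the paper's algorithm; all weights are
# random stand-ins, invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
H, D, N = 16, 8, 3                      # hidden size, input dim, classes
W_xh = rng.normal(size=(D, H)) * 0.1    # input-to-hidden weights (hypothetical)
W_hh = rng.normal(size=(H, H)) * 0.1    # hidden-to-hidden weights
W_hy = rng.normal(size=(H, N)) * 0.1    # hidden-to-output weights

def encode(xs):
    """Run the shared recurrent core over a sequence; return the final hidden state."""
    h = np.zeros(H)
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh)
    return h

# "Decompose": one module per class = shared core + that class's output slice.
def make_module(cls):
    w = W_hy[:, cls].copy()             # this module's private output weights
    return lambda xs: encode(xs) @ w    # scores a single concern only

modules = [make_module(c) for c in range(N)]

# "Reuse/replace": compose any set of modules; swapping one module's weights
# does not require retraining the others.
def predict(xs, mods):
    return int(np.argmax([m(xs) for m in mods]))

seq = rng.normal(size=(5, D))           # a dummy 5-step input sequence
print('predicted class:', predict(seq, modules))
```

The point of the sketch is the interface: because each module shares the recurrent core but owns its slice of the output computation, one concern can be swapped out or added without touching the rest, which is the reuse-and-replacement property the paper evaluates.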