Project 2 (BNs)

XM_0059 - Knowledge Representation

Group 42: Nathan Jones (2762057), Ilse Feenstra (2608345) & Yauheniya Makarevich (2772189)

Program Overview

This is the source code accompanying the report entitled "Assessing the Risk of a Stroke for an Unhealthy Lifestyle Using Bayesian Networks".

Requirements

This project was implemented using Python 3. The required packages are listed in requirements.txt. To install them, run the following command from the root directory of this project (preferably inside a virtual environment):

pip install -r requirements.txt

Running the Code

The experiment described in the accompanying paper can be reproduced by running the following commands in order (the first step can be omitted if you wish to use the existing test set):

python gen_test_set.py

python experiment.py

python plot_results.py

The last two commands accept additional command-line arguments. To see what is available, append --help to either command.

The unit tests written for the BNReasoner class can be executed by running:

pytest test.py

The inferences made on the modelled use case of strokes can be executed by running:

python experiment_stroke.py

Key Source Files

BayesNet.py
This class was extended with several methods for querying the structure of the network (leaf nodes, reachable nodes, etc.) as well as methods for randomly sampling network variables (randQe).

BNReasoner.py
Contains all the required algorithms for reasoning over BNs.

test.py
Unit tests for the functionality implemented in the BNReasoner class.

experiment.py
The driver code for running tests on the test set as well as generating and saving results.

experiment_stroke.py
The inferences made on our modelled use case.

gen_test_set.py
Generates the test set, consisting of random BNs according to the settings provided.

plot_results.py
Plots the results generated by experiment.py.

Useful Pointers for Assignment 2 of KR21

BIFXML file format

The BIFXML file format is meant to provide an easy means of exchanging Bayesian networks. It works with standard XML tags. The detailed description of the format can be found at http://www.cs.cmu.edu/afs/cs/user/fgcozman/www/Research/InterchangeFormat/. Please read this carefully. Note that for our purposes it will be enough to only have nodes of type "nature", and we will not need the <Property> tag.

Be aware of the order of the values in the probability table tag. They should be ordered in a way that corresponds to boolean counting, where the "For" variable is the least significant bit, followed by the "Given" variables from bottom to top. Here is an example of what this means. Assume we have the following CPT in the BIFXML file:

<DEFINITION>
	<FOR>dog-out</FOR>
	<GIVEN>bowel-problem</GIVEN>
	<GIVEN>family-out</GIVEN>
	<TABLE>0.99 0.01 0.97 0.03 0.9 0.1 0.3 0.7 </TABLE>
</DEFINITION>

Then the order of the variables in the table (from left to right) is: "bowel-problem", "family-out", "dog-out". Filling in the values, this leads to the following table:

bowel-problem	family-out	dog-out	p
False	False	False	0.99
False	False	True	0.01
False	True	False	0.97
False	True	True	0.03
True	False	False	0.9
True	False	True	0.1
True	True	False	0.3
True	True	True	0.7

Table 1: CPT of the "dog-out" node of the "dog_problem.BIFXML" example
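
The ordering rule can be checked with a few lines of Python. The following is only a minimal sketch: it assumes the values have already been read from the <TABLE> tag into a list, and the variable names (table_values, columns, dog_out_cpt) are chosen for illustration and are not part of the BayesNet class.

```python
from itertools import product

import pandas as pd

# Values copied from the <TABLE> tag of the "dog-out" definition above.
table_values = [0.99, 0.01, 0.97, 0.03, 0.9, 0.1, 0.3, 0.7]

# Column order from left to right: the <GIVEN> variables first,
# the <FOR> variable last.
columns = ["bowel-problem", "family-out", "dog-out"]

# product() varies the last position fastest, so "dog-out" behaves like
# the least significant bit when counting in binary.
rows = [
    (*assignment, p)
    for assignment, p in zip(product([False, True], repeat=len(columns)), table_values)
]

dog_out_cpt = pd.DataFrame(rows, columns=columns + ["p"])
print(dog_out_cpt)
```

The printed DataFrame should list the same eight rows as Table 1, in the same order.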

The BayesNet Class

Among the files of this project you can find the BayesNet class. We provided you with this class, so you don't need to worry much about the data structure in which the BN can be represented. This class provides you with (hopefully) useful functions for BNs such as loading them from a file and retrieving the CPT of a variable. We highly recommend using it. Of course, you are also free to implement your own methods and change the existing ones if they don't fit your purpose. For this class to work you will need to install (either with pip or anaconda) the following packages: networkx, matplotlib, pgmpy, pandas (see also requirements.txt).

Internally, the graphical structure of the BN is represented as a DiGraph object from the networkx package. The CPTs are modelled with DataFrames from the pandas package. Each CPT is a DataFrame in the form shown in Table 1.
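
To make this representation concrete, here is a small sketch that builds the "dog-out" fragment by hand, using only networkx and pandas. Storing the CPT as a node attribute named "cpt" is an assumption made for this illustration; check the BayesNet source for how the class actually stores it.

```python
import networkx as nx
import pandas as pd

# Directed edges run from parent to child, following the <GIVEN>/<FOR> tags.
structure = nx.DiGraph()
structure.add_edges_from([
    ("bowel-problem", "dog-out"),
    ("family-out", "dog-out"),
])

# The CPT of "dog-out" in the layout of Table 1.
dog_out_cpt = pd.DataFrame({
    "bowel-problem": [False, False, False, False, True, True, True, True],
    "family-out": [False, False, True, True, False, False, True, True],
    "dog-out": [False, True, False, True, False, True, False, True],
    "p": [0.99, 0.01, 0.97, 0.03, 0.9, 0.1, 0.3, 0.7],
})

# Assumption for this sketch: the CPT is attached as the node attribute "cpt".
structure.nodes["dog-out"]["cpt"] = dog_out_cpt

# The parents of a node are its predecessors in the directed graph.
parents = sorted(structure.predecessors("dog-out"))
print(parents)
```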

We want to point out some methods which we deem especially useful here, but all methods come with their documentation in the comments of the methods.

get_compatible_instantiations_table(instantiation, cpt): This method takes an instantiation as a pandas Series and a CPT as a pandas DataFrame. It checks which rows of the provided CPT are compatible with the instantiation and returns only those rows.
reduce_factor(instantiation, cpt): This method takes an instantiation as a pandas Series and a CPT as a pandas DataFrame. It returns a new CPT in which the probability of every row that is incompatible with the instantiation is set to 0.
get_interaction_graph(): Returns a new, undirected Graph object which corresponds to the interaction graph of the current BN.
draw_structure(): Plots the graph structure of the current BN.
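
The behaviour described for the first two methods can be imitated with plain pandas. The sketch below is not the code of the BayesNet class: the helper names compatibility_mask, compatible_rows and zero_incompatible_rows are hypothetical and only illustrate the idea on the CPT from Table 1.

```python
import pandas as pd


def compatibility_mask(instantiation: pd.Series, cpt: pd.DataFrame) -> pd.Series:
    """Mark the rows of cpt that agree with every variable in the instantiation."""
    mask = pd.Series(True, index=cpt.index)
    for var, value in instantiation.items():
        if var in cpt.columns:  # variables not in the CPT are ignored
            mask &= cpt[var] == value
    return mask


def compatible_rows(instantiation: pd.Series, cpt: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows that are compatible with the instantiation."""
    return cpt[compatibility_mask(instantiation, cpt)]


def zero_incompatible_rows(instantiation: pd.Series, cpt: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of cpt with 'p' set to 0 on incompatible rows."""
    reduced = cpt.copy()
    reduced.loc[~compatibility_mask(instantiation, cpt), "p"] = 0.0
    return reduced


dog_out_cpt = pd.DataFrame({
    "bowel-problem": [False, False, False, False, True, True, True, True],
    "family-out": [False, False, True, True, False, False, True, True],
    "dog-out": [False, True, False, True, False, True, False, True],
    "p": [0.99, 0.01, 0.97, 0.03, 0.9, 0.1, 0.3, 0.7],
})
evidence = pd.Series({"family-out": True})

kept = compatible_rows(evidence, dog_out_cpt)
reduced = zero_incompatible_rows(evidence, dog_out_cpt)
print(kept)
print(reduced)
```

With the evidence family-out = True, the first helper keeps rows 2, 3, 6 and 7, while the second keeps all eight rows and zeroes the probability of the other four.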

Other useful methods

There are also a few other methods which might turn out to be useful during the implementation of this project. Note that they are completely optional to use, and it might well be the case that your implementation will work well even without them. We also provide them as some are used in the BayesNet class.

Pandas methods

dog_out_CPT['p'] returns the 'p' column of the dog_out_CPT DataFrame as a pandas Series. (0.99 0.01 0.97 0.03 0.9 0.1 0.3 0.7)
dog_out_CPT['p'].max() returns the maximum of the probability values. (0.99)
dog_out_CPT['family-out'] == dog_out_CPT['bowel-problem'] creates a pandas Series which is True for every index at which "bowel-problem" and "family-out" have the same value in the dog_out_CPT. (True, True, False, False, False, False, True, True). The arguments in the brackets can also be lists of column names, in which case multiple columns are compared.
dog_out_CPT.iloc[2] returns the third row of the DataFrame. (False, True, False, 0.97)
dog_out_CPT.loc[dog_out_CPT['family-out'] == dog_out_CPT['bowel-problem']] returns all rows of the CPT in which 'bowel-problem' == 'family-out'. (rows 0, 1, 6 and 7)
dog_out_CPT.loc[dog_out_CPT['family-out'] == dog_out_CPT['bowel-problem'], 'p'] = 0.0 sets the 'p' value of the above mentioned rows to 0.0. This usage of loc[row(s), column(s)] corresponds to direct access to a cell or subtable.
pd.Series({'Winter?': True, 'Sprinkler?': False}) creates a pandas Series in which 'Winter?' is set to True and 'Sprinkler?' is set to False. This is a useful format for passing evidence to some methods of this assignment.
dog_out_CPT.iterrows() provides an iterator through all the rows of a DataFrame. This is useful in loops. Note that each item it yields is a tuple of (index, row), where the row is a pandas Series.
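
The calls listed above can be tried out together in one short script. This is a sketch that assumes dog_out_CPT holds the table from Table 1; here it is built by hand so that the script runs on its own.

```python
import pandas as pd

# The CPT from Table 1, built by hand for this example.
dog_out_CPT = pd.DataFrame({
    "bowel-problem": [False, False, False, False, True, True, True, True],
    "family-out": [False, False, True, True, False, False, True, True],
    "dog-out": [False, True, False, True, False, True, False, True],
    "p": [0.99, 0.01, 0.97, 0.03, 0.9, 0.1, 0.3, 0.7],
})

p_column = dog_out_CPT["p"]        # the 'p' column as a Series
max_p = dog_out_CPT["p"].max()     # largest probability in the table

# Boolean Series: True where both parents have the same value.
same = dog_out_CPT["family-out"] == dog_out_CPT["bowel-problem"]

third_row = dog_out_CPT.iloc[2]    # positional access to the third row
matching = dog_out_CPT.loc[same]   # only the rows where the mask is True

# Work on a copy so that the original CPT stays unchanged.
zeroed = dog_out_CPT.copy()
zeroed.loc[same, "p"] = 0.0

# Evidence in the Series format expected by some methods of the assignment.
evidence = pd.Series({"Winter?": True, "Sprinkler?": False})

# iterrows() yields (index, row) tuples; each row is a Series.
for index, row in dog_out_CPT.iterrows():
    print(index, row["dog-out"], row["p"])
```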

Networkx methods

It is likely that you will not have to use this package at all. One method that could be useful is networkx.neighbors(G, var), which provides all neighbors of the variable 'var' in the graph 'G' (as an iterator in current networkx versions). Note that for a directed graph, the neighbors of a node are its successors only.
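
As a brief illustration, the sketch below calls networkx.neighbors on an undirected and a directed version of the "dog-out" fragment. The graphs are built by hand for this example and contain only the two edges implied by the CPT definition above.

```python
import networkx as nx

edges = [("bowel-problem", "dog-out"), ("family-out", "dog-out")]

# Undirected graph: every adjacent node counts as a neighbor.
undirected = nx.Graph()
undirected.add_edges_from(edges)
undirected_neighbors = sorted(nx.neighbors(undirected, "dog-out"))

# Directed graph: only the successors (children) count as neighbors.
directed = nx.DiGraph()
directed.add_edges_from(edges)
directed_neighbors = sorted(nx.neighbors(directed, "dog-out"))
children_of_family_out = sorted(nx.neighbors(directed, "family-out"))

print(undirected_neighbors)
print(directed_neighbors)
print(children_of_family_out)
```

Wrapping the result in sorted() or list() turns it into a plain list regardless of whether the installed networkx version returns a list or an iterator.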

