DataLake_Backend

The project uses the Spring Boot framework, with Hibernate handling the database connection.

Controller classes are the gateway for API calls from the frontend. They pass these calls on to the service layer, where the business rules are located. To extend or change the application's behavior, we recommend starting with the appropriate service class.
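
A minimal, framework-free sketch of this layering is shown below to illustrate where changes usually belong. In the real project the classes would carry Spring annotations such as `@RestController` and `@Service` and be wired together by dependency injection; the class names `SymptomController` and `SymptomService`, their methods, and the in-memory data are hypothetical and not taken from this codebase.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical service: holds the business rule (here, filtering patients by symptom).
// In the Spring Boot project a class like this would be annotated with @Service.
class SymptomService {
    private final Map<String, List<String>> symptomsByPatient;

    SymptomService(Map<String, List<String>> symptomsByPatient) {
        this.symptomsByPatient = symptomsByPatient;
    }

    // Business rule: return the sorted IDs of patients who reported the given symptom.
    List<String> patientsWithSymptom(String symptom) {
        return symptomsByPatient.entrySet().stream()
                .filter(e -> e.getValue().contains(symptom))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}

// Hypothetical controller: receives the request and delegates to the service.
// In the Spring Boot project this would be a @RestController with a @GetMapping
// method; no business logic lives here.
class SymptomController {
    private final SymptomService service;

    SymptomController(SymptomService service) {
        this.service = service;
    }

    List<String> getPatientsWithSymptom(String symptom) {
        return service.patientsWithSymptom(symptom);
    }
}
```

Because the controller only delegates, changing a business rule means editing the service and leaving the controller untouched.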

Headers from the different datasets are combined in the two switch statements in the service class "PatientService". Dataset values are combined and defined in the entity class of the respective node. For example, the patient node class is located in Entity/Nodes/Patient.java, and modifications can be made there. In some cases the dataset name is also used to correct a value when two or more datasets assign different meanings to the same value.
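
As an illustration of the header-combination idea, the sketch below maps each dataset's raw column header onto one shared field name using a switch on the dataset name. It is not the code in "PatientService": the class `HeaderMapper`, its method, and all column names are hypothetical and are not taken from the Project Data Sphere files.

```java
// Hypothetical sketch: each dataset names the same concept differently, and a
// switch on the dataset name maps its raw column header to one shared field name.
// All column names here are invented for illustration.
final class HeaderMapper {
    private HeaderMapper() {}

    // Returns the shared field name, or null if this dataset's column is not used.
    static String commonField(String dataset, String header) {
        switch (dataset) {
            case "261":
                switch (header) {
                    case "PATID": return "patientId";
                    case "AGE_YRS": return "age";
                    default: return null; // column ignored for this dataset
                }
            case "266":
                switch (header) {
                    case "SUBJECT": return "patientId";
                    case "AGE": return "age";
                    default: return null;
                }
            default:
                throw new IllegalArgumentException("Unknown dataset: " + dataset);
        }
    }
}
```

Keeping one branch per dataset makes it clear where to add the mapping when a new dataset is imported.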

The pom.xml file lists all the dependencies the project uses, along with their versions. When the project is loaded as a Maven project, these dependencies are resolved automatically. To run the project, execute the main method of the "DataLakeBackendApplication" class.

The code is documented with Javadoc, and we have added extra comments where we felt they were needed. We have also included a short presentation ("Data lake presentation.pdf") that may help in understanding the application and its purpose.

The datasets we used for development and testing were provided by "Project Data Sphere". Because these datasets are not public and were provided to us under a user agreement, we have decided not to publish them in this GitHub repository. The access links for these datasets are provided below.

https://data.projectdatasphere.org/projectdatasphere/html/content/261
From dataset 261, "c9732_demographic" was used for the patient data and "c9732_ae" for the symptoms.

https://data.projectdatasphere.org/projectdatasphere/html/content/266
From dataset 266, "c_chemo" and "a_eendpt" were used for the patient data and "c_ae" for the symptoms.

The code currently relies on the dataset names, because the same column value can mean different things in different datasets. Keep the names "261" and "266", or update them in the code if you rename the datasets. At present, this dataset-specific handling is only implemented in Entity -> Nodes -> OverAllSurvivalStatus.
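
Why the names matter can be sketched as follows: when the same raw code means different things in two datasets, the parser has to branch on the dataset name, so a renamed dataset no longer matches any branch. `SurvivalStatusParser`, `SurvivalStatus`, and the code meanings in the comments are assumptions for illustration, not the actual mapping in OverAllSurvivalStatus.

```java
// Hypothetical set of interpreted survival states.
enum SurvivalStatus { ALIVE, DECEASED, UNKNOWN }

final class SurvivalStatusParser {
    private SurvivalStatusParser() {}

    // Interprets a raw status code according to the dataset it came from.
    // The code meanings below are invented for illustration only.
    static SurvivalStatus parse(String dataset, String rawValue) {
        String v = rawValue == null ? "" : rawValue.trim();
        switch (dataset) {
            case "261": // assumed coding: 0 = alive, 1 = deceased
                if (v.equals("0")) return SurvivalStatus.ALIVE;
                if (v.equals("1")) return SurvivalStatus.DECEASED;
                return SurvivalStatus.UNKNOWN;
            case "266": // assumed coding: 1 = alive, 2 = deceased
                if (v.equals("1")) return SurvivalStatus.ALIVE;
                if (v.equals("2")) return SurvivalStatus.DECEASED;
                return SurvivalStatus.UNKNOWN;
            default:
                // A renamed dataset ends up here instead of being silently misread.
                throw new IllegalArgumentException("Unknown dataset name: " + dataset);
        }
    }
}
```

Failing loudly on an unknown name is one design choice; silently applying a default coding would risk reading "1" with the wrong meaning.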

PS: We believe that more connections could be made between nodes from different datasets if the dataset import were case-insensitive. The change should be fairly simple and could be a good starting point for learning how to modify the application.
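
One possible shape for that change is to normalize each imported value before looking up an existing node, so that "Nausea" and "NAUSEA" end up on the same node. This is a hypothetical sketch: `SymptomNodeRegistry` stands in for whatever lookup the import code performs, and nodes are reduced to plain strings.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical registry that reuses an existing node when the same value
// appears again, ignoring differences in letter case and surrounding spaces.
final class SymptomNodeRegistry {
    private final Map<String, String> nodesByKey = new HashMap<>();

    // Locale.ROOT avoids locale-specific casing rules (e.g. the Turkish dotless i).
    private static String normalize(String value) {
        return value.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the label stored for the first spelling seen, creating it if absent.
    String getOrCreate(String value) {
        return nodesByKey.computeIfAbsent(normalize(value), k -> value.trim());
    }

    int size() {
        return nodesByKey.size();
    }
}
```

If a sorted view is also useful, a `TreeMap` constructed with `String.CASE_INSENSITIVE_ORDER` is an alternative to normalizing the key by hand.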

Author information:
Viktor Lindström Söraas, viktor.lindstrom00@gmail.com
Kasper Lindström, Onsik1998@gmail.com
Nonno Rydgren, nonnorydgren@gmail.com
Jonathan Lindqvist, linkanjontes@gmail.com

If you have any questions, feel free to send an email.
