20eddibae/projecthassan

StarterProject

Introduction:

In this project, you will be given a small histology dataset as an opportunity to learn and develop your data processing and deep learning skills in Python within the context of biomedical research. The project aims to emphasize the importance of these skills in medical deep learning research. You will be guided through the process of preprocessing and analyzing the data, as well as training a deep learning model using Python.

To share and track your work easily, consider cloning this repository and pushing it to a new private repository of your own. This gives you full control over the repository and keeps your work private. After setting up the private repository, add Saeed (saeedhassanpour) and Naofumi (ntomita) as collaborators with a read role so that we can access and review your work.

Environment setup:

To complete this project, you will need to set up your environment with the required packages. Start by reading the DeepSlide README for instructions on how to install them. If you prefer to use a container, you can find a setup script in the scripts folder. Follow the instructions carefully so that your environment is ready for the data processing and deep learning tasks.

Hints:

  • If you don't have access to a computing environment suitable for deep learning (e.g., a machine with a reasonably good CPU, 16+ GB of memory, ~500 GB of storage, and optionally an NVIDIA GPU), you may be able to use the Discovery cluster.

  • If you encounter errors while creating a Conda environment, try relaxing the version requirements for packages. This may help in resolving compatibility issues and allow the environment to be successfully set up.
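
One way to relax version requirements is to strip the pins from a copy of the Conda environment file and let the solver pick compatible versions. The sketch below is only an illustration: the function name `relax_pins` and the sample file contents are hypothetical and are not taken from DeepSlide's actual environment file. It keeps the Python version pinned by default, on the assumption that the interpreter version is the one constraint worth preserving.

```python
import re

# Matches a dependency line such as "  - numpy=1.16.2=py37_0" or
# "    - torch==1.0.1" and captures the bullet prefix and the package name.
_PIN = re.compile(r"^(\s*-\s*)([A-Za-z0-9_.\-]+)\s*[=<>!~]+.*$")


def relax_pins(yaml_text, keep_pinned=("python",)):
    """Return the environment file text with version pins removed.

    Packages named in keep_pinned keep their original version constraint.
    Lines that are not pinned dependencies are left untouched.
    """
    relaxed = []
    for line in yaml_text.splitlines():
        match = _PIN.match(line)
        if match and match.group(2) not in keep_pinned:
            line = match.group(1) + match.group(2)
        relaxed.append(line)
    return "\n".join(relaxed) + "\n"


# Hypothetical environment file contents, for illustration only.
EXAMPLE = """\
name: example-env
dependencies:
  - python=3.7
  - numpy=1.16.2=py37_0
  - pip:
    - torch==1.0.1
"""

if __name__ == "__main__":
    print(relax_pins(EXAMPLE))
```

Relaxing every pin at once can pull in versions the code was never tested with, so it may be safer to relax only the packages named in the error message and to record the versions that were finally installed.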

Data processing:

The first step in preprocessing the dataset is to download it from Dropbox and unzip it. Then, write a script that copies the slide data to the wsi folder in a layout that the scripts/generate_patches.py script can process. Your script should take the slide data folder and the meta file data/partition.csv as inputs.
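
A minimal sketch of such a copy script follows. Everything specific in it is an assumption to be checked against the real files: the column names `slide`, `partition`, and `label`, and the target layout `wsi/<partition>/<label>/`, are hypothetical and may not match data/partition.csv or what scripts/generate_patches.py expects.

```python
import csv
import shutil
from pathlib import Path


def organize_slides(slides_dir, partition_csv, wsi_dir,
                    slide_col="slide", split_col="partition", label_col="label"):
    """Copy every slide listed in the meta file into wsi_dir/<split>/<label>/.

    The column names and the folder layout are assumptions; adjust them to
    the actual meta file and to the layout the patch script reads.
    Returns the list of copied paths and the list of slide names that were
    listed in the meta file but not found in slides_dir.
    """
    slides_dir, wsi_dir = Path(slides_dir), Path(wsi_dir)
    copied, missing = [], []
    with open(partition_csv, newline="") as f:
        for row in csv.DictReader(f):
            src = slides_dir / row[slide_col]
            if not src.is_file():
                # Record the gap instead of failing, so one run reports all of them.
                missing.append(row[slide_col])
                continue
            dst_dir = wsi_dir / row[split_col] / row[label_col]
            dst_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst_dir / src.name)
            copied.append(dst_dir / src.name)
    return copied, missing
```

Returning the missing slide names makes it easy to spot mismatches between the downloaded archive and the meta file before patch generation starts. If storage is tight, creating symbolic links instead of copies may be an alternative, provided the downstream script follows them.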

Dataset Link

In the second stage, use the generate_patches.py script to generate patches for model training. This script replaces the code/2_process_patches.py script to reduce computational requirements. Be sure to fill in every value marked REPLACE_WITH_ACTUAL_VALUE in the script.

Hints:

  • If you believe it's not feasible to process all the slides in your environment, work out the minimum data needed to train, validate, and test a DeepSlide model on the target classes. If you decide to revise the dataset, we suggest updating data/partition.csv and explaining the rationale, pros, and cons in a milestone report.
  • Discovery provides a /scratch space where students can store temporary large data for their projects.
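
If you do reduce the dataset, one possible approach is to cap the number of slides per class within each partition, so that every target class stays represented in training, validation, and testing. The sketch below assumes hypothetical column names `partition` and `label` in data/partition.csv; the cap value and the random seed are choices you would need to justify in your report.

```python
import csv
import random
from collections import defaultdict


def subsample_partition(rows, max_per_group,
                        split_col="partition", label_col="label", seed=0):
    """Keep at most max_per_group rows for each (split, label) pair.

    rows is a list of dicts as produced by csv.DictReader. The column
    names are assumptions and must match the real meta file.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row[split_col], row[label_col])].append(row)
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    kept = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) > max_per_group:
            members = rng.sample(members, max_per_group)
        kept.extend(members)
    return kept


def rewrite_partition(src_csv, dst_csv, max_per_group, **kwargs):
    """Read src_csv, subsample it, and write the result to dst_csv."""
    with open(src_csv, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    kept = subsample_partition(rows, max_per_group, **kwargs)
    with open(dst_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(rows), len(kept)
```

Writing the reduced meta file to a new path and keeping the original makes it straightforward to report exactly which slides were dropped.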

Milestone 1

Please upload a milestone report that briefly describes your progress so far.


Model training:

The model training stage uses the code/3_train.py script to train a deep learning model on the preprocessed dataset. Read the DeepSlide README and source code to set the appropriate flags.

It is important to keep in mind that all hyperparameters are subject to change in order to improve the performance of the model. When training the model, be prepared to experiment with different hyperparameter values and make adjustments as necessary.

Hints:

  • Carefully review the configurations shown when you start the training process. While most parameters can be left as default values, some may need to be updated to adjust to your dataset. Pay close attention to these settings to ensure the training process is tailored to your specific data.
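
When experimenting with hyperparameters, it can help to enumerate the combinations up front and log each run against its configuration. The sketch below only generates the combinations. The names `learning_rate` and `batch_size` and their values are hypothetical placeholders, not confirmed flags of code/3_train.py; map them to the actual flags after reading the DeepSlide source.

```python
import itertools


def expand_grid(search_space):
    """Yield one dict per combination of the values in search_space.

    search_space maps a hyperparameter name to the list of values to try.
    Names are visited in sorted order so the output order is deterministic.
    """
    names = sorted(search_space)
    for values in itertools.product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [16, 32],
}

if __name__ == "__main__":
    for run_id, config in enumerate(expand_grid(SEARCH_SPACE)):
        print(run_id, config)
```

A full grid grows multiplicatively with each added hyperparameter, so with limited compute it may be more practical to vary one or two settings at a time.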

Milestone 2

Please upload a milestone report that briefly describes your progress so far.


Model evaluation:

To determine the best-performing model, evaluate each candidate on the validation set. This lets you compare models and select the best one without touching the test set. Consider several metrics rather than a single one to get a comprehensive picture of performance.
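
As an illustration of looking at more than one metric, the sketch below computes accuracy, per-class precision, recall, and F1, and macro-averaged F1 from lists of true and predicted labels. It is a standalone example and not part of DeepSlide; the function name `evaluate_predictions` is hypothetical, and how you collect the predictions from your trained model is left to your own pipeline.

```python
from collections import Counter


def evaluate_predictions(y_true, y_pred):
    """Return (accuracy, macro_f1, per_class) for two equal-length label lists.

    per_class maps each label to its precision, recall, and F1 score.
    A metric whose denominator is zero is reported as 0.0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    per_class = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(pairs[(t, label)] for t in labels if t != label)
        fn = sum(pairs[(label, p)] for p in labels if p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(pairs[(label, label)] for label in labels) / len(y_true)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    return accuracy, macro_f1, per_class
```

Macro-averaged F1 weights every class equally, which can reveal poor performance on a rare class that overall accuracy would hide.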

Final Reporting

Please upload your final report to describe your project and progress.

In the reporting stage, provide a brief summary of the following aspects of your project:

  1. Model Evaluation and Analysis: Present a thorough evaluation of the model's performance, including the metrics used to assess its performance and a detailed analysis of the results.

  2. Design Choices and Reasoning: Explain the design choices you made and the reasoning behind them, as well as any options you tried during the model development process.

  3. Performance Improvement Suggestions: Offer suggestions on how to improve the model's performance, including any changes that could be made to the model architecture or training process.

  4. Improvement Suggestions for DeepSlide Library (if any): Provide suggestions for improvements that could be made to the DeepSlide library (or any scripts provided) to enhance its functionality and usability for future projects. If you have already implemented such improvements, please indicate them here.

Please add your report to your repository in markdown format.

About

Fixing and fine-tuning DeepSlide Project
