Presentation
isaackcr edited this page Apr 22, 2023
Welcome to the CS685FinalProject wiki!
1-1.5 minutes to introduce the problem and the technical challenge.
- Problem: cancer diagnosis from clinical notes, unstructured data (narrative only)
- get statistics on unstructured data in clinical notes
- Challenge: difficult to get good performance from NLP methods
- data scarcity, availability of de-identified clinical text
- labeled data scarcity
- why is cancer diagnosis important?
- how is cancer diagnosed?
- why do we want to automatically classify text?
- why is the text hard to read?
- propose using pretrained models and fine-tuning on a smaller dataset
- clinical text is very long; how do we identify the important portions?
- project started big, had to narrow down
1-1.5 minutes for baseline approaches.
- get baselines from other clinical text cancer diagnosis
- look at difficulties from other papers
- show BERT model performance
- explain BERT and how it differs from GPT models
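One possible way to show the BERT vs. GPT difference on a slide is the attention mask: BERT-style encoders let every token attend to every other token, while GPT-style decoders only let a token attend to itself and earlier positions. The sketch below is illustrative only; the function names are hypothetical and are not taken from any library or from the project code.

```python
def bidirectional_mask(n):
    """BERT-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """GPT-style mask: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Print both masks for a 4-token sequence, one row per query position.
    for name, mask in (("bidirectional", bidirectional_mask(4)),
                       ("causal", causal_mask(4))):
        print(name)
        for row in mask:
            print(" ".join(str(v) for v in row))
```

Printing the two masks side by side makes the point that a classifier built on BERT sees the whole note at once, which is one reason encoder models are a common choice for document classification.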
Approximately 5 minutes (i.e., 50% time) for your technical contribution in this work.
- data
- preprocessed data
- baseline / low effort training
Anchit:
- hyperparameter tuning
- data truncation at the end (try different chunks, look at the data)
- look at training loss / validation loss plots, analyze ways to get better results from training
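The truncation and chunking options above can be sketched as follows. This is a minimal illustration, assuming the note has already been tokenized into a list of token ids and that the model has a fixed maximum input length (512 for standard BERT checkpoints, minus any special tokens). The helper names `head_tail` and `sliding_windows` and the parameter values are hypothetical, not the project's actual code.

```python
def head_tail(ids, max_len, head_len):
    """Keep the first head_len tokens and fill the rest from the end of the note."""
    if not 0 <= head_len <= max_len:
        raise ValueError("head_len must be between 0 and max_len")
    if len(ids) <= max_len:
        return list(ids)
    tail_len = max_len - head_len
    # Slice from an explicit index so tail_len == 0 yields an empty tail.
    return list(ids[:head_len]) + list(ids[len(ids) - tail_len:])


def sliding_windows(ids, max_len, stride):
    """Split a long note into overlapping chunks of at most max_len tokens."""
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if not ids:
        return []
    windows = []
    start = 0
    while True:
        windows.append(list(ids[start:start + max_len]))
        if start + max_len >= len(ids):
            break  # the last window already reaches the end of the note
        start += stride
    return windows


if __name__ == "__main__":
    note = list(range(10))  # stand-in for token ids of one clinical note
    print(head_tail(note, max_len=4, head_len=1))
    print(sliding_windows(note, max_len=4, stride=2))
```

Setting `head_len=max_len` keeps only the start of the note and `head_len=0` keeps only the end, so the same helper covers the different truncation choices. With sliding windows, each chunk is classified separately and the per-chunk predictions still need to be combined (for example by averaging), which is a design choice left open here.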
2-3 minutes for sharing your findings and results.