Presentation

isaackcr edited this page Apr 22, 2023 · 1 revision

Welcome to the CS685FinalProject wiki!

1-1.5 minutes for introducing the problem and what is/was the technical challenge.

  • Problem: cancer diagnosis from clinical notes, unstructured data (narrative only)
    • get a statistic on how much clinical-note data is unstructured
  • Challenge: difficult to get good performance from NLP methods
    • data scarcity, availability of de-identified clinical text
    • labeled data scarcity
    • why is cancer diagnosis important?
    • how is cancer diagnosed?
    • why do we want to auto classify text?
    • why is clinical text hard to read?
  • propose to use pretrained models and fine-tune on a smaller dataset
  • clinical text is very long, how to identify important portions
  • project started big, had to narrow down
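
One possible way to handle very long notes is to split each note into overlapping windows, so that each window fits a model's input limit and can be scored on its own. The sketch below is only an illustration: the function name `sliding_chunks` and the `size` and `stride` parameters are hypothetical and not taken from the project code, and tokens are plain list items rather than real tokenizer output.

```python
def sliding_chunks(tokens, size, stride):
    """Split a token list into overlapping windows of at most `size` tokens.

    Hypothetical helper for illustration; not part of the project code.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    chunks = []
    start = 0
    while True:
        # Slicing past the end is safe; the last window may be shorter.
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return chunks


# Usage: windows of 4 tokens that advance by 2 tokens each time.
note_tokens = "patient presents with a mass in the left upper lobe".split()
for chunk in sliding_chunks(note_tokens, size=4, stride=2):
    print(" ".join(chunk))
```

A stride smaller than the window size keeps some context shared between neighbouring windows; a stride larger than the window size would skip tokens.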

1-1.5 minutes for baseline approaches.

  • get baselines from other clinical text cancer diagnosis
  • look at difficulties from other papers
  • show BERT model performance
  • explain BERT, explain how it differs from GPT models

Approximately 5 minutes (i.e., 50% time) for your technical contribution in this work.

  • data
  • preprocessed data
  • baseline / low effort training

Anchit:

  • hyperparameter tuning
  • data truncation at end (try different chunks, look at data)
  • look at training loss / validation loss plots, analyze ways to get better results from training
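
The truncation item above could be explored by comparing which part of a note is kept when it exceeds the input limit. This is a minimal sketch under assumptions: the function name `truncate`, the strategy names, and the `head_frac` parameter are hypothetical, and the three strategies shown (keep the start, keep the end, keep a bit of both) are common options rather than the ones this project necessarily used.

```python
def truncate(tokens, max_len, strategy="head", head_frac=0.5):
    """Keep at most `max_len` tokens from a note.

    Hypothetical helper for illustration; strategy names are assumptions.
      "head"      keeps the first max_len tokens (drops the end)
      "tail"      keeps the last max_len tokens (drops the start)
      "head_tail" keeps some tokens from the start and the rest from the end
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    tokens = list(tokens)
    if len(tokens) <= max_len:
        return tokens  # short notes are returned unchanged
    if strategy == "head":
        return tokens[:max_len]
    if strategy == "tail":
        return tokens[-max_len:]
    if strategy == "head_tail":
        n_head = int(max_len * head_frac)
        n_tail = max_len - n_head
        # Guard n_tail == 0, because tokens[-0:] would return the whole list.
        tail = tokens[-n_tail:] if n_tail > 0 else []
        return tokens[:n_head] + tail
    raise ValueError(f"unknown strategy: {strategy}")


# Usage: compare what each strategy keeps from the same note.
note_tokens = list(range(10))
for name in ("head", "tail", "head_tail"):
    print(name, truncate(note_tokens, max_len=4, strategy=name))
```

Looking at the actual notes to see where diagnostic statements tend to appear would be the basis for choosing between these options.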

2-3 minutes for sharing your findings and results.