about

Hello

I am a second-year Biostatistics PhD student at the University of Washington. I am interested in the intersection of categorical and longitudinal data, and in issues in high-dimensional space, especially in the context of Electronic Medical Records (EMR). Right now I work with Patrick Heagerty as a Research Assistant on statistical Natural Language Processing (NLP) in radiology text reports.

Translating Bioinformatics to Biostatistics

We like each other, and our work often overlaps. However, there are some terms they use that we just don't understand. Here are some commonly used Bioinformatics terms and their equivalents in statistics:

Precision

Positive Predictive Value (PPV)

Recall

Sensitivity

F-score (F1)

Harmonic mean of precision and recall (that is, of PPV and sensitivity). Ranges between 0 and 1, with 1 being the best score.
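
To make the translation concrete, here is a minimal sketch that computes all three quantities from the cells of a 2 by 2 table. The function name `precision_recall_f1` and the counts in the example are made up for illustration; they do not come from any real dataset.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from 2 by 2 table counts.

    precision = PPV         = TP / (TP + FP)
    recall    = sensitivity = TP / (TP + FN)
    F1        = harmonic mean of precision and recall
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        # No true positives at all: define F1 as 0 to avoid dividing by zero.
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only for illustration.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(f"PPV (precision) = {precision:.3f}")
print(f"Sensitivity (recall) = {recall:.3f}")
print(f"F1 = {f1:.3f}")
```

Note that true negatives never enter the calculation, so F1 says nothing about specificity.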

What happens when you say no?

a.k.a. the effect of negation in Error Analysis

This is what a classical 2 by 2 table looks like:

If there is negation but we don't capture it, here's an example of what our naive perception of a dataset might look like:

When in reality, this table represents what our dataset REALLY looks like:

Some notes and justifications about this phenomenon:

The number of diseased/not diseased stays constant, as it is predetermined by the Gold Standard.
The number of test positive/test negative changes in the presence of a good negation detection algorithm.
A good negation detection algorithm moves outcomes from the FP cell to the TN cell.
Thus a good negation detection algorithm will generally improve the predictive performance of the model, in particular its PPV and specificity, while leaving sensitivity unchanged.
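
The notes above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the helper names `summarize` and `apply_negation_detection`, the parameter `caught`, and all of the counts are hypothetical and are not taken from our radiology data.

```python
def summarize(tp, fp, fn, tn):
    """Return PPV, sensitivity and specificity for a 2 by 2 table."""
    return {
        "ppv": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def apply_negation_detection(tp, fp, fn, tn, caught):
    """Move `caught` negated mentions from the FP cell to the TN cell."""
    if not 0 <= caught <= fp:
        raise ValueError("caught must be between 0 and the number of FPs")
    return tp, fp - caught, fn, tn + caught


# Hypothetical table (TP, FP, FN, TN) when negation is ignored.
naive = (40, 30, 10, 120)
# Assume the negation detector correctly flags 20 of the 30 false positives.
corrected = apply_negation_detection(*naive, caught=20)

before = summarize(*naive)
after = summarize(*corrected)
for name in before:
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f}")
```

In this made-up example the column totals (diseased and not diseased) are the same before and after, the number of test positives drops, and both PPV and specificity go up while sensitivity does not move.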

Statistical Natural Language Processing in Radiology Reports

This webpage was created to document and share our progress in Radiology NLP.
