Who are we?


We are a group of (bio)statisticians working on Statistical Natural Language Processing (NLP) in medical settings, specifically radiology NLP.



Why is it important?


Radiology data is often recorded as blocks of free text, like this:



There is mild chondral thinning noted along the peripheral margin of the acetabulum with minimal subchondral cystic change. There is a prominent bony ridge located along the peripheral margin of the left femoral neck with a small subcortical synovial herniation pit. There is an extensive linear pattern tear of the acetabular labrum extending from the lateral quadrant into the anterior quadrant of the left hip. This represents an extensive area of chondrolabral separation with linear fluid signal extending deep to the acetabular labrum. No adjacent paralabral cyst. There is a moderate volume hip effusion but no intra-articular body noted.


Source: National Radiology



This is not a very useful format for data analysis, since it contains no defined variables!

At the same time, radiology reports contain a lot of useful information, for example for stratifying a sample. We want to extract that information so that we can use it to divide a population into subgroups.
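To make "defined variables" concrete, here is a minimal sketch of how a few sentences from the report above could be turned into yes/no variables. It is only an illustration, not our actual method: the finding names, the phrase list, and the rule that a finding is negated when the word "no" appears before it in the same sentence are all assumptions made for this example.

```python
import re

# Hypothetical variable names mapped to the phrase searched for in the text.
FINDINGS = {
    "labral_tear": "tear",
    "paralabral_cyst": "paralabral cyst",
    "hip_effusion": "effusion",
    "intra_articular_body": "intra-articular body",
}


def extract_findings(report):
    """Return {variable: bool}; True if the phrase is mentioned and not negated."""
    # Split into sentences at whitespace that follows a full stop.
    sentences = re.split(r"(?<=\.)\s+", report.lower())
    result = {name: False for name in FINDINGS}
    for sentence in sentences:
        for name, phrase in FINDINGS.items():
            pos = sentence.find(phrase)
            if pos == -1:
                continue
            # Crude negation rule (an assumption): the standalone word "no"
            # occurs somewhere before the phrase in the same sentence.
            negated = re.search(r"\bno\b", sentence[:pos]) is not None
            if not negated:
                result[name] = True
    return result


# Three sentences taken from the sample report above.
report = (
    "There is an extensive linear pattern tear of the acetabular labrum "
    "extending from the lateral quadrant into the anterior quadrant of the "
    "left hip. No adjacent paralabral cyst. There is a moderate volume hip "
    "effusion but no intra-articular body noted."
)

variables = extract_findings(report)
for name, present in variables.items():
    print(name, present)
```

Each report becomes one row of defined variables that could then be used for stratification. A rule this simple will miss synonyms, severity ("mild", "extensive"), and more complicated negation, which is part of why the problem calls for statistical NLP methods.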