Tools

Lucene
Information retrieval (IR): inverted-index system for Google-like search over a "large corpus of documents". Entity/concept recognition. True NLP.
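A minimal sketch of the inverted-index idea behind this kind of search, in plain Python. This is not Lucene's API; the function names (build_index, search) and the sample documents are made up for illustration, and real engines add ranking, stemming, and on-disk storage.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample corpus
docs = {
    1: "Chest radiograph shows no evidence of pneumonia.",
    2: "Patient presents with pneumonia and fever.",
    3: "Fever resolved after treatment.",
}
index = build_index(docs)
print(sorted(search(index, "pneumonia fever")))
```

The lookup touches only the posting sets of the query terms rather than scanning every document, which is what makes this approach workable on a large corpus.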

NLTK
True natural language understanding: Tagging, Chunking, Named-entity recognition

Stanford NLP tools
Statistical NLP toolkit

NegEx
Detects negated findings in clinical text such as radiology reports (e.g. "no evidence of pneumothorax")
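A rough sketch of the trigger-and-scope idea that NegEx-style negation detection relies on. The trigger list, terminator list, and 5-token window below are illustrative assumptions, not the actual NegEx lexicon or parameters, and is_negated is a hypothetical helper name.

```python
import re

# Illustrative lists only; the real NegEx lexicon is much larger.
PRE_NEGATION = ["no evidence of", "negative for", "no", "without", "denies"]
TERMINATORS = {"but", "however", "although"}
WINDOW = 5  # assumed number of tokens a trigger can reach forward


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence, concept):
    """Return True if the concept falls inside the forward scope of a trigger."""
    tokens = tokenize(sentence)
    target = tokenize(concept)
    if not target:
        return False
    triggers = [tokenize(t) for t in PRE_NEGATION]
    for i in range(len(tokens)):
        for trig in triggers:
            if tokens[i:i + len(trig)] != trig:
                continue
            # Collect the scope: tokens after the trigger, up to the window
            # size or the first terminator word, whichever comes first.
            start = i + len(trig)
            scope = []
            for tok in tokens[start:start + WINDOW]:
                if tok in TERMINATORS:
                    break
                scope.append(tok)
            # The concept must appear as a contiguous run inside the scope.
            for j in range(len(scope) - len(target) + 1):
                if scope[j:j + len(target)] == target:
                    return True
    return False


print(is_negated("No evidence of pneumothorax or pleural effusion.", "pleural effusion"))
print(is_negated("No fracture, but pneumonia is present.", "pneumonia"))
```

The terminator words keep a negation from spilling over into a later clause, which is why the second sentence is not treated as negating "pneumonia".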

MALLET
Java-based statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text

MetaMap
Matches strings in free text to biomedical concepts in UMLS through concept identification, to create structured data for classification/categorization etc.
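A toy sketch of the string-to-concept mapping step, using a greedy longest-match dictionary lookup. This is not how MetaMap works internally (it also handles lexical variants and scores candidates); the lexicon entries and concept ids below are placeholders, not real UMLS CUIs, and extract_concepts is a hypothetical name.

```python
import re

# Toy lexicon: surface string -> placeholder concept id (NOT real UMLS CUIs).
LEXICON = {
    "pneumonia": "CONCEPT_PNEUMONIA",
    "fever": "CONCEPT_FEVER",
    "pyrexia": "CONCEPT_FEVER",
    "shortness of breath": "CONCEPT_DYSPNEA",
    "dyspnea": "CONCEPT_DYSPNEA",
}
# Longest lexicon entry, measured in words
MAX_LEN = max(len(key.split()) for key in LEXICON)


def extract_concepts(text):
    """Return (matched phrase, concept id) pairs, preferring the longest match."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shrink it.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                found.append((phrase, LEXICON[phrase]))
                i += n
                break
        else:
            # No phrase starting at this token matched; move on.
            i += 1
    return found


print(extract_concepts("Patient reports pyrexia and shortness of breath."))
```

Mapping synonyms such as "fever" and "pyrexia" to the same id is what turns free text into structured features that a classifier can use.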

LingPipe
Computational linguistics, e.g. find names of things in news, classify Twitter search results, suggest correct spellings of queries. Seems pretty robust with statistical analysis.


Datasets/data banks

UMLS
Large medical thesaurus and associated lexical software tools

JAMIA special issues on i2b2 challenges
Describe quantitative evaluations of a number of different approaches to specific NLP tasks, such as concept extraction from clinical text.


Books

Natural Language Processing with Python (basics)

Authors: Bird, Steven; Klein, Ewan; Loper, Edward

Amazon summary: This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. With it, you'll learn how to write Python programs that work with large collections of unstructured text. You'll access richly annotated datasets using a comprehensive range of linguistic data structures, and you'll understand the main algorithms for analyzing the content and structure of written communication.

Speech and Language Processing ("Bible")

Authors: Jurafsky, Daniel; Martin, James H.

Amazon summary: An explosion of Web-based language techniques, merging of distinct fields, availability of phone-based dialogue systems, and much more make this an exciting time in speech and language processing. The first of its kind to thoroughly cover language technology – at all levels and with all modern technologies – this book takes an empirical approach to the subject, based on applying statistical and other machine-learning algorithms to large corpora. Builds each chapter around one or more worked examples demonstrating the main idea of the chapter, using the examples to illustrate the relative strengths and weaknesses of various approaches. Adds coverage of statistical sequence labeling, information extraction, question answering and summarization, advanced topics in speech recognition, speech synthesis. Revises coverage of language modeling, formal grammars, statistical parsing, machine translation, and dialog processing. A useful reference for professionals in any of the areas of speech and language processing.