Natural Language Processing with Python
Introduction
This is a book about Natural Language Processing. By “natural language” we mean a
language that is used for everyday communication by humans: languages such as English,
Hindi, or Portuguese. In contrast to artificial languages such as programming languages
and mathematical notations, natural languages have evolved as they pass from
generation to generation, and are hard to pin down with explicit rules. We will take
Natural Language Processing—or NLP for short—in a wide sense to cover any kind of
computer manipulation of natural language. At one extreme, it could be as simple as
counting word frequencies to compare different writing styles. At the other extreme,
NLP involves “understanding” complete human utterances, at least to the extent of
being able to give useful responses to them.
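As a minimal sketch of the simple end of that range, the following counts word frequencies in two short passages using only the Python standard library, not NLTK. The sample sentences, the function name word_frequencies, and the crude regular-expression tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Crude tokenizer (an illustrative assumption): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Two made-up passages standing in for texts by different writers.
sample_a = "The sea was calm. The sea was grey. The boats were still."
sample_b = "Markets rallied as investors weighed the latest inflation data."

freq_a = word_frequencies(sample_a)
freq_b = word_frequencies(sample_b)

# The most frequent words give a first, rough impression of each style.
print(freq_a.most_common(3))
print(freq_b.most_common(3))
```

Comparing real writing styles would need far longer texts and a more careful tokenizer, but the counting step itself stays this simple.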
Technologies based on NLP are becoming increasingly widespread. For example,
phones and handheld computers support predictive text and handwriting recognition;
web search engines give access to information locked up in unstructured text; machine
translation allows us to retrieve texts written in Chinese and read them in Spanish. By
providing more natural human-machine interfaces, and more sophisticated access to
stored information, language processing has come to play a central role in the multilingual
information society.
This book provides a highly accessible introduction to the field of NLP. It can be used
for individual study or as the textbook for a course on natural language processing or
computational linguistics, or as a supplement to courses in artificial intelligence, text
mining, or corpus linguistics. The book is intensely practical, containing hundreds of
fully worked examples and graded exercises.
The book is based on the Python programming language together with an open source
library called the Natural Language Toolkit (NLTK). NLTK includes extensive software,
data, and documentation, all freely downloadable from http://www.nltk.org/.
Distributions are provided for Windows, Macintosh, and Unix platforms. We strongly
encourage you to download Python and NLTK, and try out the examples and exercises
along the way.
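Once both are installed, a quick check along the following lines can confirm the setup. This is a sketch using only the standard library. The helper name is_installed is hypothetical and not part of NLTK.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report the running Python version and whether the nltk package is visible to it.
print("Python version:", sys.version.split()[0])
print("NLTK available:", is_installed("nltk"))
```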
What this book covers
Chapter 1, Language Processing and Python
Chapter 2, Accessing Text Corpora and Lexical Resources
Chapter 3, Processing Raw Text
Chapter 4, Writing Structured Programs
Chapter 5, Categorizing and Tagging Words
Chapter 6, Learning to Classify Text
Chapter 7, Extracting Information from Text
Chapter 8, Analyzing Sentence Structure
Chapter 9, Building Feature-Based Grammars
Chapter 10, Analyzing the Meaning of Sentences
Chapter 11, Managing Linguistic Data