Python Text Processing with NLTK 2.0 Cookbook: Over 80 practical recipes for using Python's NLTK suite of
libraries to maximize your Natural Language Processing
capabilities.
Introduction
Natural Language Processing is used everywhere: in search engines, spell checkers, mobile
phones, computer games, and even in your washing machine. Python's Natural Language
Toolkit (NLTK) suite of libraries has rapidly emerged as one of the most efficient tools for
Natural Language Processing. If you want to employ the best techniques in Natural Language
Processing, this book is your answer.
Python Text Processing with NLTK 2.0 Cookbook is a handy, illustrative guide that walks
you through Natural Language Processing techniques step by step. It demystifies the
advanced features of text analysis and text mining using the comprehensive NLTK suite.
This book cuts short the preamble and lets you dive right into the science of text processing
with a practical hands-on approach.
Start by learning how to tokenize text. Get an overview of WordNet and how
to use it. Learn the basics as well as advanced features of stemming and lemmatization.
Discover various ways to replace words with simpler and more common (read: more searched)
variants. Create your own corpora and learn to create custom corpus readers for data stored
in MongoDB. Use and manipulate POS taggers. Transform and normalize parsed chunks to
produce a canonical form without changing their meaning. Dig into feature extraction and text
classification. Learn how to easily handle huge amounts of data without any loss in efficiency
or speed.
This book will teach you all that and beyond, in a hands-on learn-by-doing manner. Make
yourself an expert in using the NLTK for Natural Language Processing with this handy
companion.
What this book covers
Chapter 1, Tokenizing Text and WordNet Basics, covers the basics of tokenizing text
and using WordNet.
Chapter 2, Replacing and Correcting Words, discusses various word replacement and
correction techniques. The recipes cover the gamut of linguistic compression, spelling
correction, and text normalization.
Chapter 3, Creating Custom Corpora, covers how to use corpus readers and create
custom corpora. At the same time, it explains how to use the existing corpus data that
comes with NLTK.
Chapter 4, Part-of-Speech Tagging, explains the process of converting a sentence,
in the form of a list of words, into a list of tuples, where each tuple pairs a word
with its part-of-speech tag. It also explains how taggers are trained.
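The conversion that Chapter 4 describes, from a list of words to a list of tuples, can be sketched in plain Python. This is a minimal illustration and not NLTK's own tagger API: the function names `train_unigram` and `tag`, the tiny training set, and the fallback tag `NN` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sentences):
    """Learn the most frequent tag for each word (hypothetical helper, not NLTK)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    # Keep only the single most common tag seen for each word.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model, fallback="NN"):
    """Turn a list of words into a list of (word, tag) tuples."""
    return [(word, model.get(word.lower(), fallback)) for word in words]

# Toy training data, invented for illustration only.
training = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
model = train_unigram(training)
tagged = tag(["The", "dog", "sleeps", "soundly"], model)
print(tagged)
```

Words the model never saw in training, such as "soundly" here, receive the fallback tag, which is why a trainable tagger is usually paired with a simpler default.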
Chapter 5, Extracting Chunks, explains the process of extracting short phrases from a
part-of-speech tagged sentence. It uses the Penn Treebank corpus for basic training and
testing of chunk extraction, and the CoNLL 2000 corpus because it has a simpler and more
flexible format that supports multiple chunk types.
Chapter 6, Transforming Chunks and Trees, shows you how to do various transforms on both
chunks and trees. The functions detailed in these recipes modify data, as opposed to learning
from it.
Chapter 7, Text Classification, describes a way to categorize documents or pieces of text.
By examining the word usage in a piece of text, classifiers decide what class label should
be assigned to it.
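The idea of deciding a class label from word usage can be sketched as follows. This is a simplified, Naive Bayes-style scorer written in plain Python under stated assumptions: it is not NLTK's classifier API, it scores only the words present in the text, and the names `bag_of_words`, `train`, and `classify` as well as the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def bag_of_words(words):
    """Represent a text by the set of lowercased words it contains."""
    return {word.lower() for word in words}

def train(labeled_docs):
    """Count, per label, how many documents contain each word."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        feats = bag_of_words(words)
        label_counts[label] += 1
        word_counts[label].update(feats)
        vocab |= feats
    return label_counts, word_counts, vocab

def classify(words, model):
    """Pick the label with the highest log-probability score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    feats = bag_of_words(words) & vocab  # ignore words never seen in training
    best_label, best_score = None, float("-inf")
    for label, n in label_counts.items():
        score = math.log(n / total)  # prior for this label
        for word in feats:
            # Add-one smoothing so unseen (word, label) pairs are not zero.
            score += math.log((word_counts[label][word] + 1) / (n + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labeled documents, invented for illustration only.
docs = [
    ("great fun movie".split(), "pos"),
    ("loved the great acting".split(), "pos"),
    ("boring dull movie".split(), "neg"),
    ("dull plot and boring acting".split(), "neg"),
]
model = train(docs)
print(classify("a great movie".split(), model))
print(classify("boring plot".split(), model))
```

A set-based bag of words records only whether a word occurs, not how often, which keeps the sketch short at the cost of ignoring word frequency.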
Chapter 8, Distributed Processing and Handling Large Datasets, discusses how to use
execnet to do parallel and distributed processing with NLTK. It also explains how to use the
Redis data structure server/database to store frequency distributions.
Chapter 9, Parsing Specific Data, covers parsing specific kinds of data, focusing primarily on
dates, times, and HTML.
Appendix, Penn Treebank Part-of-Speech Tags, lists a table of all the part-of-speech tags that
occur in the treebank corpus distributed with NLTK.
What you need for this book
To try out the code examples in this book, you will need the following software:
• NLTK
• MongoDB
• PyMongo
• Redis
• redis-py
• execnet
• Enchant
• PyEnchant
• PyYAML
• dateutil
• chardet
• BeautifulSoup
• lxml
• SimpleParse
• mxBase
• lockfile
Who this book is for
This book is for Python programmers who want to quickly get to grips with using the
NLTK for Natural Language Processing. Familiarity with basic text processing concepts
is required. Programmers experienced in the NLTK will find it useful. Students of linguistics
will find it invaluable.