Natural Language Toolkit

...software, data sets and tutorials for natural language processing...

Code

 

NLTK includes the following software modules (~120k lines of Python code):

Corpus readers: interfaces to many corpora
Tokenizers: whitespace, newline, blankline, word, wordpunct, treebank, sexpr, regexp, Punkt sentence segmenter
Stemmers: Porter, Lancaster, regexp
Taggers: regexp, n-gram, backoff, Brill, HMM
Parsers: recursive descent, shift-reduce, chunk, chart, feature-based, probabilistic, ...
Semantic interpretation: untyped lambda calculus, first-order models, parser interface
WordNet: WordNet interface, lexical relations, similarity, interactive browser
Classifiers: decision tree, maximum entropy, naive Bayes, Weka interface, megam
Clusterers: expectation maximization, agglomerative, k-means
Evaluation: accuracy, precision, recall, windowdiff
Estimation: uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
Miscellaneous: feature detection, unification, chatbots, many utilities
NLTK-Contrib (less mature): categorial grammar (Lambek, CCG), dependency parser, finite-state automata, glue semantics, hole semantics, Hadoop (MapReduce), kimmo, readability, textual entailment, timex, TnT
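
Several of the estimation methods listed above are closely related: Lidstone estimation adds a constant gamma to every count, and maximum likelihood (gamma = 0), expected likelihood (gamma = 0.5) and Laplace (gamma = 1) are special cases of it. The sketch below illustrates that relationship in plain Python. It is not NLTK's own code: the function name lidstone_estimate, its parameters, and the toy sentence are assumptions made for illustration only.

```python
from collections import Counter


def lidstone_estimate(samples, gamma, bins=None):
    """Return a function mapping a sample to its Lidstone probability.

    Hypothetical helper for illustration; not part of NLTK.
    P(x) = (count(x) + gamma) / (N + gamma * bins)
    where N is the number of observed samples and bins is the number of
    possible sample values (defaults to the number of observed types).
    """
    counts = Counter(samples)
    total = sum(counts.values())
    if bins is None:
        bins = len(counts)
    denominator = total + gamma * bins

    def prob(sample):
        # Counter returns 0 for samples that were never observed.
        return (counts[sample] + gamma) / denominator

    return prob


# Toy data: 6 tokens, 5 distinct word types.
words = "the cat sat on the mat".split()

# gamma = 0 gives the maximum likelihood estimate (unseen words get 0).
mle = lidstone_estimate(words, gamma=0.0)

# gamma = 1 gives Laplace (add-one) smoothing; bins=6 reserves one
# extra bin for a single unseen word type.
laplace = lidstone_estimate(words, gamma=1.0, bins=6)

# gamma = 0.5 gives the expected likelihood estimate.
ele = lidstone_estimate(words, gamma=0.5, bins=6)

if __name__ == "__main__":
    for name, dist in [("MLE", mle), ("Laplace", laplace), ("ELE", ele)]:
        print(name, dist("the"), dist("dog"))
```

With smoothing, an unseen word such as "dog" receives a small non-zero probability, and the probabilities over all six bins still sum to one. The bins argument matters: leaving it at the default assumes no unseen types exist.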

Browse the source code: http://nltk.org/nltk/

Browse the Subversion repository: http://code.google.com/p/nltk/source/browse/trunk/nltk
