ftw. Text Modeller

Overview

The ftw. Text Modeller is software for fitting whole-sentence language models. The code has since been generalized and contributed to the open-source SciPy project.

Routines using the SciPy maxentropy module specifically for modelling text are not available publicly at this time. If you are interested in using them, contact the author, Edward Schofield.

Whom is it for?

It is for engineers and scientists wishing to perform density estimation on high-dimensional continuous or discrete sample spaces. Modelling domains include sentences, acoustic features of speech signals, and images. Application domains include speech recognition, language translation, image annotation, and other examples of statistical pattern recognition.

What does it do?

It implements algorithms for fitting and testing models and provides a high-level interface to them. The framework is maximum entropy under linear moment constraints: among all distributions whose feature expectations match given target values, the one with the greatest entropy is chosen. This framework was espoused by E. T. Jaynes (see here) and adopted for the Candide language translation system of IBM Research in the 1990s (described here).
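
As a rough illustration of maximum entropy under a single linear moment constraint, the sketch below fits a distribution over the faces of a die so that its mean equals 4.5. It uses only the Python standard library and is not the interface of this software or of SciPy. The function names `maxent_distribution` and `fit_maxent`, the target value, and the choice of Newton's method are all assumptions made for the example.

```python
import math


def maxent_distribution(values, lam):
    """Return p(x) proportional to exp(lam * f(x)) for feature values f(x)."""
    weights = [math.exp(lam * v) for v in values]
    z = sum(weights)  # normalization constant Z(lam)
    return [w / z for w in weights]


def fit_maxent(values, target, tol=1e-12, max_iter=100):
    """Find lam such that the model's expected feature value equals target.

    Hypothetical helper: minimizes the dual log Z(lam) - lam * target,
    whose derivative is E_p[f] - target and whose second derivative
    is Var_p[f], using Newton's method.
    """
    lam = 0.0  # lam = 0 gives the uniform distribution
    for _ in range(max_iter):
        probs = maxent_distribution(values, lam)
        mean = sum(p * v for p, v in zip(probs, values))
        var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
        step = (mean - target) / var  # Newton step on the dual
        lam -= step
        if abs(step) < tol:
            break
    return lam, maxent_distribution(values, lam)


# Sample space: the six faces of a die; the feature is the face value itself.
faces = [1, 2, 3, 4, 5, 6]
lam, probs = fit_maxent(faces, target=4.5)
fitted_mean = sum(p * x for p, x in zip(probs, faces))
```

The target must lie strictly between the smallest and largest feature values; otherwise no finite parameter satisfies the constraint. With several features, `lam` becomes a vector and the Newton step uses the covariance matrix of the features in place of the variance.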

More info on the modelling procedure is here.

How do I use it?

See the tutorial here.