TreeTagger installation into TXM tutorial (see note 1)

To be able to automatically lemmatize your corpora during the import process into TXM, follow one of the two tutorials A or B below.

A. TreeTagger installation into TXM 0.8.0 and versions above

Starting from TXM 0.8.0, two extensions dedicated to TreeTagger automatically install the TreeTagger software and the French and English models:

  1. Call the “File > Add an Extension” command
  2. Select the “TreeTagger software” and the “TreeTagger models” lines to install the TreeTagger software and French and English models
  3. Validate the next steps
  4. After TXM has restarted, TreeTagger is ready to be used
  5. To install more language models, follow step 4 and the steps after it in the TreeTagger manual installation (section B.1) below
  6. The End

B. TreeTagger “manual” installation into TXM 0.7.9 and previous versions

This tutorial will guide you to:

  1. Download the TreeTagger software and some language specific model files
  2. Tell the TXM platform where TreeTagger and its model files are installed on your machine

B.1 Download files from the web and prepare them

While connected to the Internet:

  1. Download the TreeTagger software archive from the TreeTagger web site:
  2. Extract the content (bin, cmd, doc, FILES, LICENSE and README) to a folder named “treetagger” located in your applications folder (see note 2). Depending on your system, in:

    Windows:     C:\Programs\treetagger
    Windows XP:  C:\Program Files\treetagger
    Mac OS X:    /Applications/treetagger
    Linux:       /usr/lib/treetagger

    Check: After extraction, the treetagger folder must contain the following files and directories: bin, cmd, doc, FILES and README.

    Note: This way of installing TreeTagger is specific to TXM. You only need to extract the contents of the TreeTagger archive. You don’t need to follow any additional instructions from an INSTALL.txt file that may be present in the archive.

  3. Create a “treetagger-models” folder in your ‘TXM-0.8.1’ folder (see note 3). It will contain all the language specific model files.
  4. Download from the TreeTagger website a language model file (compressed file: ‘*.gz’) for each language in which you may need to tag a text:
  5. Extract the downloaded model archive(s) into the “treetagger-models” folder.
    Under Windows, if you don’t know how to extract ‘*.gz’ files, we recommend using the 7-Zip open-source software.
  6. Rename each model file according to the 2-letter ISO 639-1 language code standard. For instance:
    • ‘french.par’ to ‘fr.par’
    • ‘english.par’ to ‘en.par’
    • etc.
      On Windows and Mac OS X: by default, these systems hide the file extensions they know how to handle. This can be misleading when you rename a file (the name displayed is “fr.par” but the real file name is “fr.par.bin”).
      In that case, you need to display and check the real file names in your Explorer/Finder:
      • Under Windows:
        1. Follow the official tutorial: Show or hide file name extensions
        2. You can now choose the appropriate file name.
      • Under Mac OS X:
        1. Right-click on the file icon (Ctrl-click with the mouse, or two-finger tap on the trackpad)
        2. Select the ‘Get Info’ menu entry
        3. Edit the ‘Name and Extension’ field: delete the ‘.bin’ extension.
        4. Close the “Info” window.

    Check: the ‘treetagger-models’ folder must contain some model files, such as the ‘fr.par’ file (about 18 MB) or the ‘en.par’ file (about 14.4 MB).
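
For readers who prefer to script the extraction and renaming steps above, here is a minimal Python sketch. It is not part of TXM or TreeTagger. It assumes the downloaded archives are named ‘<language>.par.gz’ (for example ‘french.par.gz’), which may not match every file offered on the TreeTagger website. The function name install_models and the ISO_CODES table are hypothetical helpers introduced for illustration only.

```python
import gzip
import shutil
from pathlib import Path

# Model base name -> ISO 639-1 code. Only the two pairs named in this
# tutorial are listed; extend the table for other languages.
ISO_CODES = {"french": "fr", "english": "en"}


def install_models(download_dir, models_dir):
    """Decompress each '<language>.par.gz' archive into '<code>.par'.

    Assumption: archives follow the '<language>.par.gz' naming pattern.
    Returns the list of model file names written to models_dir.
    """
    download_dir, models_dir = Path(download_dir), Path(models_dir)
    models_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for archive in sorted(download_dir.glob("*.par.gz")):
        language = archive.name[: -len(".par.gz")]
        code = ISO_CODES.get(language)
        if code is None:
            continue  # language not in the table: leave the archive alone
        target = models_dir / f"{code}.par"
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        installed.append(target.name)
    return installed
```

A call such as install_models("Downloads", "TXM-0.8.1/treetagger-models") would then write ‘fr.par’ and ‘en.par’ directly, which also avoids the hidden ‘.bin’ extension problem described above. Both paths are examples only.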

B.2 In TXM

1. Set the TreeTagger preferences

  1. Select the ‘Edit / Preferences’ main menu entry
  2. Go to the ‘TXM / Advanced / NLP / TreeTagger’ page (see figure 1)
  3. Set the ‘Path to the install folder’ preference to the ‘treetagger’ folder path
  4. Set the ‘Path to the linguistic models folder’ preference to the ‘treetagger-models’ folder path
  5. Finish with the ‘OK’ button to save the preferences
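
Before entering the two paths in the preferences, it can help to confirm that both folders look as the checks in section B.1 describe. The following Python sketch is one way to do that. It is not a TXM feature, and the function name check_install is a hypothetical helper. The list of required entries is taken from the check in B.1; the test for an extra extension after ‘.par’ is an assumption based on the ‘fr.par.bin’ case mentioned above.

```python
from pathlib import Path

# Entries the tutorial expects inside the 'treetagger' folder.
REQUIRED_ENTRIES = ["bin", "cmd", "doc", "FILES", "README"]


def check_install(treetagger_dir, models_dir):
    """Return a list of human-readable problems (empty if none found)."""
    treetagger_dir, models_dir = Path(treetagger_dir), Path(models_dir)
    problems = []
    for name in REQUIRED_ENTRIES:
        if not (treetagger_dir / name).exists():
            problems.append(f"missing in treetagger folder: {name}")
    if not models_dir.is_dir():
        problems.append(f"models folder not found: {models_dir}")
        return problems
    names = sorted(p.name for p in models_dir.iterdir())
    if not any(n.endswith(".par") for n in names):
        problems.append("no '*.par' model file in models folder")
    for n in names:
        # Catches names such as 'fr.par.bin' left by a hidden extension.
        if ".par." in n:
            problems.append(f"unexpected extra extension: {n}")
    return problems
```

An empty list means both folders match the layout described in this tutorial. It does not prove that TXM will accept them.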

Figure 1: TreeTagger preferences in TXM

2. Check the installation

  1. Copy the following text:

     Running SearchEngine in memory mode.
     Statistical Engine launched.connected.
     Reloading subcorpora and partitions...Done.
     No update available.
  2. In TXM launch the File > Import > Clipboard command

  3. Check in the console that the last lines are:

     pAttrs : [id, lbid, enpos, enlemma]
     sAttrs : [text:+id+path+base+project, s:+n, p:+id, txmcorpus:+lang]
     -- EDITION - Building edition
     Import done:3sec (3265 ms)
     Running SearchEngine in memory mode.
     Statistical Engine launched.connected.
     Reloading subcorpora and partitions...Done.
     TXM is ready.

    (Note that the first line above should contain enpos and enlemma. The time shown after “Import done” may of course differ.)
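
If you save the console output to a text file, the check above can be sketched in a few lines of Python. This is an illustration only. The function name tagger_attributes_present is hypothetical, and the idea that other languages produce attributes named ‘<lang>pos’ and ‘<lang>lemma’ is an assumption extrapolated from the enpos and enlemma names shown above.

```python
import re


def tagger_attributes_present(console_text, lang="en"):
    """Return True if the 'pAttrs' line lists '<lang>pos' and '<lang>lemma'.

    Assumption: the attribute names follow the 'enpos' / 'enlemma'
    pattern shown in this tutorial for other language codes too.
    """
    match = re.search(r"pAttrs\s*:\s*\[([^\]]*)\]", console_text)
    if match is None:
        return False  # no pAttrs line found in the console output
    attrs = {a.strip() for a in match.group(1).split(",")}
    return {f"{lang}pos", f"{lang}lemma"} <= attrs
```

If the function returns False for the console text of the clipboard import, TreeTagger was most likely not applied, and the preferences in step 1 are worth rechecking.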

In case of difficulty you can find further help in the FAQ (in French).

If you can’t manage the installation process, please send your enquiries to the TXM users mailing list (txm-users AT after subscribing to the mailing list, or contact the TXM team by mail.


  1. The TreeTagger licence prohibits distributing TreeTagger embedded in commercial software. As the TXM licence doesn’t prevent anyone from doing business with TXM, we cannot include TreeTagger in the TXM distribution. See the TreeTagger web site.

  2. If you don’t have access rights to create the folder in the applications folder, you can create it in your home folder. 

  3. The ‘TXM-0.8.1’ folder is in your home directory.