TXM XSLT import processing library

This is a collection of XSLT (1.0 or 2.0) stylesheets that can be used to prepare various types of XML documents for import into TXM. When using the XML-TEI Zero + CSV (XTZ) import module, place them in the appropriate xsl/step subfolder. When using the XML/w+CSV (XML/w) import module, use the “Front XSLT” option in the import parameters interface to select the appropriate filter.

Filters are usually named according to the following pattern:

txm-filter-[input format]-[import module](-[option])?

For example, txm-filter-teip5-xmlw-simplify.xsl takes TEI P5 input (teip5), targets the XML/w import module (xmlw) and applies the simplify option.

[To download a file below, right-click its link and choose “Save link target as…”]

Table of Contents

  1. Stylesheets for use with the XML-TEI Zero+CSV (XTZ) import module
    1. 1-split-merge step
    2. 2-front step
    3. 3-posttok step
    4. 4-edition step
  2. Basic stylesheets for filtering XML sources
  3. Basic stylesheets for adapting XML TEI P5 sources
  4. Additional stylesheets for particular corpora

Stylesheets for use with the XML TEI Zero+CSV (XTZ) import module

1-split-merge step

Due to a bug in TXM 0.7.8 and 0.7.9, this processing step does not work properly. The stylesheets listed below must therefore be applied before the import, using the ExecXSL macro or any other XSLT 2.0 processor.

  • txm-rename-files-no-dots.xsl : This stylesheet is designed for the TXM XTZ+CSV import module; it replaces dots with underscores in source file names. (A bug in TXM 0.7.8 prevented files with dots in their names from being imported; this bug was resolved in TXM 0.7.9.)
  • txm-split-teicorpus : This stylesheet may be used to split a single file containing a teiCorpus into individual files, one for each TEI child.
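
The splitting idea can be sketched in XSLT 2.0 as follows. This is a minimal illustration and not the distributed stylesheet: it assumes the source uses the TEI P5 namespace, and the output naming scheme (text-1.xml, text-2.xml, ...) is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: not the distributed txm-split-teicorpus stylesheet. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Assumption: the root element is teiCorpus in the TEI P5 namespace. -->
  <xsl:template match="/tei:teiCorpus">
    <!-- Only direct TEI children are handled; nested teiCorpus elements are ignored. -->
    <xsl:for-each select="tei:TEI">
      <!-- Hypothetical file names: text-1.xml, text-2.xml, ... -->
      <xsl:result-document href="text-{position()}.xml" method="xml" indent="no">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because xsl:result-document is an XSLT 2.0 instruction, a sketch like this needs an XSLT 2.0 processor. Note that it does not carry the corpus-level teiHeader over into the individual files.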

2-front step

  • txm-front-teiHeader2textAtt.xsl : This stylesheet may be customized to extract metadata from teiHeader and create corresponding attributes of the text element.
  • txm-front-teitxm2xmlw.xsl : This stylesheet may be used to import TEI-TXM XML files with the XML-TEI Zero+CSV (or XML/W + CSV) module. This module is more flexible than the XML-TEI TXM module: it allows re-tokenizing the texts, selecting and renaming annotations, and building synoptic editions.
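
As an illustration of the kind of customization meant for txm-front-teiHeader2textAtt.xsl, the sketch below copies the document unchanged and adds one metadata attribute to the text element. It is not the distributed stylesheet: the attribute name title and the XPath into the teiHeader are assumptions chosen for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: attribute name and header path are illustrative assumptions. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Identity rule: copy everything that no other template handles. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Add a title attribute to text, taken from the first title in titleStmt. -->
  <xsl:template match="tei:text">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="title">
        <xsl:value-of select="normalize-space(
          ancestor::tei:TEI/tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title[1])"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Further metadata could be projected the same way by adding one xsl:attribute instruction per field, placed before the child nodes are copied.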

3-posttok step

  • txm-posttok-addRef.xsl : This stylesheet may be customized to add a ref attribute to w elements which will be used as a default reference in TXM concordances.
  • txm-posttok-unbreakWords.xsl : This stylesheet may be customized to re-unite words that were broken in the primary tokenization process (due to line or page breaks, for instance).
  • txm-posttok-structure2wordAtt.xsl : This stylesheet projects the number of nesting selected ancestor elements onto attributes of the w element. Enter element names separated by | as the value of the elementsToProject parameter.
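
The idea behind txm-posttok-addRef.xsl can be sketched as an identity transform with one extra rule for w elements. This is a hedged example, not the distributed stylesheet: the reference format, the use of an id attribute on text and of an n attribute on pb are all assumptions. Elements are matched by local name so that the sketch does not depend on the namespace of the tokenized files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the reference format below is a hypothetical example. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy everything that no other template handles. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Add a ref attribute to every w element, whatever its namespace. -->
  <xsl:template match="*[local-name()='w']">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="ref">
        <!-- Assumption: the enclosing text element carries an id attribute. -->
        <xsl:value-of select="ancestor::*[local-name()='text'][1]/@id"/>
        <xsl:text>, p. </xsl:text>
        <!-- Assumption: page breaks are encoded as pb elements with an n attribute. -->
        <xsl:value-of select="preceding::*[local-name()='pb'][1]/@n"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With these assumptions, a word on page 12 of a text whose id is roland would receive ref="roland, p. 12".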

4-edition step

  • 1-default-html.xsl : This is an alternative stylesheet for creating default editions with the XTZ module. It transforms every TEI element into an HTML span with @class. This stylesheet must be used in conjunction with 2-default-pager.xsl.
  • 2-default-pager.xsl : This stylesheet must be used in conjunction with 1-default-html.xsl to create edition pages.
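
The element-to-span mapping described for 1-default-html.xsl can be sketched in a few lines. This is a simplified illustration and not the distributed stylesheet: the HTML wrapper and page title are assumptions, and the pagination performed by the second stylesheet is not shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a simplified illustration of the span-with-class mapping. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Hypothetical HTML wrapper around the converted content. -->
  <xsl:template match="/">
    <html>
      <head><title>Edition</title></head>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>

  <!-- Every source element becomes a span whose class is the element name. -->
  <xsl:template match="*">
    <span class="{local-name()}">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

Under this mapping, an element such as p yields a span with class="p", so the rendering of each TEI element can be controlled from CSS.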

Basic stylesheets for filtering XML sources

  • filter-keep-only-select.xsl : This stylesheet may be customized to filter out all the text and tags except the content of the specified element (select by default) and its ancestors.
  • filter-out-p.xsl : This stylesheet may be customized to filter out any particular xml element (p by default) and its content from the source document.
  • filter-out-sp.xsl : This stylesheet may be customized to filter out any particular xml element with a specific attribute value (sp with an attribute who with the value ‘enqueteur’ by default) and its content from the source document.
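
The two filter-out behaviours described above follow a common XSLT pattern: an identity rule plus an empty template for whatever should disappear. The sketch below combines both defaults in a single stylesheet for illustration; it is not the distributed code and assumes the source elements are in no namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes source elements are not in a namespace. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy everything that no other template handles. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drop every p element together with its content. -->
  <xsl:template match="p"/>

  <!-- Empty template: drop sp elements whose who attribute is 'enqueteur'. -->
  <xsl:template match="sp[@who='enqueteur']"/>

</xsl:stylesheet>
```

To filter a different element or attribute value, only the match patterns of the empty templates need to change.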

Basic stylesheets for adapting XML TEI P5 sources

  • txm-filter-teip5-teibfm.xsl : This stylesheet may be customized to prepare any TEI P5 source for the TEI BFM import module. Note that this module is experimental and may fail on documents that do not follow the BFM encoding guidelines.
  • txm-filter-teip5-xmlw-preserve.xsl : This stylesheet may be customized to prepare any TEI P5 source for the XML/w import module. By default, it removes the teiHeader and facsimile elements with their contents and preserves all other elements.
  • txm-filter-teip5-xmlw-simplify.xsl : This stylesheet may be customized to prepare any TEI P5 source for the XML/w import module. By default, it removes the teiHeader, facsimile and all note elements with their contents, and strips all tags in the text body except ab, body, div, front, lb, p, pb, s, TEI, text and w.
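
The default behaviour described for the simplify filter can be approximated with three kinds of rules: copy the allowed elements, drop some elements with their content, and unwrap everything else. The sketch below is an approximation under the assumption that the source uses the TEI P5 namespace; it is not the distributed stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: approximates the documented defaults of the simplify filter. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Keep the allowed elements, with their attributes. -->
  <xsl:template match="tei:ab|tei:body|tei:div|tei:front|tei:lb|tei:p|
                       tei:pb|tei:s|tei:TEI|tei:text|tei:w">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Attributes of kept elements are copied as they are. -->
  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>

  <!-- Drop these elements together with their content. -->
  <xsl:template match="tei:teiHeader|tei:facsimile|tei:note"/>

  <!-- Any other element: remove the tags but keep processing the content. -->
  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes are passed through by the built-in XSLT rules, so the text of an unwrapped element such as hi stays in place while its tags disappear. Comments and processing instructions are dropped by those same built-in rules.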

Additional stylesheets for particular corpora

Please address any enquiries about the TXM XSLT library to textometrie AT groupes.renater.fr