TXM XSLT import processing library
This is a collection of XSLT (1.0 or 2.0) stylesheets that can be used to prepare various types of XML documents for import into TXM. Place them in the appropriate xsl/step subfolder when using the XML-TEI Zero + CSV (XTZ) import module, or use the “Front XSLT” option in the import parameters interface to select the appropriate filter in the XML/w+CSV (XML/w) import module.
Filters are usually named according to the following pattern:
txm-filter-[input format]-[import module](-[option])?
[To download a file below, right-click its link and choose “Save link target as…”]
Table of Contents
- Stylesheets for use with the XML TEI Zero+CSV (XTZ) import module
- Basic stylesheets for filtering XML sources
- Basic stylesheets for adapting XML TEI P5 sources
- Additional stylesheets for particular corpora
Stylesheets for use with the XML TEI Zero+CSV (XTZ) import module
1-split-merge step
Due to a bug in TXM 0.7.8 and 0.7.9, this processing step does not work properly. The stylesheets mentioned below must be applied prior to the import, using the ExecXSL macro or any other XSLT 2.0 processor.
- txm-rename-files-no-dots.xsl : This stylesheet is designed for the TXM XTZ+CSV import module to replace dots with underscores in source file names. (A bug in TXM 0.7.8 prevented files containing dots in their names from being imported; this bug has been resolved in TXM 0.7.9.)
- txm-split-teicorpus : This stylesheet may be used to split a single file containing a teiCorpus into individual files for each TEI child.
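As an illustration of the splitting idea, here is a minimal XSLT 2.0 sketch. It is not the code of the library stylesheet: the `output-directory` parameter and the file naming rule are assumptions made for the example, and it supposes that the source uses the TEI P5 namespace. The teiHeader of the teiCorpus itself is not carried over to the individual files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: splits a teiCorpus into one file per TEI child. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Hypothetical parameter: folder (relative to the main output) for the split files -->
  <xsl:param name="output-directory" select="'split'"/>

  <xsl:template match="/tei:teiCorpus">
    <xsl:for-each select="tei:TEI">
      <!-- Assumed naming rule: use @xml:id when present, otherwise a running number -->
      <xsl:variable name="name"
          select="if (@xml:id) then string(@xml:id) else concat('text', position())"/>
      <xsl:result-document href="{$output-directory}/{$name}.xml" method="xml" indent="no">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because it relies on `xsl:result-document`, a sketch like this needs an XSLT 2.0 processor.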
2-front step
- txm-front-teiHeader2textAtt.xsl : This stylesheet may be customized to extract metadata from teiHeader and create corresponding attributes of the text element.
- txm-front-teitxm2xmlw.xsl : This stylesheet may be used to import TEI-TXM XML files with the XML-TEI Zero+CSV (or XML/W + CSV) module. This module is more flexible than the XML-TEI TXM module: it allows re-tokenizing the texts, selecting and renaming annotations, and building synoptic editions.
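The metadata extraction performed at the front step can be pictured with the following XSLT 1.0 sketch. It is an assumption-laden example rather than the content of txm-front-teiHeader2textAtt.xsl: the attribute name `title` and the path to the title in the teiHeader are chosen for illustration only, and the source is assumed to use the TEI P5 namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: copies the document and adds one metadata attribute to text. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Identity template: copy everything that no other template handles -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Add a title attribute (assumed name) taken from the teiHeader -->
  <xsl:template match="tei:TEI/tei:text">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="title">
        <xsl:value-of select="normalize-space(ancestor::tei:TEI/tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title[1])"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Further attributes (author, date, and so on) could be added in the same way, one `xsl:attribute` per metadata field, before the child nodes are processed.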
3-posttok step
- txm-posttok-addRef.xsl : This stylesheet may be customized to add a ref attribute to w elements which will be used as a default reference in TXM concordances.
- txm-posttok-unbreakWords.xsl : This stylesheet may be customized to re-unite words broken in the primary tokenization process (due to line or page breaks, for instance).
- txm-posttok-structure2wordAtt.xsl : This stylesheet projects the number of selected ancestor elements in which each word is nested onto attributes of the w element. Enter the element names, separated by |, as the value of the elementsToProject parameter.
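To give an idea of what a post-tokenization customization can look like, the XSLT 1.0 sketch below adds a ref attribute to every w element. It is not the code of txm-posttok-addRef.xsl. The reference format (text identifier followed by a page number) is an assumption, as are the `id` attribute on text and the `n` attribute on pb. Elements are matched by local name so that the sketch does not depend on the namespace used in the tokenized files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: adds a ref attribute to each w element. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that no other template handles -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Build ref from the enclosing text (assumed @id) and the nearest preceding pb (assumed @n) -->
  <xsl:template match="*[local-name()='w']">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="ref">
        <xsl:value-of select="ancestor::*[local-name()='text'][1]/@id"/>
        <xsl:text>, p. </xsl:text>
        <xsl:value-of select="preceding::*[local-name()='pb'][1]/@n"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```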
4-edition step
- 1-default-html.xsl : This is an alternative stylesheet for creating default editions with the XTZ module. It transforms every TEI element into an HTML span with @class. This stylesheet must be used in conjunction with 2-default-pager.xsl.
- 2-default-pager.xsl : This stylesheet must be used in conjunction with 1-default-html.xsl to create edition pages.
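The principle of turning every TEI element into an HTML span with a class can be sketched as follows in XSLT 1.0. This is a simplified illustration, not the actual 1-default-html.xsl: the page skeleton and the fixed title are assumptions, and no pagination is attempted since that is the job of the second stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: wraps each TEI element of the text in a span carrying its name as class. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Assumed minimal page skeleton; the teiHeader is not rendered -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Edition</title>
      </head>
      <body>
        <xsl:apply-templates select="/tei:TEI/tei:text"/>
      </body>
    </html>
  </xsl:template>

  <!-- Every TEI element becomes a span whose class is the element name -->
  <xsl:template match="tei:*">
    <span class="{local-name()}">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

With output of this kind, the rendering of each element can be controlled from CSS through selectors such as `span.p` or `span.head`.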
Basic stylesheets for filtering XML sources
- filter-keep-only-select.xsl : This stylesheet may be customized to filter out all the text and tags except the content of the specified element (select by default) and its ancestors.
- filter-out-p.xsl : This stylesheet may be customized to filter out any particular XML element (p by default) and its content from the source document.
- filter-out-sp.xsl : This stylesheet may be customized to filter out any particular XML element with a specific attribute value (by default, sp with a who attribute whose value is ‘enqueteur’) and its content from the source document.
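The filtering-out technique usually amounts to an identity transform combined with empty templates. The XSLT 1.0 sketch below shows the pattern for the two default cases described above. It is an illustration rather than the code of the library files, and it assumes that the source elements are in no namespace (a namespace prefix would have to be added to the match patterns for TEI P5 sources).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: removes selected elements and their content. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that no other template handles -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: every p element is dropped together with its content -->
  <xsl:template match="p"/>

  <!-- Empty template: only sp elements whose who attribute equals 'enqueteur' are dropped -->
  <xsl:template match="sp[@who='enqueteur']"/>

</xsl:stylesheet>
```

Customizing such a filter then comes down to editing the match patterns of the empty templates.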
Basic stylesheets for adapting XML TEI P5 sources
- txm-filter-teip5-teibfm.xsl : This stylesheet may be customized for use with any TEI P5 source in the TEI BFM import module. Note that this module is experimental and may fail on documents that do not follow the BFM encoding guidelines.
- txm-filter-teip5-xmlw-preserve.xsl : This stylesheet may be customized for use with any TEI P5 source in the XML/w import module. By default, it eliminates the teiHeader and facsimile elements and their contents and preserves all other elements.
- txm-filter-teip5-xmlw-simplify.xsl : This stylesheet may be customized for use with any TEI P5 source in the XML/w import module. By default, it eliminates the teiHeader, facsimile and all note elements and their contents, and filters out all tags in the text body except ab, body, div, front, lb, p, pb, s, TEI, text and w.
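The simplifying behaviour described above can be approximated with three kinds of templates: one that drops elements, one that keeps elements, and one that removes tags while keeping their content. The XSLT 1.0 sketch below is an interpretation of that description, not the code of txm-filter-teip5-xmlw-simplify.xsl. In particular, it assumes that "filtering out a tag" means unwrapping the element and keeping its text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: drops some elements, keeps a whitelist, unwraps the rest. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Dropped with their content -->
  <xsl:template match="tei:teiHeader | tei:facsimile | tei:note"/>

  <!-- Whitelisted elements are copied with their attributes -->
  <xsl:template match="tei:ab | tei:body | tei:div | tei:front | tei:lb | tei:p
                       | tei:pb | tei:s | tei:TEI | tei:text | tei:w">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any other TEI element is unwrapped: its tags disappear, its content is processed -->
  <xsl:template match="tei:*">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Attributes and text nodes are copied as they are -->
  <xsl:template match="@*|text()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

The "preserve" variant would correspond to keeping only the first template (without tei:note) and replacing the others with a plain identity template.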
Additional stylesheets for particular corpora
- p4top5_perseus.xsl : This stylesheet is needed to convert Perseus TEI P4 files to TEI P5 prior to any import process.
- txm-edition-page-split.xsl : This stylesheet should be used to create separate HTML pages for TXM editions.
- txm-edition-xmltxm-textgrid.xsl : This stylesheet should be used to customize TXM editions of DARIAH-DE Textgrid texts.
- txm-edition-xtz-corpusakkadien-translit.xsl : This stylesheet should be used to customize transliterated TXM editions of cuneiform Akkadian tablets. See the project wiki for more details.
- txm-edition-xtz-cuneiform.xsl : This stylesheet should be used to create cuneiform TXM editions of Akkadian tablets. See the project wiki for more details.
- txm-filter-corpusakkadien-xmlw_syllabes-cuneiform.xsl : This stylesheet should be used on a corpus of Akkadian tablets with the XML/w import module. See the project wiki for more details.
- txm-filter-perseustreebank-xmlw.xsl : This filter should be used on the Perseus Treebank corpus texts with the XML/w import module.
- txm-filter-qgraal_cm-xmlw.xsl : This stylesheet should be used on the diffracted format of the Queste del Saint Graal source files with the XML/w import module.
- txm-filter-rnc-xmlw.xsl : This filter should be used on the Russian National Corpus texts with the XML/w import module.
- txm-filter-teibrown-xmlw.xsl : This filter should be used on the TEI Brown corpus texts with the XML/w import module.
- txm-filter-teibvh-xmlw.xsl : This filter should be used on the TEI BVH texts with the XML/w import module.
- txm-filter-teibvh-xmlw-posttok.xsl : This stylesheet should be used to fix tokenization errors and to adjust word properties in the tokenized version of TEI BVH texts.
- txm-filter-teicorpustextgrid-xmlw.xsl : This stylesheet should be used to prepare DARIAH-DE TEIcorpus XML files for the TXM XML/w import process.
- txm-filter-teifrantext-teibfm.xsl : This filter should be used on TEI Frantext texts with the TEI BFM import module. It is automatically applied in the TEI Frantext import module. Note that this module is experimental and may fail on documents that do not follow the BFM encoding guidelines.
- txm-filter-teifrantext-xmlw.xsl : This stylesheet should be used on TEI Frantext texts with the XML/w import module.
- txm-filter-teiperseus-xmlw.xsl : This filter should be used on the TEI Perseus corpus texts with the XML/w import module (after conversion to TEI P5).
- txm-filter-teitextgrid-xmlw-posttok.xsl : This stylesheet should be used to adjust word properties in the tokenized version of DARIAH-DE Textgrid texts.
- txm-front-idsHeader2textAtt.xsl : This stylesheet may be used to project metadata from idsHeader (Mannheim German Language Institute corpus, IDS-XCES schema) to text attributes.
- txm-split-xces-ids-corpus2text.xsl : This stylesheet splits a single file of an XCES-IDS corpus (Mannheim German Language Institute corpus) into one file per text for the XTZ import module.
Please address any enquiries about the TXM XSLT library to textometrie AT groupes.renater.fr