A Basic Guide to Text Encoding

Encoding: getting started

1. The text selected for encoding must first be transferred to a text editor. Typing the text manually and scanning it with OCR software are the most common methods of transfer. Maintaining the format of the original text and noting its typographical characteristics are helpful when placing the TEI tags. Tags may be used to indicate paragraph and line breaks, pagination, and major divisions of the text such as chapter or section headings. In addition, tags may be placed around typographical characteristics such as underlined or italicized text, hyphenation, special characters such as the ampersand or dollar sign, and alternate spellings and misspellings.
2. Displaying a TEI-encoded document requires a style sheet or other conversion program. The University of Virginia states in its "Guidelines for SGML Text Mark-up at the Electronic Text Center":
SGML texts are not, of course, designed to be read "in the raw". Ideally, one uses them through software tools that interpret the tags as database "fields" while searching and as a set of typographical layout instructions while displaying the results.
3. The UNL Libraries Electronic Text Center uses Text Encoding Initiative (TEI) tag sets and rules, an application of Extensible Markup Language (XML), to encode texts. TEI tags describe the structural divisions and characteristics of a given text. As stated in the University of Virginia's "Guidelines for SGML Text Mark-up at the Electronic Text Center":
By recording the structure of a text, such tags allow one to use an SGML [or XML] search program to constrain searches to particular elements: one cannot limit a search to a single chapter in a novel if there are no markers in the text for chapter divisions; one cannot view a quotation from a play in the context of a scene if the scenes are not delimited.
4. Additional characteristics to note about TEI encoding include the following:
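1. Every element has an opening tag and a closing tag; an empty element, such as a line break, may combine the two in a single self-closing tag.
2. Tag names are case sensitive, and elements must be nested properly.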
3. Attributes may be used to further define the tags.
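The three characteristics above can be checked mechanically with any XML parser. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser; the sample sentences and the helper name is_well_formed are invented for illustration. It tests only whether a fragment is well-formed XML, not whether it is valid against a TEI DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Matching case, proper nesting, and an attribute that further defines <hi>.
good = '<p>She read <hi rend="italic">Walden</hi> twice.</p>'
# The closing tag differs in case from the opening tag.
bad_case = '<p>She read it twice.</P>'
# <hi> is opened inside <p> but closed after it.
bad_nesting = '<p>She read <hi rend="italic">Walden</p></hi>'

results = {
    "good": is_well_formed(good),
    "bad_case": is_well_formed(bad_case),
    "bad_nesting": is_well_formed(bad_nesting),
}

# Attributes are read back from the parsed element by name.
rend = ET.fromstring(good).find("hi").get("rend")

print(results)
print(rend)
```

Only the first fragment parses; the other two are rejected by the parser before any TEI-specific rule is consulted.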
5. TEI Templates
Almost all TEI documents possess a basic set of tags:
- TEI Header
- Front Matter
- Body
- Back Matter
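
To make the four parts listed above concrete, here is a hedged sketch of a bare skeleton, checked with Python's standard-library XML parser. It assumes the TEI P4 convention of a <TEI.2> root element containing <teiHeader> and <text>, with <front>, <body>, and <back> inside <text>; later TEI versions use a namespaced <TEI> root instead. The titles and sample sentences are invented, and the skeleton is not validated against a DTD, so it should not be taken as one of the Center's own templates.

```python
import xml.etree.ElementTree as ET

# Illustrative skeleton only: all content below is made up for this example.
TEI_SKELETON = """\
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample text (hypothetical)</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Invented for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <titlePage><docTitle><titlePart>Sample</titlePart></docTitle></titlePage>
    </front>
    <body>
      <div type="chapter">
        <head>Chapter 1</head>
        <pb n="1"/>
        <p>The <hi rend="italic">first</hi> line<lb/>and the second.</p>
      </div>
    </body>
    <back>
      <div type="notes"><p>Editorial notes.</p></div>
    </back>
  </text>
</TEI.2>
"""

# Parsing succeeds only if the skeleton is well-formed XML.
root = ET.fromstring(TEI_SKELETON)

# The header and the text are the two children of the root element.
top_level = [child.tag for child in root]
# Front matter, body, and back matter sit inside <text>, in that order.
text_parts = [child.tag for child in root.find("text")]

print(top_level)
print(text_parts)
```

The body of the sketch also shows a few of the tags mentioned in step 1: a chapter division with a heading, a page break, a line break, and italicized text marked with an attribute.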

Because the presence of these tags is so consistent, the Electronic Text Center makes available a number of templates and examples to help you get started generating them:

1. Example of a simple TEI Letter Template, and a useful parsing tool

2. Example of a TEI Postcard Template

3. Example of a NoteTab Light Clipbook Library to Automatically Generate a TEI Postcard Template

4. Example of a NoteTab Light Clipbook Library to Automatically Generate a TEI Letter Template

5. Example of a DTD Template

6. Example of an EAD Finding Aid Template

7. Example of an XSLT Stylesheet Template for an EAD Finding Aid.

What makes a TEI document unique is the tagging specific to what is being encoded. Although some tags will be the same no matter what source is being encoded, other tags are particular to a genre such as drama, poetry, or prose. Templates are empty TEI documents that contain the basic tag sets; the genre-specific tag sets are what distinguish one template from another. Available templates may be used as they are or customized to fit the goals of the project in question.
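
As an illustration of genre-specific tagging, the sketch below encodes a tiny invented play with drama tags (<sp> for a speech, <speaker>, and <l> for a verse line) inside scene divisions, then limits a word search to one scene, the kind of constrained search the University of Virginia guidelines describe. It is written in Python with the standard-library xml.etree.ElementTree module; the play text, the function name lines_containing, and the use of the n attribute to number scenes are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Invented two-scene fragment using TEI drama elements.
PLAY = """\
<body>
  <div type="scene" n="1">
    <sp><speaker>Anna</speaker><l>The storm is near.</l></sp>
  </div>
  <div type="scene" n="2">
    <sp><speaker>Ben</speaker><l>No storm will stop us.</l></sp>
    <sp><speaker>Anna</speaker><l>Then we sail at dawn.</l></sp>
  </div>
</body>
"""


def lines_containing(root, word, scene=None):
    """Return (scene, speaker, line) tuples for lines containing `word`.

    If `scene` is given, only the division with that n attribute is searched.
    """
    hits = []
    for div in root.findall("div[@type='scene']"):
        if scene is not None and div.get("n") != scene:
            continue
        for sp in div.findall("sp"):
            speaker = sp.findtext("speaker")
            for line in sp.findall("l"):
                if word.lower() in (line.text or "").lower():
                    hits.append((div.get("n"), speaker, line.text))
    return hits


root = ET.fromstring(PLAY)
everywhere = lines_containing(root, "storm")
scene_two = lines_containing(root, "storm", scene="2")

print(everywhere)
print(scene_two)
```

Because the scene boundaries are marked in the text, the second search can ignore the match in scene 1 and report the speaker along with the line. Without the <div> and <sp> tags, neither restriction would be possible.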


Please send comments and questions about this page to etcenter@unlnotes.unl.edu.
© 2003-2005 Copyright of the University of Nebraska Board of Regents.