Text Encoding November 2020

Gabriel Bodard edited this page Nov 27, 2020 · 21 revisions

Text Encoding for Ancient and Modern Literature, Languages and History

November 23–30, 2020

Tutors: Jonathan Blaney, Gabriel Bodard, Christopher Ohge

Text encoding
  • This online workshop will use asynchronous teaching to introduce participants to a range of practices and issues in text encoding and annotation. We will work hands-on with HTML and Markdown or “Wikitext” encoding, and the Recogito annotation platform, and discuss the theory and context of text encoding in academic research and editing. We will mention XML and TEI but not work hands-on with this more complex form of text encoding.

Thank you for registering for the Text Encoding for Ancient and Modern Literature, Languages and History workshop jointly run by the ICS, IES, IHR and IMLR at the University of London. This training will be offered over two separate short meetings at 14:00 GMT on Monday, and at 12:00 GMT on Friday. It is essential that you commit to attending both sessions, and to doing at least a couple of hours of preparation and practice between the sessions; there will also be group work and discussion.

In preparation for the first session on Monday, please install a text editor:

  • Default: download and install the Atom text editor (https://atom.io/). Then click on Atom > Preferences > +Install and type 'markdown-preview-enhanced' in the Search packages box.
  • If you have any problems with Atom, try Visual Studio Code instead (download and install from https://code.visualstudio.com/Download), or use the web-based Markdown editing app, Dillinger (https://dillinger.io/).

For the Friday session you will also want to have created an account on the Recogito website (https://recogito.pelagios.org/).

We will email you the link to the YouTube channel on which the sessions will take place. You will also be provided with some online tutorials (video or text) to view after the first session, some optional readings, and details of an exercise to complete before the Friday session. For feedback and discussion, we will use an online discussion forum, to which you will all be invited. You may unsubscribe from the forum at the end of the workshop or at any time.

Session 1: Markdown and HTML

Videos and Tutorials

Exercise

  1. Download the page of Alexander Pope's Dunciad here.
  2. Confer with your group on the document: discuss some of the challenges, what you would like to represent, and how.
  3. Either individually or in consultation with the whole group, encode the page in Markdown in the Atom text editor. (Hint: start with transcription and basic text structure first.)
  4. Preview the encoded page in the Atom text editor.
  5. Open the HTML rendering of the file in Atom.
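
To make the relationship between the Markdown you type and the HTML you preview more concrete, here is a minimal Python sketch of how a few Markdown constructs map onto HTML elements. It handles only headings, bold, and italics, and treats every non-blank line as its own paragraph. The function name `tiny_markdown_to_html` and the sample text are illustrative assumptions; this is not how the Atom preview package is implemented, and real Markdown converters cover far more cases.

```python
import html
import re


def tiny_markdown_to_html(text: str) -> str:
    """Convert a very small subset of Markdown to HTML (illustrative only)."""
    out = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate blocks in this sketch
        # Escape &, < and > so literal characters are not read as HTML tags.
        escaped = html.escape(line.strip(), quote=False)
        # Bold is handled before italics, because ** contains *.
        escaped = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
        escaped = re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)
        heading = re.match(r"(#{1,6})\s+(.*)", escaped)
        if heading:
            level = len(heading.group(1))  # number of # signs = heading level
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            # Simplification: each remaining line becomes its own paragraph.
            out.append(f"<p>{escaped}</p>")
    return "\n".join(out)


# Made-up sample text, not a transcription of the exercise page.
sample = "# A Title\n\nSome *emphasised* and **strong** words"
print(tiny_markdown_to_html(sample))
```

Comparing the input and output side by side may help when discussing what each format records about the text: Markdown's `*...*` becomes `<em>`, which says something is emphasised but not why (a title, a foreign word, a stressed syllable).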

Discussion questions

Using the GitHub Issues forum, please discuss the following questions with the other participants and instructors.

  1. Is Markdown or HTML better suited to encoding this example? (Is there any difference?)
  2. What might we want to be able to encode in this text that Markdown/HTML doesn’t allow?
  3. What features might you want to add that are only possible in the digital medium?
  4. Do you think you have to sacrifice display for semantics? (Or vice versa?)
  5. Who is the imagined audience of your web page? Does that affect the decisions you made?
  • If you need any additional technical help with this exercise, you may ask in this forum ticket.

Session 2: Annotation

Videos and Tutorials

Exercise

  1. We have prepared three texts, and assigned one to each group. Discuss with your group in advance how you are going to approach annotation.
    1. Document 1: Anabasis RAW (Groups 1, 4, 7, 10)
    2. Document 2: Anabasis NER (Groups 2, 5, 8, 11)
    3. Document 3: Al-Idrisi’s Tabula Rogeriana (Groups 3, 6, 9)
  2. Please annotate as many places and placenames as you can, and try to georesolve them with reference to an appropriate gazetteer.
  3. Think about some other feature you would like to annotate, free-form, using some combination of tags, keywords, or URIs in the Recogito pop-up. What information are you adding? What are you losing?
  4. Look at the internal map visualisation. Do you see any problems? Anything you can fix?
  5. Export your annotations as a CSV file and open it in Excel or another spreadsheet application. What information is in this file? Is there anything you cannot identify? What is not in there?
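
If you would rather inspect the exported CSV programmatically than in a spreadsheet, the following Python sketch lists the column names, counts the rows, and counts empty cells per column, which is one way to spot annotations that were never georesolved. The function name `summarise_annotations` and the sample data are assumptions made for illustration; the column names shown are placeholders and may not match those in a real Recogito export.

```python
import csv
import io
from collections import Counter


def summarise_annotations(csv_text: str) -> dict:
    """Report column names, row count, and empty cells per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    empty = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for column, value in row.items():
            # column is None only for surplus cells beyond the header; skip those.
            if column is not None and not (value or "").strip():
                empty[column] += 1
    return {
        "columns": reader.fieldnames or [],
        "rows": rows,
        "empty_cells": dict(empty),
    }


# Made-up sample data: these column names are placeholders, not
# necessarily those of a real Recogito export.
sample = (
    "QUOTE,TYPE,URI\n"
    "Sardis,PLACE,https://example.org/place/1\n"
    "the river,PLACE,\n"
)
summary = summarise_annotations(sample)
print(summary["columns"])
print(summary["rows"])
print(summary["empty_cells"])
```

To try this on your own export, read the file into a string first, for example with `open(path, newline="", encoding="utf-8").read()`, where `path` is wherever you saved the download.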

Discussion questions

  1. How did the simplicity of the tag-free interface in Recogito compare with the transparency of adding Markdown tags directly in the text?
  2. What do you want to do with the annotations? In what format? For what audience? How does that change the decisions you made about what/how to annotate?
  3. Did you learn anything from visualising the data—either the process or the result—that you would not have learned from studying the original document in a conventional way?
  4. A different person will annotate these documents differently, perhaps because of different interests, knowledge of context, or approach to annotation. Does that matter?
  5. How much did the automated NER help/hinder the process of annotating your text?
  6. Were there things you wanted to annotate that Recogito did not enable?

Session 3: Further resources