
Text Encoding June 2020

Christopher Ohge edited this page Jun 24, 2020 · 39 revisions

Text Encoding for Ancient and Modern Literature, Languages and History

June 2, 5 & 8, 2020

Text encoding

Tutors: Jonathan Blaney, Gabriel Bodard, Christopher Ohge, Naomi Wells

  • This online workshop will use a mix of real-time and asynchronous teaching to introduce participants to a range of practices and issues in text encoding and annotation. We will work hands-on with HTML and Markdown or “Wikitext” encoding, and the Recogito annotation platform, and discuss the theory and context of text encoding in academic research and editing. We will mention XML and TEI but not work hands-on with this more complex form of text encoding.

Thank you for registering for the Text Encoding for Ancient and Modern Literature, Languages and History workshop jointly run by the ICS, IES, IHR and IMLR at the University of London. This training will be offered over three separate short meetings at 3pm (UK time), on Tuesday and Friday next week and the following Monday. It is essential that you commit to attending all three sessions, and to doing at least a couple of hours of preparation and practice between the sessions; there will also be group work and discussion.

In preparation for the first session on Tuesday, please install a text editor:

  • Mac users: download and install the Atom text editor (https://atom.io/). Then click on Atom > Preferences > +Install, type 'markdown-preview-enhanced' in the Search packages box, and install the package.

  • Windows: download and install Visual Studio Code (https://code.visualstudio.com/Download).

Alternatively, you can use the web-based Markdown editing app, Dillinger (https://dillinger.io/).

For the Friday session you will also want to have created an account on the Recogito website (https://recogito.pelagios.org/).

We will provide you by email with the link and password to the Zoom channel in which the sessions will take place. You will also be provided with some online tutorials (video or text) to view after the first session, some optional readings, and details of an exercise to complete before the Friday session. For the feedback and discussion, we will use an online discussion forum, to which you will all be invited. You may unsubscribe from the forum at the end of the workshop or at any time.

Session 1: Markdown and HTML

Videos and Tutorials

Exercise

  1. Download the page of Alexander Pope's Dunciad here.
  2. Confer with your partner on the document: discuss some of the challenges, what you would like to represent, and how.
  3. Either individually or in consultation with your partner, encode the page in Markdown in the Atom text editor. (Hint: start with transcription and basic text structure first.)
  4. Preview the encoded page in the Atom text editor.
  5. Open the HTML rendering of the file in Atom.
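
To make the relationship between the Markdown you type and the HTML that the preview renders more concrete, here is a minimal sketch in Python. It is not how Atom or any real Markdown processor works internally; it handles only headings, paragraphs, and `*emphasis*`/`**strong**` markers, and the function name `markdown_to_html` and the sample text are hypothetical, chosen purely for illustration.

```python
import html
import re


def markdown_to_html(text: str) -> str:
    """Convert a tiny subset of Markdown (headings, paragraphs, emphasis) to HTML.

    Illustrative only: real Markdown processors handle many more cases.
    """
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Escape &, < and > so literal characters are not read as HTML tags.
        block = html.escape(block.strip(), quote=False)
        # Handle **strong** before *emphasis* so the double marker wins.
        block = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", block)
        block = re.sub(r"\*(.+?)\*", r"<em>\1</em>", block)
        heading = re.match(r"(#{1,6})\s+(.*)", block)
        if heading:
            level = len(heading.group(1))  # number of '#' characters
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{block}</p>")
    return "\n".join(blocks)


# Hypothetical sample text, not taken from the exercise page.
sample = "# A Sample Heading\n\nA line with *emphasis* and **strong** text."
print(markdown_to_html(sample))
```

Comparing the sample input with the printed output shows that each Markdown marker stands in for a pair of HTML tags, which may be a useful starting point for the discussion questions.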

Discussion questions

Using the Google forum to which you have all been invited, please discuss the following questions with the other participants and instructors. You may use the forum through its web interface or set up your account as an email list; either works fine.

  1. Is Markdown or HTML better suited to encoding this example? (Is there any difference?)
  2. What might we want to be able to encode in this text that Markdown/HTML doesn’t allow?
  3. What features might you want to add that are only possible in the digital medium?
  4. Do you think you have to sacrifice display for semantics? (Or vice versa?)
  5. Who is the imagined audience of your web page? Does that affect the decisions you made?

Session 2: Annotation

Videos and Tutorials

Exercise

  1. We have prepared three texts, and assigned one to each group. Discuss with your group in advance how you are going to approach annotation.
    1. Group 1: Anabasis RAW
    2. Group 2: Anabasis NER
      Google map superimposing the two Anabasis versions as layers
    3. Group 3: Al-Idrisi’s Tabula Rogeriana
  2. Please annotate as many places and placenames as you can, and try to georesolve them with reference to an appropriate gazetteer.
  3. Think about some other feature you would like to annotate, free-form, using some combination of tags, keywords, or URIs in the Recogito pop-up. What information are you adding? What are you losing?
  4. Look at the internal map visualisation. Do you see any problems? Anything you can fix?
  5. Export your annotations as a CSV file and open it in Excel or another spreadsheet application. What information is in this file? Is there anything you cannot identify? What is not in there?
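
If you would rather inspect the exported CSV programmatically than in a spreadsheet, the following Python sketch may help. It lists each column and counts how many rows have a value in it, which is one way to see what information is present and what is missing. The function name `summarise_csv`, the column names, and the sample rows are all hypothetical; a real Recogito export will have its own columns.

```python
import csv
import io


def summarise_csv(csv_text: str) -> dict:
    """Return, for each column, the number of rows with a non-empty value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            # Treat missing cells and whitespace-only cells as empty.
            if (row.get(name) or "").strip():
                counts[name] += 1
    return counts


# Hypothetical sample data; the column names and values are invented.
sample = (
    "QUOTE,TYPE,URI,LAT,LNG\n"
    "Example Place,PLACE,https://example.org/place/1,10.0,20.0\n"
    "Example Person,PERSON,,,\n"
)
for column, filled in summarise_csv(sample).items():
    print(f"{column}: {filled} filled")
```

To run this on your own export, read the file's contents first, for example with `open("annotations.csv", newline="", encoding="utf-8").read()` (the filename here is an assumption), and pass the result to `summarise_csv`.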

Discussion questions

  1. How did the simplicity of Recogito's tag-free interface compare with the transparency of adding Markdown tags directly in the text?
  2. What do you want to do with the annotations? In what format? For what audience? How does that change the decisions you made about what/how to annotate?
  3. Did you learn anything from visualising the data—either the process or the result—that you would not have learned from studying the original document in a conventional way?
  4. A different person will annotate these documents differently, perhaps because of different interests, knowledge of context, or approach to annotation. Does that matter?
  5. How much did the automated NER help/hinder the process of annotating your text?
  6. Were there things you wanted to annotate that Recogito did not enable?

Session 3: Further resources