Description
Hello!
I was looking through this text (Aristotle's Physics) and found a few typos in the metadata. I intend to submit a pull request soon-ish, but I am documenting the typos here first.
Oooof, as I was checking possible errors against the source text, I was led down a rabbit-hole of what I think are two systematic errors. The 'False/Extra Line Number at Chapter Start' should be easy enough to regex fix. The '<lb> usage/numbering' question, if it is an error and not just my poor assumption on vocabulary meanings, would be a bit more difficult to fix.
This metadata is useful for a project I am doing, but I am curious - how critical are the line beginning marks for the Scaife viewer and other interfaces that use this data? Are there other digitized-to-text sources of this Greek text (with line numbers) that I could use in the meantime?
I don't have the time to correct all this right now, but I wanted to document this and I also wanted to get feedback on the two bigger systematic errors/questions I had before sinking time into them.
Source Text
Going off of:
| <ti:description xml:lang="mul">Aristotle. Aristotelis Physica. Ross, W.D., editor. Oxford: Clarendon, 1960.</ti:description> |
I am using this Internet Archive edition: https://archive.org/details/aristotlesphysic0000wdro/ (and also https://archive.org/details/aristotelisopera01arisrich as a rough check on text & line numbers)
Ok, I missed this link until later: http://digital.slub-dresden.de/id416133894 at the head of the tei file. That page is painfully slow, though, and I don't want to update all the links below now. A quick spot-check of a few pages shows the same Greek text pages there and on Internet Archive (but without the intro etc. that the Internet Archive edition has).
✅ False/Extra Line Number at Chapter Start
First1KGreek/data/tlg0086/tlg031/tlg0086.tlg031.1st1K-grc1.xml
Lines 1595 to 1597 in 6812dab
| <div type="textpart" subtype="chapter" n="6"> | |
| <p>Ὅτι δʼ εἰ μὴ ἔστιν ἄπειρον ἀπλῶς, πολλὰ ἀδύνατα <lb n="6"/> | |
| <lb n="10"/> συμβαίνει, δῆλον. τοῦ τε γὰρ χρόνου ἔσται τις ἀρχὴ καὶ |
on line 1596: <lb n="6"/> should be removed, it is the chapter number (which is captured in metadata one line above) See: https://archive.org/details/aristotlesphysic0000wdro/page/178/mode/1up
Oh, hmmm, and this seems like it is a systematic error. I was looking for this error above to get the line permalink, and instead I first found this error below and others like it. I'd imagine most chapter number marks have this problem?
TODO (done): look at other subtype="chapter" lines for spurious lb tags. This did affect many chapters; fixed with regex, see commit message in linked PR.
First1KGreek/data/tlg0086/tlg031/tlg0086.tlg031.1st1K-grc1.xml
Lines 418 to 422 in 6812dab
| <div type="textpart" subtype="chapter" n="6"> | |
| <p>Ἐχόμενον δʼ ἂν εἴη λέγειν πότερον δύο ἢ τρεῖς ἢ πλείους <lb n="6"/> | |
| εἰσίν. μίαν μὲν γὰρ οὐχ οἷόν τε, ὅτι οὐχ ἓν τὰ ἐναντία, ἀπείρους <pb n="13"/> δʼ, ὅτι οὐκ ἐπιστητὸν τὸ ὂν ἔσται, μία τε ἐναντίωσις ἐν | |
| παντὶ γένε ἑνί, ἡ δʼ οὐσία ἕν τι γένος, καὶ ὅτι ἐνδέχεται ἐκ | |
| πεπερασμένων, βέλτιον δʼ ἐκ πεπερασμένων, ὥσπερ Ἐμπε· <lb n="15"/> |
on line 419: <lb n="6"/> should be removed, it is the chapter number (which is captured in metadata one line above)
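For anyone wanting to reproduce this kind of cleanup, here is a minimal sketch of a regex pass that drops an lb tag on the first line of a chapter when its n equals the chapter number. It is not the exact regex from the linked PR. The function name, the whitespace layout it expects, and the choice to skip chapter numbers that are multiples of five (since those could be genuine line markers) are all my assumptions.

```python
import re

# A chapter opening whose first <p> line carries an <lb/> with the same n as the chapter.
# Assumes the attribute order shown in the excerpts above.
CHAPTER_LB = re.compile(
    r'(<div type="textpart" subtype="chapter" n="(\d+)">\s*<p>[^\n]*?)\s*<lb n="\2"/>'
)


def strip_chapter_lb(xml_text, skip_multiples_of_five=True):
    """Remove the lb tag that merely repeats the chapter number (hypothetical helper)."""

    def repl(match):
        chapter = int(match.group(2))
        # A chapter 5 or 10 could coincide with a real line marker, so leave those
        # for manual review unless the caller opts in.
        if skip_multiples_of_five and chapter % 5 == 0:
            return match.group(0)
        return match.group(1)

    return CHAPTER_LB.sub(repl, xml_text)


sample = (
    '<div type="textpart" subtype="chapter" n="6">\n'
    '<p>Ὅτι δʼ εἰ μὴ ἔστιν ἄπειρον ἀπλῶς, πολλὰ ἀδύνατα <lb n="6"/>\n'
    '<lb n="10"/> συμβαίνει, δῆλον.\n'
)
cleaned = strip_chapter_lb(sample)
```

Only the first line after the chapter's opening p tag is inspected, so lb tags later in the chapter are left alone.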
✅ Line number typo
| ἔχειν τινὰ ὁμοιότητα <lb n="10"/> τῷ ὅλῳ. ἔστι γὰρ τὸ ἄπειρον τῆς τοῦ |
This should be line begin 20, not 10. This <lb n="10"/> appears between a line 15 and a line 25, so I assumed it should be 20, but I went to the source to make sure it wasn't a typo there or something. And indeed, it should be 20; see https://archive.org/details/aristotlesphysic0000wdro/page/181/mode/1up and https://archive.org/details/aristotelisopera01arisrich/page/207/mode/1up But those sources led me to another question...
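As an aside, typos like this 15, 10, 25 sequence could probably be found mechanically. Below is a rough sketch that flags any lb number that dips to or below its predecessor while the following number continues upward from that predecessor. The function name and the heuristic are mine, not anything from the repository. A real reset at a new Bekker column (e.g. 30, 5, 10) is not flagged, because the number after the dip stays below the predecessor.

```python
import re

LB = re.compile(r'<lb n="(\d+)"/>')


def find_suspect_lbs(xml_text):
    """Return (position, value) pairs for lb numbers that look out of sequence.

    position is the 0-based index among all numeric lb tags in the text.
    """
    values = [int(v) for v in LB.findall(xml_text)]
    suspects = []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        # A dip (or repeat) followed by a value above the predecessor suggests a typo,
        # not a legitimate restart of the numbering.
        if cur <= prev < nxt:
            suspects.append((i, cur))
    return suspects


typo = 'a <lb n="15"/> b <lb n="10"/> c <lb n="25"/> d'
reset = 'a <lb n="30"/> b <lb n="5"/> c <lb n="10"/> d'
```

Non-numeric markers such as 29a are ignored by this pattern, and every hit would still need checking against the page images.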
❓ <lb> usage/numbering?
Is <lb> 'line beginning' as here: https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-lb.html ? Because I think the lb tags in this section of text (same as above, just more lines of context here) are misplaced:
First1KGreek/data/tlg0086/tlg031/tlg0086.tlg031.1st1K-grc1.xml
Lines 1661 to 1670 in 6812dab
| ἔξω, τοῦτʼ ἔστι τέλειον καὶ ὅλον· οὕτω γὰρ ὁριζόμεθα τὸ ὅλον, οὗ <lb n="10"/> μηδὲν ἄπεστιν, οἱον ἄνθρωπον ὅλον ἢ κιβώτιον. ὥσπερ δὲ <pb n="60"/> τὸ καθʼ ἕκαστον, οὕτω καὶ τὸ κυρίως, οἷον τὸ ὅλον οὗ μηδέν | |
| ἐστιν ἔξω· οὗ δʼ ἔστιν ἀπουσία ἔξω, οὐ πᾶν, ὅ τι ἂν ἀπῇ. ὅλον δὲ καὶ | |
| τέλειον ἢ τὸ αὐτὸ πάμπαν ἢ σύνεγγυς τὴν φύσιν. τέλειον δʼ οὐδὲν μὴ ἔχον | |
| τέλος· τὸ δὲ τέλος πέρας. διὸ βέλτιον οἰητέον Παρμενίδην Μελίσσου | |
| εἰρηκέναι· <lb n="15"/> ὁ μὲν γὰρ τὸ ἄπειρον ὅλον φησίν, ὁ δὲ τὸ ὅλον | |
| πεπεράν· θαι, “μεσσόθεν ἰσοπαλές ”. οὐ γὰρ λίνον λίνῳ συνάπτειν ἐστὶν τῷ | |
| ἅπαντι καὶ ὅλῳ τὸ ἄπειρον, ἐπεὶ ἐντεῦθέν γε λαμβάνουσι τὴν σεμνότητα | |
| κατὰ τοῦ ἀπείρου, τὸ πάντα περιέχειν καὶ τὸ πᾶν ἐν ἑαυτῷ ἔχειν, διὰ τὸ | |
| ἔχειν τινὰ ὁμοιότητα <lb n="10"/> τῷ ὅλῳ. ἔστι γὰρ τὸ ἄπειρον τῆς τοῦ | |
| μεγέθους τελειότητος ὕλη καὶ τὸ δυνάμει ὅλον, ἔντελεχείᾳ δʼ οὔ, |
the lb tag should be corrected and moved to instead be like so (code-lines 1668-1669):
κατὰ τοῦ ἀπείρου, τὸ πάντα περιέχειν <lb n="20"/> καὶ τὸ πᾶν ἐν ἑαυτῷ ἔχειν, διὰ τὸ
ἔχειν τινὰ ὁμοιότητα τῷ ὅλῳ. ἔστι γὰρ τὸ ἄπειρον τῆς τοῦ
I've included more context because this looks like an error on all right-hand side pages (that I checked) of transcribing this source text (in the file, the even numbered pbs)
Important
TODO: see if this end-of-line vs beginning-of-line error occurs on all right-hand pages of this text
A quick scan of this text versus source images shows:
pb59: correct, has lb marks at the beginning of the lines. see: https://archive.org/details/aristotlesphysic0000wdro/page/180/mode/1up
pb60: incorrect, has lb marks at end of line of text (said another way, all lb numbers are off by one, e.g. lb 15 would be accurate if it was instead lb 16). see: https://archive.org/details/aristotlesphysic0000wdro/page/181/mode/1up
pb61: correct, has lb marks at the beginning of the lines. see: https://archive.org/details/aristotlesphysic0000wdro/page/182/mode/1up
pb62: incorrect, has lb marks at end of line of text. see: https://archive.org/details/aristotlesphysic0000wdro/page/183/mode/1up
This error occurs with the <note type="marginal"> marks as well on right-hand side pages, depending on how those marks are used/interpreted. I was taking these to be an implied 'beginning of line number 1 of page 207 column a'. But as the tei file is now, it is technically correct - the text of the marginal note is correct, and the marginal note is correctly placed according to the source text. It is only 'wrongly' placed if we assume it to mark the beginning of the line. If all the 'incorrect' lb tags on the even pb pages were instead just <note type="marginal"> types, then they too would be in the correct place.
So the bigger question is: Is the intent of this file to accurately encode the location of marginal notes according to the source text? Or to accurately mark and number the beginning of lines of text? Both could be done, but not sure how these files are used/interpreted downstream. e.g. on left-hand side pages this tei file could, for marked line numbers, say: <note type="marginal">15</note><lb n="15"/>... and right-hand side pages would say: <lb n="20"/> καὶ τὸ πᾶν ἐν ἑαυτῷ ἔχειν, διὰ τὸ ἔχειν τινὰ ὁμοιότητα <note type="marginal">20</note>
Also ref: https://archive.org/details/aristotelisopera01arisrich/page/206/mode/1up, https://archive.org/details/aristotelisopera01arisrich/page/207/mode/1up
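To make the "treat them as marginal notes" option concrete, here is a small sketch that rewrites numeric lb tags as `<note type="marginal">` only in the text following an even-numbered pb. It is an illustration, not a proposed fix: the function name is made up, it works on the raw string rather than parsing the TEI, and it assumes the even-page pattern holds everywhere, which I have only spot-checked on pb59 to pb62.

```python
import re

PB_SPLIT = re.compile(r'(<pb n="\d+"/>)')  # capturing group keeps the pb tags in the split
PB_NUM = re.compile(r'<pb n="(\d+)"/>')
LB = re.compile(r'<lb n="(\d+)"/>')


def lbs_to_marginal_notes_on_even_pages(xml_text):
    """Turn lb tags into marginal notes on even-numbered pages (hypothetical helper)."""
    out = []
    on_even_page = False  # text before the first pb is left untouched
    for part in PB_SPLIT.split(xml_text):
        page = PB_NUM.fullmatch(part)
        if page:
            on_even_page = int(page.group(1)) % 2 == 0
            out.append(part)
        elif on_even_page:
            out.append(LB.sub(r'<note type="marginal">\1</note>', part))
        else:
            out.append(part)
    return ''.join(out)


sample = 'a <lb n="5"/> <pb n="60"/> b <lb n="10"/> <pb n="61"/> c <lb n="15"/>'
converted = lbs_to_marginal_notes_on_even_pages(sample)
```

This only changes how the existing marks are labelled. Adding true beginning-of-line lb tags on those pages would still require going back to the page images.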
✅ Line Number in incorrect place
First1KGreek/data/tlg0086/tlg031/tlg0086.tlg031.1st1K-grc1.xml
Lines 1648 to 1652 in 6812dab
| καθαίρεσιν ἄπειρον ὑπάρχει (ἡ γὰρ μονὰς ἐλάχι· <lb n="33"/> στον, οὔτε | |
| <add cause="fix">τὸ</add> ἐπὶ τὴν αὔξην (μέχρι γὰρ δεκάδος ποιεῖ τὸν | |
| ἀριθμόν). <lg> | |
| <lb n="33"/> | |
| <l>συμβαίνει δὲ τοὐναντίον εἶναι ἄπειρον ἢ ὡς λέγουσιν.</l> |
should be:
καθαίρεσιν ἄπειρον ὑπάρχει (ἡ γὰρ μονὰς ἐλάχι· στον, οὔτε
<add cause="fix">τὸ</add> ἐπὶ τὴν αὔξην (μέχρι γὰρ δεκάδος ποιεῖ τὸν
<lb n="33"/> ἀριθμόν). <lg>
assuming that: if a word is split over a line ending, then place the lb mark at the beginning of the word (and un-split it in this tei file). If the lb mark should go after the end of the split word, then it should be: ἀριθμόν<lb n="33"/>). <lg> ? see: https://archive.org/details/aristotlesphysic0000wdro/page/180/mode/1up
❓ Confusing Marginal Note
| <note type="marginal">29a</note> |
This is technically a marginal note, but is really a line number where the source text has a deletion marked. Not sure if anything really needs to be done for this, it doesn't seem to mess up the Scaife Viewer of this portion
See: https://archive.org/details/aristotlesphysic0000wdro/page/179/mode/1up and
actual version used: https://digital.slub-dresden.de/werkansicht?tx_dlf%5Bid%5D=109594&tx_dlf%5Bpage%5D=67