raopr · Aug 10, 2024
diff --git a/README.md b/README.md
@@ -7,28 +7,28 @@ and transcribed text (and metadata) of 160+ pages that were handwritten by two n
 Nicolas de Valdivia y Brisuela nearly 400 years ago. Through empirical evaluation, we demonstrate that our collection can be used to
 fine-tune Spanish LLMs for tasks such as classification and masked language modeling. 
 
-For further details, refer to our arXiv [pre-print](https://arxiv.org/pdf/2406.05812).
+
 
 # Table of Contents 
 
 1. [Dataset](#dataset)
 2. [Model](#model)
-3. [Acknowledgements](#acknowledgement)
+
 
 
 # Dataset 
 
 SANRlite has 162 pages containing 900+ sentences. Each sentence (or group of sentences) was assigned one or more class labels and extended class labels. Extended class labels provide a fine-grained representation. In total, 33 class labels and 154 extended class labels were assigned to the sentences. To semantically enrich the JSON metadata, we searched Wikidata [31], a popular free and open knowledge base, for each class label and extracted its uniform resource identifier (URI) to denote its meaning precisely. The JSON metadata also includes the notary name, the year the notary record was written, and the Rollo/image number. To download and use the dataset, please follow the guidelines in [dataset-README.md](dataset/dataset-README.md).
 
 # Model
- We used SANR to do two down-stream task of language models using bert base language model. One is sentence classification and another is masked language model. We used bert base pretrained model to perform these tasks. For classification task, we used Multilingual BERT model, which is trained on text from multiple language along with Spanish. For MLM task, we used BETO: Spanish bert model, which is specifically trained on Spanish text. 
+ We used SANRlite for two downstream tasks with pretrained BERT-base language models: sentence classification and masked language modeling (MLM). For classification, we used Multilingual BERT, which is trained on text in multiple languages, including Spanish. For MLM, we used BETO, a Spanish BERT model trained specifically on Spanish text. 
 
 ## Download
 
-|                          |                  Model                   |                  Tokenizer                  |
-|:------------:|:----------------------------------------:|:-------------------------------------------:|
-| SANR Classification Model | [Model](https://mailmissouri-my.sharepoint.com/:f:/g/personal/sscx3_umsystem_edu/Em6J8fzd4KxLtVMo4YtoPywBn8OcPcG4NW1upggdcIJ5Cw?e=Gkud58) | [Tokenizer](https://mailmissouri-my.sharepoint.com/:f:/g/personal/sscx3_umsystem_edu/EkFVNqwHpDVOuFYT3hrxEEgBsG7ItzPm2NiMlbF5C1TxEQ?e=TZgkUC) |
-| SANR Masked Language Model | [Model](https://mailmissouri-my.sharepoint.com/:f:/g/personal/sscx3_umsystem_edu/El2jWbHfDs1Jtb0-bLA4BGgBCbBL_xAJ4ro65JCsCsILPg?e=j1efVP)  | [Tokenizer](https://mailmissouri-my.sharepoint.com/:f:/g/personal/sscx3_umsystem_edu/EhVwk6WAcudGsvaATfGAakEB3ccN6K4DMjl8e6Mew1zBSg?e=lYlCtY) |
+|                          |                  Model and  Tokenizer                  |
+|:------------:|:-------------------------------------------:|
+| SANRlite Classification  | [Link](https://drive.google.com/file/d/13pMvBPLlOjcGUEWjnfWgXzpt_F9HAr3V/view?usp=sharing) |
+| SANRlite Masked Language | [Link](https://drive.google.com/file/d/1PNE1Hdz_vM9lXiYC0wKvG7kccUPv7NIz/view?usp=sharing) |
 
 <!-- If you wish to download and use the model and tokenizer, please follow the steps mentioned in the [model-README.md](model/model-README.md). -->
 
@@ -106,6 +106,3 @@ Install the required libraries using pip:
     print(output)
 
 
-# Acknowlegdement
-This work was supported by the National Endowments for the Humanities Grant No. HAA-287903-22.
-
diff --git a/dataset/336920977-9f40fdcc-f8ed-443b-afda-866aec771730.png b/dataset/336920977-9f40fdcc-f8ed-443b-afda-866aec771730.png
diff --git a/dataset/336921934-30880d76-b0f1-4743-8b2f-6ac0dfe22182.png b/dataset/336921934-30880d76-b0f1-4743-8b2f-6ac0dfe22182.png
diff --git a/dataset/dataset-README.md b/dataset/dataset-README.md
@@ -12,22 +12,18 @@ This repository contains a dataset of images from 17th century American Spanish
 To view the annotations, you can use the labelImg software. Follow these steps to load the dataset and view the annotations:
 
 1. Download and install [LabelImg](https://github.com/HumanSignal/labelImg).
-2. Clone this repository to your local machine:
-   ```bash
-   git clone https://github.com/raopr/SpanishNotaryCollection.git
-
-
+2. Clone this repository to your local machine.
 3. The main page of LabelImg will look like the image shown below. At the beginning, you have to set the directory where your images and XML files are saved. After that, you need to set the directory where changes will be saved. To view the annotations in the LabelImg software, make sure that the scanned images and their corresponding XML files are in the same directory, as organized in the images directory. The annotations will look like the image below. The bounding boxes with the green circles in the corners represent the labeling or annotation process we performed.
 
-<img width="959" alt="Notary" src="https://github.com/raopr/SpanishNotaryCollection/assets/58792703/98732301-2875-44d8-999d-eba70bc038c4">
+<img width="959" alt="Notary" src="labelimg.png">
 
 
 
 ## Sample of Rollos
 Below are some sample images from rollos 40 and 38:
 
-<img width="565" alt="rollo" src="https://github.com/raopr/SpanishNotaryCollection/assets/58792703/9f40fdcc-f8ed-443b-afda-866aec771730">
+<img width="565" alt="rollo" src="336920977-9f40fdcc-f8ed-443b-afda-866aec771730.png">
 
 
-<img width="568" alt="Rolloss" src="https://github.com/raopr/SpanishNotaryCollection/assets/58792703/30880d76-b0f1-4743-8b2f-6ac0dfe22182">
+<img width="568" alt="Rolloss" src="336921934-30880d76-b0f1-4743-8b2f-6ac0dfe22182.png">
 
diff --git a/dataset/labelimg.png b/dataset/labelimg.png
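
Note on the annotation files described in `dataset/dataset-README.md`: the README says each scanned image has a corresponding XML file that LabelImg can display. For readers who want to inspect the bounding boxes without the GUI, here is a minimal sketch. It assumes the XML files use the Pascal VOC layout that LabelImg writes by default, which the README does not state. The sample XML, file name, and label below are hypothetical and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout (assumed, not from the dataset).
SAMPLE_XML = """<annotation>
  <filename>example_page.jpg</filename>
  <object>
    <name>example_label</name>
    <bndbox>
      <xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax>
    </bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return (image filename, list of labeled boxes) from a VOC-style XML string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        if bnd is None:
            # Skip objects that carry no bounding box.
            continue
        boxes.append({
            "label": obj.findtext("name", default=""),
            # Coordinates may be stored as "10" or "10.0"; normalize to int.
            "xmin": int(float(bnd.findtext("xmin"))),
            "ymin": int(float(bnd.findtext("ymin"))),
            "xmax": int(float(bnd.findtext("xmax"))),
            "ymax": int(float(bnd.findtext("ymax"))),
        })
    return root.findtext("filename", default=""), boxes


filename, boxes = parse_voc_annotation(SAMPLE_XML)
print(filename, boxes)
```

To apply this to a real file, read its contents with `open(path, encoding="utf-8").read()` and pass the string to `parse_voc_annotation`. If the dataset's XML uses a different schema, the tag names will need adjusting.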