From 59d4433d876ed0b8f6999c385d4953d160aaadee Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 02:18:39 -0700
Subject: [PATCH 01/13] Add a paper published in EMNLP-2020

---
 docs/constituency_parsing.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/constituency_parsing.md b/docs/constituency_parsing.md
index 261ca82..66a6018 100644
--- a/docs/constituency_parsing.md
+++ b/docs/constituency_parsing.md
@@ -50,6 +50,7 @@ EM, F1, LP and LR can be calculated using the [Evalb](https://nlp.cs.nyu.edu/eva
 | [Liu and Zhang (2017)](https://www.aclweb.org/anthology/Q17-1029/) | 44.94 | 91.81 | - | - | [Github](https://github.com/dpfried/rnng-bert) |
 | [Zhou and Zhao (2019)](https://www.aclweb.org/anthology/P19-1230/) | - | 92.18 | 92.33 | 92.03 | [Github](https://github.com/DoodleJZ/HPSG-Neural-Parser) |
 | [Mrini et al. (2020)](https://arxiv.org/abs/1911.03875) | - | 92.64 | 93.45 | 91.85 | [Github](https://github.com/KhalilMrini/LAL-Parser) |
+| [Tian et al. (2020)](https://aclanthology.org/2020.findings-emnlp.153/) | - | 92.66 | 92.50 | 92.83 | [Github](https://github.com/cuhksz-nlp/SAPar) |
 | [Yang and Deng (2020)](https://arxiv.org/abs/2010.14568) | 49.72 | 93.59 | 93.80 | 93.40 | [Github](https://github.com/princeton-vl/attach-juxtapose-parser) |

From 2a527aad498ff29cf6e703b631dd0b914e333f97 Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 02:32:17 -0700
Subject: [PATCH 02/13] Add new results on NLPCC DBQA 2016

---
 docs/question_answering.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/question_answering.md b/docs/question_answering.md
index 0bff637..5fa33fc 100644
--- a/docs/question_answering.md
+++ b/docs/question_answering.md
@@ -87,6 +87,7 @@ NLPCC DBQA 2016
 | System | MRR | F1 |
 | --- | --- | --- |
+| [ZEN 2.0](https://arxiv.org/abs/2105.01279) | 96.1 | 86.5 |
 | [ERNIE 2.0](https://arxiv.org/pdf/1907.12412.pdf) | 95.8 | 85.8 |
 | [Meng et. al. (2019)](https://arxiv.org/pdf/1901.10125.pdf) (Glyce + BERT) | - | 83.4 |
 | [ERNIE(baidu)](https://arxiv.org/pdf/1904.09223.pdf) | 95.1 | 82.7 |

From 1b5201d3e0a2e64ea2880e70d55ff3f80d6093dd Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 02:36:57 -0700
Subject: [PATCH 03/13] Add new results on ChnSentiCorp

---
 docs/sentiment_analysis.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/sentiment_analysis.md b/docs/sentiment_analysis.md
index 1c4900a..67027ca 100644
--- a/docs/sentiment_analysis.md
+++ b/docs/sentiment_analysis.md
@@ -104,6 +104,7 @@ F1-score
 | | F1 | Accuracy |
 | --- | --- | --- |
 | [Chen et al., 2020: 3SiBert](https://www.aclweb.org/anthology/2020.lrec-1.293.pdf) | | 0.967 | https://www.aclweb.org/anthology/2020.lrec-1.293.pdf
+| [ZEN 2.0](https://arxiv.org/abs/2105.01279) | | 0.965 |
 | [ERNIE 2.0](https://arxiv.org/pdf/1907.12412.pdf) | | 0.958 |
 | [ERNIE](https://arxiv.org/pdf/1904.09223.pdf) | | 0.954 |
 | BERT * | | 0.943 |

From fe265529529a102107c756831e37414fe38ce2a4 Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 02:49:24 -0700
Subject: [PATCH 04/13] Add new results on NER MSRA and Weibo

---
 docs/entity_tagging.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/entity_tagging.md b/docs/entity_tagging.md
index b138787..e6ca27e 100644
--- a/docs/entity_tagging.md
+++ b/docs/entity_tagging.md
@@ -105,6 +105,7 @@ Paper summarizing the bakeoff:
 | System | F-score |
 | --- | --- |
+| [Song et al. (2021)](https://arxiv.org/abs/2105.01279) | 96.2 |
 | [Liu et al (2020)](https://arxiv.org/pdf/1909.07606.pdf) | 95.7 |
 | [Meng et. al. (2019)](https://arxiv.org/abs/1901.10125) | 95.5 |
 | [Ma et al (2020)](https://www.aclweb.org/anthology/2020.acl-main.528.pdf) | 95.4 |
@@ -142,6 +143,8 @@ Using the test split by http://www.aclweb.org/anthology/E17-2113:
 | System | F-score (name mentions) | F-score (nominal mentions) | F-score (Overall) |
 | --- | --- | --- | --- |
 | [Ma et al (2020)](https://www.aclweb.org/anthology/2020.acl-main.528.pdf) | 70.9 | 67.0 | 70.5 |
+| [Nie et al. (2020)](https://aclanthology.org/2020.emnlp-main.107/)| | | 69.80 |
+| [Nie et al. (2020)](https://aclanthology.org/2020.findings-emnlp.378/)| | | 69.78 |
 | [Meng et. al. (2019)](https://arxiv.org/abs/1901.10125) | 67.6 | | |
 | [Hu and Zheng (2020)](https://www.jstage.jst.go.jp/article/transinf/E103.D/7/E103.D_2019EDP7253/_pdf/-char/ja) | 56.4 | | |
 | [Sui et al. (2019)](https://www.aclweb.org/anthology/D19-1396/) | 56.45 | 68.32 | 63.09 |

From 3fb1037cbc48dc6a8893b8db1e628fc33d20ef11 Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 02:59:04 -0700
Subject: [PATCH 05/13] update the results on SIGHAN 2005

---
 docs/word_segmentation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/word_segmentation.md b/docs/word_segmentation.md
index f9d1deb..1ac5f29 100644
--- a/docs/word_segmentation.md
+++ b/docs/word_segmentation.md
@@ -53,8 +53,8 @@ F1 = 0.857
 | Model | AS | CITYU | MSRA | PKU |
 | --- | --- | --- | --- | --- |
 | [Ke et al. (2021)](https://aclanthology.org/2021.naacl-main.436/) | 97.0 | 98.2 | 98.5 | 96.9 |
-| [Qiu, Pei, Yan, Huang (2020)](https://aclanthology.org/2020.findings-emnlp.260/) | 96.4 | 96.9 | 98.1 | 96.4 |
 | [Tian, Song, Xia, Zhang, Wang (2020)](https://www.aclweb.org/anthology/2020.acl-main.734/) | 96.6 | 97.9 | 98.4 | 96.5 |
+| [Qiu, Pei, Yan, Huang (2020)](https://aclanthology.org/2020.findings-emnlp.260/) | 96.4 | 96.9 | 98.1 | 96.4 |
 | [Meng et al. (2019)](https://arxiv.org/abs/1901.10125) | 96.7* | 97.9* | 98.3 | 96.7 |
 | [Huang et al. (2019)](https://arxiv.org/abs/1903.04190)| 96.6 | 97.6 | 97.9 | 96.6 |
 | [Ma et al. (2018)](http://aclweb.org/anthology/D18-1529) | 96.2 | 97.2 | 97.4 | 96.1 |

From 77bfb8e300b6ab7316193453c3af506950355206 Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 16:01:12 -0700
Subject: [PATCH 06/13] Update other resources

---
 docs/word_embedding.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/word_embedding.md b/docs/word_embedding.md
index 08e5429..c8f6b55 100644
--- a/docs/word_embedding.md
+++ b/docs/word_embedding.md
@@ -146,6 +146,7 @@ Given “France : Paris :: China : ?”, a system should come up with the answer
 | Name | Additional features | Training Corpus Size | Source |
 | --- | --- | --- | --- |
+| [Tencent Embedding](https://ai.tencent.com/ailab/nlp/en/embedding.html) | 8M Chinese words, 200 dimension | | [Song et al. (2018)](https://aclanthology.org/N18-2028/) |
 | FastText | - | 374M characters | [Grave et al., 2018](https://arxiv.org/pdf/1802.06893.pdf) |
 | Mimick | Interpolate between similar characters to improve rare words, multilingual | | [Pinter et al., 2017](https://www.aclweb.org/anthology/D17-1010.pdf) |
 | Glyph2vec | Uses character bitmaps, canjie to address OOV problem | 10M chars | [Chen et al., 2020](https://www.aclweb.org/anthology/2020.acl-main.256.pdf) |

From ae23d81af653eee0b1b1c25cd53af224b6b4470d Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 16:19:50 -0700
Subject: [PATCH 07/13] update other resources

---
 docs/word_embedding.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/word_embedding.md b/docs/word_embedding.md
index c8f6b55..6ca27ac 100644
--- a/docs/word_embedding.md
+++ b/docs/word_embedding.md
@@ -146,7 +146,7 @@ Given “France : Paris :: China : ?”, a system should come up with the answer
 | Name | Additional features | Training Corpus Size | Source |
 | --- | --- | --- | --- |
-| [Tencent Embedding](https://ai.tencent.com/ailab/nlp/en/embedding.html) | 8M Chinese words, 200 dimension | | [Song et al. (2018)](https://aclanthology.org/N18-2028/) |
+| [Tencent Embedding](https://ai.tencent.com/ailab/nlp/en/embedding.html) | 8M Chinese words, 200 dimension | | [Song et al. 2018](https://aclanthology.org/N18-2028/) |
 | FastText | - | 374M characters | [Grave et al., 2018](https://arxiv.org/pdf/1802.06893.pdf) |
 | Mimick | Interpolate between similar characters to improve rare words, multilingual | | [Pinter et al., 2017](https://www.aclweb.org/anthology/D17-1010.pdf) |
 | Glyph2vec | Uses character bitmaps, canjie to address OOV problem | 10M chars | [Chen et al., 2020](https://www.aclweb.org/anthology/2020.acl-main.256.pdf) |

From 5667db31bf23b8fcd0b39091e4cd6c0a2af50231 Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 16:20:15 -0700
Subject: [PATCH 08/13] update other resources

---
 docs/word_embedding.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/word_embedding.md b/docs/word_embedding.md
index 6ca27ac..d694357 100644
--- a/docs/word_embedding.md
+++ b/docs/word_embedding.md
@@ -146,7 +146,7 @@ Given “France : Paris :: China : ?”, a system should come up with the answer
 | Name | Additional features | Training Corpus Size | Source |
 | --- | --- | --- | --- |
-| [Tencent Embedding](https://ai.tencent.com/ailab/nlp/en/embedding.html) | 8M Chinese words, 200 dimension | | [Song et al. 2018](https://aclanthology.org/N18-2028/) |
+| [Tencent Embedding](https://ai.tencent.com/ailab/nlp/en/embedding.html) | 8M Chinese words, 200 dimension | | [Song et al., 2018](https://aclanthology.org/N18-2028/) |
 | FastText | - | 374M characters | [Grave et al., 2018](https://arxiv.org/pdf/1802.06893.pdf) |
 | Mimick | Interpolate between similar characters to improve rare words, multilingual | | [Pinter et al., 2017](https://www.aclweb.org/anthology/D17-1010.pdf) |
 | Glyph2vec | Uses character bitmaps, canjie to address OOV problem | 10M chars | [Chen et al., 2020](https://www.aclweb.org/anthology/2020.acl-main.256.pdf) |

From 35facb6e1a88b41d8ebe721649e70b19cd4abd05 Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 21:44:27 -0700
Subject: [PATCH 09/13] Add new results for word embeddings

---
 docs/word_embedding.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/word_embedding.md b/docs/word_embedding.md
index d694357..b03648c 100644
--- a/docs/word_embedding.md
+++ b/docs/word_embedding.md
@@ -61,7 +61,9 @@ See e.g. [Torregrossa et al., 2020](https://www.aclweb.org/anthology/2020.lrec-1
 | System | wordsim-240 (⍴) | wordsim-296 (⍴) |
 | --- | --- | --- |
 | [Sun et. al. (2019)](https://arxiv.org/pdf/1902.08795.pdf) (VCWE) | 57.81 | 61.29 |
+| [Song et. al. (2018)](https://www.ijcai.org/Proceedings/2018/0608.pdf) | 54.14 | 57.04 |
 | [Yu et. al. (2017)](https://www.aclweb.org/anthology/D17-1027) (JWE) | 51.92 | 59.84 |
+| Baseline (CBOW) | 51.01 | 53.65 |
@@ -146,7 +148,7 @@ Given “France : Paris :: China : ?”, a system should come up with the answer
 | Name | Additional features | Training Corpus Size | Source |
 | --- | --- | --- | --- |
-| [Tencent Embedding](https://ai.tencent.com/ailab/nlp/en/embedding.html) | 8M Chinese words, 200 dimension | | [Song et al., 2018](https://aclanthology.org/N18-2028/) |
+| DSG | Leverage directional information to improve skip-gram algorithm | | [Song et al., 2018](https://aclanthology.org/N18-2028/) |
 | FastText | - | 374M characters | [Grave et al., 2018](https://arxiv.org/pdf/1802.06893.pdf) |
 | Mimick | Interpolate between similar characters to improve rare words, multilingual | | [Pinter et al., 2017](https://www.aclweb.org/anthology/D17-1010.pdf) |
 | Glyph2vec | Uses character bitmaps, canjie to address OOV problem | 10M chars | [Chen et al., 2020](https://www.aclweb.org/anthology/2020.acl-main.256.pdf) |

From 4aa964dcf12a215a94bcb85a48e6598f2f0edfb8 Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 21:53:16 -0700
Subject: [PATCH 10/13] Add new results for constituency parsing

---
 docs/constituency_parsing.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/constituency_parsing.md b/docs/constituency_parsing.md
index 66a6018..a28d5df 100644
--- a/docs/constituency_parsing.md
+++ b/docs/constituency_parsing.md
@@ -48,6 +48,7 @@ EM, F1, LP and LR can be calculated using the [Evalb](https://nlp.cs.nyu.edu/eva
 | System | EM | F1 | LP | LR | code |
 | --- | --- | --- | --- | --- | --- |
 | [Liu and Zhang (2017)](https://www.aclweb.org/anthology/Q17-1029/) | 44.94 | 91.81 | - | - | [Github](https://github.com/dpfried/rnng-bert) |
+| [Fried et al. (2019)](https://aclanthology.org/P19-1031/) | 44.42 | 92.14 | - | - | [Github](https://github.com/dpfried/rnng-bert) |
 | [Zhou and Zhao (2019)](https://www.aclweb.org/anthology/P19-1230/) | - | 92.18 | 92.33 | 92.03 | [Github](https://github.com/DoodleJZ/HPSG-Neural-Parser) |
 | [Mrini et al. (2020)](https://arxiv.org/abs/1911.03875) | - | 92.64 | 93.45 | 91.85 | [Github](https://github.com/KhalilMrini/LAL-Parser) |
 | [Tian et al. (2020)](https://aclanthology.org/2020.findings-emnlp.153/) | - | 92.66 | 92.50 | 92.83 | [Github](https://github.com/cuhksz-nlp/SAPar) |

From 1d0e0ee62e4ca85a192737c70d2f844446b0b54b Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 22:09:30 -0700
Subject: [PATCH 11/13] Add new results on NER

---
 docs/entity_tagging.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/entity_tagging.md b/docs/entity_tagging.md
index e6ca27e..8e0fce5 100644
--- a/docs/entity_tagging.md
+++ b/docs/entity_tagging.md
@@ -109,6 +109,7 @@ Paper summarizing the bakeoff:
 | [Liu et al (2020)](https://arxiv.org/pdf/1909.07606.pdf) | 95.7 |
 | [Meng et. al. (2019)](https://arxiv.org/abs/1901.10125) | 95.5 |
 | [Ma et al (2020)](https://www.aclweb.org/anthology/2020.acl-main.528.pdf) | 95.4 |
+| [Diao et al. (2020)](https://aclanthology.org/2020.findings-emnlp.425/)| 95.3 |
 | [Sun et al (2020)](https://arxiv.org/pdf/1907.12412.pdf) | 95.0 |
 | [Yan et al (2020)](https://ieeexplore.ieee.org/abstract/document/9141551) | 94.1 |
 | [Liu et. al. (2019)](https://www.aclweb.org/anthology/N19-1247) | 93.74 |

From f4eba9e4a37d415595f12b8590f651ab3e219173 Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Tue, 5 Apr 2022 23:23:58 -0700
Subject: [PATCH 12/13] Add new topic classification results

---
 docs/topic_classification.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/topic_classification.md b/docs/topic_classification.md
index b94a295..9b06524 100644
--- a/docs/topic_classification.md
+++ b/docs/topic_classification.md
@@ -40,6 +40,9 @@ Sina News RSS subscription channel data from 2005 to 2011, which contains 74 mil
 | | Accuracy |
 | --- | --- |
 | [J. Chen, C. Cao, X. Jiang](https://www.aclweb.org/anthology/2020.lrec-1.293.pdf) | 98.7% |
+| [Song, et al., (2021)](https://arxiv.org/abs/2105.01279) | 97.93% |
+| [Cui, et al., (2020)](https://arxiv.org/pdf/2004.13922.pdf) | 97.9% |
+| [Diao, et al., (2020)](https://aclanthology.org/2020.findings-emnlp.425/) | 97.64% |
 | [Y. Song](https://iopscience.iop.org/article/10.1088/1742-6596/1453/1/012156/pdf)| 97.56% |
 | [W. Liu, P. Zhou, et al](https://www.aclweb.org/anthology/2020.acl-main.537.pdf) | 96.71% |
 | [S. Xin](https://iopscience.iop.org/article/10.1088/1742-6596/1549/2/022011/pdf) | 96.04% |

From e3e79a2d9158b25c55cb622108144a5dd9037c9c Mon Sep 17 00:00:00 2001
From: Yuanhe Tian
Date: Thu, 7 Apr 2022 15:25:17 -0700
Subject: [PATCH 13/13] Add new sentiment analysis results

---
 docs/sentiment_analysis.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/sentiment_analysis.md b/docs/sentiment_analysis.md
index 67027ca..ffecb53 100644
--- a/docs/sentiment_analysis.md
+++ b/docs/sentiment_analysis.md
@@ -105,6 +105,7 @@ F1-score
 | --- | --- | --- |
 | [Chen et al., 2020: 3SiBert](https://www.aclweb.org/anthology/2020.lrec-1.293.pdf) | | 0.967 | https://www.aclweb.org/anthology/2020.lrec-1.293.pdf
 | [ZEN 2.0](https://arxiv.org/abs/2105.01279) | | 0.965 |
+| [ZEN](https://aclanthology.org/2020.findings-emnlp.425/) | | 0.961 |
 | [ERNIE 2.0](https://arxiv.org/pdf/1907.12412.pdf) | | 0.958 |
 | [ERNIE](https://arxiv.org/pdf/1904.09223.pdf) | | 0.954 |
 | BERT * | | 0.943 |