ECOOP 2021-2025 redux: fix unicode #91
Thanks!
Thank you very much for rebasing my MR. I totally forgot about doing that myself.
@robbertkrebbers no problem, thanks for preparing the initial PR! It's a pity that pcminer requires HTML entities instead of Unicode, I think. It'd be great to teach it UTF-8 one day @msridhar...
PRs welcome! 🙂 I believe DBLP uses HTML encoding, so the code is just trying to make it easier to match that. Some kind of normalization of everything would of course be great.
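For illustration, the kind of normalization mentioned above could look roughly like the following Python sketch. It is not pcminer's code; the function names `normalize_name` and `match_key` are hypothetical, and it assumes names arrive either as HTML-entity-encoded strings or as raw Unicode.

```python
import html
import unicodedata


def normalize_name(raw: str) -> str:
    """Decode HTML entities and return the name in Unicode NFC form."""
    decoded = html.unescape(raw)  # "&icirc;" -> "î", "&#252;" -> "ü"
    return unicodedata.normalize("NFC", decoded)


def match_key(raw: str) -> str:
    """Build an accent- and case-insensitive key for comparing names.

    Hypothetical helper: decomposes the normalized name, drops combining
    marks, and case-folds the result.
    """
    decomposed = unicodedata.normalize("NFKD", normalize_name(raw))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

With something like this, an entity-encoded name and its UTF-8 spelling map to the same string, so matching would not depend on which encoding a source uses.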
Right. I suspected that the dblp HTML pages are the root of this issue. But there is good news: unlike researchr, dblp has actually entered the 2020s and exposes machine-readable interfaces. E.g. the whole database is available as one XML file: https://dblp.org/faq/How+can+I+download+the+whole+dblp+dataset.html There's also a search API, but Google AI is telling me that it's "less reliable" (which makes sense for a search).
This supersedes #87.
I took #87, rebased it on the current master, ran recode as suggested in the readme, and ran ./gradlew run. I can see new data through ui/index.html locally.