Commit f36005c

Documentation (#6)
* simplify notebooks
* improve documentation
* link to binder
1 parent 2e13a12 commit f36005c

12 files changed: +2396 -18 lines changed

README.md

Lines changed: 19 additions & 2 deletions
@@ -1,16 +1,33 @@
 # pidgraph-notebooks
+[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Project-TAPIR/pidgraph-notebooks/main)
+
 A collection of Jupyter notebooks with examples of querying different PID providers like [ORCID](https://orcid.org/), [ROR](https://ror.readme.io/), [Crossref](https://www.crossref.org/) and PID graphs like the [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) and [OpenAlex](https://openalex.org/about) for connected objects.
 
 Currently included connections:
 * organization-organization
+  * input: ROR
+  * output: hierarchy of sub-organizations starting at given organization, each identified by their ROR
+  * data sources: ROR
 * organization-people
+  * input: ROR
+  * output: list of people affiliated with the organization, each identified by their ORCID iD
+  * data sources: FREYA PID Graph, OpenAlex, ORCID
 * person-works
+  * input: ORCID
+  * output: list of works authored/created by the person, each identified by their DOI
+  * data sources: Crossref, FREYA PID Graph, OpenAlex, ORCID
+
+
+Please navigate into the respective folder to see the list of available notebooks.
 
-Please navigate into the respective folder to see the list of available notebooks. While GitHub renders Jupyter notebooks as static HTML files (not executable), each folder includes a README with links to launch the notebooks in Google Colaboratory where you can execute and modify them.
+### Run notebooks
+While GitHub renders Jupyter notebooks as static HTML files (not executable),
+you can use this link to launch the notebooks on Binder where you can execute and modify them:
+[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Project-TAPIR/pidgraph-notebooks/main)
 
 ----------------------------
 
 ### Background
 In the joint project [TAPIR](https://projects.tib.eu/tapir/en/) (Partially Automated Persistent Identifier-based Reporting), partially automated procedures for research reporting are being tested in the context of university and non-university research. To this end, the question is being investigated :
 
-To what extent can the necessary data aggregation be carried out on the basis of openly available research information using persistent identifiers?
+To what extent can the necessary data aggregation be carried out on the basis of openly available research information using persistent identifiers?

organization-organization/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
 
 A Jupyter notebook showing an example of using a persistent identifier for an organization (here ROR ID) as input for retrieving all sub-organizations (also identified by a ROR ID) connected to it.
 
-* [ROR organigram](https://ror.readme.io/) [![Google Colab](https://badgen.net/badge/Launch/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/Project-TAPIR/pidgraph-notebooks/blob/main/organization-organization/ror-organigram.ipynb)
+* [ROR](https://ror.readme.io/) organigram
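
The organigram notebook referenced here takes either a ROR URL or a bare ROR ID as input. A minimal sketch of reducing both forms to the bare ID, for example to compare or de-duplicate organizations, could look like the following. `normalize_ror` is a hypothetical helper that is not part of this commit, and lowercasing the ID is an assumption.

```python
def normalize_ror(value):
    """Return the bare ROR ID for a ROR URL, a 'ror.org/...' string or a bare ID.

    Hypothetical helper for illustration; lowercasing is an assumption.
    """
    text = value.strip().lower()
    # Strip the first matching URL-style prefix, if any.
    for prefix in ("https://ror.org/", "http://ror.org/", "ror.org/"):
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    # Drop a trailing slash left over from URL input.
    return text.rstrip("/")


print(normalize_ror("https://ror.org/03vek6s52"))
print(normalize_ror("03vek6s52"))
```

Both calls print the same bare ID, so either input form can be used as a dictionary key or set member.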
organization-organization/ror-organigram.ipynb

Lines changed: 213 additions & 1 deletion
@@ -1 +1,213 @@
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"ror-organigram.ipynb","provenance":[{"file_id":"1yJn6R6ixeEZFU47XyeZsTiqhU-W7AS2H","timestamp":1643279199327}],"authorship_tag":"ABX9TyPEl1r8wtGqaTN4CJj87Lso"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["### Query ROR for an organization and the hierarchy of sub-organizations below it\n","\n","This notebook queries the [ROR API](https://ror.readme.io/) to construct a hierarchy of sub-organizations starting at a given organization. It takes a ROR URL or ID which is used to retrieve the organizations specified in the metadata field \"`relationships`\" with `\"type\"=Child` recursively to build a tree structure. The tree represents the hierarchy of sub-organizations and will be outputted.\n"],"metadata":{"id":"0-62NAVa8DL0"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"VaYUrh5n2iq9"},"outputs":[],"source":["# needed dependency to make HTTP calls\n","import requests\n","# dependency to construct tree structure\n","!pip install anytree\n","from anytree import Node, RenderTree"]},{"cell_type":"markdown","source":["The input value for all following queries is a ROR ID or ROR URL."],"metadata":{"id":"YmmrIwjv3QqG"}},{"cell_type":"code","source":["# input parameter\n","example_ror=\"https://ror.org/03vek6s52\""],"metadata":{"id":"UhY0RQcU3Q1Z","executionInfo":{"status":"ok","timestamp":1643279169086,"user_tz":-60,"elapsed":19,"user":{"displayName":"Sandra M","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjjehryRcYlqHNFf_9Q6slGN_VZPE0y5QkvOxzG=s64","userId":"04602594913862593282"}}},"execution_count":2,"outputs":[]},{"cell_type":"markdown","source":["We may use it to query the ROR API once for the organization's metadata."],"metadata":{"id":"4MApaFfD6knr"}},{"cell_type":"code","source":["# URL to ROR API\n","ROR_API_ENDPOINT = \"https://api.ror.org/organizations\"\n","\n","# query ROR API for organization's metadata\n","def query_ror_api(ror):\n","    response = requests.get(url=requests.utils.requote_uri(ROR_API_ENDPOINT + \"/\" + ror),\n","                            headers={'Content-Type': 'application/json'})\n","    result=response.json()\n","    return result\n","\n","#---- example execution\n","# uncomment following lines to see the metadata for specified example_ror\n","#import pprint\n","#organization = query_ror_api(example_ror)\n","#pprint.pprint(organization)"],"metadata":{"id":"YpwZ3mrC3dHO","executionInfo":{"status":"ok","timestamp":1643279169087,"user_tz":-60,"elapsed":16,"user":{"displayName":"Sandra M","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjjehryRcYlqHNFf_9Q6slGN_VZPE0y5QkvOxzG=s64","userId":"04602594913862593282"}}},"execution_count":3,"outputs":[]},{"cell_type":"markdown","source":["But in this notebook we use it as a starting point to recursively query the ROR API using the relationship type \"`Child`\" to construct the organizational hierarchy below it."],"metadata":{"id":"T25jtRBf3c1T"}},{"cell_type":"code","source":["# construct organizational tree recursively starting at given ROR\n","def construct_tree(ror, parent=None):\n","    organization = query_ror_api(ror)\n","    current_node = Node(organization[\"name\"], parent=parent)\n","\n","    for rel in organization['relationships']:\n","        if rel[\"type\"]==\"Child\":\n","            construct_tree(rel[\"id\"], current_node)\n","\n","    return current_node\n","\n","\n","#---- example execution\n","organigram = construct_tree(example_ror)\n","print(RenderTree(organigram))"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"4jh1liI56A4x","executionInfo":{"status":"ok","timestamp":1643279175634,"user_tz":-60,"elapsed":6559,"user":{"displayName":"Sandra M","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjjehryRcYlqHNFf_9Q6slGN_VZPE0y5QkvOxzG=s64","userId":"04602594913862593282"}},"outputId":"ea1cbf18-6c90-4aa3-bfcb-10addcaa9e8f"},"execution_count":4,"outputs":[{"output_type":"stream","name":"stdout","text":["Node('/Harvard University')\n","├── Node('/Harvard University/Athinoula A. Martinos Center for Biomedical Imaging')\n","├── Node('/Harvard University/Berenson Allen Center for Noninvasive Brain Stimulation')\n","├── Node('/Harvard University/Center for Astrophysics Harvard & Smithsonian')\n","│   ├── Node('/Harvard University/Center for Astrophysics Harvard & Smithsonian/Harvard College Observatory')\n","│   └── Node('/Harvard University/Center for Astrophysics Harvard & Smithsonian/Smithsonian Astrophysical Observatory')\n","├── Node('/Harvard University/Center for Systems Biology')\n","├── Node('/Harvard University/Center for Vascular Biology Research')\n","├── Node('/Harvard University/Gordon Center for Medical Imaging')\n","├── Node('/Harvard University/Harvard Stem Cell Institute')\n","├── Node('/Harvard University/Harvard University Press')\n","├── Node('/Harvard University/MIT-Harvard Center for Ultracold Atoms')\n","├── Node('/Harvard University/Ragon Institute of MGH, MIT and Harvard')\n","├── Node('/Harvard University/Sleep and Human Health Institute')\n","└── Node('/Harvard University/The NSF AI Institute for Artificial Intelligence and Fundamental Interactions')\n"]}]}]}
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "0-62NAVa8DL0"
+   },
+   "source": [
+    "### Query ROR for an organization and the hierarchy of sub-organizations below it\n",
+    "\n",
+    "This notebook queries the [ROR API](https://ror.readme.io/) to construct a hierarchy of sub-organizations starting at a given organization. It takes a ROR URL or ID which is used to retrieve the organizations specified in the metadata field \"`relationships`\" with `\"type\"=Child` recursively to build a tree structure. The tree represents the hierarchy of sub-organizations and will be outputted.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "VaYUrh5n2iq9",
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Prerequisites:\n",
+    "!pip install requests\n",
+    "import requests # dependency to make HTTP calls\n",
+    "!pip install anytree\n",
+    "from anytree import Node, RenderTree # dependency to construct tree structure"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "YmmrIwjv3QqG"
+   },
+   "source": [
+    "The input value for all following queries is a ROR URL or ID, e.g. '`https://ror.org/03vek6s52`' or '`03vek6s52`'."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "executionInfo": {
+     "elapsed": 19,
+     "status": "ok",
+     "timestamp": 1643279169086,
+     "user": {
+      "displayName": "Sandra M",
+      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GjjehryRcYlqHNFf_9Q6slGN_VZPE0y5QkvOxzG=s64",
+      "userId": "04602594913862593282"
+     },
+     "user_tz": -60
+    },
+    "id": "UhY0RQcU3Q1Z"
+   },
+   "outputs": [],
+   "source": [
+    "# input parameter\n",
+    "example_ror=\"https://ror.org/03vek6s52\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "4MApaFfD6knr"
+   },
+   "source": [
+    "We may use it to query the ROR API once for the organization's metadata..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "executionInfo": {
+     "elapsed": 16,
+     "status": "ok",
+     "timestamp": 1643279169087,
+     "user": {
+      "displayName": "Sandra M",
+      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GjjehryRcYlqHNFf_9Q6slGN_VZPE0y5QkvOxzG=s64",
+      "userId": "04602594913862593282"
+     },
+     "user_tz": -60
+    },
+    "id": "YpwZ3mrC3dHO"
+   },
+   "outputs": [],
+   "source": [
+    "# URL to ROR API\n",
+    "ROR_API_ENDPOINT = \"https://api.ror.org/organizations\"\n",
+    "\n",
+    "# query ROR API for organization's metadata\n",
+    "def query_ror_api(ror):\n",
+    "    complete_url=requests.utils.requote_uri(ROR_API_ENDPOINT + \"/\" + ror)\n",
+    "    response = requests.get(url=complete_url,\n",
+    "                            headers={'Accept': 'application/json'})\n",
+    "    response.raise_for_status()\n",
+    "    result=response.json()\n",
+    "    return result\n",
+    "\n",
+    "\n",
+    "#---- example execution\n",
+    "# uncomment following lines to see the metadata for specified example_ror\n",
+    "#import pprint\n",
+    "#organization = query_ror_api(example_ror)\n",
+    "#pprint.pprint(organization)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "T25jtRBf3c1T"
+   },
+   "source": [
+    "But in this notebook we use it as a starting point to recursively query the ROR API using the relationship type \"`Child`\" to construct the organizational hierarchy below it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "executionInfo": {
+     "elapsed": 6559,
+     "status": "ok",
+     "timestamp": 1643279175634,
+     "user": {
+      "displayName": "Sandra M",
+      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GjjehryRcYlqHNFf_9Q6slGN_VZPE0y5QkvOxzG=s64",
+      "userId": "04602594913862593282"
+     },
+     "user_tz": -60
+    },
+    "id": "4jh1liI56A4x",
+    "outputId": "ea1cbf18-6c90-4aa3-bfcb-10addcaa9e8f"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Node('/Harvard University')\n",
+      "├── Node('/Harvard University/Athinoula A. Martinos Center for Biomedical Imaging')\n",
+      "├── Node('/Harvard University/Berenson Allen Center for Noninvasive Brain Stimulation')\n",
+      "├── Node('/Harvard University/Center for Astrophysics Harvard & Smithsonian')\n",
+      "│   ├── Node('/Harvard University/Center for Astrophysics Harvard & Smithsonian/Harvard College Observatory')\n",
+      "│   └── Node('/Harvard University/Center for Astrophysics Harvard & Smithsonian/Smithsonian Astrophysical Observatory')\n",
+      "├── Node('/Harvard University/Center for Systems Biology')\n",
+      "├── Node('/Harvard University/Center for Vascular Biology Research')\n",
+      "├── Node('/Harvard University/Gordon Center for Medical Imaging')\n",
+      "├── Node('/Harvard University/Harvard Stem Cell Institute')\n",
+      "├── Node('/Harvard University/Harvard University Press')\n",
+      "├── Node('/Harvard University/MIT-Harvard Center for Ultracold Atoms')\n",
+      "├── Node('/Harvard University/Ragon Institute of MGH, MIT and Harvard')\n",
+      "├── Node('/Harvard University/Sleep and Human Health Institute')\n",
+      "└── Node('/Harvard University/The NSF AI Institute for Artificial Intelligence and Fundamental Interactions')\n"
+     ]
+    }
+   ],
+   "source": [
+    "# construct organizational tree recursively starting at given ROR\n",
+    "def construct_tree(ror, parent=None):\n",
+    "    organization = query_ror_api(ror)\n",
+    "    current_node = Node(organization[\"name\"], parent=parent)\n",
+    "\n",
+    "    for rel in organization['relationships']:\n",
+    "        if rel[\"type\"]==\"Child\":\n",
+    "            construct_tree(rel[\"id\"], current_node)\n",
+    "\n",
+    "    return current_node\n",
+    "\n",
+    "\n",
+    "#---- example execution\n",
+    "organigram = construct_tree(example_ror)\n",
+    "print(RenderTree(organigram))"
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "authorship_tag": "ABX9TyPEl1r8wtGqaTN4CJj87Lso",
+   "name": "ror-organigram.ipynb",
+   "provenance": [
+    {
+     "file_id": "1yJn6R6ixeEZFU47XyeZsTiqhU-W7AS2H",
+     "timestamp": 1643279199327
+    }
+   ]
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
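
The recursion in `construct_tree` can be tried without network access. The following is a minimal sketch, not the notebook's code: it assumes the metadata lookup is passed in as a function, returns plain dictionaries instead of `anytree` nodes, and adds a `seen` set as a guard against cyclic `Child` relationships, which the notebook itself does not include. The names `build_hierarchy`, `render` and `FAKE_RECORDS`, and the sample records, are hypothetical.

```python
def build_hierarchy(ror, fetch, seen=None):
    """Return a nested dict {"id", "name", "children"} for the organization `ror`.

    `fetch` is any callable that maps an identifier to a metadata record
    with a "name" and an optional "relationships" list.
    """
    seen = set() if seen is None else seen
    seen.add(ror)
    record = fetch(ror)
    node = {"id": ror, "name": record["name"], "children": []}
    for rel in record.get("relationships", []):
        # Follow only "Child" links, and skip organizations already visited.
        if rel["type"] == "Child" and rel["id"] not in seen:
            node["children"].append(build_hierarchy(rel["id"], fetch, seen))
    return node


def render(node, depth=0):
    """Return the hierarchy as a list of lines, indented two spaces per level."""
    lines = ["  " * depth + node["name"]]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical in-memory records standing in for ROR API responses.
FAKE_RECORDS = {
    "org-a": {"name": "Org A", "relationships": [
        {"type": "Child", "id": "org-b"},
        {"type": "Related", "id": "org-x"},
    ]},
    "org-b": {"name": "Org B", "relationships": [
        {"type": "Parent", "id": "org-a"},
        {"type": "Child", "id": "org-a"},  # deliberate cycle for the guard
    ]},
}

tree = build_hierarchy("org-a", FAKE_RECORDS.__getitem__)
print("\n".join(render(tree)))
```

Passing the notebook's `query_ror_api` as `fetch` would run the same traversal against the live API, one request per organization.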

organization-people/README.md

Lines changed: 3 additions & 3 deletions
@@ -3,6 +3,6 @@
 A collection of Jupyter notebooks showing examples of using a persistent identifier for an organization (here ROR ID) as input for different APIs of PID providers or PID Graphs and retrieving all people (identified by an ORCID iD) connected to it.
 
 Currently available PID Graphs:
-* [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) [![Google Colab](https://badgen.net/badge/Launch/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/Project-TAPIR/pidgraph-notebooks/blob/main/organization-people/freya_get_people_by_organization.ipynb)
-* [OpenAlex](https://openalex.org/about)[![Google Colab](https://badgen.net/badge/Launch/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/Project-TAPIR/pidgraph-notebooks/blob/main/organization-people/openalex_get_people_by_organization.ipynb)
-* [ORCID](https://orcid.org/)[![Google Colab](https://badgen.net/badge/Launch/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/Project-TAPIR/pidgraph-notebooks/blob/main/organization-people/orcid_get_people_by_organization.ipynb)
+* [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/)
+* [OpenAlex](https://openalex.org/about)
+* [ORCID](https://orcid.org/)
