Commit 0259863

Organization people (#2)
* query for openalex
* query for freya
* query orcid for current employees
* add readme with google colab links
1 parent d84029b commit 0259863

4 files changed (+11 -0 lines changed)

organization-people/README.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
## organization-people

A collection of Jupyter notebooks showing how to use a persistent identifier for an organization (here, a ROR ID) as input for the APIs of different PID providers and PID Graphs in order to retrieve all people (identified by an ORCID iD) connected to it.

Currently available PID Graphs and PID provider APIs:

* [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) [![Google Colab](https://badgen.net/badge/Launch/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/Project-TAPIR/pidgraph-notebooks/blob/main/organization-people/freya_get_people_by_organization.ipynb)
* [OpenAlex](https://openalex.org/about) [![Google Colab](https://badgen.net/badge/Launch/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/Project-TAPIR/pidgraph-notebooks/blob/main/organization-people/openalex_get_people_by_organization.ipynb)
* [ORCID](https://orcid.org/) [![Google Colab](https://badgen.net/badge/Launch/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/Project-TAPIR/pidgraph-notebooks/blob/main/organization-people/orcid_get_people_by_organization.ipynb)
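The FREYA and OpenAlex notebooks in this commit identify each person by an ORCID iD, obtained by stripping the `https://orcid.org/` prefix from the identifier URL each API returns. As a minimal sketch (not part of this commit), a shared helper could perform that step and also verify the iD's last character, which ORCID defines as an ISO 7064 MOD 11-2 check digit. The names `bare_orcid` and `orcid_checksum_ok`, and the extra `http://` prefix, are hypothetical additions for illustration.

```python
# Hypothetical helpers, not part of the notebooks in this commit.
ORCID_URL_PREFIXES = ("https://orcid.org/", "http://orcid.org/")


def bare_orcid(value: str) -> str:
    """Strip an orcid.org URL prefix, leaving the bare iD (e.g. 0000-0002-1825-0097)."""
    for prefix in ORCID_URL_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the last character of a bare ORCID iD with ISO 7064 MOD 11-2."""
    chars = orcid.replace("-", "")
    base = chars[:15]
    # An iD has 16 characters: 15 decimal digits plus a check character (0-9 or X).
    if len(chars) != 16 or not all(c in "0123456789" for c in base):
        return False
    total = 0
    for digit in base:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected
```

For example, `orcid_checksum_ok(bare_orcid("https://orcid.org/0000-0002-5925-043X"))` returns `True` for one of the iDs in the FREYA notebook's output, while a mistyped final digit makes it return `False`.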
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Copy of Copy of freya_get_people_by_organization.ipynb","provenance":[{"file_id":"https://github.com/Project-TAPIR/pidgraph-notebooks/blob/organization-people/organization-people/freya_get_people_by_organization.ipynb","timestamp":1643208926409}],"authorship_tag":"ABX9TyOPyixqZithrfY0TncA4o1K"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["### Query the FREYA PID Graph for all people affiliated with an organization\n","\n","This notebook queries the [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) via [DataCite's GraphQL API](https://api.datacite.org/graphql) to retrieve all people affiliated with an organization. It takes a ROR URL as input, which is used to retrieve the corresponding GRID and Ringgold IDs of the organization; these are then used to query the ORCID API [for affiliated people](https://info.orcid.org/faq/how-do-i-find-orcid-record-holders-at-my-institution/). From the resulting list of people we output the ORCID iDs."],"metadata":{"id":"etxiXTW668ZD"}},{"cell_type":"code","source":["# needed dependency to make HTTP calls\n","import requests\n","# dependencies for dealing with json\n","!pip install python-benedict\n","from benedict import benedict"],"metadata":{"id":"8Mk7-aYc7x3A"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["The input for the query is a ROR URL."],"metadata":{"id":"J31_ejB6bWqd"}},{"cell_type":"code","source":["# input parameter for all further computations\n","example_ror=\"https://ror.org/021k10z87\""],"metadata":{"id":"UwYUsbnMbZnI","executionInfo":{"status":"ok","timestamp":1643208788232,"user_tz":-60,"elapsed":15,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":2,"outputs":[]},{"cell_type":"markdown","source":["We use it to query DataCite's GraphQL API for the organization's metadata and all people connected to it. 
Since the API uses pagination, we need to loop through all pages to get the complete result set.\n"],"metadata":{"id":"ba_A3Anpbl4P"}},{"cell_type":"code","source":["# Datacite's GraphQL endpoint for the FREYA PID Graph\n","DATACITE_GRAPHQL_API = \"https://api.datacite.org/graphql\"\n","\n","# Query to retrieve an organization and all its affiliated people\n","QUERY_ORGA2PEOPLE = \"\"\"query organization($ror :ID!, $after:String){\n","organization(id: $ror) {\n"," people(first: 1000, after: $after) {\n"," totalCount\n"," pageInfo {\n"," endCursor\n"," hasNextPage\n"," }\n","\n"," nodes {\n"," id\n"," name\n"," givenName\n"," }\n"," }\n"," }\n","}\"\"\"\n","\n","# query all people that are connected to given ROR\n","def download_data(ror):\n"," continue_paginating = True\n"," cursor=\"\"\n"," while continue_paginating:\n"," vars = {'ror': ror, 'after': cursor}\n"," response = requests.post(url=DATACITE_GRAPHQL_API,\n"," json={'query': QUERY_ORGA2PEOPLE, 'variables': vars},\n"," headers={'Content-Type': 'application/json'})\n"," result=response.json()\n","\n"," # check if next page exists and set cursor to next page\n"," continue_paginating = has_next_page(result)\n"," cursor = next_cursor(result)\n"," yield result\n","\n","# check if there is another page with results to query\n","def has_next_page(response_data):\n"," resp_dict = benedict.from_json(response_data)\n"," has_next_page = resp_dict.get(\"data.organization.people.pageInfo.hasNextPage\")\n"," return has_next_page\n","\n","# set cursor to next value\n","def next_cursor(response_data):\n"," resp_dict = benedict.from_json(response_data)\n"," cursor = resp_dict.get(\"data.organization.people.pageInfo.endCursor\")\n"," return cursor\n","\n","\n","#--- example 
execution\n","list_of_pages=download_data(example_ror)"],"metadata":{"id":"7FAu2l388OeD","executionInfo":{"status":"ok","timestamp":1643208819281,"user_tz":-60,"elapsed":226,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":4,"outputs":[]},{"cell_type":"markdown","source":["From the returned pages we extract the list of people."],"metadata":{"id":"2lR-J8vUcI5-"}},{"cell_type":"code","source":["# from the result pages we get from the GraphQL API, extract the data about the people\n","def extract_people_from_pages(list_of_pages):\n"," for page in list_of_pages:\n"," page_dict=benedict.from_json(page)\n"," for person in page_dict.get('data.organization.people.nodes'):\n"," yield person\n","\n","#--- example execution\n","people=extract_people_from_pages(list_of_pages)"],"metadata":{"id":"lQqnqydz2hUh","executionInfo":{"status":"ok","timestamp":1643208827139,"user_tz":-60,"elapsed":261,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":5,"outputs":[]},{"cell_type":"markdown","source":["From each person's metadata we extract and print out their name and ORCID iD."],"metadata":{"id":"FwJxfB_12wtY"}},{"cell_type":"code","source":["# extract ORCID from person\n","def extract_orcid(person):\n"," person_dict = benedict.from_json(person)\n"," orcid = person_dict.get('id').replace(\"https://orcid.org/\", \"\")\n"," name = person_dict.get('name')\n"," return orcid, name\n","\n","#--- example execution\n","for person in people:\n"," orcid, name = extract_orcid(person)\n"," print(f\"{orcid}, {name}\")"],"metadata":{"id":"aCYx1t4P3Bpu","executionInfo":{"status":"ok","timestamp":1643208836439,"user_tz":-60,"elapsed":2988,"user":{"displayName":"","photoUrl":"","userId":""}},"outputId":"1c350aa6-6659-4ff9-990d-e0309706941b","colab":{"base_uri":"https://localhost:8080/"}},"execution_count":6,"outputs":[{"output_type":"stream","name":"stdout","text":["0000-0002-3783-6130, Irene Weipert-Fenner\n","0000-0002-5452-0488, Hans-Joachim 
Spanger\n","0000-0001-6746-1248, Anton Peez\n","0000-0001-6731-5304, Julia Eckert\n","0000-0003-1575-9688, Hendrik Simon\n","0000-0002-1712-2624, Julian Junk\n","0000-0003-0035-5840, Raphael Oidtmann\n","0000-0002-8739-2486, Elvira Rosert\n","0000-0002-5925-043X, Ariadne Natal\n","0000-0002-7012-6739, Peter Kreuzer\n","0000-0001-7843-4480, Dirk Peters\n","0000-0003-0039-9827, Eldad Ben Aharon\n","0000-0001-6823-6819, Janna Lisa Chalmovsky\n","0000-0003-1940-8877, Mikhail Polianskii\n","0000-0002-4259-6071, Felix S. Bethke\n","0000-0001-7286-3575, Paul Chambers\n"]}]}]}
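A minimal sketch of the same cursor-based pagination, written with plain dictionary access instead of `python-benedict`. It assumes the response shape requested by `QUERY_ORGA2PEOPLE` in the notebook above; `post_graphql`, `people_connection` and `iter_people` are hypothetical names, and the HTTP call is passed in as a parameter so the paging loop can be exercised without network access. One design choice differs from the notebook: the first request sends `after: null`, which is how GraphQL expresses "no cursor" (the notebook's empty string evidently also worked against the live API).

```python
from typing import Callable, Iterator, Optional

DATACITE_GRAPHQL_API = "https://api.datacite.org/graphql"

# Same fields as QUERY_ORGA2PEOPLE in the notebook above.
QUERY_ORGA2PEOPLE = """query organization($ror: ID!, $after: String) {
  organization(id: $ror) {
    people(first: 1000, after: $after) {
      totalCount
      pageInfo { endCursor hasNextPage }
      nodes { id name givenName }
    }
  }
}"""


def post_graphql(query: str, variables: dict) -> dict:
    """Send one GraphQL request to DataCite and return the decoded JSON body."""
    import requests  # imported here so the paging logic below runs without it

    response = requests.post(
        DATACITE_GRAPHQL_API,
        json={"query": query, "variables": variables},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()


def people_connection(page: dict) -> dict:
    """Return data.organization.people, or {} if any level is missing or null."""
    organization = (page.get("data") or {}).get("organization") or {}
    return organization.get("people") or {}


def iter_people(ror: str, fetch: Callable[[str, dict], dict] = post_graphql) -> Iterator[dict]:
    """Yield every person node, following endCursor while hasNextPage is true."""
    cursor: Optional[str] = None  # null cursor: start at the first page
    while True:
        page = fetch(QUERY_ORGA2PEOPLE, {"ror": ror, "after": cursor})
        people = people_connection(page)
        yield from people.get("nodes") or []
        page_info = people.get("pageInfo") or {}
        cursor = page_info.get("endCursor")
        # Stop when the API reports no further page or gives no cursor to resume from.
        if not page_info.get("hasNextPage") or not cursor:
            break
```

With network access, `for person in iter_people("https://ror.org/021k10z87"): print(person["id"], person["name"])` should list the same people as the notebook's last cell. The extra `not cursor` check guards against requesting the first page forever if a response ever claims another page without supplying a cursor.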
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
{"metadata":{"language_info":{"name":"python","version":"3.7.8","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"colab":{"name":"Copy of Copy of openalex_get_people_by_organization.ipynb","provenance":[{"file_id":"https://github.com/Project-TAPIR/pidgraph-notebooks/blob/organization-people/organization-people/openalex_get_people_by_organization.ipynb","timestamp":1643210429142}],"collapsed_sections":[]}},"nbformat_minor":5,"nbformat":4,"cells":[{"cell_type":"markdown","source":["### Query OpenAlex for all people affiliated with an organization\n","This notebook queries the [OpenAlex API](https://docs.openalex.org/api) via its '`/authors`' endpoint for all authors affiliated with an organization.\n","It takes a ROR URL as input, which is used to retrieve all authors whose metadata field '`last_known_institution.ror`' holds that ROR ID. 
From the resulting list of people we output their respective ORCID iDs."],"metadata":{"id":"ac7bedaf-05fb-4eb0-9bf5-e4d1d68a08c3"},"id":"ac7bedaf-05fb-4eb0-9bf5-e4d1d68a08c3"},{"cell_type":"code","source":["# needed dependency to make HTTP calls\n","import requests"],"metadata":{"id":"IUqshUWKwSk2","executionInfo":{"status":"ok","timestamp":1643210415322,"user_tz":-60,"elapsed":8,"user":{"displayName":"","photoUrl":"","userId":""}}},"id":"IUqshUWKwSk2","execution_count":1,"outputs":[]},{"cell_type":"markdown","source":["The input for the query is a ROR URL."],"metadata":{"id":"nSJjdkxGdWll"},"id":"nSJjdkxGdWll"},{"cell_type":"code","source":["# input parameter\n","example_ror=\"https://ror.org/021k10z87\""],"metadata":{"id":"7EryzPledIp6","executionInfo":{"status":"ok","timestamp":1643210415322,"user_tz":-60,"elapsed":6,"user":{"displayName":"","photoUrl":"","userId":""}}},"id":"7EryzPledIp6","execution_count":2,"outputs":[]},{"cell_type":"markdown","source":["We use it to query the OpenAlex API for authors that specified the organization's ROR ID in the field '`last_known_institution.ror`'. 
Since the OpenAlex API uses [pagination](https://docs.openalex.org/api/get-lists-of-entities#pagination), we need to loop through all pages to get the complete result set."],"metadata":{"id":"MiXVDKXid9tq"},"id":"MiXVDKXid9tq"},{"cell_type":"code","source":["# OpenAlex endpoint to query for authors\n","OPENALEX_API_AUTHORS = \"https://api.openalex.org/authors\"\n","\n","# query all people that are connected to given ROR\n","def download_data(ror):\n"," page = 1\n"," max_page = 1\n"," while page <= max_page:\n"," params = {'filter': 'last_known_institution.ror:'+ror, 'page': page}\n","\n"," response = requests.get(url=OPENALEX_API_AUTHORS,\n"," params=params,\n"," headers= {'Content-Type': 'application/json'})\n"," result=response.json()\n","\n"," # calculate max page number in first loop\n"," if max_page == 1:\n"," max_page = determine_max_page(result)\n"," page = page + 1\n"," yield result\n","\n","# calculate max number of result pages\n","def determine_max_page(response_data):\n"," item_count = response_data['meta']['count']\n"," items_per_page = response_data['meta']['per_page']\n"," max_page_ceil = item_count // items_per_page + bool(item_count % items_per_page)\n"," return max_page_ceil\n","\n","\n","#-- example execution\n","list_of_pages=download_data(example_ror)"],"metadata":{"trusted":true,"id":"8b608640-96a8-47d1-9de7-b7d3f6fd5a47","executionInfo":{"status":"ok","timestamp":1643210415323,"user_tz":-60,"elapsed":5,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":3,"outputs":[],"id":"8b608640-96a8-47d1-9de7-b7d3f6fd5a47"},{"cell_type":"markdown","source":["From the resulting list of people we extract and print out each ORCID and name."],"metadata":{"id":"CwRzvAQweuoW"},"id":"CwRzvAQweuoW"},{"cell_type":"code","source":["# extract all ORCIDs from the result\n","def extract_orcids(data):\n"," for author in data['results']:\n"," try:\n"," orcid=author['ids']['orcid'].replace(\"https://orcid.org/\", \"\")\n"," 
name=author['display_name']\n"," yield orcid, name\n"," except (KeyError,AttributeError) as e:\n"," pass\n","\n","#-- example execution\n","for page in list_of_pages:\n"," for orcid,name in extract_orcids(page):\n"," print(f\"{orcid}, {name}\")"],"metadata":{"trusted":true,"colab":{"base_uri":"https://localhost:8080/"},"id":"1c36737c-4dcf-42d5-80e2-802f0a7a8326","outputId":"5efd986b-0b92-4b0d-e5cc-a65aeae2785e","executionInfo":{"status":"ok","timestamp":1643210418504,"user_tz":-60,"elapsed":3186,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":4,"outputs":[{"output_type":"stream","name":"stdout","text":["0000-0002-3824-5375, Nicole Deitelhoff\n","0000-0002-7348-7206, Jonas Wolff\n","0000-0002-6891-770X, Francis O’Connor\n","0000-0002-3536-8898, Felix Anderl\n","0000-0002-4259-6071, Felix S. Bethke\n","0000-0002-3136-0901, Thorsten Gromes\n","0000-0001-9698-2616, Annika Elena Poppe\n","0000-0002-3783-6130, Irene Weipert-Fenner\n","0000-0002-4793-9010, Arvid Bell\n","0000-0002-7012-6739, Peter Kreuzer\n","0000-0002-0143-5183, Christina Kohler\n"]}],"id":"1c36737c-4dcf-42d5-80e2-802f0a7a8326"}]}
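A minimal sketch of the same page-number pagination, restructured so the HTTP call can be swapped out for testing and so authors without an ORCID iD are skipped by an explicit check rather than by catching `KeyError`/`AttributeError`. The endpoint URL and the `last_known_institution.ror` filter are taken from the notebook; `get_authors_page`, `page_count`, `iter_authors` and `orcids_and_names` are hypothetical names. OpenAlex field names may have changed since this commit, so check the current OpenAlex documentation before relying on the filter.

```python
from typing import Callable, Iterable, Iterator, Tuple

OPENALEX_API_AUTHORS = "https://api.openalex.org/authors"


def get_authors_page(params: dict) -> dict:
    """Fetch one page of results from the OpenAlex /authors endpoint."""
    import requests  # imported here so the paging logic below runs without it

    response = requests.get(OPENALEX_API_AUTHORS, params=params, timeout=60)
    response.raise_for_status()
    return response.json()


def page_count(item_count: int, items_per_page: int) -> int:
    """Ceiling of item_count / items_per_page, using integer arithmetic only."""
    return (item_count + items_per_page - 1) // items_per_page


def iter_authors(ror: str, fetch: Callable[[dict], dict] = get_authors_page) -> Iterator[dict]:
    """Yield all author records for a ROR, requesting pages 1..N (N read from page 1)."""
    page, max_page = 1, 1
    while page <= max_page:
        data = fetch({"filter": "last_known_institution.ror:" + ror, "page": page})
        if page == 1:
            # meta.count and meta.per_page are read once, from the first response.
            max_page = page_count(data["meta"]["count"], data["meta"]["per_page"])
        yield from data.get("results", [])
        page += 1


def orcids_and_names(authors: Iterable[dict]) -> Iterator[Tuple[str, str]]:
    """Yield (bare ORCID iD, display name), skipping authors without an ORCID."""
    for author in authors:
        orcid_url = (author.get("ids") or {}).get("orcid")
        if not orcid_url:
            continue  # explicit check instead of catching KeyError/AttributeError
        yield orcid_url.replace("https://orcid.org/", ""), author.get("display_name")
```

Usage mirrors the notebook's last cell: `for orcid, name in orcids_and_names(iter_authors("https://ror.org/021k10z87")): print(f"{orcid}, {name}")`. Reading the page count only from the first response avoids recomputing it on later pages; a result count of zero yields a page count of zero and ends the loop after the first request.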
