+{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Kopie von Kopie von freya_get_people_by_organization.ipynb","provenance":[{"file_id":"https://github.com/Project-TAPIR/pidgraph-notebooks/blob/organization-people/organization-people/freya_get_people_by_organization.ipynb","timestamp":1643208926409}],"authorship_tag":"ABX9TyOPyixqZithrfY0TncA4o1K"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["### Query the FREYA PID Graph for all people affiliated with an organization\n","\n","This notebook queries the [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) via [DataCite's GraphQL API](https://api.datacite.org/graphql) to retrieve all people affiliated with an organization. It takes a ROR URL as input, which is used to retrieve the corresponding GRID and Ringgold IDs of the organization and to query the ORCID API with them [for affiliated people](https://info.orcid.org/faq/how-do-i-find-orcid-record-holders-at-my-institution/). From the resulting list of people we output the ORCID iDs."],"metadata":{"id":"etxiXTW668ZD"}},{"cell_type":"code","source":["# needed dependency to make HTTP calls\n","import requests\n","# dependencies for dealing with json\n","!pip install python-benedict\n","from benedict import benedict"],"metadata":{"id":"8Mk7-aYc7x3A"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["The input for the query is a ROR URL."],"metadata":{"id":"J31_ejB6bWqd"}},{"cell_type":"code","source":["# input parameter for all further computations\n","example_ror=\"https://ror.org/021k10z87\""],"metadata":{"id":"UwYUsbnMbZnI","executionInfo":{"status":"ok","timestamp":1643208788232,"user_tz":-60,"elapsed":15,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":2,"outputs":[]},{"cell_type":"markdown","source":["We use it to query DataCite's GraphQL API for the organization's metadata and all people connected to it. 
Because the API paginates its results, we loop through all pages to get the complete result set.\n"],"metadata":{"id":"ba_A3Anpbl4P"}},{"cell_type":"code","source":["# DataCite's GraphQL endpoint for the FREYA PID Graph\n","DATACITE_GRAPHQL_API = \"https://api.datacite.org/graphql\"\n","\n","# Query to retrieve an organization and all its affiliated people\n","QUERY_ORGA2PEOPLE = \"\"\"query organization($ror: ID!, $after: String) {\n","  organization(id: $ror) {\n","    people(first: 1000, after: $after) {\n","      totalCount\n","      pageInfo {\n","        endCursor\n","        hasNextPage\n","      }\n","\n","      nodes {\n","        id\n","        name\n","        givenName\n","      }\n","    }\n","  }\n","}\"\"\"\n","\n","# query all people that are connected to the given ROR, yielding one result page at a time\n","def download_data(ror):\n","    continue_paginating = True\n","    cursor = \"\"\n","    while continue_paginating:\n","        variables = {'ror': ror, 'after': cursor}\n","        response = requests.post(url=DATACITE_GRAPHQL_API,\n","                                 json={'query': QUERY_ORGA2PEOPLE, 'variables': variables},\n","                                 headers={'Content-Type': 'application/json'})\n","        result = response.json()\n","\n","        # check if next page exists and set cursor to next page\n","        continue_paginating = has_next_page(result)\n","        cursor = next_cursor(result)\n","        yield result\n","\n","# check if there is another page with results to query\n","def has_next_page(response_data):\n","    # response_data is already a parsed dict, so wrap it directly for keypath access\n","    resp_dict = benedict(response_data)\n","    return resp_dict.get(\"data.organization.people.pageInfo.hasNextPage\")\n","\n","# get the cursor that points to the next page\n","def next_cursor(response_data):\n","    resp_dict = benedict(response_data)\n","    return resp_dict.get(\"data.organization.people.pageInfo.endCursor\")\n","\n","\n","#--- example 
execution\n","list_of_pages=download_data(example_ror)"],"metadata":{"id":"7FAu2l388OeD","executionInfo":{"status":"ok","timestamp":1643208819281,"user_tz":-60,"elapsed":226,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":4,"outputs":[]},{"cell_type":"markdown","source":["From the returned pages we extract the list of people."],"metadata":{"id":"2lR-J8vUcI5-"}},{"cell_type":"code","source":["# from the result pages we get from the GraphQL API, extract the data about the people\n","def extract_people_from_pages(list_of_pages):\n","    for page in list_of_pages:\n","        # each page is already a parsed dict, so wrap it directly for keypath access\n","        page_dict = benedict(page)\n","        # fall back to an empty list if the page contains no people\n","        for person in page_dict.get('data.organization.people.nodes') or []:\n","            yield person\n","\n","#--- example execution\n","people=extract_people_from_pages(list_of_pages)"],"metadata":{"id":"lQqnqydz2hUh","executionInfo":{"status":"ok","timestamp":1643208827139,"user_tz":-60,"elapsed":261,"user":{"displayName":"","photoUrl":"","userId":""}}},"execution_count":5,"outputs":[]},{"cell_type":"markdown","source":["From each person's metadata we extract and print out their name and ORCID iD."],"metadata":{"id":"FwJxfB_12wtY"}},{"cell_type":"code","source":["# extract ORCID iD and name from person\n","def extract_orcid(person):\n","    person_dict = benedict(person)\n","    # the id is the full ORCID URL; strip the prefix to get the bare iD\n","    orcid = person_dict.get('id').replace(\"https://orcid.org/\", \"\")\n","    name = person_dict.get('name')\n","    return orcid, name\n","\n","#--- example execution\n","for person in people:\n","    orcid, name = extract_orcid(person)\n","    print(f\"{orcid}, {name}\")"],"metadata":{"id":"aCYx1t4P3Bpu","executionInfo":{"status":"ok","timestamp":1643208836439,"user_tz":-60,"elapsed":2988,"user":{"displayName":"","photoUrl":"","userId":""}},"outputId":"1c350aa6-6659-4ff9-990d-e0309706941b","colab":{"base_uri":"https://localhost:8080/"}},"execution_count":6,"outputs":[{"output_type":"stream","name":"stdout","text":["0000-0002-3783-6130, Irene Weipert-Fenner\n","0000-0002-5452-0488, Hans-Joachim 
Spanger\n","0000-0001-6746-1248, Anton Peez\n","0000-0001-6731-5304, Julia Eckert\n","0000-0003-1575-9688, Hendrik Simon\n","0000-0002-1712-2624, Julian Junk\n","0000-0003-0035-5840, Raphael Oidtmann\n","0000-0002-8739-2486, Elvira Rosert\n","0000-0002-5925-043X, Ariadne Natal\n","0000-0002-7012-6739, Peter Kreuzer\n","0000-0001-7843-4480, Dirk Peters\n","0000-0003-0039-9827, Eldad Ben Aharon\n","0000-0001-6823-6819, Janna Lisa Chalmovsky\n","0000-0003-1940-8877, Mikhail Polianskii\n","0000-0002-4259-6071, Felix S. Bethke\n","0000-0001-7286-3575, Paul Chambers\n"]}]}]}