From ffc781dce38fc24cf79576e256602b738db54a03 Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 13:05:51 -0500 Subject: [PATCH 01/17] Tried to improve text. --- documentation/Deployment.md | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/documentation/Deployment.md b/documentation/Deployment.md index 5fc144d4..a28e4ae6 100644 --- a/documentation/Deployment.md +++ b/documentation/Deployment.md @@ -34,20 +34,24 @@ instance or from Translator. 6. Start the Solr and NameRes pods by running `docker-compose up`. By default, Docker Compose will download and start the relevant pods and show you logs from both sources. You may press `Ctrl+C` to stop the pods. -7. Look for a line similar to `Uvicorn running on http://0.0.0.0:2433 (Press CTRL+C to quit)`, - which tells you where NameRes is running. - * By default, the web frontend (http://0.0.0.0:2433/docs) defaults to using the - [NameRes RENCI Dev](https://name-resolution-sri.renci.org/docs) — you will need to - change the "Servers" setting to use your local NameRes instance. - * Note that looking up http://0.0.0.0:2433/status will give you an error (`Expected core not found.`). - This is because the Solr database and indexes have not yet been loaded. -8. Run the Solr restore script using `bash`, i.e. `bash solr-restore/restore.sh`. This script +7. Wait for the Solr pod to start up. Once it's ready, you will need to run the Solr restore + script by running `bash data-loading/setup-and-load-solr.sh`. This will set up the types + and indexes used by NameRes. +8. Trigger the Solr restore by running the restore script using `bash`, i.e. `bash solr-restore/restore.sh`. This script assumes that the Solr pod is available on `localhost:8983` and contains a `var/solr/data/snapshot.backup` directory with the data to restore. 9. Look for the script to end properly (`Solr restore complete!`). 
Look up http://localhost:2433/status - to ensure that the database has been loaded as expected, and use http://localhost:2433/docs (after - changing the server) to try some test queries to make sure NameRes is working properly. -10. You can now delete the uncompressed database backup in `$SOLR_DATA/var` to save disk space. + to ensure that the database has been loaded as expected. You can now delete the uncompressed database + backup in `$SOLR_DATA/var` to save disk space. +10. With the default settings, NameRes should be running on localhost on port 2433 (i.e. http://localhost:2433/). + You should see a message in the NameRes pod log saying something like + `Uvicorn running on http://0.0.0.0:2433 (Press CTRL+C to quit)` to confirm this. + * By default, the web frontend (http://0.0.0.0:2433/docs) defaults to using the + [NameRes RENCI Dev](https://name-resolution-sri.renci.org/docs) — you will need to + change the "Servers" setting to use your local NameRes instance. + * If you try this before the restore has finished, looking up http://0.0.0.0:2433/status will give you an error + (`Expected core not found.`). This is because the Solr database and indexes have not yet been loaded. + Once this is finished, the NameRes instance should be ready to use. #### Loading from synonyms files From db300018d4bdb5c1b1c8ec6cf6e5fccb7b7a80be Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 13:36:32 -0500 Subject: [PATCH 02/17] Fixed instructions. --- documentation/Deployment.md | 31 +++++++++++++++---------------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/documentation/Deployment.md b/documentation/Deployment.md index a28e4ae6..99d2cd73 100644 --- a/documentation/Deployment.md +++ b/documentation/Deployment.md @@ -34,24 +34,23 @@ instance or from Translator. 6. Start the Solr and NameRes pods by running `docker-compose up`. By default, Docker Compose will download and start the relevant pods and show you logs from both sources. 
 You may press `Ctrl+C` to stop the pods.
-7. Wait for the Solr pod to start up. Once it's ready, you will need to run the Solr restore
-   script by running `bash data-loading/setup-and-load-solr.sh`. This will set up the types
-   and indexes used by NameRes.
-8. Trigger the Solr restore by running the restore script using `bash`, i.e. `bash solr-restore/restore.sh`. This script
-   assumes that the Solr pod is available on `localhost:8983` and contains a
-   `var/solr/data/snapshot.backup` directory with the data to restore.
-9. Look for the script to end properly (`Solr restore complete!`). Look up http://localhost:2433/status
+7. Trigger the Solr restore by running the restore script using `bash`, i.e.
+   `bash solr-restore/restore.sh`. This script assumes that the Solr pod is available on `localhost:8983`
+   and contains a `var/solr/data/snapshot.backup` directory with the data to restore. It will set up
+   some data types needed by NameRes and then trigger a restore of a backup. It will then go into a
+   sleep loop until the restore is complete, which should take 15-20 minutes.
+8. Check that the script ended properly (`Solr restore complete!`). Look up http://localhost:2433/status
    to ensure that the database has been loaded as expected. You can now delete the uncompressed database
    backup in `$SOLR_DATA/var` to save disk space.
-10. With the default settings, NameRes should be running on localhost on port 2433 (i.e. http://localhost:2433/).
-    You should see a message in the NameRes pod log saying something like
-    `Uvicorn running on http://0.0.0.0:2433 (Press CTRL+C to quit)` to confirm this.
-    * By default, the web frontend (http://0.0.0.0:2433/docs) defaults to using the
-      [NameRes RENCI Dev](https://name-resolution-sri.renci.org/docs) — you will need to
-      change the "Servers" setting to use your local NameRes instance.
-    * If you try this before the restore has finished, looking up http://0.0.0.0:2433/status will give you an error
-      (`Expected core not found.`). This is because the Solr database and indexes have not yet been loaded.
-    Once this is finished, the NameRes instance should be ready to use.
+9. With the default settings, NameRes should be running on localhost on port 2433 (i.e. http://localhost:2433/).
+   You should see a message in the NameRes pod log saying something like
+   `Uvicorn running on http://0.0.0.0:2433 (Press CTRL+C to quit)` to confirm this.
+   * By default, the web frontend (http://0.0.0.0:2433/docs) queries the
+     [NameRes RENCI Dev](https://name-resolution-sri.renci.org/docs) instance; you will need to
+     change the "Servers" setting to use your local NameRes instance.
+   * If you try this before the restore has finished, looking up http://0.0.0.0:2433/status will give you an error
+     (`Expected core not found.`). This is because the Solr database and indexes have not yet been loaded.
+   Once the restore is finished, the NameRes instance should be ready to use.
 
 #### Loading from synonyms files
 

From f4f9aa6fe997452c6cbd2f8473a053d3ade89445 Mon Sep 17 00:00:00 2001
From: Gaurav Vaidya
Date: Fri, 6 Feb 2026 13:37:22 -0500
Subject: [PATCH 03/17] Improved section title.

---
 documentation/Deployment.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/documentation/Deployment.md b/documentation/Deployment.md
index 99d2cd73..af61abe7 100644
--- a/documentation/Deployment.md
+++ b/documentation/Deployment.md
@@ -8,7 +8,7 @@ file, although you will need either (1) a set of synonyms files generated by Bab
 to load into Solr, or (2) a Solr database backup to load into Solr. The following
 instructions will work whichever of the two approaches you need to follow.
 
-### Starting NameRes locally with loading from a Solr backup
+### Starting NameRes locally by loading a Solr backup
 
 The simplest way to run NameRes locally is by using a Solr backup from another NameRes
 instance or from Translator.
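[Editorial note, not part of the patch series.] The Deployment.md steps above check http://localhost:2433/status by hand to decide whether the restore has finished. That check can be scripted. The sketch below assumes only what the documentation states, namely that an unloaded Solr makes `/status` return a body containing `Expected core not found.`; the helper name is illustrative, not part of NameRes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether a NameRes /status response body
# indicates that the Solr database behind it has been loaded.
nameres_status_ok() {
    # $1: the body returned by GET http://localhost:2433/status
    case "$1" in
        *"Expected core not found."*) return 1 ;;  # Solr core not restored yet
        *) return 0 ;;                             # no known error marker seen
    esac
}

# Illustrative use against a running instance:
#   body=$(curl -s http://localhost:2433/status)
#   nameres_status_ok "$body" && echo "NameRes is ready"
```

Keying on the error string rather than the HTTP code keeps the check consistent with what the instructions above tell a human reader to look for.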
From 8abd5b6b24398f519382ded31dac75137a08200e Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 13:48:42 -0500 Subject: [PATCH 04/17] Consolidate the database creation script. Closes #239. --- data-loading/setup-and-load-solr.sh | 141 ++------------------------- data-loading/setup_solr.sh | 145 ++++++++++++++++++++++++++++ solr-restore/restore.sh | 143 +-------------------------- 3 files changed, 154 insertions(+), 275 deletions(-) create mode 100644 data-loading/setup_solr.sh diff --git a/data-loading/setup-and-load-solr.sh b/data-loading/setup-and-load-solr.sh index 7b47fc82..ce487fbe 100755 --- a/data-loading/setup-and-load-solr.sh +++ b/data-loading/setup-and-load-solr.sh @@ -1,10 +1,10 @@ #!/usr/bin/env bash -SOLR_PORT=8983 +SOLR_SERVER="http://localhost:8983" is_solr_up(){ - echo "Checking if solr is up on http://localhost:$SOLR_PORT/solr/admin/cores" - http_code=`echo $(curl -s -o /dev/null -w "%{http_code}" "http://localhost:$SOLR_PORT/solr/admin/cores")` + echo "Checking if solr is up on http://$SOLR_SERVER/solr/admin/cores" + http_code=`echo $(curl -s -o /dev/null -w "%{http_code}" "http://$SOLR_SERVER/solr/admin/cores")` echo $http_code return `test $http_code = "200"` } @@ -17,136 +17,7 @@ wait_for_solr(){ wait_for_solr -# add collection -curl -X POST 'http://localhost:8983/solr/admin/collections?action=CREATE&name=name_lookup&numShards=1&replicationFactor=1' - -# do not autocreate fields -curl 'http://localhost:8983/solr/name_lookup/config' -d '{"set-user-property": {"update.autoCreateFields": "false"}}' - -# add lowercase text type -curl -X POST -H 'Content-type:application/json' --data-binary '{ - "add-field-type" : { - "name": "LowerTextField", - "class": "solr.TextField", - "positionIncrementGap": "100", - "analyzer": { - "tokenizer": { - "class": "solr.StandardTokenizerFactory" - }, - "filters": [{ - "class": "solr.LowerCaseFilterFactory" - }] - } - } -}' 'http://localhost:8983/solr/name_lookup/schema' - -# add exactish text type 
(as described at https://stackoverflow.com/a/29105025/27310) -curl -X POST -H 'Content-type:application/json' --data-binary '{ - "add-field-type" : { - "name": "exactish", - "class": "solr.TextField", - "positionIncrementGap": "100", - "analyzer": { - "tokenizer": { - "class": "solr.KeywordTokenizerFactory" - }, - "filters": [{ - "class": "solr.LowerCaseFilterFactory" - }] - } - } -}' 'http://localhost:8983/solr/name_lookup/schema' - - - -# add fields -curl -X POST -H 'Content-type:application/json' --data-binary '{ - "add-field": [ - { - "name":"names", - "type":"LowerTextField", - "indexed":true, - "stored":true, - "multiValued":true - }, - { - "name":"names_exactish", - "type":"exactish", - "indexed":true, - "stored":false, - "multiValued":true - }, - { - "name":"curie", - "type":"string", - "stored":true - }, - { - "name":"preferred_name", - "type":"LowerTextField", - "stored":true - }, - { - "name":"preferred_name_exactish", - "type":"exactish", - "indexed":true, - "stored":false, - "multiValued":false - }, - { - "name":"types", - "type":"string", - "stored":true - "multiValued":true - }, - { - "name":"shortest_name_length", - "type":"pint", - "stored":true - }, - { - "name":"curie_suffix", - "type":"plong", - "docValues":true, - "stored":true, - "required":false, - "sortMissingLast":true - }, - { - "name":"taxa", - "type":"string", - "stored":true, - "multiValued":true - }, - { - "name":"taxon_specific", - "type":"boolean", - "stored":true, - "multiValued":false, - "sortMissingLast":true - }, - { - "name":"clique_identifier_count", - "type":"pint", - "stored":true - } - ] }' 'http://localhost:8983/solr/name_lookup/schema' - -# Add a copy field to copy names into names_exactish. -curl -X POST -H 'Content-type:application/json' --data-binary '{ - "add-copy-field": { - "source": "names", - "dest": "names_exactish" - } -}' 'http://localhost:8983/solr/name_lookup/schema' - -# Add a copy field to copy preferred_name into preferred_name_exactish. 
-curl -X POST -H 'Content-type:application/json' --data-binary '{ - "add-copy-field": { - "source": "preferred_name", - "dest": "preferred_name_exactish" - } -}' 'http://localhost:8983/solr/name_lookup/schema' +source "setup_solr.sh" # add data for f in $1; do @@ -154,9 +25,9 @@ for f in $1; do # curl -d @$f needs to load the entire file into memory before uploading it, whereas # curl -X POST -T $f will stream it. See https://github.com/TranslatorSRI/NameResolution/issues/194 curl -H 'Content-Type: application/json' -X POST -T $f \ - 'http://localhost:8983/solr/name_lookup/update/json/docs?processor=uuid&uuid.fieldName=id&commit=true' + 'http://$SOLR_SERVER/solr/name_lookup/update/json/docs?processor=uuid&uuid.fieldName=id&commit=true' sleep 30 done echo "Check solr" -curl -s --negotiate -u: 'localhost:8983/solr/name_lookup/query?q=*:*&rows=0' +curl -s --negotiate -u: '$SOLR_SERVER/solr/name_lookup/query?q=*:*&rows=0' diff --git a/data-loading/setup_solr.sh b/data-loading/setup_solr.sh new file mode 100644 index 00000000..5c8cde6f --- /dev/null +++ b/data-loading/setup_solr.sh @@ -0,0 +1,145 @@ +#!/usr/bin/env bash +# +# Set up the fields and types needed by NameRes. +# +# This file should be sourced, not called directly. 
+
+# require sourcing
+[[ "${BASH_SOURCE[0]}" != "$0" ]] || {
+  echo "Must be sourced: source $0" >&2
+  exit 1
+}
+
+# require SOLR_SERVER
+: "${SOLR_SERVER:?SOLR_SERVER must be set}"
+
+# add collection
+curl -X POST "http://$SOLR_SERVER/solr/admin/collections?action=CREATE&name=name_lookup&numShards=1&replicationFactor=1"
+
+# do not autocreate fields
+curl "http://$SOLR_SERVER/solr/name_lookup/config" -d '{"set-user-property": {"update.autoCreateFields": "false"}}'
+
+# add lowercase text type
+curl -X POST -H 'Content-type:application/json' --data-binary '{
+  "add-field-type" : {
+    "name": "LowerTextField",
+    "class": "solr.TextField",
+    "positionIncrementGap": "100",
+    "analyzer": {
+      "tokenizer": {
+        "class": "solr.StandardTokenizerFactory"
+      },
+      "filters": [{
+        "class": "solr.LowerCaseFilterFactory"
+      }]
+    }
+  }
+}' "http://$SOLR_SERVER/solr/name_lookup/schema"
+
+# add exactish text type (as described at https://stackoverflow.com/a/29105025/27310)
+curl -X POST -H 'Content-type:application/json' --data-binary '{
+  "add-field-type" : {
+    "name": "exactish",
+    "class": "solr.TextField",
+    "positionIncrementGap": "100",
+    "analyzer": {
+      "tokenizer": {
+        "class": "solr.KeywordTokenizerFactory"
+      },
+      "filters": [{
+        "class": "solr.LowerCaseFilterFactory"
+      }]
+    }
+  }
+}' "http://$SOLR_SERVER/solr/name_lookup/schema"
+
+
+
+# add fields
+curl -X POST -H 'Content-type:application/json' --data-binary '{
+  "add-field": [
+    {
+      "name":"names",
+      "type":"LowerTextField",
+      "indexed":true,
+      "stored":true,
+      "multiValued":true
+    },
+    {
+      "name":"names_exactish",
+      "type":"exactish",
+      "indexed":true,
+      "stored":false,
+      "multiValued":true
+    },
+    {
+      "name":"curie",
+      "type":"string",
+      "stored":true
+    },
+    {
+      "name":"preferred_name",
+      "type":"LowerTextField",
+      "stored":true
+    },
+    {
+      "name":"preferred_name_exactish",
+      "type":"exactish",
+      "indexed":true,
+      "stored":false,
+      "multiValued":false
+    },
+    {
+      "name":"types",
+      "type":"string",
+      "stored":true,
+
"multiValued":true + }, + { + "name":"shortest_name_length", + "type":"pint", + "stored":true + }, + { + "name":"curie_suffix", + "type":"plong", + "docValues":true, + "stored":true, + "required":false, + "sortMissingLast":true + }, + { + "name":"taxa", + "type":"string", + "stored":true, + "multiValued":true + }, + { + "name":"taxon_specific", + "type":"boolean", + "stored":true, + "multiValued":false, + "sortMissingLast":true + }, + { + "name":"clique_identifier_count", + "type":"pint", + "stored":true + } + ] }' "http://$SOLR_SERVER/solr/name_lookup/schema" + +# Add a copy field to copy names into names_exactish. +curl -X POST -H 'Content-type:application/json' --data-binary '{ + "add-copy-field": { + "source": "names", + "dest": "names_exactish" + } +}' "http://$SOLR_SERVER/solr/name_lookup/schema" + +# Add a copy field to copy preferred_name into preferred_name_exactish. +curl -X POST -H 'Content-type:application/json' --data-binary '{ + "add-copy-field": { + "source": "preferred_name", + "dest": "preferred_name_exactish" + } +}' "http://$SOLR_SERVER/solr/name_lookup/schema" diff --git a/solr-restore/restore.sh b/solr-restore/restore.sh index 69386fb5..2401424f 100644 --- a/solr-restore/restore.sh +++ b/solr-restore/restore.sh @@ -31,26 +31,10 @@ until [ "$response" = "200" ]; do done echo "SOLR is up and running at ${SOLR_SERVER}." -# Step 2. Create the COLLECTION_NAME if it doesn't exist. +# Step 3. Create fields for search. +source "../data-loading/setup_solr.sh" -EXISTS=$(wget -O - ${SOLR_SERVER}/solr/admin/collections?action=LIST | grep ${COLLECTION_NAME}) - -# create collection / shard if it doesn't exist. -if [ -z "$EXISTS" ] -then - wget -O- ${SOLR_SERVER}/solr/admin/collections?action=CREATE'&'name=${COLLECTION_NAME}'&'numShards=1'&'replicationFactor=1 - sleep 3 -fi - -# Step 3. Begin restoring the data. 
- -# Setup fields for search -wget --post-data '{"set-user-property": {"update.autoCreateFields": "false"}}' \ - --header='Content-Type:application/json' \ - -O- ${SOLR_SERVER}/solr/${COLLECTION_NAME}/config -sleep 1 - -# Restore data +# Step 4. Restore the data CORE_NAME=${COLLECTION_NAME}_shard1_replica_n1 RESTORE_URL="${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restore&location=/var/solr/data/var/solr/data/&name=${BACKUP_NAME}" wget -O - "$RESTORE_URL" @@ -63,124 +47,3 @@ until [ ! -z "$RESTORE_STATUS" ] ; do sleep 10 done echo "Solr restore complete" - -# Step 4. Create fields for search. -# (It might be possible to do this before the restore, but I'm going to follow the existing code for now.) -wget --post-data '{ - "add-field-type" : { - "name": "LowerTextField", - "class": "solr.TextField", - "positionIncrementGap": "100", - "analyzer": { - "tokenizer": { - "class": "solr.StandardTokenizerFactory" - }, - "filters": [{ - "class": "solr.LowerCaseFilterFactory" - }] - } - }}' \ - --header='Content-Type:application/json' \ - -O- ${SOLR_SERVER}/solr/${COLLECTION_NAME}/schema -sleep 1 -# exactish type taken from https://stackoverflow.com/a/29105025/27310 -wget --post-data '{ - "add-field-type" : { - "name": "exactish", - "class": "solr.TextField", - "analyzer": { - "tokenizer": { - "class": "solr.KeywordTokenizerFactory" - }, - "filters": [{ - "class": "solr.LowerCaseFilterFactory" - }] - } - }}' \ - --header='Content-Type:application/json' \ - -O- ${SOLR_SERVER}/solr/${COLLECTION_NAME}/schema -sleep 1 -wget --post-data '{ - "add-field": [ - { - "name":"names", - "type":"LowerTextField", - "stored": true, - "multiValued": true - }, - { - "name":"names_exactish", - "type":"exactish", - "indexed":true, - "stored":true, - "multiValued":true - }, - { - "name":"curie", - "type":"string", - "stored":true - }, - { - "name": "preferred_name", - "type": "LowerTextField", - "stored": true - }, - { - "name": "preferred_name_exactish", - "type": "exactish", - 
"indexed": true, - "stored": false, - "multiValued": false - }, - { - "name": "types", - "type": "string", - "stored": true, - "multiValued": true - }, - { - "name": "shortest_name_length", - "type": "pint", - "stored": true - }, - { - "name": "curie_suffix", - "type": "plong", - "docValues": true, - "stored": true, - "required": false, - "sortMissingLast": true - }, - { - "name": "taxa", - "type": "string", - "stored": true, - "multiValued": true - }, - { - "name": "clique_identifier_count", - "type": "pint", - "stored": true - } - ] - }' \ - --header='Content-Type:application/json' \ - -O- ${SOLR_SERVER}/solr/${COLLECTION_NAME}/schema -sleep 1 -wget --post-data '{ - "add-copy-field" : { - "source": "names", - "dest": "names_exactish" - }}' \ - --header='Content-Type:application/json' \ - -O- ${SOLR_SERVER}/solr/${COLLECTION_NAME}/schema -wget --post-data '{ - "add-copy-field" : { - "source": "preferred_name", - "dest": "preferred_name_exactish" - }}' \ - --header='Content-Type:application/json' \ - -O- ${SOLR_SERVER}/solr/${COLLECTION_NAME}/schema -sleep 1 - -echo "Solr restore complete!" From b63c4d15fa7165235bc985e99f105dcf794dcb92 Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 15:35:49 -0500 Subject: [PATCH 05/17] Improved things. --- docker-compose.yml | 3 ++- documentation/Deployment.md | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/docker-compose.yml b/docker-compose.yml index 78a38e74..9f398cfe 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -19,11 +19,12 @@ services: nameres: container_name: nameres + platform: linux/amd64 environment: - SOLR_HOST=name_solr - BABEL_VERSION= # e.g. 
2025mar31 - BABEL_VERSION_URL= # The URL of the Babel version URL - - LOCATION_VALUE=RENCI + - LOCATION_VALUE=localhost - MATURITY_VALUE=development ports: - '2433:2433' diff --git a/documentation/Deployment.md b/documentation/Deployment.md index af61abe7..d5d38649 100644 --- a/documentation/Deployment.md +++ b/documentation/Deployment.md @@ -31,7 +31,7 @@ instance or from Translator. * By default, the Docker Compose file will use the latest released version of NameRes as the frontend. To use the source code in this repository, you will need to change the build instructions for the `nameres` service in the Docker Compose file. -6. Start the Solr and NameRes pods by running `docker-compose up`. By default, Docker Compose +6. Start the Solr and NameRes pods by running `docker compose up`. By default, Docker Compose will download and start the relevant pods and show you logs from both sources. You may press `Ctrl+C` to stop the pods. 7. Trigger the Solr restore by running the restore script using `bash`, i.e. From a14f1d509d657a411d09f4a55e120a98ed044da6 Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 15:41:38 -0500 Subject: [PATCH 06/17] Renamed nameres to nameres_web to clarify things. --- docker-compose.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docker-compose.yml b/docker-compose.yml index 9f398cfe..52e41b37 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -17,8 +17,8 @@ services: source: ./data/solr target: /var/solr/data - nameres: - container_name: nameres + nameres_web: + container_name: nameres_web platform: linux/amd64 environment: - SOLR_HOST=name_solr From 4377ffa8c2a3263c2f5efedda776043ebdf996a7 Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 15:42:05 -0500 Subject: [PATCH 07/17] Renamed nameres_solr to make things clearer. 
--- docker-compose.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docker-compose.yml b/docker-compose.yml index 52e41b37..eeb04739 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -1,6 +1,6 @@ services: solr: - container_name: name_solr + container_name: nameres_solr image: solr:9.1 environment: # Change this setting to control how much memory you would like your Solr setup to have. @@ -21,7 +21,7 @@ services: container_name: nameres_web platform: linux/amd64 environment: - - SOLR_HOST=name_solr + - SOLR_HOST=nameres_solr - BABEL_VERSION= # e.g. 2025mar31 - BABEL_VERSION_URL= # The URL of the Babel version URL - LOCATION_VALUE=localhost From de5d15ddba07936bd70dbf9f33eec9e9d6faf4c7 Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 15:58:19 -0500 Subject: [PATCH 08/17] Improved deployment. --- docker-compose.yml | 5 +++-- documentation/Deployment.md | 4 ++++ 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/docker-compose.yml b/docker-compose.yml index eeb04739..1286253c 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -1,11 +1,12 @@ services: - solr: + nameres_solr: container_name: nameres_solr image: solr:9.1 + mem_limit: 18G environment: # Change this setting to control how much memory you would like your Solr setup to have. # Note that your Docker will need to be configured to allow this amount of memory. - SOLR_JAVA_MEM: '-Xms25G -Xmx25G' + SOLR_JAVA_MEM: '-Xmx16G' ports: - '8983:8983' command: ['-DzkRun'] diff --git a/documentation/Deployment.md b/documentation/Deployment.md index d5d38649..8fc98fa0 100644 --- a/documentation/Deployment.md +++ b/documentation/Deployment.md @@ -31,6 +31,10 @@ instance or from Translator. * By default, the Docker Compose file will use the latest released version of NameRes as the frontend. To use the source code in this repository, you will need to change the build instructions for the `nameres` service in the Docker Compose file. 
+ * By default, Solr will be given 16G of memory, which seems sufficient for testing. + If you want to run many Solr queries, you might want to increase this. To do this, + you will need to change BOTH the `mem_limit` setting in the `nameres_solr` service in + `docker-compose.yml` and the `SOLR_JAVA_MEM` setting. 6. Start the Solr and NameRes pods by running `docker compose up`. By default, Docker Compose will download and start the relevant pods and show you logs from both sources. You may press `Ctrl+C` to stop the pods. From 658a0002a08d8dca6ed44fc126877fcc17324e9f Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 16:08:13 -0500 Subject: [PATCH 09/17] Tried to clean up/improve scripts. --- data-loading/setup-and-load-solr.sh | 30 +++++++++++++---------------- data-loading/setup_solr.sh | 16 ++++++++------- solr-restore/restore.sh | 11 ++++++----- 3 files changed, 28 insertions(+), 29 deletions(-) diff --git a/data-loading/setup-and-load-solr.sh b/data-loading/setup-and-load-solr.sh index ce487fbe..b5160d19 100755 --- a/data-loading/setup-and-load-solr.sh +++ b/data-loading/setup-and-load-solr.sh @@ -2,32 +2,28 @@ SOLR_SERVER="http://localhost:8983" -is_solr_up(){ - echo "Checking if solr is up on http://$SOLR_SERVER/solr/admin/cores" - http_code=`echo $(curl -s -o /dev/null -w "%{http_code}" "http://$SOLR_SERVER/solr/admin/cores")` - echo $http_code - return `test $http_code = "200"` -} - -wait_for_solr(){ - while ! is_solr_up; do - sleep 3 - done -} - -wait_for_solr +# Step 1. Make sure the Solr service is up and running. +HEALTH_ENDPOINT="${SOLR_SERVER}/solr/admin/cores?action=STATUS" +response=$(wget --spider --server-response ${HEALTH_ENDPOINT} 2>&1 | grep "HTTP/" | awk '{ print $2 }') >&2 +until [ "$response" = "200" ]; do + response=$(wget --spider --server-response ${HEALTH_ENDPOINT} 2>&1 | grep "HTTP/" | awk '{ print $2 }') >&2 + echo " -- SOLR is unavailable - sleeping" + sleep 3 +done +echo "SOLR is up and running at ${SOLR_SERVER}." 
+# Step 2. Create fields for search. source "setup_solr.sh" -# add data +# Step 3. Load specified files. for f in $1; do echo "Loading $f..." # curl -d @$f needs to load the entire file into memory before uploading it, whereas # curl -X POST -T $f will stream it. See https://github.com/TranslatorSRI/NameResolution/issues/194 curl -H 'Content-Type: application/json' -X POST -T $f \ - 'http://$SOLR_SERVER/solr/name_lookup/update/json/docs?processor=uuid&uuid.fieldName=id&commit=true' + "$SOLR_SERVER/solr/name_lookup/update/json/docs?processor=uuid&uuid.fieldName=id&commit=true" sleep 30 done echo "Check solr" -curl -s --negotiate -u: '$SOLR_SERVER/solr/name_lookup/query?q=*:*&rows=0' +curl -s --negotiate -u: "$SOLR_SERVER/solr/name_lookup/query?q=*:*&rows=0" diff --git a/data-loading/setup_solr.sh b/data-loading/setup_solr.sh index 5c8cde6f..68c7ae99 100644 --- a/data-loading/setup_solr.sh +++ b/data-loading/setup_solr.sh @@ -13,11 +13,13 @@ # require SOLR_SERVER : "${SOLR_SERVER:?SOLR_SERVER must be set}" +echo "We are here with SOLR_SERVER='$SOLR_SERVER'" + # add collection -curl -X POST "http://$SOLR_SERVER/solr/admin/collections?action=CREATE&name=name_lookup&numShards=1&replicationFactor=1" +curl -X POST "$SOLR_SERVER/solr/admin/collections?action=CREATE&name=name_lookup&numShards=1&replicationFactor=1" # do not autocreate fields -curl "http://$SOLR_SERVER/solr/name_lookup/config" -d '{"set-user-property": {"update.autoCreateFields": "false"}}' +curl "$SOLR_SERVER/solr/name_lookup/config" -d '{"set-user-property": {"update.autoCreateFields": "false"}}' # add lowercase text type curl -X POST -H 'Content-type:application/json' --data-binary '{ @@ -34,7 +36,7 @@ curl -X POST -H 'Content-type:application/json' --data-binary '{ }] } } -}' "http://$SOLR_SERVER/solr/name_lookup/schema" +}' "$SOLR_SERVER/solr/name_lookup/schema" # add exactish text type (as described at https://stackoverflow.com/a/29105025/27310) curl -X POST -H 'Content-type:application/json' 
--data-binary '{ @@ -51,7 +53,7 @@ curl -X POST -H 'Content-type:application/json' --data-binary '{ }] } } -}' "http://$SOLR_SERVER/solr/name_lookup/schema" +}' "$SOLR_SERVER/solr/name_lookup/schema" @@ -126,7 +128,7 @@ curl -X POST -H 'Content-type:application/json' --data-binary '{ "type":"pint", "stored":true } - ] }' "http://$SOLR_SERVER/solr/name_lookup/schema" + ] }' "$SOLR_SERVER/solr/name_lookup/schema" # Add a copy field to copy names into names_exactish. curl -X POST -H 'Content-type:application/json' --data-binary '{ @@ -134,7 +136,7 @@ curl -X POST -H 'Content-type:application/json' --data-binary '{ "source": "names", "dest": "names_exactish" } -}' "http://$SOLR_SERVER/solr/name_lookup/schema" +}' "$SOLR_SERVER/solr/name_lookup/schema" # Add a copy field to copy preferred_name into preferred_name_exactish. curl -X POST -H 'Content-type:application/json' --data-binary '{ @@ -142,4 +144,4 @@ curl -X POST -H 'Content-type:application/json' --data-binary '{ "source": "preferred_name", "dest": "preferred_name_exactish" } -}' "http://$SOLR_SERVER/solr/name_lookup/schema" +}' "$SOLR_SERVER/solr/name_lookup/schema" diff --git a/solr-restore/restore.sh b/solr-restore/restore.sh index 2401424f..0f97e9fd 100644 --- a/solr-restore/restore.sh +++ b/solr-restore/restore.sh @@ -12,7 +12,7 @@ # This script should only require the `wget` program. # # TODO: This script does not currently implement any Blocklists. -set -xa +set -euo pipefail # Configuration options SOLR_SERVER="http://localhost:8983" @@ -31,11 +31,12 @@ until [ "$response" = "200" ]; do done echo "SOLR is up and running at ${SOLR_SERVER}." -# Step 3. Create fields for search. -source "../data-loading/setup_solr.sh" +# Step 2. Create fields for search. +SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" +source "$SCRIPT_DIR/../data-loading/setup_solr.sh" -# Step 4. Restore the data -CORE_NAME=${COLLECTION_NAME}_shard1_replica_n1 +# Step 3. 
Restore the data +CORE_NAME="${COLLECTION_NAME}_shard1_replica_n1" RESTORE_URL="${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restore&location=/var/solr/data/var/solr/data/&name=${BACKUP_NAME}" wget -O - "$RESTORE_URL" sleep 10 From 1e77a9198a76dba3c83f5f1aefbd811cf4a809e8 Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 16:08:50 -0500 Subject: [PATCH 10/17] Cleaned up output. --- data-loading/setup_solr.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data-loading/setup_solr.sh b/data-loading/setup_solr.sh index 68c7ae99..0ea2842f 100644 --- a/data-loading/setup_solr.sh +++ b/data-loading/setup_solr.sh @@ -13,7 +13,7 @@ # require SOLR_SERVER : "${SOLR_SERVER:?SOLR_SERVER must be set}" -echo "We are here with SOLR_SERVER='$SOLR_SERVER'" +echo "Setting up Solr database with SOLR_SERVER='$SOLR_SERVER'" # add collection curl -X POST "$SOLR_SERVER/solr/admin/collections?action=CREATE&name=name_lookup&numShards=1&replicationFactor=1" From 34fedac1877409bc064d12e3577e081b2db46717 Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 16:17:21 -0500 Subject: [PATCH 11/17] Improved (?) restore.sh. --- solr-restore/restore.sh | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/solr-restore/restore.sh b/solr-restore/restore.sh index 0f97e9fd..1816048f 100644 --- a/solr-restore/restore.sh +++ b/solr-restore/restore.sh @@ -34,17 +34,20 @@ echo "SOLR is up and running at ${SOLR_SERVER}." # Step 2. Create fields for search. SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" source "$SCRIPT_DIR/../data-loading/setup_solr.sh" +echo Solr database has been set up. # Step 3. 
Restore the data CORE_NAME="${COLLECTION_NAME}_shard1_replica_n1" +echo "Starting Solr restore on core ${CORE_NAME}, with status at ${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restorestatus" RESTORE_URL="${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restore&location=/var/solr/data/var/solr/data/&name=${BACKUP_NAME}" wget -O - "$RESTORE_URL" sleep 10 -RESTORE_STATUS=$(wget -q -O - ${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restorestatus 2>&1 | grep "success") >&2 +RESTORE_STATUS_URL="${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restorestatus" +RESTORE_STATUS=$(wget -q -O - "$RESTORE_STATUS_URL" 2>&1 | grep "success") >&2 echo "Restore status: ${RESTORE_STATUS}" -until [ ! -z "$RESTORE_STATUS" ] ; do - echo "Solr restore in progress. Note: if this takes too long please check solr health." - RESTORE_STATUS=$(wget -O - ${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restorestatus 2>&1 | grep "success") >&2 +until [ -n "$RESTORE_STATUS" ] ; do + echo "Solr restore in progress. Note: if this takes too long please check Solr health." + RESTORE_STATUS=$(wget -O - "$RESTORE_STATUS_URL" 2>&1 | grep "success") >&2 sleep 10 done echo "Solr restore complete" From 4fadc2808dddbadcda1e43ea43afa5f0ffa61e81 Mon Sep 17 00:00:00 2001 From: Gaurav Vaidya Date: Fri, 6 Feb 2026 16:23:42 -0500 Subject: [PATCH 12/17] Fixed some source code references. --- data-loading/setup-and-load-solr.sh | 4 +++- solr-restore/restore.sh | 5 +++-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/data-loading/setup-and-load-solr.sh b/data-loading/setup-and-load-solr.sh index b5160d19..db73b3e8 100755 --- a/data-loading/setup-and-load-solr.sh +++ b/data-loading/setup-and-load-solr.sh @@ -13,7 +13,9 @@ done echo "SOLR is up and running at ${SOLR_SERVER}." # Step 2. Create fields for search. 
-source "setup_solr.sh"
+SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/data-loading/setup_solr.sh"
+echo Solr database has been set up.
 
 # Step 3. Load specified files.
 for f in $1; do
diff --git a/solr-restore/restore.sh b/solr-restore/restore.sh
index 1816048f..0efee5fe 100644
--- a/solr-restore/restore.sh
+++ b/solr-restore/restore.sh
@@ -12,7 +12,9 @@
 # This script should only require the `wget` program.
 #
 # TODO: This script does not currently implement any Blocklists.
-set -euo pipefail
+
+# We don't use set -e because the loop test relies on failures being ignored.
+set -uo pipefail
 
 # Configuration options
 SOLR_SERVER="http://localhost:8983"
@@ -44,7 +46,6 @@ wget -O - "$RESTORE_URL"
 sleep 10
 RESTORE_STATUS_URL="${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restorestatus"
 RESTORE_STATUS=$(wget -q -O - "$RESTORE_STATUS_URL" 2>&1 | grep "success") >&2
-echo "Restore status: ${RESTORE_STATUS}"
 until [ -n "$RESTORE_STATUS" ] ; do
   echo "Solr restore in progress. Note: if this takes too long please check Solr health."
   RESTORE_STATUS=$(wget -O - "$RESTORE_STATUS_URL" 2>&1 | grep "success") >&2

From d9b637e5523d6055b15360d0a058a6dd7d21dbb9 Mon Sep 17 00:00:00 2001
From: Gaurav Vaidya
Date: Fri, 6 Feb 2026 16:45:46 -0500
Subject: [PATCH 13/17] Improved sleep stuff.

---
 data-loading/setup-and-load-solr.sh |  2 +-
 solr-restore/restore.sh             | 17 +++++++++++------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/data-loading/setup-and-load-solr.sh b/data-loading/setup-and-load-solr.sh
index db73b3e8..bb444aad 100755
--- a/data-loading/setup-and-load-solr.sh
+++ b/data-loading/setup-and-load-solr.sh
@@ -24,7 +24,7 @@ for f in $1; do
   # curl -X POST -T $f will stream it. See https://github.com/TranslatorSRI/NameResolution/issues/194
   curl -H 'Content-Type: application/json' -X POST -T $f \
     "$SOLR_SERVER/solr/name_lookup/update/json/docs?processor=uuid&uuid.fieldName=id&commit=true"
-  sleep 30
+  sleep 60
 done
 echo "Check solr"
 curl -s --negotiate -u: "$SOLR_SERVER/solr/name_lookup/query?q=*:*&rows=0"
diff --git a/solr-restore/restore.sh b/solr-restore/restore.sh
index 0efee5fe..29be2552 100644
--- a/solr-restore/restore.sh
+++ b/solr-restore/restore.sh
@@ -18,6 +18,7 @@ set -uo pipefail
 # Configuration options
 SOLR_SERVER="http://localhost:8983"
+SLEEP_INTERVAL=60
 
 # Please don't change these values unless you change NameRes appropriately!
 COLLECTION_NAME="name_lookup"
 BACKUP_NAME="backup"
@@ -43,12 +44,16 @@ CORE_NAME="${COLLECTION_NAME}_shard1_replica_n1"
 echo "Starting Solr restore on core ${CORE_NAME}, with status at ${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restorestatus"
 RESTORE_URL="${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restore&location=/var/solr/data/var/solr/data/&name=${BACKUP_NAME}"
 wget -O - "$RESTORE_URL"
-sleep 10
+sleep "$SLEEP_INTERVAL"
 RESTORE_STATUS_URL="${SOLR_SERVER}/solr/${CORE_NAME}/replication?command=restorestatus"
-RESTORE_STATUS=$(wget -q -O - "$RESTORE_STATUS_URL" 2>&1 | grep "success") >&2
+RESTORE_STATUS=$(wget -q -O - "$RESTORE_STATUS_URL" 2>&1 | grep "success")
+RESTORE_STATUS=""
 until [ -n "$RESTORE_STATUS" ] ; do
-  echo "Solr restore in progress. Note: if this takes too long please check Solr health."
-  RESTORE_STATUS=$(wget -O - "$RESTORE_STATUS_URL" 2>&1 | grep "success") >&2
-  sleep 10
+  echo "Solr restore in progress. If this takes longer than 30 minutes, please visit ${SOLR_SERVER} with your browser to check Solr."
+  RESTORE_STATUS=$(wget -q -O - "$RESTORE_STATUS_URL" 2>&1 | grep "success")
+  sleep "$SLEEP_INTERVAL"
 done
-echo "Solr restore complete"
+echo "Solr restore complete!"
+
+echo "Solr contents:"
+curl -s --negotiate -u: "$SOLR_SERVER/solr/name_lookup/query?q=*:*&rows=0"

From 93aef57ccc9aa7db927f92294d158a97d4bfaa7d Mon Sep 17 00:00:00 2001
From: Gaurav Vaidya
Date: Fri, 6 Feb 2026 16:48:35 -0500
Subject: [PATCH 14/17] Fixed setup-and-load-solr.sh.

---
 data-loading/setup-and-load-solr.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/data-loading/setup-and-load-solr.sh b/data-loading/setup-and-load-solr.sh
index bb444aad..2b0a9bec 100755
--- a/data-loading/setup-and-load-solr.sh
+++ b/data-loading/setup-and-load-solr.sh
@@ -1,5 +1,9 @@
 #!/usr/bin/env bash
 
+# We don't use set -e because the loop test relies on failures being ignored.
+set -uo pipefail
+
+# Configuration options
 SOLR_SERVER="http://localhost:8983"
 
 # Step 1. Make sure the Solr service is up and running.
@@ -14,7 +18,7 @@ echo "SOLR is up and running at ${SOLR_SERVER}."
 
 # Step 2. Create fields for search.
 SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
-source "$SCRIPT_DIR/data-loading/setup_solr.sh"
+source "$SCRIPT_DIR/setup_solr.sh"
 echo Solr database has been set up.
 
 # Step 3. Load specified files.

From 1c4d9312e0be85043635b125c09d9b7910222fe9 Mon Sep 17 00:00:00 2001
From: Gaurav Vaidya
Date: Mon, 9 Feb 2026 16:52:14 -0500
Subject: [PATCH 15/17] Added information on the data/ directory volume mount.

---
 documentation/Deployment.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/documentation/Deployment.md b/documentation/Deployment.md
index 8fc98fa0..52b38f59 100644
--- a/documentation/Deployment.md
+++ b/documentation/Deployment.md
@@ -28,13 +28,19 @@ instance or from Translator.
    the downloaded file (`snapshot.backup.tar.gz`) once it has been decompressed.
 5. Check the [docker-compose.yml](./docker-compose.yml) file to ensure that it is as you expect.
-   * By default, the Docker Compose file will use the latest released version of NameRes
+   * The Docker Compose file will use the latest released version of NameRes
      as the frontend. To use the source code in this repository, you will need to change
      the build instructions for the `nameres` service in the Docker Compose file.
-   * By default, Solr will be given 16G of memory, which seems sufficient for testing.
+   * Solr will be given 16G of memory, which seems sufficient for testing.
      If you want to run many Solr queries, you might want to increase this. To do this,
      you will need to change BOTH the `mem_limit` setting in the `nameres_solr` service
      in `docker-compose.yml` and the `SOLR_JAVA_MEM` setting.
+   * The `docker-compose.yml` file also mounts the local `data/` directory into the Solr
+     container as `/var/solr`. This will allow you to start a new NameRes from the same
+     directory in the future. If you want to use a different directory, please change
+     the `volumes` setting in the `nameres_solr` service in `docker-compose.yml`. Removing
+     the binding will cause the Solr data to be stored in the Docker instance, and the
+     data will be lost when the container is stopped.
 6. Start the Solr and NameRes pods by running `docker compose up`. By default, Docker Compose
    will download and start the relevant pods and show you logs from both sources. You may
    press `Ctrl+C` to stop the pods.

From 479276dc85fe2b2449116500546e3f3a376946f2 Mon Sep 17 00:00:00 2001
From: Gaurav Vaidya
Date: Mon, 9 Feb 2026 16:59:48 -0500
Subject: [PATCH 16/17] Made something a bit more explicit.

---
 documentation/Deployment.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/documentation/Deployment.md b/documentation/Deployment.md
index 52b38f59..d4ba344e 100644
--- a/documentation/Deployment.md
+++ b/documentation/Deployment.md
@@ -21,7 +21,7 @@ instance or from Translator.
    storage of approx 400G: 104G of the downloaded file (which can be deleted once decompressed),
    147G of uncompressed backup (both of which can be deleted once restored) and 147G of
    Apache Solr databases.
-3. Download the Solr backup URL you want to use into your Solr data directory. It should be
+3. Download the Solr backup URL you want to use and save it in `./data/solr`. It should be
    approximately 104G in size.
 4. Uncompress the Solr backup file. It should produce a `var/solr/data/snapshot.backup`
    directory in the Solr data (by default, `./data/solr/var/solr/data/snapshot.backup`). You can delete

From fb1c728d770b2fd2364aef1c0f4a0c16ce7d8444 Mon Sep 17 00:00:00 2001
From: Gaurav Vaidya
Date: Mon, 9 Feb 2026 17:21:37 -0500
Subject: [PATCH 17/17] Add a test for the Solr directory.

The user might have stored this elsewhere, but at least it'll warn people
if they uncompressed the file into the wrong place.
---
 solr-restore/restore.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/solr-restore/restore.sh b/solr-restore/restore.sh
index 29be2552..4bc6133c 100644
--- a/solr-restore/restore.sh
+++ b/solr-restore/restore.sh
@@ -24,6 +24,11 @@ SLEEP_INTERVAL=60
 COLLECTION_NAME="name_lookup"
 BACKUP_NAME="backup"
 
+# Step 0. Make sure the Solr data directory looks like it contains the uncompressed backup.
+if [ ! -d "./data/solr/var" ]; then
+  echo 'WARNING: No ./data/solr/var directory found; are you sure you uncompressed the NameRes backup into the Solr data directory?' >&2
+fi
+
 # Step 1. Make sure the Solr service is up and running.
 HEALTH_ENDPOINT="${SOLR_SERVER}/solr/admin/cores?action=STATUS"
 response=$(wget --spider --server-response ${HEALTH_ENDPOINT} 2>&1 | grep "HTTP/" | awk '{ print $2 }') >&2
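The polling loops patched above all hinge on one idea: `restore.sh` treats the restore as finished as soon as the body of the `replication?command=restorestatus` response contains the word `success`. A minimal, self-contained sketch of that completion check is below; the sample XML payloads are hypothetical illustrations, not captured from a real Solr server.

```shell
#!/usr/bin/env bash
# Sketch of the completion check used by restore.sh: a restorestatus
# response counts as "done" once it mentions "success" anywhere.
# NOTE: the sample payloads are made up for illustration only.
set -uo pipefail  # no -e: a failing grep is the expected "not done yet" case

is_restore_done() {
  # $1: body of ${SOLR_SERVER}/solr/<core>/replication?command=restorestatus
  printf '%s' "$1" | grep -q "success"
}

IN_PROGRESS='<response><str name="status">In Progress</str></response>'
DONE='<response><str name="status">success</str></response>'

if is_restore_done "$IN_PROGRESS"; then echo "unexpected"; else echo "still restoring"; fi
if is_restore_done "$DONE"; then echo "restore complete"; fi
# prints "still restoring" then "restore complete"
```

In the real script this check runs inside an `until … sleep "$SLEEP_INTERVAL"` loop, which is also why the patches above switch from `set -euo pipefail` to `set -uo pipefail`: each not-yet-successful poll makes `grep` exit non-zero, and `-e` would abort the script on the first one.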