A unified search system for biological imaging databases. Aggregates study-level metadata from multiple sources into a single searchable index.
Requires Python 3.13+ and uv.
uv sync
Running the project requires a local container runtime that provides docker, e.g. colima.
There are two main areas of the package:
- A data transformation pipeline that creates ro-crates, produces a list of Elasticsearch documents, and indexes them.
- Code to control an Elasticsearch index, and an API to serve data from Elasticsearch.
There are three steps to get data into Elasticsearch:
- Transforming data from a source repository into a valid GIDE ro-crate.
- Transforming a GIDE ro-crate into a list of indexable documents.
- Indexing these documents with Elasticsearch.
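To make the final indexing step more concrete, here is a minimal sketch of how a list of indexable documents can be serialised into the newline-delimited body that Elasticsearch's `_bulk` endpoint accepts. It is not the package's own implementation: the function name `to_bulk_ndjson`, the index name `studies`, and the document fields are hypothetical and chosen only for illustration.

```python
import json
from typing import Any, Dict, Iterable, List


def to_bulk_ndjson(documents: Iterable[Dict[str, Any]], index_name: str) -> str:
    """Serialise documents into an Elasticsearch _bulk request body."""
    lines: List[str] = []
    for doc in documents:
        # Action line: names the target index and the id of the next document.
        action = {"index": {"_index": index_name, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents; the real index documents may have different fields.
docs = [
    {"id": "study-1", "title": "Mouse embryo light-sheet imaging", "publisher": "BIA"},
    {"id": "study-2", "title": "Zebrafish confocal time-lapse", "publisher": "IDR"},
]
body = to_bulk_ndjson(docs, index_name="studies")
print(body)
```

Each document contributes two lines: an action line followed by its source. A body like this could be sent with `Content-Type: application/x-ndjson` to `http://localhost:9200/_bulk`.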
You can see the list of commands by running
uv run gide-search data --help
A typical sequence could look like:
uv run gide-search data generate-bia-rocrate
uv run gide-search data transform-to-index
This should generate ro-crates under output/ro-crate/ and an index document under output/index/.
If Elasticsearch is running (commands for that below), you can index this with:
uv run gide-search data index
A local Elasticsearch instance is required for indexing and searching.
Quick start (macOS with colima):
Start your local container runtime, e.g. for colima:
colima start
Then you can start Elasticsearch with:
docker run -d \
--name elasticsearch \
-p 9200:9200 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-e "http.cors.enabled=true" \
-e "http.cors.allow-origin=http://localhost:1358" \
-e "http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization" \
-e "http.cors.allow-credentials=true" \
docker.elastic.co/elasticsearch/elasticsearch:8.11.0
These CORS settings are used so that you can also start Dejavu, a browser-based Elasticsearch inspector, with the following command:
docker run -p 1358:1358 -d appbaseio/dejavu
In Dejavu, you can connect to Elasticsearch with the URL http://elastic:test@localhost:9200 and the index *. Note that this will fail if no data has been indexed. To index the data in output/index/index.json:
uv run gide-search data index
To search using Elasticsearch directly (e.g. for debugging), you can use:
uv run gide-search search "mouse"
To set up the API, run:
uv run gide-search serve
You can then search it with the CLI command:
uv run gide-search search-api "mouse"
or use the endpoints directly:
| Endpoint | Description |
|---|---|
| GET /health | Health check |
| GET /search | Search with facets |
| GET /study/{id} | Get single study |
| GET /docs | OpenAPI documentation |
| Parameter | Type | Description |
|---|---|---|
| q | string | Free-text search query |
| publisher | string[] | Filter by source (IDR, SSBD, BIA) |
| organism | string[] | Filter by organism name |
| imaging_method | string[] | Filter by imaging method |
| year_from | int | Filter by release year (from) |
| year_to | int | Filter by release year (to) |
| size | int | Results per page (1-100, default 20) |
| offset | int | Result offset for pagination |
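The parameters above can be combined into a request URL programmatically. The following is a minimal sketch using only the Python standard library. The helper `build_search_url` is hypothetical, the base URL is assumed to be `http://localhost:8000`, and it assumes that `string[]` parameters are passed as repeated query keys (e.g. `publisher=BIA&publisher=IDR`), which the API may or may not require.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Assumed base URL for a locally served API.
BASE_URL = "http://localhost:8000"


def build_search_url(
    q: Optional[str] = None,
    publisher: Optional[List[str]] = None,
    organism: Optional[List[str]] = None,
    imaging_method: Optional[List[str]] = None,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
    size: int = 20,
    offset: int = 0,
) -> str:
    """Build a GET /search URL from the documented query parameters."""
    if not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must not be negative")

    candidates = {
        "q": q,
        "publisher": publisher,
        "organism": organism,
        "imaging_method": imaging_method,
        "year_from": year_from,
        "year_to": year_to,
        "size": size,
        "offset": offset,
    }
    # Drop filters that were not set so they do not appear in the URL.
    params = {k: v for k, v in candidates.items() if v is not None and v != []}
    # doseq=True expands list values into repeated keys.
    return f"{BASE_URL}/search?{urlencode(params, doseq=True)}"


url = build_search_url(q="fluorescence", publisher=["BIA", "IDR"], size=5)
print(url)
```

To page through results, keep `size` fixed and increase `offset` by `size` on each request.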
Example:
curl "http://localhost:8000/search?q=fluorescence&publisher=BIA&size=5"
Start Elasticsearch locally using colima (macOS):
# Start colima (container runtime)
colima start
# Run ElasticSearch
docker run -d \
--name elasticsearch \
-p 9200:9200 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
docker.elastic.co/elasticsearch/elasticsearch:8.11.0
# Verify it's running
curl http://localhost:9200/_cluster/health
# Stop when done
docker stop elasticsearch
colima stop
# Run with uv
uv run gide-search --help
# Run Python interactively
uv run python
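The `_cluster/health` check used above returns JSON with a `status` field that is `green`, `yellow`, or `red`. A single-node instance commonly reports `yellow` once indices with replicas exist, because replica shards cannot be allocated on one node, so `yellow` is usually acceptable for local development. The sketch below, which can be pasted into the interactive session, shows one way to interpret that response. The helper `is_cluster_usable` and the sample payload are hypothetical.

```python
import json
from typing import Any, Dict


def is_cluster_usable(health: Dict[str, Any], allow_yellow: bool = True) -> bool:
    """Decide whether a cluster health response is good enough to index and search."""
    status = health.get("status")
    if status == "green":
        return True
    if status == "yellow":
        # Primaries are allocated; only replicas are missing.
        return allow_yellow
    # "red" or a missing status: at least some data is unavailable.
    return False


# Sample payload for illustration; a real response contains more fields.
sample = json.loads('{"cluster_name": "docker-cluster", "status": "yellow", "number_of_nodes": 1}')
print(is_cluster_usable(sample))
print(is_cluster_usable(sample, allow_yellow=False))
```

Passing `allow_yellow=False` gives a stricter check that only accepts a `green` cluster.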