Shape analysis of zebrafish scales
Fish scales are bones - the shape of a fish's scales tells us about bone growth in the organism. We can use this to measure the effect of mutations, age, sex, etc. on bone growth and regeneration.
- Install uv.
- Run uv sync to install the environment.
- Run the scripts in scripts/.

More detail is below.
This is very much still an exploratory codebase, so there's lots of half-finished analysis
scripts, old ways of doing things and unused code around. The scripts in scripts/ should
do everything you need to just do the analysis, but there's lots of other bits around too.
Probably don't worry about them too much, I just didn't have time to tidy everything up.
Step 1: segmentation
The zeroth thing to do is to take some pictures with a microscope or something and save them somewhere, e.g. the RDSF.
Once you have done that, you will want to segment the scales out. We can do this mostly-automatically, using the Segment Anything model.
The script to do this will:
- read in scale images from the directory you point it at
- attempt to find a "prior" for the segmentation
- use the segmentation model (and the prior) to segment out the scale
By default, this script runs on a GPU (because this is much faster than using a CPU).
If you don't have a GPU on your laptop then you could try running this on a remote machine
that does have one - the biomed group has access to the scampi server which you can get
access to by asking your PI.
Run the script as follows:
uv run scripts/1-automatic_segmentation.py my_scale_dir/ my_segmentations_dir/
The following options can also be provided:
- -h, --help: print a help message
- --prior-type: whether we are segmenting ALP-stained or confocal image scales
- --debug-plot-dir: if provided, will plot some extra outputs (like the prior) here
- --device: defaults to cuda, but can also be set to cpu
A complete example is:
uv run scripts/1-automatic_segmentation.py my_scale_dir/ my_segmentations_dir/ --prior-type confocal --debug-plot-dir my_debug_plots/ --device cpu
The segmentation sometimes doesn't "just work" if we simply pass the raw scales to the model. This is because there's often dark/light parts to the scale (the bits that overlap/don't) and the model detects these as different objects.
To get around this, we pass a "prior" to the model - this is some extra information that it can use to decide where the object of interest is in the image. This might be some points that we demand are contained within the segmentation mask, or a bounding box around it (or both).
For the confocal image, the prior is simple - we simply assume that the scale is in the middle
of the image, and define five points that we will enforce are in the segmentation mask.
The prior points might be too close/far apart depending on the magnification of your images - you'll have to change a number
in the script to change this.
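As a concrete illustration of what the confocal prior looks like, here is a minimal sketch (the function name and the 0.1 spread fraction are made up for this example; the real value lives in the script):

```python
import numpy as np

def central_prior_points(image_shape, spread=0.1):
    """Five prior points around the image centre, to be enforced as inside
    the segmentation mask. `spread` (a fraction of the image size) is the
    number you may need to tweak for your magnification."""
    h, w = image_shape[:2]
    cy, cx = h / 2, w / 2
    dy, dx = spread * h, spread * w
    # points are (x, y) pairs: the centre, plus one point each side of it
    return np.array([[cx, cy],
                     [cx - dx, cy], [cx + dx, cy],
                     [cx, cy - dy], [cx, cy + dy]])
```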
The ALP prior is more complicated; we first do a rough segmentation by attempting to threshold out the scale, and then use this to build our prior. In order:
- We make a rough segmentation of the scale by thresholding
- We attempt to clean this segmentation by filling holes, removing small objects and removing objects touching the border of the image
- We sample some points randomly from this mask
- We find the bounding box of this mask
The points and bounding box from steps 3. and 4. are used as the prior.
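The four steps above can be sketched like this (a toy illustration using scipy, not the actual script - the threshold and minimum object size are made-up numbers):

```python
import numpy as np
from scipy import ndimage

def build_prior(image, thresh, n_points=5, min_size=50, seed=0):
    # 1. rough segmentation of the scale by thresholding
    mask = image > thresh
    # 2. clean it: fill holes, drop small objects and border-touching objects
    mask = ndimage.binary_fill_holes(mask)
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    border = set(labels[0, :]) | set(labels[-1, :]) \
           | set(labels[:, 0]) | set(labels[:, -1])
    keep = [i for i in range(1, n + 1)
            if sizes[i - 1] >= min_size and i not in border]
    mask = np.isin(labels, keep)
    # 3. sample some points randomly from the cleaned mask
    ys, xs = np.nonzero(mask)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(xs), size=n_points, replace=False)
    points = np.stack([xs[idx], ys[idx]], axis=1)
    # 4. bounding box of the mask, as (x_min, y_min, x_max, y_max)
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
    return points, box
```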

Confocal segmentation was just something we quickly tried once. It might not work well.
Step 2: cleaning segmentations
Once you have some rough segmentations by running step 1, you will want to clean them. From experience, around 10% of the automatic segmentation masks will have some problem that needs to be corrected - this might be because the AI model misidentified the object in the image, because the boundary was not clear or because some other object (a bubble, hair, blob of loose dye) got caught up in the segmentation too.
This script will open a GUI for interactively cleaning the scale segmentations: it opens a MS-paint like interface where you can draw/erase/fill/etc. the segmentation mask in an interactive way.
This is what it looks like:
To open the GUI, run:
uv run scripts/2-clean_segmentations.py --img_dir my_img_dir/ --mask_dir my_mask_dir --output_dir my_output_dir/
You will need to run this locally - i.e. on your laptop, not on a remote
server like scampi (since it might not have a display).
It's probably possible to get this to work on a remote machine with X11
forwarding, but I haven't tried.
Once you have run this and cleaned the segmentations, you should have a directory of nice clean segmentations. This is what we will use for the analysis.
Step 3: run a quick analysis
We care about the shape of our scales; a simple way to quantify these is by taking some simple features (size and aspect ratio) and plotting them on a scatter plot.
To do this, starting from a directory of cleaned segmentations:
uv run scripts/3-quick_ellipse_analysis.py my_clean_seg_dir/ sex
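For intuition, the size and aspect ratio of a mask can be computed from its second moments - a minimal numpy sketch of the idea, not the actual script:

```python
import numpy as np

def ellipse_features(mask):
    """Area and aspect ratio of a binary mask, via the covariance of its
    pixel coordinates (the eigenvalues give the squared axis lengths,
    up to a constant factor that cancels in the ratio)."""
    ys, xs = np.nonzero(mask)
    area = len(xs)
    cov = np.cov(np.stack([xs, ys]))       # 2x2 covariance of pixel coords
    evals = np.linalg.eigvalsh(cov)        # ascending: [minor, major]
    aspect = np.sqrt(evals[1] / evals[0])  # major/minor axis ratio
    return area, aspect
```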
When making these plots, we need to somehow find the sex/age/onto-regen status/... of the fish. We do this by looking at the filename - the scales (usually) follow a naming convention that allows us to extract this metadata.
If you only want to run on a subset of the data (e.g plot a scatter plot of age showing only female fish), you'll have to change the source code to just pick out the right scales for you. It isn't a difficult change, but it still does require changing the code.
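Extracting metadata from the filename might look like this - note that the pattern here is invented for illustration; the real naming convention the script expects lives in the source code:

```python
import re

# Hypothetical naming convention, for illustration only
PATTERN = re.compile(r"(?P<fish_id>\d+)_(?P<sex>[MF])_(?P<age>\d+)mo")

def parse_metadata(filename):
    """Pull fish ID, sex and age out of a scale image's filename."""
    match = PATTERN.search(filename)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {filename}")
    return match.groupdict()
```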
Step 4: run the EFA analysis
This is where the code gets a bit more "exploratory".
The idea here is that, once we have segmented the scales out, we can use a Elliptic Fourier Analysis (EFA) to fully describe their shape and then use PCA/LDA to tell us, in an interpretable way, which features of the scale vary most with some metadata like sex/age/etc.
The script runs through this pipeline and makes some plots. The mechanics of the EFA parameterisation happens in the library code; I have spent some time making sure that the parameterisation is independent of the scale's location/rotation/starting point of the contour, and that all the coefficients (bar the first) are independent of the scale's size. This, however, does mean that the coefficients are in a slightly non-standard form.
EFA Parameterisation
We can parameterise an ellipse with four coefficients (intuitively this would be the centroid and minor/major axes lengths).
In EFA, we describe our shape as a sum of ellipses (effectively) - here's some formulae:
$$ \begin{aligned} x(t) &= a_0 + \sum_{n=1}^{N} \big[a_n \cos(n t) + b_n \sin(n t)\big],\\ y(t) &= c_0 + \sum_{n=1}^{N} \big[c_n \cos(n t) + d_n \sin(n t)\big], \qquad t \in [0, 2\pi]. \end{aligned} $$
with:
$$ a_0 = \frac{1}{2\pi}\int_{0}^{2\pi} x(t)\,dt, \qquad c_0 = \frac{1}{2\pi}\int_{0}^{2\pi} y(t)\,dt, $$
$$ \begin{aligned} a_n &= \frac{1}{\pi}\int_{0}^{2\pi} x(t)\cos(n t)\,dt, & b_n &= \frac{1}{\pi}\int_{0}^{2\pi} x(t)\sin(n t)\,dt,\\ c_n &= \frac{1}{\pi}\int_{0}^{2\pi} y(t)\cos(n t)\,dt, & d_n &= \frac{1}{\pi}\int_{0}^{2\pi} y(t)\sin(n t)\,dt. \end{aligned} $$
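These formulae can be sketched numerically - a toy illustration of the coefficient integrals and the reconstruction sum, not the library implementation:

```python
import numpy as np

def fourier_coefficients(x, y, n_harmonics):
    """Fourier-series coefficients of a closed contour sampled uniformly in t."""
    t = np.linspace(0, 2 * np.pi, len(x), endpoint=False)
    a0, c0 = x.mean(), y.mean()
    coeffs = []
    for n in range(1, n_harmonics + 1):
        # (1/pi) * integral over [0, 2pi] ~= 2 * mean over the samples
        coeffs.append((2 * np.mean(x * np.cos(n * t)),
                       2 * np.mean(x * np.sin(n * t)),
                       2 * np.mean(y * np.cos(n * t)),
                       2 * np.mean(y * np.sin(n * t))))
    return a0, c0, coeffs

def reconstruct(a0, c0, coeffs, t):
    """Evaluate the sum-of-ellipses series at parameter values t."""
    x = a0 + sum(a * np.cos(n * t) + b * np.sin(n * t)
                 for n, (a, b, _, _) in enumerate(coeffs, start=1))
    y = c0 + sum(c * np.cos(n * t) + d * np.sin(n * t)
                 for n, (_, _, c, d) in enumerate(coeffs, start=1))
    return x, y
```

For a pure ellipse, only the first harmonic (and the centroid terms) is non-zero; real scale outlines need more harmonics to capture the finer detail.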
This means that we can describe our shape with a vector of coefficients like $(a_0, c_0, a_1, b_1, c_1, d_1, \dots, a_N, b_N, c_N, d_N)$.
However, a few of these parameters are redundant - once we normalise for location, rotation and the starting point of the contour, some of the coefficients take fixed values and can be dropped.
This gives us our final EFA parameterisation convention.
Intuitively, the first harmonic captures the overall ellipse-like shape of the scale, and the higher harmonics add progressively finer detail.
Other normalisations include rotating each harmonic to have a relative phase of 0 and the starting point of the contour to be the same - see the code for details.
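As an example of what a size normalisation can look like, here is a sketch of one possible convention (dividing by the magnitude of the first harmonic) - this is for illustration only; see the library code for the convention actually used:

```python
import numpy as np

def normalise_size(coeffs):
    """Scale-normalise EFA coefficients by the first harmonic's magnitude.

    `coeffs` has shape (N, 4): one row of (a_n, b_n, c_n, d_n) per harmonic.
    After this, scaling the input shape up or down leaves the output unchanged.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    scale = np.linalg.norm(coeffs[0])
    return coeffs / scale
```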
New to uv?
uv is a python package manager that makes setup easy - like conda, it manages virtual environments, but is much faster and easier to use. Download uv from here.
Mounting the RDSF
The RDSF (Research Data Storage Facility) is Bristol Uni's research data repository. You will need to be added to the Zebrafish_Osteoarthritis project in order to read data from it - ask your PI for access if you don't have this already.
There are several ways for users to access the RDSF; if you're on Linux I recommend:
gio mount smb://rdsfcifs.acrc.bris.ac.uk/Zebrafish_Osteoarthritis
It will then prompt you for:
- username (your UoB username)
- workgroup (UOB)
- password (your UoB password)
If you're on Windows, I recommend installing WSL (Windows Subsystem for Linux), cloning this project there and using the Linux CLI for this project.
If you really want to use Windows directly, at least use the git bash terminal.
An example
The real files used for the analysis live in this directory; they go through the segmentation/cleaning/analysis steps.
An example workflow that uses some example data can be found in
src/scale_morphology/test/example_workflow.py: this runs through
the EFA/PCA/LDA workflow using a toy dataset of different
shapes:

We perform EFA on these, and can find the (e.g.) ten most
descriptive axes in terms of the EFA coefficients:

We then finally run LDA on the PCA coefficients; this is far less noisy and prone to overfitting than running LDA on the raw EFA coefficients, but keeps most of the variation:
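The PCA-then-LDA step can be sketched with scikit-learn on made-up data (the fake coefficient vectors and the choice of 10 components are for illustration only):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# fake high-dimensional "EFA coefficient" vectors for two groups (e.g. sexes);
# real inputs would come from the segmented scales
group_a = rng.normal(size=(50, 40))
group_a[:, 0] += 3.0                  # the groups differ along one direction
group_b = rng.normal(size=(50, 40))
X = np.vstack([group_a, group_b])
labels = np.array([0] * 50 + [1] * 50)

# reduce to the top principal components first, then discriminate:
# running LDA on the compressed representation is less prone to overfitting
X_pca = PCA(n_components=10).fit_transform(X)
lda = LinearDiscriminantAnalysis().fit(X_pca, labels)
print(f"training accuracy: {lda.score(X_pca, labels):.2f}")
```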

