I am computing Euclidean distances on a subset of the 821 vases to see if I can cluster them into smaller groups. I have done this before, but this time, because I am dealing with a real heterogeneity of shapes, I noticed something disturbing.
I standardize my vases by size and position: a kind of pseudo-registration. For each vase, I have a bunch of y values, and the particular y values differ between vases (since they originally came from very different image sizes). But to make a distance matrix you need to compare the same variables (y values) across your vases. To do that I bin the y values. But that doesn't quite work because there are big "gaps" in the y values. See below:

The points are the actual y values; the lines connect them. You can see that, for many vases, there are big stretches where there is a line but no point. That's where you chopped off a handle or something: you just drew a line. That's fine, but when I "bin" the y values, I get a string of "NA"s there, and the distance matrix function does not like that. And I am not sure how to fix it: interpolate?
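If interpolation is the way to go, one option would be to resample every vase onto a shared grid of y values instead of binning, so that the gaps get filled along the same straight lines that are drawn in the plot. Below is a minimal sketch in Python with NumPy, not the actual pipeline: it assumes each vase is stored as pairs of (y, width), that y is already standardized to run from 0 to 1, and the data and the name `resample_profile` are made up for illustration.

```python
import numpy as np

def resample_profile(y, width, grid):
    """Linearly interpolate one vase profile onto a shared grid of y values."""
    order = np.argsort(y)  # np.interp needs increasing sample positions
    return np.interp(grid, y[order], width[order])

# Two made-up profiles (hypothetical data). The second has a gap between
# y = 0.2 and y = 0.8, e.g. where a handle was removed.
vase_a = (np.array([0.0, 0.25, 0.5, 0.75, 1.0]),
          np.array([1.0, 2.0, 3.0, 2.0, 1.0]))
vase_b = (np.array([0.0, 0.2, 0.8, 1.0]),
          np.array([1.0, 2.0, 2.0, 1.0]))

# The same y positions for every vase, so the columns are comparable.
grid = np.linspace(0.0, 1.0, 11)
profiles = np.vstack([resample_profile(y, w, grid) for y, w in (vase_a, vase_b)])

# Pairwise Euclidean distances between the resampled profiles.
diff = profiles[:, None, :] - profiles[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
```

Because every grid point gets a value, there are no NAs left for the distance function to choke on. The caveat is that a long gap is replaced by a straight line, so vases with big missing stretches may look more similar (or more different) than they really are. Note also that `np.interp` holds the end values constant for grid points outside a vase's observed y range.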
How do you deal with this when estimating your distances? Or does the problem simply not arise since you're working with SRVs or whatever?