We have two datasets, each parsed from the BioSample XML files. There are some discrepancies between them, so we should take the union of the parsed coordinates.
Set 1: VL geocoded_all is on the SQL server.
Set 2: GN df_final_lat_lon.csv, which is the product of some manual labour. Download from S3: s3://serratus-public/tmp/df_final_lat_lon.csv
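
One possible way to take the union is sketched below in Python, assuming both sets have been exported to CSV (e.g. geocoded_all dumped from the SQL server). The column names biosample_id, lat and lon, the helper names, and the tolerance are hypothetical and would need to match the real schemas. Samples found in only one set are kept as-is; samples whose coordinates disagree are set aside for review rather than resolved automatically.

```python
import csv
import math


def load_coords(handle, id_col="biosample_id", lat_col="lat", lon_col="lon"):
    """Read a CSV file handle into {id: (lat, lon)}.

    Column names are assumptions, not the real schema. Rows with
    missing or non-numeric coordinates are skipped.
    """
    coords = {}
    for row in csv.DictReader(handle):
        try:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            continue
        if math.isfinite(lat) and math.isfinite(lon):
            coords[row[id_col]] = (lat, lon)
    return coords


def union_coords(set1, set2, tol=1e-4):
    """Union two {id: (lat, lon)} mappings.

    Returns (merged, conflicts). An id present in only one set, or in
    both with coordinates agreeing within `tol` degrees (an arbitrary,
    assumed threshold), goes into `merged`. An id whose coordinates
    disagree goes into `conflicts` as (set1_value, set2_value).
    """
    merged, conflicts = {}, {}
    for key in set1.keys() | set2.keys():
        a, b = set1.get(key), set2.get(key)
        if a is None or b is None:
            merged[key] = a if a is not None else b
        elif (math.isclose(a[0], b[0], abs_tol=tol)
              and math.isclose(a[1], b[1], abs_tol=tol)):
            merged[key] = a
        else:
            conflicts[key] = (a, b)
    return merged, conflicts
```

Usage would look like `merged, conflicts = union_coords(load_coords(f1), load_coords(f2))` with f1 and f2 being open file handles for the two exports. Keeping the conflicts separate leaves the choice of which set to trust to whoever reviews them.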