@jmonlong
Contributor

With this change, sequenceTubeMap can work with tabix-indexed files representing the pangenome and use a Python script to query a subgraph quickly. For the HPRC MC v1.1 pangenome, querying a region takes less than a second on average, versus ~30s currently with vg chunk. All haplotypes in the pangenome can be queried.
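To illustrate what a region query against a tabix-indexed file looks like, here is a minimal sketch. It is not the script from this branch; the function names, the file name `nodes.tsv.gz`, and the contig name `chr1` are hypothetical, and it assumes the `tabix` command is on the PATH and that coordinates are 1-based and inclusive.

```python
import subprocess


def tabix_query_command(indexed_file, contig, start, end):
    """Build the argv for a tabix region query (hypothetical helper)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # tabix takes the bgzipped file followed by a contig:start-end region.
    region = f"{contig}:{start}-{end}"
    return ["tabix", indexed_file, region]


def query_region(indexed_file, contig, start, end):
    """Run tabix and return the matching lines as a list of strings."""
    cmd = tabix_query_command(indexed_file, contig, start, end)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()
```

For example, `query_region("nodes.tsv.gz", "chr1", 1000, 2000)` would return the records overlapping that interval, provided the file and its index sit side by side.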

More details on the tabix indexing and the subgraph extraction are in https://github.com/jmonlong/manu-vggafannot

I've tried to document what the new index files are, how to use them, and how to make them in a new README.tabix.md file.

This branch also contains a few other minor changes, such as:

  • Fixing a problem when toggling node transparency
  • Making tracks transparent according to their mapq value

@adamnovak
Member

@jmonlong I did the back-merge from master, but it looks like, as with #460, the tests (npm run test -- --watchAll=false) don't pass for TrackPickerDisplay, and the schema change from there is still needed.

It also looks like the tabix-backed codepath can't mix at all with anything on the vg-backed codepath, so I can't use a tabix-backed graph to view a GAM or a small unindexed GFA, and I can't use the simplify switch that calls vg simplify. What's the right way to communicate that to users in the UI?

I also don't think this will work right with the file upload feature, since we can only upload one file per track but Tabix needs two (data and index). We might need to adjust the way that works to allow uploading multiple files per track.
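As a rough sketch of what a multi-file upload would have to check, the function below pairs each uploaded data file with its index. It assumes the index sits next to the data file under the conventional `.tbi` suffix; the function name and the example file names are hypothetical and not part of sequenceTubeMap.

```python
def pair_tabix_uploads(filenames):
    """Match each uploaded data file with its .tbi index, if present.

    Returns (pairs, missing): a dict of data file -> index file, and a
    list of data files that arrived without an index.
    """
    names = set(filenames)
    pairs = {}
    missing = []
    for name in sorted(names):
        if name.endswith(".tbi"):
            continue  # index files are attached to their data file below
        index = name + ".tbi"
        if index in names:
            pairs[name] = index
        else:
            missing.append(name)
    return pairs, missing
```

An upload handler could reject the track, or ask for the missing index, whenever `missing` is not empty.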

It should be possible to make this work great with remote data URLs, by doing range reads on them directly, but I'm not sure if the tabix command line tool can do web requests itself.
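For reference, a direct range read over HTTP can be sketched as below. This only shows the request mechanics with the standard library and is not code from this branch; the function names and the example URL are hypothetical, and it assumes the server honors `Range` headers. Working out which byte ranges to fetch from the index is not covered here.

```python
import urllib.request


def build_range_request(url, offset, length):
    """Build a request for `length` bytes starting at byte `offset`."""
    if offset < 0 or length < 1:
        raise ValueError("expected offset >= 0 and length >= 1")
    # HTTP byte ranges are inclusive on both ends.
    last = offset + length - 1
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last}"})


def read_range(url, offset, length):
    """Fetch one byte range; fail if the server ignores the Range header."""
    request = build_range_request(url, offset, length)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the range was honored; a 200 would
        # mean the server is sending the whole file instead.
        if response.status != 206:
            raise RuntimeError(f"range not honored (HTTP {response.status})")
        return response.read()
```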

@adamnovak
Member

Instead of a "node track" and a "graph track", it might make more sense to present this feature as a graph track that consists of four files: the positions and nodes files and their indexes.

Is it reasonable to imagine drawing a view with only those four files, and no haplotypes? It looks like that is locked out right now, probably because without the haplotypes file there are no paths/edges at all, and the tube map needs those to draw anything.

To support that we might need to get rid of the 1 to 1 connection between having a haplotypes track file and displaying the haplotype paths. Someone might want to look at a tabix-backed graph but only see the reference paths in a particular view, but the haplotype database still needs to be included to see anything.

GBZ is a little like this because it has the haplotype data in the same file as the graph, and we fake it by having the GBZ file provide the graph track and also having it separately as the source of a haplotype track that can turn on and off.

Maybe we need to change the track model so that tracks come from databases, and databases are sets of n files that offer m tracks that you can toggle on and off.
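One way to picture that model is the sketch below, where a database owns n files and offers m tracks that can be toggled independently. All class, field, and track names here are hypothetical; this is an illustration of the idea, not the existing sequenceTubeMap track schema.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    name: str
    kind: str            # e.g. "graph" or "haplotype" (illustrative labels)
    enabled: bool = True


@dataclass
class Database:
    name: str
    files: list                                  # the n files backing this database
    tracks: list = field(default_factory=list)   # the m tracks it offers

    def toggle(self, track_name, enabled):
        """Show or hide one track without touching the underlying files."""
        for track in self.tracks:
            if track.name == track_name:
                track.enabled = enabled
                return
        raise KeyError(track_name)

    def visible_tracks(self):
        """Names of the tracks currently switched on."""
        return [track.name for track in self.tracks if track.enabled]
```

Under this sketch, a GBZ would be a database with one file and two tracks, and a tabix-backed graph would be a database whose files are always loaded together even when its haplotype track is switched off.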

@adamnovak adamnovak merged commit ad62f98 into master Jul 15, 2025
1 check passed