jgolob (Collaborator) commented Oct 31, 2017

This is just a different way to accomplish the same task. I liked it because it leverages SCons to at least let me know if and when one of the steps in making the placements.db fails.

This particularly matters because guppy (at least the publicly available version) is flaky: a placements.db can be created on the filesystem even though guppy dies during the classification step.
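To illustrate the failure mode, here is a minimal sketch in plain Python (not the SCons setup from this PR). The helper name `run_step` is hypothetical, and it assumes the step signals failure through a nonzero exit code. The point is only that a leftover output file should never be read as success:

```python
import os
import subprocess


def run_step(cmd, target):
    """Run one pipeline step; discard `target` if the step fails.

    `cmd` is an argument list for subprocess.run and `target` is the
    output file the step is expected to produce (both hypothetical).
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # A crashed step may leave a partial file behind; remove it so
        # that its mere existence is never mistaken for success.
        if os.path.exists(target):
            os.remove(target)
        raise RuntimeError(
            f"step failed with exit code {result.returncode}: {cmd}"
        )
    return target
```

With a wrapper like this, a later step that only checks for the presence of placements.db cannot pick up a half-written file.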

nhoffman (Member) commented Nov 2, 2017

I'd be game for separating out the guppy classify step from the other steps. However, the boundaries that you draw between the other steps seem artificial, since multiclass_concat.py is already doing a whole bunch of stuff within the script. Why not just extend that script to include ingesting the adcl and seq_info tables?

I'm actually hoping to rely less on the database. I am going to dig into what multiclass concat is actually doing: reduplication in the database seems redundant, since the dada2 inputs already provide weights. I think the initial design anticipated that the database would be used directly for all sorts of analyses, but the reality has been that most manipulations happen downstream in pandas or R. So it may not even be worthwhile to perform any of the steps after guppy classify. Instead, we could extract a single table with classifications of individual sequence variants from the database and perform additional manipulations elsewhere.
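A rough sketch of what "extract a single table" could look like, assuming the database is a SQLite file. The function name `export_table` is hypothetical, and so is whatever table name is passed in: nothing here reflects the actual schema that guppy classify writes. It uses only the standard library and dumps one table to CSV so that pandas or R can take over:

```python
import csv
import sqlite3


def export_table(db_path, table, out_csv):
    """Dump one table from a SQLite database to CSV; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so quote the
        # identifier by hand instead of interpolating it raw.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

The resulting CSV can then be loaded downstream (for example with `pandas.read_csv`), where the weights from the dada2 inputs can be joined without relying on the database.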
