Michal J. Gajda edited this page Oct 28, 2020 · 1 revision

Haskell activities report

2011, Fall

Under the BioHaskell umbrella, we provide a number of libraries and programs designed to solve bioinformatics problems. Given the diversity of problems tackled by the Haskell bioinformatics community, we opted not for one monolithic library but for a number of smaller, interconnected ones, each designed to solve a single problem or a small set of problems.

Currently, libraries are available for sequence-related problems (Ketil, ...) and for the following:

  • Secondary structure prediction of single RNA molecules (a Haskell port of the Vienna RNAfold program, a polynomial-time version of the MC-fold pipeline, and a novel algorithm, RNAwolf, for extended secondary structure prediction). This includes support for several file formats representing canonical RNA structures and non-canonical extensions.

  • Non-coding RNA prediction via Infernal and Rfam is supported by a set of programs, both for producing models and for interpreting results, as well as for evaluating covariance models independently of specific input.
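The RNA folding tools listed above use thermodynamic energy models; the simplest member of the same dynamic-programming family is Nussinov's base-pair maximization, sketched below purely as an illustration. It is not the Vienna RNAfold energy model, and the names `canPair`, `minLoop`, `nussinov`, `maxPairs` and `dotBracket` are hypothetical, not the API of any BioHaskell package. The minimum hairpin size of three unpaired bases is an assumed parameter. The table is filled lazily through a self-referential `Data.Array`, and one optimal structure is recovered by traceback into dot-bracket notation, a plain-text format commonly used for canonical secondary structures.

```haskell
import Data.Array

-- Watson-Crick pairs plus the G-U wobble pair.
canPair :: Char -> Char -> Bool
canPair a b = (a, b) `elem` [('A','U'), ('U','A'), ('G','C'), ('C','G'), ('G','U'), ('U','G')]

-- Assumed minimum number of unpaired bases enclosed by a hairpin loop.
minLoop :: Int
minLoop = 3

toArray :: String -> Array Int Char
toArray s = listArray (0, length s - 1) s

-- Safe table lookup: empty or inverted intervals contain no pairs.
score :: Array (Int, Int) Int -> Int -> Int -> Int
score t i j
  | i >= j = 0
  | otherwise = t ! (i, j)

-- Entry (i, j) is the maximum number of nested pairs in positions i..j.
-- The array refers to itself; laziness evaluates each cell on first demand.
nussinov :: Array Int Char -> Array (Int, Int) Int
nussinov sa = table
  where
    (lo, hi) = bounds sa
    bnds = ((lo, lo), (hi, hi))
    table = listArray bnds [cell i j | (i, j) <- range bnds]
    cell i j
      | j - i <= minLoop = 0
      | otherwise = maximum (score table i (j - 1) : paired)  -- j unpaired, or j paired
      where
        -- j pairs with k, leaving an outer part i..k-1 and an inner part k+1..j-1
        paired =
          [ score table i (k - 1) + 1 + score table (k + 1) (j - 1)
          | k <- [i .. j - minLoop - 1], canPair (sa ! k) (sa ! j) ]

maxPairs :: String -> Int
maxPairs s = score (nussinov (toArray s)) 0 (length s - 1)

-- Trace back one optimal structure and render it in dot-bracket notation.
dotBracket :: String -> String
dotBracket s = map symbol [0 .. length s - 1]
  where
    sa = toArray s
    get = score (nussinov sa)
    pairs = trace 0 (length s - 1)
    trace i j
      | j - i <= minLoop = []
      | get i j == get i (j - 1) = trace i (j - 1)
      | otherwise =
          case [ (k, j) : trace i (k - 1) ++ trace (k + 1) (j - 1)
               | k <- [i .. j - minLoop - 1], canPair (sa ! k) (sa ! j)
               , get i j == get i (k - 1) + 1 + get (k + 1) (j - 1) ] of
            (ps : _) -> ps
            [] -> []
    symbol p
      | p `elem` map fst pairs = '('
      | p `elem` map snd pairs = ')'
      | otherwise = '.'
```

For example, `dotBracket "GGGAAAUCC"` yields `"(((...)))"`: two G-C pairs and one G-U wobble pair closing a three-base hairpin. Real folding programs replace the pair count with loop-based free energies, but the interval recurrence has the same cubic shape.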

2014, May, 2nd paragraph

BioHaskell now contains a fledgling collection of libraries for the structural analysis of biomolecules. hPDB is a parallel parser for the Protein Data Bank (PDB) file format that parses the largest structures faster than single-threaded Python code and faster than the parallel BioJava parser. It uses an octree data structure for fast querying of atom positions and contacts, and an Iterable class to allow iteration over objects nested deep within hierarchical collections. The library's author, Michal J. Gajda, plans to release further libraries for processing biomolecular data soon, in particular parseSTAR for parsing the STAR* format used by the BMRB database of nuclear magnetic resonance data, and the hDat library for processing small-angle scattering data.
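Contact queries are what an octree speeds up: space is split recursively into eight octants, and whole subtrees whose bounding box lies farther away than the query radius are skipped. The following is a minimal sketch of that idea under stated assumptions; it is not hPDB's implementation, and `V3`, `Box`, `Octree`, `build` and `within` are hypothetical names. Splitting each node at the centroid of its points, and the leaf size, are illustrative choices.

```haskell
-- A point in 3-D space (e.g. an atom's Cartesian coordinates).
data V3 = V3 !Double !Double !Double deriving (Eq, Show)

-- Axis-aligned bounding box: minimum corner, maximum corner.
data Box = Box !V3 !V3 deriving Show

-- Internal nodes hold up to eight non-empty octant subtrees.
data Octree
  = Empty
  | Leaf Box [V3]
  | Node Box [Octree]

sq :: Double -> Double
sq a = a * a

dist2 :: V3 -> V3 -> Double
dist2 (V3 a b c) (V3 x y z) = sq (a - x) + sq (b - y) + sq (c - z)

boundingBox :: [V3] -> Box
boundingBox ps = Box (V3 (minimum xs) (minimum ys) (minimum zs))
                     (V3 (maximum xs) (maximum ys) (maximum zs))
  where
    xs = [x | V3 x _ _ <- ps]
    ys = [y | V3 _ y _ <- ps]
    zs = [z | V3 _ _ z <- ps]

-- Squared distance from a point to the nearest point of a box (0 if inside).
boxDist2 :: V3 -> Box -> Double
boxDist2 (V3 x y z) (Box (V3 x0 y0 z0) (V3 x1 y1 z1)) =
  gap x x0 x1 + gap y y0 y1 + gap z z0 z1
  where
    gap v lo hi
      | v < lo = sq (lo - v)
      | v > hi = sq (v - hi)
      | otherwise = 0

-- Split at the centroid until a node holds at most leafSize points.
-- If all points fall into one octant (identical points), stop with a leaf.
build :: Int -> [V3] -> Octree
build _ [] = Empty
build leafSize ps
  | length ps <= leafSize || length groups < 2 = Leaf box ps
  | otherwise = Node box (map (build leafSize) groups)
  where
    box = boundingBox ps
    n = fromIntegral (length ps)
    cx = sum [x | V3 x _ _ <- ps] / n
    cy = sum [y | V3 _ y _ <- ps] / n
    cz = sum [z | V3 _ _ z <- ps] / n
    octant (V3 x y z) = (x >= cx, y >= cy, z >= cz)
    keys = [(a, b, c) | a <- [False, True], b <- [False, True], c <- [False, True]]
    groups = filter (not . null) [[p | p <- ps, octant p == k] | k <- keys]

-- All points within distance r of q; subtrees whose box is farther than r are pruned.
within :: Double -> V3 -> Octree -> [V3]
within _ _ Empty = []
within r q (Leaf box ps)
  | boxDist2 q box > r * r = []
  | otherwise = [p | p <- ps, dist2 q p <= r * r]
within r q (Node box kids)
  | boxDist2 q box > r * r = []
  | otherwise = concatMap (within r q) kids
```

Each node stores the tight bounding box of its own points rather than the nominal extent of its octant, so the pruning test never discards a point that is actually within range. A contact search for one atom then visits only the few leaves near it instead of scanning the whole structure.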

Specialized algorithms for designing RNA sequences under given structural constraints are available in the RNAdesign package. Highly specialized alignment algorithms for both computational biology and linguistics can be constructed easily with the FormalGrammars and GrammarProducts packages described in Sec. 7.3.3. The DnaProteinAlignment package provides a showcase, allowing amino acid sequences to be aligned correctly to genomes even in the presence of massive transcriptional editing.
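Grammar-based alignment algorithms ultimately evaluate a dynamic-programming recurrence over pairs of sequence prefixes. The classic hand-written instance is Needleman-Wunsch global alignment, sketched below as a point of comparison. It is written directly, without FormalGrammars or GrammarProducts, so it shows the kind of recurrence such grammars encode rather than those packages' API. The scoring values (match +1, mismatch -1, linear gap -2) and the name `globalScore` are assumptions for illustration, and only the optimal score is computed.

```haskell
import Data.Array

-- Assumed scoring scheme (illustrative values only).
matchScore, mismatchScore, gapScore :: Int
matchScore = 1
mismatchScore = -1
gapScore = -2

-- Needleman-Wunsch: entry (i, j) is the best score aligning the first i
-- symbols of xs with the first j symbols of ys.
globalScore :: String -> String -> Int
globalScore xs ys = table ! (m, n)
  where
    m = length xs
    n = length ys
    xa = listArray (1, m) xs
    ya = listArray (1, n) ys
    bnds = ((0, 0), (m, n))
    table :: Array (Int, Int) Int
    table = listArray bnds [cell i j | (i, j) <- range bnds]
    cell :: Int -> Int -> Int
    cell 0 j = j * gapScore          -- a prefix of ys against nothing: all gaps
    cell i 0 = i * gapScore          -- a prefix of xs against nothing: all gaps
    cell i j = maximum
      [ table ! (i - 1, j - 1) + substitution (xa ! i) (ya ! j)  -- align x_i with y_j
      , table ! (i - 1, j) + gapScore                           -- x_i against a gap
      , table ! (i, j - 1) + gapScore                           -- y_j against a gap
      ]
    substitution a b = if a == b then matchScore else mismatchScore
```

For example, `globalScore "ACGT" "AGT"` evaluates to 1 (three matches and one gap). The three alternatives in `cell` correspond to the three productions of a pairwise alignment grammar, which is the structure that grammar products generalize.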

[do not copy to hcar:

  • Killed first word. ]