FENNEC: Functional Exploration of Natural Networks and Ecological Communities

Assessing the species composition of ecological communities and networks is an important aspect of biodiversity research. Yet, for many ecological questions, the ecological properties (traits) of the organisms in a community are more informative than their scientific names. Furthermore, other properties such as threat status, invasiveness, or human usage are relevant for many studies, but they cannot be evaluated directly from taxonomic names alone. Although various public databases collect such trait information, enriching existing community tables with these traits is still a tedious manual task, especially for large data sets. For example, meta-barcoding and automated image processing approaches are designed for high-throughput analyses and can yield thousands of taxa for hundreds of samples within very short time frames.

We developed FENNEC, a web-based workbench that eases this process by automatically mapping publicly available trait data onto the user’s community tables. We run a public instance holding traits that cover a range of topics, including specialization, invasiveness, vulnerability, and agricultural relevance. Scientists are free to use FENNEC as a resource for their ecological research.
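To illustrate the kind of trait mapping described above, here is a minimal Python sketch that attaches trait values to a community table by taxon name and then sums abundances per trait value for one sample. It does not use FENNEC's code or API; the table layouts, the name-normalization rule (case and whitespace only), and the example taxa and trait labels are hypothetical assumptions.

```python
# Hypothetical sketch of trait mapping onto a community table.
# NOT FENNEC's implementation or API; layouts and data are illustrative.
from typing import Dict, Optional

CommunityTable = Dict[str, Dict[str, int]]  # taxon name -> {sample: abundance}
TraitTable = Dict[str, str]                 # taxon name -> trait value


def normalize(name: str) -> str:
    """Normalize a scientific name for matching (case and whitespace only)."""
    return " ".join(name.split()).lower()


def map_traits(community: CommunityTable, traits: TraitTable) -> Dict[str, Optional[str]]:
    """Return the trait value for every taxon in the table, or None if unknown."""
    lookup = {normalize(k): v for k, v in traits.items()}
    return {taxon: lookup.get(normalize(taxon)) for taxon in community}


def trait_abundance(community: CommunityTable, traits: TraitTable, sample: str) -> Dict[str, int]:
    """Sum abundances in one sample per trait value; unmatched taxa count as 'unknown'."""
    totals: Dict[str, int] = {}
    for taxon, value in map_traits(community, traits).items():
        key = value if value is not None else "unknown"
        totals[key] = totals.get(key, 0) + community[taxon].get(sample, 0)
    return totals


if __name__ == "__main__":
    # Hypothetical example data
    community = {
        "Apis mellifera": {"s1": 12, "s2": 0},
        "Impatiens  glandulifera": {"s1": 3, "s2": 7},
        "Taxon unknownii": {"s1": 1, "s2": 2},
    }
    traits = {"apis mellifera": "native", "Impatiens glandulifera": "invasive"}
    print(trait_abundance(community, traits, "s1"))
```

Normalizing names before matching lets minor formatting differences (capitalization, doubled spaces) still hit the trait table; real-world matching would also need to handle synonyms and authorities, which this sketch ignores.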

Website: https://fennec.molecular.eco

Freely available at GitHub:  https://github.com/molbiodiv/fennec

Preprint: https://www.biorxiv.org/content/early/2017/09/27/194308

meta-barcoding marker demultiplexing

A single script that demultiplexes metabarcoding data consisting of multiple markers, i.e. it sorts the reads of each sample by marker. Requirements:

  • Data must have been generated with a library preparation/sequencing strategy that includes sequencing of the forward primers.
  • Data must already be demultiplexed by sample.
  • Sequence data must be in forward orientation.

Reads are categorized based on hidden Markov model (HMM) hits of the forward primer within the first 20 bp. This is very fast and allows high-throughput processing of the data.
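The categorization step can be sketched in Python as follows. This stand-in swaps the HMM search for a plain mismatch count: a primer is accepted if it starts within the first 20 bp of a read with at most a few mismatches (`max_mismatches=2` is an arbitrary assumption). The marker names, primer sequences, and exact window semantics are hypothetical; this is not the script's actual code.

```python
# Sketch: assign each (already sample-demultiplexed, forward-oriented) read
# to a marker by locating its forward primer near the read start.
# NOTE: the original tool uses HMM hits; this stand-in uses a simple
# mismatch count, and the primer sequences below are hypothetical.
from typing import Dict, Optional

WINDOW = 20  # the primer must start within the first 20 bp (assumed semantics)


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read: str, primer: str) -> int:
    """Fewest mismatches of `primer` at any start position inside the window."""
    last_start = min(WINDOW - 1, len(read) - len(primer))
    best = len(primer)  # worst case: no usable start position
    for start in range(last_start + 1):
        best = min(best, mismatches(read[start:start + len(primer)], primer))
    return best


def assign_marker(read: str, primers: Dict[str, str], max_mismatches: int = 2) -> Optional[str]:
    """Marker whose primer fits best, or None if no primer is close enough."""
    scores = {m: best_hit(read.upper(), p.upper()) for m, p in primers.items()}
    marker = min(scores, key=scores.get)  # ties: first marker in dict order
    return marker if scores[marker] <= max_mismatches else None


if __name__ == "__main__":
    primers = {"marker_A": "GATTACAGATTACA", "marker_B": "CCGGTTAACCGGTT"}
    print(assign_marker("NNGATTACAGATTACATTTTTTTTTT", primers))  # prints marker_A
```

An HMM built from a primer alignment can additionally model insertions, deletions, and degenerate positions, which a Hamming-distance scan cannot; restricting the search to the read start keeps either approach fast.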

GitHub: https://github.com/molbiodiv/meta-barcoding-marker-demultiplex

AliTV – Alignment Toolbox and Visualization

The comparison of genome structures of organisms can yield interesting insights into evolutionary processes. Such comparisons require whole genome alignments. However, whole genome alignments are difficult to interpret without proper visualization. AliTV uses d3.js to create interactive visualizations of whole genome alignments.

Example visualizations, including an alignment of seven chloroplast genomes, are available online.

Freely available at GitHub:  https://github.com/AliTVTeam/AliTV

Publication: https://peerj.com/articles/cs-116/

TBro: visualization and management of de novo transcriptomes

A web-based transcriptome browser suitable for de novo transcriptomics. It has been used to analyze the Venus flytrap transcriptome.

TBro is a web application that allows biologists to browse the vast amount of data generated by RNA-seq experiments. Powerful search options exist to find transcripts of interest. All information for each transcript is aggregated on a single page. Transcripts of interest can be organized in carts and analyzed together.

Freely available at GitHub:  https://github.com/TBroTeam/TBro

Publication: https://academic.oup.com/database/article/doi/10.1093/database/baw146/2742073

biojs-io-biom: A JavaScript library for handling data in Biological Observation Matrix (BIOM) format.

This library provides an easy-to-use interface for interacting with data in BIOM format. The library itself is written in ES6 and tested with Mocha. To provide compatibility with both versions 1.0 and 2.1 of the BIOM format, a lightweight conversion server has been developed. A public instance of the conversion server is available.
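BIOM 1.0 tables are JSON documents, whereas BIOM 2.x files are HDF5-based, which is presumably why a conversion step is needed for a browser library. As a rough illustration of what handling the JSON variant involves, here is a Python sketch that reads a BIOM 1.0 table (sparse or dense) into row IDs, column IDs, and a dense matrix. It is not the biojs-io-biom API; the example table is made up.

```python
# Sketch: read a BIOM 1.0 (JSON) table into a dense row x column matrix.
# Plain Python for illustration only, not the biojs-io-biom API.
import json
from typing import List, Tuple

# Hypothetical minimal BIOM 1.0 table (2 observations x 3 samples)
EXAMPLE_BIOM = json.dumps({
    "id": "example", "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org", "type": "OTU table",
    "generated_by": "hand-written example", "date": "2017-01-01T00:00:00",
    "rows": [{"id": "OTU1", "metadata": None}, {"id": "OTU2", "metadata": None}],
    "columns": [{"id": "S1", "metadata": None}, {"id": "S2", "metadata": None},
                {"id": "S3", "metadata": None}],
    "matrix_type": "sparse", "matrix_element_type": "int",
    "shape": [2, 3], "data": [[0, 0, 5], [1, 2, 3]],
})


def read_biom1(text: str) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (row ids, column ids, dense matrix) for a BIOM 1.0 JSON string."""
    table = json.loads(text)
    row_ids = [r["id"] for r in table["rows"]]
    col_ids = [c["id"] for c in table["columns"]]
    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "dense":
        # Dense tables store the full matrix as a list of rows.
        return row_ids, col_ids, [[float(v) for v in row] for row in table["data"]]
    # Sparse tables store only non-zero cells as [row, column, value] triples.
    matrix = [[0.0] * n_cols for _ in range(n_rows)]
    for r, c, value in table["data"]:
        matrix[r][c] = float(value)
    return row_ids, col_ids, matrix


if __name__ == "__main__":
    rows, cols, m = read_biom1(EXAMPLE_BIOM)
    print(rows, cols, m)
```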

Freely available at GitHub:  https://github.com/molbiodiv/biojs-io-biom

Publication: https://f1000research.com/articles/5-2348/v2

bcgTree: automatized phylogenetic tree building from bacterial core genomes

The need for multi-gene analyses in scientific fields such as phylogenetics and DNA barcoding has increased in recent years. In particular, these approaches are increasingly important for differentiating bacterial species, where reliance on the standard 16S rDNA marker can result in poor resolution. Additionally, the assembly of bacterial genomes has become a standard task due to advances in next-generation sequencing technologies. We created a bioinformatic pipeline, bcgTree, which uses assembled bacterial genomes, taken either from databases or from the user's own sequencing results, to reconstruct their phylogenetic history. The pipeline automatically extracts 107 essential single-copy core genes, found in the majority of bacteria, using hidden Markov models and performs a partitioned maximum-likelihood analysis.
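A partitioned maximum-likelihood analysis of many genes needs the per-gene alignments concatenated into one supermatrix, together with the coordinates of each gene. The Python sketch below shows that bookkeeping, padding genomes that lack a gene with gaps. The `MODEL, name = start-end` lines follow the partition-file layout used by RAxML; the WAG model and all gene and genome names are assumptions, and this is not taken from bcgTree's source.

```python
# Sketch: build a concatenated supermatrix plus partition lines from
# per-gene alignments, as input for a partitioned ML analysis.
# Not bcgTree's code; names and the default WAG model are assumptions.
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # genome id -> aligned protein sequence for one gene


def concatenate(genes: Dict[str, Alignment], genomes: List[str],
                model: str = "WAG") -> Tuple[Dict[str, str], List[str]]:
    """Concatenate gene alignments; genomes lacking a gene get gap characters."""
    pieces: Dict[str, List[str]] = {g: [] for g in genomes}
    partitions: List[str] = []
    start = 1  # partition coordinates are 1-based and inclusive
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: alignment is empty or has unequal lengths")
        length = lengths.pop()
        for g in genomes:
            pieces[g].append(aln.get(g, "-" * length))
        end = start + length - 1
        partitions.append(f"{model}, {gene} = {start}-{end}")
        start = end + 1
    return {g: "".join(p) for g, p in pieces.items()}, partitions


if __name__ == "__main__":
    genes = {
        "gene1": {"genomeA": "MKV-", "genomeB": "MKVL"},
        "gene2": {"genomeA": "AS"},  # missing in genomeB (e.g. no HMM hit)
    }
    matrix, parts = concatenate(genes, ["genomeA", "genomeB"])
    print(matrix)
    print("\n".join(parts))
```

Keeping one partition per gene lets the tree program estimate separate model parameters for each core gene instead of forcing a single model on the whole concatenation.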

Freely available at GitHub:  https://github.com/molbiodiv/bcgTree

Publication: http://www.nrcresearchpress.com/doi/abs/10.1139/gen-2015-0175


Pollen/Plant ITS2 reference set for the RDP/UTAX classifier (2015)

Meta-barcoding of mixed pollen samples constitutes a suitable alternative to conventional pollen identification via light microscopy. However, current approaches are of limited practicability because of low sample throughput and/or inefficient processing methods, e.g. separate steps for amplification and sample indexing.

We thus developed a new primer-adapter design for high-throughput sequencing with Illumina technology that remedies these issues. It uses a dual-indexing strategy, in which sample-specific combinations of forward and reverse identifiers attached to the barcode marker allow high sample throughput in a single sequencing run. No further adapter ligation steps are required after amplification. We applied this protocol to 384 pollen samples collected by solitary bees and sequenced all samples together on a single Illumina MiSeq v2 flow cell. According to rarefaction curves, 2,000–3,000 high-quality reads per sample were sufficient to assess the complete diversity of 95% of the samples. We detected 650 different plant taxa in total, of which 95% were classified at the species level. Together with the laboratory protocol, we also present an update of the reference database used by the classifier software, which increases the total number of plant species covered by the database from 37,403 to 72,325 (a 93% increase).
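The dual-indexing strategy above is combinatorial: each sample is identified by a pair of forward and reverse identifiers, so n_f forward plus n_r reverse indexed primers can label n_f × n_r samples. The Python sketch below builds such a sample sheet and looks up a read's sample from its identifier pair. The identifier names and the 16 × 24 split are hypothetical and not the published primer set; in practice the identifiers are short index sequences read during sequencing.

```python
# Sketch of the combinatorial idea behind dual indexing: every sample is
# identified by a (forward identifier, reverse identifier) pair.
# Identifier names and the 16 x 24 split are hypothetical.
from itertools import product
from typing import Dict, List, Optional, Tuple

SampleSheet = Dict[Tuple[str, str], str]


def build_sample_sheet(forward_ids: List[str], reverse_ids: List[str]) -> SampleSheet:
    """Give every forward/reverse identifier combination its own sample name."""
    return {
        (f, r): f"sample_{i:03d}"
        for i, (f, r) in enumerate(product(forward_ids, reverse_ids), start=1)
    }


def assign_sample(fwd: str, rev: str, sheet: SampleSheet) -> Optional[str]:
    """Look up the sample for a read's identifier pair (None if not in the sheet)."""
    return sheet.get((fwd, rev))


FORWARD = [f"F{i:02d}" for i in range(1, 17)]  # 16 hypothetical forward identifiers
REVERSE = [f"R{i:02d}" for i in range(1, 25)]  # 24 hypothetical reverse identifiers

if __name__ == "__main__":
    sheet = build_sample_sheet(FORWARD, REVERSE)
    print(len(FORWARD) + len(REVERSE), "primers label", len(sheet), "samples")
    print(assign_sample("F02", "R01", sheet))
```

With this split, 40 indexed primers suffice for 384 samples, whereas single indexing would need one uniquely indexed primer per sample.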

This study thus improves on existing approaches in both the laboratory and the bioinformatic workflow with respect to data quantity and quality, processing effort, and cost-effectiveness. Although tested only on pollen samples, the approach is also applicable to other research questions that require plant identification in mixed and challenging samples.

Reference: Sickel W, Ankenbrand M, Grimmer G, Holzschuh A, Härtel S, Lanzen J, Steffan-Dewenter I, Keller A (2015) Increased efficiency in identifying mixed pollen samples by meta-barcoding with a dual-indexing approach. BMC Ecology 15: 20

GitHub: https://github.com/molbiodiv/meta-barcoding-dual-indexing

Laboratory rearing of solitary bees and wasps

Ecological experiments often require standardized methods that exclude natural variation and allow the manipulation of a single parameter. It has been shown that larvae of the domesticated honey bee can be reared in a controlled environment. Here we demonstrate that this approach is also transferable to wild solitary bees and wasps without negative effects on their development. The rearing wells may also be supplemented with the antibiotic oxytetracycline to control the presence of bacteria. The method thus provides a useful tool to investigate offspring recruitment and larval development in solitary bees and wasps, as well as their responses to manipulated factors such as diet, toxins, and microbiota.

Reference: Becker M, Keller A (2016) Laboratory rearing of solitary bees and wasps. Insect Science 23: 918



ITS2 database update V (with Dept. of Bioinformatics)

The internal transcribed spacer 2 (ITS2) is a well-established marker for phylogenetic analyses in eukaryotes. The ITS2 database (http://its2.bioapps.biozentrum.uni-wuerzburg.de/) is a reliable resource for reference sequences and their secondary structures. However, the database was last updated in 2011. Here, we present a major update of the underlying data that almost doubles the number of entries. This increases the number of taxa represented within all major eukaryotic clades. Moreover, underrepresented groups have received additional data, and some new groups have been added. The broader coverage across the tree of life improves phylogenetic analyses and the utility of ITS2 as a DNA barcode.