Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes
1 Biometris, Wageningen University and Research Centre, 6700AC Wageningen, The Netherlands
2 Applied Bioinformatics, Plant Research International, Wageningen University and Research Centre, 6700AC Wageningen, The Netherlands
3 Current address: Functional Genomics, Nestlé Institute of Health Sciences, Campus EPFL, Quartier de l’Innovation, 1015 Lausanne, Switzerland
Algorithms for Molecular Biology 2013, 8:10. doi:10.1186/1748-7188-8-10. Published: 27 March 2013
Gene Ontology (GO) is a hierarchical vocabulary for describing biological functions and locations, and it is often used by computational methods for protein function prediction. Because of the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to belong to a detailed functional class but not to a broader class that, by the structure of the vocabulary, includes the predicted one.
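To make the contradiction concrete, here is a minimal Python sketch of a consistency check over a toy, hypothetical fragment of the hierarchy. The term names and the `PARENTS`, `violations` and `is_consistent` helpers are illustrative and not taken from FALCON. The sketch assumes a labeling is consistent when every predicted term also has all of its parent terms predicted; checking direct parents is enough, because the condition then propagates up to every ancestor.

```python
# Hypothetical toy fragment of GO: each term maps to its direct parents.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "protein_binding": ["binding"],
    "catalytic_activity": ["molecular_function"],
}


def violations(labels, parents):
    """Return (child, parent) pairs where the child is predicted but the parent is not."""
    return [
        (term, parent)
        for term, on in labels.items()
        if on
        for parent in parents[term]
        if not labels.get(parent, False)
    ]


def is_consistent(labels, parents):
    """A labeling respects the hierarchy if no predicted term has an unpredicted parent.

    Checking direct parents suffices: if every predicted term has its parents
    predicted, then by induction all of its ancestors are predicted too.
    """
    return not violations(labels, parents)


# The self-contradictory case: detailed class predicted, broader class not.
bad = {
    "molecular_function": True,
    "binding": False,
    "protein_binding": True,
    "catalytic_activity": False,
}
print(violations(bad, PARENTS))  # [('protein_binding', 'binding')]
```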
We present a novel discrete optimization algorithm called Functional Annotation with Labeling CONsistency (FALCON) that resolves such contradictions. GO is modeled as a discrete Bayesian network. For any given input of GO term membership probabilities, the algorithm returns the most probable GO term assignments that are consistent with the structure of the Gene Ontology. The optimization is performed with the Differential Evolution algorithm. Performance is evaluated on simulated data and on real data from Arabidopsis thaliana, showing improvement over related approaches. Finally, we applied the FALCON algorithm to obtain genome-wide function predictions for six eukaryotic species based on data provided by the CAFA (Critical Assessment of Function Annotation) project.
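The idea of returning the most probable consistent assignment can be illustrated on a toy hierarchy. This is a simplified sketch, not FALCON itself. Instead of the Bayesian network model described above, it assumes each term t is scored independently by its input probability p_t, so a labeling's log-score is the sum of log p_t over predicted terms and log(1 - p_t) over the rest. Instead of Differential Evolution, it enumerates all 2^n labelings, which is feasible only for a handful of terms. The hierarchy, probabilities and function names are hypothetical.

```python
import itertools
import math

# Hypothetical toy GO fragment: term -> direct parents.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "protein_binding": ["binding"],
    "catalytic_activity": ["molecular_function"],
}

# Hypothetical input membership probabilities (must lie strictly in (0, 1)).
PROBS = {
    "molecular_function": 0.95,
    "binding": 0.4,
    "protein_binding": 0.7,
    "catalytic_activity": 0.2,
}


def is_consistent(labels, parents):
    """True if every predicted term also has all of its direct parents predicted."""
    return all(labels[p] for t, on in labels.items() if on for p in parents[t])


def log_score(labels, probs):
    """Log-probability of a labeling, ASSUMING terms are independent Bernoulli(p_t)."""
    return sum(
        math.log(probs[t]) if on else math.log(1.0 - probs[t])
        for t, on in labels.items()
    )


def most_probable_consistent(probs, parents):
    """Exhaustively search all labelings and keep the best-scoring consistent one."""
    terms = list(parents)
    best, best_score = None, -math.inf
    for bits in itertools.product([False, True], repeat=len(terms)):
        labels = dict(zip(terms, bits))
        if not is_consistent(labels, parents):
            continue  # skip labelings that contradict the hierarchy
        score = log_score(labels, probs)
        if score > best_score:
            best, best_score = labels, score
    return best


print(most_probable_consistent(PROBS, PARENTS))
# {'molecular_function': True, 'binding': True, 'protein_binding': True, 'catalytic_activity': False}
```

Thresholding these probabilities at 0.5 would predict protein_binding (0.7) without binding (0.4), the kind of contradiction described earlier. Among consistent labelings, switching binding on (0.4 × 0.7 = 0.28) beats switching protein_binding off (0.6 × 0.3 = 0.18), so the sketch keeps both terms. For realistic GO sizes exhaustive search is infeasible, which is why a stochastic optimizer such as Differential Evolution is used instead.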