Adding Semantics to Genome Databases: Towards an Ontology for Molecular Biology

Steffen Schulze-Kremer

Molecular biology has a communication problem. Many databases use their own labels and categories for storing data objects, and some use identical labels and categories but with different meanings. Conversely, a single concept is often found under different names. Prominent examples are the concepts "gene" and "protein sequence", which are used with different semantics by major international genomic and protein databases, making database integration difficult and laborious. This situation can only be improved either by defining individual semantic interfaces between each pair of databases (complexity of order n*n for n databases) or by implementing one commonly agreed, transparent and computationally tractable semantic repository and linking each database to it (complexity of order n).
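The difference between the two integration strategies can be made concrete with a small counting sketch. It is an illustration only: the database names and the "ontology" hub label are hypothetical placeholders, and each interface is counted once per unordered pair, which gives n*(n-1)/2 and is still of order n*n.

```python
from itertools import combinations


def pairwise_interfaces(databases):
    """One semantic interface per unordered pair of databases: n*(n-1)/2."""
    return list(combinations(databases, 2))


def hub_links(databases, hub="ontology"):
    """One link from each database to a shared semantic repository: n."""
    return [(db, hub) for db in databases]


# Hypothetical database names, used only to count interfaces.
databases = ["db_a", "db_b", "db_c", "db_d", "db_e"]

pairs = pairwise_interfaces(databases)
links = hub_links(databases)

print(f"{len(databases)} databases")
print(f"pairwise interfaces: {len(pairs)}")
print(f"links to shared repository: {len(links)}")
```

Adding a sixth database under the pairwise strategy requires five new interfaces, one to each existing database, whereas under the shared-repository strategy it requires a single new link.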

