Integrated Mitochondrial Protein Index (IMPI)
Knowing the mitochondrial proteome is essential for studying mitochondrial function and disease. Clues to a protein's cellular localisation can be gathered from a range of sources, such as the presence of a mitochondrial targeting sequence or its identification by mass spectrometry in a mitochondrial proteomics study. For a given gene, any combination of these evidence types may be present, from all of them to none, while some non-mitochondrial genes will have been identified erroneously in these datasets. Therefore, to make the best prediction for a given gene, an integrative approach must be taken.
We have developed the Integrated Mitochondrial Protein Index (IMPI), a collection of genes that encode proteins with strong evidence for cellular localisation within the mammalian mitochondrion. It is created and updated using multiple types of evidence, including novel data types not previously used in defining the mitochondrial proteome, such as presence in manually curated metabolic models, antibody data from the Human Protein Atlas, predictions from four mitochondrial targeting sequence programs, and extensive experimental data from over 52 GFP and mass spectrometry localisation studies. Evidence is shared between orthologous genes in different organisms using the orthology mappings of Ensembl Compara. A machine learning classifier (a support vector machine) is trained on collections of characterised mitochondrial and non-mitochondrial proteins and then used to select those proteins whose properties and evidence are similar to those of already characterised mitochondrial proteins. The selected proteins make up the IMPI dataset.
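The two integration steps described above, sharing evidence between orthologs and training a classifier on characterised genes, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IMPI pipeline: the gene identifiers, evidence values, feature names and function names are all hypothetical, and a simple hinge-loss margin perceptron stands in for the support vector machine so that the example needs no external libraries.

```python
# Illustrative sketch only: gene IDs, evidence values and function names are
# hypothetical, and a margin perceptron stands in for the SVM used by IMPI.

FEATURES = ("targeting_sequence", "mass_spec", "gfp", "antibody")

# Raw binary evidence per gene (1 = evidence present), in FEATURES order.
EVIDENCE = {
    "hs_g1": [1, 1, 0, 1],
    "mm_g1": [0, 1, 1, 0],
    "hs_g2": [1, 1, 0, 1],
    "hs_g3": [0, 1, 0, 0],
    "hs_g4": [0, 0, 0, 0],
    "hs_g5": [1, 0, 0, 0],
    "hs_g6": [0, 1, 0, 0],  # uncharacterised candidate
    "mm_g6": [1, 0, 1, 0],
}

# Hypothetical ortholog pairs (the kind of mapping an orthology resource supplies).
ORTHOLOG_PAIRS = [("hs_g1", "mm_g1"), ("hs_g6", "mm_g6")]

# Training labels: +1 = characterised mitochondrial, -1 = characterised non-mitochondrial.
LABELS = {"hs_g1": 1, "hs_g2": 1, "hs_g3": -1, "hs_g4": -1, "hs_g5": -1}


def share_evidence(evidence, ortholog_pairs):
    """Give each gene the element-wise maximum of its own and its direct orthologs' evidence."""
    shared = {gene: list(vec) for gene, vec in evidence.items()}
    for a, b in ortholog_pairs:
        for target, source in ((a, b), (b, a)):
            shared[target] = [max(x, y) for x, y in zip(shared[target], evidence[source])]
    return shared


def score(w, b, x):
    """Signed decision value: positive means 'predicted mitochondrial'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_margin_perceptron(X, y, epochs=100):
    """Update on every example whose hinge margin y * score is below 1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        updated = False
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                updated = True
        if not updated:  # every training example already has margin >= 1
            break
    return w, b


shared = share_evidence(EVIDENCE, ORTHOLOG_PAIRS)
X_train = [shared[gene] for gene in LABELS]
y_train = list(LABELS.values())
w, b = train_margin_perceptron(X_train, y_train)

print("weights:", dict(zip(FEATURES, w)), "bias:", b)
print("candidate, own evidence only:", score(w, b, EVIDENCE["hs_g6"]))
print("candidate, with ortholog evidence:", score(w, b, shared["hs_g6"]))
```

In this toy data the candidate gene scores below zero on its own evidence and above zero once its ortholog's evidence is shared, which is the kind of effect that motivates pooling evidence across species. A real analysis would use many more features and training genes, and a regularised SVM rather than this stand-in.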
Our work identifies 1550 human genes (1130 known and 420 predicted from experimental data) that encode proteins with strong evidence for mitochondrial localisation. Compared to the MitoCarta collection of mitochondrial proteins, our datasets contain a few hundred more proteins and are available as the MRC’s Integrated Mitochondrial Protein Index (IMPI).
One clinical application of the IMPI datasets is in exome sequencing of mitochondrial disease patients, where they allow us to identify novel disease genes not previously recognised as mitochondrial. In addition, other scientists have searched IMPI to identify novel mitochondrial proteins relevant to their research.