Integrated Mitochondrial Protein Index (IMPI)
Knowing the mitochondrial proteome is essential for studying mitochondrial function and disease. Clues to a protein's cellular localisation can be gathered from a range of sources, such as the presence of a mitochondrial targeting sequence or the protein's identification by mass spectrometry in a mitochondrial proteomics study. For a given gene, any combination of these evidence types, from all of them to none, may be present, each to a varying degree, so an integrative approach is needed to make the best prediction.
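As a hedged illustration of what "present to a varying degree, or absent" means for a classifier, the minimal Python sketch below encodes one gene's evidence as a fixed-length feature vector. The `GeneEvidence` record, its field names, the four placeholder predictor names and the choice of pairing each value with a "missing" flag are all assumptions made for this example, not IMPI's actual feature set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical placeholder names for four targeting-sequence predictors.
PREDICTORS = ["predictor_a", "predictor_b", "predictor_c", "predictor_d"]


@dataclass
class GeneEvidence:
    """Illustrative per-gene evidence record; any field may be missing."""
    gene_id: str
    mts_scores: Dict[str, float] = field(default_factory=dict)  # predictor -> score in [0, 1]
    ms_detections: int = 0           # proteomics studies that detected the protein in mitochondria
    ms_studies: int = 0              # proteomics studies that could have detected it (0 = no data)
    gfp_mito: Optional[bool] = None  # GFP localisation result; None = never tested


def with_flag(value: Optional[float]) -> List[float]:
    """Encode a possibly missing value as [value or 0.0, missing flag]."""
    return [0.0, 1.0] if value is None else [float(value), 0.0]


def to_features(ev: GeneEvidence) -> List[float]:
    """Turn heterogeneous, partly missing evidence into one fixed-length vector.

    Pairing each value with a 'missing' flag lets a downstream classifier
    tell 'no evidence available' apart from 'evidence against'.
    """
    feats: List[float] = []
    for name in PREDICTORS:
        feats += with_flag(ev.mts_scores.get(name))
    ms_fraction = ev.ms_detections / ev.ms_studies if ev.ms_studies else None
    feats += with_flag(ms_fraction)
    feats += with_flag(None if ev.gfp_mito is None else float(ev.gfp_mito))
    return feats


# Usage: one predictor score, detected in 3 of 4 proteomics studies, GFP-positive.
print(to_features(GeneEvidence("gene_1", {"predictor_a": 0.9},
                               ms_detections=3, ms_studies=4, gfp_mito=True)))
```

Every gene yields a vector of the same length whatever evidence it has, which is what lets a single model compare well-studied and poorly studied genes.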
We have developed the Integrated Mitochondrial Protein Index (IMPI), a collection of genes encoding proteins with strong evidence for localisation within the mammalian mitochondrion. IMPI is created and updated using multiple types of evidence, including data types not previously used to define the mitochondrial proteome. The evidence includes presence in manually curated metabolic models, evidence from biotinylation studies, antibody data from the Human Protein Atlas, predictions from four mitochondrial targeting sequence programs, and extensive experimental data from more than 52 GFP and mass spectrometry localisation studies. Evidence is shared between orthologous genes in different organisms using the ortholog mappings from Ensembl Compara.
A random forest classifier, trained on collections of characterised mitochondrial and non-mitochondrial proteins, selects the proteins whose properties and evidence resemble those of already characterised mitochondrial proteins, producing the IMPI dataset. This approach avoids having to choose an arbitrary threshold for the level of evidence that counts as "mitochondrial". We identify 1408 human Ensembl genes encoding mitochondrially localised proteins. Our datasets contain a few hundred more proteins than the MitoCarta collection of mitochondrial proteins and are available as the MRC's Integrated Mitochondrial Protein Indices (IMPI).
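The classification step can be illustrated with a minimal, dependency-free Python sketch of a random forest: bootstrap-sampled decision trees that split on Gini impurity using a random subset of features at each node, with the forest's output being the average of the trees' leaf fractions. This is not IMPI's implementation; the tree depth, number of trees, the toy two-feature data and the 0.5 cut-off in the usage lines are assumptions for illustration. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple, Union

# A node is either a leaf (float: fraction of label-1 training examples)
# or an internal split (feature index, threshold, left subtree, right subtree).
Node = Union[float, tuple]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, rng, max_depth, n_feats) -> Node:
    split = None
    if max_depth > 0 and len(set(y)) > 1:
        feats = rng.sample(range(len(X[0])), n_feats)  # random feature subset
        split = best_split(X, y, feats)
    if split is None:
        return sum(y) / len(y)  # leaf: fraction of mitochondrial (label 1) examples
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], rng, max_depth - 1, n_feats),
            build_tree([X[i] for i in ri], [y[i] for i in ri], rng, max_depth - 1, n_feats))


def tree_predict(node: Node, row) -> float:
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, max_depth=4, seed=0) -> List[Node]:
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng, max_depth, n_feats))
    return forest


def predict_proba(forest: List[Node], row) -> float:
    """Average of per-tree leaf fractions: an estimate of P(mitochondrial)."""
    return sum(tree_predict(tree, row) for tree in forest) / len(forest)


# Toy usage: two hypothetical features per gene; label 1 = known mitochondrial.
X = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.95, 0.95], [0.7, 0.85],
     [0.1, 0.0], [0.2, 0.1], [0.05, 0.2], [0.15, 0.05], [0.0, 0.1]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
forest = fit_forest(X, y)
p = predict_proba(forest, [1.0, 1.0])
print(p, p > 0.5)  # the 0.5 cut-off is an assumption for illustration
```

The forest learns from labelled examples how much weight each kind of evidence deserves, rather than relying on a hand-picked cut-off for each evidence type; a single cut-off is still applied to the classifier's output probability.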
One clinical application of the IMPI datasets is in exome sequencing of patients with mitochondrial disease, where they help identify novel disease genes not previously recognised as mitochondrial. In addition, other groups have searched the IMPI datasets for novel mitochondrial proteins relevant to their research.