Integrated Mitochondrial Protein Index (IMPI)
Knowing the mitochondrial proteome is essential for studying mitochondrial function and disease. Clues to a protein's cellular localisation can be gathered from a range of sources, such as the presence of a mitochondrial targeting sequence or its identification by mass spectrometry in a mitochondrial proteomics study. For a given gene, any combination of these evidence types may be present, from all of them to none, while some non-mitochondrial genes will have been identified erroneously in these datasets. Therefore, to make the best prediction for a given gene, an integrative approach must be taken.
We have developed the Integrated Mitochondrial Protein Index (IMPI), a collection of genes that encode proteins with strong evidence for cellular localisation within the mammalian mitochondrion. It is created and updated using multiple types of evidence, including novel data types not previously used in defining the mitochondrial proteome, such as presence in manually curated metabolic models, antibody data from the Human Protein Atlas, conservation of predictions from four mitochondrial targeting sequence programs in four species, extensive experimental data from over 70 GFP and mass spectrometry localisation studies, protein-protein interactions, enrichment of mitochondrial RNAs binding CLUH, homologs in the amitochondriate Monocercomonoides sp., and literature/knowledge mining. Evidence is shared between orthologs in different organisms using the mapping from Ensembl’s COMPARA. To produce the IMPI dataset, a machine learning classifier (a support vector machine) is trained on collections of characterised mitochondrial and non-mitochondrial proteins and then used to determine whether other proteins are mitochondrial. In this way, the classifier selects those proteins whose properties and evidence are similar to those of already characterised mitochondrial proteins.
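The classification step described above can be illustrated with a minimal sketch. It is not the IMPI pipeline: the three binary evidence features, the toy training set and the training settings are all invented for illustration, and a plain linear SVM trained by sub-gradient descent stands in for whatever SVM configuration IMPI actually uses. The toy data are arranged so that a single evidence type is not enough for a mitochondrial call, which mirrors the case for integrating evidence.

```python
# Hypothetical binary evidence features; the real IMPI feature set is far richer.
FEATURES = ("targeting_sequence", "mass_spectrometry", "gfp_localisation")

# Invented training set: label +1 = characterised mitochondrial protein,
# label -1 = characterised non-mitochondrial protein.
TRAINING = [
    ((1, 1, 0), +1),
    ((1, 0, 1), +1),
    ((0, 1, 1), +1),
    ((1, 0, 0), -1),
    ((0, 1, 0), -1),
    ((0, 0, 1), -1),
]


def decision_score(weights, bias, evidence):
    """Linear decision function w.x + b; a positive value means 'mitochondrial'."""
    return sum(w * x for w, x in zip(weights, evidence)) + bias


def train_linear_svm(samples, learning_rate=0.01, penalty=0.001, epochs=2000):
    """Fit a linear SVM by sub-gradient descent on hinge loss plus an L2 penalty."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for evidence, label in samples:
            margin = label * decision_score(weights, bias, evidence)
            # L2 penalty: shrink the weights slightly on every step.
            weights = [w * (1.0 - learning_rate * penalty) for w in weights]
            if margin < 1.0:
                # Hinge loss is active: push the boundary away from this sample.
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, evidence)]
                bias += learning_rate * label
    return weights, bias


def is_mitochondrial(weights, bias, evidence):
    """Classify one gene from its evidence vector."""
    return decision_score(weights, bias, evidence) > 0.0


weights, bias = train_linear_svm(TRAINING)

# Two unseen, hypothetical genes: one with every evidence type, one with none.
print(is_mitochondrial(weights, bias, (1, 1, 1)))
print(is_mitochondrial(weights, bias, (0, 0, 0)))
```

In this toy setting the learned weights reward agreement between evidence types, so a gene supported by several independent sources scores higher than one supported by a single, possibly erroneous, source.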
IMPI-2018-Q2: Our work identifies 1626 human genes (1184 known and 442 predicted from experimental data) that encode proteins with strong evidence for mitochondrial localisation. Compared to the MitoCarta collection of mitochondrial proteins, our datasets contain a few hundred more proteins and are available as the MRC’s Integrated Mitochondrial Protein Index (IMPI).
IMPI-2020-Q3pre: A pre-release of the new IMPI-2020 database including:
- 1330 proteins reported to have a mitochondrial localisation;
- 328 proteins reported to affect mitochondrial function, morphology and dynamics, but lacking conclusive localisation; and
- 511 proteins not reported as mitochondrial or as affecting mitochondrial function, but with strong evidence for mitochondrial localisation from statistical analysis of experimental data; these represent novel predictions.
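As a quick sanity check on the figures quoted for the two releases, the counts can be tallied as below. This is only a sketch: the dictionary keys are hypothetical labels, and the pre-release total assumes the three categories do not overlap, which the list suggests but does not state outright.

```python
# Counts copied from the IMPI-2020-Q3pre list; the keys are hypothetical labels
# and the categories are assumed to be mutually exclusive.
prerelease_counts = {
    "reported_mitochondrial": 1330,
    "affects_function_localisation_unclear": 328,
    "novel_prediction": 511,
}

# Total number of proteins in the pre-release under the no-overlap assumption.
total = sum(prerelease_counts.values())

# Percentage share of each category, rounded to one decimal place.
shares = {name: round(100 * count / total, 1)
          for name, count in prerelease_counts.items()}

# IMPI-2018-Q2 figures: known plus predicted genes should give the stated 1626.
total_2018 = 1184 + 442

print(total, shares, total_2018)
```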
One clinical application of the IMPI datasets is in exome sequencing of mitochondrial disease patients, where they help to identify novel disease genes not previously recognised as mitochondrial. In addition, other scientists have searched IMPI to identify novel mitochondrial proteins relevant to their research.
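The exome sequencing use case can be sketched as a simple filter, assuming the IMPI gene list has been loaded into a set. Everything in the example is a placeholder: the gene symbols, the variant records and the `prioritise` helper are hypothetical and do not reflect the actual IMPI file format or any clinical pipeline.

```python
# Placeholder data only: none of these symbols or variants come from IMPI
# or from patient sequencing.
impi_genes = {"GENE_A", "GENE_B", "GENE_C"}
known_disease_genes = {"GENE_A"}

patient_variants = [
    {"gene": "GENE_A", "variant": "variant_1"},
    {"gene": "GENE_B", "variant": "variant_2"},
    {"gene": "GENE_X", "variant": "variant_3"},
]


def prioritise(variants, mito_genes, known_genes):
    """Keep variants in IMPI genes and flag genes not yet linked to disease."""
    hits = []
    for record in variants:
        if record["gene"] in mito_genes:
            is_novel = record["gene"] not in known_genes
            hits.append({**record, "novel_candidate": is_novel})
    return hits


candidates = prioritise(patient_variants, impi_genes, known_disease_genes)
for candidate in candidates:
    print(candidate)
```

Variants in genes outside the index are dropped, and the `novel_candidate` flag marks mitochondrial genes that are not yet established disease genes and so merit follow-up.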