The Integrated Mitochondrial Protein Index (IMPI) is a curated collection of human genes encoding proteins with evidence of associating with mitochondria and affecting their form and function. The aim of IMPI is to define a mitochondrial proteome for studying mitochondrial (dys)function and disease, including for use in DNA sequencing of patients with mitochondrial disease.
The IMPI collection has two parts:
- A curated collection of genes encoding mitochondrial localised/associated/ancillary proteins with supporting PubMed citations (current release: 1,936 genes with over 2,300 citations)
- A collection of novel predicted mitochondrial proteins with highly significant evidence of mitochondrial localisation, but unreported in the literature as mitochondrial (current release: 119 genes)
The predicted portion is created and updated by applying machine learning to multiple types of evidence, including novel data types not previously used to define the mitochondrial proteome, such as: presence in manually curated metabolic models; antibody data from the Human Protein Atlas; conservation of predictions from four mitochondrial targeting sequence programs across four species; extensive experimental data from over 70 GFP and mass spectrometry localisation studies; protein-protein interactions; enrichment of mitochondrial RNAs binding CLUH; homologs in the amitochondriate Monocercomonoides sp.; and literature/knowledge mining. Evidence is shared among orthologs in different organisms using the orthology mapping from Ensembl Compara.
To determine whether a protein is likely to be mitochondrial, a machine learning classifier (a support vector machine) is trained on collections of known mitochondrial and non-mitochondrial proteins. The trained classifier then identifies proteins encoded in the human genome whose properties and evidence are similar to those of previously characterised mitochondrial proteins.
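The classification step can be illustrated with a minimal sketch. It is not the IMPI pipeline: the source does not state the kernel, the feature encoding or the software used, so this example assumes a linear support vector machine fitted by sub-gradient descent in plain Python, and the three evidence features and all values are invented toy data.

```python
# Toy linear SVM: an illustrative sketch, not the IMPI implementation.
# Hypothetical features per protein (all values invented):
#   [targeting-sequence score, fraction of localisation studies detecting
#    the protein, antibody evidence flag]

def train_linear_svm(samples, labels, epochs=100, lr=0.1, lam=0.01):
    """Fit a linear SVM by sub-gradient descent on the regularised hinge loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            # L2 regularisation shrinks the weights slightly on every step.
            weights = [w * (1.0 - lr * lam) for w in weights]
            if y * score < 1.0:
                # Inside the margin or misclassified: move the boundary.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 (predicted mitochondrial) or -1 (predicted non-mitochondrial)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Training sets: known mitochondrial (+1) and non-mitochondrial (-1) proteins.
train_x = [
    [0.9, 0.8, 1.0], [0.8, 0.9, 1.0], [0.7, 0.9, 1.0], [0.9, 0.7, 1.0],
    [0.1, 0.1, 0.0], [0.2, 0.0, 0.0], [0.1, 0.2, 0.0], [0.0, 0.1, 0.0],
]
train_y = [1, 1, 1, 1, -1, -1, -1, -1]

weights, bias = train_linear_svm(train_x, train_y)

# Score two uncharacterised proteins (again, invented feature values).
candidate_a = [0.85, 0.75, 1.0]
candidate_b = [0.15, 0.10, 0.0]
print(predict(weights, bias, candidate_a), predict(weights, bias, candidate_b))
```

In practice a library implementation, with real evidence features and cross-validation, would be a more realistic choice than this hand-written trainer; the sketch only shows the train-then-score pattern.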
All predictions are checked manually for literature supporting a mitochondrial localisation. If confirming publications are found, the entries are moved to the IMPI curated collection and the machine learning is re-run.
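The manual check-and-promote cycle can be sketched as a small set operation. This is only an illustration: the function name, the gene identifiers and the `has_supporting_literature` callback are hypothetical stand-ins for what is, in IMPI, a manual literature review.

```python
def curation_round(curated, predicted, has_supporting_literature):
    """Promote predictions with confirming publications to the curated set.

    Returns the updated curated set, the remaining predictions, and a flag
    saying whether the classifier should be re-run on the new training data.
    """
    confirmed = {gene for gene in predicted if has_supporting_literature(gene)}
    return curated | confirmed, predicted - confirmed, bool(confirmed)


# Hypothetical gene identifiers, for illustration only.
curated = {"GENE_A", "GENE_B"}
predicted = {"GENE_C", "GENE_D"}
reviewed_as_mitochondrial = {"GENE_C"}  # stand-in for the manual review result

curated, predicted, retrain = curation_round(
    curated, predicted, lambda gene: gene in reviewed_as_mitochondrial
)
print(sorted(curated), sorted(predicted), retrain)
```

Returning a retrain flag mirrors the description above: the classifier only needs re-running when the curated collection has actually changed.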
See also The Buddy Collection, the companion dataset to IMPI of genes encoding proteins associated with mitochondrial disease and dysfunction.
IMPI-2021-Q3pre: 2021 release of the IMPI database, including:
- 1,362 genes encoding verified mitochondrial proteins with "gold standard" evidence of mitochondrial localisation, e.g. visual confirmation by GFP-tagging;
- 117 encoding associated mitochondrial proteins with evidence of mitochondrial localisation, but lacking visual confirmation, e.g. APEX tagging;
- 457 encoding ancillary mitochondrial proteins with no evidence of mitochondrial localisation, but reported to affect mitochondrial function or morphology; and
- 119 encoding predicted mitochondrial proteins not reported as mitochondrial or affecting mitochondrial function, but with highly significant evidence for mitochondrial localisation by statistical analysis of experimental data.
IMPI-2020-Q3pre: 2020 release of the IMPI database, including:
- 1,330 genes encoding proteins reported to have a mitochondrial localisation;
- 328 reported to affect mitochondrial function, morphology and dynamics, but lacking conclusive localisation; and
- 511 not reported as mitochondrial or affecting mitochondrial function, but with strong evidence for mitochondrial localisation by statistical analysis of experimental data.
IMPI-2018-Q2: 2018 release of the IMPI database, including:
- 1,184 genes encoding proteins reported to have a mitochondrial localisation; and
- 442 predicted from experimental data, encoding proteins with strong evidence for mitochondrial localisation.
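As a consistency check on the release figures above, the category counts can be tallied per release. The counts are taken from the lists above; the short category keys are shorthand chosen for this sketch and are not IMPI field names.

```python
# Gene counts per category, copied from the release notes above.
# Category keys are illustrative shorthand, not official IMPI labels.
releases = {
    "IMPI-2021-Q3pre": {"verified": 1362, "associated": 117,
                        "ancillary": 457, "predicted": 119},
    "IMPI-2020-Q3pre": {"localised": 1330, "affecting_function": 328,
                        "predicted": 511},
    "IMPI-2018-Q2": {"localised": 1184, "predicted": 442},
}

# Total genes listed in each release (curated plus predicted).
totals = {name: sum(counts.values()) for name, counts in releases.items()}

# Curated portion of the 2021 release: everything except the predictions.
curated_2021 = sum(
    count
    for category, count in releases["IMPI-2021-Q3pre"].items()
    if category != "predicted"
)

for name, total in totals.items():
    print(f"{name}: {total} genes")
print(f"Curated genes in IMPI-2021-Q3pre: {curated_2021}")
```

The curated total for the 2021 release (1,362 + 117 + 457) comes to 1,936, which agrees with the curated gene count quoted for the current release.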