The Integrated Mitochondrial Protein Index (IMPI) is a curated collection of human genes encoding proteins with evidence for associating with mitochondria and affecting their form and function. The aim of IMPI is to define a mitochondrial proteome for studying mitochondrial (dys)function and disease, including during DNA sequencing of patients with mitochondrial disease.

The IMPI collection has two parts:

  • A curated collection of genes encoding mitochondrial localised/associated/ancillary proteins with supporting PubMed citations
    • (Current release: 1,966 genes with over 2,300 citations)
  • A collection of novel predicted mitochondrial proteins with highly significant evidence of mitochondrial localisation, but unreported in the literature as mitochondrial
    • (Current release: 119 genes)

The predicted portion is created and updated by applying machine learning to multiple types of evidence, including novel data types previously unused in defining the mitochondrial proteome, such as:

  • presence in manually curated metabolic models;
  • antibody data from the Human Protein Atlas;
  • conservation of predictions from four mitochondrial targeting sequence programs across four species;
  • extensive experimental data from over 70 GFP and mass spectrometry localisation studies;
  • protein-protein interactions;
  • enrichment of mitochondrial RNAs binding CLUH;
  • homologs in the amitochondriate Monocercomonoides sp.; and
  • literature/knowledge mining.

Evidence is shared among orthologs in different organisms using the ortholog mappings of Ensembl Compara.

To determine whether proteins are likely to be mitochondrial, a support vector machine (SVM) classifier is trained on collections of known mitochondrial and non-mitochondrial proteins. The trained classifier then identifies proteins encoded in the human genome whose properties and evidence resemble those of previously characterised mitochondrial proteins.
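The classification step can be illustrated with a minimal sketch. It trains a linear SVM by plain sub-gradient descent on the regularised hinge loss, which stands in for whatever SVM implementation, kernel and parameters IMPI actually uses (these are not stated here). The three evidence scores per protein, the toy training values and the function names `train_linear_svm` and `score` are all hypothetical.

```python
def train_linear_svm(samples, labels, lam=0.01, eta=0.1, epochs=200):
    """Sub-gradient descent on the L2-regularised hinge loss (linear SVM).

    labels are +1 (known mitochondrial) or -1 (known non-mitochondrial).
    A constant 1.0 is appended to each sample so the last weight acts as a
    bias term (regularised with the other weights, for simplicity).
    """
    rows = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink the weights (regulariser), then step toward the sample
            # if it lies inside the margin.
            w = [wj * (1.0 - eta * lam) for wj in w]
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def score(w, x):
    """Signed decision value; positive means 'predicted mitochondrial'."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


# Hypothetical evidence scores in [0, 1], three per protein (illustrative
# only, not IMPI data).
known_mito = [(0.9, 0.8, 1.0), (1.0, 0.9, 0.7), (0.8, 1.0, 0.9), (0.7, 0.9, 0.8)]
known_non_mito = [(0.1, 0.0, 0.2), (0.0, 0.2, 0.1), (0.2, 0.1, 0.0), (0.1, 0.1, 0.1)]

samples = known_mito + known_non_mito
labels = [1] * len(known_mito) + [-1] * len(known_non_mito)
weights = train_linear_svm(samples, labels)

# Uncharacterised candidates scored against the trained model.
candidate_high = (0.85, 0.9, 0.95)
candidate_low = (0.05, 0.1, 0.15)
print(score(weights, candidate_high) > 0, score(weights, candidate_low) > 0)
```

In this toy setting the candidate with strong evidence scores positive and the one with weak evidence scores negative; with real data the decision values would more likely be ranked and thresholded than read as a simple sign.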

All predictions are checked manually for literature supporting a mitochondrial localisation. If confirming publications are found, the entries are moved to the IMPI curated collection and the machine learning is re-run.
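The bookkeeping of this curation cycle can be sketched as simple set operations. The gene identifiers and the function name `promote_confirmed` are hypothetical placeholders, and the retraining step is only indicated by a comment.

```python
def promote_confirmed(curated, predicted, confirmed):
    """Move predictions with confirming publications into the curated set.

    Returns new (curated, predicted) sets; the inputs are left unchanged.
    """
    moved = predicted & confirmed
    return curated | moved, predicted - moved


curated = {"GENE_A", "GENE_B"}      # placeholder identifiers
predicted = {"GENE_C", "GENE_D"}
confirmed = {"GENE_D"}              # hypothetical result of the manual literature check

curated, predicted = promote_confirmed(curated, predicted, confirmed)
# At this point the classifier would be retrained on the enlarged curated set.
print(sorted(curated), sorted(predicted))
```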

See also The Buddy Collection (see below), the companion dataset to IMPI of genes encoding proteins associated with mitochondrial disease and dysfunction.

Availability:

IMPI-2021-Q4pre: 2021 release of IMPI database including:

  • 1,357 genes encoding verified mitochondrial proteins with "gold standard" evidence of mitochondrial localisation, e.g. visual confirmation by GFP-tagging;
  • 127 encoding associated mitochondrial proteins with evidence of mitochondrial localisation, but lacking visual confirmation, e.g. APEX tagging;
  • 482 encoding ancillary mitochondrial proteins with no evidence of mitochondrial localisation, but reported to affect mitochondrial function or morphology; and
  • 117 encoding predicted mitochondrial proteins not reported as mitochondrial or affecting mitochondrial function, but with highly significant evidence for mitochondrial localisation by statistical analysis of experimental data.

IMPI-2020-Q3pre: 2020 release of IMPI database including:

  • 1,330 genes encoding proteins reported to have a mitochondrial localisation;
  • 328 reported to affect mitochondrial function, morphology and dynamics, but lacking conclusive localisation; and
  • 511 not reported as mitochondrial or affecting mitochondrial function, but with strong evidence for mitochondrial localisation by statistical analysis of experimental data.

IMPI-2018-Q2: 2018 release of IMPI database including:

  • 1,184 genes encoding proteins reported to have a mitochondrial localisation; and
  • 442 predicted from experimental data, encoding proteins with strong evidence for mitochondrial localisation.
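As a quick cross-check, the category counts listed for each release can be tallied. The numbers below are copied from the listings above; grouping the last category of each release under "predicted" and everything else under "curated" is an assumption made for this sketch, and the short category labels are shorthand.

```python
# Category counts copied from the release listings above.
releases = {
    "IMPI-2021-Q4pre": {"verified": 1357, "associated": 127, "ancillary": 482, "predicted": 117},
    "IMPI-2020-Q3pre": {"localised": 1330, "affecting function": 328, "predicted": 511},
    "IMPI-2018-Q2": {"localised": 1184, "predicted": 442},
}


def totals(counts):
    """Return (curated, predicted) gene counts for one release."""
    curated = sum(n for name, n in counts.items() if name != "predicted")
    return curated, counts["predicted"]


for release, counts in releases.items():
    curated, predicted = totals(counts)
    print(f"{release}: {curated:,} curated + {predicted:,} predicted = {curated + predicted:,}")
```

For IMPI-2021-Q4pre the three curated categories sum to 1,966, matching the curated gene count quoted for the current release.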

Resources

IMPI can be accessed from MitoMiner in a user-friendly interface.

Alternatively you can download the whole data file here in Excel format:

The Buddy Collection

The Buddy Collection is a companion dataset to the Integrated Mitochondrial Protein Index (IMPI). It is a manual curation of about 2,000 human genes encoding mitochondrial proteins for evidence of their association with disease and mitochondrial dysfunction in patients, model organisms and cell lines. It also includes results from genome-wide screens investigating aspects of mitochondrial dysfunction.

A simple evaluation of each gene was made and its dysfunction was classified as causing:

N.B. No distinction is made between mitochondrial dysfunction as a cause or consequence of a genetic disease, i.e. primary versus secondary. For a fuller evaluation of selected mitochondrial disease genes, please visit Mitochondrial disorders of the Neuromuscular Disease Centre of Washington University.

Curation and annotation resources:

Availability:
Current version:
Buddy-2021Q4
