AI’s human protein database a ‘great leap’ for research

PARIS (AFP) – Scientists on Thursday unveiled the most exhaustive database yet of the proteins that form the building blocks of life, in a breakthrough observers said would “fundamentally change biological research”.

Every cell in every living organism is triggered to perform its function by proteins that deliver constant instructions to maintain health and ward off infection.

Unlike the genome – the complete sequence of human genes that encode cellular life – the human proteome is constantly changing in response to genetic instructions and environmental stimuli.

Understanding how proteins operate within cells, and in particular the shape into which each one “folds”, has fascinated scientists for decades.

But determining each protein’s precise function through direct experimentation is painstaking.

Fifty years of research had until now yielded structures covering only 17 per cent of the human proteome’s amino acids, the subunits of proteins.

On Thursday, researchers at Google’s DeepMind and the European Molecular Biology Laboratory (EMBL) unveiled a database of 20,000 proteins expressed by the human genome, freely and openly available online.

They also included more than 350,000 proteins from 20 organisms such as bacteria, yeast and mice that scientists rely on for research.

To create the database, scientists used a state-of-the-art machine learning programme that was able to accurately predict the shape of proteins based on their amino acid sequences.

Instead of spending months using multimillion-dollar equipment, they trained their AlphaFold system on a database of 170,000 known protein structures.

The AI then used an algorithm to make accurate predictions of the shape of 58 per cent of all proteins within the human proteome.

Essentially overnight, this more than doubled the number of high-accuracy human protein structures that researchers had identified during 50 years of direct experimentation.

The potential applications are enormous, from researching genetic diseases and combating antimicrobial resistance to engineering more drought-resistant crops.

Paul Nurse, winner of the 2001 Nobel Prize for Medicine and director of the Francis Crick Institute, said Thursday’s release was “a great leap for biological innovation”.

“With this resource freely and openly available, the scientific community will be able to draw on collective knowledge to accelerate discovery, ushering in a new era for AI-enabled biology,” he said.

John McGeehan, director of the Centre for Enzyme Innovation at the University of Portsmouth, whose team is developing enzymes capable of consuming single-use plastic waste, said AlphaFold had revolutionised the field.

“What took us months and years to do, AlphaFold was able to do in a weekend. I feel like we have just jumped at least a year ahead of where we were yesterday,” he said.

The ability to predict a protein’s shape from its amino acid sequence using a computer rather than experimentation is already helping scientists in a number of research fields.