The protein universe in one database


However, such entries tended to be limited to proteins from humans, mice and other mammals, says Porta Pardo. Because the entries in the AlphaFold database come from many different organisms, the expansion can be expected to add significantly to what is known. “It will be a great tool. I’ll probably download it as soon as it comes out,” says Porta Pardo.

Because the AlphaFold software has been available for a year, researchers have already been able to predict the structure of any protein themselves. However, many say that bundling the predictions into a single database saves even more time and money, and a lot of trouble. “It’s another barrier to entry that’s being removed,” says Porta Pardo. “I have already used protein models generated with AlphaFold. But I’ve never used the program myself.”

Jan Kosinski, a structural modeler at EMBL in Hamburg, who used the AlphaFold network himself in 2021, can’t wait for the database to be expanded. His team spent three weeks predicting the proteome (the entirety of an organism’s proteins) of a pathogen. “Now we can just download all the models,” he said at the press conference.

New research questions become possible

Having almost all known proteins stored in one database will also make a large number of new studies possible. Christine Orengo’s team recently used the AlphaFold database to identify new types of protein families, and now they want to do it on a much larger scale. Her lab hopes to use the expanded database, for example, to understand the evolution of proteins with useful properties, such as the ability to break down plastic, as well as those with more harmful characteristics, such as the ability to cause cancer. Identifying distant relatives of these proteins in the database could reveal the basis for their properties.

Martin Steinegger, a computational biologist at Seoul National University who helped develop a cloud-based version of AlphaFold, is excited about the expansion of the database. However, he thinks researchers will likely still need to run the program themselves. More and more people are turning to AlphaFold to predict how proteins interact with each other, but such predictions are not included in the database. Nor does the collection yet include microbial proteins identified by sequencing genetic material from soil, seawater and other “metagenomic” sources.

Some demanding applications of the expanded AlphaFold database could also depend on downloading all 23 terabytes of its contents, which will not be feasible for many teams. Cloud storage could also prove costly. Steinegger co-developed a software tool called FoldSeek that can be used to quickly find structurally similar proteins, which could significantly reduce the volume of AlphaFold data.


