Using Software to Predict Protein Structure
No matter how user-friendly the software, structure prediction and analysis workflows involve dozens of steps. For instance, clicking a button to change the color scheme of the results display would technically count as a step. In this chapter, we’ll take a higher-level approach and discuss four “big picture” steps involved in this workflow.
Step 1 – Obtain the amino acid sequence for your protein of interest
Protein sequences of interest to your research can be downloaded from free online databases such as UniProt or NCBI Protein.
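Sequences downloaded from these databases are commonly delivered as FASTA text: a header line beginning with ">" followed by one or more lines of residues. The sketch below shows one way to read such text into a dictionary using only the Python standard library. The function name `parse_fasta` and the sample record are hypothetical and are not taken from either database.

```python
def parse_fasta(text):
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them and join later.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-line record, for illustration only.
sample = ">example_protein toy record\nMKTAYIAK\nQRQISFVK\n"
sequences = parse_fasta(sample)
print(sequences["example_protein toy record"])
```

In practice the text would come from a file saved from the database website rather than from an inline string.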
Alternatively, you can have DNA from a biological sample sequenced, then use software to assemble the reads and translate the consensus sequences into protein sequences.
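The translation step can be sketched in a few lines of Python. This minimal example assumes the standard genetic code, a consensus sequence that is already in the correct reading frame, and a hypothetical function name `translate`. Real pipelines also have to find the reading frame and handle introns, which this sketch does not attempt.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        # Codons containing ambiguous bases (for example N) become "X".
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")
        if amino_acid == "*" and to_stop:
            break  # stop codon reached
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAATGA"))
```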
Step 2 – Choose a suitable prediction algorithm
Hundreds of structure prediction algorithms have been published, and many of them are available to the public free of charge. So how can you choose the best one for your purposes?
A good starting point is to look at the recent winners of the biennial Critical Assessment of Protein Structure Prediction (CASP) experiments. (A multimer-specific competition called the Critical Assessment of Prediction of Interactions, or CAPRI, began in 2001 and has run joint rounds with CASP since 2014.) CASP is an international competition that provides an objective way of ranking the accuracy of structure prediction algorithms. Every two years, developers are invited to use their own algorithms to predict the three-dimensional structures of a set of protein sequences. Independent assessors then compare the predicted structures to the experimentally determined structures, none of which are publicly available while the predictions are being made.
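One score widely reported in CASP is GDT_TS, which averages the percentage of residues that fall within 1, 2, 4, and 8 Å of their experimental positions. The sketch below is a simplified illustration, not the official assessment code: it assumes the predicted and experimental structures have already been superimposed once and that the per-residue distances are supplied as a list, whereas the full procedure searches over many superpositions. The function name `gdt_ts_from_distances` and the sample distances are hypothetical.

```python
def gdt_ts_from_distances(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS from per-residue distances (in angstroms).

    Assumes a single, fixed superposition of model and reference.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    # Average the fractions and express the result on a 0-100 scale.
    return 100.0 * sum(fractions) / len(fractions)


# Made-up distances for four residues, for illustration only.
print(gdt_ts_from_distances([0.5, 1.5, 3.0, 9.0]))
```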
While recent CASP winners are a good starting point when choosing an algorithm for single-chain proteins, the overall “best” folding method may not be the optimal method for your specific protein. You may gain valuable information by running two or more prediction methods and comparing the results.
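As one possible way to make such a comparison quantitative, the sketch below computes a distance-matrix RMSD (dRMSD) between two models of the same sequence. It compares the distances between residue pairs within each model, so the two models do not need to be superimposed first. The function name `drmsd` and the toy coordinates are hypothetical, and the inputs are assumed to be equal-length lists of alpha-carbon coordinates in the same residue order.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equal-length coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both models must contain the same number of residues")
    pairs = list(combinations(range(len(coords_a)), 2))
    if not pairs:
        raise ValueError("at least two residues are required")
    total = 0.0
    for i, j in pairs:
        # Compare the same residue-pair distance in each model.
        distance_a = math.dist(coords_a[i], coords_a[j])
        distance_b = math.dist(coords_b[i], coords_b[j])
        total += (distance_a - distance_b) ** 2
    return math.sqrt(total / len(pairs))


# Toy three-residue models: one straight, one bent.
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
print(drmsd(straight, bent))
```

A value near zero suggests that the two methods agree on the overall fold, while larger values flag models that deserve a closer look.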
Step 3 – Use software to predict the structure
Structure prediction algorithms are available as online utilities, as downloadable programs that usually need to be run from the command line (terminal), or as supported commercial software applications.
Commercial applications provide a streamlined workflow with a graphical user interface that requires minimal hands-on time. However, the cost of these applications is significant.
By contrast, open-source solutions are free to use but are typically time-consuming to learn and often demand ongoing involvement from a team of IT experts. In most cases, end users must also be proficient at entering complex instructions on the command line. Without a comprehensive user guide or support team available, researchers using open-source software must rely on forums to get their questions answered. Even once the workflow is mastered, significant hands-on time may still be needed to set up and submit each prediction.
When choosing between commercial and open-source protein prediction software, you will need to weigh not only the purchase price but also the amount of training, hands-on time, and computing resources each option requires.
Step 4 – Analyze the predicted structure
If you want to view and analyze a structure that is already in the Protein Data Bank (PDB), you can use the built-in three-dimensional (3D) structure viewer at RCSB. But when you are dealing with a novel structure prediction, you will need a viewer that can open locally saved protein structures. As with the prediction step, protein structure viewers are available as both commercial and open-source applications (a list of open-source applications is available on the PDB website).
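Before opening a locally saved model in a viewer, it can be useful to check what the file contains. The sketch below assumes the model was saved in the fixed-column PDB text format and pulls out the alpha-carbon (CA) position of each residue from the first model in the file. The function name `read_ca_coordinates` is hypothetical, and the sketch ignores alternate locations and other details that a full parser would handle.

```python
def read_ca_coordinates(lines):
    """Collect (chain, residue number, residue name, (x, y, z)) per CA atom.

    Assumes fixed-column PDB ATOM records and reads only the first model.
    """
    residues = []
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue  # keep only alpha-carbon ATOM records
        chain = line[21]
        residue_number = int(line[22:26])
        residue_name = line[17:20].strip()
        # Coordinates occupy columns 31-38, 39-46, and 47-54 (1-based).
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.append((chain, residue_number, residue_name, xyz))
    return residues
```

Because the function accepts any iterable of lines, an open file can be passed in directly, for example `with open("predicted_model.pdb") as handle: residues = read_ca_coordinates(handle)`, where the file name is a placeholder.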
To save time and training resources, it may be preferable to use a single application to run the structure prediction and to view the output model. Most commercial protein structure prediction applications support viewing and analyzing the resulting models. If you use open-source prediction software, however, you will likely need to locate, install, and master a separate application for this analysis step.