Reference: Lukasz Goldschmidt, David Cooper, Zygmunt Derewenda, David Eisenberg (2007). Toward rational protein crystallization: A Web server for the design of crystallizable protein variants. Protein Science 16:1569-1576 (2007 Aug).

Overview

The aim of this tool is to suggest mutation candidates that are likely to enhance a protein's crystallizability via the generation of crystal contacts by the Surface Entropy Reduction (SER) approach described by Derewenda (2004).

Derewenda argues that crystallizability is associated with the surface properties of proteins, and that globular proteins recalcitrant to crystallization carry on their surface an "entropic shield" made up of long, flexible polar side chains that impede the protein's ability to form intermolecular contacts and thus to assemble into a crystalline lattice. Crystallization is driven by the free energy change on going from a supersaturated protein solution to protein crystals in the solvent. Given that the enthalpy values of intermolecular interactions in the crystal lattice are typically small, crystallization is very sensitive to entropy changes involving both the solvent and the protein. Incorporation of protein molecules into the lattice carries a negative entropy term, and this is an inescapable thermodynamic cost. Furthermore, immobilization of side chains and solvent at the points of crystal contact generates an additional loss of entropy.
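The argument above can be summarized with the standard Gibbs relation. The subscripts below are labels chosen here for illustration and are not notation used by the server:

```latex
\Delta G_{\mathrm{cryst}} = \Delta H_{\mathrm{cryst}} - T\,\bigl(\Delta S_{\mathrm{protein}} + \Delta S_{\mathrm{solvent}}\bigr)
```

Crystallization requires a negative \Delta G_{\mathrm{cryst}}. Because \Delta H_{\mathrm{cryst}} is typically small, the sign is largely decided by the entropy terms, and every side chain immobilized at a crystal contact makes \Delta S_{\mathrm{protein}} more negative.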

The Surface Entropy Reduction approach involves the replacement of surface exposed, high entropy amino acids with residues that have small, low entropy side chains such as alanines. Lysines and glutamates are of particular importance, since statistical analyses show that both types of residues are localized predominantly on the surface (Baud and Karlin, 1999) and are disfavored at protein-protein interfaces (Conte et al., 1999).
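As an illustration of the substitution itself, the sketch below replaces selected lysines and glutamates with alanine and reports each change in the usual "K3A" notation. It is a minimal example and not the server's code; the function name, the residue sets and the example sequence are assumptions made here.

```python
# Assumption: lysine (K) and glutamate (E) are the high-entropy residues to
# replace, and alanine (A) is the low-entropy replacement.
HIGH_ENTROPY = frozenset("KE")
REPLACEMENT = "A"


def apply_ser_mutations(sequence, positions):
    """Return (mutant_sequence, labels) for the 1-based positions to mutate."""
    residues = list(sequence)
    labels = []
    for pos in sorted(positions):
        original = residues[pos - 1]
        if original not in HIGH_ENTROPY:
            raise ValueError(f"{original}{pos} is not a high-entropy residue")
        residues[pos - 1] = REPLACEMENT
        labels.append(f"{original}{pos}{REPLACEMENT}")
    return "".join(residues), labels


# Hypothetical 8-residue example sequence.
mutant, labels = apply_ser_mutations("MSKEELGA", [3, 4, 5])
print(mutant)  # MSAAALGA
print(labels)  # ['K3A', 'E4A', 'E5A']
```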


Job Submission

New jobs are submitted on the New Job page. The following input is required: Additional search and model parameters can be adjusted on the Parameters tab, although the defaults should suit most users. For protein fragments (sequences not starting at residue 1), the starting position can be specified to ensure consistent residue numbering in the results tables and graphs.
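The effect of the starting position can be pictured as a simple offset applied to every residue index. The helper below is a hypothetical sketch of that numbering and does not reflect how the server stores it internally.

```python
def number_residues(sequence, start=1):
    """Pair each residue with its number in the full-length protein.

    `start` is the number of the first residue of the submitted fragment.
    """
    return [(start + offset, residue) for offset, residue in enumerate(sequence)]


# A fragment whose first residue is residue 120 of the full-length protein.
print(number_residues("KEA", start=120))  # [(120, 'K'), (121, 'E'), (122, 'A')]
```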

Initial processing typically takes a few minutes. The user is notified by email upon completion; current job and queue status are shown on the web page. Subsequent revisions of the job parameters are processed on demand and take only a few seconds.


Process Summary

The submitted sequence undergoes the following three primary analyses. Each analysis assigns either a positive or a negative score to every residue in the sequence. Combined, these analyses identify the residues most favorable for mutation. A positive contribution from every analysis is not required, although higher positive scores indicate better candidates.
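One way to picture how per-residue scores from several analyses combine is a plain position-wise sum, sketched below. The analysis names and score values are placeholders, and the simple unweighted sum is an assumption; the server's actual combination may differ.

```python
def combine_scores(per_analysis):
    """Sum per-residue scores from several analyses, position by position."""
    lengths = {len(scores) for scores in per_analysis.values()}
    if len(lengths) != 1:
        raise ValueError("every analysis must score every residue")
    return [sum(column) for column in zip(*per_analysis.values())]


# Placeholder scores for a three-residue sequence.
scores = {
    "analysis_1": [1.0, -0.5, 2.0],
    "analysis_2": [0.5, 0.5, -1.0],
    "analysis_3": [0.0, 1.0, 1.5],
}
total = combine_scores(scores)
best = max(range(len(total)), key=total.__getitem__)
print(total)  # [1.5, 1.0, 2.5]
print(best)   # 2 (0-based index of the best-scoring residue)
```

Note that the best position here has a negative contribution from one analysis, which matches the statement that a positive score from every analysis is not required.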


Next, the sequence is directly analyzed and split into clusters. A cluster starts with either a high entropy residue or a target residue and contains only high entropy or target residues; gaps of up to "Max. Gap within Cluster" residues are allowed. Proposed mutations within a cluster are ranked according to their pattern and flanking residues using the following principles: The cluster score is a weighted average of the above principles. The clusters are ranked, and only well-scoring clusters are presented in the results. The default threshold is 66% of the score of the best-scoring cluster. Weights and the cutoff threshold can be changed on the Parameters tab.
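The cluster definition and the score cutoff can be sketched as follows. This is an illustrative reading of the description above, not the server's implementation: the residue sets, the max_gap values and the example scores are assumptions, and the scoring principles themselves are not modeled.

```python
# Assumptions: K and E are the high-entropy residues, A is the target residue.
ELIGIBLE = frozenset("KE") | frozenset("A")


def find_clusters(sequence, eligible=ELIGIBLE, max_gap=1):
    """Return 0-based inclusive (start, end) spans of clusters.

    A cluster starts and ends on an eligible residue; runs of at most
    `max_gap` other residues are tolerated between eligible residues.
    """
    clusters = []
    start = last = None
    for index, residue in enumerate(sequence):
        if residue not in eligible:
            continue
        if start is None:
            start = last = index
        elif index - last - 1 <= max_gap:
            last = index
        else:
            clusters.append((start, last))
            start = last = index
    if start is not None:
        clusters.append((start, last))
    return clusters


def filter_clusters(scored, fraction=0.66):
    """Rank (cluster, score) pairs and keep those scoring at least
    `fraction` of the best score (assumes the best score is positive)."""
    if not scored:
        return []
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    cutoff = fraction * ranked[0][1]
    return [item for item in ranked if item[1] >= cutoff]


print(find_clusters("GKEALLLKKG", max_gap=1))  # [(1, 3), (7, 8)]
print(find_clusters("GKEALLLKKG", max_gap=3))  # [(1, 8)]
print(filter_clusters([("c", 3.0), ("a", 6.0), ("b", 4.5)]))  # [('a', 6.0), ('b', 4.5)]
```

With a larger gap allowance the two patches merge into a single cluster, which is why the "Max. Gap within Cluster" parameter directly controls how many clusters are proposed.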

All proposed mutations within a cluster need to be introduced concurrently to ensure sufficient removal of the "entropic shield." By default, a cluster will contain no more than three mutations to limit the reduction of the target protein's solubility. Typically, mutations from only one cluster are introduced into the protein target at a time, although larger proteins (>80 kD) may require concurrent mutation of several clusters. The protein target is often found to crystallize in a new space group, with mutated patches directly involved in new crystal contacts.

Finally, a meta search is performed on the submitted sequence. This search attempts to detect other potential crystallization failure modes such as the requirement of metal ions or other small molecules, or interacting protein partners.


Results

The results are presented interactively on the website with internal links to analysis details as well as links to external sources. A condensed version of the results can also be delivered by email.

Summary Tab. The Summary tab contains a very brief synopsis of the proposed mutations. The mutations are proposed in groups, or clusters, and all proposed mutations within a cluster should be introduced together. By default, clusters are sorted by prediction confidence, so the first returned cluster is expected to be the most successful in improving crystallization and/or diffraction quality for the provided sequence. The success confidence score is displayed as well; two clusters may have similar confidence scores, in which case either one or both proposals should be pursued independently.
Analysis details can be found on the Score Details tab. Graphical representations of the proposed mutation sites, the secondary structure prediction and the entropy profiles are on the Graphs tab. Aligned sequences are on the Blast tab.

Score Details Tab. The score contributions making up the total score at each residue position can be found on this tab. A cluster is typically less than 10 amino acids in size and contains some non-mutable or non-high entropy amino acids. The patch of residues within a cluster that is predicted to be most successful is highlighted; proposed mutations are shaded green, and target residues are shaded yellow.

Graphs Tab. The following graphs are provided to aid visualization of the proposed mutation sites, and to help understand the contribution of each analysis. Taken together, all analyses determine which sites are most suitable for mutation.

Overall Score: this stacked graph represents the score contribution from each analysis to the total score at each residue position. Refer to the legend on the Graphs tab. Peaks indicate regions that are predicted to contain the best mutation candidates for improving crystallization and/or diffraction quality.
Proposed clusters are highlighted and the cluster rank and score are shown. Residues proposed for mutation are shaded green.

A graphical representation of high entropy, mutable and low entropy target residues is shown at the bottom of this graph, both before and after mutation.

Blast Tab. Alignment results returned by PSI-BLAST. Top 50 (or fewer) alignments are shown, in default BLAST order by decreasing identity. The expectation value, bit score and sequence identity percentage to the provided sequence are shown for each alignment. A brief sequence annotation and an external link are provided.

For each proposed cluster, the residues in the aligned sequences are shown. A period indicates no change from the provided sequence. A gap in the aligned sequence is shown as '-'. An insertion in the aligned sequence is not shown. For convenience, high entropy amino acids are shown in red, and target amino acids in green.
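The display convention described above can be reproduced from a pairwise alignment with a few lines of code. The sketch below assumes the query and the aligned sequence are given as equal-length gapped strings; the function name and the example strings are hypothetical.

```python
def render_aligned(query_aln, subject_aln):
    """Render an aligned sequence relative to the query's own residues.

    '.' marks a residue identical to the query, '-' marks a gap in the
    aligned sequence, and insertions in the aligned sequence (columns where
    the query has a gap) are not shown.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    shown = []
    for query_res, subject_res in zip(query_aln, subject_aln):
        if query_res == "-":
            continue  # insertion in the aligned sequence: skipped
        if subject_res == "-":
            shown.append("-")
        elif subject_res == query_res:
            shown.append(".")
        else:
            shown.append(subject_res)
    return "".join(shown)


print(render_aligned("KEE-LK", "KQEGL-"))  # .Q..-
```

The output always has one character per residue of the provided sequence, which keeps the columns of all aligned sequences in register with the cluster positions.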

The complete alignment and additional references (if any) are shown by clicking the expansion [+] link.

Meta Search Tab. Detailed results from the performed Meta Searches are shown on this tab.
  • ProLinks: The Prolinks database is a collection of inference methods used to predict functional linkages between proteins. Crystallization may require, or be improved by, co-expression with an interacting partner. This search identifies potentially interacting partners that could be co-expressed to improve crystallizability.
    Each BLAST-aligned sequence is screened for potential functional linkages. For each aligned sequence, potential matches are shown. Click the [+] expansion link to see all linkages, along with the detection method and confidence for each. Each linkage can be further examined on the ProLinks server using the provided link.

  • BLOCKs: Blocks are multiply-aligned, ungapped segments corresponding to the most highly conserved regions of proteins. This search identifies highly conserved motifs such as metal binding sites; addition of the corresponding metal may be required for crystallization.
    Detected signatures are shown; the number of blocks and their locations in the provided sequence are shown upon expansion of an entry by clicking the [+] link.

  • PDB Homologs / Solvent Accessibility: This search lists any homologous structures with sequence identity above the specified threshold (default 20%) and analyzes them for solvent accessibility with DSSP. The averaged solvent accessibility from all PDB homologs is shown on the score summary graphs; complete DSSP analysis results are shown upon expansion of an entry by clicking the [+] link.
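The averaging step for solvent accessibility can be sketched as below. The example assumes that the per-residue accessibility values of each homolog have already been mapped onto the positions of the provided sequence, with None where a homolog has no aligned residue; that representation, the function name and the numbers are assumptions made for illustration.

```python
def average_accessibility(profiles):
    """Average per-position solvent accessibility over several homologs.

    Positions not covered by a homolog are given as None and are ignored;
    a position covered by no homolog stays None.
    """
    if not profiles:
        return []
    length = len(profiles[0])
    if any(len(profile) != length for profile in profiles):
        raise ValueError("all profiles must cover the same positions")
    averaged = []
    for position in range(length):
        values = [p[position] for p in profiles if p[position] is not None]
        averaged.append(sum(values) / len(values) if values else None)
    return averaged


# Two hypothetical homologs mapped onto a three-residue query.
print(average_accessibility([[10.0, None, 30.0], [20.0, 40.0, None]]))
# [15.0, 40.0, 30.0]
```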


References

  • Derewenda, Z.S. (2004). Rational protein crystallization by mutational surface engineering. Structure (Camb) 12: 529-535.
  • Baud, F., and Karlin, S. (1999). Measures of residue density in protein structures. Proc Natl Acad Sci USA 96: 12494-12499.
  • Conte, L. L., Chothia, C., and Janin, J. (1999). The atomic structure of protein-protein recognition sites. J Mol Biol 285: 2177-2198.
  • Pickett, S., and Sternberg, M. (1993). Empirical scale of side-chain conformational entropy in protein folding. J Mol Biol 231: 825-839.

Updated August 2007.