SERp Server: Introduction

Reference: Lukasz Goldschmidt, David Cooper, Zygmunt Derewenda, David Eisenberg (2007). Toward rational protein crystallization: A Web server for the design of crystallizable protein variants. Protein Science 16: 1569-1576 (August 2007).

The aim of this tool is to suggest mutation candidates that are likely to enhance a protein's crystallizability via the generation of crystal contacts by the Surface Entropy Reduction (SER) approach described by Derewenda (2004).

Derewenda argues that crystallizability is associated with surface properties of the proteins and that globular proteins recalcitrant to crystallization contain on their surface an "entropic shield", made up of long, flexible polar side chains that impede the protein's ability to form intermolecular contacts and thus to assemble into a crystalline lattice. Crystallization is driven by the free energy change from the supersaturated solution of protein to protein crystals in the solvent. Given that the enthalpy values of intermolecular interactions in the crystal lattice are typically small, crystallization is very sensitive to entropy changes involving both the solvent and the protein. Incorporation of protein molecules into the lattice carries a negative entropy term, and this is an inescapable thermodynamic cost. Furthermore, immobilization of side chains and solvent at the point of crystal contacts generates additional loss of entropy.
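
The thermodynamic argument above can be summarized with the standard Gibbs relation. Splitting the entropy term into a protein part and a solvent part is a notational assumption made here for illustration; it is not a formula quoted from the reference.

```latex
\Delta G_{\mathrm{cryst}} = \Delta H_{\mathrm{cryst}} - T\,\Delta S_{\mathrm{cryst}},
\qquad
\Delta S_{\mathrm{cryst}} = \Delta S_{\mathrm{protein}} + \Delta S_{\mathrm{solvent}}
```

Crystallization is favorable when the free energy change is negative. Because the enthalpy term is typically small, the sign of the free energy change depends largely on the entropy terms, which is why reducing the side chain entropy lost at crystal contacts can tip the balance.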

The Surface Entropy Reduction approach involves the replacement of surface exposed, high entropy amino acids with residues that have small, low entropy side chains such as alanines. Lysines and glutamates are of particular importance, since statistical analyses show that both types of residues are localized predominantly on the surface (Baud and Karlin, 1999) and are disfavored at protein-protein interfaces (Conte et al., 1999).

Job Submission

New jobs are submitted on the New Job page. The following input is required:

Amino acid or DNA sequence to be analyzed
A short sequence name identifier (primarily for the user's convenience)
An email address for results delivery

Additional search and model parameters can be adjusted on the Parameters tab, although the defaults should suit most users. For protein fragments (sequences not starting at residue 1), the starting position can be specified to ensure consistent residue numbering in the results tables and graphs.

Initial processing typically takes a few minutes. The user will be notified by email upon completion; current job and queue status are shown on the web page. Subsequent revisions of job parameters are processed on demand and take only a few seconds.

Process Summary

The submitted sequence undergoes the following three primary analyses. Each analysis assigns either a positive or negative score to every residue in the sequence. Combined, these analyses identify the residues most favorable for mutation. A positive contribution from every model is not required, although higher positive scores indicate better candidates.

Secondary structure prediction
The secondary structure is predicted with PSIPRED, which uses two feed-forward neural networks to analyze output obtained from PSI-BLAST. Predicted coil regions are marked as favorable sites for mutation because they tend to be surface exposed and have so far proved very effective; the entropy reduction concept was found to be less effective if the targeted patch lies on the solvent-exposed face of a helix.
The score contribution from the secondary structure analysis is directly proportional to the confidence for a residue to be in a coil region. A graph showing the secondary structure confidences is provided on the Graphs tab.
Entropy profile
The entropy profile for the entire sequence is computed using Sternberg's side chain entropy tables (Pickett and Sternberg, 1993) in a user-defined averaging window (default 3 residues). The side chain entropy values are normalized to the entropy value of the amino acid with the highest entropy (glutamine). Regions with high entropy are considered good candidates for mutation.
The score contribution from this analysis is directly proportional to the computed entropy average at each residue in the sequence. The computed entropy profile graph is shown on the Graphs tab.
PSI-BLAST search
The PSI-BLAST search is performed to identify conserved residues, which are disfavored as mutation candidates, although not excluded. Conversely, aligned sequences in which a position is mutated to a target residue (alanine) provide additional support for a proposed site, and such sites are favored.
This analysis contributes negative scores for conserved residues (directly proportional to the conservation level), and positive scores for residues found mutated to a target residue. A chart showing the number of sequences found containing the same residue as the provided sequence (conserved residue) and the number of sequences found containing any target residue (mutated residue) is shown on the Graphs tab. The calculated conservation ratio is shown on the Score Details tab.
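
The entropy profile analysis described above can be sketched as follows. This is a minimal illustration, not the server's code: the function name, the centered window that is truncated at the sequence ends, and the toy entropy values are all assumptions. In particular, the numbers in TOY_TABLE are placeholders and are not Sternberg's published values.

```python
from typing import Dict, List


def entropy_profile(sequence: str,
                    entropy_table: Dict[str, float],
                    window: int = 3) -> List[float]:
    """Return the normalized, window-averaged side chain entropy per residue."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    max_entropy = max(entropy_table.values())
    # Normalize each residue to the highest-entropy amino acid in the table.
    # Residues missing from the table are treated as zero entropy (assumption).
    normalized = [entropy_table.get(aa, 0.0) / max_entropy for aa in sequence]
    half = window // 2
    profile = []
    for i in range(len(normalized)):
        # Assumption: the window is centered and truncated at the sequence ends.
        lo = max(0, i - half)
        hi = min(len(normalized), i + half + 1)
        profile.append(sum(normalized[lo:hi]) / (hi - lo))
    return profile


# Placeholder values for illustration only; NOT Sternberg's published table.
TOY_TABLE = {"A": 0.0, "G": 0.0, "K": 1.9, "E": 1.8, "Q": 2.0, "L": 0.8}

profile = entropy_profile("AKEQ", TOY_TABLE, window=3)
```

With the default window of 3, each value averages a residue with its two immediate neighbors, so a single high entropy residue surrounded by low entropy residues scores lower than one inside a high entropy patch.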

Next, the sequence is directly analyzed and split into clusters. A cluster starts with either a high entropy residue or a target residue and contains only high entropy or target residues; gaps of up to "Max. Gap within Cluster" residues are allowed. Proposed mutations within a cluster are ranked with consideration to their pattern and flanking residues using the following principles:

Prefer residues that scored favorably in the primary analyses.
Maximize length of low entropy patch post mutation.
Minimize gaps in the low entropy patch.
Minimize number of required mutations.
Maximize side chain entropy reduction.

The cluster score is a weighted average of the above principles. The clusters are ranked and only well-scoring clusters are presented in the results. The default threshold is 66% of the score of the best-scoring cluster. Weights and cutoff threshold can be changed on the Parameters tab.
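
The cluster search and the relative score cutoff described above can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: the function names, the default residue sets (K, E and Q as high entropy, A as target), the default gap of 1, and the rule that a cluster must contain at least one high entropy residue are all hypothetical. The scoring of each cluster against the five principles is not shown; select_clusters takes the scores as given and assumes they are positive.

```python
from typing import List, Tuple


def find_clusters(sequence: str,
                  high_entropy: str = "KEQ",
                  targets: str = "A",
                  max_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of candidate clusters."""
    members = set(high_entropy) | set(targets)
    clusters = []
    start = None   # index of the first member of the open cluster
    last = None    # index of the most recent member seen
    for i, aa in enumerate(sequence):
        if aa not in members:
            continue
        if start is None:
            start = i
        elif i - last - 1 > max_gap:
            # Gap too long: close the open cluster and start a new one here.
            clusters.append((start, last + 1))
            start = i
        last = i
    if start is not None:
        clusters.append((start, last + 1))
    # Assumption: a cluster with no high entropy residue offers nothing to mutate.
    return [(s, e) for s, e in clusters
            if any(aa in high_entropy for aa in sequence[s:e])]


def select_clusters(scores: List[float], cutoff: float = 0.66) -> List[int]:
    """Return indices of clusters scoring at least cutoff * best, best first."""
    if not scores:
        return []
    best = max(scores)
    kept = [i for i, s in enumerate(scores) if s >= cutoff * best]
    return sorted(kept, key=lambda i: scores[i], reverse=True)


clusters = find_clusters("MLKEAGKLLVAALEEK", max_gap=1)
kept = select_clusters([3.0, 1.5, 2.4])
```

In this example the single glycine at index 5 is bridged as an allowed gap, while the three-residue stretch LLV splits the sequence into two clusters.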

All proposed mutations within a cluster need to be introduced concurrently to ensure sufficient removal of the "entropic shield". By default a cluster will contain no more than three mutations, to limit the reduction in solubility of the target protein. Typically mutations from only one cluster are introduced into the protein target at a time, although larger proteins (>80 kD) may require concurrent mutation of several clusters. The protein target is often found to crystallize in new space groups, with mutated patches directly involved in new crystal contacts.

Finally, a meta search is performed on the submitted sequence. This search attempts to detect other potential crystallization failure modes such as the requirement of metal ions or other small molecules, or interacting protein partners.

Results

The results are presented interactively on the website with internal links to analysis details as well as links to external sources. A condensed version of the results can also be delivered by email.

Summary Tab. The Summary tab contains a very brief synopsis of the proposed mutations. The mutations are proposed in groups, or clusters, and all proposed mutations within a cluster should be introduced together. By default clusters are sorted by prediction confidence, so the first returned cluster is expected to be the most successful in improving crystallization and/or diffraction quality for the provided sequence. The success confidence score is displayed as well; when two clusters have similar confidence scores, either one or both proposals should be pursued independently.
Analysis details can be found on the Score Details tab. Graphical representations of the proposed mutation sites, secondary structure prediction and entropy profiles are on the Graphs tab. Aligned sequences are on the Blast tab.

Score Details Tab. Score contributions making up the total score at each residue position can be found on this tab. A cluster is typically less than 10 amino acids in size and contains some non-mutable or non-high entropy amino acids. The patch of residues within a cluster that is predicted to be most successful is highlighted; proposed mutations are shaded green, and target residues are shaded yellow.

SS Coil Confidence: Confidence in the range of 0 - 1.0 for a residue to be in a coil region, as predicted by PSIPRED.
Entropy Average: Normalized side chain entropy average computed in the selected window (default 3 residues). Range: 0 - 1.0
Blast Conserved: Conservation level, proportional to the number of aligned sequences containing the same residue as the provided sequence, normalized by the total number of aligned sequences. Range: 0 - 1.0
Blast Mutated: Number of aligned sequences containing any of the target residues, normalized by the total number of aligned sequences. Range: 0 - 1.0
Sec Str Score: Score contribution from the secondary structure analysis, multiplied by the Secondary Structure Weight parameter. See Process Summary above for details.
Entropy Score: Score contribution from the entropy profile analysis, multiplied by the Entropy Profile Weight parameter. See Process Summary above for details.
Cluster Score: Score contribution from the high entropy residue cluster search model, multiplied by the High Entropy Cluster Weight parameter. See Process Summary above for details.
Total Score: Sum of the scores from the above three primary analyses.
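
The Blast Conserved and Blast Mutated values defined above can be illustrated with a short sketch. It assumes each aligned sequence has already been padded to the length of the query, with '-' for gaps, and that a hit counts as mutated only when it differs from the query residue; the function name and both assumptions are hypothetical and are not taken from the server.

```python
from typing import List, Tuple


def blast_fractions(query: str,
                    aligned: List[str],
                    targets: str = "A") -> List[Tuple[float, float]]:
    """Return (conserved, mutated) fractions for each query position."""
    if not aligned:
        return [(0.0, 0.0) for _ in query]
    total = len(aligned)
    result = []
    for pos, query_aa in enumerate(query):
        column = [seq[pos] for seq in aligned]
        conserved = sum(1 for aa in column if aa == query_aa)
        # Assumption: a hit counts as "mutated" only if it differs from the query.
        mutated = sum(1 for aa in column if aa in targets and aa != query_aa)
        result.append((conserved / total, mutated / total))
    return result


fractions = blast_fractions("KEL", ["KEL", "AEL", "K-L", "AAL"])
```

Here the first position is conserved in half of the aligned sequences and replaced by alanine in the other half, so it receives equal conserved and mutated fractions.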

Graphs Tab. The following graphs are provided to aid visualization of the proposed mutation sites, and to help understand the contribution of each analysis. Taken together, all analyses determine which sites are most suitable for mutation.

Overall Score: This stacked graph represents the score contribution from each analysis to the total score at each residue position. Refer to the legend on the Graphs tab. Peaks indicate regions that are predicted to contain the best mutation candidates for improving crystallization and/or diffraction quality.
Proposed clusters are highlighted and the cluster rank and score are shown. Residues proposed for mutation are shaded green.

A graphical representation of high entropy, mutable and low entropy target residues is shown at the bottom of this graph, both before and after mutation.

Blast Results: Number of sequences found by the PSI-BLAST search containing the same residue as the submitted sequence (conserved residue) and a target residue (mutated), respectively.
Entropy Average: Side chain entropy average at each residue position as given by Sternberg, computed in the specified window size. Only residues with an entropy average above the specified cutoff contribute to the total score.
Secondary Structure -- Helix, Strand and Coil Regions: The confidence returned by PSIPRED for a residue to be in a helix, strand or coil region is shown as a stacked graph. Overall prediction confidence at each position is the sum of the three confidence values (helix, strand and coil). Only residues with coil confidence above the user-specified cutoff are considered.
Secondary Structure -- Coil Regions: The confidence returned by PSIPRED for a residue to be in a coil region. Only residues with coil confidence above the user-specified cutoff are considered.
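
The cutoffs mentioned for the entropy average and the coil confidence can be pictured with the sketch below. It assumes that a value at or below the cutoff contributes nothing and that a value above it contributes the weight times the value; the function name, the cutoff of 0.5 and the weight of 2.0 are hypothetical choices for illustration, not the server's defaults.

```python
from typing import List


def thresholded_scores(values: List[float],
                       cutoff: float,
                       weight: float) -> List[float]:
    """Weight each normalized value (0 to 1.0), zeroing those not above cutoff."""
    # Assumption: values at or below the cutoff contribute nothing.
    return [weight * v if v > cutoff else 0.0 for v in values]


coil_confidence = [0.2, 0.6, 0.9]
coil_scores = thresholded_scores(coil_confidence, cutoff=0.5, weight=2.0)
```

The same helper could be applied to the entropy averages with the entropy cutoff and weight, giving one contribution per residue for each analysis.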

Blast Tab. Alignment results returned by PSI-BLAST. Top 50 (or fewer) alignments are shown, in default BLAST order by decreasing identity. The expectation value, bit score and sequence identity percentage to the provided sequence are shown for each alignment. A brief sequence annotation and an external link are provided.

For each proposed cluster, the residues in the aligned sequences are shown. A period indicates no change from the provided sequence. A gap in the aligned sequence is shown as '-'. An insertion in the aligned sequence is not shown. For convenience, high entropy amino acids are shown in red, and target amino acids in green.
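
The display convention above can be mimicked with a few lines of code. The sketch assumes a pairwise alignment given as two equal-length strings that use '-' for gaps on either side; the function name is hypothetical and the red and green coloring is omitted.

```python
def render_aligned(query: str, aligned: str) -> str:
    """Render one aligned sequence relative to the query segment."""
    if len(query) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for q, a in zip(query, aligned):
        if q == "-":
            # Insertion in the aligned sequence relative to the query: not shown.
            continue
        # Identical residues become a period; a gap in the hit is already '-'.
        out.append("." if a == q else a)
    return "".join(out)


row = render_aligned("KE-EL", "KAQ-L")
```

In this example the hit keeps K and L, replaces the first E with A, inserts a Q that is dropped from the display, and has a gap opposite the second E.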

The complete alignment and additional references (if any) are shown by clicking the expansion [+] link.

Meta Search Tab. Detailed results from the meta searches are shown on this tab.

ProLinks: The ProLinks database is a collection of inference methods used to predict functional linkages between proteins. Crystallization may require, or be improved by, co-expression with an interacting partner. This search identifies potentially interacting partners that could be co-expressed to improve crystallizability.
Each BLAST-aligned sequence is screened for potential functional linkages. For each aligned sequence, potential matches are shown. Click the [+] expansion link to see all linkages, along with the detection method and confidence for each. Each linkage can be further examined on the ProLinks server using the provided link.

BLOCKs: Blocks are multiply-aligned, ungapped segments corresponding to the most highly conserved regions of proteins. This search identifies highly conserved motifs such as metal binding sites; addition of the corresponding metal may be required for crystallization.
Detected signatures are shown; the number of blocks and their locations in the provided sequence are shown upon expansion of an entry by clicking the [+] link.

PDB Homologs / Solvent Accessibility: This search lists any homolog structures with sequence identity above the specified threshold (default 20%) and analyzes them for solvent accessibility with DSSP. Averaged solvent accessibility from all PDB homologs is shown on the score summary graphs; complete DSSP analysis results are shown upon expansion of an entry by clicking the [+] link.

References

Derewenda, Z.S. (2004). Rational protein crystallization by mutational surface engineering. Structure (Camb) 12, 529-535.
Baud, F., and Karlin, S. (1999). Measures of residue density in protein structures. Proc Natl Acad Sci USA 96, 12494-12499.
Conte, L.L., Chothia, C., and Janin, J. (1999). The atomic structure of protein-protein recognition sites. J Mol Biol 285, 2177-2198.
Pickett, S., and Sternberg, M. (1993). Empirical scale of side-chain conformational entropy in protein folding. J Mol Biol 231, 825-839.

Updated August 2007.

UCLA MBI — SERp Server: Introduction
