Reference:
Lukasz Goldschmidt, David Cooper, Zygmunt Derewenda, David Eisenberg. (2007).
Toward rational protein crystallization:
A Web server for the design of crystallizable protein variants
Protein Science. 16:1569-1576 (2007 Aug).
Overview
The aim of this tool is to suggest mutation candidates that are likely
to enhance a protein's crystallizability via the generation of
crystal contacts by the Surface Entropy Reduction (SER) approach described by Derewenda (2004).
Derewenda argues that crystallizability is associated with surface properties
of the proteins and that globular proteins recalcitrant to crystallization
contain on their surface an "entropic shield", made up of long, flexible
polar side chains that impede the protein's ability to form intermolecular
contacts and thus to assemble into a crystalline lattice. Crystallization is
driven by the free energy change from the supersaturated solution of protein
to protein crystals in the solvent. Given that the enthalpy values of
intermolecular interactions in the crystal lattice are typically small,
crystallization is very sensitive to entropy changes involving both the
solvent and the protein.
Incorporation of protein molecules into the lattice carries a negative
entropy term, and this is an inescapable thermodynamic cost. Furthermore,
immobilization of side chains and solvent at the point of crystal contacts
generates additional loss of entropy.
The Surface Entropy Reduction approach involves the replacement of surface exposed,
high entropy amino acids with residues that have small,
low entropy side chains such as alanines.
Lysines and glutamates are of particular importance, since statistical
analyses show that both types of residues are localized predominantly
on the surface (Baud and Karlin, 1999) and are disfavored at
protein-protein interfaces (Conte et al., 1999).
Job Submission
New job submission is done on the
New Job page.
The following input is required:
- Amino acid or DNA sequence to be analyzed
- A short sequence name identifier (primarily for the user's convenience)
- An email address for results delivery
Additional search and model parameters can be adjusted on the Parameters tab,
although the defaults should suit most users.
For protein fragments (sequences not starting at residue 1), the starting position
can be specified to ensure consistent residue numbering in the results tables and graphs.
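The effect of the starting position on residue numbering can be illustrated with a minimal sketch. The function name `label_mutations`, its parameters, and the example fragment are hypothetical and are not part of the server; the sketch only assumes that the reported residue number is the position within the fragment shifted by the starting position.

```python
def label_mutations(sequence, positions, start=1, target="A"):
    """Return labels such as 'K103A' for 0-based indices into `sequence`.

    `start` is the residue number of the first residue of the fragment
    (hypothetical parameter name, mirroring the starting position setting).
    """
    return [f"{sequence[i]}{i + start}{target}" for i in positions]


# Hypothetical fragment whose first residue is residue 101 of the full protein.
fragment = "GSKEKL"
print(label_mutations(fragment, [2, 3, 4], start=101))
```

With `start=1` the same indices would be reported as residues 3, 4 and 5, which is why fragments need the offset to match the numbering of the full-length protein.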
Initial processing typically takes a few minutes. The user will be notified by
email upon completion; current job and queue status are shown on the web page.
Subsequent revisions of job parameters are processed on demand
and take only a few seconds.
Process Summary
The submitted sequence undergoes the following three primary analyses.
Each analysis assigns either a positive or negative score to every
residue in the sequence.
Combined, these analyses identify the residues most favorable for mutation.
A positive contribution from every model is
not required, although higher positive scores indicate better
candidates.
- Secondary structure prediction
The secondary structure is predicted with PSIPRED, which incorporates two
feed-forward neural networks that analyze the output obtained
from PSI-BLAST. Predicted coil regions are marked as favorable sites for
mutation because they tend to be surface exposed and have so far proved very effective targets;
the entropy reduction concept was found to be less effective when the targeted patch
lies on the solvent-exposed face of a helix.
The score contribution from the secondary structure analysis
is directly proportional to the confidence for
a residue to be in a coil region. A graph showing the secondary
structure confidences is provided on the Graphs tab.
- Entropy profile
The entropy profile for the entire sequence is computed using Sternberg's side
chain entropy tables (Pickett and Sternberg, 1993) in a user-defined averaging window (default 3 residues).
The side chain entropy values are normalized to the entropy value of the amino
acid with the highest entropy (Glutamine). Regions with high entropy are
considered good candidates for mutation.
The score contribution from this
analysis is directly proportional to the computed entropy average at each
residue in the sequence. The computed entropy profile graph is shown on the Graphs tab.
- PSI-BLAST search
The PSI-BLAST search is performed to identify conserved residues, which are
disfavored as mutation candidates, although not excluded.
Conversely, aligned sequences containing mutations to the target residues (Alanine)
provide additional support for a proposed site and are favored.
This analysis contributes negative scores for conserved residues (directly proportional
to the conservation level), and positive scores for residues found mutated
to a target residue.
A chart showing the number of sequences found containing the same residue as the
provided sequence (conserved residue) and the number of sequences found
containing any target residue (mutated residue) is shown on the Graphs tab.
The calculated conservation ratio is shown on the Score Details tab.
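The entropy profile analysis described above can be sketched as a normalized sliding-window average. This is only an illustration under stated assumptions: the table `PLACEHOLDER_ENTROPY` holds made-up values (not the published Pickett and Sternberg scale), the window is assumed to be centered on each residue, and it is assumed to shrink at the sequence termini. How the server treats window placement and termini is not stated here.

```python
# Placeholder side chain entropy values in arbitrary units. These are NOT the
# published Pickett and Sternberg values; substitute the real table before use.
# Glutamine (Q) is given the highest value, as stated in the text.
PLACEHOLDER_ENTROPY = {"Q": 4.0, "K": 3.0, "E": 3.0, "L": 1.0, "A": 0.0, "G": 0.0}


def entropy_profile(sequence, table, window=3):
    """Windowed average of side chain entropy, normalized to the table maximum.

    Assumptions: the window is centered on each residue, shrinks at the
    termini, and residues missing from the table count as zero entropy.
    """
    top = max(table.values())
    normalized = [table.get(residue, 0.0) / top for residue in sequence]
    half = window // 2
    profile = []
    for i in range(len(normalized)):
        lo = max(0, i - half)
        hi = min(len(normalized), i + half + 1)
        profile.append(sum(normalized[lo:hi]) / (hi - lo))
    return profile


print(entropy_profile("AKEKA", PLACEHOLDER_ENTROPY))
```

Larger windows smooth the profile and favor extended high entropy stretches over isolated residues.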
Next, the sequence is directly analyzed and split into clusters.
A cluster starts with either a high entropy residue or a target residue and contains
only high entropy or target residues; gaps of up to "Max. Gap within Cluster" residues are allowed.
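A minimal sketch of this clustering rule follows. The residue sets (`KEQ` as high entropy, `A` as target), the function name `find_clusters`, and the rule of discarding clusters with no high entropy residue are assumptions made for illustration; `max_gap` stands in for the "Max. Gap within Cluster" parameter.

```python
def find_clusters(sequence, high=frozenset("KEQ"), target=frozenset("A"), max_gap=1):
    """Return (start, end) index pairs (0-based, inclusive) of candidate clusters.

    A cluster starts and ends on a high entropy or target residue; runs of up
    to `max_gap` other residues are tolerated inside it. Clusters without any
    high entropy residue are dropped because they offer nothing to mutate
    (an assumption of this sketch).
    """
    eligible = [i for i, r in enumerate(sequence) if r in high or r in target]
    clusters = []
    for i in eligible:
        if clusters and i - clusters[-1][1] - 1 <= max_gap:
            clusters[-1][1] = i          # extend the current cluster
        else:
            clusters.append([i, i])      # start a new cluster
    return [
        (start, end)
        for start, end in clusters
        if any(r in high for r in sequence[start:end + 1])
    ]


# Hypothetical sequence: two clusters with max_gap=1, merged into one with max_gap=4.
print(find_clusters("GLKEAKLLGGEKSG", max_gap=1))
print(find_clusters("GLKEAKLLGGEKSG", max_gap=4))
```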
Proposed mutations within a cluster are ranked with consideration to their pattern
and flanking residues using the following principles:
- Prefer residues that scored favorably in the primary analyses.
- Maximize length of low entropy patch post mutation.
- Minimize gaps in the low entropy patch.
- Minimize number of required mutations.
- Maximize side chain entropy reduction.
The cluster score is a weighted average of the above principles.
The clusters are ranked and only well-scoring clusters are presented in the
results. The default threshold is 66% of the score of the best-scoring cluster.
Weights and cutoff threshold can be changed on the Parameters tab.
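The weighted average and the relative cutoff can be sketched as follows. The criterion names, the weights, and the choice of an inclusive comparison at the threshold are hypothetical; the sketch also assumes that the best cluster score is positive.

```python
def cluster_score(terms, weights):
    """Weighted average of per-criterion scores (criterion names are hypothetical)."""
    total_weight = sum(weights[name] for name in terms)
    return sum(value * weights[name] for name, value in terms.items()) / total_weight


def select_clusters(scores, cutoff=0.66):
    """Rank clusters and keep those scoring at least `cutoff` times the best score."""
    if not scores:
        return []
    threshold = cutoff * max(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= threshold]


# Hypothetical cluster scores: the cutoff is 0.66 * 9.0 = 5.94, so c3 is dropped.
print(select_clusters({"c1": 9.0, "c2": 6.5, "c3": 5.0}))
```

Because the cutoff is relative to the best cluster, a sequence with one outstanding cluster yields a short result list, while several comparable clusters are all reported.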
All proposed mutations within a cluster need to be introduced concurrently
to ensure sufficient removal of the "entropic shield." By default, a cluster will
contain no more than three mutations to limit the reduction of the target
protein solubility. Typically mutations from only one cluster are introduced
into the protein target at a time, although larger proteins (>80 kD) may require
concurrent mutation of several clusters.
The protein target is often found to crystallize in new space groups,
with mutated patches directly involved in new crystal contacts.
Finally, a meta search is performed on the submitted sequence.
This search attempts to detect other potential crystallization failure modes
such as the requirement of metal ions or other small molecules, or
interacting protein partners.
Results
The results are presented interactively on the website with
internal links to analysis details as well as links to external sources.
A condensed version of the results can also be delivered by email.
Summary Tab.
The Summary tab contains a very brief synopsis of the proposed mutations.
The mutations are proposed in groups or clusters and all proposed mutations within
a cluster should be introduced together. By default clusters are sorted by
the prediction confidence and thus the first returned cluster is expected
to be most successful in improving crystallization and/or diffraction
quality for the provided sequence. The success confidence score is displayed
as well; when two clusters have similar confidence scores, either one
or both proposals should be pursued independently.
Analysis details can be found on the Score Details tab. Graphical representations
of the proposed mutation sites, secondary structure prediction and entropy
profiles are on the Graphs tab. Aligned sequences are on the Blast tab.
Score Details Tab.
Score contributions making up the total score at each residue position
can be found on this tab.
A cluster is typically fewer than 10 amino acids in size and contains some
non-mutable or non-high entropy amino acids. The patch of residues within
a cluster that is predicted to be most successful is highlighted; proposed
mutations are shaded green, and target residues are shaded yellow.
- SS Coil Confidence: Confidence in the range of 0 - 1.0 for a residue
to be in a coil region, as predicted by PSIPRED.
- Entropy Average: Normalized side chain entropy average computed in the
selected window (default 3 residues). Range: 0 - 1.0
- Blast Conserved: Conservation level, proportional to the number of
aligned sequences containing the same residue as the provided sequence, normalized
by the total number of aligned sequences. Range: 0 - 1.0
- Blast Mutated: Number of aligned sequences containing any of the target
residues, normalized by the total number of aligned sequences. Range: 0 - 1.0
- Sec Str Score: Score contribution from the secondary structure analysis,
multiplied by the Secondary Structure Weight parameter.
See Process Summary above for details.
- Entropy Score: Score contribution from the entropy profile analysis,
multiplied by the Entropy Profile Weight parameter.
See Process Summary above for details.
- Cluster Score: Score contribution from the high entropy residue cluster search model,
multiplied by the High Entropy Cluster Weight parameter.
See Process Summary above for details.
- Total Score: Sum of the scores from the above three primary analyses.
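The Blast Conserved and Blast Mutated columns can be illustrated with a short sketch. It assumes the aligned sequences have already been projected onto the coordinates of the provided sequence (equal length, '-' for gaps) and that Alanine is the only target residue; the function name `blast_ratios` and the example alignment are hypothetical.

```python
def blast_ratios(query, aligned, targets=frozenset("A")):
    """Per-residue conserved and mutated fractions, each in the range 0 to 1.

    `aligned` is a non-empty list of sequences of the same length as `query`,
    already placed in query coordinates (an assumption of this sketch).
    """
    total = len(aligned)
    conserved, mutated = [], []
    for i, query_residue in enumerate(query):
        column = [sequence[i] for sequence in aligned]
        conserved.append(sum(r == query_residue for r in column) / total)
        mutated.append(sum(r in targets for r in column) / total)
    return conserved, mutated


# Hypothetical alignment of four sequences against the query "KEL".
conserved, mutated = blast_ratios("KEL", ["KEL", "AEL", "KAL", "K-L"])
print(conserved)
print(mutated)
```

A position with a low conserved fraction and a non-zero mutated fraction is the most attractive case, since homologs already tolerate the target residue there.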
Graphs Tab.
The following graphs are provided to aid visualization of the proposed mutation sites,
and to help understand the contribution of each analysis. Taken together, all analyses
determine which sites are most suitable for mutation.
- Overall Score:
This stacked graph shows the score contribution from
each analysis to the total score at each residue position. Refer to the legend on the Graphs tab.
Peaks indicate regions that are predicted to contain the best mutation candidates to
improve crystallization and/or diffraction quality.
Proposed clusters are highlighted and the cluster rank and score are shown.
Residues proposed for mutation are shaded green.
A graphical representation of high entropy, mutable and low entropy target residues is shown at
the bottom of this graph, both pre- and post-mutation.
- Blast Results:
Number of sequences found by the PSI-BLAST search containing the same residue as the
submitted sequence (conserved residue) and a target residue (mutated), respectively.
- Entropy Average:
Side chain entropy average at each residue position as given by Sternberg, computed in the specified window size.
Only residues with an entropy average above the specified cutoff contribute to the total score.
- Secondary Structure -- Helix, Strand and Coil Regions:
The confidence returned by PSIPRED for a residue to be in a helix, strand or a coil region is shown as a stacked graph.
Overall prediction confidence at each position is the sum of all three confidence values, strand, helix and coil.
Only residues with coil confidence above the user-specified cutoff are considered.
- Secondary Structure -- Coil Regions:
The confidence returned by PSIPRED for a residue to be in a coil region is shown.
Only residues with coil confidence above the user-specified cutoff are considered.
Blast Tab.
Alignment results returned by PSI-BLAST. Top 50 (or fewer) alignments are shown, in default BLAST order by decreasing identity.
The expectation value, bit score and sequence identity percentage to the provided sequence are shown for each alignment. A
brief sequence annotation and an external link are provided.
For each proposed cluster, the residues in the aligned sequences are shown. A period indicates no change from the
provided sequence. A gap in the aligned sequence is shown as '-'. An insertion in the aligned sequence is not shown.
For convenience, high entropy amino acids are shown in red, and target amino acids in green.
The complete alignment and additional references (if any) are shown by clicking the expansion [+] link.
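The display convention for aligned residues can be sketched as below. The sketch assumes each alignment is available as two equal-length gapped strings, with '-' marking gaps; the function name `render_row` and the example strings are hypothetical.

```python
def render_row(query_aligned, subject_aligned):
    """Render an aligned sequence relative to the provided (query) sequence.

    '.' marks a residue identical to the query, '-' marks a gap in the aligned
    sequence, and columns where the query has a gap (insertions in the aligned
    sequence) are skipped.
    """
    shown = []
    for q, s in zip(query_aligned, subject_aligned):
        if q == "-":
            continue                      # insertion in the aligned sequence: not shown
        shown.append("." if s == q else s)
    return "".join(shown)


# Hypothetical pairwise alignment: one substitution, one insertion, one gap.
print(render_row("KE-KL", "KAGK-"))
```

The rendered row always has the same length as the ungapped query segment, so rows from different alignments line up under the cluster residues.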
Meta Search Tab.
Detailed results from the Meta Searches performed are shown on this tab.
ProLinks:
The ProLinks database is a collection of inference methods used to predict functional linkages between proteins.
Crystallization may require, or be improved by, co-expression with an interacting partner.
This search identifies potentially interacting partners that could be co-expressed to improve crystallizability.
Each BLAST-aligned sequence is screened for potential functional linkages. For each aligned sequence,
potential matches are shown. Click the [+] expansion link to see all linkages, along with the detection method and confidence for each.
Each linkage can be further examined on the ProLinks server using the provided link.
BLOCKs:
Blocks are multiply-aligned, ungapped segments corresponding to the most highly conserved regions of proteins.
This search identifies highly conserved motifs such as metal binding sites;
addition of the corresponding metal may be required for crystallization.
Detected signatures are shown; the number of blocks and their location in the provided sequence
are shown upon expansion of an entry by clicking the [+] link.
PDB Homologs / Solvent Accessibility:
This search lists any homologous structures with sequence identity above the specified threshold (default 20%)
and analyzes them for solvent accessibility with DSSP. Averaged solvent accessibility from all PDB homologs
is shown on the score summary graphs; complete DSSP analysis results are shown upon expansion of an entry
by clicking the [+] link.
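Averaging solvent accessibility over several homologs can be sketched as follows. The sketch assumes that each homolog's accessibility values have already been mapped onto positions of the provided sequence, with `None` where a homolog does not cover a position, and that uncovered positions are simply left out of the average. The function name and input layout are hypothetical, and parsing of DSSP output is not shown.

```python
def average_accessibility(profiles):
    """Average per-residue accessibility over homologs, ignoring uncovered positions.

    `profiles` is a list of equal-length lists, one per homolog, holding a
    number or None for each position of the provided sequence. Positions not
    covered by any homolog are reported as None.
    """
    averaged = []
    for column in zip(*profiles):
        values = [value for value in column if value is not None]
        averaged.append(sum(values) / len(values) if values else None)
    return averaged


# Hypothetical accessibility values for two homologs over three positions.
print(average_accessibility([[10.0, None, 30.0], [20.0, None, None]]))
```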
References
- Derewenda, Z.S. (2004). Rational protein crystallization by mutational surface engineering. Structure (Camb) 12, 529-535.
- Baud, F., and Karlin, S. (1999). Measures of residue density in protein structures. Proc Natl Acad Sci USA 96, 12494-12499.
- Conte, L.L., Chothia, C., and Janin, J. (1999). The atomic structure of protein-protein recognition sites. J Mol Biol 285, 2177-2198.
- Pickett, S., and Sternberg, M. (1993). Empirical scale of side-chain conformational entropy in protein folding. J Mol Biol 231, 825-839.
Updated August 2007.