Use of NMR chemical shift perturbation data
General:
Before starting HADDOCK, Ambiguous Interaction Restraints (AIRs)
should be generated. For this, it is important to define the
residues at the interface for each molecule based on NMR chemical shift
perturbation data, mutagenesis data or any kind of data that provides information on the
interaction interface.
In the definition of those residues, one distinguishes between "active"
and "passive" residues.
- The "active" residues are those experimentally identified
to be involved in the interaction between the two molecules AND solvent
accessible (either main chain or side chain relative accessibility
should typically be > 40-50%).
Note that the accessibility cutoff is not a hard limit. You
should carefully check the identity of the residues at the interface and possibly include
residues with lower accessibility if they carry potentially important functional groups.
- The "passive" residues are all solvent accessible surface
neighbors of active residues.
Note that the active and passive residues have to be defined by
the users based on their own interpretation of the experimental data, especially in the case
of NMR titration data. One way to interpret the significance of the shift is to calculate the
average perturbation and to consider that all perturbations higher than the average are
significant.
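The averaging rule above can be sketched in a few lines of Python. This is only an illustration: the function name and the example values are made up, and the input is assumed to be a mapping from residue number to combined chemical shift change.

```python
from statistics import mean


def significant_residues(csp):
    """Return the residue numbers whose perturbation exceeds the average.

    csp: dict mapping residue number -> combined chemical shift change.
    """
    threshold = mean(csp.values())
    return sorted(res for res, shift in csp.items() if shift > threshold)


# Made-up illustration values (residue number -> shift change in ppm)
example_csp = {10: 0.02, 11: 0.15, 12: 0.30, 13: 0.01, 14: 0.05}
print(significant_residues(example_csp))
```

Residues selected this way are only candidates: they should still be checked for solvent accessibility before being declared active.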
Once you have defined your active and passive residues:
- Go to the HADDOCK home page
(http://www.nmr.chem.uu.nl/haddock/)
- Select "project setup" and click on "Generate AIR restraint file"
- Enter the residue numbers corresponding to the active and passive residues for each molecule.
- Define the upper distance limit for AIRs (maximum distance between any
atom of an active residue of one molecule to any atom of an active or passive residue of the
second molecule).
Note that the current default upper distance limit is 2A, which might seem quite
short, but remember that the effective distance d_eff is always shorter than the shortest
distance entering the sum:
    d_{eff} = \left( \sum_{i} d_{i}^{-6} \right)^{-1/6}
where the sum runs over all atom-atom distances d_i between the active residue and the active and passive residues of the partner molecule.
In addition, since the degree of ambiguity is very high (several thousand distances can enter the sum),
the effective distance can be considerably shorter than the shortest distance entering the sum.
- Finally, click on generate AIR restraints. An AIR restraint file in CNS format is generated.
Use "copy and paste" or save the generated AIR restraints to disk using "file save as".
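To see why a 2A upper limit is less restrictive than it looks, the effect of sum averaging can be checked numerically. The sketch below assumes the usual r^-6 sum-averaging form, d_eff = (sum of d^-6)^(-1/6); the function name is hypothetical and the distances are made-up values.

```python
def effective_distance(distances):
    """Effective distance under r^-6 sum averaging (assumed form):
    d_eff = (sum_i d_i**-6) ** (-1/6)
    """
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


# A single distance is returned unchanged.
print(effective_distance([3.0]))

# Every additional distance shortens the effective distance.
print(effective_distance([3.0, 4.0, 5.0]))

# With many distances in the sum, the effect becomes large.
print(effective_distance([3.0] * 1000))
```

With a thousand distances of 3A each, the effective distance drops to roughly 0.95A, well below the 2A limit even though no individual atom pair is closer than 3A.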
Use of bioinformatic interface predictions:
In the absence of any experimental information on the interaction surfaces, you might want to try to predict them based on sequence conservation and other properties. For this purpose we have developed an interface prediction program called WHISCY. It has been designed to provide an easy interface to HADDOCK and will output, among other things, lists of active and passive residues for HADDOCK. WHISCY is available as a web server at
http://www.nmr.chem.uu.nl/whiscy . The WHISCY web page also provides links to other available online interface prediction servers.
For more information refer to:
Random AIR definition (ab-initio mode):
In the absence of any experimental and/or bioinformatic information to drive the docking, HADDOCK 2.x now offers
the possibility to randomly define AIRs from solvent accessible residues (>20% relative accessibility).
For each docking trial a different set of AIRs will be used. These restraints are defined in the randomairs.cns CNS script.
The sampling of residues is limited to the defined semi-flexible segments (nseg_X and following parameters in
run.cns). If no semi-flexible segment is defined, all solvent accessible residues will be sampled, provided enough structures are generated in the rigid-body docking stage (it0). By combining semi-flexible segment definitions with random AIR definition (ranair=true in run.cns), it is possible to limit the sampling to a selected region of the surface (e.g. the CDR loops in an antibody-antigen complex).
The random AIRs are defined (in the randomairs.cns CNS script) as follows (only for the rigid-body energy minimization stage):
- One residue on each molecule is selected randomly (Ai,Bi)
- All surface neighbors within 5A are also selected
- AIRs are defined between each residue selected from molecule A (Ai + 5A neighbors) and
the first residue randomly selected from molecule B and all its surface neighbors within
a 7.5A cutoff (Bi + 7.5A neighbors)
- AIRs are defined between each residue selected from molecule B (Bi + 5A neighbors) and
the first residue randomly selected from molecule A and all its surface neighbors within
a 7.5A cutoff (Ai + 7.5A neighbors)
AIRs are thus defined from a 5A radius patch randomly selected on one molecule to a 7.5A radius patch randomly selected on the second molecule, and vice versa. The selected residues are written to disk in structures/it0 as fileroot_1.disp, ...
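The patch selection described above can be sketched as follows. This is not the randomairs.cns implementation: it assumes one representative coordinate per surface residue and a simple point-to-point distance for the neighbor search, and all names are hypothetical.

```python
import math
import random


def neighbors(center, coords, cutoff):
    """Residues within `cutoff` of the center residue (center included).

    coords: dict mapping residue number -> (x, y, z) of one representative
    point per surface residue (an assumption of this sketch).
    """
    origin = coords[center]
    return sorted(r for r, xyz in coords.items() if math.dist(origin, xyz) <= cutoff)


def random_air_patches(coords_a, coords_b, rng=random):
    """Pick one random residue per molecule and build the two patch pairs."""
    a_i = rng.choice(sorted(coords_a))
    b_i = rng.choice(sorted(coords_b))
    return {
        # residues of A within 5A of Ai, restrained to B residues within 7.5A of Bi
        "A_to_B": (neighbors(a_i, coords_a, 5.0), neighbors(b_i, coords_b, 7.5)),
        # residues of B within 5A of Bi, restrained to A residues within 7.5A of Ai
        "B_to_A": (neighbors(b_i, coords_b, 5.0), neighbors(a_i, coords_a, 7.5)),
    }


# Made-up coordinates for illustration only
coords_a = {1: (0.0, 0.0, 0.0), 2: (4.0, 0.0, 0.0), 3: (7.0, 0.0, 0.0), 4: (20.0, 0.0, 0.0)}
coords_b = {101: (0.0, 0.0, 0.0), 102: (0.0, 6.0, 0.0), 103: (0.0, 30.0, 0.0)}
patches = random_air_patches(coords_a, coords_b, random.Random(0))
print(patches)
```

Because a new pair (Ai, Bi) is drawn for every docking trial, covering the whole surface requires many trials, which is why a large number of it0 structures is recommended.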
For the semi-flexible refinement stage, contact AIRs are automatically defined between all residues
within 5A across the interface. In the final explicit solvent refinement, no AIR restraints will be defined.
Note1: To ensure a thorough sampling of the surface, the number of structures generated at the
rigid-body stage (it0) should be increased (e.g. 10000), depending on the extent of the surface
to be sampled.
Note2: The use of random AIRs is not compatible with other distance restraints (including unambiguous and hydrogen bond restraints).
Surface contact restraints:
Surface contact restraints between the various molecules can be automatically defined in HADDOCK 2.x (surfrest=true in run.cns). These restraints are defined in the surf-restraint.cns CNS script.
This option is fully compatible with all other types of restraints.
If turned on, one surface contact restraint will be defined between each pair of molecules as an ambiguous distance
restraint with sum averaging (as for the AIRs) between all CA or P atoms (protein and/or DNA) of one molecule
and all CA or P atoms of the other molecule. If fewer than 3 CA and P atoms are found, all atoms will be selected
instead. The upper distance limit is set to 7A if both molecules contain CA and/or P atoms, 4.5A if only
one molecule contains CA and/or P atoms, and 2A if neither molecule contains CA and/or P atoms.
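The choice of upper distance limit amounts to a small lookup, sketched below. The function name is hypothetical, and each flag is assumed to mean that the molecule has enough CA and/or P atoms (at least 3) to be selected by them.

```python
def surface_contact_upper_limit(a_has_ca_or_p, b_has_ca_or_p):
    """Upper distance limit (in A) for a surface contact restraint.

    Each flag states whether the molecule is selected by its CA and/or P
    atoms (assumed here to require at least 3 such atoms).
    """
    count = int(a_has_ca_or_p) + int(b_has_ca_or_p)
    return {2: 7.0, 1: 4.5, 0: 2.0}[count]


print(surface_contact_upper_limit(True, True))    # protein-protein or protein-DNA
print(surface_contact_upper_limit(True, False))   # e.g. protein with a small ligand
print(surface_contact_upper_limit(False, False))  # two small molecules
```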
Such restraints can be useful in multi-body (N>2) docking to ensure that all molecules are in contact and thus
promote compactness of the docking solutions. As for the random AIRs, surface contact
restraints can be used in ab-initio docking; in such a case it is important to sample
the random starting orientations sufficiently, which means significantly increasing the number of structures for rigid-body docking.
Center of mass restraints:
Center of mass restraints between the various molecules can be automatically defined in HADDOCK 2.x (cmrest=true in run.cns). These restraints are defined in the cm-restraint.cns CNS script.
This option is fully compatible with all other types of restraints.
If turned on, one center of mass restraint will be defined between each pair of molecules as an ambiguous distance
restraint with center averaging between all CA or P atoms (protein and/or DNA) of one molecule
and all CA or P atoms of the other molecule. If fewer than 3 CA and P atoms are found, all atoms will be selected
instead. The upper distance limit is automatically defined as the sum of the "effective radius" of each molecule.
The "effective radius" is defined as half the average length of the three principal components.
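As a worked example of this definition, the sketch below computes the upper distance limit from the lengths of the three principal components of each molecule. The function names and the numbers are made up, and how the lengths themselves are measured is left to the cm-restraint.cns script.

```python
def effective_radius(lengths):
    """Half the average length of the three principal components (in A)."""
    l1, l2, l3 = lengths
    return 0.5 * (l1 + l2 + l3) / 3.0


def cm_upper_limit(lengths_a, lengths_b):
    """Upper limit of the center of mass restraint: sum of effective radii."""
    return effective_radius(lengths_a) + effective_radius(lengths_b)


# Made-up principal component lengths (in A) for two molecules
molecule_a = (30.0, 24.0, 18.0)  # average 24A -> effective radius 12A
molecule_b = (20.0, 20.0, 20.0)  # average 20A -> effective radius 10A
print(cm_upper_limit(molecule_a, molecule_b))
```

For an elongated molecule the effective radius underestimates the longest axis and overestimates the shortest one, so the restraint only ensures that the molecules are roughly in contact.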
Such restraints can be useful in multi-body (N>2) docking to ensure that all molecules are in contact and thus
promote compactness of the docking solutions. As for the random AIRs, center of mass
restraints can be used in ab-initio docking; in such a case it is important to sample
the random starting orientations sufficiently, which means significantly increasing the number of structures for rigid-body docking.
Use of NMR chemical shift perturbation data:
We will here illustrate the process of defining AIRs in the case of NMR
chemical shift perturbation data (CSP) describing the following steps:
1. Defining residues with "significant" chemical shift perturbations
We will assume that we have a file called csp.dat containing the
combined proton/nitrogen chemical shift changes as obtained from 15N HSQC titration
experiments in the following format: