Choose one of the existing site-files or enter your own sites.
The input must be plain text with one site per line. All sites must have the same length.
e.g.:
DESDF
DEVDS
DTRDN
AAVDG
DEPDC
DENDG
...
DNDDA
If you don't have known cleavage sites, you can retrieve them from the Merops database using the getMerops feature.
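The two input rules above (one site per line, all sites of equal length) can be checked before submitting. The following is a minimal sketch, not part of SitePrediction itself; the helper name `parse_sites` is hypothetical, and restricting the input to the 20 standard one-letter amino acid codes is an assumption.

```python
# Hypothetical helper, not part of SitePrediction.
# Assumption: only the 20 standard one-letter amino acid codes are allowed.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_sites(text):
    """Return the sites from plain text that holds one site per line."""
    sites = [line.strip().upper() for line in text.splitlines() if line.strip()]
    if not sites:
        raise ValueError("no sites given")

    # All sites must have the same length.
    lengths = {len(site) for site in sites}
    if len(lengths) != 1:
        raise ValueError(f"sites differ in length: {sorted(lengths)}")

    # Reject letters that are not standard amino acid codes.
    for site in sites:
        unknown = set(site) - STANDARD_AMINO_ACIDS
        if unknown:
            raise ValueError(f"site {site!r} contains unexpected letters: {sorted(unknown)}")
    return sites


sites = parse_sites("DESDF\nDEVDS\nDTRDN\nAAVDG\nDEPDC\nDENDG\nDNDDA\n")
print(len(sites), "sites of length", len(sites[0]))
```

A file in which one line is shorter or longer than the others raises a `ValueError` instead of silently producing a misaligned set of sites.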
Indicate after which amino acid the cleavage takes place.
e.g.:
If you have 6 residues in the site and the structure is:
P4-P3-P2-P1 x P1'-P2'
then 4 should be entered.
If you choose a predefined site file, you do not need to fill in this field.
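To illustrate how the cleavage position maps onto the P4...P2' notation, here is a small sketch. The function name `label_site` and the six-residue site `DEVDSG` are hypothetical and serve only as an example.

```python
def label_site(site, cleavage_after):
    """Pair each residue with its position label.

    `cleavage_after` is the number of residues in front of the scissile
    bond, i.e. the value entered in this field.
    """
    if not 1 <= cleavage_after <= len(site):
        raise ValueError("cleavage position must lie within the site")

    labelled = []
    for index, residue in enumerate(site):
        if index < cleavage_after:
            # Non-prime side: counts down to P1 just before the bond.
            label = f"P{cleavage_after - index}"
        else:
            # Prime side: counts up from P1' just after the bond.
            label = f"P{index - cleavage_after + 1}'"
        labelled.append((label, residue))
    return labelled


# Hypothetical six-residue site, cleaved after the fourth residue.
for label, residue in label_site("DEVDSG", 4):
    print(label, residue)
```

With a value of 4, the first four residues are labelled P4 to P1 and the remaining two P1' and P2', which matches the structure described above.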
Choose one of the existing multi-fasta-files or enter your own sequences.
The input must be in (multi)fasta format. All sequences should begin with a header line starting with >.
e.g.:
>sp|Q92934|BAD_HUMAN Bcl2 (Human)
MFQIPEFEPSEQEDSSSAERGLGPSPAGDGPSGSGKHHRQAPGLLWDASHQQEQPTSSSH
HGGAGAVEIRSRHSSYPAGTEDDEGMGEEPSPFRGRSRSAPPNLWAAQRYGRELRRMSDE
FVDSFKKGLPRPKSAGTATQMRQSSSWTRVFQSWWDRNLGRGSSAPSQ
>sp|P55957|BID_HUMAN
MDCEVNNGSSLRDECITNLLVFGFLQSCSDNSFRRELDALGHELPVLAPQWEGYDELQTD
...
You can also provide a list of GenBank and/or UniProt IDs, one per line.
e.g.:
Q92934
P55957
P05067
P51693
P10275
AAB07138
EEB07232
...
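The (multi)fasta layout described above can be read with a few lines of code. This is a minimal sketch for checking your own input before submitting it; the function name `parse_fasta` is hypothetical, the short sequences in the usage example are made up, and nothing here reflects how SitePrediction itself parses the input.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from (multi)fasta text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            # Sequence lines of one record are joined into a single string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up records, only to show the layout.
example = ">seq1 first example\nMFQIPE\nFEPSEQ\n>seq2\nMDCEVN\n"
for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```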
If you enter a valid e-mail address, you will be informed by mail when the calculation is finished.
The penalty is the value assigned to an amino acid that does not occur at a given position in the input sites.
If this value is set to 0, a single unobserved amino acid at any position results in a frequency score of 0 (and thus in an average score of 0). With a penalty of 1, the result is calculated as if the amino acid had been counted exactly once at that position.
The default penalty is 0.1.
Choosing this value is difficult. When many input sites are given (e.g. >100), smaller penalties can be applied.
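The effect of the penalty can be illustrated with a small sketch. The exact formula used by SitePrediction is not given here, so the code assumes that the frequency score is the product, over all positions, of the relative frequency of the candidate's amino acid in the input sites, with the penalty replacing the count of an amino acid that was never observed at that position. The function names are hypothetical, and only the seven example sites listed above are used.

```python
from collections import Counter


def position_counts(sites):
    """Count the amino acids observed at each position of the input sites."""
    return [Counter(column) for column in zip(*sites)]


def frequency_score(candidate, sites, penalty=0.1):
    """Assumed scoring: product of per-position relative frequencies.

    An amino acid never observed at a position is counted `penalty` times.
    """
    counts = position_counts(sites)
    if len(candidate) != len(counts):
        raise ValueError("candidate and input sites must have the same length")

    score = 1.0
    for residue, column in zip(candidate, counts):
        observed = column.get(residue, 0)
        score *= (observed if observed > 0 else penalty) / len(sites)
    return score


sites = ["DESDF", "DEVDS", "DTRDN", "AAVDG", "DEPDC", "DENDG", "DNDDA"]

# W never occurs at the last position of the input sites.
for penalty in (0, 0.1, 1):
    print(penalty, frequency_score("DEVDW", sites, penalty=penalty))
```

Under this assumption a penalty of 0 drives the whole score to 0 as soon as one position is unobserved, while a penalty of 1 treats the unobserved amino acid as if it had been seen once.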
Define the score by which the output is sorted and cut off (e.g. when only the best 50 hits are displayed, the sort order determines which sites are shown).
Define which scores are calculated and displayed. Some scores may be more important in some cases. Leaving out superfluous scores will speed up the process.
Short explanation of the scores:
FREQUENCY SCORE: score based on the occurrence of each amino acid at each position in the original (known) sites.
MATRIX SCORE: A score that indicates how much the potential site differs from the input sites using a substitution matrix.
Since the appropriate presentation of a cleavage site seems to be critical for efficient hydrolysis, SitePrediction can also include information on the structural conditions of the potential site and its environment. The integrated SSPro package predicts the solvent accessibility and secondary structure of a protein sequence. When you check this checkbox, all sequences will be prepared for this analysis. Afterwards you can run the structure prediction.
Example of the structure prediction visualisation
Example of the solvent accessibility prediction visualisation
As it has not been proven that these predictions play an important role in cleavage site prediction, results should be interpreted with care and never used as a decisive factor.
The input sites provided by the user can be visualized by SitePrediction in two ways. The first visualization is a logo in which the size of each amino acid at each position is proportional to its frequency of occurrence in the known sites. In this way, the user gets an idea of the optimal cleavage site.
Example of logo visualisation of the input-sites
Secondly, for every position, a histogram is constructed in which every amino acid has a bar whose height is proportional to its frequency score. This histogram contains more information than the logo since it is a complete representation of the frequency matrix.
Example of the histogram visualisation of the input-sites
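The frequency matrix behind both visualisations can be sketched as follows. This is only an illustration of the idea: the function names are hypothetical, the text bars stand in for the graphical histogram, and the seven example sites listed for the site input are used as data.

```python
from collections import Counter


def frequency_matrix(sites):
    """Return, per position, a dict mapping amino acid -> relative frequency."""
    total = len(sites)
    return [
        {residue: count / total for residue, count in Counter(column).items()}
        for column in zip(*sites)
    ]


def text_histogram(column, width=20):
    """Render one position as text bars, most frequent amino acid first."""
    ordered = sorted(column.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(
        f"{residue} {'#' * round(frequency * width)} {frequency:.2f}"
        for residue, frequency in ordered
    )


sites = ["DESDF", "DEVDS", "DTRDN", "AAVDG", "DEPDC", "DENDG", "DNDDA"]
matrix = frequency_matrix(sites)
for position, column in enumerate(matrix, start=1):
    print(f"Position {position}")
    print(text_histogram(column))
```

Each position lists every amino acid that was observed there, which is why the histogram carries more detail than a logo in which rare amino acids become too small to read.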
Polypeptide sequences rich in proline (P), glutamic acid (E), serine (S) and threonine (T) are sometimes proposed to be targets for rapid destruction [9]. Therefore the presence of such PEST regions in potential substrates could give extra information for interpreting the cleavage site prediction results.
Example of the PEST sequence visualisation
Define here how large the PEST score window must be. This is the minimum length of the PEST sequences. The default value is 10, meaning that a PEST sequence consists of at least 10 hydrophilic amino acids flanked by lysine (K), arginine (R) or histidine (H).
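The role of the window can be illustrated with a simplified search for candidate regions. This is a sketch under stated assumptions, not the method used by SitePrediction: a candidate is taken to be a stretch without K, R or H that is flanked by one of these residues on both sides and is at least `window` residues long, and it must contain at least one P, one E and one S or T. Hydrophilicity is not checked and no PEST score is calculated. The function name and the test sequence are hypothetical.

```python
import re


def pest_candidates(sequence, window=10):
    """Return (start, end, region) tuples for candidate PEST regions.

    Assumptions: the region contains no K, R or H, is flanked by K, R or H
    on both sides, has at least `window` residues, and contains at least
    one P, one E and one S or T. Positions are 0-based, end is exclusive.
    """
    pattern = re.compile(r"(?<=[KRH])[^KRH]{%d,}(?=[KRH])" % window)
    hits = []
    for match in pattern.finditer(sequence.upper()):
        region = match.group()
        if "P" in region and "E" in region and ("S" in region or "T" in region):
            hits.append((match.start(), match.end(), region))
    return hits


# Made-up sequence: a 10-residue stretch between K and R.
print(pest_candidates("AAKPESTPESTPERAA", window=10))
print(pest_candidates("AAKPESTPESTPERAA", window=11))
```

Raising the window from 10 to 11 removes the only candidate in this sequence, which shows how a larger window makes the search stricter.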