

General input


Choose one of the existing site-files or enter your own sites.
The input must be plain text with one site per line. All sites must have the same length.
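As an illustration of the format rule above, the following minimal sketch checks a block of pasted text before submission. The function name `read_sites` and the example hexamers are hypothetical and are not part of SitePrediction.

```python
def read_sites(text):
    """Return the sites found in `text`, one per non-empty line.

    Raises ValueError when no site is present or when the sites
    do not all have the same length.
    """
    sites = [line.strip() for line in text.splitlines() if line.strip()]
    if not sites:
        raise ValueError("no sites found")
    lengths = {len(site) for site in sites}
    if len(lengths) != 1:
        raise ValueError(f"sites differ in length: {sorted(lengths)}")
    return sites


# Made-up example input: three sites of six residues each.
print(read_sites("DEVDGS\nDQTDGA\nDEADSL\n"))
```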

If you don't have known cleavage sites, you can retrieve them from the Merops database using the getMerops feature.

Cleavage position

Indicate after which amino acid the cleavage takes place.
If you have 6 residues in the site and the structure is:
P4-P3-P2-P1 x P1'-P2'
then 4 should be entered.
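The relation between the cleavage position and the P/P' numbering can be sketched as follows. The helper names `split_site` and `label_positions` and the example site are assumptions made for illustration only.

```python
def split_site(site, cleavage_position):
    """Split a site into the residues before and after the scissile bond."""
    if not 0 < cleavage_position < len(site):
        raise ValueError("cleavage position must lie inside the site")
    return site[:cleavage_position], site[cleavage_position:]


def label_positions(site, cleavage_position):
    """Pair every residue with its P.../P...' label."""
    before, after = split_site(site, cleavage_position)
    labels = [f"P{i}" for i in range(len(before), 0, -1)]
    labels += [f"P{i}'" for i in range(1, len(after) + 1)]
    return list(zip(labels, site))


# A made-up site of 6 residues cleaved after the 4th residue:
# P4-P3-P2-P1 x P1'-P2'
print(split_site("DEVDGS", 4))
print(label_positions("DEVDGS", 4))
```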

If you choose a predefined sites-file, you do not need to fill in this field.

Fasta File

Choose one of the existing multi-fasta-files or enter your own sequences.
The input must be in (multi)fasta format. All sequences should begin with a header line starting with >.
>sp|Q92934|BAD_HUMAN Bcl2 (Human)
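To check that your own input follows this format, a minimal parser can be sketched as below. The function `parse_fasta`, the headers and the sequences in the example are made up for illustration; this is not the parser used by SitePrediction.

```python
def parse_fasta(text):
    """Parse (multi)fasta text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is stored without the ">".
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record input; sequences may span several lines.
example = ">example_1 made-up protein\nMKTAYIAK\nQRQISFVK\n>example_2\nGSHMLEDP\n"
print(parse_fasta(example))
```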


You can also choose to give a list of GenBank and/or UniProt IDs.


If you enter a valid e-mail address, you will be informed by mail when the calculation is finished.

Additional parameters (extended view)


The penalty is the value assigned to an amino acid that does not occur at a given position in the input sites.
If this value is set to 0, a single unobserved amino acid at any position results in a frequency score of 0 (and thus in an average score of 0). With a penalty of 1, the result is calculated as if the amino acid had been counted exactly once at that position.
The default penalty is 0.1.
Choosing this value is one of the most difficult settings. When many input sites are given (e.g. >100), smaller penalties can be applied.
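The effect of the penalty can be illustrated with a small sketch. It assumes that the frequency score is the product, over all positions, of the residue count divided by the number of input sites, with the penalty standing in for a count of zero. This formula is an assumption chosen to reproduce the behaviour described above (penalty 0 gives a score of 0, penalty 1 counts the residue once); the exact calculation used by SitePrediction may differ. The function names and the example sites are hypothetical.

```python
from collections import Counter


def count_matrix(sites):
    """Count how often each amino acid occurs at each position."""
    return [Counter(column) for column in zip(*sites)]


def frequency_score(candidate, sites, penalty=0.1):
    """Assumed frequency score: product of per-position count / number of sites.

    A residue never seen at a position gets `penalty` instead of its count.
    """
    counts = count_matrix(sites)
    if len(candidate) != len(counts):
        raise ValueError("candidate and input sites must have the same length")
    score = 1.0
    for residue, column in zip(candidate, counts):
        observed = column[residue]  # a Counter returns 0 for unseen residues
        score *= (observed if observed > 0 else penalty) / len(sites)
    return score


# Made-up input sites for illustration.
sites = ["DEVDGS", "DEVDGA", "DQTDGS", "DEADSL"]
print(frequency_score("DEVDGS", sites))             # every residue was observed
print(frequency_score("DEVDGK", sites, penalty=0))  # unseen K with penalty 0
print(frequency_score("DEVDGK", sites, penalty=1))  # unseen K counted once
```

With a penalty of 0 the second candidate scores 0 because K was never observed at the last position; with a penalty of 1 it is scored as if K had occurred there once.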

Sort Order

Define by which score the output is sorted and cut off (e.g. when only the best 50 hits are displayed, the sort order determines which sites are shown).

Use scores

Define which scores are calculated and displayed. Some scores may be more important than others in particular cases. Leaving out scores you do not need will speed up the calculation.
Short explanation of the scores:

FREQUENCY SCORE: score based on the occurrence of each amino acid at each position in the original (known) sites.
MATRIX SCORE: A score that indicates how much the potential site differs from the input sites using a substitution matrix.

Structural information and solvent accessibility

structure prediction

Since the appropriate presentation of a cleavage site seems to be critical for efficient hydrolysis, SitePrediction can also report on the structural context of a potential site and its environment. The integrated SSPro package predicts the solvent accessibility and secondary structure of a protein sequence. When you check this checkbox, all sequences are prepared for this analysis. Afterwards you can run the structure prediction.

Example of the structure prediction visualisation

Example of the solvent accessibility prediction visualisation

As it has not been proven that these predictions play an important role in cleavage site prediction, the results should be interpreted with care and never used as a decisive factor.

Site analysis

analysis and logo of the sites

The input sites provided by the user can be visualized by SitePrediction in two ways. The first visualization is a logo in which, at each position, the size of each amino acid letter is proportional to its frequency of occurrence in the known sites. This gives the user an idea of the optimal cleavage site.

Example of logo visualisation of the input-sites

Secondly, for every position, a histogram is constructed in which every amino acid has a bar whose height is proportional to its frequency score. This histogram contains more information than the logo since it is a complete representation of the frequency matrix.
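The idea of a per-position histogram can be sketched in plain text as follows. The functions `frequency_matrix` and `text_histogram` and the example sites are hypothetical; the sketch uses simple relative frequencies and does not reproduce the graphics produced by SitePrediction.

```python
from collections import Counter


def frequency_matrix(sites):
    """Relative frequency of each observed amino acid at each position."""
    total = len(sites)
    return [
        {residue: count / total for residue, count in Counter(column).items()}
        for column in zip(*sites)
    ]


def text_histogram(frequencies, width=20):
    """Draw one bar per amino acid; bar length is proportional to frequency."""
    ordered = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(
        f"{residue} {'#' * round(freq * width)} {freq:.2f}"
        for residue, freq in ordered
    )


# Made-up input sites; one histogram is printed per position.
sites = ["DEVDGS", "DEVDGA", "DQTDGS", "DEADSL"]
for position, frequencies in enumerate(frequency_matrix(sites), start=1):
    print(f"position {position}")
    print(text_histogram(frequencies))
```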

Example of the histogram visualisation of the input-sites

PEST analysis

PEST analysis

Polypeptide sequences rich in proline (P), glutamic acid (E), serine (S) and threonine (T) are sometimes proposed to be targets for rapid destruction [9]. Therefore, the presence of such PEST regions in potential substrates could give extra information on the cleavage site prediction results.

Example of the PEST sequence visualisation


Define here how large the PEST score window must be. This is the minimum length of the PEST sequences. The default value is 10, meaning that a PEST sequence is a stretch of at least 10 hydrophilic amino acids flanked by lysine (K), arginine (R) or histidine (H).
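The role of the window size can be illustrated with a simplified sketch. It only looks for stretches of at least `window` residues that contain no K, R or H and are flanked by K, R or H on both sides; the hydrophilicity check and the actual PEST scoring are deliberately left out. The function name `find_pest_candidates` and the example sequence are hypothetical.

```python
import re


def find_pest_candidates(sequence, window=10):
    """Return (start, end, stretch) for stretches flanked by K, R or H.

    Simplification: a candidate is any run of at least `window` residues
    without K, R or H, directly preceded and followed by K, R or H.
    Positions are 0-based and `end` is exclusive.
    """
    pattern = re.compile(r"(?<=[KRH])[^KRH]{%d,}(?=[KRH])" % window)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(sequence)]


# Made-up sequence: a 12-residue stretch between K and R.
sequence = "MKPESTPESTPESTRAA"
print(find_pest_candidates(sequence, window=10))
print(find_pest_candidates(sequence, window=13))  # stretch is too short
```

Raising the window size makes the search stricter: the 12-residue stretch above is reported with a window of 10 but not with a window of 13.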