Positive and negative sample
Two Sample Logo calculates statistical significance of the relative position-specific symbol frequencies between two sets of aligned sequences. For example, sequences that are known to share a sequence motif may be locally aligned including positions upstream or downstream from the motif. All aligned sequences in both samples are required to be of the same length, so dash characters ("-") should be used to pad the positions in case some sequences are shorter.
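As a rough illustration of the padding requirement, the sketch below pads shorter sequences with dashes so that all sequences end up the same length. The function name `pad_alignment` is hypothetical, and padding on the right is only an assumption; where the dashes belong depends on how the sequences were aligned around the motif.

```python
def pad_alignment(sequences, pad="-"):
    """Right-pad every sequence with `pad` so that all have equal length.

    Assumption: shorter sequences are missing positions at their right end.
    """
    width = max((len(s) for s in sequences), default=0)
    return [s.ljust(width, pad) for s in sequences]


padded = pad_alignment(["ACDEF", "ACD", "ACDE"])
print(padded)  # ['ACDEF', 'ACD--', 'ACDE-']
```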
Sequences that contain a motif and at the same time have a certain functional property (say, protein modification sites or transcription factor binding regions) constitute a positive sample. Sequences that contain the motif and at the same time do not have the functional property constitute the negative sample. The distinction between the samples does not necessarily have to be based on the presence and absence of a functional property: as long as there is a clear way of interpreting the data, any pair of sets of aligned sequences can be used as positive and negative.
Either amino acid or nucleotide. If the amino acid option is selected, all symbols other than the 20 standard single-letter amino acid codes are replaced with dashes and excluded from the statistics. Likewise, if the nucleotide option is selected, all symbols other than A, C, G, T, and U are replaced with dashes.
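The replacement rule can be sketched as follows. This is a minimal illustration rather than the tool's actual code: the names `AMINO_ACIDS`, `NUCLEOTIDES`, and `sanitize` are hypothetical, and the input is assumed to be upper case.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes
NUCLEOTIDES = set("ACGTU")


def sanitize(sequence, alphabet):
    """Replace every symbol outside `alphabet` with a dash."""
    return "".join(ch if ch in alphabet else "-" for ch in sequence)


print(sanitize("ACXB-D", AMINO_ACIDS))  # AC---D
print(sanitize("ACGNU", NUCLEOTIDES))   # ACG-U
```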
Two Sample Logo supports two types of statistical tests:
A frequently used statistical procedure that tests whether two samples were generated by the same Gaussian distribution. The t-test assumes that all observations are independent and that the standard deviations of both samples are identical; it then tests the equality of the means (Hogg and Craig, 1994).
Binomial test
Consider two 0-1 samples S1 and S2 of sizes n1 and n2, respectively, in which symbol 1 occurred k1 times in S1 and k2 times in S2. Let the test statistic be the absolute difference of the symbol's relative frequencies, i.e. θ = |k1/n1 - k2/n2|. The binomial test calculates the probability that a difference ≥ θ between two samples of sizes n1 and n2, randomly drawn from the underlying null distribution, could occur by chance alone. Since, according to the null model, both samples are independent and identically distributed, an unbiased estimate of the success probability p of the underlying binomial distribution is the relative frequency of the symbol when S1 and S2 are concatenated, i.e. p = (k1 + k2)/(n1 + n2). The achieved significance level P of the null hypothesis is then the probability that a difference ≥ θ will be observed between the estimated success probabilities in two samples of sizes n1 and n2 randomly drawn from the underlying distribution. It is calculated as:
\[
P = \sum_{i=0}^{n_1} \sum_{j=0}^{n_2} \binom{n_1}{i} p^{i} (1-p)^{n_1-i} \binom{n_2}{j} p^{j} (1-p)^{n_2-j} \; I\!\left( \left| \frac{i}{n_1} - \frac{j}{n_2} \right| \ge \theta \right)
\]
where I(·) equals 1 when its argument is true and 0 otherwise.
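The probability defined in this paragraph can be evaluated directly for small samples by summing, over all pairs of success counts (i, j), the product of the two binomial probabilities whenever |i/n1 - j/n2| ≥ θ. The sketch below follows that definition; the function name `binomial_two_sample_p` and the small tolerance `eps` (used so that floating-point ties with θ are counted) are assumptions, and this is not necessarily how the tool itself computes the value.

```python
from math import comb


def binomial_two_sample_p(k1, n1, k2, n2):
    """Probability of a frequency difference >= the observed one under the null."""
    theta = abs(k1 / n1 - k2 / n2)   # observed test statistic
    p = (k1 + k2) / (n1 + n2)        # pooled estimate of the success probability

    # Binomial probabilities of every possible success count in each sample.
    pmf1 = [comb(n1, i) * p**i * (1 - p) ** (n1 - i) for i in range(n1 + 1)]
    pmf2 = [comb(n2, j) * p**j * (1 - p) ** (n2 - j) for j in range(n2 + 1)]

    eps = 1e-12  # assumption: tolerance so that ties with theta are included
    total = 0.0
    for i, prob_i in enumerate(pmf1):
        for j, prob_j in enumerate(pmf2):
            if abs(i / n1 - j / n2) >= theta - eps:
                total += prob_i * prob_j
    return min(total, 1.0)


print(binomial_two_sample_p(1, 1, 0, 1))  # 0.5
```

The double loop costs on the order of n1 × n2 operations, and the binomial coefficients can overflow floating point for very large samples, so this form is only practical for modest sample sizes.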
The p-value is defined as the lowest significance level at which the null hypothesis can be rejected. In the case of two sample logos, the null hypothesis assumes that each symbol at each position in both samples is generated according to the same probability distribution. Under the null hypothesis, the p-value is calculated as the probability that a test statistic as extreme as, or more extreme than, the one observed in the original samples can occur by chance alone. Here, the test statistic is the absolute value of the difference in relative frequencies between the positive and negative samples. Since in most cases this probability cannot be calculated exactly, the p-value is only approximated.
Show conserved residues
Because conserved residues are neither enriched nor depleted in the positive sample relative to the negative sample (the difference of their relative frequencies is zero), they are not displayed in the logo by default. Checking this option forces the software to show conserved residues.
Fixed height symbols
When this option is checked, all enriched and depleted symbols will have the same height. When it is not checked, the height of the symbols will be proportional to the difference of relative frequencies of corresponding residues at a given position in the positive and negative sample.
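The quantity that drives symbol height, the difference of relative frequencies at each position, can be sketched as below. The function name `frequency_differences` is hypothetical. Dividing by the total number of sequences in each sample (rather than by the number of non-dash symbols in the column) is an assumption, since the text does not say how dashes enter the denominator.

```python
from collections import Counter


def frequency_differences(positive, negative):
    """Per-position difference of relative symbol frequencies.

    Returns one dict per alignment column mapping symbol -> (positive
    frequency - negative frequency). Dashes are skipped.
    """
    length = len(positive[0])
    diffs = []
    for pos in range(length):
        pos_counts = Counter(seq[pos] for seq in positive)
        neg_counts = Counter(seq[pos] for seq in negative)
        symbols = (set(pos_counts) | set(neg_counts)) - {"-"}
        diffs.append({
            s: pos_counts[s] / len(positive) - neg_counts[s] / len(negative)
            for s in symbols
        })
    return diffs


diffs = frequency_differences(["AC", "AC"], ["AG", "CG"])
print(diffs[0]["A"], diffs[1]["G"])  # 0.5 -1.0
```

A positive value corresponds to a symbol enriched in the positive sample and a negative value to a depleted one; a value of zero corresponds to a residue that would be hidden unless conserved residues are shown.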
A correction of the p-value for cases in which multiple dependent or independent hypotheses are tested. See (Weisstein) for details.
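The text does not name the correction, so the following is only an illustration of the general idea using the Bonferroni correction, one widely used multiple-testing correction; whether the tool applies exactly this rule is an assumption. Each p-value is multiplied by the number of hypotheses tested and capped at 1. The function name `bonferroni_adjust` is hypothetical.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: p * (number of tests), capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
print(adjusted)
```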
Sets the title of the two sample logo.
Limits the analysis to the specified columns in the samples of aligned sequences.
First position index
Index assigned to the first symbol in the logo. For example, if the sample is a 25-residue window centered on an active site, the first position index should be set to -12; the active site will then have index 0, and the last symbol will have index +12. The default value is 1.
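The indexing in this example can be checked with a few lines of code. The helper name `position_labels` is hypothetical; it simply numbers the columns consecutively starting from the chosen first index.

```python
def position_labels(length, first_index=1):
    """Consecutive position indexes for a logo of `length` columns."""
    return list(range(first_index, first_index + length))


labels = position_labels(25, first_index=-12)
print(labels[0], labels[12], labels[-1])  # -12 0 12
```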
Show X-axis indexes
Shows residue indexes on the X-axis.
Show Y-axis labels
Shows labels "enriched" and "depleted" next to the Y-axis.
Two Sample Logo supports Encapsulated PostScript (EPS), Portable Document Format (PDF), Graphics Interchange Format (GIF) and Portable Network Graphics (PNG).
Height and width of the output image, in pixels, centimeters or inches.
Sets the image resolution. Applicable to bitmaps only (GIF and PNG).
Turns antialiasing on or off.
If this option is checked, letters in the output will be inscribed in bounding boxes.
If this option is checked, letters in the output will be only outlined (and not filled).
Black and white
All symbols are written in black type against a white background.
WebLogo default colors
Shapely color table for amino acids
In the original Shapely scheme, G and V were color-coded as white. Since this would render them invisible against a white background, their color has been changed to light grey.
Shapely color table for nucleotides
Positively charged residues (K, R, H) are colored blue, and negatively charged residues (D, E) are colored red; all neutral residues are colored black.
Hydrophobic residues (A, F, G, I, L, P, V, W, Y) are colored cyan, while the remaining hydrophilic residues are colored black. This classification was based on (Eisenberg, 1984).
Surface exposed residues (D, E, H, K, N, P, Q, R, S, T, Y) are colored orange, and buried residues (A, C, F, G, I, L, M, V, W) are colored black. This classification was based on (Janin, 1979).
High flexibility residues (D, E, K, N, P, Q, R, S) are colored red, whereas low flexibility residues (A, C, F, G, H, I, L, M, T, V, W, Y) are colored green. This classification was based on (Vihinen et al., 1994).
Disorder-promoting residues (A, R, S, Q, E, G, K, P) are colored red, order-promoting residues (N, C, I, L, F, W, Y, V) are colored blue, and disorder-order neutral residues (D, H, M, T) are colored black. This classification was based on (Dunker et al., 2001).
User defined color scheme
This option allows you to specify a new color mapping using the set of standard predefined colors listed in the following table:
Any symbol not explicitly assigned to a color will default to black.
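The default-to-black rule can be illustrated with a simple lookup. This is a sketch only: the input format of the tool's user-defined scheme is not described here, so the dictionary layout and the name `make_color_lookup` are hypothetical. The example data reuses the charge-based classification given above (K, R, H blue; D, E red).

```python
def make_color_lookup(assignments, default="black"):
    """Build a symbol -> color function from {color: symbols} assignments.

    Symbols that are not explicitly assigned fall back to `default`.
    """
    lookup = {}
    for color, symbols in assignments.items():
        for symbol in symbols:
            lookup[symbol] = color
    return lambda symbol: lookup.get(symbol, default)


charge_color = make_color_lookup({"blue": "KRH", "red": "DE"})
print(charge_color("K"), charge_color("D"), charge_color("A"))  # blue red black
```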