SameSpots Workflow Guide

Introduction

An in-depth guide to the SameSpots workflow, with tips on experimental design setup, definitions of the statistical calculations, and explanations of the software terminology.

Image QC

Each image has a quality check. This gives an instant visual check and confidence in your downstream data analysis.

What is Image QC?

When you add images to an experiment, they are checked for quality, and any issues or notes are highlighted immediately, before any further analysis is attempted.

What problem does it solve?

Poor image quality can make analysis of your gels difficult. It can increase subjectivity and make your results less accurate.

What checks does Image QC do and what can I do to improve my images?

Image QC looks at and checks that images:
• Are greyscale
• Have a high bit-depth
• Have a high dynamic range
• Are free from saturation
• Have not been compressed or edited prior to analysis
For more information on image capture, see the Image Capture Ebook.

What is the Image Editor?

It is a tool you can access from within the software. If your images require minor changes (flipping, etc.), you can edit them without changing the original images.

What edit options are available?

• Flip images horizontally and vertically
• Rotate images 90 degrees clockwise and anti-clockwise
• Invert image
• Crop image

Gel Quality and SpotCheck

What is SpotCheck?

SpotCheck helps you objectively validate that your gel running meets your lab’s quality standards. It also highlights some common problems that occur during image capture. It takes a high-quality, single-sample experiment (a gold standard) and uses it as a baseline for comparison with other test gels that have been run using the same sample and protocol. Test gels can be those run by other staff, to ensure that gel quality is maintained throughout your lab.

How does SpotCheck work?

When testing a gel against a gold standard, SpotCheck considers each spot individually, comparing its value on the test gel with the range of values for that spot on the gold standard gels. It uses the standard deviation to measure the spread of values for the spot in the gold standard and then sees where the value for the spot on the test gel lies within that spread. (Note that the values used for this are the logged normalised spot volume measurements.) Consider the example below, which shows typical values for a single spot. The gold circles represent measurements for that spot from the gold standard (in this example the gold standard contains 10 gels). The blue triangle represents the measurement for that spot on the test gel. The horizontal blue line shows the mean value of the gold standard measurements, and the red lines show distances away from the mean at 1, 2 and 3 standard deviations.

In this case we can say that the spot on the test gel is just over 2 standard deviations away from the mean of the gold standard measurements. When you create your gold standard you can specify the criteria that will be used to decide which spots will be passed by SpotCheck, i.e., whether the above spot at 2 standard deviations from the mean is acceptable or not. When choosing these criteria you should consider the normal distribution (bell curve) of values:

This shows that, if the test gel is of the same quality as the gold standards, we’d expect the test value for a spot to be within 2 standard deviations of the gold standard mean approximately 95% of the time.
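The comparison described above can be sketched in a few lines of Python. This is only an illustration, not SpotCheck's implementation: the function name `spot_z_score`, the sample values, the use of the sample standard deviation and the pass limit of 2 standard deviations are all assumptions made for the example.

```python
import statistics
from statistics import NormalDist

def spot_z_score(gold_values, test_value):
    """Distance of the test value from the gold standard mean, in standard deviations.

    Values are assumed to be logged normalised spot volumes.
    """
    mean = statistics.mean(gold_values)
    sd = statistics.stdev(gold_values)  # sample standard deviation (assumption)
    return (test_value - mean) / sd

# Hypothetical logged normalised volumes for one spot on 10 gold standard gels
gold = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 5.1, 4.9, 5.0, 5.0]
test = 5.3  # hypothetical value for the same spot on the test gel

z = spot_z_score(gold, test)
passes = abs(z) <= 2.0  # hypothetical pass criterion chosen by the user

# Fraction of a normal distribution lying within 2 standard deviations of the mean
coverage = NormalDist().cdf(2.0) - NormalDist().cdf(-2.0)

print(f"z = {z:.2f}, passes = {passes}, coverage within 2 SD = {coverage:.3f}")
```

A stricter or looser limit simply changes the proportion of good-quality spots that would be expected to fail by chance.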

To get the best results from SpotCheck you should also check the variance of spot values in your gold standard. The easiest way to do this is to open the gold standard experiment in SameSpots and go to the CV Graph.

A good reference experiment should have most spots with a low CV and very few with higher values like the example below:
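Separately from the CV Graph in the software, the coefficient of variation itself is simple to compute. The sketch below assumes the CV is the sample standard deviation divided by the mean, expressed as a percentage; the spot names, values and the 20% cut-off are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percentage: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical normalised volumes: spot name -> values across gold standard gels
spots = {
    "spot_1": [1000.0, 1050.0, 980.0, 1020.0],
    "spot_2": [200.0, 340.0, 150.0, 290.0],
}

cv_by_spot = {name: coefficient_of_variation(v) for name, v in spots.items()}

# Flag spots whose CV exceeds a hypothetical cut-off of 20%
high_cv = [name for name, cv in cv_by_spot.items() if cv > 20.0]
print(cv_by_spot, high_cv)
```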

When you have completed the SpotCheck workflow to test a gel against your gold standard you will be presented with a view of the test gel with the gold standard spot outlines overlaid on top of it. You will be able to visually and numerically review the test gel and will be provided with feedback about whether or not the test gel passes gold standard criteria.

What evidence is there that SpotCheck works?

The following studies show how this QC workflow helps establish and maintain 2D gel quality as part of your proteomics research.

Complex top-down proteomics methods like 2-DE benefit from rigorous approaches to ensure reliability and reproducibility of methodologies and results. Here SpotCheck was used to set 2D gel quality within a lab and teach novices how to confidently attain this quality before running large experiments using a limited sample. For more information see the SpotCheck poster.

A collaborative study with Bio-Rad aimed to facilitate the use of 2-DE by providing reference protocols, images, image analysis tools and samples. HeLa Cell Lysate was distributed to 20 labs worldwide with a request to produce at least three gels, run to a common protocol, and submit images. These were analysed by SpotCheck to measure gel quality within and across labs. For more information see the application note.

Alignment

Makes it possible to accurately compare images by removing the positional variation.

What is Alignment?

Alignment takes two images and overlays them in the same co-ordinate space so that a spot on image A will be in the same location as the matching spot on image B.

What problem does it solve?

Alignment makes it possible to accurately compare images by removing any positional variation introduced during the gel running and imaging processes.

Why is alignment better than spot matching techniques?

Alignment makes it possible to overlay spots from each image, allowing one spot pattern to be created. Using separate spot detection and then matching spots between images is very inefficient.

How is alignment different to warping?

Warping and alignment both aim to remove the positional variation introduced during the blotting process. However, warping is only accurate to the spot level, whereas alignment is accurate to near pixel level.

Do I need to align DIGE images?

For low MW proteins there may be a shift due to the different dye so all DIGE images are first aligned to the Cy2 image, usually with very small changes, and then the Cy2 images are aligned together.

What is the Reference Gel?

One gel is selected to be the reference gel at the start of the experiment. This gel is used as the base for alignment so all other gels are aligned to this one. One gel is selected automatically but you can change this if you wish. An image with lots of clear, representative spots is best. The Reference Gel is only used for alignment.

How does alignment work with DIGE experiments?

Alignment works at 2 levels for DIGE experiments. All images within a DIGE gel are aligned to the Cy2 (internal standard) image first and then all Cy2 images are aligned to one Reference Cy2 image. Theoretically the Cy3 and Cy5 images should need no alignment to the Cy2 as they are all from the same gel. However, in some cases the spots at a low molecular weight migrate at slightly different rates due to the larger effect of the dye on the spot. Therefore a small amount of alignment may be required.

How does Alignment work?

Image alignment produces a spatial transform at the pixel level. The transform maps every point on an image to the appropriate point on its reference image. The transform is calculated so as to minimize the length of the warp vectors; interpolation is used to calculate the appropriate positional transform between vectors. This method allows for much more variability than using a rigid grid. An alignment vector associates a point on one image with the corresponding point on the second image. The alignment process uses information from many alignment vectors to create a mapping from one image to the other. The vectors can be automatically generated or manually added. For each pixel in the image, the mapping calculates the corresponding position in the aligned image.
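To make the idea of interpolating between alignment vectors concrete, here is a minimal sketch. It is not the SameSpots algorithm: inverse-distance weighting is an illustrative choice of interpolation, and the function names and example vectors are hypothetical.

```python
import math

def interpolate_displacement(point, vectors, power=2.0):
    """Estimate the displacement at `point` from a set of alignment vectors.

    `vectors` is a list of ((x, y), (dx, dy)) pairs: a position on the image
    and the displacement that maps it onto the reference image.
    Inverse-distance weighting is used here purely for illustration.
    """
    if not vectors:
        raise ValueError("at least one alignment vector is required")
    px, py = point
    weight_sum = dx_sum = dy_sum = 0.0
    for (x, y), (dx, dy) in vectors:
        dist = math.hypot(px - x, py - y)
        if dist == 0.0:
            return (dx, dy)  # the point lies exactly on a vector: use it directly
        w = 1.0 / dist ** power
        weight_sum += w
        dx_sum += w * dx
        dy_sum += w * dy
    return (dx_sum / weight_sum, dy_sum / weight_sum)

def map_point(point, vectors):
    """Map a pixel position to its estimated position in the aligned image."""
    dx, dy = interpolate_displacement(point, vectors)
    return (point[0] + dx, point[1] + dy)

# Two hypothetical vectors: one shifts 2 pixels right, the other 4 pixels right
vectors = [((10.0, 10.0), (2.0, 0.0)), ((30.0, 10.0), (4.0, 0.0))]
on_vector = map_point((10.0, 10.0), vectors)
midpoint = map_point((20.0, 10.0), vectors)  # halfway between the two vectors
print(on_vector, midpoint)
```

A pixel that sits on a vector follows that vector exactly, while pixels between vectors receive a blend of the neighbouring displacements.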

What is the difference between automatic alignment vectors and manual vectors?

Automatic alignment looks at the spot patterns on the images, matches up similar spots or features, and adds automatically generated vectors. Manual vectors are added by users to features that they think are matched. These can be added before automatic alignment (which helps the algorithm), after automatic alignment to fix particular areas, or simply on their own if automatic alignment fails or is difficult.

Can I “fix” alignment after spot detection?

Not at present.

But how does it know 1 pixel matches another…

Initially those pixels which are joined by alignment vectors are assumed to match together. In areas between vectors, interpolation is used to calculate the mapping of one pixel to another.

…especially given that a shift could be post-translational rather than a gel artefact?

The re-mapping of pixels is not completely unconstrained. Specifically, pixels are not permitted to cross, so if pixel A is to the left of pixel B on an image then the software would not allow the image to distort to the extent that pixel A shifts to the right of pixel B, even if the user places a vector which suggests it should do so. Provided other spots are correctly matched around it, this tends to prevent the image distortion from undoing a post-translational shift. Extreme distortions are usually apparent to the user by looking at the blue grid shown on the whole image view at the bottom left of the screen. Areas where the image distortion is extreme show up as a contorted grid.

How big a shift can auto alignment deal with? Or what if my gels are really distorted?

An overall shift of one image compared to the other is not a problem, as the user can enter manual seed vectors which effectively tell the algorithm how big the initial shift should be. After this it requires that each subsequent match does not differ by too much from the matches surrounding it. Generally there should be a smooth change in the required warp vector displacements. This tends to stop the automated algorithm from matching post-translational shifts, as they represent a discontinuity compared to the surrounding vector field. In areas of strong gel distortion (i.e. areas which require a discontinuity in the local vector field) the user will generally have to add more seeds in that area to get the images to match automatically.

Spot Matching and Missing Values

How are spots matched?

With SameSpots, the same set of spot outlines is used for every gel in the experiment. This single set of spot outlines is always maintained during the SameSpots workflow, so any edits on one image are automatically applied to every other image in the experiment. This speeds up spot editing (you now only need to edit one image), and means that all spots are always 100% matched in all gels.
Because of this 100% matching all the time, there is no reason to edit the matching. Sometimes you may see spots that look incorrectly matched, but when this happens it’s because of a problem with the alignment, not with the matching. So you will need to correct this problem at the alignment stage. For example, the highlighted image in the group “Drug A” (shown below) has been aligned incorrectly.

What are missing values, and why are they a problem?

Having missing values means that some variables do not have a measurement. For 2D analysis, this is caused by a match series having ‘gaps’. That is, some of the images have no measured spot volume. This can be because a spot wasn’t detected there, or because it was detected but matched incorrectly. Both of these causes can be corrected manually, but doing so introduces subjectivity and is error-prone (as well as very time consuming). The screenshot below of a measurements table shows the missing values in blue.

Missing values can cause serious problems. Most statistical procedures automatically eliminate cases with missing values, so you may not have enough data to perform the statistical analysis. Alternatively, although the analysis might run, the results may not be statistically significant because of the small amount of input data.

Missing values can also cause misleading results by introducing bias. With the traditional approach to 2D analysis, running more replicates makes this problem worse:

With each added gel, the chance of getting a missing value increases.

Typical levels of missing values introduced with increasing numbers of replicates per experiment using the traditional analysis approach.

No. of gels    No. of spots detected    No. matched in all gels    % missing values
2              1000                     900                        10
5              1000                     750                        25
10             1000                     600                        40
20             1000                     400                        60
100            1000                     100                        90

 

Spot Detection, Mask of Disinterest, Filtering and Editing

Differential expression analysis is performed rapidly and objectively by the advanced spot detection algorithm.

When does Spot Detection occur?

After you complete the Alignment stage, a dialog will appear that allows you to detect the spots on the images. After detection you are led to the Filtering stage.

What does Automatic Spot Detection do?

It takes a number of gel images and, using advanced imaging techniques, creates a series of spot shapes. These form a single spot pattern (the spot map) that is used for all the images in the experiment. See the section on Spot Matching to understand the background to this method.

Does SameSpots take the biggest outline and apply it to the same spots across a gel image series?

The easy answer is “yes, unless there’s another spot competing for the same material” – e.g. on one gel, there may be a large spot next to a small one, so the best outline would be something like this:

But on another gel, the same spots may be about the same size so the best outlines would be something like this:

So we can’t just use the biggest outline, because that would eat away too much of the big spot. Instead we need to reach a compromise, and we end up with this kind of thing:

What happens to spots if I change Alignment?

If you decide that alignment needs to change then all spots are deleted and recalculated after you have finished editing the alignment.

Which images are used to calculate the spots?

By default all of the images in the experiment are used as a basis for the spot detection. However before you detect the spots you can optionally deselect some images if you do not wish to use them to contribute.

Tip: if your experiment contains images from different capture techniques such as protein and phospho-stain it may be appropriate to use only the images from one capture process to generate the spot pattern.

What is the Mask of Disinterest?

This stage allows you to select areas of the reference gel so that no spots are detected in these areas (these are the areas of disinterest). It is useful to limit the number of spot artefacts before any editing/filtering is required.

What is Filtering?

The Filtering stage follows the automatic analysis of the aligned images that is performed after alignment (spot detection, matching, background subtraction and normalisation). On completion of analysis the Filtering page will open displaying the spot detection. If required you can remove spots based on position, area, normalised volume and combinations of these spot properties.

What happens to normalisation after Filtering?

Normalised Volumes are recalculated after any filtering is applied so any results noted in Review Spots (p-value, Fold, etc.) may change slightly.

What is background subtraction?

Background subtraction is necessary to accurately quantify the protein material in a spot. It corrects for effects such as the intensity level of the scanner bed and staining variations across the gel.

How is the background level calculated?

In SameSpots, the background level for a spot is equal to the lowest intensity value of the image pixels outside the spot’s outline. The background level is then subtracted from the intensity value of every pixel inside the spot outline to determine the volume of the protein material in the spot.

How is the Spot Volume calculated?

The Volume is the integrated intensity of the spot with any calculated background subtracted. (This value is used in the calculation of Normalised volume). The Raw spot volume is the sum of the Volume + Background.
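The relationship between Raw volume, Background and Volume described above can be sketched as follows. This is an illustration under stated assumptions rather than the SameSpots implementation: the function name and pixel values are hypothetical, and which pixels outside the outline are examined is left to the caller.

```python
def spot_volume(inside_pixels, outside_pixels):
    """Return (volume, background, raw_volume) for one spot.

    inside_pixels:  intensities of the pixels inside the spot outline
    outside_pixels: intensities of the pixels outside the outline that are
                    considered for the background level (assumed to be supplied)
    """
    background_level = min(outside_pixels)  # lowest intensity outside the outline
    raw_volume = sum(inside_pixels)         # integrated intensity, no subtraction
    background = background_level * len(inside_pixels)
    volume = sum(p - background_level for p in inside_pixels)
    return volume, background, raw_volume

# Hypothetical pixel intensities
inside = [120, 150, 180, 140]
outside = [100, 105, 110, 102]

volume, background, raw_volume = spot_volume(inside, outside)
print(volume, background, raw_volume)
```

Here the background level is 100, so each of the four pixels inside the outline has 100 subtracted, and Volume plus Background equals the Raw volume.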

How are spot volumes calculated for DIGE experiments?

DIGE image results are all measured as a ratio against the Cy2 image (internal standard). All results are log transformed to stabilise the variance (see Spot Detection section).

How is the Fold difference calculated?

A fold difference is the ratio of normalised volumes of a single spot between gels. This can also be calculated on average normalised volumes of a single spot between groups of gels. The Maximum fold difference refers to the fold difference in expression between 2 of the groups in the experiment. This is the fold difference of the groups with the highest and lowest average normalised volumes.
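As a short worked example of the maximum fold difference, the sketch below divides the highest group mean by the lowest. The group names and normalised volumes are hypothetical, and reporting the ratio as highest over lowest (so that it is never below 1) is an assumption.

```python
import statistics

def max_fold_difference(groups):
    """Ratio of the highest to the lowest average normalised volume across groups."""
    means = {name: statistics.mean(values) for name, values in groups.items()}
    highest = max(means, key=means.get)
    lowest = min(means, key=means.get)
    return means[highest] / means[lowest], highest, lowest

# Hypothetical normalised volumes for one spot in three groups of gels
groups = {
    "Control": [100.0, 110.0, 90.0],
    "Drug A": [200.0, 210.0, 190.0],
    "Drug B": [150.0, 160.0, 140.0],
}

fold, highest, lowest = max_fold_difference(groups)
print(f"Max fold difference: {fold:.1f} ({highest} vs {lowest})")
```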

What do Expression profiles plot?

The normal expression profile view shows the log normalised spot profile (i.e. expression) values. Standardised profiles take these log normalised values and transform them (centre and scale) to have mean = 0 and standard deviation = 1. This allows you to compare the change in the profile for spots over a large dynamic range. An important point is that centring and scaling do not change the correlation between the profiles. For DIGE data, the difference between the two views is quite minimal because DIGE normalised values are log ratios and so are naturally scattered about the zero axis. The standardised expression view is more appropriate when looking at single stain data.
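The centring and scaling step can be illustrated with a minimal sketch. The profile values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics

def standardise(profile):
    """Centre and scale a profile to mean 0 and standard deviation 1."""
    mean = statistics.mean(profile)
    sd = statistics.stdev(profile)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in profile]

# Two hypothetical profiles with the same shape but very different ranges
small_range = [1.0, 2.0, 3.0]
large_range = [10.0, 20.0, 30.0]

std_small = standardise(small_range)
std_large = standardise(large_range)
print(std_small, std_large)
```

After standardisation both profiles lie on the same scale, which is what makes profiles comparable across a large dynamic range.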

How does SameSpots handle streaks?

Areas with tails or that are badly resolved are hard to predict as it depends on exactly how they combine. In such cases you can usually select images in the 3D view that show why ‘clear spots’ in one gel are not clear when you consider the gel series as a whole.
Inconsistency between the gels (including streaks), usually manifested as a whole group of spots being detected as one big unit, is something that signals a technical issue with the experiment. The software simply splits things into the smallest consistent spot it can. If a whole area is inconsistent you will get one big spot. The inconsistency can arise from many sources, all of which must be considered by the analyst – so again it is something we must signal. A spot in a tail that is not present in all gels is extremely dubious to quantitate.

Normalisation

Why normalise?

Normalisation is required in proteomics experiments to calibrate data between different sample runs. This corrects for systematic experimental variation when running samples (for example, differences in sample loading). The effect of such systematic errors can be corrected by a unique gain factor for each sample – a scalar multiple that is applied to each feature abundance measurement.

What calculations use the Normalised Volume?

All Fold changes use Normalised Volumes. All statistics are based on the log of the Normalised Volume.

Why do we log the Normalised Volume?

Imagine you have two spots, one big and one small. These spots have been detected across several gels. Let’s assume that the spot data has been normalised. Now look at the range of values for the big spot and the small spot. You will see that the range for the small spot is less than the range for the big spot. In other words, the variance of the values for the small spot is less than the variance for the big spot. This is typical of proteomics data. As the mean expression level of spots gets higher, the variance of the spot data also increases. Now, most statistical tests require that the variance should be the same. So, we need to convert (transform) the data so that the new spot data has equal variance. This is called variance stabilisation. To stabilise the variance we use the log transform.
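A small numerical illustration of variance stabilisation follows. The values are hypothetical and chosen so that both spots vary by the same proportion (plus or minus 10%); base-10 logs are used here as an assumption, since any base has the same effect.

```python
import math
import statistics

# Hypothetical normalised volumes across three gels
small_spot = [90.0, 100.0, 110.0]
big_spot = [9000.0, 10000.0, 11000.0]

# Before the transform the big spot has a far larger variance
var_small = statistics.variance(small_spot)
var_big = statistics.variance(big_spot)

# After the log transform the two variances are the same
log_var_small = statistics.variance([math.log10(v) for v in small_spot])
log_var_big = statistics.variance([math.log10(v) for v in big_spot])

print(var_small, var_big)
print(log_var_small, log_var_big)
```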

If we edit spots does it change the Normalised Volume of other spots?

If you are in the Filtering mode then YES, the normalised volumes are recalculated after filtering. If you are reviewing spots in Review Spots then NO, the normalised volumes are not recalculated. This decision was made so that other p-values (for instance) would not change with minor edits of different spots.

How do we calculate the Normalised Volume?

For any given spot, the Normalised Volume is calculated from the Volume, which is the measured spot volume (the sum of all the pixel intensities within the spot boundary, often referred to as the ‘Raw’ volume) minus the Background (based on the lowest pixel intensity value outside the spot’s boundary). The key underlying assumption is that most spots do not change in abundance (the abundance distributions do not alter globally), and hence recalibration to globally adjust all runs to be on the ‘same scale’ is appropriate. The scalar factor required can be represented as $\alpha_k$ for each sample: $y'_i = \alpha_k y_i$, where $y_i$ is the measured volume of spot i on sample k, $\alpha_k$ is the scalar factor for sample k, and $y'_i$ is the normalised volume of spot i on sample k.

However, there are several means by which this scalar can be estimated, even within the parameters of the key underlying assumption (that overall, the distributions do not change).

A commonly employed method to determine this scalar in the past was adjustment based on the total signal of all species in the sample (to total spot volume); for example, expressing abundances as a proportion of the total. However, such a method is at risk of perturbation by changes in abundant species that dominate the total, and so can be inaccurate and introduce further variability.

SameSpots instead uses ratiometric data in log space, along with a median and mean absolute deviation outlier filtering approach, to calculate the scalar factor. This is a more robust approach, which is less influenced by noise in the data and any biases owing to abundant species, as the absolute values of abundance are disregarded.

Method

Fix the data of one sample and calibrate all other sample data to this reference using the ratio of abundances. The Ratio is calculated in log space to give equal weighting to observed expression differences. The base assumption is that enough spots should NOT be changing in abundance. You need to consider this when designing and analysing your experiments.
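The method can be sketched as below. This is a simplified illustration, not the SameSpots implementation: the function name `gain_factor`, the exact outlier rule (dropping log ratios more than `k` mean absolute deviations from the median, with `k = 3`) and the example volumes are all assumptions.

```python
import math
import statistics

def gain_factor(reference, sample, k=3.0):
    """Estimate the scalar that puts `sample` on the same scale as `reference`.

    Both arguments are spot volumes in the same spot order.
    """
    # Ratio of abundances for each spot, calculated in log space
    log_ratios = [math.log(r / s) for r, s in zip(reference, sample) if r > 0 and s > 0]
    centre = statistics.median(log_ratios)
    # Mean absolute deviation about the median (assumed definition)
    spread = statistics.mean([abs(x - centre) for x in log_ratios])
    # Drop spots whose ratio is far from the bulk, i.e. spots that appear to change
    kept = [x for x in log_ratios if abs(x - centre) <= k * spread]
    return math.exp(statistics.mean(kept))

# Hypothetical volumes: the sample was loaded at roughly half the reference amount,
# and the fourth spot genuinely differs between the two
reference = [200.0, 400.0, 100.0, 1000.0, 600.0]
sample = [100.0, 200.0, 50.0, 50.0, 300.0]

alpha = gain_factor(reference, sample)
normalised = [alpha * v for v in sample]
print(alpha, normalised)
```

Because the fourth spot is filtered out as an outlier, it does not pull the scalar away from the value suggested by the spots that are not changing.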

What are Data Normalisation benefits?

• Corrects for factors that result in experimental variation when running gel samples. Such factors can range from sample quantity to measurement sensitivity.
• Means you can compare quantification between different samples.
• Calculates abundance ratios in log space, which gives a more robust assessment of the gain factor, so large spots and outliers do not have a disproportionate effect.
• Occurs automatically as part of the data analysis processing.

Can I turn Normalisation off?

Yes, in the normalisation page which is available from the Filtering Mode. Please understand that this will change your results considerably and you must understand the consequences.

Statistical Results and Experiment Design

What is Experiment Design Setup?

Many statistical tests compare 2 or more groupings of data (e.g. Control and Treated) to look at the differences between the groupings. Experiment Design is used to set up these groupings.

Can I compare different groupings in the same experiment?

Yes.

Just create more than one “Experiment Design” and then when looking at the results on other pages you can choose which grouping to use. As soon as the experiment design is changed, all measurements and statistics are recalculated. The notes and tags will remain unchanged, so you can easily see how individual spots are behaving in different experiment designs.

What is a Between-subject design?

Do samples from a given subject appear in only one condition (e.g. control versus various drug treatments)? Then use the between-subject design. The ANOVA calculation assumes that the conditions are independent and tests the null hypothesis that the means of the conditions are equal.

What is a Within-subject design?

Have you taken samples from a given subject under different conditions? (i.e. the same subject has been sampled over a period of time or after one or more treatments). Then use the within-subject design. Here a standard ANOVA is not appropriate as the data violates the ANOVA assumption of independence. Therefore, by using a repeated measures ANOVA, individual differences can be eliminated or reduced as a source of between condition differences. This within-subject design can be thought of as an extension of the paired samples t-test, including comparison between more than two repeated measures.

What is a Two-way ANOVA Design?

Does each of your samples represent two independent variables (e.g. either a wild-type or mutant strain condition and a control or treated condition within the same analysis)? Then use the two-way ANOVA design. Each image must belong to two of the conditions. The interaction term in a two-way ANOVA informs you whether the effect of one of your conditions is the same for all values of your other condition (and vice versa).

Why does using SameSpots improve the results of using multivariate statistics?

SameSpots gives two significant advantages:

1. No missing values

Most statistical methods perform poorly with incomplete data. Most 2D platforms get round this problem by guessing values for missing data. As SameSpots measures a value for every protein on every gel there are no missing values.

2. Consistent spot outlines

Inconsistently detected spots result in more variable measurements. This reduces the accuracy of the measurements. SameSpots uses consistent outlines and so does not suffer from this problem.

What is Principal Components Analysis (PCA)?

Principal components analysis calculates a linear projection of the data such that the first axis (Principal component 1) will show the largest variance that could be represented by the transform (i.e. a linear combination of translation, rotation, scaling). The second principal component is the best direction orthogonal to the first axis that accounts for the next largest chunk of the variance in the data. This is an ‘unsupervised’ technique (i.e. it does not use any knowledge of the grouping of the data) and as such is useful in finding if your data has the groupings you expect or if there are outliers in your data.

How do you use the PCA plot?

Principal components analysis (PCA) converts data into a form which allows us to visualize it in fewer dimensions. So in a PCA plot gels which have similar expression patterns will be close together whilst gels that have different expression patterns will be far apart. One use of PCA is to confirm that the gels group according to their expected experimental conditions. This can be used as a quality control to identify outlier gels. It is also possible to plot protein spot data in the PCA graph and this may allow us to identify spots which are contributing to observed differences between gel groups.

What is Correlation Analysis?

Correlation analysis is a technique which allows us to explore similarities between expression profiles of protein spots across multiple gels. Spots which vary in a similar way will have a high correlation value (with 1 indicating an exact match); spots which show opposing behaviour will have a large negative correlation value (with -1 indicating a perfect mismatch); spots which are not related will have a correlation value close to zero. Correlation analysis results can be plotted as a dendrogram which shows hierarchical grouping of spots based on their correlation values.
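The correlation values described above can be reproduced with a small sketch using the Pearson correlation coefficient. That SameSpots uses the Pearson coefficient specifically is an assumption, and the spot profiles are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical expression profiles across four gels
spot_a = [1.0, 2.0, 3.0, 4.0]
spot_b = [2.0, 4.0, 6.0, 8.0]  # varies in the same way as spot_a
spot_c = [4.0, 3.0, 2.0, 1.0]  # shows opposing behaviour to spot_a

r_same = pearson(spot_a, spot_b)
r_opposite = pearson(spot_a, spot_c)
print(r_same, r_opposite)
```

For hierarchical grouping, a correlation is commonly converted to a distance such as 1 minus the correlation, so that similar profiles are close together; whether the dendrogram in the software is built this way is not stated here.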

What are q-values, and why are they important?

Q-values are the name given to adjusted p-values found using an optimised FDR approach. They are used to account for the multiple testing problem (see below).

What are p-values?

The object of differential 2D expression analysis is to find those spots which show an expression difference between groups, thereby signifying that they may be involved in some biological process of interest to the researcher. Due to chance, there will always be some difference in expression between groups. However, it is the size of this difference in comparison to the variance (i.e. the range over which expression values fall) that will tell us if this expression difference is significant or not. Thus, if the difference is large but the variance is also large, then the difference may not be significant. On the other hand, a small difference coupled with a very small variance could be significant. We use the one-way ANOVA test (equivalent to a t-test for two groups) to formalise this calculation. The test returns a p-value that takes into account the mean difference, the variance and the sample size. The p-value is a measure of how likely you are to get this spot data if no real difference existed. Therefore, a small p-value indicates that there is a small chance of getting this data if no real difference existed, and so you decide that the difference in group expression data is significant. By small we usually mean less than 0.05.
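The trade-off between the size of a difference and the variance can be seen by computing the one-way ANOVA F statistic by hand. This sketch uses the textbook formula and hypothetical log normalised volumes; it stops at the F statistic because converting it to a p-value requires the F distribution, which is not in the Python standard library (a statistics package such as SciPy provides it).

```python
import statistics

def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of measurements
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Same mean difference (0.5) between groups, but very different variance
tight = [[5.0, 5.1, 4.9], [5.5, 5.6, 5.4]]
noisy = [[4.0, 5.0, 6.0], [4.5, 5.5, 6.5]]

f_tight, df_between, df_within = one_way_anova_f(tight)
f_noisy, _, _ = one_way_anova_f(noisy)
print(f_tight, f_noisy, df_between, df_within)
```

The larger F statistic for the low-variance data corresponds to a smaller p-value, even though the difference between the group means is identical in both cases.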

What do the p-values of a two-way ANOVA analysis mean?

The row and column ANOVA p-values are like the one-way ANOVA p-values for the row and column conditions. The interaction p-value tests whether the row and column conditions interact; a small value indicates an interaction. For example, the table below represents the two-way ANOVA analysis of a control or treated condition (columns) of a wild-type or mutant strain (rows). The analysis shows that there are statistically significant differences both between control and treated and between wild-type and mutant strain for this spot. The interaction p-value is not significant, suggesting the control/treatment condition does not interact with the mutant/wild-type condition.

What is Statistical Power, and why is it important?

The ANOVA p-value tells you how likely it is that a difference like the one you are seeing would arise by chance alone if there were no real difference. But this is really only half the story. Power gives you the probability that you’d be able to see the difference if there was one. You need both values to complete the picture.

Another good thing about Power is that it depends on the number of replicates. So we can use it to answer questions like “Have you done enough replicates to be able to see a difference were it to exist?” And, by calculating the Power for different numbers of replicates, “How much extra Power would you get if you’d run 5 replicates instead of 4?”

What is the multiple testing problem?

When we set a p-value threshold of, for example, 0.05, we are accepting a 5% chance of a false positive for any test where there is no real difference. A false positive is a result that is statistically significant even though, in reality, there is no difference in the group means. While 5% is acceptable for one test, if we do lots of tests on the data, then this 5% can result in a large number of false positives. For example, if there are 200 spots on a gel, none of which truly differ between groups, and we apply an ANOVA or t-test to each, then we would expect to get 10 false positives by chance alone. This is known as the multiple testing problem.
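
The arithmetic behind the 200-spot example can be checked with a few lines of Python. The sketch assumes the tests are independent and that no spot has a real difference; the variable names are our own.

```python
n_spots = 200        # number of tests (one per spot)
threshold = 0.05     # p-value threshold

# Expected number of false positives if no spot truly differs
expected_false_positives = n_spots * threshold

# Chance of getting at least one false positive across all tests
chance_of_any = 1 - (1 - threshold) ** n_spots

print(f"{expected_false_positives:.0f} expected false positives")
print(f"{chance_of_any:.5f} chance of at least one")
```

Under these assumptions at least one false positive is almost certain, which is why a per-test threshold on its own is not enough.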

Multiple testing and the False Discovery Rate (FDR)

While there are a number of approaches to overcoming the problems due to multiple testing, they all attempt to assign an adjusted p-value to each test, or similarly, reduce the p-value threshold. Many traditional techniques such as the Bonferroni correction are too conservative in the sense that while they reduce the number of false positives, they also reduce the number of true discoveries. The False Discovery Rate approach is a more recent development. This approach also determines adjusted p-values for each test. However, it controls the proportion of false discoveries among those tests that result in a discovery (i.e. a significant result). Because of this, it is less conservative than the Bonferroni approach and has greater ability (i.e. power) to find truly significant results. Another way to look at the difference is that a p-value of 0.05 implies that 5% of all tests will result in false positives. An FDR adjusted p-value (or q-value) of 0.05 implies that 5% of significant tests will result in false positives. The latter is clearly a far smaller quantity.
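
The difference between the two styles of correction can be seen in a small Python sketch. It compares the Bonferroni correction with the Benjamini-Hochberg procedure, which is one common FDR method; the text above does not state which FDR method SameSpots uses, so treat this as an illustration only. The p-values are invented.

```python
def bonferroni(p_values):
    """Multiply every p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """FDR adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay ordered
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.001, 0.012, 0.02, 0.03, 0.6]   # hypothetical p-values for five spots
n_bonferroni = sum(p < 0.05 for p in bonferroni(p_values))
n_fdr = sum(p < 0.05 for p in benjamini_hochberg(p_values))
print(n_bonferroni, n_fdr)   # 1 spot survives Bonferroni, 4 survive the FDR adjustment
```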

How are Q-values calculated?

The FDR approach is optimised by using characteristics of the p-value distribution to produce more accurate adjusted p-values. What follows ties these ideas together and should help clarify the relationship between p and q values. It is usual to test many hundreds or thousands of spot variables in a proteomics experiment. Each of these tests will produce a p-value. The p-values lie between 0 and 1 and we can create a histogram to get an idea of how they are distributed between 0 and 1. Some typical p-value distributions are shown below. On the x-axis we have histogram bars representing p-values. Each has a width of 0.05, so the first bar (red or green) contains those p-values that are between 0 and 0.05, and so on up to the last bar, which contains those p-values between 0.95 and 1.0. The height of each bar gives an indication of how many values are in the bar. This is called a density distribution because the area of all the bars always adds up to 1. Although the two distributions appear quite different, you will notice that they flatten off towards the right of the histogram. The red (or green) bar represents the significant values, if you set a p-value threshold of 0.05.
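
A density histogram of this kind can be built in a few lines of Python. This is a minimal sketch with a function name of our own; the input is an artificial, perfectly even set of p-values, which is roughly what an experiment with no real changes would look like.

```python
def p_value_density(p_values, bin_width=0.05):
    """Return the height of each histogram bar so that the total area is 1."""
    n_bins = round(1 / bin_width)
    counts = [0] * n_bins
    for p in p_values:
        # A p-value of exactly 1.0 goes into the last bar
        index = min(int(p / bin_width), n_bins - 1)
        counts[index] += 1
    total = len(p_values)
    return [count / (total * bin_width) for count in counts]

# Artificial example: 100 p-values spread evenly between 0 and 1
flat_p_values = [(i + 0.5) / 100 for i in range(100)]
heights = p_value_density(flat_p_values)
area = sum(h * 0.05 for h in heights)
print(len(heights), round(area, 6))   # 20 bars, total area 1.0
```

Every bar in this flat example has a height of about 1, including the first bar; those first-bar values are the false positives expected by chance.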

If there are no significant changes in the experiment, you will expect to see a distribution more like that on the left above while an experiment with significant changes will look more like that on the right. So, even if there are no significant changes in the experiment, we still expect, by chance, to get p-values below 0.05. These are false positives, and shown in red. Even in an experiment with significant changes (in green), we are still unsure if a p-value below 0.05 represents a true discovery or a false positive. Now, the q-value approach tries to find the height where the p-value distribution flattens out and incorporates this height value into the calculation of FDR adjusted p-values. We can see this in the histogram below. This approach helps to establish just how many of the significant values are actually false positives (the red portion of the green bar).
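
One way to turn the "flat height" idea into numbers is sketched below. It estimates the height from the p-values above 0.5 and scales the FDR adjustment by it, in the style of Storey's q-value method. Both the cut-off of 0.5 and the assumption that this matches the SameSpots calculation are ours; the p-values are artificial.

```python
def estimate_flat_height(p_values, cutoff=0.5):
    """Estimate the density of the flat right-hand part of the histogram."""
    n_right = sum(1 for p in p_values if p > cutoff)
    return min(1.0, n_right / (len(p_values) * (1 - cutoff)))

def q_values(p_values, cutoff=0.5):
    """FDR adjusted p-values scaled by the flat height, in the original order."""
    m = len(p_values)
    height = estimate_flat_height(p_values, cutoff)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    result = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, height * m * p_values[i] / rank)
        result[i] = running_min
    return result

# Artificial experiment: 20 changing spots plus 80 spots with no change
changing = [0.0005 * (i + 1) for i in range(20)]
unchanged = [(i + 0.5) / 80 for i in range(80)]
p_list = changing + unchanged

flat_height = estimate_flat_height(p_list)
q_list = q_values(p_list)
print(round(flat_height, 2))   # 0.8, i.e. about 80% of spots look unchanged
```

Because each q-value is the running minimum over larger p-values, sorting by p-value also sorts the q-values, and neighbouring spots can share the same q-value.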

Now, the q-values are simply a set of values that will lie between 0 and 1. Also, if you order the p-values used to calculate the q-values, then the q-values will also be ordered. This can be seen in the following screen shot from SameSpots. Notice that q-values can be repeated.

To interpret the q-values, you need to look at the ordered list of q-values. There are 839 spots in this experiment. If we take spot 52 as an example, we see that it has a p-value of 0.01 and a q-value of 0.0141. Recall that a p-value of 0.01 implies a 1% chance of false positives, and so with 839 spots, we expect 8 or 9 false positives on average, i.e. 839*0.01 = 8.39. In this experiment, there are 52 spots with a p-value of 0.01 or less, and so 8 or 9 of these will be false positives. On the other hand, the q-value is a little greater at 0.0141, which means we should expect 1.41% of all the spots with a q-value of this or less to be false positives. This is a much better situation. We know that 52 spots have a q-value of 0.0141 or less and so we should expect 52*0.0141 = 0.7332 false positives, i.e. less than one false positive. Just to reiterate, the expected number of false positives according to p-values takes all 839 tests into account, while for q-values it takes into account only those tests with q-values at or below the threshold we choose. Of course, it is not always the case that q-values will result in fewer false positives, but what we can say is that they give a far more accurate indication of the level of false positives for a given cut-off value.
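
The two calculations in this example can be reproduced directly. The numbers are those quoted above; the variable names are our own.

```python
total_spots = 839        # spots tested in the experiment
spots_passing = 52       # spots at or below the chosen cut-off
p_cutoff = 0.01          # p-value of spot 52
q_cutoff = 0.0141        # q-value of spot 52

# p-values: the false positive rate applies to every test carried out
expected_by_p = total_spots * p_cutoff

# q-values: the rate applies only to the spots called significant
expected_by_q = spots_passing * q_cutoff

print(f"by p-value: {expected_by_p:.2f} of {spots_passing} spots")   # 8.39
print(f"by q-value: {expected_by_q:.4f} of {spots_passing} spots")   # 0.7332
```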

When doing lots of tests, as in a proteomics experiment, it is more intuitive to interpret p and q values by looking at the entire list of values in this way rather than looking at each one independently. In this way, a threshold of 0.05 has meaning across the entire experiment.
When deciding on a cut-off or threshold value, you should do this from the point of view of how many false positives it will result in, rather than just picking a p- or q-value of 0.05 by default and saying that everything with a value less than this is significant.

 

Spot Picking

Can you create spot picking files?

Yes.

Spot Picking occurs near the end of the workflow. It allows you to set up picking either from one of the gels in the experiment or from an additional gel run specifically for picking. In the case of DIGE the latter is the norm: a gel external to the experiment is used as the picking gel.

Can I choose which spots to include?

Yes.

Using tags from the experiment you can choose which spots to pick from the spot list.

How does picking work with an image not in the experiment?

Picking involves the alignment of a picking image with the original gel images in order to export the correct picking coordinates to the robot of choice. In this case you align the new image with the rest of the experiment and this is then accounted for when creating the pick list.

Can I change the spot centre for picking?

The pick point (red crosshairs) can be repositioned by clicking and dragging within the spot.

Which Picking Robots does SameSpots support?

We produce picking lists for:
• GelPix
• ProPic
• Ettan
• Proteineer
• Generic Picking lists

Importing and Exporting Results

What is the Clip Gallery?

At every stage of the SameSpots workflow the images and data tables can be added to the Clip Gallery. The clip gallery makes it easier to capture images and tables from the software for the production of posters, publications and presentations. Images are saved as high resolution .png files (300 dpi). The saved images are retained as part of the experiment and are stored accordingly. This facility allows you to capture (high resolution) images that can be used in the development of specific reports and/or used as part of the process of publishing your experimental findings.

Can you export the Spot Measurements?

Yes.

You can export to a CSV file (Comma Separated Values) that can be opened in Excel or other spreadsheet applications.
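
Because the export is plain CSV, it can also be read by a script. The sketch below uses Python's standard csv module on a small inline sample. The column headers and the second row are hypothetical and chosen only for illustration; check the header row of your own exported file and adjust the names to match.

```python
import csv
import io

# Hypothetical sample of an exported file; real column headers may differ
sample_export = """Spot,Anova (p),q Value
52,0.01,0.0141
107,0.20,0.25
"""

def spots_below(csv_text, column, cutoff):
    """Return the spot numbers whose value in `column` is below `cutoff`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Spot"] for row in reader if float(row[column]) < cutoff]

significant = spots_below(sample_export, "q Value", 0.05)
print(significant)   # ['52']
```

To read a real file, the same reader can be given an open file handle instead of the inline sample text.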

Can you import external information?

Protein Names and Mascot information

Following spot picking, if you have determined the identity of the picked spots (using mass spectrometry, for example), you can import this information into the software and manage it there. If you have performed Mascot searches to identify the protein(s) present in each spot and saved the results for each spot as XML output files from Mascot, then you can import these files for the picked spots.

Tags from external test

You can import spot numbers as a tag, for example those generated as the result of some external test performed on exported data. If you have created tags for the spots then these will appear in the Tags column, allowing you to sort your data on the basis of any tagging you have performed during the SameSpots workflow.

Is there an Experiment Report?

At the end of an experiment you can create an HTML report of the results. You can choose what to display, including which spots to report on. There are 5 main sections to the report, each with multiple subsections:

• Reference image
• Experiment design
• Spot table
• Spot details
• Experiment description

Disclaimer
All material in this brochure has been written by collating information from various sources. Where possible these sources have been cited. It is a guide and not a protocol or standard operating procedure. It may not give optimal results for individual samples and systems. You should check parameters specific to your own sample, instruments and image capture software. Common best practice is to run pilot experiments to optimise sample handling, gel running, image capture and image analysis. While TotalLab has made reasonable attempts to ensure that the information contained in this brochure has been obtained from reliable sources, TotalLab is not responsible for any errors or omissions, or for the results obtained from the use of this information. All information in this brochure is provided “as is”, with no guarantee of completeness, accuracy, timeliness or of the results obtained from the use of this information, and without warranty of any kind, express or implied, including, but not limited to, warranties of performance, merchantability and fitness for a particular purpose. Nothing herein shall to any extent substitute for the independent investigations and the sound technical and business judgment of the reader. In no event will TotalLab, or its directors, partners, employees or agents, be liable to you or anyone else for any decision made or action taken in reliance on the information in this Site or for any consequential, special or similar damages, even if advised of the possibility of such damages. SpotMap and SameSpots are trademarks of TotalLab Limited. All other products mentioned are trademarks or registered trademarks of their respective companies.