FitCycles {Rolexa}    R Documentation

Fit and Plot Intensities

Description

Model-based classification of intensity data points, used either to perform base calling or to generate diagnostic plots.

Usage

InitializeModel(data)
InitializeFromSequence(seqInit, cycle = 1)
InitializeFromIntensities(int, cycle = 1)
SeqFitScore(int, colonies = 1:100)
SeqEvalScore(int, seqInit, colonies = 1:100)
PlotFitCycles(int, cycles = c(1,11,21,31), par = list())
PlotEvalCycles(int, seqInit, cycles = c(1,11,21,31), par = list())

Arguments

int an intensity matrix with columns 1:4 containing cluster coordinates and columns c*4+1:4 containing the intensities at cycle c
seqInit a sequence matrix from a previous base calling with cluster coordinates in columns 1:4 and sequences in column 5
data a single cycle, 4-column intensity matrix
cycle,cycles which cycle(s) to select
colonies which colonies to select
par parameters for the plotting functions
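
The column layout described for int can be illustrated on a small synthetic matrix. This is only a sketch: the helper cycle_columns and the toy data are hypothetical and are not part of Rolexa.

```r
# Hypothetical helper (not part of Rolexa): column indices of the four
# intensity channels for a given cycle, following the layout described above.
cycle_columns <- function(cycle) cycle * 4 + 1:4

# Toy intensity matrix: 3 colonies, 4 coordinate columns, 2 cycles.
n_cycles <- 2
toy_int <- matrix(seq_len(3 * (4 + 4 * n_cycles)), nrow = 3)

coords <- toy_int[, 1:4]               # cluster coordinates
cycle1 <- toy_int[, cycle_columns(1)]  # intensities at cycle 1 (columns 5:8)
cycle2 <- toy_int[, cycle_columns(2)]  # intensities at cycle 2 (columns 9:12)
```

Note that in R the expression c*4+1:4 is evaluated as c*4 + (1:4), because the colon operator binds more tightly than addition.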

Details

These functions use the EEV model of mclust to fit the data clouds with a mixture of 4 Gaussian distributions. SeqFitScore and SeqEvalScore generate a list of tags and entropy scores for each sequenced colony. PlotFitCycles and PlotEvalCycles plot two 2-dimensional projections for each selected cycle, with the Gaussian parameters represented by standard ellipses and the data points colored according to the induced classification.

The fitting procedure uses Rolexa.env$HThresholds to decide whether a base is unambiguous or whether degenerate IUPAC codes will be used.
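
To make the idea of entropy thresholds concrete, the sketch below turns a probability vector over the four bases into a single IUPAC code. It is not the Rolexa implementation: the function call_base, the decision rule (keep one more base for each threshold the entropy exceeds), the use of base-2 logarithms and the threshold values are all assumptions, and the actual structure of Rolexa.env$HThresholds may differ.

```r
# Standard IUPAC nucleotide codes, keyed by the sorted set of bases.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           AC = "M", AG = "R", AT = "W", CG = "S", CT = "Y", GT = "K",
           ACG = "V", ACT = "H", AGT = "D", CGT = "B", ACGT = "N")

# Hypothetical rule (an assumption, not taken from Rolexa): keep one more
# base for every threshold that the entropy of p exceeds.
call_base <- function(p, thresholds) {
  stopifnot(length(p) == 4, length(thresholds) == 3)
  names(p) <- c("A", "C", "G", "T")
  q <- p[p > 0]
  H <- -sum(q * log2(q))                  # entropy in bits (assumed unit)
  k <- 1 + sum(H > thresholds)            # number of bases to keep
  kept <- sort(names(sort(p, decreasing = TRUE))[1:k])
  unname(iupac[paste(kept, collapse = "")])
}

hyp_thresholds <- c(0.5, 1.5, 1.9)        # hypothetical values

confident <- call_base(c(0.97, 0.01, 0.01, 0.01), hyp_thresholds)
ambiguous <- call_base(c(0.48, 0.02, 0.48, 0.02), hyp_thresholds)
uniform   <- call_base(rep(0.25, 4), hyp_thresholds)
```

With these assumed thresholds, a sharply peaked distribution yields a single base, a distribution split between A and G yields the two-base code R, and a uniform distribution yields N.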

InitializeModel makes an initial classification to start the EM optimization. It does so by constructing prototype variances, which are then centered and scaled using values obtained by fitting a mixture of three 1-dimensional Gaussians to the histogram of each intensity channel. An E-step is then performed to obtain a classification of the data suitable for starting the EM algorithm.
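
As an illustration of what a single E-step computes, the sketch below evaluates posterior class probabilities for a Gaussian mixture with known parameters. It is a simplified stand-in, not the Rolexa code: the function e_step is hypothetical, the covariances are assumed diagonal (the EEV model uses full covariance matrices), and the means, standard deviations and weights are toy values.

```r
# Hypothetical helper: one E-step for a K-component Gaussian mixture with
# diagonal covariances (a simplification of the EEV model).
e_step <- function(x, means, sds, weights) {
  # x: n x d data; means, sds: K x d; weights: length K, summing to 1
  log_post <- sapply(seq_along(weights), function(k) {
    z <- sweep(sweep(x, 2, means[k, ]), 2, sds[k, ], "/")
    log(weights[k]) + rowSums(dnorm(z, log = TRUE)) - sum(log(sds[k, ]))
  })                                               # n x K log joint densities
  log_post <- log_post - apply(log_post, 1, max)   # stabilise before exp()
  post <- exp(log_post)
  t(post / rowSums(post))                          # K x n, columns sum to 1
}

# Toy parameters: each component is bright in exactly one channel.
means   <- diag(10, 4)
sds     <- matrix(1, nrow = 4, ncol = 4)
weights <- rep(0.25, 4)

x <- rbind(c(10, 0, 0, 0),    # clearly component 1
           c(0, 0, 10, 0),    # clearly component 3
           c(5, 5, 0, 0))     # halfway between components 1 and 2
post <- e_step(x, means, sds, weights)
```

The result has one column per data point and one row per component, matching the 4-row orientation of the probability matrix described under Value.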

InitializeFromSequence returns a classification based on the specified base calling.
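
A classification derived from an existing base call can be pictured as an indicator matrix. The sketch below is hypothetical and not the Rolexa implementation: the function classify_from_sequence, the row order A, C, G, T and the treatment of ambiguous codes (left as all-zero columns) are assumptions.

```r
# Hypothetical helper: 4 x n indicator matrix built from called sequences.
# Assumes the row order A, C, G, T; any other character (e.g. "N") gives
# an all-zero column.
classify_from_sequence <- function(seqs, cycle = 1,
                                   bases = c("A", "C", "G", "T")) {
  called <- substr(seqs, cycle, cycle)      # base called at this cycle
  prob <- outer(bases, called, "==") * 1    # logical -> 0/1
  dimnames(prob) <- list(bases, NULL)
  prob
}

toy_seqs <- c("ACGT", "GGTA", "TNAC")       # toy sequences, one per colony
cls1 <- classify_from_sequence(toy_seqs, cycle = 1)
cls2 <- classify_from_sequence(toy_seqs, cycle = 2)
```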

InitializeFromIntensities returns a classification based on the maximum intensity for each colony and each specified cycle.
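
The maximum-intensity rule can be sketched in a few lines of base R for a single cycle. The function classify_from_intensities below is a hypothetical illustration, not the Rolexa code; resolving ties in favour of the first channel is an assumption.

```r
# Hypothetical helper: assign each colony to the channel with the highest
# intensity at one cycle and return a 4 x n indicator matrix.
classify_from_intensities <- function(int, cycle = 1) {
  chan <- int[, cycle * 4 + 1:4, drop = FALSE]    # the 4 channels of this cycle
  best <- max.col(chan, ties.method = "first")    # brightest channel per colony
  prob <- matrix(0, nrow = 4, ncol = nrow(chan))
  prob[cbind(best, seq_len(nrow(chan)))] <- 1
  prob
}

# Toy data: 2 colonies, 4 coordinate columns, 1 cycle.
toy_int <- rbind(c(1, 1, 10, 20,  5, 90, 3,  1),
                 c(1, 1, 30, 40,  2,  4, 8, 70))
cls <- classify_from_intensities(toy_int, cycle = 1)
```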

Value

SeqFitScore and SeqEvalScore return a matrix with 4 coordinate columns, one sequence column and Rolexa.env$SequencingLength columns holding an entropy for each base called.
InitializeModel, InitializeFromSequence and InitializeFromIntensities return a 4*nrow(int) probability matrix.

Author(s)

Jacques Rougemont, Arnaud Amzallag, Christian Iseli, Laurent Farinelli, Ioannis Xenarios, Felix Naef

References

Probabilistic base calling of Solexa sequencing data, BMC Bioinformatics 2008, 9:431

Examples

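library(Rolexa.demo)
data(one_tile)     # example tile; the calls below use its int and seq objects
par(ask = TRUE)    # prompt before drawing each new plot
PlotFitCycles(int = int, cycles = 1)
PlotEvalCycles(int = int, seqInit = seq, cycles = 1)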

[Package Rolexa version 1.1.7 Index]