| FitCycles {Rolexa} | R Documentation |
Model-based classification of intensity data points, used either to perform base calling or to generate diagnostic plots
InitializeModel(data)
InitializeFromSequence(seqInit, cycle = 1)
InitializeFromIntensities(int, cycle = 1)
SeqFitScore(int, colonies = 1:100)
SeqEvalScore(int, seqInit, colonies = 1:100)
PlotFitCycles(int, cycles = c(1,11,21,31), par = list())
PlotEvalCycles(int, seqInit, cycles = c(1,11,21,31), par = list())
int |
an intensity matrix with columns 1:4 containing cluster
coordinates and columns c*4+1:4 containing the intensities at
cycle c |
seqInit |
a sequence matrix from a previous base calling with cluster
coordinates in columns 1:4 and sequences in column 5 |
data |
a single cycle, 4-column intensity matrix |
cycle, cycles |
which cycle(s) to select |
colonies |
which colonies to select |
par |
parameters for the plotting functions |
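The column layout of int described above can be illustrated with a small base R sketch. The toy matrix and the helper name cycle_intensities are hypothetical and not part of Rolexa; the only thing taken from this page is the indexing rule that cycle c occupies columns c*4+1:4.

```r
# Toy intensity matrix: 3 colonies, 4 coordinate columns, 2 cycles x 4 channels.
# All values are made up for illustration.
n_cycles <- 2
int_toy <- cbind(
  matrix(c(1, 1, 1,       # coordinate columns 1:4 (placeholder values)
           1, 1, 1,
           10, 20, 30,
           15, 25, 35), nrow = 3),
  matrix(seq_len(3 * 4 * n_cycles), nrow = 3)  # intensity columns 5:12
)

# Hypothetical helper: return the 4 channel intensities of one cycle.
# In R, `:` binds tighter than `*` and `+`, so this is cycle*4 + (1:4).
cycle_intensities <- function(int, cycle) {
  int[, cycle * 4 + 1:4, drop = FALSE]
}

cycle1 <- cycle_intensities(int_toy, 1)  # columns 5:8
cycle2 <- cycle_intensities(int_toy, 2)  # columns 9:12
```

Using drop = FALSE keeps the result a matrix even when a single colony is selected.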
These functions use the EEV model of mclust to fit the
data clouds with a mixture of 4 Gaussian distributions.
SeqFitScore and SeqEvalScore generate a list of tags and
entropy scores for each sequenced colony.
PlotFitCycles and PlotEvalCycles plot two 2-dimensional
projections for each selected
cycle, with Gaussian parameters represented by standard ellipses and data
points colored according to the induced classification.
The fitting procedure uses
Rolexa.env$HThresholds to decide whether a base is
unambiguous or whether degenerate IUPAC codes will be used.
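How an entropy threshold can lead to a degenerate IUPAC code is sketched below in base R. This is an illustration under stated assumptions, not the package's implementation: the functions base_entropy and call_base, the threshold values, the A, C, G, T ordering, and the rule "keep one more base for each threshold exceeded" are all hypothetical. The actual contents and semantics of Rolexa.env$HThresholds should be checked in the package itself.

```r
# Shannon entropy (in bits) of a probability vector over the four bases.
base_entropy <- function(p) {
  p <- p[p > 0]          # 0 * log(0) is treated as 0
  -sum(p * log2(p))
}

# IUPAC nucleotide codes, keyed by the alphabetically sorted set of bases.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           AC = "M", AG = "R", AT = "W", CG = "S", CT = "Y", GT = "K",
           ACG = "V", ACT = "H", AGT = "D", CGT = "B", ACGT = "N")

# Hypothetical calling rule: `thresholds` is an increasing vector of three
# entropy cut-offs (values invented here); each threshold exceeded adds one
# more base to the ambiguity set, taken in order of decreasing probability.
call_base <- function(p, thresholds = c(0.5, 1.3, 1.8)) {
  names(p) <- c("A", "C", "G", "T")   # assumed channel order
  k <- 1 + sum(base_entropy(p) > thresholds)
  keep <- sort(names(sort(p, decreasing = TRUE))[seq_len(k)])
  unname(iupac[paste(keep, collapse = "")])
}

confident <- call_base(c(0.97, 0.01, 0.01, 0.01))  # low entropy, single base
two_way   <- call_base(c(0.5, 0, 0.5, 0))          # 1 bit, two-base code
uniform   <- call_base(c(0.25, 0.25, 0.25, 0.25))  # 2 bits, fully ambiguous
```

With these invented thresholds, a sharply peaked posterior yields a single base, an even split between A and G yields the purine code R, and a uniform posterior yields N.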
InitializeModel makes an initial classification to start
EM optimization. It does so by
constructing prototype variances, which are then centered and scaled
using values obtained by fitting a mixture of three 1-dimensional
Gaussians to the histogram of each intensity channel.
Then an E-step is performed to get a
classification of the data suitable to start the EM
algorithm.
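The E-step mentioned above can be sketched in base R as follows. This is a simplified illustration, not the code used by InitializeModel: it assumes diagonal covariance matrices (the EEV model uses full covariances), and the function e_step, the prototype means, variances and mixing proportions are all made up for the example.

```r
# E-step for a Gaussian mixture with diagonal covariances.
#   x:     n x d data matrix
#   means: G x d matrix of component means
#   vars:  G x d matrix of per-dimension variances
#   props: length-G vector of mixing proportions
# Returns an n x G matrix of posterior membership probabilities.
e_step <- function(x, means, vars, props) {
  n <- nrow(x)
  log_dens <- matrix(vapply(seq_len(nrow(means)), function(g) {
    z2 <- sweep(sweep(x, 2, means[g, ], "-")^2, 2, vars[g, ], "/")
    log(props[g]) - 0.5 * (rowSums(z2) + sum(log(2 * pi * vars[g, ])))
  }, numeric(n)), nrow = n)
  # Subtract the row-wise maximum before exponentiating for numerical stability.
  post <- exp(log_dens - apply(log_dens, 1, max))
  post / rowSums(post)
}

# Invented prototypes: each component is bright in exactly one channel.
means <- diag(4) * 10
vars  <- matrix(1, nrow = 4, ncol = 4)
props <- rep(0.25, 4)

x <- rbind(c(9, 0.5, 0, 1),    # close to component 1
           c(0, 1, 11, 0))     # close to component 3
post <- e_step(x, means, vars, props)
hard <- max.col(post)          # hard classification from the posteriors
```

Each row of post sums to 1, and taking the row-wise maximum gives a hard classification that could seed an EM run.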
InitializeFromSequence returns a classification based on the
specified base calling.
InitializeFromIntensities returns a classification based on the
maximum intensity for each colony and each specified cycle.
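A classification by maximum intensity can be pictured with the base R sketch below. The helper classify_by_max is hypothetical and is not the body of InitializeFromIntensities; the A, C, G, T channel order, the tie-breaking rule, and the orientation of the result (one row per colony) are assumptions made for the example.

```r
# Hypothetical helper: hard 0/1 membership matrix for one cycle, assigning
# each colony to the channel with the highest intensity.
classify_by_max <- function(cycle_int) {
  stopifnot(ncol(cycle_int) == 4)
  n <- nrow(cycle_int)
  z <- matrix(0, nrow = n, ncol = 4,
              dimnames = list(NULL, c("A", "C", "G", "T")))  # assumed order
  # ties.method = "first" makes the result deterministic when channels tie.
  z[cbind(seq_len(n), max.col(cycle_int, ties.method = "first"))] <- 1
  z
}

toy <- rbind(c(5, 1, 0, 2),   # brightest in channel 1
             c(0, 0, 7, 1))   # brightest in channel 3
z <- classify_by_max(toy)
```

Each row of z contains a single 1, so it can be read as a degenerate probability matrix over the four bases.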
SeqFitScore and SeqEvalScore return a matrix with 4
coordinate columns, one sequence column and
Rolexa.env$SequencingLength columns
containing an entropy for each base called.
InitializeModel, InitializeFromSequence and InitializeFromIntensities return a 4*nrow(int) probability matrix.
Jacques Rougemont, Arnaud Amzallag, Christian Iseli, Laurent Farinelli, Ioannis Xenarios, Felix Naef
Probabilistic base calling of Solexa sequencing data, BMC Bioinformatics 2008, 9:431
library(Rolexa.demo)
data(one_tile)
par(ask=TRUE)
PlotFitCycles(int=int,cycles=1)
PlotEvalCycles(int=int,seqInit=seq,cycles=1)