Title: | Accessing Intra-Tumor Heterogeneity and Tracking Longitudinal and Spatial Clonal Evolutionary History by Next-Generation Sequencing |
---|---|
Description: | A statistical framework and computational procedure for identifying the sub-populations within a tumor, determining the mutation profiles of each subpopulation, and inferring the tumor's phylogenetic history. The inputs are variant allele frequencies (VAFs) of somatic single nucleotide alterations (SNAs) along with allele-specific coverage ratios between the tumor and matched normal sample for somatic copy number alterations (CNAs). These quantities can be directly taken from the output of existing software. Canopy provides a general mathematical framework for pooling data across samples and sites to infer the underlying parameters. For SNAs that fall within CNA regions, Canopy infers their temporal ordering and resolves their phase. When there are multiple evolutionary configurations consistent with the data, Canopy outputs all configurations along with their confidence assessment. |
Authors: | Yuchao Jiang, Nancy R. Zhang |
Maintainer: | Yuchao Jiang <[email protected]> |
License: | GPL-2 |
Version: | 1.3.0 |
Built: | 2025-02-14 05:27:02 UTC |
Source: | https://github.com/cran/Canopy |
To determine whether the sampled tree will be accepted by comparing the likelihood. Used in canopy.sample.
addsamptree(tree,tree.new)
tree |
input tree (current) |
tree.new |
input tree (newly sampled) |
returned tree (either retain the old tree or accept the new tree).
Yuchao Jiang [email protected]
data(MDA231)
data(MDA231_tree)
sna.name = MDA231$sna.name
Y = MDA231$Y
C = MDA231$C
R = MDA231$R
X = MDA231$X
WM = MDA231$WM
Wm = MDA231$Wm
epsilonM = MDA231$epsilonM
epsilonm = MDA231$epsilonm
# sampling location of SNAs
tree.new = MDA231_tree
tree.new$sna = sampsna(MDA231_tree)
tree.new$Z = getZ(tree.new, sna.name)
tree.new$Q = getQ(tree.new, Y, C)
tree.new$H = tree.new$Q
tree.new$VAF = getVAF(tree.new, Y)
tree.new$likelihood = getlikelihood(tree.new, R, X, WM, Wm, epsilonM, epsilonm)
tree = addsamptree(MDA231_tree,tree.new)
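The retain-or-accept decision described for addsamptree can be illustrated with a generic Metropolis rule. The following is a minimal base-R sketch, not Canopy's source code: it assumes each tree is a list whose `likelihood` element is a log-likelihood, and the helper name `accept_tree_sketch` is hypothetical.

```r
# Hypothetical helper (not part of Canopy): Metropolis accept/reject step.
# Assumption: tree$likelihood and tree.new$likelihood are log-likelihoods.
accept_tree_sketch <- function(tree, tree.new) {
  log_ratio <- tree.new$likelihood - tree$likelihood
  # Accept with probability min(1, exp(log_ratio)); otherwise keep the old tree.
  if (log(runif(1)) < log_ratio) tree.new else tree
}

set.seed(1)
old_tree <- list(id = "old", likelihood = -120.5)
better   <- list(id = "better", likelihood = -118.0)
worse    <- list(id = "worse", likelihood = -400.0)

accept_tree_sketch(old_tree, better)$id  # a higher likelihood is always accepted
accept_tree_sketch(old_tree, worse)$id   # a far lower likelihood is almost never accepted
```

Working on the log scale avoids numerical underflow when the likelihoods of two trees differ by many orders of magnitude.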
1242 SNAs from sequencing of leukemia patient at two timepoints. All SNAs are filtered to be from copy-number-neutral region.
data(AML43)
List of simulated SNA input data for Canopy.
Yuchao Jiang [email protected]
data(AML43)
To get BIC as a model selection criterion from MCMC sampling results.
canopy.BIC(sampchain,projectname,K,numchain,burnin,thin,pdf)
sampchain |
list of sampled trees returned by |
projectname |
name of project |
K |
number of subclones (vector) |
numchain |
number of MCMC chains with random initiations |
burnin |
burnin of MCMC chains |
thin |
MCMC chains thinning |
pdf |
whether a pdf plot of BIC should be generated, default to be TRUE |
BIC values (vector) for model selection with plot generated (pdf format).
Yuchao Jiang [email protected]
data(MDA231_sampchain)
sampchain = MDA231_sampchain
projectname = 'MD231'
K = 3:6
numchain = 20
burnin = 150
thin = 5
bic = canopy.BIC(sampchain = sampchain, projectname = projectname, K = K,
                 numchain = numchain, burnin = burnin, thin = thin)
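To make the idea of BIC-based model selection over K concrete, here is a small base-R sketch using the textbook definition of BIC. It is only an illustration under stated assumptions: the log-likelihoods, parameter counts, and number of observations are made-up numbers, the helper `bic_sketch` is hypothetical, and canopy.BIC may count parameters differently or use the opposite sign convention (larger is better).

```r
# Textbook BIC: -2 * log-likelihood + (number of parameters) * log(number of observations).
# Hypothetical helper, not Canopy's implementation.
bic_sketch <- function(loglik, n_params, n_obs) {
  -2 * loglik + n_params * log(n_obs)
}

K        <- 3:6
loglik   <- c(-520, -480, -476, -474)  # made-up log-likelihoods, one per K
n_params <- c(10, 14, 18, 22)          # made-up parameter counts, one per K
n_obs    <- 200                        # made-up number of observations

bic <- bic_sketch(loglik, n_params, n_obs)
names(bic) <- K
optK <- K[which.min(bic)]              # smallest BIC wins under this sign convention
optK
```

The penalty term grows with the number of subclones, so a larger K is selected only when it improves the likelihood enough to offset the added parameters.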
EM algorithm for multivariate clustering of SNAs.
canopy.cluster(R, X, num_cluster, num_run, Mu.init = NULL, Tau_Kplus1 = NULL)
R |
alternative allele read depth matrix |
X |
total read depth matrix |
num_cluster |
number of mutation clusters (BIC as model selection metric) |
num_run |
number of EM runs for estimation for each specific number of clusters (to avoid EM being stuck in local optima) |
Mu.init |
(optional) initial value of the VAF centroid for each mutation cluster in each sample |
Tau_Kplus1 |
(optional) pre-specified proportion of noise component in clustering, uniformly distributed between 0 and 1 |
Matrix of posterior probability of cluster assignment for each mutation.
Yuchao Jiang [email protected]
data(AML43)
R = AML43$R
X = AML43$X
Mu = AML43$Mu
Tau = AML43$Tau
pG = canopy.cluster.Estep(Tau, Mu, R, X)
E-step of EM algorithm for multivariate clustering of SNAs. Used in canopy.cluster.
canopy.cluster.Estep(Tau, Mu, R, X)
Tau |
prior for proportions of mutation clusters |
Mu |
MAF centroid for each mutation cluster in each sample |
R |
alternative allele read depth matrix |
X |
total read depth matrix |
Matrix of posterior probability of cluster assignment for each mutation.
Yuchao Jiang [email protected]
data(AML43)
R = AML43$R
X = AML43$X
Mu = AML43$Mu
Tau = AML43$Tau
pG = canopy.cluster.Estep(Tau, Mu, R, X)
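The E-step computes, for each mutation, the posterior probability of belonging to each cluster. The sketch below shows one way this could look for a binomial mixture in base R. It is an illustration only, not Canopy's code: the helper `estep_sketch` is hypothetical, the noise component is omitted, and the output orientation (mutations in rows, clusters in columns) is an assumption that may differ from what canopy.cluster.Estep returns.

```r
# Hypothetical E-step for a binomial mixture (illustration only).
# Assumed shapes: R, X are mutations x samples; Mu is clusters x samples;
# Tau is a vector of cluster proportions.
estep_sketch <- function(Tau, Mu, R, X) {
  K <- length(Tau)
  # log(Tau_k) + sum over samples of the binomial log-density, per mutation and cluster
  logp <- vapply(seq_len(K), function(k) {
    mu_k <- matrix(Mu[k, ], nrow(R), ncol(R), byrow = TRUE)
    ll   <- matrix(dbinom(R, X, mu_k, log = TRUE), nrow = nrow(R))
    log(Tau[k]) + rowSums(ll)
  }, numeric(nrow(R)))
  # Subtract the row maximum before exponentiating (log-sum-exp trick)
  logp <- logp - apply(logp, 1, max)
  p <- exp(logp)
  p / rowSums(p)
}

R   <- rbind(c(2, 40), c(45, 3), c(1, 38))   # toy alternative-allele depths
X   <- matrix(100, nrow = 3, ncol = 2)       # toy total depths
Mu  <- rbind(c(0.02, 0.40), c(0.45, 0.03))   # toy centroids for 2 clusters
Tau <- c(0.5, 0.5)
pG  <- estep_sketch(Tau, Mu, R, X)
round(pG, 3)
```

Each row of the result sums to 1, and a mutation whose read counts sit close to a cluster's centroid in every sample receives nearly all of its probability mass from that cluster.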
M-step of EM algorithm for multivariate clustering of SNAs. Used in canopy.cluster.
canopy.cluster.Mstep(pG, R, X, Tau_Kplus1)
pG |
matrix of posterior probability of cluster assignment for each mutation |
R |
alternative allele read depth matrix |
X |
total read depth matrix |
Tau_Kplus1 |
proportion of the mutation cluster that is uniformly distributed to capture noise |
List of bic, converged Mu, Tau, and SNA cluster assignment.
Yuchao Jiang [email protected]
data(AML43)
R = AML43$R; X = AML43$X
num_cluster = 4 # Range of number of clusters to run
num_run = 6 # How many EM runs per clustering step
Tau_Kplus1=0.05 # Proportion of noise component
Mu.init=cbind(c(0.01,0.15,0.25,0.45),c(0.2,0.2,0.01,0.2)) # initial value of centroid
canopy.cluster=canopy.cluster(R = R, X = X, num_cluster = num_cluster,
                              num_run = num_run, Mu.init = Mu.init,
                              Tau_Kplus1=Tau_Kplus1)
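The M-step re-estimates the cluster proportions and centroids from the posterior assignment probabilities. The following base-R sketch shows the standard weighted updates for a binomial mixture. It is an assumption-laden illustration rather than Canopy's implementation: the helper `mstep_sketch` is hypothetical, `pG` is taken to be mutations by clusters, and the uniform noise component controlled by Tau_Kplus1 is left out.

```r
# Hypothetical M-step for a binomial mixture (illustration only).
# Assumed shapes: pG is mutations x clusters; R, X are mutations x samples.
mstep_sketch <- function(pG, R, X) {
  Tau <- colMeans(pG)                    # updated cluster proportions
  Mu  <- (t(pG) %*% R) / (t(pG) %*% X)   # weighted alt reads / weighted total reads
  list(Tau = Tau, Mu = Mu)
}

pG <- rbind(c(1, 0), c(0, 1), c(1, 0))   # toy hard assignments
R  <- rbind(c(2, 40), c(45, 3), c(4, 36))
X  <- matrix(100, nrow = 3, ncol = 2)
fit <- mstep_sketch(pG, R, X)
fit$Tau  # 2/3 and 1/3
fit$Mu   # cluster 1: 0.03, 0.38; cluster 2: 0.45, 0.03
```

Alternating an E-step and an M-step until the estimates stop changing gives one EM run; repeating from different starting values is what the num_run argument is described as guarding against local optima.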
To generate a posterior tree from the sub-tree space of trees with the same configurations.
canopy.output(post, config.i, C)
post |
list returned by |
config.i |
configuration of sub-tree space to be output |
C |
CNA and CNA-region overlapping matrix, only needed if overlapping CNAs are used as input |
posterior tree from the sub-tree space of trees with the same configurations.
Yuchao Jiang [email protected]
data(MDA231_sampchain)
data(MDA231)
sampchain = MDA231_sampchain
projectname = 'MD231'
K = 3:6
numchain = 20
burnin = 150
thin = 5
optK = 4
C = MDA231$C
post = canopy.post(sampchain = sampchain, projectname = projectname, K = K,
                   numchain = numchain, burnin = burnin, thin = thin,
                   optK = optK, C = C)
config.i = 3
output.tree = canopy.output(post = post, config.i = config.i, C = C)
To plot Canopy's reconstructed phylogeny. Major plotting function of Canopy.
canopy.plottree(tree, pdf, pdf.name, txt, txt.name)
tree |
input tree to be plotted |
pdf |
whether a pdf plot should be generated, default to be FALSE |
pdf.name |
name of pdf to be generated, has to be provided if pdf is to be generated |
txt |
whether a txt file should be generated with information on mutations along the tree branches, default to be FALSE |
txt.name |
name of txt to be generated, has to be provided if txt is to be generated |
Plot of tree structure, clonal frequency and mutation legends (pdf format).
Yuchao Jiang [email protected]
data(MDA231_tree)
canopy.plottree(MDA231_tree, pdf = TRUE, pdf.name = 'MDA231_tree.pdf')
Burnin, thinning, and posterior evaluation of MCMC sampled trees.
canopy.post(sampchain, projectname, K, numchain, burnin, thin, optK, C, post.config.cutoff)
sampchain |
list of sampled trees returned by |
projectname |
name of project |
K |
number of subclones (vector) |
numchain |
number of MCMC chains with random initiations |
burnin |
burnin of MCMC chains |
thin |
MCMC chain thinning. |
optK |
optimal number of subclones determined by |
C |
CNA and CNA-region overlapping matrix, only needed if overlapping CNAs are used as input |
post.config.cutoff |
cutoff value for posterior probabilities of tree configurations, default is set to be 0.05 (only tree configurations with greater than 0.05 posterior probabilities will be reported by Canopy) |
samptreethin |
list of sampled posterior trees |
samptreethin.lik |
vector of likelihood of sampled posterior trees |
config |
vector of configuration of sampled posterior trees (integer values) |
config.summary |
summary of configurations of sampled posterior trees |
Yuchao Jiang [email protected]
data(MDA231_sampchain)
data(MDA231)
sampchain = MDA231_sampchain
projectname = 'MD231'
K = 3:6
numchain = 20
burnin = 150
thin = 5
optK = 4
C = MDA231$C
post = canopy.post(sampchain = sampchain, projectname = projectname, K = K,
                   numchain = numchain, burnin = burnin, thin = thin,
                   optK = optK, C = C)
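Burn-in and thinning can be pictured as simple indexing into a stored chain. The base-R sketch below uses a hypothetical helper, `burn_and_thin`, on a toy chain; canopy.post may index its chains differently (for example, burnin may be counted in stored trees rather than raw iterations), so treat this as an illustration only.

```r
# Hypothetical helper: drop the first `burnin` draws, then keep every `thin`-th draw.
burn_and_thin <- function(chain, burnin, thin) {
  if (burnin >= length(chain)) return(chain[0])
  chain[seq(burnin + 1, length(chain), by = thin)]
}

chain <- as.list(1:20)   # stand-in for 20 stored trees
kept  <- burn_and_thin(chain, burnin = 5, thin = 5)
unlist(kept)             # draws 6, 11 and 16 are retained
```

Discarding early draws removes the influence of the random starting tree, while thinning reduces autocorrelation between the retained trees.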
To sample the posterior trees. Major function of Canopy.
canopy.sample(R, X, WM, Wm, epsilonM, epsilonm, C=NULL, Y, K, numchain, max.simrun, min.simrun, writeskip, projectname, cell.line=NULL, plot.likelihood=NULL)
R |
alternative allele read depth matrix |
X |
total read depth matrix |
WM |
observed major copy number matrix |
Wm |
observed minor copy number matrix |
epsilonM |
observed standard deviation of major copy number (scalar input is transformed into matrix) |
epsilonm |
observed standard deviation of minor copy number (scalar input is transformed into matrix) |
C |
CNA and CNA-region overlapping matrix, only needed if overlapping CNAs are used as input |
Y |
SNA and CNA-region overlapping matrix |
K |
number of subclones (vector) |
numchain |
number of MCMC chains with random initiations |
max.simrun |
maximum number of simulation iterations for each chain |
min.simrun |
minimum number of simulation iterations for each chain |
writeskip |
interval to store sampled trees |
projectname |
name of project |
cell.line |
default to be FALSE, TRUE if input sample is cell line (no normal cell contamination) |
plot.likelihood |
default to be TRUE, posterior likelihood plot generated for check of
convergence and selection of burnin and thinning in
|
List of sampled trees in subtree space with different number of subclones; plot of posterior likelihoods in each subtree space generated (pdf format).
Yuchao Jiang [email protected]
data(MDA231)
R = MDA231$R; X = MDA231$X
WM = MDA231$WM; Wm = MDA231$Wm
epsilonM = MDA231$epsilonM; epsilonm = MDA231$epsilonm
C = MDA231$C
Y = MDA231$Y
K = 3:6
numchain = 20
projectname = 'MDA231'
# sampchain = canopy.sample(R = R, X = X, WM = WM, Wm = Wm, epsilonM = epsilonM,
#             epsilonm = epsilonm, C = C, Y = Y, K = K, numchain = numchain,
#             max.simrun = 50000, min.simrun = 10000, writeskip = 200,
#             projectname = projectname, cell.line = TRUE, plot.likelihood = TRUE)
To sample the posterior trees with pre-clustering step of SNAs. Major function of Canopy.
canopy.sample.cluster(R, X, sna_cluster, WM, Wm, epsilonM, epsilonm, C=NULL, Y, K, numchain, max.simrun, min.simrun, writeskip, projectname, cell.line=NULL, plot.likelihood=NULL)
R |
alternative allele read depth matrix |
X |
total read depth matrix |
sna_cluster |
cluster assignment for each mutation from the EM Binomial clustering algorithm |
WM |
observed major copy number matrix |
Wm |
observed minor copy number matrix |
epsilonM |
observed standard deviation of major copy number (scalar input is transformed into matrix) |
epsilonm |
observed standard deviation of minor copy number (scalar input is transformed into matrix) |
C |
CNA and CNA-region overlapping matrix, only needed if overlapping CNAs are used as input |
Y |
SNA and CNA-region overlapping matrix |
K |
number of subclones (vector) |
numchain |
number of MCMC chains with random initiations |
max.simrun |
maximum number of simulation iterations for each chain |
min.simrun |
minimum number of simulation iterations for each chain |
writeskip |
interval to store sampled trees |
projectname |
name of project |
cell.line |
default to be FALSE, TRUE if input sample is cell line (no normal cell contamination) |
plot.likelihood |
default to be TRUE, posterior likelihood plot generated for check of
convergence and selection of burnin and thinning in
|
List of sampled trees in subtree space with different number of subclones; plot of posterior likelihoods in each subtree space generated (pdf format).
Yuchao Jiang [email protected]
data(MDA231)
R = MDA231$R; X = MDA231$X
WM = MDA231$WM; Wm = MDA231$Wm
epsilonM = MDA231$epsilonM; epsilonm = MDA231$epsilonm
C = MDA231$C
Y = MDA231$Y
K = 3:6
numchain = 20
projectname = 'MDA231'
# sampchain = canopy.sample.cluster(R = R, X = X, sna_cluster=c(1,2,3,4),
#             WM = WM, Wm = Wm, epsilonM = epsilonM,
#             epsilonm = epsilonm, C = C, Y = Y, K = K, numchain = numchain,
#             max.simrun = 50000, min.simrun = 10000, writeskip = 200,
#             projectname = projectname, cell.line = TRUE, plot.likelihood = TRUE)
To sample the posterior trees with pre-clustering step of SNAs and without CNA input. Major function of Canopy.
canopy.sample.cluster.nocna(R, X, sna_cluster, K, numchain, max.simrun, min.simrun, writeskip, projectname, cell.line=NULL, plot.likelihood=NULL)
R |
alternative allele read depth matrix |
X |
total read depth matrix |
sna_cluster |
cluster assignment for each mutation from the EM Binomial clustering algorithm |
K |
number of subclones (vector) |
numchain |
number of MCMC chains with random initiations |
max.simrun |
maximum number of simulation iterations for each chain |
min.simrun |
minimum number of simulation iterations for each chain |
writeskip |
interval to store sampled trees |
projectname |
name of project |
cell.line |
default to be FALSE, TRUE if input sample is cell line (no normal cell contamination) |
plot.likelihood |
default to be TRUE, posterior likelihood plot generated for check of
convergence and selection of burnin and thinning in
|
List of sampled trees in subtree space with different number of subclones; plot of posterior likelihoods in each subtree space generated (pdf format).
Yuchao Jiang [email protected]
data(toy3)
R = toy3$R; X = toy3$X
sna_cluster = toy3$sna_cluster
K = 3:5
numchain = 10
projectname = 'toy3'
# sampchain = canopy.sample.cluster.nocna(R = R, X = X,
#             sna_cluster=sna_cluster, K = K, numchain = numchain,
#             max.simrun = 40000, min.simrun = 10000, writeskip = 200,
#             projectname = projectname,
#             cell.line = TRUE, plot.likelihood = TRUE)
To sample the posterior trees without CNA input. Major function of Canopy.
canopy.sample.nocna(R, X, K, numchain, max.simrun, min.simrun, writeskip, projectname, cell.line=NULL, plot.likelihood=NULL)
R |
alternative allele read depth matrix |
X |
total read depth matrix |
K |
number of subclones (vector) |
numchain |
number of MCMC chains with random initiations |
max.simrun |
maximum number of simulation iterations for each chain |
min.simrun |
minimum number of simulation iterations for each chain |
writeskip |
interval to store sampled trees |
projectname |
name of project |
cell.line |
default to be FALSE, TRUE if input sample is cell line (no normal cell contamination) |
plot.likelihood |
default to be TRUE, posterior likelihood plot generated for check of
convergence and selection of burnin and thinning in
|
List of sampled trees in subtree space with different number of subclones; plot of posterior likelihoods in each subtree space generated (pdf format).
Yuchao Jiang [email protected]
data(toy3)
R = toy3$R; X = toy3$X
K = 3:5
numchain = 10
projectname = 'toy3'
# sampchain = canopy.sample.nocna(R = R, X = X, K = K, numchain = numchain,
#             max.simrun = 50000, min.simrun = 10000, writeskip = 200,
#             projectname = projectname,
#             cell.line = TRUE, plot.likelihood = TRUE)
To get clonal composition (mutational profile of each clone) of tree. Used in canopy.post.
getclonalcomposition(tree)
tree |
input tree |
List of each clone's mutational profile.
Yuchao Jiang [email protected]
data(MDA231_tree)
getclonalcomposition(MDA231_tree)
To get major and minor copy per clone. Used in canopy.sample.
getCMCm(tree, C)
tree |
input tree |
C |
CNA regions and CNA overlapping matrix |
CM |
Matrix of major copy per clone. |
Cm |
Matrix of minor copy per clone. |
Yuchao Jiang [email protected]
data(MDA231_tree)
data(MDA231)
C = MDA231$C
getCMCm(MDA231_tree, C)
To get CNA genotyping matrix CZ from location of CNAs on the tree. Used in canopy.sample.
getCZ(tree)
tree |
input tree |
CNA genotyping matrix CZ.
Yuchao Jiang [email protected]
data(MDA231_tree)
getCZ(MDA231_tree)
To get likelihood of the tree given tree structure and data input. Used in canopy.sample.
getlikelihood(tree,R,X,WM,Wm,epsilonM,epsilonm)
tree |
input tree |
R |
alternative allele read depth matrix |
X |
total read depth matrix |
WM |
observed major copy number matrix |
Wm |
observed minor copy number matrix |
epsilonM |
observed standard deviation of major copy number (scalar input is transformed into matrix) |
epsilonm |
observed standard deviation of minor copy number (scalar input is transformed into matrix) |
Likelihood of sampled tree.
Yuchao Jiang [email protected]
data(MDA231)
data(MDA231_tree)
R = MDA231$R
X = MDA231$X
WM = MDA231$WM
Wm = MDA231$Wm
epsilonM = MDA231$epsilonM
epsilonm = MDA231$epsilonm
getlikelihood(MDA231_tree, R, X, WM, Wm, epsilonM, epsilonm)
To get SNA likelihood of the tree given tree structure and data input. Used in canopy.sample.nocna and canopy.sample.cluster.nocna.
getlikelihood.sna(tree, R, X)
tree |
input tree |
R |
alternative allele read depth matrix |
X |
total read depth matrix |
Likelihood of sampled tree.
Yuchao Jiang [email protected]
data(MDA231)
data(MDA231_tree)
R = MDA231$R
X = MDA231$X
getlikelihood.sna(MDA231_tree, R, X)
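An SNA likelihood of this kind is commonly built from a binomial model of read counts: the alternative-allele depth R is treated as a binomial draw out of the total depth X with success probability equal to the tree's predicted VAF. The base-R sketch below assumes that model; the helper `sna_loglik_sketch` and the toy VAF matrices are hypothetical, and getlikelihood.sna itself may include further details not shown here.

```r
# Hypothetical SNA log-likelihood under a binomial read-count model.
# Assumed shapes: R, X and VAF are all SNAs x samples.
sna_loglik_sketch <- function(R, X, VAF) {
  sum(dbinom(R, X, VAF, log = TRUE))
}

R <- rbind(c(40, 45), c(25, 15))                 # toy alternative-allele depths
X <- matrix(100, nrow = 2, ncol = 2)             # toy total depths
VAF_good <- rbind(c(0.40, 0.45), c(0.25, 0.15))  # predictions that match the data
VAF_bad  <- rbind(c(0.10, 0.10), c(0.60, 0.60))  # predictions that do not

sna_loglik_sketch(R, X, VAF_good) > sna_loglik_sketch(R, X, VAF_bad)
```

A tree whose predicted VAFs track the observed read fractions scores a higher log-likelihood, which is what drives the sampler towards well-fitting trees.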
To get the SNA-CNA genotyping matrix, which specifies whether an SNA precedes a CNA. Used in canopy.sample.
getQ(tree, Y, C)
tree |
input tree |
Y |
SNA CNA overlapping matrix |
C |
CNA and CNA region overlapping matrix |
Genotyping matrix.
Yuchao Jiang [email protected]
data(MDA231_tree)
data(MDA231)
Y = MDA231$Y
C = MDA231$C
getQ(MDA231_tree, Y, C)
To get variant allele frequency (VAF) matrix, which contains percentage of mutant SNA alleles across samples. Used in canopy.sample.
getVAF(tree,Y)
tree |
input tree |
Y |
SNA CNA overlapping matrix |
Variant allele frequency matrix VAF.
Yuchao Jiang [email protected]
data(MDA231_tree)
data(MDA231)
Y = MDA231$Y
getVAF(MDA231_tree, Y)
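For intuition, the expected VAF has a simple closed form in the special case of heterozygous SNAs in diploid, copy-number-neutral regions: half of the total fraction of cells carrying the mutation. The base-R sketch below works through that special case only. The matrices Z (SNAs by clones) and P (clones by samples) are toy values, and the general calculation in getVAF, which also takes the SNA-CNA overlap matrix Y, is not reproduced here.

```r
# Toy genotype matrix: rows are SNAs, columns are clones (clone 1 = normal, no SNAs).
Z <- rbind(s1 = c(0, 1, 1),
           s2 = c(0, 1, 0),
           s3 = c(0, 0, 1))
# Toy clonal frequency matrix: rows are clones, columns are samples; columns sum to 1.
P <- cbind(sampleA = c(0.2, 0.5, 0.3),
           sampleB = c(0.1, 0.3, 0.6))

# Copy-neutral, heterozygous assumption: one mutant allele out of two per carrying cell.
VAF <- (Z %*% P) / 2
VAF
```

For example, s1 is carried by clones 2 and 3, which make up 80% of sampleA, so its expected VAF there is 0.40.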
To get SNA genotyping matrix from location of SNAs on the tree. Used in canopy.sample.
getZ(tree, sna.name)
tree |
input tree |
sna.name |
vector of SNA names |
Genotyping matrix.
Yuchao Jiang [email protected]
data(MDA231_tree)
data(MDA231)
sna.name = rownames(MDA231$R)
getZ(MDA231_tree, sna.name)
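How a genotype matrix follows from SNA positions on a tree can be shown with a small base-R sketch: a clone carries an SNA when the branch holding that SNA lies on the path from the root to the clone's leaf. Everything below is a hypothetical illustration. The edge-matrix layout (parent, child), the node numbering, and the helper `ancestors` are assumptions and do not describe the internal structure of Canopy's tree objects.

```r
# Toy tree as (parent, child) edges: leaves are 1..3, node 4 is the root, node 5 is internal.
edge   <- rbind(c(4, 1), c(4, 5), c(5, 2), c(5, 3))
leaves <- 1:3

# Hypothetical helper: a node together with all of its ancestors up to the root.
ancestors <- function(node, edge) {
  out <- node
  while (node %in% edge[, 2]) {
    node <- edge[edge[, 2] == node, 1]
    out  <- c(out, node)
  }
  out
}

# Each SNA sits on the branch ending at the given child node.
sna_end <- c(s1 = 5, s2 = 2, s3 = 3)

# Z[i, k] = 1 if leaf k descends from (or is) the end node of SNA i's branch.
Z <- t(sapply(sna_end, function(e) {
  as.integer(sapply(leaves, function(l) e %in% ancestors(l, edge)))
}))
colnames(Z) <- paste0("clone", leaves)
Z
```

In this toy tree, s1 sits on the branch above both clone 2 and clone 3 and is therefore shared by them, while clone 1 carries no SNAs.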
To initialize positions of CNAs on the tree. Used in initialization step of canopy.sample.
initialcna(tree,cna.name)
tree |
input tree |
cna.name |
vector of input CNA names |
Matrix specifying positions of CNAs (start and end node).
Yuchao Jiang [email protected]
data(MDA231_tree)
data(MDA231)
cna.name = rownames(MDA231$WM)
initialcna(MDA231_tree, cna.name)
To initialize major and minor copies of CNAs. Used in initialization step of canopy.sample.
initialcnacopy(tree)
tree |
input tree |
Matrix specifying major and minor copies of CNAs.
Yuchao Jiang [email protected]
data(MDA231_tree)
initialcnacopy(MDA231_tree)
To initialize the clonal frequency matrix. Used in initialization step of canopy.sample.
initialP(tree,sampname,cell.line)
tree |
input tree |
sampname |
vector of input sample names |
cell.line |
default to be FALSE, TRUE if input sample is cell line (no normal cell contamination) |
Clonal frequency matrix.
Yuchao Jiang [email protected]
data(MDA231_tree)
data(MDA231)
sampname = colnames(MDA231$R)
initialP(MDA231_tree, sampname, cell.line = TRUE)
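A clonal frequency matrix has one row per clone and one column per sample, with each column summing to 1. The base-R sketch below shows one simple way such a matrix could be initialized at random. It is not Canopy's procedure: the helper `init_P_sketch` is hypothetical, and treating clone 1 as the normal clone whose fraction is zero when cell.line is TRUE is an assumption based on the argument's description.

```r
# Hypothetical initializer for a clones x samples frequency matrix.
init_P_sketch <- function(n_clones, sampname, cell.line = FALSE) {
  P <- matrix(rexp(n_clones * length(sampname)), nrow = n_clones,
              dimnames = list(paste0("clone", seq_len(n_clones)), sampname))
  if (cell.line) P[1, ] <- 0      # assumption: clone 1 is the normal clone
  sweep(P, 2, colSums(P), "/")    # rescale so each sample's fractions sum to 1
}

set.seed(42)
P <- init_P_sketch(4, c("sampleA", "sampleB"), cell.line = TRUE)
P
colSums(P)
```

Normalizing independent exponential draws yields a uniform draw from the simplex, which makes it a neutral starting point for a sampler.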
To initialize positions of SNAs on the tree. Used in initialization step of canopy.sample.
initialsna(tree,sna.name)
tree |
input tree |
sna.name |
vector of input SNA names |
Matrix specifying positions of SNAs (start and end node).
Yuchao Jiang [email protected]
data(MDA231_tree)
data(MDA231)
sna.name = rownames(MDA231$R)
initialsna(MDA231_tree, sna.name)
Pre-stored dataset for project MDA231. A transplantable metastasis model system was derived from a heterogeneous human breast cancer cell line MDA-MB-231. Cancer cells from the parental line MDA-MB-231 were engrafted into mouse hosts leading to organ-specific metastasis. Mixed cell populations (MCPs) were in vivo selected from either bone or lung metastasis and grew into phenotypically stable and metastatically competent cancer cell lines. The parental line as well as the MCP sublines were whole-exome sequenced with somatic SNAs and CNAs profiled.
data(MDA231)
List of input data for Canopy from project MDA231.
Yuchao Jiang [email protected]
data(MDA231)
List of sampled trees in subtree space with different number of subclones for project MDA231.
data(MDA231_sampchain)
List of sampled trees from different subtree space
Yuchao Jiang [email protected]
data(MDA231_sampchain)
Most likely tree from project MDA231 as a tree example.
data(MDA231_tree)
Most likely tree from project MDA231
Yuchao Jiang [email protected]
data(MDA231_tree)
To sample CNA positions along the tree. Used in canopy.sample.
sampcna(tree)
tree |
input tree |
Newly sampled matrix specifying positions of CNAs (start and end node).
Yuchao Jiang [email protected]
data(MDA231_tree)
sampcna(MDA231_tree)
To sample major and minor copies of CNAs. Used in canopy.sample.
sampcnacopy(tree)
tree |
input tree |
Newly sampled matrix specifying major and minor copies of CNAs.
Yuchao Jiang [email protected]
data(MDA231_tree)
sampcnacopy(MDA231_tree)
To sample the clonal frequency matrix. Used in canopy.sample.
sampP(tree, cell.line)
tree |
input tree |
cell.line |
default to be FALSE, TRUE if input sample is cell line (no normal cell contamination) |
Newly sampled clonal frequency matrix.
Yuchao Jiang [email protected]
data(MDA231_tree)
sampP(MDA231_tree, cell.line = TRUE)
To sample SNA positions along the tree. Used in canopy.sample.
sampsna(tree)
tree |
input tree |
Newly sampled matrix specifying positions of SNAs (start and end node).
Yuchao Jiang [email protected]
data(MDA231_tree)
sampsna(MDA231_tree)
To sample SNA cluster positions along the tree. Used in canopy.sample.cluster and canopy.sample.cluster.nocna.
sampsna.cluster(tree)
tree |
input tree |
Newly sampled matrix specifying positions of SNA clusters (start and end node).
Yuchao Jiang [email protected]
data(MDA231_tree)
MDA231_tree$sna.cluster=initialsna(MDA231_tree,paste('cluster',1:4,sep=''))
sampsna.cluster(MDA231_tree)
To sort identified overlapping CNAs by their major and minor copy numbers. Used in canopy.post.
sortcna(tree,C)
tree |
input tree |
C |
CNA and CNA-region overlapping matrix |
Tree whose overlapping CNAs are sorted by major and minor copy numbers.
Yuchao Jiang [email protected]
data(MDA231_tree)
data(MDA231)
C = MDA231$C
sortcna(MDA231_tree, C)
Pre-stored simulated toy dataset.
data(toy)
List of simulated input data for Canopy.
Yuchao Jiang [email protected]
data(toy)
Pre-stored simulated toy dataset.
data(toy2)
List of simulated input data for Canopy.
Yuchao Jiang [email protected]
data(toy2)
Pre-stored simulated toy dataset. 200 simulated SNAs from a tree with 4 branches. No CNA events at play.
data(toy3)
List of simulated SNA input data for Canopy.
Yuchao Jiang [email protected]
data(toy3)