Cluster peaks by spectral similarity.

Usage

cluster_spectra(
  peak_table,
  peak_no = NULL,
  alpha = 0.05,
  min_size = 5,
  max_size = NULL,
  nboot = 1000,
  plot_dend = TRUE,
  plot_spectra = TRUE,
  verbose = getOption("verbose"),
  save = FALSE,
  parallel = TRUE,
  max.only = FALSE,
  output = c("pvclust", "clusters"),
  ...
)

Arguments

peak_table

Peak table from get_peaktable.

peak_no

Minimum and maximum thresholds for the number of peaks a cluster may have. This argument is deprecated in favor of min_size and max_size.

alpha

Confidence threshold for inclusion of cluster.

min_size

Minimum number of peaks a cluster may have.

max_size

Maximum number of peaks a cluster may have.

nboot

Number of bootstrap replicates for pvclust.

plot_dend

Logical. If TRUE, plots dendrogram with bootstrap values.

plot_spectra

Logical. If TRUE, plots overlapping spectra for each cluster.

verbose

Logical. If TRUE, prints progress report to console.

save

Logical. If TRUE, saves pvclust object to current directory.

parallel

Logical. If TRUE, use parallel processing for pvclust.

max.only

Logical. If TRUE, returns only highest level for nested dendrograms.

output

What to return: "clusters" for a list of clusters, "pvclust" for the pvclust object, or "both" for both items.

...

Additional arguments to pvclust.

Value

Returns clusters and/or pvclust object according to the value of the output argument.

  • If output = "clusters", returns a list of S4 cluster objects.

  • If output = "pvclust", returns a pvclust object.

  • If output = "both", returns a nested list containing [[1]] the pvclust object and [[2]] the list of S4 cluster objects.

The cluster objects consist of the following components:

  • peaks: a character vector containing the names of all peaks contained in the given cluster.

  • pval: a numeric vector of length 1 containing the bootstrap p-value (au) for the given cluster.

Details

Clusters peaks by spectral similarity. Before using this function, reference spectra must be attached to the peak_table using the attach_ref_spectra function. These reference spectra are used to construct a distance matrix based on the spectral similarity (Pearson correlation) between peaks. Hierarchical clustering with bootstrap resampling, as implemented in pvclust, is then performed on this matrix to classify peaks by spectral similarity. Finally, the bootstrap values can be used to select clusters that exceed the confidence threshold defined by alpha.
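The distance and clustering steps described above can be sketched in base R. This is a minimal illustration on synthetic spectra, not the package's implementation: the helper names (band, noise), the simulated data, the use of 1 - correlation as the distance, and the average linkage are all assumptions, and the bootstrap resampling performed by pvclust is omitted.

```r
# Synthetic "reference spectra": rows are wavelengths, columns are peaks.
# All names and values below are made up for illustration.
set.seed(1)
wavelengths <- seq(200, 400, by = 2)
band <- function(center, width) {
  exp(-((wavelengths - center)^2) / (2 * width^2))
}
noise <- function() rnorm(length(wavelengths), sd = 0.01)

spectra <- cbind(
  A1 = band(260, 15) + noise(),
  A2 = band(262, 15) + noise(),
  A3 = band(258, 16) + noise(),
  B1 = band(340, 20) + noise(),
  B2 = band(338, 20) + noise()
)

# Pearson correlation between peaks (columns), converted to a distance.
r <- cor(spectra, method = "pearson")
d <- as.dist(1 - r)

# Hierarchical clustering; average linkage is an assumption here.
hc <- hclust(d, method = "average")

# Cut the tree into two groups to inspect which peaks cluster together.
groups <- cutree(hc, k = 2)
print(groups)
```

In this toy data the three "A" peaks share one absorbance band and the two "B" peaks share another, so the two groups recovered by cutree correspond to the two spectral shapes.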

Clusters can be filtered by minimum and maximum size using the min_size and max_size arguments, respectively. If max.only is TRUE, only the largest cluster in a nested tree of clusters meeting the specified confidence threshold will be returned.
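The effect of these filters can be illustrated on plain R lists. The sketch below is hypothetical: filter_by_size and keep_largest are illustrative helpers, not functions from the package, the clusters are invented, and the order in which the package applies the two filters is an assumption.

```r
# Hypothetical clusters, written as plain lists rather than the S4 objects
# that cluster_spectra returns.
clusters <- list(
  list(peaks = c("V1", "V2"), pval = 0.99),
  list(peaks = c("V1", "V2", "V3", "V4", "V5"), pval = 0.98),
  list(peaks = c("V1", "V2", "V3", "V4", "V5", "V6", "V7"), pval = 0.97),
  list(peaks = c("V8", "V9", "V10", "V11", "V12", "V13"), pval = 0.96)
)

# Keep clusters whose number of peaks lies within [min_size, max_size].
filter_by_size <- function(cl, min_size = 5, max_size = NULL) {
  n <- vapply(cl, function(x) length(x$peaks), integer(1))
  keep <- n >= min_size
  if (!is.null(max_size)) keep <- keep & n <= max_size
  cl[keep]
}

# Drop any cluster whose peaks are all contained in a strictly larger cluster.
keep_largest <- function(cl) {
  nested <- vapply(seq_along(cl), function(i) {
    any(vapply(seq_along(cl), function(j) {
      j != i &&
        length(cl[[j]]$peaks) > length(cl[[i]]$peaks) &&
        all(cl[[i]]$peaks %in% cl[[j]]$peaks)
    }, logical(1)))
  }, logical(1))
  cl[!nested]
}

sized <- filter_by_size(clusters, min_size = 5)
top <- keep_largest(sized)

# Sizes of the clusters that survive both filters.
sizes <- vapply(top, function(x) length(x$peaks), integer(1))
print(sizes)
```

Here the two-peak cluster is removed by the size filter, and the five-peak cluster is removed because it is nested inside the seven-peak cluster, leaving the seven-peak and six-peak clusters.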

Note

  • Users should be aware that the clustering algorithm will often return nested clusters. Thus, an individual peak could appear in more than one cluster.

  • The example uses nboot = 100 only to reduce runtime. More than 100 bootstrap replicates are strongly recommended when clustering real data. The authors of pvclust suggest nboot = 10000.
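Because clusters can be nested, it may help to check which peaks occur in more than one cluster before summarizing results. The following is a minimal sketch on invented peak names, with each cluster represented as a plain character vector rather than an S4 cluster object.

```r
# Hypothetical clusters: c1 is nested inside c2, c3 is separate.
clusters <- list(
  c1 = c("V1", "V2", "V3"),
  c2 = c("V1", "V2", "V3", "V4", "V5"),
  c3 = c("V6", "V7")
)

# Count how many clusters each peak appears in.
membership <- table(unlist(clusters, use.names = FALSE))

# Peaks that belong to more than one cluster.
shared <- names(membership)[membership > 1]
print(shared)
```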

References

R. Suzuki & H. Shimodaira. 2006. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics, 22(12):1540-1542. doi:10.1093/bioinformatics/btl117.

Author

Ethan Bass

Examples

# \donttest{
data(pk_tab)
data(Sa_warp)
pk_tab <- attach_ref_spectra(pk_tab, Sa_warp, ref = "max.int")
cl <- cluster_spectra(pk_tab, nboot = 100, max.only = FALSE,
                      save = FALSE, alpha = 0.03)
# }