Introduction
While the basic workflow is similar for analyzing different types of chromatographic data with chromatographR
, there are a few important differences when analyzing GC-FID data (or other 1-dimensional data). This vignette highlights some important considerations when using chromatographR to analyze GC-FID data (or other 2D chromatography data).
To demonstrate the use of chromatographR
for the analysis GC-FID data, we will analyze a dataset on cuticular hydrocarbons (CHCs) composition in four species of paper wasps (Polistes spp.) collected by Dr. Andrew Legan (Andrew W. Legan 2022; A. Legan et al. 2022). The dataset includes CHCs collected from Polistes dominula (european paper wasp), Polistes exclamans (common paper wasp), P. fuscatus (northern paper wasp), and P. metricus (metric paper wasp). As in other social insects, cuticular hydrocarbons serve as chemical cues mediating a number of social behaviors in paper wasps (Andrew W. Legan et al. 2021), including nestmate recognition (G. J. Gamboa, Reeve, and Pfennig 1986; George J. Gamboa et al. 1996; Bruschini et al. 2011), establishment of social hierarchies (Jandt, Tibbetts, and Toth 2014) and mate choice (Reed and Landolt 1990).
# load chromatograms from Legan et al 2022
files <- rdryad::dryad_download("10.5061/dryad.wpzgmsbr8")[[1]]
chrom_paths <- grep("README|METADATA|ANNOTATED", files, invert = TRUE,
value = TRUE)
dat <- read_chroms(chrom_paths, format_in = "shimadzu_fid")
# load metadata
path_to_metadata <- grep("METADATA", files, value = TRUE)
meta <- read.csv(path_to_metadata, sep = "\t")
In order to get a good alignment, it is helpful to remove the solvent peak. To accomplish this, we can use the preprocess
function to remove the first four minutes of the chromatogram. We also reduce the time-axis resolution to .005 minutes since it drastically reduces computation time while seemingly having little effect on the accuracy of the integration results.
dat.pr <- preprocess(dat, spec.smooth = FALSE, dim1 = seq(4, 59.9, .005), cl = 1)
We can then group the chromatograms by species for further analysis.
We can extract the peaks from the alkane ladder using the get_peaks
function and plot the integrated peak areas. To eliminate small peaks that are not part of the alkane ladder, we use the filter_peaks
function to remove features that do not have an amplitude of at least 10^4 response units.
ladder <- grep("LADDER", names(dat.pr))
pks <- get_peaks(dat.pr[ladder], time.units = "s")
plot(pks, chrom_list = dat.pr[ladder])
pks_f <- filter_peaks(pks, min_height = 10000)
In this case, we also have integration results from ‘Shimadzu LabSolutions’, so we can check the integration results provided by chromatographR
against the results from LabSolutions.
# check equality of retention times
# all.equal(alkanes$RT[-1],pks_f$ALKANE_LADDER$Intensity$rt)
par(mfrow=c(1,2))
plot(pks_f$ALKANE_LADDER$Intensity$area ~ alkanes$Area[-1], pch = 20,
xlab = "Area (LabSolutions)", ylab = "Area (chromatographR)",
main = "Area")
m <- lm(pks_f$ALKANE_LADDER$Intensity$area ~ alkanes$Area[-1])
abline(m)
legend("bottomright", bty = "n",
legend = bquote(R^2 == .(format(summary(m)$adj.r.squared, digits = 4))))
plot(pks_f$ALKANE_LADDER$Intensity$height ~ alkanes$Height[-1], pch = 20,
xlab = "Area (LabSolutions)", ylab = "Area (chromatographR)",
main = "Height")
m <- lm(pks_f$ALKANE_LADDER$Intensity$height ~ alkanes$Height[-1])
abline(m)
legend("bottomright", bty = "n",
legend = bquote(R^2 == .(format(summary(m)$adj.r.squared, digits = 4))))
Reassuringly, the peak areas and heights estimated by chromatographR are very similar to the results provided by LabSolutions.
Alignment of chromatograms
We can align chromatograms by species using variable dynamic time warping.
warp_dominula <- suppressWarnings(correct_rt(dat.pr[species_idx$dominula],
alg = "vpdtw",
what = "corrected.values",
plot_it = TRUE, verbose = TRUE,
penalty = 1, maxshift = 100))
#> Selected chromatogram 9 as best reference.
warp_metricus <- suppressWarnings(correct_rt(dat.pr[species_idx$metricus],
alg = "vpdtw",
what = "corrected.values",
plot_it = FALSE, verbose = FALSE,
penalty = 2, maxshift = 100))
warp_exclamans <- suppressWarnings(correct_rt(dat.pr[species_idx$exclamans],
alg = "vpdtw",
what = "corrected.values",
plot_it = FALSE, verbose = FALSE,
penalty = 2, maxshift = 200))
warp_fuscatus <- suppressWarnings(correct_rt(dat.pr[species_idx$fuscatus],
alg = "vpdtw", what = "corrected.values",
plot_it = FALSE, verbose = FALSE,
penalty = 2, maxshift = 200))
warp_all <- suppressWarnings(correct_rt(dat.pr[-1], alg = "vpdtw",
what = "corrected.values", plot_it = FALSE,
verbose = FALSE, penalty = 2, maxshift = 200))
As a sanity check, we can compare the single species alignments for each species to the multi-species alignment of the corresponding chromatograms.
par(mfrow=c(3,1))
plot_chroms(warp_all[grep("PMET",names(warp_all))],show_legend = FALSE)
legend("topright", expression(paste(italic("P. metricus"),
" (multi-species)")), bty = "n")
plot_chroms(warp_metricus, show_legend = FALSE)
legend("topright", expression(paste(italic("P. metricus"),
" (single-species)")), bty = "n")
plot_chroms(dat.pr[species_idx$metricus], show_legend = FALSE)
legend("topright", expression(paste(italic("P. metricus"),
" (raw)")), bty = "n")
par(mfrow=c(3,1))
plot_chroms(warp_all[grep("PFUS",names(warp_all))],show_legend = FALSE)
legend("topright", expression(paste(italic("P. fuscatus"), " (multi-species)")),
bty = "n")
plot_chroms(warp_fuscatus, show_legend = FALSE)
legend("topright", expression(paste(italic("P. fuscatus"), " (single-species)")),
bty = "n")
plot_chroms(dat.pr[species_idx$fuscatus], show_legend = FALSE)
legend("topright", expression(paste(italic("P. fuscatus"),
" (raw)")), bty = "n")
par(mfrow=c(3,1))
plot_chroms(warp_all[grep("PDOM", names(warp_all))],show_legend = FALSE)
legend("topright", expression(paste(italic("P. dominula"), " (multi-species)")), bty="n")
plot_chroms(warp_dominula, show_legend = FALSE)
legend("topright", expression(paste(italic("P. dominula"), " (single-species)")), bty="n")
plot_chroms(dat.pr[species_idx$dominula], show_legend = FALSE)
legend("topright", expression(paste(italic("P. dominula"),
" (raw)")), bty="n")
par(mfrow = c(3,1))
plot_chroms(warp_all[grep("PEXC", names(warp_all))],show_legend = FALSE)
legend("topright", expression(paste(italic("P. exclamans"), " (multi-species)")),
bty = "n")
plot_chroms(warp_exclamans, show_legend = FALSE)
legend("topright", expression(paste(italic("P. exclamans"), " (single-species)")),
bty = "n")
plot_chroms(dat.pr[species_idx$exclamans], show_legend = FALSE)
legend("topright", expression(paste(italic("P. exclamans"),
" (raw)")), bty = "n")
In general, the results are quite similar between multispecies and single species alignments.
Integration and peaktable assembly
pks <- get_peaks(warp_all)
pktab <- get_peaktable(pks)
#> Warning in cbind(lambda = rep(as.numeric(names(peak_list[[1]])[comp]), ncl), :
#> NAs introduced by coercion
pktab <- attach_metadata(pktab, metadata = meta, column = "SAMPLE_ID")
Analysis of peaktable
m <- vegan::adonis2(pktab$tab ~ POLISTES_SPECIES + STATE + SEX + LAT + LON,
data = pktab$sample_meta, method = "manhattan",
na.action = na.omit)
m
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#>
#> vegan::adonis2(formula = pktab$tab ~ POLISTES_SPECIES + STATE + SEX + LAT + LON, data = pktab$sample_meta, method = "manhattan", na.action = na.omit)
#> Df SumOfSqs R2 F Pr(>F)
#> Model 13 5.8841e+11 0.51582 8.6048 0.001 ***
#> Residual 105 5.5231e+11 0.48418
#> Total 118 1.1407e+12 1.00000
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As one would expect, the permanova shows that species is the largest contributor to the variance in CHC profiles (R2 = 0.52), followed by state of origin (R2 = 0.48) and sex (R2 = 1).
# ggordiplot function (modified from https://github.com/jfq3/ggordiplots/blob/master/R/gg_ordiplot.R) with added shape argument
gg_ordiplot <- function (ord, groups, shape = NULL, scaling = 1, choices = c(1, 2),
kind = c("sd", "se", "ehull"), conf = NULL,
show.groups = "all", ellipse = TRUE,
label = FALSE, hull = FALSE, spiders = FALSE,
pt.size = 3, plot = TRUE) {
x <- y <- cntr.x <- cntr.y <- Group <- NULL
groups <- as.factor(groups)
if (show.groups[1] == "all") {
show.groups <- as.vector(levels(groups))
}
df_ord <- vegan::scores(ord, display = "sites", scaling = scaling,
choices = choices)
axis.labels <- ggordiplots::ord_labels(ord)[choices]
df_ord <- data.frame(x = df_ord[, 1], y = df_ord[, 2], Group = groups)
if (!is.null(shape)){
df_ord <- data.frame(df_ord, shape = shape)
}
df_mean.ord <- aggregate(df_ord[, 1:2], by = list(df_ord$Group),
mean)
colnames(df_mean.ord) <- c("Group", "x", "y")
df_mean.ord <- df_mean.ord[df_mean.ord$Group %in% show.groups,
]
if (is.null(conf)) {
rslt <- vegan::ordiellipse(ord, groups = groups, display = "sites",
scaling = scaling, choices = choices,
kind = kind, show.groups = show.groups,
draw = "none", label = label)
} else {
rslt <- vegan::ordiellipse(ord, groups = groups, display = "sites",
scaling = scaling, choices = choices,
kind = kind, show.groups = show.groups,
draw = "none", conf = conf,
label = label)
}
df_ellipse <- data.frame()
for (g in show.groups) {
df_ellipse <- rbind(df_ellipse,
cbind(as.data.frame(with(df_ord[df_ord$Group == g, ],
vegan:::veganCovEllipse(rslt[[g]]$cov, rslt[[g]]$center,
rslt[[g]]$scale))), Group = g))
}
colnames(df_ellipse) <- c("x", "y", "Group")
df_ellipse <- df_ellipse[, c(3, 1, 2)]
rslt.hull <- vegan::ordihull(ord, groups = groups, scaling = scaling,
choices = choices, show.groups = show.groups,
draw = "none")
df_hull <- data.frame()
df_temp <- data.frame()
for (g in show.groups) {
x <- rslt.hull[[g]][, 1]
y <- rslt.hull[[g]][, 2]
Group <- rep(g, length(x))
df_temp <- data.frame(Group = Group, x = x, y = y)
df_hull <- rbind(df_hull, df_temp)
}
df_spiders <- df_ord
df_spiders$cntr.x <- NA
df_spiders$cntr.y <- NA
for (g in show.groups) {
df_spiders[which(df_spiders$Group == g), 4:5] <-
df_mean.ord[which(df_mean.ord == g), 2:3]
}
df_spiders <- df_spiders[, c(3, 4, 5, 1, 2)]
df_spiders <- df_spiders[order(df_spiders$Group), ]
df_spiders <- df_spiders[df_spiders$Group %in% show.groups,
]
xlab <- axis.labels[1]
ylab <- axis.labels[2]
plt <- ggplot2::ggplot() + geom_point(data = df_ord, aes(x = x,
y = y,
color = Group,
shape = shape),
size = pt.size) +
xlab(xlab) +
ylab(ylab)
if (ellipse == TRUE) {
plt <- plt + geom_path(data = df_ellipse, aes(x = x,
y = y, color = Group),
show.legend = FALSE)
}
if (label == TRUE) {
plt <- plt + geom_text(data = df_mean.ord,
aes(x = x, y = y, label = Group, color = Group),
show.legend = FALSE)
}
if (hull == TRUE) {
plt <- plt + geom_path(data = df_hull,
aes(x = x, y = y, color = Group),
show.legend = FALSE)
}
if (spiders == TRUE) {
plt <- plt + geom_segment(data = df_spiders,
aes(x = cntr.x, xend = x, y = cntr.y, yend = y,
color = Group), show.legend = FALSE)
}
plt <- plt + coord_fixed(ratio = 1)
if (plot) {
print(plt)
}
invisible(list(df_ord = df_ord, df_mean.ord = df_mean.ord,
df_ellipse = df_ellipse, df_hull = df_hull,
df_spiders = df_spiders,
plot = plt))
}
ord <- vegan::rda(pktab$tab, scale = TRUE)
pktab$sample_meta$SEX_SP <- interaction(pktab$sample_meta$SEX,
pktab$sample_meta$POLISTES_SPECIES)
gg_ordiplot(ord, groups = pktab$sample_meta[,"POLISTES_SPECIES"],
shape = pktab$sample_meta[,"SEX"], plot = FALSE)$plot +
scale_shape_manual(values = c(19, 21), name = "Sex") +
labs(colour = "Species")
Principal componenets analysis shows decent separation between species, with P. dominula, P. fuscatus and P. dominula separating along PC2 and P. exclamans separating from the other species along PC1.
We can get even better separation if we break down our data by sex, though there is considerable intraspecific variation.
cond <- which(pktab$sample_meta$SEX == "M")
ord_m <- vegan::rda(pktab$tab[cond,], scale = TRUE)
gg_ordiplot(ord_m, groups = pktab$sample_meta[cond,"POLISTES_SPECIES"],
plot = FALSE)$plot +
ggtitle("Males") +
theme_classic() +
theme(legend.position="bottom", plot.title = element_text(hjust = 0.5)) +
labs(colour = "Species")
cond <- which(pktab$sample_meta$SEX == "F")
ord_f <- vegan::rda(pktab$tab[cond,],scale=TRUE)
gg_ordiplot(ord_f, groups = pktab$sample_meta[cond,"POLISTES_SPECIES"],
plot = FALSE)$plot +
ggtitle("Females") +
theme_classic() +
theme(legend.position="bottom", plot.title = element_text(hjust = 0.5)) +
labs(colour = "Species")
References
Session Information
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_3.5.1 vegan_2.6-8 lattice_0.22-6
#> [4] permute_0.9-7 rdryad_1.0.0 chromatographR_0.7.2
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 xfun_0.49 caTools_1.18.3
#> [4] vctrs_0.6.5 tools_4.4.2 bitops_1.0-9
#> [7] generics_0.1.3 curl_6.0.1 parallel_4.4.2
#> [10] tibble_3.2.1 fansi_1.0.6 cluster_2.1.6
#> [13] pkgconfig_2.0.3 Matrix_1.7-1 readxl_1.4.3
#> [16] lifecycle_1.0.4 farver_2.1.2 compiler_4.4.2
#> [19] stringr_1.5.1 ptw_1.9-16 munsell_0.5.1
#> [22] RcppDE_0.1.7 minpack.lm_1.2-4 htmltools_0.5.8.1
#> [25] yaml_2.3.10 Formula_1.2-5 pillar_1.9.0
#> [28] tidyr_1.3.1 MASS_7.3-61 nlme_3.1-166
#> [31] mime_0.12 tidyselect_1.2.1 zip_2.3.1
#> [34] digest_0.6.37 stringi_1.8.4 dplyr_1.1.4
#> [37] purrr_1.0.2 labeling_0.4.3 VPdtw_2.2.1
#> [40] splines_4.4.2 fastmap_1.2.0 grid_4.4.2
#> [43] colorspace_2.1-1 cli_3.6.3 chromConverter_0.2.1
#> [46] magrittr_2.0.3 triebeard_0.4.1 crul_1.5.0
#> [49] dynamicTreeCut_1.63-1 utf8_1.2.4 withr_3.0.2
#> [52] ggordiplots_0.4.3 scales_1.3.0 rappdirs_0.3.3
#> [55] rmarkdown_2.29 reticulate_1.40.0 cellranger_1.1.0
#> [58] fastcluster_1.2.6 png_0.1-8 pbapply_1.7-2
#> [61] evaluate_1.0.1 knitr_1.49 hoardr_0.5.4
#> [64] mgcv_1.9-1 urltools_1.7.3 rlang_1.1.4
#> [67] Rcpp_1.0.13-1 glue_1.8.0 httpcode_0.3.0
#> [70] xml2_1.3.6 jsonlite_1.8.9 R6_2.5.1