Skip to contents
library(chromatographR)
library(rdryad)
library(vegan)
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.6-8
library(ggplot2)

Introduction

While the basic workflow is similar for analyzing different types of chromatographic data with chromatographR, there are a few important differences when analyzing GC-FID data (or other 1-dimensional data). This vignette highlights some important considerations when using chromatographR to analyze GC-FID data (or other 2D chromatography data).

To demonstrate the use of chromatographR for the analysis GC-FID data, we will analyze a dataset on cuticular hydrocarbons (CHCs) composition in four species of paper wasps (Polistes spp.) collected by Dr. Andrew Legan (Andrew W. Legan 2022; A. Legan et al. 2022). The dataset includes CHCs collected from Polistes dominula (european paper wasp), Polistes exclamans (common paper wasp), P. fuscatus (northern paper wasp), and P. metricus (metric paper wasp). As in other social insects, cuticular hydrocarbons serve as chemical cues mediating a number of social behaviors in paper wasps (Andrew W. Legan et al. 2021), including nestmate recognition (G. J. Gamboa, Reeve, and Pfennig 1986; George J. Gamboa et al. 1996; Bruschini et al. 2011), establishment of social hierarchies (Jandt, Tibbetts, and Toth 2014) and mate choice (Reed and Landolt 1990).

P. dominula (Kranz 2020)

P. exclamans (© Andrew Legan)

P. fuscatus (Cook 2022)

P. metricus (Drabik-Hamshare 2022)
Figure 1: Polistes species includes in this study.
# load chromatograms from Legan et al 2022
files <- rdryad::dryad_download("10.5061/dryad.wpzgmsbr8")[[1]]
chrom_paths <- grep("README|METADATA|ANNOTATED", files, invert = TRUE, 
                    value = TRUE)
dat <- read_chroms(chrom_paths, format_in = "shimadzu_fid")

# load metadata
path_to_metadata <- grep("METADATA", files, value = TRUE)
meta <- read.csv(path_to_metadata, sep = "\t")

In order to get a good alignment, it is helpful to remove the solvent peak. To accomplish this, we can use the preprocess function to remove the first four minutes of the chromatogram. We also reduce the time-axis resolution to .005 minutes since it drastically reduces computation time while seemingly having little effect on the accuracy of the integration results.

dat.pr <- preprocess(dat, spec.smooth = FALSE, dim1 = seq(4, 59.9, .005), cl = 1)

We can then group the chromatograms by species for further analysis.

species <- c(dominula = "DOMINULA", exclamans.= "EXCLAMANS", 
             fuscatus = "FUSCATUS", metricus = "METRICUS")

species_idx <- lapply(species, function(sp){
  which(names(dat.pr) %in% gsub(".txt", "",
                                meta[which(meta$POLISTES_SPECIES == sp),
                                     "SAMPLE_ID"]))
})

We can extract the peaks from the alkane ladder using the get_peaks function and plot the integrated peak areas. To eliminate small peaks that are not part of the alkane ladder, we use the filter_peaks function to remove features that do not have an amplitude of at least 10^4 response units.

ladder <- grep("LADDER", names(dat.pr))
pks <- get_peaks(dat.pr[ladder], time.units = "s")
plot(pks, chrom_list = dat.pr[ladder])

pks_f <- filter_peaks(pks, min_height = 10000)

In this case, we also have integration results from ‘Shimadzu LabSolutions’, so we can check the integration results provided by chromatographR against the results from LabSolutions.

path_to_annotated_alkanes <- grep("ANNOTATED", files[[1]], value = TRUE)
alkanes <- read.csv(path_to_annotated_alkanes, sep="\t")
# check equality of retention times
# all.equal(alkanes$RT[-1],pks_f$ALKANE_LADDER$Intensity$rt)

par(mfrow=c(1,2))

plot(pks_f$ALKANE_LADDER$Intensity$area ~ alkanes$Area[-1], pch = 20,
     xlab = "Area (LabSolutions)", ylab = "Area (chromatographR)", 
     main = "Area")
m <- lm(pks_f$ALKANE_LADDER$Intensity$area ~ alkanes$Area[-1])
abline(m)
legend("bottomright", bty = "n", 
       legend = bquote(R^2 == .(format(summary(m)$adj.r.squared, digits = 4))))

plot(pks_f$ALKANE_LADDER$Intensity$height ~ alkanes$Height[-1], pch = 20,
     xlab = "Area (LabSolutions)", ylab = "Area (chromatographR)", 
     main = "Height")
m <- lm(pks_f$ALKANE_LADDER$Intensity$height ~ alkanes$Height[-1])
abline(m)
legend("bottomright", bty = "n", 
       legend = bquote(R^2 == .(format(summary(m)$adj.r.squared, digits = 4))))

Reassuringly, the peak areas and heights estimated by chromatographR are very similar to the results provided by LabSolutions.

Alignment of chromatograms

We can align chromatograms by species using variable dynamic time warping.

warp_dominula <- suppressWarnings(correct_rt(dat.pr[species_idx$dominula], 
                                            alg = "vpdtw", 
                                            what = "corrected.values", 
                                            plot_it = TRUE, verbose = TRUE, 
                                            penalty = 1, maxshift = 100))
#> Selected chromatogram 9 as best reference.


warp_metricus <- suppressWarnings(correct_rt(dat.pr[species_idx$metricus], 
                                            alg = "vpdtw", 
                                            what = "corrected.values", 
                                            plot_it = FALSE, verbose = FALSE, 
                                            penalty = 2, maxshift = 100))

warp_exclamans <- suppressWarnings(correct_rt(dat.pr[species_idx$exclamans],
                                             alg = "vpdtw",
                                             what = "corrected.values", 
                                             plot_it = FALSE, verbose = FALSE, 
                                             penalty = 2, maxshift = 200))

warp_fuscatus <- suppressWarnings(correct_rt(dat.pr[species_idx$fuscatus],
                            alg = "vpdtw", what = "corrected.values",
                            plot_it = FALSE, verbose = FALSE, 
                            penalty = 2, maxshift = 200))
warp_all <- suppressWarnings(correct_rt(dat.pr[-1], alg = "vpdtw",
                       what = "corrected.values", plot_it = FALSE,
                       verbose = FALSE, penalty = 2, maxshift = 200))

As a sanity check, we can compare the single species alignments for each species to the multi-species alignment of the corresponding chromatograms.

par(mfrow=c(3,1))
plot_chroms(warp_all[grep("PMET",names(warp_all))],show_legend = FALSE)
legend("topright", expression(paste(italic("P. metricus"), 
                                    " (multi-species)")), bty = "n")

plot_chroms(warp_metricus, show_legend = FALSE)
legend("topright", expression(paste(italic("P. metricus"), 
                                    " (single-species)")), bty = "n")

plot_chroms(dat.pr[species_idx$metricus], show_legend = FALSE)
legend("topright", expression(paste(italic("P. metricus"), 
                                    " (raw)")), bty = "n")

par(mfrow=c(3,1))
plot_chroms(warp_all[grep("PFUS",names(warp_all))],show_legend = FALSE)
legend("topright", expression(paste(italic("P. fuscatus"), " (multi-species)")),
       bty = "n")

plot_chroms(warp_fuscatus, show_legend = FALSE)
legend("topright", expression(paste(italic("P. fuscatus"), " (single-species)")),
       bty = "n")

plot_chroms(dat.pr[species_idx$fuscatus], show_legend = FALSE)
legend("topright", expression(paste(italic("P. fuscatus"), 
                                    " (raw)")), bty = "n")

par(mfrow=c(3,1))
plot_chroms(warp_all[grep("PDOM", names(warp_all))],show_legend = FALSE)
legend("topright", expression(paste(italic("P. dominula"), " (multi-species)")), bty="n")

plot_chroms(warp_dominula, show_legend = FALSE)
legend("topright", expression(paste(italic("P. dominula"), " (single-species)")), bty="n")

plot_chroms(dat.pr[species_idx$dominula], show_legend = FALSE)
legend("topright", expression(paste(italic("P. dominula"), 
                                    " (raw)")), bty="n")

par(mfrow = c(3,1))
plot_chroms(warp_all[grep("PEXC", names(warp_all))],show_legend = FALSE)
legend("topright", expression(paste(italic("P. exclamans"), " (multi-species)")),
       bty = "n")

plot_chroms(warp_exclamans, show_legend = FALSE)
legend("topright", expression(paste(italic("P. exclamans"), " (single-species)")),
       bty = "n")

plot_chroms(dat.pr[species_idx$exclamans], show_legend = FALSE)
legend("topright", expression(paste(italic("P. exclamans"), 
                                    " (raw)")), bty = "n")

In general, the results are quite similar between multispecies and single species alignments.

Integration and peaktable assembly

pks <- get_peaks(warp_all)
pktab <- get_peaktable(pks)
#> Warning in cbind(lambda = rep(as.numeric(names(peak_list[[1]])[comp]), ncl), :
#> NAs introduced by coercion
pktab <- attach_metadata(pktab, metadata = meta, column = "SAMPLE_ID")

Analysis of peaktable

m <- vegan::adonis2(pktab$tab ~ POLISTES_SPECIES + STATE + SEX + LAT + LON,
               data = pktab$sample_meta, method = "manhattan",
               na.action = na.omit)
m
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#> 
#> vegan::adonis2(formula = pktab$tab ~ POLISTES_SPECIES + STATE + SEX + LAT + LON, data = pktab$sample_meta, method = "manhattan", na.action = na.omit)
#>           Df   SumOfSqs      R2      F Pr(>F)    
#> Model     13 5.8841e+11 0.51582 8.6048  0.001 ***
#> Residual 105 5.5231e+11 0.48418                  
#> Total    118 1.1407e+12 1.00000                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As one would expect, the permanova shows that species is the largest contributor to the variance in CHC profiles (R2 = 0.52), followed by state of origin (R2 = 0.48) and sex (R2 = 1).

# ggordiplot function (modified from https://github.com/jfq3/ggordiplots/blob/master/R/gg_ordiplot.R) with added shape argument
gg_ordiplot <- function (ord, groups, shape = NULL, scaling = 1, choices = c(1, 2), 
                         kind = c("sd", "se", "ehull"), conf = NULL, 
                         show.groups = "all", ellipse = TRUE, 
                         label = FALSE, hull = FALSE, spiders = FALSE, 
                         pt.size = 3, plot = TRUE) {
  x <- y <- cntr.x <- cntr.y <- Group <- NULL
  groups <- as.factor(groups)
  if (show.groups[1] == "all") {
    show.groups <- as.vector(levels(groups))
  }
  df_ord <- vegan::scores(ord, display = "sites", scaling = scaling, 
                          choices = choices)
  axis.labels <- ggordiplots::ord_labels(ord)[choices]
  df_ord <- data.frame(x = df_ord[, 1], y = df_ord[, 2], Group = groups)
  if (!is.null(shape)){
    df_ord <- data.frame(df_ord, shape = shape)
  }
  df_mean.ord <- aggregate(df_ord[, 1:2], by = list(df_ord$Group), 
                           mean)
  colnames(df_mean.ord) <- c("Group", "x", "y")
  df_mean.ord <- df_mean.ord[df_mean.ord$Group %in% show.groups, 
  ]
  if (is.null(conf)) {
    rslt <- vegan::ordiellipse(ord, groups = groups, display = "sites", 
                               scaling = scaling, choices = choices, 
                               kind = kind, show.groups = show.groups,
                               draw = "none", label = label)
  } else {
    rslt <- vegan::ordiellipse(ord, groups = groups, display = "sites", 
                               scaling = scaling, choices = choices, 
                               kind = kind, show.groups = show.groups,
                               draw = "none", conf = conf, 
                               label = label)
  }
  df_ellipse <- data.frame()
  for (g in show.groups) {
    df_ellipse <- rbind(df_ellipse, 
                        cbind(as.data.frame(with(df_ord[df_ord$Group == g, ],
                      vegan:::veganCovEllipse(rslt[[g]]$cov, rslt[[g]]$center,
                                              rslt[[g]]$scale))), Group = g))
  }
  colnames(df_ellipse) <- c("x", "y", "Group")
  df_ellipse <- df_ellipse[, c(3, 1, 2)]
  rslt.hull <- vegan::ordihull(ord, groups = groups, scaling = scaling, 
                               choices = choices, show.groups = show.groups,
                               draw = "none")
  df_hull <- data.frame()
  df_temp <- data.frame()
  for (g in show.groups) {
    x <- rslt.hull[[g]][, 1]
    y <- rslt.hull[[g]][, 2]
    Group <- rep(g, length(x))
    df_temp <- data.frame(Group = Group, x = x, y = y)
    df_hull <- rbind(df_hull, df_temp)
  }
  df_spiders <- df_ord
  df_spiders$cntr.x <- NA
  df_spiders$cntr.y <- NA
  for (g in show.groups) {
    df_spiders[which(df_spiders$Group == g), 4:5] <- 
      df_mean.ord[which(df_mean.ord == g), 2:3]
  }
  df_spiders <- df_spiders[, c(3, 4, 5, 1, 2)]
  df_spiders <- df_spiders[order(df_spiders$Group), ]
  df_spiders <- df_spiders[df_spiders$Group %in% show.groups, 
  ]
  xlab <- axis.labels[1]
  ylab <- axis.labels[2]
  plt <- ggplot2::ggplot() + geom_point(data = df_ord, aes(x = x, 
                                                           y = y, 
                                                           color = Group, 
                                                           shape = shape), 
                                        size = pt.size) + 
    xlab(xlab) + 
    ylab(ylab)
  if (ellipse == TRUE) {
    plt <- plt + geom_path(data = df_ellipse, aes(x = x, 
                                                  y = y, color = Group),
                           show.legend = FALSE)
  }
  if (label == TRUE) {
    plt <- plt + geom_text(data = df_mean.ord, 
                           aes(x = x, y = y, label = Group, color = Group),
                           show.legend = FALSE)
  }
  if (hull == TRUE) {
    plt <- plt + geom_path(data = df_hull, 
                           aes(x = x, y = y, color = Group), 
                           show.legend = FALSE)
  }
  if (spiders == TRUE) {
    plt <- plt + geom_segment(data = df_spiders, 
                              aes(x = cntr.x, xend = x, y = cntr.y, yend = y,
                                  color = Group), show.legend = FALSE)
  }
  plt <- plt + coord_fixed(ratio = 1)
  if (plot) {
    print(plt)
  }
  invisible(list(df_ord = df_ord, df_mean.ord = df_mean.ord, 
                 df_ellipse = df_ellipse, df_hull = df_hull,
                 df_spiders = df_spiders, 
                 plot = plt))
}
ord <- vegan::rda(pktab$tab, scale = TRUE)
pktab$sample_meta$SEX_SP <- interaction(pktab$sample_meta$SEX,
                                      pktab$sample_meta$POLISTES_SPECIES)

gg_ordiplot(ord, groups = pktab$sample_meta[,"POLISTES_SPECIES"],
            shape = pktab$sample_meta[,"SEX"], plot = FALSE)$plot +
  scale_shape_manual(values = c(19, 21), name = "Sex") +
  labs(colour = "Species")

Principal componenets analysis shows decent separation between species, with P. dominula, P. fuscatus and P. dominula separating along PC2 and P. exclamans separating from the other species along PC1.

We can get even better separation if we break down our data by sex, though there is considerable intraspecific variation.

cond <- which(pktab$sample_meta$SEX == "M")
ord_m <- vegan::rda(pktab$tab[cond,], scale = TRUE)

gg_ordiplot(ord_m, groups = pktab$sample_meta[cond,"POLISTES_SPECIES"], 
            plot = FALSE)$plot + 
  ggtitle("Males") +
  theme_classic() +
  theme(legend.position="bottom", plot.title = element_text(hjust = 0.5)) +
  labs(colour = "Species")


cond <- which(pktab$sample_meta$SEX == "F")
ord_f <- vegan::rda(pktab$tab[cond,],scale=TRUE)

gg_ordiplot(ord_f, groups = pktab$sample_meta[cond,"POLISTES_SPECIES"],
                         plot = FALSE)$plot + 
  ggtitle("Females") + 
  theme_classic() +
  theme(legend.position="bottom", plot.title = element_text(hjust = 0.5)) +
  labs(colour = "Species")

References

Bruschini, Claudia, Rita Cervo, Alessandro Cini, Giuseppe Pieraccini, Luigi Pontieri, Lisa Signorotti, and Stefano Turillazzi. 2011. “Cuticular Hydrocarbons Rather Than Peptides Are Responsible for Nestmate Recognition in Polistes Dominulus.” Chemical Senses 36 (8): 715–23. https://doi.org/10.1093/chemse/bjr042.
Cook, Bruce. 2022. “Northern Paper Wasp (Polistes Fuscatus).” iNaturalist; iNaturalist. August 20, 2022. https://www.inaturalist.org/observations/131629083.
Drabik-Hamshare, Martyn. 2022. “Metric Paper Wasp (Polistes Metricus).” iNaturalist; iNaturalist. August 23, 2022. https://www.inaturalist.org/observations/132080545.
Gamboa, G. J., H. K. Reeve, and D. W. Pfennig. 1986. “The Evolution and Ontogeny of Nestmate Recognition in Social Wasps.” Annual Review of Entomology 31 (January): 431–54. https://doi.org/10.1146/annurev.en.31.010186.002243.
Gamboa, George J., Thaddeus A. Grudzien, Karl E. Espelie, and Elizabeth A. Bura. 1996. “Kin Recognition Pheromones in Social Wasps: Combining Chemical and Behavioural Evidence.” Animal Behaviour 51 (3): 625–29. https://doi.org/10.1006/anbe.1996.0067.
Jandt, J. M., E. A. Tibbetts, and A. L. Toth. 2014. “Polistes Paper Wasps: A Model Genus for the Study of Social Dominance Hierarchies.” Insectes Sociaux 61 (1): 11–27. https://doi.org/10.1007/s00040-013-0328-0.
Kranz, Adam. 2020. “European Paper Wasp (Polistes Dominula).” iNaturalist; iNaturalist. July 29, 2020. https://www.inaturalist.org/observations/54801686.
Legan, Andrew W. 2022. “Molecular and Chemical Basis of Social Olfaction in Polistes Paper Wasps.” https://doi.org/10.7298/CZ0P-6J74.
Legan, Andrew W, Christopher M Jernigan, Sara E Miller, Matthieu F Fuchs, and Michael J Sheehan. 2021. “Expansion and Accelerated Evolution of 9-Exon Odorant Receptors in Polistes Paper Wasps.” Molecular Biology and Evolution 38 (9): 3832–46. https://doi.org/10.1093/molbev/msab023.
Legan, Andrew, Arthur Chen, Matthieu Fuchs, and Michael Sheehan. 2022. “119 Chromatograms (GC-FID) of the Cuticular Hydrocarbons of Four Polistes Paper Wasp Species (Hymenoptera: Vespidae: Polistinae).” Dryad. https://doi.org/10.5061/DRYAD.WPZGMSBR8.
Reed, H. C., and P. J. Landolt. 1990. “Sex Attraction in Paper Wasp,Polistes Exclamans Viereck (Hymenoptera: Vespidae), in a Wind Tunnel.” Journal of Chemical Ecology 16 (4): 1277–87. https://doi.org/10.1007/BF01021026.

Session Information

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.5.1        vegan_2.6-8          lattice_0.22-6      
#> [4] permute_0.9-7        rdryad_1.0.0         chromatographR_0.7.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6          xfun_0.49             caTools_1.18.3       
#>  [4] vctrs_0.6.5           tools_4.4.2           bitops_1.0-9         
#>  [7] generics_0.1.3        curl_6.0.1            parallel_4.4.2       
#> [10] tibble_3.2.1          fansi_1.0.6           cluster_2.1.6        
#> [13] pkgconfig_2.0.3       Matrix_1.7-1          readxl_1.4.3         
#> [16] lifecycle_1.0.4       farver_2.1.2          compiler_4.4.2       
#> [19] stringr_1.5.1         ptw_1.9-16            munsell_0.5.1        
#> [22] RcppDE_0.1.7          minpack.lm_1.2-4      htmltools_0.5.8.1    
#> [25] yaml_2.3.10           Formula_1.2-5         pillar_1.9.0         
#> [28] tidyr_1.3.1           MASS_7.3-61           nlme_3.1-166         
#> [31] mime_0.12             tidyselect_1.2.1      zip_2.3.1            
#> [34] digest_0.6.37         stringi_1.8.4         dplyr_1.1.4          
#> [37] purrr_1.0.2           labeling_0.4.3        VPdtw_2.2.1          
#> [40] splines_4.4.2         fastmap_1.2.0         grid_4.4.2           
#> [43] colorspace_2.1-1      cli_3.6.3             chromConverter_0.2.1 
#> [46] magrittr_2.0.3        triebeard_0.4.1       crul_1.5.0           
#> [49] dynamicTreeCut_1.63-1 utf8_1.2.4            withr_3.0.2          
#> [52] ggordiplots_0.4.3     scales_1.3.0          rappdirs_0.3.3       
#> [55] rmarkdown_2.29        reticulate_1.40.0     cellranger_1.1.0     
#> [58] fastcluster_1.2.6     png_0.1-8             pbapply_1.7-2        
#> [61] evaluate_1.0.1        knitr_1.49            hoardr_0.5.4         
#> [64] mgcv_1.9-1            urltools_1.7.3        rlang_1.1.4          
#> [67] Rcpp_1.0.13-1         glue_1.8.0            httpcode_0.3.0       
#> [70] xml2_1.3.6            jsonlite_1.8.9        R6_2.5.1