Analysis of Polistes cuticular hydrocarbons with GC-FID • chromatographR

library(chromatographR)
library(rdryad)
library(vegan)
#> Loading required package: permute
library(ggplot2)

Introduction

While the basic workflow is similar for analyzing different types of chromatographic data with chromatographR, there are a few important differences when analyzing GC-FID data (or other 1-dimensional data). This vignette highlights some important considerations when using chromatographR to analyze GC-FID data (or other 2D chromatography data).

To demonstrate the use of chromatographR for the analysis GC-FID data, we will analyze a dataset on cuticular hydrocarbons (CHCs) composition in four species of paper wasps (Polistes spp.) collected by Dr. Andrew Legan (Andrew W. Legan 2022; A. Legan et al. 2022). The dataset includes CHCs collected from Polistes dominula (european paper wasp), Polistes exclamans (common paper wasp), P. fuscatus (northern paper wasp), and P. metricus (metric paper wasp). As in other social insects, cuticular hydrocarbons serve as chemical cues mediating a number of social behaviors in paper wasps (Andrew W. Legan et al. 2021), including nestmate recognition (G. J. Gamboa, Reeve, and Pfennig 1986; George J. Gamboa et al. 1996; Bruschini et al. 2011), establishment of social hierarchies (Jandt, Tibbetts, and Toth 2014) and mate choice (Reed and Landolt 1990).

is_github_actions <- Sys.getenv("GITHUB_ACTIONS") == "true"

if (is_github_actions){
  # Load files from GitHub (https://github.com/ethanbass/10_5061_dryad_wpzgmsbr8/releases/tag/v1.0)
  tmp <- tempfile(fileext = ".zip")
  curl::curl_download("https://github.com/ethanbass/10_5061_dryad_wpzgmsbr8/archive/refs/tags/v1.0.zip", destfile = tmp)
  unzip(tmp, exdir = "./data/")
  files <- list.files("./data/10_5061_dryad_wpzgmsbr8-1.0", full.names = TRUE)
} else{
  # Load files from Dryad
  files <- rdryad::dryad_download("10.5061/dryad.wpzgmsbr8")[[1]]
}

chrom_paths <- grep("README|METADATA|ANNOTATED", files, invert = TRUE, 
                    value = TRUE)
dat <- read_chroms(chrom_paths, format_in = "shimadzu_fid")

path_to_metadata <- grep("METADATA", files, value = TRUE)
meta <- read.csv(path_to_metadata, sep = "\t")

In order to get a good alignment, it is helpful to remove the solvent peak. To accomplish this, we can use the preprocess function to remove the first four minutes of the chromatogram. We also reduce the time-axis resolution (from 40 ms to 300 ms) since it drastically reduces computation time while seemingly having little effect on the accuracy of the integration results.

dat.pr <- preprocess(dat, spec.smooth = FALSE, dim1 = seq(from = 4, to = 59.9, by = .005), # .005 minutes = 300 ms
                     cl = 1)
names(dat.pr) <- gsub(".txt", "", names(dat.pr))

We can then group the chromatograms by species for further analysis.

species <- c(dominula = "DOMINULA", exclamans.= "EXCLAMANS", 
             fuscatus = "FUSCATUS", metricus = "METRICUS")

species_idx <- lapply(species, function(sp){
  which(names(dat.pr) %in% meta[which(meta$POLISTES_SPECIES == sp),
                                     "SAMPLE_ID"])
})

We can extract the peaks from the alkane ladder using the get_peaks function and plot the integrated peak areas. To eliminate small peaks that are not part of the alkane ladder, we use the filter_peaks function to remove features that do not have an amplitude of at least 10^4 response units.

ladder <- grep("LADDER", names(dat.pr))
pks <- get_peaks(dat.pr[ladder], time.units = "s")
plot(pks, chrom_list = dat.pr[ladder])

pks_f <- filter_peaks(pks, min_height = 10000)

In this case, we also have integration results from ‘Shimadzu LabSolutions’, so we can check the integration results provided by chromatographR against the results from LabSolutions.

path_to_annotated_alkanes <- grep("ANNOTATED", files[[1]], value = TRUE)
alkanes <- read.csv(path_to_annotated_alkanes, sep="\t")

# check equality of retention times
# all.equal(alkanes$RT[-1],pks_f$ALKANE_LADDER$Intensity$rt)

par(mfrow=c(1,2))

plot(pks_f$ALKANE_LADDER$Intensity$area ~ alkanes$Area[-1], pch = 20,
     xlab = "Area (LabSolutions)", ylab = "Area (chromatographR)", 
     main = "Area")
m <- lm(pks_f$ALKANE_LADDER$Intensity$area ~ alkanes$Area[-1])
abline(m)
legend("bottomright", bty = "n", 
       legend = bquote(R^2 == .(format(summary(m)$adj.r.squared, digits = 4))))

plot(pks_f$ALKANE_LADDER$Intensity$height ~ alkanes$Height[-1], pch = 20,
     xlab = "Area (LabSolutions)", ylab = "Area (chromatographR)", 
     main = "Height")
m <- lm(pks_f$ALKANE_LADDER$Intensity$height ~ alkanes$Height[-1])
abline(m)
legend("bottomright", bty = "n", 
       legend = bquote(R^2 == .(format(summary(m)$adj.r.squared, digits = 4))))

Reassuringly, the peak areas and heights estimated by chromatographR are very similar to the results provided by LabSolutions.

Alignment of chromatograms

We can align chromatograms by species using variable dynamic time warping.

warp_dominula <- suppressWarnings(correct_rt(dat.pr[species_idx$dominula], 
                                            alg = "vpdtw", 
                                            what = "corrected.values", 
                                            plot_it = TRUE, verbose = TRUE, 
                                            penalty = 1, maxshift = 100))
#> Selected chromatogram 9 as best reference.


warp_metricus <- suppressWarnings(correct_rt(dat.pr[species_idx$metricus], 
                                            alg = "vpdtw", 
                                            what = "corrected.values", 
                                            plot_it = FALSE, verbose = FALSE, 
                                            penalty = 2, maxshift = 100))

warp_exclamans <- suppressWarnings(correct_rt(dat.pr[species_idx$exclamans],
                                             alg = "vpdtw",
                                             what = "corrected.values", 
                                             plot_it = FALSE, verbose = FALSE, 
                                             penalty = 2, maxshift = 200))

warp_fuscatus <- suppressWarnings(correct_rt(dat.pr[species_idx$fuscatus],
                            alg = "vpdtw", what = "corrected.values",
                            plot_it = FALSE, verbose = FALSE, 
                            penalty = 2, maxshift = 200))

warp_all <- suppressWarnings(correct_rt(dat.pr[-1], alg = "vpdtw",
                       what = "corrected.values", plot_it = FALSE,
                       verbose = FALSE, penalty = 2, maxshift = 200))

As a sanity check, we can compare the single species alignments for each species to the multi-species alignment of the corresponding chromatograms.

Utility function to format text in base R plots with italics.

format_italics <- function(str) {
  spl <- strsplit(str, "\\*")[[1]]
  non_empty <- spl[spl != ""]
  bquote(paste(italic(.(non_empty[1])), .(non_empty[2])))
}

par(mfrow=c(3,1))
plot_chroms_heatmap(dat.pr[species_idx$metricus], show_legend = FALSE, 
                    title = format_italics("*P. metricus* (raw data)"), xlim=c(25, 50))

plot_chroms_heatmap(warp_metricus, show_legend = FALSE, 
                    title = format_italics("*P. metricus* (single-species alignment)"),
                    xlim=c(25, 50))

plot_chroms_heatmap(warp_all[grep("PMET", names(warp_all))], show_legend = FALSE,
                    title = format_italics("*P. metricus* (multi-species alignment)"),
                    xlim=c(25, 50))

par(mfrow=c(3,1))
plot_chroms_heatmap(dat.pr[species_idx$fuscatus], show_legend = FALSE, 
                    title = format_italics("*P. fuscatus* (raw data)"),
                    xlim=c(20,45))

plot_chroms_heatmap(warp_fuscatus, show_legend = FALSE, 
                    title = format_italics("*P. fuscatus* (single-species alignment)"),
                    xlim=c(20,45))

plot_chroms_heatmap(warp_all[grep("PFUS", names(warp_all))], show_legend = FALSE,
                    title = format_italics("*P. fuscatus* (multi-species alignment)"),
                    xlim=c(20,45))

par(mfrow=c(3,1))
plot_chroms_heatmap(dat.pr[species_idx$dominula], show_legend = FALSE, 
                    title = format_italics("*P. dominula* (raw data)"),
                    xlim=c(25,45))

plot_chroms_heatmap(warp_dominula, show_legend = FALSE, 
                    title = format_italics("*P. dominula* (single-species alignment)"),
                    xlim=c(25,45))

plot_chroms_heatmap(warp_all[grep("PDOM", names(warp_all))], show_legend = FALSE,
                    title = format_italics("*P. dominula* (multi-species alignment)"),
                    xlim=c(25,45))

par(mfrow = c(3,1))
plot_chroms_heatmap(dat.pr[species_idx$exclamans], show_legend = FALSE, 
                    title = format_italics("*P. exclamans* (raw data)"),
                    xlim = c(30,40))

plot_chroms_heatmap(warp_exclamans, show_legend = FALSE, 
                    title = format_italics("*P. exclamans* (single-species alignment)"),
                    xlim = c(30,40))

plot_chroms_heatmap(warp_all[grep("PEXC", names(warp_all))], show_legend = FALSE,
                    title = format_italics("*P. exclamans* (multi-species alignment)"),
                    xlim = c(30,40))

In general, the results are quite similar between multi-species and single species alignments, though the single-species alignments are noticeably better in some areas. Since we want to compare CHCs across species, we will proceed with the imperfect multi-species alignments.

Integration and peaktable assembly

pks <- get_peaks(warp_all)
pktab <- get_peaktable(pks)
pktab <- attach_metadata(pktab, metadata = meta, column = "SAMPLE_ID")

Analysis of cuticular hydrocarbon composition

m <- vegan::adonis2(pktab$tab ~ POLISTES_SPECIES + STATE + SEX + LAT + LON,
               data = pktab$sample_meta, method = "manhattan",
               na.action = na.omit, by = "margin")
m
#> Permutation test for adonis under reduced model
#> Marginal effects of terms
#> Permutation: free
#> Number of permutations: 999
#> 
#> vegan::adonis2(formula = pktab$tab ~ POLISTES_SPECIES + STATE + SEX + LAT + LON, data = pktab$sample_meta, method = "manhattan", by = "margin", na.action = na.omit)
#>                   Df   SumOfSqs      R2      F Pr(>F)    
#> POLISTES_SPECIES   3 1.3830e+11 0.12124 8.7638  0.001 ***
#> STATE              7 1.3067e+11 0.11455 3.5489  0.001 ***
#> SEX                1 4.2418e+10 0.03719 8.0641  0.001 ***
#> LAT                1 9.3279e+09 0.00818 1.7733  0.095 .  
#> LON                1 1.5117e+10 0.01325 2.8740  0.013 *  
#> Residual         105 5.5231e+11 0.48418                  
#> Total            118 1.1407e+12 1.00000                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As one would expect, the permanova results shows that species is the largest contributor to the variance in CHC profiles (R²_marginal = 0.12), followed by state of origin (R²_marginal = 0.11) and sex (R²_marginal = 0.04).

Modified ggordiplot function with added shape argument.

# ggordiplot function (modified from https://github.com/jfq3/ggordiplots/blob/master/R/gg_ordiplot.R) with added shape argument

gg_ordiplot <- function (ord, groups, shape = NULL, scaling = 1, choices = c(1, 2), 
                         kind = c("sd", "se", "ehull"), conf = NULL, 
                         show.groups = "all", ellipse = TRUE, 
                         label = FALSE, hull = FALSE, spiders = FALSE, 
                         pt.size = 3, plot = TRUE) {
  x <- y <- cntr.x <- cntr.y <- Group <- NULL
  groups <- as.factor(groups)
  if (show.groups[1] == "all") {
    show.groups <- as.vector(levels(groups))
  }
  df_ord <- vegan::scores(ord, display = "sites", scaling = scaling, 
                          choices = choices)
  axis.labels <- ggordiplots::ord_labels(ord)[choices]
  df_ord <- data.frame(x = df_ord[, 1], y = df_ord[, 2], Group = groups)
  if (!is.null(shape)){
    df_ord <- data.frame(df_ord, shape = shape)
  }
  df_mean.ord <- aggregate(df_ord[, 1:2], by = list(df_ord$Group), 
                           mean)
  colnames(df_mean.ord) <- c("Group", "x", "y")
  df_mean.ord <- df_mean.ord[df_mean.ord$Group %in% show.groups, 
  ]
  if (is.null(conf)) {
    rslt <- vegan::ordiellipse(ord, groups = groups, display = "sites", 
                               scaling = scaling, choices = choices, 
                               kind = kind, show.groups = show.groups,
                               draw = "none", label = label)
  } else {
    rslt <- vegan::ordiellipse(ord, groups = groups, display = "sites", 
                               scaling = scaling, choices = choices, 
                               kind = kind, show.groups = show.groups,
                               draw = "none", conf = conf, 
                               label = label)
  }
  df_ellipse <- data.frame()
  for (g in show.groups) {
    df_ellipse <- rbind(df_ellipse, 
                        cbind(as.data.frame(with(df_ord[df_ord$Group == g, ],
                      vegan:::veganCovEllipse(rslt[[g]]$cov, rslt[[g]]$center,
                                              rslt[[g]]$scale))), Group = g))
  }
  colnames(df_ellipse) <- c("x", "y", "Group")
  df_ellipse <- df_ellipse[, c(3, 1, 2)]
  rslt.hull <- vegan::ordihull(ord, groups = groups, scaling = scaling, 
                               choices = choices, show.groups = show.groups,
                               draw = "none")
  df_hull <- data.frame()
  df_temp <- data.frame()
  for (g in show.groups) {
    x <- rslt.hull[[g]][, 1]
    y <- rslt.hull[[g]][, 2]
    Group <- rep(g, length(x))
    df_temp <- data.frame(Group = Group, x = x, y = y)
    df_hull <- rbind(df_hull, df_temp)
  }
  df_spiders <- df_ord
  df_spiders$cntr.x <- NA
  df_spiders$cntr.y <- NA
  for (g in show.groups) {
    df_spiders[which(df_spiders$Group == g), 4:5] <- 
      df_mean.ord[which(df_mean.ord == g), 2:3]
  }
  df_spiders <- df_spiders[, c(3, 4, 5, 1, 2)]
  df_spiders <- df_spiders[order(df_spiders$Group), ]
  df_spiders <- df_spiders[df_spiders$Group %in% show.groups, 
  ]
  xlab <- axis.labels[1]
  ylab <- axis.labels[2]
  plt <- ggplot2::ggplot() + ggplot2::geom_point(data = df_ord, ggplot2::aes(x = x, 
                                                           y = y, 
                                                           color = Group, 
                                                           shape = shape), 
                                        size = pt.size) + 
    ggplot2::xlab(xlab) + 
    ggplot2::ylab(ylab)
  if (ellipse) {
    plt <- plt + ggplot2::geom_path(data = df_ellipse, ggplot2::aes(x = x, 
                                                  y = y, color = Group),
                           show.legend = FALSE)
  }
  if (label) {
    plt <- plt + ggplot2::geom_text(data = df_mean.ord, 
                           ggplot2::aes(x = x, y = y, label = Group, color = Group),
                           show.legend = FALSE)
  }
  if (hull) {
    plt <- plt + geom_path(data = df_hull, 
                           ggplot2::aes(x = x, y = y, color = Group), 
                           show.legend = FALSE)
  }
  if (spiders) {
    plt <- plt + ggplot2::geom_segment(data = df_spiders, 
                              ggplot2::aes(x = cntr.x, xend = x, y = cntr.y, yend = y,
                                  color = Group), show.legend = FALSE)
  }
  plt <- plt + ggplot2::coord_fixed(ratio = 1)
  if (plot) {
    print(plt)
  }
  invisible(list(df_ord = df_ord, df_mean.ord = df_mean.ord, 
                 df_ellipse = df_ellipse, df_hull = df_hull,
                 df_spiders = df_spiders, 
                 plot = plt))
}

ord <- vegan::rda(pktab$tab, scale = TRUE)
pktab$sample_meta$SEX_SP <- interaction(pktab$sample_meta$SEX,
                                      pktab$sample_meta$POLISTES_SPECIES)

gg_ordiplot(ord, groups = pktab$sample_meta[,"POLISTES_SPECIES"],
            shape = pktab$sample_meta[,"SEX"], plot = FALSE)$plot +
  scale_shape_manual(values = c(19, 21), name = "Sex") +
  labs(colour = "Species") +
  theme_classic()

Principal componenets analysis shows decent separation between species, with P. dominula, P. fuscatus and P. dominula separating along PC2 and P. exclamans separating from the other species along PC1.

We can get even better separation if we break down our data by sex, though there is considerable intraspecific variation.

cond <- which(pktab$sample_meta$SEX == "M")
ord_m <- vegan::rda(pktab$tab[cond,], scale = TRUE)

p_male <- gg_ordiplot(ord_m, groups = pktab$sample_meta[cond, "POLISTES_SPECIES"], 
            plot = FALSE)$plot + 
  ggtitle("Males") +
  theme_classic() +
  theme(legend.position = "bottom", plot.title = element_text(hjust = 0.5)) +
  labs(colour = "Species")

cond <- which(pktab$sample_meta$SEX == "F")
ord_f <- vegan::rda(pktab$tab[cond,],scale=TRUE)

p_female <- gg_ordiplot(ord_f, groups = pktab$sample_meta[cond,"POLISTES_SPECIES"],
                         plot = FALSE)$plot + 
  ggtitle("Females") + 
  theme_classic() +
  theme(legend.position = "bottom", plot.title = element_text(hjust = 0.5)) +
  labs(colour = "Species")

p_male

p_female

References

Bruschini, Claudia, Rita Cervo, Alessandro Cini, Giuseppe Pieraccini, Luigi Pontieri, Lisa Signorotti, and Stefano Turillazzi. 2011. “Cuticular Hydrocarbons Rather Than Peptides Are Responsible for Nestmate Recognition in Polistes Dominulus.” Chemical Senses 36 (8): 715–23. https://doi.org/10.1093/chemse/bjr042.

Cook, Bruce. 2022. “Northern Paper Wasp (Polistes Fuscatus).” iNaturalist; iNaturalist. August 20, 2022. https://www.inaturalist.org/observations/131629083.

Drabik-Hamshare, Martyn. 2022. “Metric Paper Wasp (Polistes Metricus).” iNaturalist; iNaturalist. August 23, 2022. https://www.inaturalist.org/observations/132080545.

Gamboa, G. J., H. K. Reeve, and D. W. Pfennig. 1986. “The Evolution and Ontogeny of Nestmate Recognition in Social Wasps.” Annual Review of Entomology 31 (January): 431–54. https://doi.org/10.1146/annurev.en.31.010186.002243.

Gamboa, George J., Thaddeus A. Grudzien, Karl E. Espelie, and Elizabeth A. Bura. 1996. “Kin Recognition Pheromones in Social Wasps: Combining Chemical and Behavioural Evidence.” Animal Behaviour 51 (3): 625–29. https://doi.org/10.1006/anbe.1996.0067.

Jandt, J. M., E. A. Tibbetts, and A. L. Toth. 2014. “Polistes Paper Wasps: A Model Genus for the Study of Social Dominance Hierarchies.” Insectes Sociaux 61 (1): 11–27. https://doi.org/10.1007/s00040-013-0328-0.

Kranz, Adam. 2020. “European Paper Wasp (Polistes Dominula).” iNaturalist; iNaturalist. July 29, 2020. https://www.inaturalist.org/observations/54801686.

Legan, Andrew W. 2022. “Molecular and Chemical Basis of Social Olfaction in Polistes Paper Wasps.” https://doi.org/10.7298/CZ0P-6J74.

Legan, Andrew W, Christopher M Jernigan, Sara E Miller, Matthieu F Fuchs, and Michael J Sheehan. 2021. “Expansion and Accelerated Evolution of 9-Exon Odorant Receptors in Polistes Paper Wasps.” Molecular Biology and Evolution 38 (9): 3832–46. https://doi.org/10.1093/molbev/msab023.

Legan, Andrew, Arthur Chen, Matthieu Fuchs, and Michael Sheehan. 2022. “119 Chromatograms (GC-FID) of the Cuticular Hydrocarbons of Four Polistes Paper Wasp Species (Hymenoptera: Vespidae: Polistinae).” Dryad. https://doi.org/10.5061/DRYAD.WPZGMSBR8.

Reed, H. C., and P. J. Landolt. 1990. “Sex Attraction in Paper Wasp,Polistes Exclamans Viereck (Hymenoptera: Vespidae), in a Wind Tunnel.” Journal of Chemical Ecology 16 (4): 1277–87. https://doi.org/10.1007/BF01021026.

Session Information

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.5.2        vegan_2.7-1          permute_0.9-8       
#> [4] rdryad_1.0.0         chromatographR_0.7.4
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6          xfun_0.52             caTools_1.18.3       
#>  [4] lattice_0.22-7        vctrs_0.6.5           tools_4.5.1          
#>  [7] bitops_1.0-9          generics_0.1.4        curl_6.4.0           
#> [10] parallel_4.5.1        tibble_3.3.0          cluster_2.1.8.1      
#> [13] pkgconfig_2.0.3       Matrix_1.7-3          data.table_1.17.8    
#> [16] RColorBrewer_1.1-3    readxl_1.4.5          lifecycle_1.0.4      
#> [19] compiler_4.5.1        farver_2.1.2          stringr_1.5.1        
#> [22] ptw_1.9-16            RcppDE_0.1.8          minpack.lm_1.2-4     
#> [25] htmltools_0.5.8.1     yaml_2.3.10           Formula_1.2-5        
#> [28] pillar_1.11.0         MASS_7.3-65           RaMS_1.4.3           
#> [31] nlme_3.1-168          mime_0.13             tidyselect_1.2.1     
#> [34] zip_2.3.3             digest_0.6.37         stringi_1.8.7        
#> [37] dplyr_1.1.4           purrr_1.1.0           labeling_0.4.3       
#> [40] VPdtw_2.2.1           splines_4.5.1         rprojroot_2.1.0      
#> [43] fastmap_1.2.0         grid_4.5.1            here_1.0.1           
#> [46] cli_3.6.5             chromConverter_0.7.5  magrittr_2.0.3       
#> [49] base64enc_0.1-3       crul_1.6.0            dynamicTreeCut_1.63-1
#> [52] withr_3.0.2           ggordiplots_0.4.3     scales_1.4.0         
#> [55] rappdirs_0.3.3        bit64_4.6.0-1         rmarkdown_2.29       
#> [58] bit_4.6.0             reticulate_1.43.0     cellranger_1.1.0     
#> [61] fastcluster_1.3.0     png_0.1-8             pbapply_1.7-4        
#> [64] evaluate_1.0.4        knitr_1.50            hoardr_0.5.5         
#> [67] mgcv_1.9-3            rlang_1.1.6           Rcpp_1.1.0           
#> [70] glue_1.8.0            httpcode_0.3.0        xml2_1.3.8           
#> [73] jsonlite_2.0.0        R6_2.6.1