Using Bioconductor with High Throughput Assays

Bioconductor includes packages for the analysis of diverse high-throughput assays, such as flow cytometry, quantitative real-time PCR, mass spectrometry, proteomics and other cell-based data.

1 Sample Workflow

The following code illustrates a typical R / Bioconductor session. It uses the flow cytometry packages to load, transform and visualize flow data, and to gate certain populations in the dataset.

The workflow loads the flowCore, flowStats and flowViz packages and their dependencies. It then loads the ITN data, a set of 15 samples, each of which includes, in addition to FSC and SSC, 5 fluorescence channels: CD3, CD4, CD8, CD69 and HLADr.

## Load packages
library(flowCore)
library(flowStats)
library(flowViz) # for flow data visualization

## Load data
data(ITN)
ITN
## A flowSet with 15 experiments.
## 
## An object of class 'AnnotatedDataFrame'
##   rowNames: sample01 sample02 ... sample15 (15 total)
##   varLabels: GroupID SiteCode ... name (7 total)
##   varMetadata: labelDescription
## 
##   column names:
##   FSC SSC CD8 CD69 CD4 CD3 HLADr Time

First, we need to transform all the fluorescence channels. Using a workFlow object can help to keep track of our progress.

## Create a workflow instance and transform data using asinh
wf <- workFlow(ITN)
## Warning: 'workFlow' is deprecated.
## Use 'flowWorkspace::GatingSet' instead.
## See help("Deprecated")
asinh <- arcsinhTransform()
tl <- transformList(colnames(ITN)[3:7], asinh, 
                      transformationId = "asinh")
add(wf, tl)
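
To see why an arcsinh transformation suits fluorescence data, the base R sketch below applies the same functional form, asinh(a + b * x) + c, to a few made-up intensities. The defaults a = 1, b = 1, c = 0 are assumed to match those of arcsinhTransform() (check ?arcsinhTransform for your version), and the helper name arcsinh_sketch and the input values are hypothetical.

```r
# Minimal base R sketch of an arcsinh transformation.
# Assumption: same form and defaults as flowCore's arcsinhTransform(),
# i.e. asinh(a + b * x) + c with a = 1, b = 1, c = 0.
arcsinh_sketch <- function(x, a = 1, b = 1, c = 0) {
  asinh(a + b * x) + c
}

# Hypothetical raw intensities spanning four orders of magnitude
raw <- c(0, 10, 100, 1000, 10000)
transformed <- arcsinh_sketch(raw)

# Differences between consecutive transformed values: for large inputs,
# each tenfold increase adds roughly log(10) on the transformed scale.
round(diff(transformed), 2)
```

Unlike a plain logarithm, the function is defined at zero and for negative values, which can occur after compensation.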

Next we use the lymphGate function to find the T-cells in the CD3/SSC projection.

## Identify the T-cell population
lg <- lymphGate(Data(wf[["asinh"]]), channels=c("SSC", "CD3"),
         preselection="CD4", filterId="TCells", eval=FALSE,
         scale=2.5)
add(wf, lg$n2gate, parent="asinh")
print(xyplot(SSC ~ CD3| PatientID, wf[["TCells+"]],
             par.settings=list(gate=list(col="red", 
             fill="red", alpha=0.3))))
## Note: method with signature 'filter#missing' chosen for function 'glpolygon',
##  target signature 'logicalFilterResult#missing'.
##  "filterResult#ANY" would also be valid
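
The idea behind a data-driven elliptical gate of this kind can be illustrated in base R: preselect a rough region, estimate a mean and covariance there, and keep the events that fall inside an ellipse of a chosen scale. This is a simplified sketch on simulated data, not the implementation used by lymphGate (the flowStats fit is understood to rely on robust covariance estimates, whereas plain cov() is used here). The data, the preselection bounds and all variable names are assumptions made for illustration.

```r
set.seed(1)

# Simulated two-channel events (hypothetical, NOT the ITN data):
# a tight cluster plus diffuse background.
cluster    <- cbind(CD3 = rnorm(500, mean = 5, sd = 0.3),
                    SSC = rnorm(500, mean = 2, sd = 0.3))
background <- cbind(CD3 = runif(500, 0, 8),
                    SSC = runif(500, 0, 8))
events <- rbind(cluster, background)

# Step 1: crude rectangular preselection around the expected cluster
pre <- events[, "CD3"] > 4 & events[, "CD3"] < 6 &
       events[, "SSC"] > 1 & events[, "SSC"] < 3

# Step 2: estimate location and spread from the preselected events
mu <- colMeans(events[pre, ])
S  <- cov(events[pre, ])

# Step 3: keep events within 'scale' standard deviations of the center,
# measured as Mahalanobis distance (an ellipse in the CD3/SSC plane)
scale  <- 2.5
inGate <- mahalanobis(events, center = mu, cov = S) <= scale^2

# Fraction of all events that fall inside the gate
mean(inGate)
```

A larger scale widens the ellipse and admits more events, which is the trade-off the scale argument controls in this sketch.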

A typical workflow for flow cytometry data analysis with the Bioconductor flow packages includes data transformation, normalization, filtering, manual gating, semi-automatic gating and, if desired, automatic clustering. Details can be found in flowWorkFlow.pdf or in the vignettes of the flow cytometry packages.

2 Installation and Use

Follow the installation instructions to start using these packages. To install the flowCore package and all of its dependencies, evaluate the commands

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("flowCore")

Package installation is required only once per R installation. View a full list of available packages.
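
The biocLite.R script above matches the R 3.4 session recorded at the end of this document. On R 3.5 or later, Bioconductor packages are installed through the BiocManager package from CRAN instead. A minimal sketch of the equivalent commands, assuming a working internet connection and a writable library:

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install flowCore and its dependencies from Bioconductor
BiocManager::install("flowCore")
```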

To use the flowCore package, evaluate the command

library("flowCore")

This instruction is required once in each R session.

3 Exploring Package Content

Packages have extensive help pages, and include vignettes highlighting common use cases. The help pages and vignettes are available from within R. After loading a package, use syntax like

help(package="flowCore")
?read.FCS

to obtain an overview of help on the flowCore package, and the read.FCS function, and

browseVignettes(package="flowCore")

to view vignettes (providing a more comprehensive introduction to package functionality) in the flowCore package. Use

help.start()

to open a web page containing comprehensive help resources.
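
The vignette listing can also be obtained as a data frame, which is convenient in non-interactive sessions. The sketch below wraps the base R function vignette(); the helper name list_vignettes is hypothetical, and the demonstration uses the grid package only because it ships with every R installation. Substitute "flowCore" once that package is installed.

```r
# Return the vignettes of an installed package as a data frame.
# 'list_vignettes' is an illustrative helper, not part of any package.
list_vignettes <- function(pkg) {
  res <- vignette(package = pkg)$results
  data.frame(name  = res[, "Item"],
             title = res[, "Title"],
             stringsAsFactors = FALSE,
             row.names = NULL)
}

# Demonstrated on a base package; use "flowCore" after installing it
v <- list_vignettes("grid")
v
```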

4 Diverse Assays Resources

The following sections give a brief overview of packages useful for the analysis of high-throughput assays. More comprehensive workflows can be found in the documentation (available from the package descriptions) and in Bioconductor publications.

4.1 Flow Cytometry

These packages work with standard FCS files and provide infrastructure, utilities, visualization and semi-automated gating methods for the analysis of flow cytometry data.

flowCore, flowViz, flowQ, flowStats, flowUtils, flowFP, flowTrans

Algorithms for clustering flow cytometry data are found in these packages:

flowClust, flowMeans, flowMerge, SamSPECTRAL

A typical workflow using the packages flowCore, flowViz, flowQ and flowStats is described in detail in flowWorkFlow.pdf. The data files used in the workflow can be downloaded from here.

4.2 Cell-based Assays

These packages provide data structures and algorithms for cell-based high-throughput screens (HTS).

cellHTS2, RNAither

This package supports the xCELLigence system, which comprises a series of real-time cell analyzers (RTCA).

RTCA

4.3 High-throughput qPCR Assays

These packages provide algorithms for the analysis of cycle threshold (Ct) values from quantitative real-time PCR data.

HTqPCR, ddCt, qpcrNorm
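
As an illustration of what a Ct-based analysis computes, the base R sketch below works through the delta-delta-Ct calculation for relative quantification by hand. It is not the API of any of the packages listed above. The Ct values are hypothetical, and the calculation assumes perfect amplification efficiency (a doubling of product per cycle).

```r
# Hypothetical mean Ct values (not taken from any real experiment)
ct <- data.frame(
  sample    = c("control", "treated"),
  target    = c(24.0, 22.0),   # gene of interest
  reference = c(18.0, 18.0),   # housekeeping gene
  stringsAsFactors = FALSE
)

# Delta Ct: normalize the target gene to the reference gene per sample
ct$dCt <- ct$target - ct$reference

# Delta-delta Ct: compare the treated sample with the control
ddCt <- ct$dCt[ct$sample == "treated"] - ct$dCt[ct$sample == "control"]

# Relative expression, assuming the product doubles every cycle
fold_change <- 2^(-ddCt)
fold_change  # 4
```

A lower Ct means the signal crossed the threshold earlier, so the negative ddCt here corresponds to higher expression in the treated sample.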

4.4 Mass Spectrometry and Proteomics Data

These packages provide frameworks for the processing, visualization and statistical analysis of mass spectral and proteomics data.

clippda, MassArray, MassSpecWavelet, PROcess, flagme, xcms

4.5 Imaging Based Assays

These packages provide infrastructure for image-based phenotyping and automation of other image-related tasks:

EBImage, imageHTS

sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 9 (stretch)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] splines   methods   stats     graphics  grDevices utils     datasets 
## [8] base     
## 
## other attached packages:
##  [1] flowViz_1.40.0            lattice_0.20-35          
##  [3] flowStats_3.34.0          flowWorkspace_3.24.4     
##  [5] ncdfFlow_2.22.0           BH_1.62.0-1              
##  [7] RcppArmadillo_0.7.900.2.0 cluster_2.0.6            
##  [9] fda_2.4.4                 Matrix_1.2-10            
## [11] flowCore_1.42.2           shiny_1.0.3              
## [13] rmarkdown_1.6             knitr_1.16               
## 
## loaded via a namespace (and not attached):
##  [1] Biobase_2.36.2      jsonlite_1.5        assertthat_0.2.0   
##  [4] highr_0.6           stats4_3.4.1        latticeExtra_0.6-28
##  [7] yaml_2.1.14         robustbase_0.92-7   backports_1.1.0    
## [10] glue_1.1.1          digest_0.6.12       RColorBrewer_1.1-2 
## [13] colorspace_1.3-2    htmltools_0.3.6     httpuv_1.3.5       
## [16] plyr_1.8.4          multicool_0.1-10    pcaPP_1.9-72       
## [19] XML_3.98-1.9        pkgconfig_2.0.1     misc3d_0.8-4       
## [22] questionr_0.6.1     bookdown_0.4        zlibbioc_1.22.0    
## [25] xtable_1.8-2        corpcor_1.6.9       mvtnorm_1.0-6      
## [28] scales_0.4.1        tibble_1.3.3        BiocGenerics_0.22.0
## [31] hexbin_1.27.1       magrittr_1.5        IDPmisc_1.1.17     
## [34] mime_0.5            evaluate_0.10.1     ks_1.10.7          
## [37] MASS_7.3-47         FNN_1.1             graph_1.54.0       
## [40] tools_3.4.1         data.table_1.10.4   matrixStats_0.52.2 
## [43] stringr_1.2.0       munsell_0.4.3       bindrcpp_0.2       
## [46] compiler_3.4.1      rlang_0.1.1         grid_3.4.1         
## [49] rstudioapi_0.6      htmlwidgets_0.9     miniUI_0.1.1       
## [52] gtable_0.2.0        codetools_0.2-15    rrcov_1.4-3        
## [55] R6_2.2.2            gridExtra_2.2.1     dplyr_0.7.1        
## [58] bindr_0.1           rprojroot_1.2       KernSmooth_2.23-15 
## [61] Rgraphviz_2.20.0    stringi_1.1.5       parallel_3.4.1     
## [64] rmdformats_0.3.3    Rcpp_0.12.11        DEoptimR_1.0-8