Using Bioconductor with High Throughput Assays
Bioconductor includes packages for the analysis of diverse high-throughput assays, such as flow cytometry, quantitative real-time PCR (qPCR), mass spectrometry and proteomics, as well as other cell-based assays.
1 Sample Workflow
The following code illustrates a typical R / Bioconductor session. It uses the flow cytometry packages to load, transform and visualize flow data, and to gate selected populations in the dataset.
The workflow loads the flowCore, flowStats and flowViz packages and their dependencies. It then loads the ITN data, a set of 15 samples, each of which includes, in addition to FSC and SSC, five fluorescence channels: CD3, CD4, CD8, CD69 and HLADr.
## Load packages
library(flowCore)
library(flowStats)
library(flowViz) # for flow data visualization
## Load data
data(ITN)
ITN
## A flowSet with 15 experiments.
##
## An object of class 'AnnotatedDataFrame'
## rowNames: sample01 sample02 ... sample15 (15 total)
## varLabels: GroupID SiteCode ... name (7 total)
## varMetadata: labelDescription
##
## column names:
## FSC SSC CD8 CD69 CD4 CD3 HLADr Time
First, we transform all the fluorescence channels. A workFlow object helps keep track of the analysis steps; note that, as the warning below shows, workFlow is deprecated in favor of flowWorkspace::GatingSet.
## Create a workflow instance and transform data using asinh
wf <- workFlow(ITN)
## Warning: 'workFlow' is deprecated.
## Use 'flowWorkspace::GatingSet' instead.
## See help("Deprecated")
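# arcsinh transformation with flowCore defaults (this name masks base::asinh)
asinh <- arcsinhTransform()
# columns 3 to 7 are the fluorescence channels CD8, CD69, CD4, CD3 and HLADr
tl <- transformList(colnames(ITN)[3:7], asinh,
                    transformationId = "asinh")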
add(wf, tl)
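Because workFlow is deprecated, the same transformation can also be applied directly to the flowSet with flowCore's transform method. This is a minimal sketch, assuming a flowCore version in which transform() accepts a flowSet together with a transformList; the object names asinhTrans and ITN_asinh are hypothetical.

```r
# Sketch: transform the ITN flowSet without a workFlow object
library(flowCore)
library(flowStats)   # attached as in the workflow above, before data(ITN)
data(ITN)

# arcsinh transformation with flowCore's default parameters
asinhTrans <- arcsinhTransform(transformationId = "asinh")

# apply it to the five fluorescence channels (columns 3 to 7)
tl <- transformList(colnames(ITN)[3:7], asinhTrans)
ITN_asinh <- transform(ITN, tl)

# the result is still a flowSet with the same samples and channels
ITN_asinh
```

Working on the flowSet directly trades the bookkeeping of a workFlow object for plain R objects; for tracking a gating hierarchy, the warning points to flowWorkspace::GatingSet.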
Next we use the lymphGate function to find the T-cells in the CD3/SSC projection.
## Identify T-cells population
lg <- lymphGate(Data(wf[["asinh"]]), channels=c("SSC", "CD3"),
preselection="CD4", filterId="TCells", eval=FALSE,
scale=2.5)
add(wf, lg$n2gate, parent="asinh")
print(xyplot(SSC ~ CD3| PatientID, wf[["TCells+"]],
par.settings=list(gate=list(col="red",
fill="red", alpha=0.3))))
## Note: method with signature 'filter#missing' chosen for function 'glpolygon',
## target signature 'logicalFilterResult#missing'.
## "filterResult#ANY" would also be valid
A typical flow cytometry analysis with the Bioconductor flow packages includes data transformation, normalization, filtering, manual gating, semi-automatic gating and, if desired, automatic clustering. Details can be found in flowWorkFlow.pdf or in the vignettes of the flow cytometry packages.
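As an illustration of the manual gating step, the following sketch uses flowCore's rectangleGate, filter and Subset on the ITN data. The FSC/SSC boundaries and the gate name FSCSSCgate are hypothetical values chosen for illustration, not thresholds validated for this dataset.

```r
library(flowCore)
library(flowStats)   # attached as in the workflow above, before data(ITN)
data(ITN)

# hypothetical FSC/SSC boundaries; adjust after inspecting the data
rg <- rectangleGate(filterId = "FSCSSCgate",
                    list("FSC" = c(200, 800), "SSC" = c(0, 600)))

# evaluate the gate on every sample and summarize the events inside it
res <- filter(ITN, rg)
summary(res)

# keep only the events inside the gate
ITN_gated <- Subset(ITN, rg)

# number of remaining events per sample
fsApply(ITN_gated, nrow)
```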
2 Installation and Use
Follow installation instructions to start using these packages. To install the flowCore package and all of its dependencies, evaluate the commands
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("flowCore")
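The biocLite() installer above matches the R 3.4 release recorded in the session information at the end of this page. Starting with Bioconductor 3.8 (R 3.5), it was replaced by the BiocManager package, so on current releases a sketch of the equivalent commands is:

```r
# install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# install flowCore and its dependencies from Bioconductor
BiocManager::install("flowCore")
```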
Package installation is required only once per R installation. View a full list of available packages.
To use the flowCore package, evaluate the command
library("flowCore")
This instruction is required once in each R session.
3 Exploring Package Content
Packages have extensive help pages, and include vignettes highlighting common use cases. The help pages and vignettes are available from within R. After loading a package, use syntax like
help(package="flowCore")
?read.FCS
to obtain an overview of help on the flowCore package and on the read.FCS function, and
browseVignettes(package="flowCore")
to view vignettes (which provide a more comprehensive introduction to package functionality) in the flowCore package. Use
help.start()
to open a web page containing comprehensive help resources.
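To try read.FCS, the following sketch reads one of the example FCS files shipped in flowCore's extdata directory. The file name 0877408774.B08 is taken from the flowCore vignette examples and is assumed to still be bundled with the package; any FCS file path works the same way.

```r
library(flowCore)

# path to an example FCS file bundled with flowCore (assumed file name)
fcs_file <- system.file("extdata", "0877408774.B08", package = "flowCore")

# read it as a flowFrame without applying any stored transformation
ff <- read.FCS(fcs_file, transformation = FALSE)

ff               # summary of parameters and event count
colnames(ff)     # channel names
head(exprs(ff))  # raw event-level measurements
```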
4 Diverse Assays Resources
The following sections give a brief overview of packages useful for the analysis of high-throughput assays. More comprehensive workflows can be found in the package documentation (available from the package descriptions) and in Bioconductor publications.
4.1 Flow Cytometry
These packages work with standard FCS files and provide infrastructure, utilities, visualization and semi-automated gating methods for the analysis of flow cytometry data.
flowCore, flowViz, flowQ, flowStats, flowUtils, flowFP, flowTrans,
Algorithms for clustering flow cytometry data are found in these packages:
flowClust, flowMeans, flowMerge, SamSPECTRAL
A typical workflow using the packages flowCore, flowViz, flowQ and flowStats is described in detail in flowWorkFlow.pdf. The data files used in the workflow can be downloaded from here.
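The flowSet infrastructure in flowCore also makes per-sample summaries straightforward. As a sketch, assuming flowCore's fsApply and each_col helpers, the following computes the median of every channel in each ITN sample; the object name med is hypothetical.

```r
library(flowCore)
library(flowStats)   # attached as in the workflow above, before data(ITN)
data(ITN)

# apply median() to each channel of each flowFrame;
# fsApply simplifies the per-sample vectors into a matrix
med <- fsApply(ITN, each_col, median)

dim(med)    # one row per sample, one column per channel
head(med)
```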
4.2 Cell-based Assays
These packages provide data structures and algorithms for cell-based high-throughput screens (HTS).
This package supports the xCELLigence system, which comprises a series of real-time cell analyzers (RTCA).
4.3 High-throughput qPCR Assays
These packages provide algorithms for the analysis of cycle threshold (Ct) values from quantitative real-time PCR data.
4.4 Mass Spectrometry and Proteomics data
These packages provide frameworks for the processing, visualization and statistical analysis of mass spectrometry and proteomics data.
4.5 Imaging Based Assays
These packages provide infrastructure for image-based phenotyping and automation of other image-related tasks:
sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 9 (stretch)
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] splines methods stats graphics grDevices utils datasets
## [8] base
##
## other attached packages:
## [1] flowViz_1.40.0 lattice_0.20-35
## [3] flowStats_3.34.0 flowWorkspace_3.24.4
## [5] ncdfFlow_2.22.0 BH_1.62.0-1
## [7] RcppArmadillo_0.7.900.2.0 cluster_2.0.6
## [9] fda_2.4.4 Matrix_1.2-10
## [11] flowCore_1.42.2 shiny_1.0.3
## [13] rmarkdown_1.6 knitr_1.16
##
## loaded via a namespace (and not attached):
## [1] Biobase_2.36.2 jsonlite_1.5 assertthat_0.2.0
## [4] highr_0.6 stats4_3.4.1 latticeExtra_0.6-28
## [7] yaml_2.1.14 robustbase_0.92-7 backports_1.1.0
## [10] glue_1.1.1 digest_0.6.12 RColorBrewer_1.1-2
## [13] colorspace_1.3-2 htmltools_0.3.6 httpuv_1.3.5
## [16] plyr_1.8.4 multicool_0.1-10 pcaPP_1.9-72
## [19] XML_3.98-1.9 pkgconfig_2.0.1 misc3d_0.8-4
## [22] questionr_0.6.1 bookdown_0.4 zlibbioc_1.22.0
## [25] xtable_1.8-2 corpcor_1.6.9 mvtnorm_1.0-6
## [28] scales_0.4.1 tibble_1.3.3 BiocGenerics_0.22.0
## [31] hexbin_1.27.1 magrittr_1.5 IDPmisc_1.1.17
## [34] mime_0.5 evaluate_0.10.1 ks_1.10.7
## [37] MASS_7.3-47 FNN_1.1 graph_1.54.0
## [40] tools_3.4.1 data.table_1.10.4 matrixStats_0.52.2
## [43] stringr_1.2.0 munsell_0.4.3 bindrcpp_0.2
## [46] compiler_3.4.1 rlang_0.1.1 grid_3.4.1
## [49] rstudioapi_0.6 htmlwidgets_0.9 miniUI_0.1.1
## [52] gtable_0.2.0 codetools_0.2-15 rrcov_1.4-3
## [55] R6_2.2.2 gridExtra_2.2.1 dplyr_0.7.1
## [58] bindr_0.1 rprojroot_1.2 KernSmooth_2.23-15
## [61] Rgraphviz_2.20.0 stringi_1.1.5 parallel_3.4.1
## [64] rmdformats_0.3.3 Rcpp_0.12.11 DEoptimR_1.0-8