An introduction to opm
opm: an R package to analyse OmniLog Phenotype MicroArray data
Dr. Johannes Sikorski, Dr. Lea Vaas, Dr. Markus Göker
Leibniz-Institut DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH
www.dsmz.de
You have numerous OmniLog Phenotype MicroArray (www.biolog.com) data sets
- of closely related organisms or cell lines
- of numerous well-defined mutants
- obtained under diverse physiological test conditions
and you want to explore them thoroughly and quantitatively within a variety of analysis frameworks.
opm: Tools for analysing OmniLog(R) Phenotype Microarray data
opm enables you to:
- organize your PM data, curve parameters and metadata
- subset and query your data
- display raw kinetics or aggregated curve parameters graphically
- exploit the full statistics implemented in R
- export to third-party software using YAML
Software requirements
- R (http://www.r-project.org/): a free software environment for statistical computing and graphics
- RStudio (http://www.rstudio.org/): a free and open source integrated development environment (IDE) for R
- opm (http://cran.r-project.org/web/packages/opm/index.html): the add-on package "opm: Tools for analysing OmniLog(R) Phenotype Microarray data"
R code of this presentation
The R code of this presentation is available on request from:
- Dr. Johannes Sikorski, johannes.sikorski@dsmz.de
- Dr. Lea Vaas, l.vaas@cbs.knaw.nl
- Dr. Markus Göker, markus.goeker@dsmz.de
Feel free to contact us if you have any questions regarding the usage of opm.
OPM organizes your PM data in OPMS objects
Example: a set of 9 PM plates of the same plate type (Plate 1 to Plate 9).

An OPMS object stores:
- raw kinetic data (per well)
- aggregated curve parameters, with confidence intervals from bootstrapping (per well)
- metadata: any information of interest to the user (per plate)

The size of the OPMS object is limited only by the amount of available RAM.

Per well: raw kinetic data (example well: lysine)
Hour        00.00   00.25   00.50   ...   30.00   ...   60.00
Intensity   35      33      37      ...   102     ...   328

Per well: aggregated curve parameters (example well: lysine)
Parameter   Estimate        CI95 low        CI95 high
mu          15.559078       3.803466        140.841704
lambda      5.798210        1.080333        11.819251
A           305.989319      305.642353      306.986123
AUC         23308.269348    23125.092442    23411.648024
Lag = lambda, Slope = mu, Max = A, Area Under the Curve = AUC

Per plate: metadata (example: Plate 3)
Taxonomy          Bacillus subtilis
habitat           soil
sampling place    GPS coord.
sampling date     2011-06-15
sampling season   summer
habitat [°C]      27
sporulation       yes
PCR (gene xyz)    positive
...               as much and whatever you wish
read_opm()
Load demo files that come with the opm package
Usage: read_opm(names, convert = c("try", "no", "yes", "sep", "grp"), gen.iii = FALSE, include = list(), ..., demo = FALSE)

# Use the built-in function opm_files() to retrieve the paths of the example files in your R installation:
(files <- opm_files("testdata"))
# Read in the files, which are compressed (.xz),
# using the include argument to select specific plates of interest.
# This loads three files into the object "example.opm":
example.opm <- read_opm(files, include = "*Example_?.csv.xz")
Load your own files with read_opm()
# Read in all CSV raw data files in your working directory
PM1 <- read_opm(".")
# Read in all CSV raw data files in your working directory and convert the plate type to GenIII plates
GenIII <- read_opm(".", gen.iii = TRUE)
Load demo files that come with the opm package
# Let us check some information on the files in this OPMS object
plates(example.opm)
summary(example.opm)
show(example.opm)
dim(example.opm)
hours(example.opm)
length(example.opm)
max(example.opm)
plate_type(example.opm)
seq(example.opm)
setup_time(example.opm)
measurements(example.opm)
wells(example.opm)
wells(example.opm, full = TRUE)
measurements(example.opm) returns the raw kinetic data.
do_aggr()
# Aggregate only A and AUC using a fast algorithm
x <- do_aggr(example.opm, program = "opm-fast")
# Aggregate all 4 parameters using a spline fit algorithm (grofit package)
x <- do_aggr(example.opm)
# Include 100 bootstrap replicates
x <- do_aggr(example.opm, program = "opm-fast", boot = 100)
x <- do_aggr(example.opm, boot = 100)
Note: bootstrapping is time-consuming.
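To make the two "fast" parameters more concrete, the following base R sketch computes a maximum (A) and a trapezoidal area under the curve (AUC) for a single well. It is an illustration only: the time points and intensities are made-up values, the helper names max_height() and trapezoid_auc() are hypothetical, and the sketch does not claim to reproduce the algorithm that opm uses internally.

```r
# Hypothetical raw kinetic data for one well (made-up values)
hours     <- c(0, 0.25, 0.5, 0.75, 1)
intensity <- c(35, 33, 37, 40, 46)

# Max (A): the highest measured intensity
max_height <- function(y) max(y)

# AUC by the trapezoidal rule: interval widths times mean heights, summed up
trapezoid_auc <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

A   <- max_height(intensity)
AUC <- trapezoid_auc(hours, intensity)
print(c(A = A, AUC = AUC))
```

The trapezoidal rule also works for unevenly spaced time points, because each interval width is taken from diff(x).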
Check aggregated data
aggregated(example.opm)

                          A01           A02             A03            A04            A07
mu                   4.242159      5.769109      0.02138581      0.2827407      0.2383062
lambda              -2.340620     12.799329   -465.46803431     20.0749555    -14.4573092
A                   47.923185     62.738943     11.51078807     19.4617762     18.2811191
AUC               3914.852139   4154.830048   1070.20657323   1250.9426009   1396.9447154
mu CI95 low          2.733574      3.045267     -1.10076311     -2.2050686     -4.8515830
lambda CI95 low    -38.403543    -10.300782     56.14216650     42.4248855     24.8184260
A CI95 low          47.197513     58.940763     11.17285004     19.1992801     16.9627344
AUC CI95 low      3875.243148   4093.577722   1056.62986435   1230.3571787   1352.9702303
mu CI95 high        14.170557     13.689212      6.15737265      9.3063345     21.5309783
lambda CI95 high    79.044830     50.248293     87.70587107    106.1197708    107.3697670
A CI95 high         52.484756     67.456369     15.37628753     23.6590936     30.0717055
AUC CI95 high     3941.361758   4183.239559   1077.02925382   1262.9208049   1432.5071603
You need to provide the metadata separately
One problem arises: imagine you have numerous plates, each with numerous metadata entries. How can we make sure that the metadata are matched CORRECTLY to the specific raw kinetic data?
Solution: we need an identifier that unambiguously matches metadata to raw kinetic data. As identifier we use the Setup Time and the Position of the plate in the reader.
Good news: opm can export this information as a starting point for the metadata file, using the function collect_template().
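The idea of matching by a two-part identifier can be sketched in base R with merge(). This is only an illustration of the principle, not what opm does internally: the data frames, the column names (setup_time, position, file, species) and all values are hypothetical.

```r
# Hypothetical plate information, as it might be taken from the raw data files
plates <- data.frame(
  setup_time = c("2011-06-15 10:00", "2011-06-15 10:00", "2011-06-16 09:30"),
  position   = c("1-A", "2-A", "1-A"),
  file       = c("plate_1.csv", "plate_2.csv", "plate_3.csv"),
  stringsAsFactors = FALSE
)

# Hypothetical metadata, entered by the user in arbitrary row order
meta <- data.frame(
  setup_time = c("2011-06-16 09:30", "2011-06-15 10:00", "2011-06-15 10:00"),
  position   = c("1-A", "1-A", "2-A"),
  species    = c("Bacillus subtilis", "Escherichia coli", "Pseudomonas aeruginosa"),
  stringsAsFactors = FALSE
)

# The combination of setup time and position must be unique per plate
stopifnot(!anyDuplicated(plates[c("setup_time", "position")]))
stopifnot(!anyDuplicated(meta[c("setup_time", "position")]))

# Match metadata to plates via the two identifier columns
merged <- merge(plates, meta, by = c("setup_time", "position"), all.x = TRUE)
print(merged)
```

Because the match uses both columns, two plates that were set up at the same time but sit at different reader positions are still kept apart, regardless of the row order in the metadata table.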
collect_template(): data frame
metadata <- collect_template(files, include = "*Example_?.csv.xz")
The resulting data frame holds the unique identifier used to merge metadata and raw kinetic data. Add further metadata columns to it.
collect_template(): CSV file (or *.txt, *.dat)
collect_template(files, include = "*Example_?.csv.xz", outfile = "template.csv")
Add further metadata columns in a spreadsheet application.
Note the FORMAT: columns are tab-separated and fields are protected by quotation marks.
collect_template(): CSV file (or *.txt, *.dat)
- add further metadata columns
- save the file tab-separated, with quotation marks as field protector
- load the file into the R environment using to_metadata():
metadata.example <- to_metadata("template.csv")
metadata.example <- to_metadata("template.csv", strip.white = FALSE)
metadata.example <- to_metadata("template.csv", sep = ",")
collect_template(): example of a template with further added metadata columns
Note: mock metadata for demonstration purposes.
include_metadata()
Inputs: the data frame with metadata (metadata) and the OPMS object with raw kinetic data (example.opm).
example.opm.metadata <- include_metadata(example.opm, md = metadata)
Draw kinetic data
xy_plot(example.opm)
xy_plot(example.opm, col = c("blue", "red", "green"), lwd = 2)
xy_plot(example.opm, col = c("blue", "red", "green"), lwd = 2, legend.fmt = list(space = "right"))
xy_plot(example.opm, col = c("blue", "red", "green"), lwd = 2, legend.fmt = list(space = "right"), include = c("species", "strain"))
The panel strip, strip text, and legend can be modified using arguments from lattice.
What about drawing only parts?
It is possible to plot (1) specific plates, (2) time points, or (3) wells by indexing OPMS objects using square brackets:
xy_plot(example.opm[plates, time points, wells])
xy_plot(example.opm[,, ])
xy_plot(example.opm[ 3,, ])
xy_plot(example.opm[ 3, 1:100, ])
xy_plot(example.opm[ 3, 1:100, c("a01", "A02", "E05", "G08", "H10")])
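The order of the three indices can be practised on an ordinary three-dimensional array in base R. The array below is a hypothetical stand-in with made-up values, used purely as an analogy for the plates, time points, wells order; an OPMS object is a different kind of object, and details such as dropped dimensions may not behave the same way.

```r
# Hypothetical stand-in for an OPMS object: plates x time points x wells
wells <- c("A01", "A02", "E05", "G08", "H10")
toy <- array(
  seq_len(3 * 4 * 5),
  dim = c(3, 4, 5),
  dimnames = list(paste("Plate", 1:3), paste0("t", 1:4), wells)
)

all_data    <- toy[, , ]                     # empty indices keep everything
plate_three <- toy[3, , ]                    # one plate, all time points and wells
early_part  <- toy[3, 1:2, ]                 # first two time points of that plate
few_wells   <- toy[3, 1:2, c("A01", "H10")]  # additionally restrict the wells
print(dim(few_wells))
```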
Heatmaps: compare plates on the basis of aggregated curve parameters
The generation of heatmaps includes two steps:
(1) Extract the curve parameter values using extract()
(2) Create the heatmap using heat_map()
First step:
# as.labels: the metadata of interest
# subset: the parameter whose values, obtained from aggregating the curves, are extracted
auc <- extract(example.opm, dataframe = TRUE, as.labels = list("country", "species", "strain", "town"), subset = "AUC")
Second step: heat_map(auc, as.labels = c("species", "town"), as.groups = "town", cexrow = 1.2, use.fun = "gplots", main = "nice heatmap", col = topo.colors(120))
Confidence interval plot
Do curves differ significantly in aggregated curve parameters?
We make use of the 95% confidence intervals calculated from 100 bootstrap replicates.
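One simple way to read such intervals is to check whether they overlap. The base R sketch below shows that check for a single parameter of two curves. The interval limits are illustrative values, the helper intervals_overlap() is hypothetical, and treating non-overlapping intervals as a difference is used here as a rough heuristic rather than a formal test.

```r
# Illustrative 95% confidence intervals of one parameter (e.g. AUC) for two curves
ci_a <- c(low = 3875, high = 3941)
ci_b <- c(low = 4094, high = 4183)

# Two intervals overlap if each one starts before the other one ends
intervals_overlap <- function(x, y) {
  unname(x["low"] <= y["high"] && y["low"] <= x["high"])
}

overlap <- intervals_overlap(ci_a, ci_b)
if (overlap) {
  message("Intervals overlap: no clear difference in this parameter")
} else {
  message("Intervals do not overlap: the curves differ in this parameter")
}
```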
xy_plot(example.opm)
In which aggregated curve parameters do these curves differ significantly?
xy_plot(example.opm[,,"d10"], include = list("species","town"), neg.ctrl = FALSE)
ci_plot(example.opm[,, c("d10")], as.labels = list("species","town"), subset = "A")
ci_plot(example.opm[,, c("d10")], as.labels = list("species","town"), subset = "AUC")
ci_plot(example.opm[,, c("d10")], as.labels = list("species","town"), subset = "lambda")
xy_plot(example.opm)
Do these curves differ in their lag phase? Try it yourself.
radial_plot(example.opm[,, 5:17], sep = " ", as.labels = c("species", "town"), draw.legend = FALSE, subset = "AUC")
xy_plot(example.opm[plates, time points, wells])
data(vaas_et_al)
- 114 GenIII plates (run for 96 hours)
- numerous replicates of two strains each of Escherichia coli and Pseudomonas aeruginosa
- including aggregated bootstrapped curve parameters and metadata