Package 'epiomics' reference manual

Title:	Analysis of Omics Data in Observational Studies
Description:	A collection of fast and flexible functions for analyzing omics data in observational studies. Multiple different approaches for integrating multiple environmental/genetic factors, omics data, and/or phenotype data are implemented. This includes functions for performing omics wide association studies with one or more variables of interest as the exposure or outcome; a function for performing a meet in the middle analysis for linking exposures, omics, and outcomes (as described by Chadeau-Hyam et al., (2010) <doi:10.3109/1354750X.2010.533285>); and a function for performing a mixtures analysis across all omics features using quantile-based g-Computation (as described by Keil et al., (2019) <doi:10.1289/EHP5838>).
Authors:	Jesse Goodrich [aut, cre]
Maintainer:	Jesse Goodrich <[email protected]>
License:	GPL (>= 3)
Version:	1.2.0
Built:	2025-03-16 05:20:50 UTC
Source:	https://github.com/goodrich-lab/epiomics

Create coefficient plot using results from owas

Description

Creates a coefficient plot based on ggplot using the results from the owas function.

Usage

coef_plot_from_owas(
  df,
  main_cat_var = NULL,
  order_effects = TRUE,
  highlight_adj_p = TRUE,
  highlight_adj_p_threshold = 0.05,
  effect_ratio = FALSE,
  flip_axis = FALSE,
  filter_p_less_than = 1
)

Arguments

`df`	output from `owas` function call, using conf_int = TRUE.
`main_cat_var`	Which variable should be the primary categorical variable? Should be either var_name or feature_name. Only relevant if both var_name and feature_name have more than one level. Default is NULL, and the y-axis is chosen as the variable that has more levels.
`order_effects`	Should features be ordered by the mean effect estimate? Default is TRUE.
`highlight_adj_p`	Should features which meet a specific adjusted p-value threshold be highlighted? Default is TRUE.
`highlight_adj_p_threshold`	If `highlight_adj_p` = TRUE, can set `highlight_adj_p_threshold` to change the adjusted p-value threshold for which features will be highlighted. Defaults to 0.05.
`effect_ratio`	Are the effect estimates on the ratio scale (i.e., should the null effect line be centered at 1)? Defaults to FALSE.
`flip_axis`	Flip the x and y axis? Default is FALSE, and the y-axis is plotted with the features or variable names.
`filter_p_less_than`	P-value threshold for which features/variables will be included in the plot. Default is 1, and all features will be included.

Value

A ggplot figure
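
As a rough illustration of what "ratio scale" means for `effect_ratio`, the sketch below moves log odds estimates and their Wald limits to the odds ratio scale in base R, where the null value becomes 1 instead of 0. This is not the package's code: the data frame is made up, and the column names `odds_ratio`, `or_low`, and `or_high` are hypothetical.

```r
# Made-up results with estimates on the log odds scale
res <- data.frame(
  feature_name = c("feature_1", "feature_2"),
  estimate = c(0.40, -0.25),
  se = c(0.15, 0.20),
  stringsAsFactors = FALSE
)

# Normal quantile for a 95% Wald interval
z <- qnorm(1 - (1 - 0.95) / 2)

# Exponentiate the estimate and both limits (hypothetical column names)
res$odds_ratio <- exp(res$estimate)
res$or_low <- exp(res$estimate - z * res$se)
res$or_high <- exp(res$estimate + z * res$se)

# On the ratio scale the null effect sits at exp(0) = 1
null_line <- exp(0)
res
```

Estimates transformed this way would be plotted with `effect_ratio = TRUE` so that the reference line is drawn at 1.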

Examples

data("example_data")

# Get names of omics
colnames_omic_fts <- colnames(example_data)[
  grep("feature_",
       colnames(example_data))][1:5]

# Run function with continuous exposure as the variable of interest
owas_out <- owas(df = example_data,
                 var = "exposure1",
                 omics = colnames_omic_fts,
                 covars = c("age", "sex"),
                 var_exposure_or_outcome = "exposure",
                 family = "gaussian", 
                 conf_int = TRUE)

coef_plot_from_owas(owas_out)

Example data with multiple exposures, multiple outcomes, and omics features

Description

Example data with multiple exposures, multiple outcomes, and omics features.

Usage

data(example_data)

Format

A data frame with multiple exposures, outcomes, and omics features.

Examples

data(example_data)

Perform meet in the middle analysis

Description

Implements a meet in the middle analysis for identifying omics associated with both exposures and outcomes, as described by Chadeau-Hyam et al., 2010.

Usage

meet_in_middle(
  df,
  exposure,
  outcome,
  omics,
  covars = NULL,
  outcome_family = "gaussian",
  confidence_level = 0.95,
  conf_int = FALSE,
  ref_group_exposure = NULL,
  ref_group_outcome = NULL
)

Arguments

`df`	Dataframe
`exposure`	Name of the exposure of interest. Can be either continuous or dichotomous. Currently, only a single exposure is supported.
`outcome`	Name of the outcome of interest. Can be either continuous or dichotomous. For dichotomous variables, must set `outcome_family` to "binomial", and values must be either 0/1 or a factor with the first level representing the reference group. Currently, only a single outcome is supported.
`omics`	Names of all omics features in the dataset
`covars`	Names of covariates (can be NULL)
`outcome_family`	"gaussian" for linear models (via lm) or "binomial" for logistic (via glm)
`confidence_level`	Confidence level for marginal significance (defaults to 0.95)
`conf_int`	Should confidence intervals be generated for the estimates? Default is FALSE. Setting to TRUE will take longer. For logistic models, calculates Wald confidence intervals via `confint.default`.
`ref_group_exposure`	Reference category if the exposure is a character or factor. If not, can leave empty.
`ref_group_outcome`	Reference category if the outcome is a character or factor. If not, can leave empty.

Value

A list of three data frames, containing:

1. Results from the Exposure-Omics Wide Association Study
2. Results from the Omics-Outcome Wide Association Study
3. Overlapping significant features from 1 and 2.

For each omics wide association, results are provided in a data frame with 6 columns:

feature_name: name of the omics feature
estimate: the model estimate for the feature. For linear models, this is the beta; for logistic models, this is the log odds.
se: Standard error of the estimate
p_value: p-value for the estimate
adjusted_pval: FDR adjusted p-value
threshold: Marginal significance, based on unadjusted p-values
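
The overlap step can be pictured with a small base R sketch. It is only an illustration under stated assumptions, not the function's implementation: the two result tables are made up, and marginal significance is taken here as an unadjusted p-value below 1 minus the confidence level.

```r
# Made-up result tables shaped like the two association studies
exposure_omics <- data.frame(
  feature_name = c("feature_1", "feature_2", "feature_3", "feature_4"),
  p_value = c(0.001, 0.20, 0.03, 0.04),
  stringsAsFactors = FALSE
)
omics_outcome <- data.frame(
  feature_name = c("feature_1", "feature_2", "feature_3", "feature_4"),
  p_value = c(0.01, 0.02, 0.50, 0.049),
  stringsAsFactors = FALSE
)

# Assumed rule: significant if unadjusted p-value < 1 - confidence_level
alpha <- 1 - 0.95

sig_exposure <- exposure_omics$feature_name[exposure_omics$p_value < alpha]
sig_outcome <- omics_outcome$feature_name[omics_outcome$p_value < alpha]

# Features associated with both the exposure and the outcome
overlap <- intersect(sig_exposure, sig_outcome)
overlap
```

Only features that pass the threshold in both tables are retained, which is the "meet in the middle" idea.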

Examples

# Load Example Data
data("example_data")

# Get names of omics
colnames_omic_fts <- colnames(example_data)[grep("feature_",
                                              colnames(example_data))][1:10]

# Meet in the middle with a dichotomous outcome
res <- meet_in_middle(df = example_data,
                      exposure = "exposure1", 
                      outcome = "disease1", 
                      omics = colnames_omic_fts,
                      covars = c("age", "sex"), 
                      outcome_family = "binomial")

# Meet in the middle with a continuous outcome 
res <- meet_in_middle(df = example_data,
                      exposure = "exposure1", 
                      outcome = "weight", 
                      omics = colnames_omic_fts,
                      covars = c("age", "sex"), 
                      outcome_family = "gaussian")

# Meet in the middle with a continuous outcome and no covariates
res <- meet_in_middle(df = example_data,
                      exposure = "exposure1", 
                      outcome = "weight", 
                      omics = colnames_omic_fts,
                      outcome_family = "gaussian")

Perform 'omics wide association study

Description

Implements an omics wide association study with the option of using the 'omics data as either the dependent variable (i.e., for performing an exposure –> 'omics analysis) or using the 'omics as the independent variable (i.e., for performing an 'omics –> outcome analysis). Allows for either continuous or dichotomous outcomes, and provides the option to adjust for covariates.

Usage

owas(
  df,
  var,
  omics,
  covars = NULL,
  var_exposure_or_outcome,
  family = "gaussian",
  confidence_level = 0.95,
  conf_int = FALSE,
  ref_group = NULL,
  test_data_quality = TRUE
)

Arguments

`df`	Dataset
`var`	Name of the variable or variables of interest. This is usually either an exposure variable or an outcome variable. Can be either continuous or dichotomous. For dichotomous variables, must set `family` to "binomial", and values must be either 0/1 or a factor with the first level representing the reference group. Can handle multiple variables, but they must all be of the same `family`.
`omics`	Names of all omics features in the dataset
`covars`	Names of covariates (can be NULL)
`var_exposure_or_outcome`	Is the variable of interest an exposure (independent variable) or outcome (dependent variable)? Must be either "exposure" or "outcome"
`family`	"gaussian" (default) for linear models (via lm) or "binomial" for logistic (via glm)
`confidence_level`	Confidence level for marginal significance (defaults to 0.95, or an alpha of 0.05)
`conf_int`	Should confidence intervals be generated for the estimates? Default is FALSE. Setting to TRUE will take longer. For logistic models, calculates Wald confidence intervals via `confint.default`.
`ref_group`	Reference category if the variable of interest is a character or factor. If not, can leave empty.
`test_data_quality`	If TRUE (default), then code will ensure that the variance of all variables in the analysis is greater than 0 after dropping any missing data.

Value

A data frame with the following columns:

feature_name: name of the omics feature
estimate: the model estimate for the feature. For linear models, this is the beta; for logistic models, this is the log odds.
se: Standard error of the estimate
test_statistic: t-value
p_value: p-value for the estimate
adjusted_pval: FDR adjusted p-value
threshold: Marginal significance, based on unadjusted p-values
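
To make the columns above concrete, here is a minimal base R sketch of an exposure-to-omics analysis: one linear model per feature, followed by an FDR adjustment. It is an illustration on simulated data and not the package's implementation; the helper `fit_one` and the labels written to `threshold` are assumptions.

```r
# Simulated stand-in for a dataset with one exposure, one covariate,
# and three omics features
set.seed(1)
n <- 100
dat <- data.frame(exposure1 = rnorm(n), age = rnorm(n, 50, 10))
dat$feature_1 <- 0.5 * dat$exposure1 + rnorm(n)
dat$feature_2 <- rnorm(n)
dat$feature_3 <- rnorm(n)
features <- c("feature_1", "feature_2", "feature_3")

# Hypothetical helper: fit feature ~ exposure1 + age and keep the exposure row
fit_one <- function(ft) {
  fit <- lm(reformulate(c("exposure1", "age"), response = ft), data = dat)
  est <- summary(fit)$coefficients["exposure1", ]
  data.frame(
    feature_name = ft,
    estimate = est[["Estimate"]],
    se = est[["Std. Error"]],
    test_statistic = est[["t value"]],
    p_value = est[["Pr(>|t|)"]],
    stringsAsFactors = FALSE
  )
}
res <- do.call(rbind, lapply(features, fit_one))

# Benjamini-Hochberg FDR adjustment across features
res$adjusted_pval <- p.adjust(res$p_value, method = "fdr")

# Marginal significance from unadjusted p-values (labels are assumed)
res$threshold <- ifelse(res$p_value < 0.05, "Significant", "Non-significant")
res
```

For an omics-to-outcome analysis, the same loop would place the feature on the right-hand side of the formula and the variable of interest on the left.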

Examples

# Load Example Data
data("example_data")

# Get names of omics
colnames_omic_fts <- colnames(example_data)[grep("feature_",
                                              colnames(example_data))][1:10]

# Get names of exposures
expnms = c("exposure1", "exposure2", "exposure3")

# Run function with one continuous exposure as the variable of interest
owas(df = example_data, 
     var = "exposure1", 
     omics = colnames_omic_fts, 
     covars = c("age", "sex"), 
     var_exposure_or_outcome = "exposure", 
     family = "gaussian")
     
# Run function with multiple continuous exposures as the variable of interest
owas(df = example_data, 
     var = expnms, 
     omics = colnames_omic_fts, 
     covars = c("age", "sex"), 
     var_exposure_or_outcome = "exposure", 
     family = "gaussian")

# Run function with dichotomous outcome as the variable of interest
owas(df = example_data, 
     var = "disease1", 
     omics = colnames_omic_fts, 
     covars = c("age", "sex"), 
     var_exposure_or_outcome = "outcome", 
     family = "binomial")

Perform 'omics wide association study for matched case control studies

Description

Implements an omics wide association study for matched case control studies using conditional logistic regression. For this function, the variable of interest should be a dichotomous outcome, and the strata is the variable indicating the matching.

Usage

owas_clogit(
  df,
  cc_status,
  cc_set,
  omics,
  covars = NULL,
  confidence_level = 0.95,
  conf_int = FALSE,
  method = "efron",
  test_data_quality = TRUE
)

Arguments

`df`	Dataset
`cc_status`	Name of the variable indicating case control status. Must be either 0/1 or a factor with the first level representing the reference group.
`cc_set`	Name of the variable indicating the case control set.
`omics`	Names of all omics features in the dataset
`covars`	Names of covariates (can be NULL)
`confidence_level`	Confidence level for marginal significance (defaults to 0.95, or an alpha of 0.05)
`conf_int`	Should confidence intervals be generated for the estimates? Default is FALSE. Setting to TRUE will take longer. For logistic models, calculates Wald confidence intervals via `confint.default`.
`method`	Whether to use the correct (exact) calculation in the conditional likelihood or one of the approximations. Default is "efron". Passed to `clogit`.
`test_data_quality`	If TRUE (default), then code will ensure that the variance of all variables in the analysis is greater than 0 after dropping any missing data.

Value

A data frame with the following columns:

feature_name: name of the omics feature
estimate: the model estimate for the feature. For linear models, this is the beta; for logistic models, this is the log odds.
se: Standard error of the estimate
test_statistic: t-value
p_value: p-value for the estimate
adjusted_pval: FDR adjusted p-value
threshold: Marginal significance, based on unadjusted p-values
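
This help page has no example, so the sketch below shows what a single-feature conditional logistic model looks like when fit directly with `clogit` from the survival package. It runs on simulated 1:1 matched data and is not the function's implementation; the column names `cc_status`, `cc_set`, and `feature_1` are chosen only to mirror the arguments above.

```r
library(survival)

# Simulated 1:1 matched sets: one case and one control per set
set.seed(42)
n_sets <- 50
dat <- data.frame(
  cc_set = rep(seq_len(n_sets), each = 2),
  cc_status = rep(c(1, 0), times = n_sets)
)
# Feature shifted upward in cases
dat$feature_1 <- rnorm(nrow(dat)) + 0.5 * dat$cc_status

# Conditional logistic regression, stratified on the matched set
fit <- clogit(cc_status ~ feature_1 + strata(cc_set),
              data = dat, method = "efron")

# Log odds, standard error, test statistic, and p-value for the feature
coefs <- summary(fit)$coefficients
coefs["feature_1", c("coef", "se(coef)", "z", "Pr(>|z|)")]

# Wald confidence interval on the log odds scale
confint.default(fit, level = 0.95)
```

An omics wide analysis would repeat this fit for every feature and then adjust the resulting p-values for multiple testing.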

Perform omics wide association study using qgcomp

Description

Omics wide association study using quantile-based g-Computation (as described by Keil et al., (2019) doi:10.1289/EHP5838) to examine associations of exposure mixtures with each individual 'omics feature as the outcome. Allows for either continuous or dichotomous outcomes, and provides the option to adjust for covariates.

Usage

owas_qgcomp(
  df,
  expnms,
  omics,
  covars = NULL,
  q = 4,
  confidence_level = 0.95,
  family = "gaussian",
  rr = TRUE,
  run.qgcomp.boot = TRUE,
  test_data_quality = TRUE
)

Arguments

`df`	Dataset
`expnms`	Names of the exposures. Can be either continuous or dichotomous. For dichotomous variables, must set `q` to NULL, and values must be either 0/1.
`omics`	Names of all omics features in the dataset
`covars`	Names of covariates (can be NULL)
`q`	NULL or number of quantiles used to create quantile indicator variables representing the exposure variables. Defaults to 4. If NULL, then qgcomp proceeds with the un-transformed version of exposures in the input datasets (useful if data are already transformed, or for performing standard g-computation).
`confidence_level`	Confidence level for marginal significance (defaults to 0.95, or an alpha of 0.05)
`family`	Currently either "gaussian" (default) for linear models or "binomial" for logistic models.
`rr`	see `qgcomp()`
`run.qgcomp.boot`	Should the model be fit with `qgcomp.boot`? See `qgcomp.boot` in the qgcomp package for details. Default is TRUE. Setting to FALSE decreases computational time.
`test_data_quality`	If TRUE (default), then code will ensure that the variance of all variables in the analysis is greater than 0 after dropping any missing data.

Value

A data frame with the following columns:

feature: name of the omics feature
psi: the model estimate for the feature. For linear models, this is the beta; for logistic models, this is the log odds.
lcl_psi: the lower confidence interval.
ucl_psi: the upper confidence interval.
p_value: p-value for the estimate
test_statistic: t-statistic for psi coefficient
adjusted_pval: FDR adjusted p-value
threshold: Marginal significance, based on unadjusted p-values
covariates: the names of covariates in the model, if any
coef_exposure: the individual coefficient of each exposure
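
The roles of `q` and `psi` can be sketched in base R for a single feature. This is a simplified illustration on simulated data, not the qgcomp or epiomics implementation: the helper `quantize` is hypothetical, and `psi` is computed as the sum of the exposure coefficients, which holds for a linear, additive model as described by Keil et al.

```r
# Simulated data: three exposures and one omics feature
set.seed(7)
n <- 200
dat <- data.frame(
  exposure1 = rnorm(n),
  exposure2 = rnorm(n),
  exposure3 = rnorm(n)
)
dat$feature_1 <- 0.3 * dat$exposure1 + 0.2 * dat$exposure2 + rnorm(n)
expnms <- c("exposure1", "exposure2", "exposure3")
q <- 4

# Hypothetical helper: recode a continuous exposure as quantile scores 0..q-1
quantize <- function(x, q) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = q + 1))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE) - 1
}
dat_q <- dat
dat_q[expnms] <- lapply(dat[expnms], quantize, q = q)

# Linear model of the feature on the quantized exposures
fit <- lm(reformulate(expnms, response = "feature_1"), data = dat_q)

# Mixture effect: expected change in the feature when every exposure
# increases by one quantile at the same time
psi <- sum(coef(fit)[expnms])
psi
```

The individual coefficients in `coef(fit)[expnms]` correspond conceptually to what the `coef_exposure` column reports for each exposure.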

Examples

# Load Example Data
data("example_data")

# Get names of omics
colnames_omic_fts <- colnames(example_data)[grep("feature_",
                                              colnames(example_data))][1:5]

# Names of exposures in mixture
exposure_names <- c("exposure1", "exposure2", "exposure3")

# Run function without covariates
out <- owas_qgcomp(df = example_data,
                   expnms = exposure_names,
                   omics = colnames_omic_fts,
                   q = 4, 
                   confidence_level = 0.95) 


# Run analysis with covariates
out <- owas_qgcomp(df = example_data,
                   expnms = c("exposure1", "exposure2", "exposure3"),
                   covars = c("weight", "age", "sex"),
                   omics = colnames_omic_fts,
                   q = 4, 
                   confidence_level = 0.95) 
 
Create volcano plot using results from owas

Description

Creates a volcano plot based on ggplot using the results from the owas function.

Usage

volcano_owas(
  df,
  annotate_ftrs = TRUE,
  annotation_p_threshold = 0.05,
  highlight_adj_p = TRUE,
  highlight_adj_p_threshold = 0.05,
  horizontal_line_p_value = 0.05
)

Arguments

`df`	output from `owas` function call
`annotate_ftrs`	Should features be annotated with the feature name? Default is TRUE. If necessary, `annotation_p_threshold` can be changed as well.
`annotation_p_threshold`	If `annotate_ftrs` = TRUE, can set `annotation_p_threshold` to change the p-value threshold for which features will be annotated. Defaults to 0.05.
`highlight_adj_p`	Should features which meet a specific adjusted p-value threshold be highlighted? Default is TRUE.
`highlight_adj_p_threshold`	If `highlight_adj_p` = TRUE, can set `highlight_adj_p_threshold` to change the adjusted p-value threshold for which features will be highlighted. Defaults to 0.05.
`horizontal_line_p_value`	Set the p-value for the horizontal line for the threshold of significance. Defaults to 0.05.

Value

A ggplot figure
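
The three thresholds act on different quantities, which a small base R sketch can make visible. It assumes the usual volcano layout (effect estimate on the x-axis, -log10 of the unadjusted p-value on the y-axis) and uses made-up results; the columns `neg_log10_p`, `annotate`, and `highlight` are hypothetical and not part of the function's output.

```r
# Made-up owas-style results
res <- data.frame(
  feature_name = paste0("feature_", 1:4),
  estimate = c(0.8, -0.1, 0.3, -0.6),
  p_value = c(0.0004, 0.60, 0.04, 0.01),
  stringsAsFactors = FALSE
)
res$adjusted_pval <- p.adjust(res$p_value, method = "fdr")

# Assumed y-axis position of each point
res$neg_log10_p <- -log10(res$p_value)

# Annotation uses unadjusted p-values; highlighting uses adjusted p-values
res$annotate <- res$p_value < 0.05
res$highlight <- res$adjusted_pval < 0.05

# Height of the horizontal significance line
line_y <- -log10(0.05)
res
```

In this made-up table, feature_3 would be annotated but not highlighted, because its unadjusted p-value is below 0.05 while its FDR adjusted p-value is not.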

Examples

data("example_data")

# Get names of omics
colnames_omic_fts <- colnames(example_data)[
  grep("feature_",
       colnames(example_data))][1:5]

# Run function with continuous exposure as the variable of interest
owas_out <- owas(df = example_data,
                 var = "exposure1",
                 omics = colnames_omic_fts,
                 covars = c("age", "sex"),
                 var_exposure_or_outcome = "exposure",
                 family = "gaussian")

vp <- volcano_owas(owas_out)
