Title: | Analysis of Omics Data in Observational Studies |
---|---|
Description: | A collection of fast and flexible functions for analyzing omics data in observational studies. Multiple different approaches for integrating multiple environmental/genetic factors, omics data, and/or phenotype data are implemented. This includes functions for performing omics wide association studies with one or more variables of interest as the exposure or outcome; a function for performing a meet in the middle analysis for linking exposures, omics, and outcomes (as described by Chadeau-Hyam et al., (2010) <doi:10.3109/1354750X.2010.533285>); and a function for performing a mixtures analysis across all omics features using quantile-based g-Computation (as described by Keil et al., (2019) <doi:10.1289/EHP5838>). |
Authors: | Jesse Goodrich [aut, cre] |
Maintainer: | Jesse Goodrich |
License: | GPL (>= 3) |
Version: | 1.2.0 |
Built: | 2025-02-14 05:20:03 UTC |
Source: | https://github.com/goodrich-lab/epiomics |
Creates a coefficient plot based on ggplot using the results from the owas function.
coef_plot_from_owas( df, main_cat_var = NULL, order_effects = TRUE, highlight_adj_p = TRUE, highlight_adj_p_threshold = 0.05, effect_ratio = FALSE, flip_axis = FALSE, filter_p_less_than = 1 )
df |
output from |
main_cat_var |
Which variable should be the primary categorical variable? Should be either var_name or feature_name. Only relevant if both var_name and feature_name have more than one level. Default is NULL, and the y-axis is chosen as the variable that has more levels. |
order_effects |
Should features be ordered by the mean effect estimate? Default is TRUE. |
highlight_adj_p |
Should features which meet a specific adjusted p-value threshold be highlighted? Default is TRUE. |
highlight_adj_p_threshold |
If |
effect_ratio |
Are the effect estimates on the ratio scale (ie, should the null effect line be centered at 1)? Defaults to FALSE. |
flip_axis |
Flip the x and y axis? Default is FALSE, and the y-axis is plotted with the features or variable names. |
filter_p_less_than |
P-value threshold for which features/variables will be included in the plot. Default is 1, and all features will be included. |
A ggplot figure
data("example_data")

# Get names of omics
colnames_omic_fts <- colnames(example_data)[
  grep("feature_", colnames(example_data))][1:5]

# Run function with continuous exposure as the variable of interest
owas_out <- owas(df = example_data,
                 var = "exposure1",
                 omics = colnames_omic_fts,
                 covars = c("age", "sex"),
                 var_exposure_or_outcome = "exposure",
                 family = "gaussian",
                 conf_int = TRUE)

coef_plot_from_owas(owas_out)
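The effect_ratio argument concerns estimates on the ratio scale, where the null value is 1 instead of 0. As a minimal sketch in base R (not the package's own code), log odds can be converted to odds ratios with Wald intervals as follows. The data frame `res` and its values are hypothetical; only the `estimate` and `se` column names follow the owas output described in this manual, and the derived column names are assumptions. How the plotting function expects ratio-scale input is not stated here.

```r
# Hypothetical results on the log-odds scale (values invented for illustration)
res <- data.frame(
  feature_name = c("feature_1", "feature_2", "feature_3"),
  estimate     = c(0.40, -0.25, 0.05),
  se           = c(0.10, 0.12, 0.20)
)

# Wald interval on the log-odds scale, then exponentiate to the ratio scale
z <- qnorm(0.975)
res$odds_ratio <- exp(res$estimate)
res$or_lower   <- exp(res$estimate - z * res$se)
res$or_upper   <- exp(res$estimate + z * res$se)

# On the ratio scale the null value is 1 rather than 0
res$excludes_null <- res$or_lower > 1 | res$or_upper < 1
res
```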
Example data with multiple exposures, multiple outcomes, and omics features.
data(example_data)
A dataframe with multiple exposures, outcomes, and omics features.
data(example_data)
Implements a meet in the middle analysis for identifying omics associated with both exposures and outcomes, as described by Chadeau-Hyam et al., 2010.
meet_in_middle( df, exposure, outcome, omics, covars = NULL, outcome_family = "gaussian", confidence_level = 0.95, conf_int = FALSE, ref_group_exposure = NULL, ref_group_outcome = NULL )
df |
Dataframe |
exposure |
Name of the exposure of interest. Can be either continuous or dichotomous. Currently, only a single exposure is supported. |
outcome |
Name of the outcome of interest. Can be either continuous or
dichotomous. For dichotomous variables, must set |
omics |
Names of all omics features in the dataset |
covars |
Names of covariates (can be NULL) |
outcome_family |
"gaussian" for linear models (via lm) or "binomial" for logistic (via glm) |
confidence_level |
Confidence level for marginal significance (defaults to 0.95) |
conf_int |
Should confidence intervals be generated for the estimates?
Default is FALSE. Setting to TRUE will take longer. For logistic models,
calculates Wald confidence intervals via |
ref_group_exposure |
Reference category if the exposure is a character or factor. If not, can leave empty. |
ref_group_outcome |
Reference category if the outcome is a character or factor. If not, can leave empty. |
A list of three dataframes, containing:
1. Results from the Exposure-Omics Wide Association Study
2. Results from the Omics-Outcome Wide Association Study
3. Overlapping significant features from 1 and 2.
For each omics wide association, results are provided in a data frame with 6 columns:
feature_name: name of the omics feature
estimate: the model estimate for the feature. For linear models, this is the beta; for logistic models, this is the log odds.
se: Standard error of the estimate
p_value: p-value for the estimate
adjusted_pval: FDR adjusted p-value
threshold: Marginal significance, based on unadjusted p-values
# Load Example Data
data("example_data")

# Get names of omics
colnames_omic_fts <- colnames(example_data)[grep("feature_",
                                                 colnames(example_data))][1:10]

# Meet in the middle with a dichotomous outcome
res <- meet_in_middle(df = example_data,
                      exposure = "exposure1",
                      outcome = "disease1",
                      omics = colnames_omic_fts,
                      covars = c("age", "sex"),
                      outcome_family = "binomial")

# Meet in the middle with a continuous outcome
res <- meet_in_middle(df = example_data,
                      exposure = "exposure1",
                      outcome = "weight",
                      omics = colnames_omic_fts,
                      covars = c("age", "sex"),
                      outcome_family = "gaussian")

# Meet in the middle with a continuous outcome and no covariates
res <- meet_in_middle(df = example_data,
                      exposure = "exposure1",
                      outcome = "weight",
                      omics = colnames_omic_fts,
                      outcome_family = "gaussian")
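To make the three-part return value more concrete, the following base R sketch illustrates the general meet in the middle idea on simulated data: an exposure-omics step, an omics-outcome step, and the overlap of marginally significant features. It is an illustration under stated assumptions, not the package's implementation; the simulated variables, the helper `p_of`, and the 0.05 cutoff are all chosen here for the example.

```r
set.seed(1)
n <- 200
exposure <- rnorm(n)

# Three simulated features: feature_1 depends on the exposure, the others do not
omics <- data.frame(
  feature_1 = 0.8 * exposure + rnorm(n),
  feature_2 = rnorm(n),
  feature_3 = rnorm(n)
)

# The outcome depends on feature_1 only
outcome <- 0.8 * omics$feature_1 + rnorm(n)

# Hypothetical helper: p-value for the first predictor of a fitted linear model
p_of <- function(fit) summary(fit)$coefficients[2, "Pr(>|t|)"]

# Step 1: exposure -> omics (feature as the dependent variable)
p_exp <- sapply(omics, function(f) p_of(lm(f ~ exposure)))

# Step 2: omics -> outcome (feature as the independent variable)
p_out <- sapply(omics, function(f) p_of(lm(outcome ~ f)))

# Step 3: features marginally significant in both steps
alpha <- 0.05
overlap <- intersect(names(p_exp)[p_exp < alpha], names(p_out)[p_out < alpha])
overlap
```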
Implements an omics wide association study with the option of using the 'omics data as either the dependent variable (i.e., for performing an exposure –> 'omics analysis) or using the 'omics as the independent variable (i.e., for performing an 'omics –> outcome analysis). Allows for either continuous or dichotomous outcomes, and provides the option to adjust for covariates.
owas( df, var, omics, covars = NULL, var_exposure_or_outcome, family = "gaussian", confidence_level = 0.95, conf_int = FALSE, ref_group = NULL, test_data_quality = TRUE )
df |
Dataset |
var |
Name of the variable or variables of interest; this is usually
either an exposure variable or an outcome variable. Can be either
continuous or dichotomous. For dichotomous variables, must set |
omics |
Names of all omics features in the dataset |
covars |
Names of covariates (can be NULL) |
var_exposure_or_outcome |
Is the variable of interest an exposure (independent variable) or outcome (dependent variable)? Must be either "exposure" or "outcome" |
family |
"gaussian" (default) for linear models (via lm) or "binomial" for logistic (via glm) |
confidence_level |
Confidence level for marginal significance (defaults to 0.95, or an alpha of 0.05) |
conf_int |
Should confidence intervals be generated for the estimates?
Default is FALSE. Setting to TRUE will take longer. For logistic models,
calculates Wald confidence intervals via |
ref_group |
Reference category if the variable of interest is a character or factor. If not, can leave empty. |
test_data_quality |
If TRUE (default), then code will ensure that the variance of all variables in the analysis is greater than 0 after dropping any missing data. |
A data frame with the following columns:
feature_name: name of the omics feature
estimate: the model estimate for the feature. For linear models, this is the beta; for logistic models, this is the log odds.
se: Standard error of the estimate
test_statistic: t-value
p_value: p-value for the estimate
adjusted_pval: FDR adjusted p-value
threshold: Marginal significance, based on unadjusted p-values
# Load Example Data
data("example_data")

# Get names of omics
colnames_omic_fts <- colnames(example_data)[grep("feature_",
                                                 colnames(example_data))][1:10]

# Get names of exposures
expnms <- c("exposure1", "exposure2", "exposure3")

# Run function with one continuous exposure as the variable of interest
owas(df = example_data,
     var = "exposure1",
     omics = colnames_omic_fts,
     covars = c("age", "sex"),
     var_exposure_or_outcome = "exposure",
     family = "gaussian")

# Run function with multiple continuous exposures as the variable of interest
owas(df = example_data,
     var = expnms,
     omics = colnames_omic_fts,
     covars = c("age", "sex"),
     var_exposure_or_outcome = "exposure",
     family = "gaussian")

# Run function with dichotomous outcome as the variable of interest
owas(df = example_data,
     var = "disease1",
     omics = colnames_omic_fts,
     covars = c("age", "sex"),
     var_exposure_or_outcome = "outcome",
     family = "binomial")
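For readers who want to see what an exposure-to-omics association study involves, here is a rough base R sketch: one linear model per feature, followed by an FDR adjustment across features. It assumes simulated data and is not the package's implementation. The result columns are named after the return value described above, and the `threshold` column is omitted.

```r
set.seed(42)
n <- 150
dat <- data.frame(exposure1 = rnorm(n), age = rnorm(n, 50, 10))
for (j in 1:5) dat[[paste0("feature_", j)]] <- rnorm(n)
dat$feature_1 <- dat$feature_1 + 0.7 * dat$exposure1  # one true signal

features <- grep("^feature_", names(dat), value = TRUE)

# Fit one linear model per feature, with the feature as the dependent variable
rows <- lapply(features, function(ft) {
  fit <- lm(reformulate(c("exposure1", "age"), response = ft), data = dat)
  cf  <- summary(fit)$coefficients["exposure1", ]
  data.frame(feature_name   = ft,
             estimate       = cf[["Estimate"]],
             se             = cf[["Std. Error"]],
             test_statistic = cf[["t value"]],
             p_value        = cf[["Pr(>|t|)"]])
})
res <- do.call(rbind, rows)

# FDR (Benjamini-Hochberg) adjustment across all features
res$adjusted_pval <- p.adjust(res$p_value, method = "fdr")
res
```

Treating the omics as the independent variable instead would swap the roles of `ft` and `exposure1` in the model formula.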
Implements an omics wide association study for matched case control studies using conditional logistic regression. For this function, the variable of interest should be a dichotomous outcome, and the strata is the variable indicating the matching.
owas_clogit( df, cc_status, cc_set, omics, covars = NULL, confidence_level = 0.95, conf_int = FALSE, method = "efron", test_data_quality = TRUE )
df |
Dataset |
cc_status |
Name of the variable indicating case control status. Must be either 0/1 or a factor with the first level representing the reference group. |
cc_set |
Name of the variable indicating the case control set. |
omics |
Names of all omics features in the dataset |
covars |
Names of covariates (can be NULL) |
confidence_level |
Confidence level for marginal significance (defaults to 0.95, or an alpha of 0.05) |
conf_int |
Should confidence intervals be generated for the estimates?
Default is FALSE. Setting to TRUE will take longer. For logistic models,
calculates Wald confidence intervals via |
method |
Whether to use the correct (exact) calculation in the
conditional likelihood or one of the approximations. Default is "efron".
Passed to |
test_data_quality |
If TRUE (default), then code will ensure that the variance of all variables in the analysis is greater than 0 after dropping any missing data. |
A data frame with the following columns:
feature_name: name of the omics feature
estimate: the model estimate for the feature. For linear models, this is the beta; for logistic models, this is the log odds.
se: Standard error of the estimate
test statistic: t-value
p_value: p-value for the estimate
adjusted_pval: FDR adjusted p-value
threshold: Marginal significance, based on unadjusted p-values
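As a hedged illustration of the underlying model, the sketch below fits a conditional logistic regression for a single feature with clogit from the survival package on simulated 1:1 matched data. The data, the column names (chosen to mirror the cc_status and cc_set arguments), and the effect size are assumptions made for the example. This is not the package's code, and owas_clogit itself is not called.

```r
library(survival)  # provides clogit() and strata()

set.seed(7)
n_sets <- 100

# 1:1 matched sets: one case and one control per set
dat <- data.frame(
  cc_set    = rep(seq_len(n_sets), each = 2),
  cc_status = rep(c(1, 0), times = n_sets)
)

# Hypothetical feature that is shifted upward in cases
dat$feature_1 <- rnorm(nrow(dat)) + 0.8 * dat$cc_status

# Conditional logistic regression, stratified by matched set
fit <- clogit(cc_status ~ feature_1 + strata(cc_set),
              data = dat, method = "efron")

# Log odds estimate, standard error, test statistic, and p-value
cf <- summary(fit)$coefficients
cf["feature_1", c("coef", "se(coef)", "z", "Pr(>|z|)")]
```

Repeating this fit for each omics feature and adjusting the p-values across features would give a table of the kind described above.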
Omics wide association study using quantile-based g-Computation (as described by Keil et al., (2019) doi:10.1289/EHP5838) to examine associations of exposure mixtures with each individual 'omics feature as the outcome. Allows for either continuous or dichotomous outcomes, and provides the option to adjust for covariates.
owas_qgcomp( df, expnms, omics, covars = NULL, q = 4, confidence_level = 0.95, family = "gaussian", rr = TRUE, run.qgcomp.boot = TRUE, test_data_quality = TRUE )
df |
Dataset |
expnms |
Names of the exposures. Can be either continuous or
dichotomous. For dichotomous variables, must set |
omics |
Names of all omics features in the dataset |
covars |
Names of covariates (can be NULL) |
q |
NULL or number of quantiles used to create quantile indicator variables representing the exposure variables. Defaults to 4. If NULL, then qgcomp proceeds with the un-transformed version of the exposures in the input dataset (useful if data are already transformed, or for performing standard g-computation). |
confidence_level |
Confidence level for marginal significance (defaults to 0.95, or an alpha of 0.05) |
family |
Currently only "gaussian" for linear models (via lm) or "binomial" for logistic models are supported. Default is "gaussian". |
rr |
see |
run.qgcomp.boot |
Should the model be fit with qgcomp.boot? See the qgcomp.boot function in the qgcomp package for details. Default is TRUE. Setting to FALSE decreases computational time. |
test_data_quality |
If TRUE (default), then code will ensure that the variance of all variables in the analysis is greater than 0 after dropping any missing data. |
A data frame with the following columns:
feature: name of the omics feature
psi: the model estimate for the feature. For linear models, this is the beta; for logistic models, this is the log odds.
lcl_psi: the lower confidence interval.
ucl_psi: the upper confidence interval.
p_value: p-value for the estimate
test_statistic: t-statistic for psi coefficient
adjusted_pval: FDR adjusted p-value
threshold: Marginal significance, based on unadjusted p-values
covariates: the names of covariates in the model, if any
coef_exposure: the individual coefficient of each exposure
# Load Example Data
data("example_data")

# Get names of omics
colnames_omic_fts <- colnames(example_data)[grep("feature_",
                                                 colnames(example_data))][1:5]

# Names of exposures in mixture
exposure_names <- c("exposure1", "exposure2", "exposure3")

# Run function without covariates
out <- owas_qgcomp(df = example_data,
                   expnms = exposure_names,
                   omics = colnames_omic_fts,
                   q = 4,
                   confidence_level = 0.95)

# Run analysis with covariates
out <- owas_qgcomp(df = example_data,
                   expnms = c("exposure1", "exposure2", "exposure3"),
                   covars = c("weight", "age", "sex"),
                   omics = colnames_omic_fts,
                   q = 4,
                   confidence_level = 0.95)
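The role of q and psi can be illustrated without the qgcomp package. The base R sketch below quantizes simulated exposures into quartile scores and, assuming a linear and additive model, computes psi as the sum of the exposure coefficients. The helper `quantize` and the simulated data are hypothetical, and the sketch ignores confidence intervals and bootstrapping. It is a simplified picture of the idea, not the algorithm as implemented in qgcomp or in this package.

```r
set.seed(3)
n <- 300
expo <- data.frame(exposure1 = rnorm(n),
                   exposure2 = rnorm(n),
                   exposure3 = rnorm(n))
feature_1 <- 0.5 * expo$exposure1 + 0.3 * expo$exposure2 + rnorm(n)

# Hypothetical helper: replace an exposure by its quantile score 0, 1, ..., q - 1
quantize <- function(x, q = 4) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = q + 1))
  as.integer(cut(x, breaks = breaks, include.lowest = TRUE)) - 1L
}
expo_q <- as.data.frame(lapply(expo, quantize, q = 4))

# Linear model of the feature on all quantized exposures
fit <- lm(feature_1 ~ exposure1 + exposure2 + exposure3,
          data = data.frame(feature_1 = feature_1, expo_q))

# Under a linear, additive model, psi is the sum of the exposure coefficients:
# the expected change in the feature when every exposure rises by one quantile
psi <- sum(coef(fit)[names(expo)])
psi
```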
Creates a volcano plot based on ggplot using the results from the owas function.
volcano_owas( df, annotate_ftrs = TRUE, annotation_p_threshold = 0.05, highlight_adj_p = TRUE, highlight_adj_p_threshold = 0.05, horizontal_line_p_value = 0.05 )
df |
output from |
annotate_ftrs |
Should features be annotated with the feature name? Default is TRUE. If necessary, annotation_p_threshold can be changed as well. |
annotation_p_threshold |
If |
highlight_adj_p |
Should features which meet a specific adjusted p-value threshold be highlighted? Default is TRUE. |
highlight_adj_p_threshold |
If |
horizontal_line_p_value |
Set the p-value for the horizontal line for the threshold of significance. |
A ggplot figure
data("example_data")

# Get names of omics
colnames_omic_fts <- colnames(example_data)[
  grep("feature_", colnames(example_data))][1:5]

# Run function with continuous exposure as the variable of interest
owas_out <- owas(df = example_data,
                 var = "exposure1",
                 omics = colnames_omic_fts,
                 covars = c("age", "sex"),
                 var_exposure_or_outcome = "exposure",
                 family = "gaussian")

vp <- volcano_owas(owas_out)
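The quantities behind a volcano plot can also be computed by hand. The following sketch uses a hypothetical results table and base R graphics in place of ggplot, so it does not reproduce the output of volcano_owas. It only shows the usual construction: the estimate on the x-axis, -log10 of the p-value on the y-axis, a horizontal line at the chosen p-value, and highlighting by adjusted p-value. The values and the derived column names are invented for illustration.

```r
# Hypothetical results with columns named as in the owas output
res <- data.frame(
  feature_name = paste0("feature_", 1:4),
  estimate     = c(0.60, -0.45, 0.10, -0.05),
  p_value      = c(0.0004, 0.003, 0.20, 0.70)
)
res$adjusted_pval <- p.adjust(res$p_value, method = "fdr")

# Volcano plot coordinates: effect size on x, -log10(p) on y
res$neg_log10_p <- -log10(res$p_value)
res$highlight   <- res$adjusted_pval < 0.05

plot(res$estimate, res$neg_log10_p,
     col = ifelse(res$highlight, "red", "grey40"), pch = 19,
     xlab = "Estimate", ylab = "-log10(p-value)")
abline(h = -log10(0.05), lty = 2)  # horizontal significance line
```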