Title: | Drug Target Set Enrichment Analysis |
---|---|
Description: | It is a novel tool used to identify the candidate drugs against a particular disease based on the drug target set enrichment analysis. It assumes the most effective drugs are those with a closer affinity in the protein-protein interaction network to the specified disease. (See Gómez-Carballa et al. (2022) <doi: 10.1016/j.envres.2022.112890> and Feng et al. (2022) <doi: 10.7150/ijms.67815> for disease expression profiles; see Wishart et al. (2018) <doi: 10.1093/nar/gkx1037> and Gaulton et al. (2017) <doi: 10.1093/nar/gkw1074> for drug target information; see Kanehisa et al. (2021) <doi: 10.1093/nar/gkaa970> for the details of KEGG database.) |
Authors: | Junwei Han [aut, cre, cph], Yinchun Su [aut] |
Maintainer: | Junwei Han <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.0.3 |
Built: | 2024-11-22 04:38:06 UTC |
Source: | https://github.com/hanjunwei-lab/dtsea |
DTSEA applies Gene Set Enrichment Analysis (GSEA) in a novel way, extending it from gene sets to drug target sets.
The Drug Target Set Enrichment Analysis (DTSEA) is a novel tool used to identify the most effective drug set against a particular disease based on the Gene Set Enrichment Analysis (GSEA).
The central hypothesis of DTSEA is that the targets of potential candidate drugs for a specific disease (e.g., COVID-19) ought to be close to each other, or at least not far from the disease-related genes, in the network. The DTSEA algorithm determines whether a drug is potent for the chosen disease from the proximity between its targets and the disease-related genes. Under this hypothesis, DTSEA consists of two main parts:
1. Evaluate the influence of the specific disease in the PPI network by the random walk with restart algorithm.
   To evaluate the influence, we compute the disease-node distance using the random walk with restart (RwR) algorithm, then rank the nodes in reverse order.
2. Evaluate the drug-disease associations based on GSEA.
   The GSEA approach is adopted in this part to identify whether candidate drug targets are disease-related (top) or disease-unrelated (bottom) on the human PPI list. The specific disease gene list is normalized by the median, and zero is set as the arbitrary cutoff point for classifying the relations.
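The first part can be illustrated with a minimal base R sketch of random walk with restart on a toy network. This is not the package's implementation: the node names, the helper `rwr()`, and the update rule (the common textbook form `pt_next = (1 - gamma) * W %*% pt + gamma * p0`) are assumptions chosen for illustration, and the toy graph has no isolated nodes.

```r
# Toy undirected path network A - B - C - D - E (hypothetical node names).
nodes <- c("A", "B", "C", "D", "E")
adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"), c("D", "E"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# Hypothetical helper: iterate the RwR update until the L1 change
# between two successive vectors falls below `threshold`.
rwr <- function(adj, p0, gamma = 0.7, threshold = 1e-10) {
  # Column-normalise so that every column is a transition distribution.
  w <- sweep(adj, 2, colSums(adj), "/")
  p0 <- p0 / sum(p0)
  pt <- p0
  repeat {
    pt_next <- (1 - gamma) * as.vector(w %*% pt) + gamma * p0
    if (sum(abs(pt_next - pt)) < threshold) break
    pt <- pt_next
  }
  setNames(pt_next, rownames(adj))
}

# Seed the walk at the single "disease" node A.
p0 <- c(A = 1, B = 0, C = 0, D = 0, E = 0)
pt <- rwr(adj, p0)

# Rank the nodes from most to least affected by the seed.
ranked <- names(sort(pt, decreasing = TRUE))
print(ranked)
```

On a path graph seeded at one end, the affinity decays with distance from the seed, so the ranking follows the path order.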
This package provides example data, a small data set that demonstrates the usage and the main idea behind DTSEA. The real data used in the DTSEA paper are provided as extra data files in a supplementary package, which is available on GitHub and can be installed with the example code below.
DTSEA
# if (!"devtools" %in% as.data.frame(installed.packages())$Package)
#   install.packages("devtools")
# devtools::install_github("hanjunwei-lab/DTSEAdata")
The function provides a reliable approach to generating a p0 vector.
calculate_p0(nodes, disease)
nodes |
The |
disease |
The |
The resulting p0 vector.
library(DTSEA)
library(dplyr)

# Load the data
data("example_disease_list", package = "DTSEA")
data("example_drug_target_list", package = "DTSEA")
data("example_ppi", package = "DTSEA")

# Compute the p0 vector
p0 <- calculate_p0(nodes = example_ppi, disease = example_disease_list)

# Sort p0 in decreasing order to list the ten most affected nodes.
p0 <- sort(p0, decreasing = TRUE) %>%
  names() %>%
  head(10)

# If you have obtained the supplemental data, then you can compute the p0
# in the real data set
# supp_data <- get_data(c("graph", "disease_related"))
# p0 <- calculate_p0(nodes = supp_data[["graph"]],
#                    disease = supp_data[["disease_related"]])
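To make the shape of a p0 vector concrete, here is a minimal base R sketch. It assumes, without confirmation from the package source, that p0 is a named numeric vector over the network nodes with 1 for disease-related nodes and 0 elsewhere; the node and gene names are hypothetical.

```r
# Hypothetical network nodes and disease-related genes.
nodes <- c("A", "B", "C", "D", "E")
disease <- c("B", "D", "Z")  # "Z" is not in the network and is ignored

# Assumed form of p0: an indicator vector named by node.
p0 <- setNames(as.numeric(nodes %in% disease), nodes)
print(p0)
```

Disease genes that are absent from the network contribute nothing, which is why checking the overlap between the two inputs beforehand can be worthwhile.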
Computes Cronbach's alpha
cronbach.alpha(data)
data |
A data frame or matrix containing n subjects * m raters. |
The Cronbach's alpha (unstandardized)
library(DTSEA)
library(tibble)

# Load the data
data <- tribble(
  ~x, ~y, ~z,
  1, 1, 2,
  5, 6, 5,
  7, 8, 4,
  2, 3, 2,
  8, 6, 5
)

# Run Cronbach's alpha
cat(cronbach.alpha(data))
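For reference, the unstandardized Cronbach's alpha can be computed by hand in base R with the textbook formula alpha = k / (k - 1) * (1 - sum of item variances / variance of the row totals). The helper name `cronbach_alpha_sketch()` is hypothetical, and this sketch is not claimed to match the package's internal code.

```r
# Hypothetical helper implementing the textbook formula.
cronbach_alpha_sketch <- function(data) {
  data <- as.matrix(data)
  k <- ncol(data)                    # number of raters / items
  item_var <- apply(data, 2, var)    # variance of each column
  total_var <- var(rowSums(data))    # variance of the per-subject totals
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Same values as the example above, as a plain data frame.
ratings <- data.frame(
  x = c(1, 5, 7, 2, 8),
  y = c(1, 6, 8, 3, 6),
  z = c(2, 5, 4, 2, 5)
)
alpha <- cronbach_alpha_sketch(ratings)
print(alpha)
```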
The DTSEA function determines whether a drug is potent for a specific disease by the proximity between its targets and the disease-related genes.
DTSEA(
  network,
  disease,
  drugs,
  rwr.pt = 0,
  sampleSize = 101,
  minSize = 1,
  maxSize = Inf,
  nproc = 0,
  eps = 1e-50,
  nPermSimple = 5000,
  gseaParam = 1,
  verbose = TRUE
)
network |
The human protein-protein interactome network. It should be or be preconverted before being inputted in DTSEA. |
disease |
The disease-related nodes. |
drugs |
A long-format data frame of drug-target pairs. It must include at least the drug_id and drug_target columns. |
rwr.pt |
The random walk p0 vector. Set it to 0 if you want DTSEA to compute it automatically, or provide your own predetermined p0 vector. |
sampleSize |
The size of a randomly selected gene collection, where size = pathwaySize |
minSize |
Minimal size of a drug set to be tested. |
maxSize |
Maximal size of a drug set to be tested. |
nproc |
The number of CPU workers that fgsea uses. |
eps |
The boundary for calculating the p-value. |
nPermSimple |
Number of permutations in the simple fgsea implementation for preliminary estimation of P-values. |
gseaParam |
GSEA parameter value. All gene-level statistics are raised to the power of 'gseaParam' before the GSEA enrichment scores are calculated. |
verbose |
Show the messages |
The resulting dataframe consists of drug_id, pval, padj, log2err, ES, NES, size, and leadingEdge.
library(dplyr)
library(DTSEA)

# Load the data
data("example_disease_list", package = "DTSEA")
data("example_drug_target_list", package = "DTSEA")
data("example_ppi", package = "DTSEA")

# Run the DTSEA and sort the result dataframe by normalized enrichment scores
# (NES)
result <- DTSEA(
  network = example_ppi,
  disease = example_disease_list,
  drugs = example_drug_target_list,
  verbose = FALSE
) %>%
  arrange(desc(NES))

# Or you can take advantage of multiple cores by setting the nproc parameter
# on non-Windows operating systems.
## Not run: 
result <- DTSEA(
  network = example_ppi,
  disease = example_disease_list,
  drugs = example_drug_target_list,
  nproc = 10,
  verbose = FALSE
)
## End(Not run)

# Extract the drugs with a significantly positive NES.
result %>%
  filter(NES > 0 & pval < .05)

# Or we can draw the enrichment plot of the first predicted drug.
fgsea::plotEnrichment(
  pathway = example_drug_target_list %>%
    filter(drug_id == slice(result, 1)$drug_id) %>%
    pull(gene_target),
  stats = random.walk(
    network = example_ppi,
    p0 = calculate_p0(nodes = example_ppi, disease = example_disease_list)
  )
)

# If you have obtained the supplemental data, then you can run DTSEA
# in the real data set
# supp_data <- get_data(c("graph", "disease_related", "drug_targets"))
# result <- DTSEA(network = supp_data[["graph"]],
#                 disease = supp_data[["disease_related"]],
#                 drugs = supp_data[["drug_targets"]],
#                 verbose = FALSE)
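The ES column comes from a GSEA-style running sum. The following base R sketch shows the classic weighted enrichment score on made-up node scores. The helper `enrichment_score()` and all values are hypothetical; the real analysis is delegated to fgsea, which additionally estimates p-values and normalized scores.

```r
# Hypothetical helper: classic weighted running-sum enrichment score.
enrichment_score <- function(stats, targets, gsea_param = 1) {
  stats <- sort(stats, decreasing = TRUE)        # ranked node scores
  is_hit <- names(stats) %in% targets            # drug targets in the ranking
  hit_weight <- abs(stats)^gsea_param * is_hit   # weights for hits only
  step_hit <- hit_weight / sum(hit_weight)       # increments at hits
  step_miss <- (!is_hit) / sum(!is_hit)          # decrements at misses
  running <- cumsum(step_hit - step_miss)
  running[which.max(abs(running))]               # maximum deviation from zero
}

# Made-up node scores, e.g. median-aligned random walk output.
stats <- c(a = 3, b = 2, c = 1, d = -1, e = -2)

es_top <- enrichment_score(stats, targets = c("a", "b"))     # targets at the top
es_bottom <- enrichment_score(stats, targets = c("d", "e"))  # targets at the bottom
print(c(es_top, es_bottom))
```

Target sets concentrated at the top of the ranking give a positive score and sets concentrated at the bottom give a negative one, which is why the example above filters on NES > 0.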
The list integrates the significantly differentially expressed genes (DEGs) of GEO dataset GSE183071 and the work of Feng, Song, Guo, et al.
example_disease_list
An object of class character of length 63.
Gómez-Carballa A, Rivero-Calle I, Pardo-Seco J, Gómez-Rial J, Rivero-Velasco C, Rodríguez-Núñez N, Barbeito-Castiñeiras G, Pérez-Freixo H, Cebey-López M, Barral-Arca R, Rodriguez-Tenreiro C, Dacosta-Urbieta A, Bello X, Pischedda S, Currás-Tuala MJ, Viz-Lasheras S, Martinón-Torres F, Salas A; GEN-COVID study group. A multi-tissue study of immune gene expression profiling highlights the key role of the nasal epithelium in COVID-19 severity. Environ Res. 2022 Jul;210:112890. doi: 10.1016/j.envres.2022.112890. Epub 2022 Feb 22. PMID: 35202626; PMCID: PMC8861187.
Feng S, Song F, Guo W, Tan J, Zhang X, Qiao F, Guo J, Zhang L, Jia X. Potential Genes Associated with COVID-19 and Comorbidity. Int J Med Sci. 2022 Jan 24;19(2):402-415. doi: 10.7150/ijms.67815. PMID: 35165525; PMCID: PMC8795808.
library(DTSEA)
data("example_disease_list", package = "DTSEA")
Drug-target interactions were downloaded and integrated from DrugBank and ChEMBL.
example_drug_target_list
A data frame with 970 rows and 3 variables:
drug_id: the DrugBank ID
drug_name: the name of each drug
gene_target: the targets of drugs
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018 Jan 4;46(D1):D1074-D1082. doi: 10.1093/nar/gkx1037. PMID: 29126136; PMCID: PMC5753335.
Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrián-Uhalte E, Davies M, Dedman N, Karlsson A, Magariños MP, Overington JP, Papadatos G, Smit I, Leach AR. The ChEMBL database in 2017. Nucleic Acids Res. 2017 Jan 4;45(D1):D945-D954. doi: 10.1093/nar/gkw1074. Epub 2016 Nov 28. PMID: 27899562; PMCID: PMC5210557.
library(DTSEA)
data("example_drug_target_list", package = "DTSEA")
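A long-format drug-target table like this one can be turned into one target set per drug with base R. The sketch below uses a made-up data frame with the same three column names. The size filter only mimics the idea behind the minSize and maxSize arguments and is an assumption, not the package's code.

```r
# Made-up long-format table with the same columns as example_drug_target_list.
drug_targets <- data.frame(
  drug_id = c("D1", "D1", "D2", "D3", "D3", "D3"),
  drug_name = c("alpha", "alpha", "beta", "gamma", "gamma", "gamma"),
  gene_target = c("G1", "G2", "G2", "G3", "G4", "G5"),
  stringsAsFactors = FALSE
)

# One character vector of targets per drug, named by drug_id.
target_sets <- split(drug_targets$gene_target, drug_targets$drug_id)

# Keep only the sets whose size lies within hypothetical bounds.
min_size <- 2
max_size <- Inf
set_sizes <- lengths(target_sets)
kept <- target_sets[set_sizes >= min_size & set_sizes <= max_size]
print(kept)
```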
We extracted the gene functional interaction network from multiple sources with experimental evidence and then integrated them.
example_ppi
An igraph object
Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021 Jan 8;49 (D1):D545-D551. doi: 10.1093/nar/gkaa970. PMID: 33125081; PMCID: PMC7779016.
library(DTSEA)
data("example_ppi", package = "DTSEA")
Computes Kendall's coefficient of concordance.
kendall.w(raw, correct = TRUE)
raw |
A data frame or matrix containing n subjects * m raters. |
correct |
Logical. Indicates whether the W should be corrected for ties within raters. |
The resulting list consists of title, kendall.w, chisq, df, pval, and report.
library(DTSEA)
library(tibble)

# Load the data
data <- tribble(
  ~x, ~y, ~z,
  1, 1, 2,
  5, 6, 5,
  7, 8, 4,
  2, 3, 2,
  8, 6, 5
)

# Run Kendall's W
print(kendall.w(data)$report)
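As a cross-check, Kendall's W can be computed in base R from the rank sums, using the standard formula W = 12 S / (m^2 (n^3 - n) - m T), where S is the sum of squared deviations of the subject rank sums and T is the tie correction. The helper `kendall_w_sketch()` is hypothetical and is not claimed to reproduce the package's output format.

```r
# Hypothetical helper: Kendall's W for n subjects (rows) and m raters (columns).
kendall_w_sketch <- function(ratings, correct = TRUE) {
  ratings <- as.matrix(ratings)
  n <- nrow(ratings)
  m <- ncol(ratings)
  ranks <- apply(ratings, 2, rank)          # rank subjects within each rater
  rank_sums <- rowSums(ranks)
  s <- sum((rank_sums - mean(rank_sums))^2)
  ties <- 0
  if (correct) {
    # Tie correction: sum of (t^3 - t) over tie groups of every rater.
    ties <- sum(apply(ranks, 2, function(r) {
      t <- table(r)
      sum(t^3 - t)
    }))
  }
  w <- 12 * s / (m^2 * (n^3 - n) - m * ties)
  chisq <- m * (n - 1) * w
  list(w = w, chisq = chisq, df = n - 1,
       pval = pchisq(chisq, df = n - 1, lower.tail = FALSE))
}

# Three raters in perfect agreement, and two raters in perfect disagreement.
agree <- kendall_w_sketch(cbind(1:5, 1:5, 1:5))
disagree <- kendall_w_sketch(cbind(1:5, 5:1))
print(c(agree$w, disagree$w))
```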
The random graph was retrieved from Menche et al. (2015).
random_graph
An igraph object
Menche J, Sharma A, Kitsak M, Ghiassian SD, Vidal M, Loscalzo J, Barabási AL. Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science. 2015 Feb 20;347(6224):1257601. doi: 10.1126/science.1257601. PMID: 25700523; PMCID: PMC4435741.
library(DTSEA)
data("random_graph", package = "DTSEA")
The random.walk function implements the original Random Walk with Restart (RwR) on the input graph. Given the seeds (i.e., a set of starting nodes), it calculates the affinity score of all nodes in the graph to the seeds.
random.walk(
  network,
  p0,
  edge_weight = FALSE,
  gamma = 0.7,
  threshold = 1e-10,
  pt.post.processing = "log",
  pt.align = "median",
  verbose = FALSE
)
network |
The input graph object. It should be either an igraph object or an edge list matrix / data frame. |
p0 |
The starting vector at time t0. |
edge_weight |
Logical to indicate whether the input graph contains weight information. |
gamma |
The restart probability used for RwR. The |
threshold |
The threshold used for RwR. The |
pt.post.processing |
The way to scale the |
pt.align |
The way to normalize the output |
verbose |
Show the progress of the calculation. |
The pt vector.
library(DTSEA)

# Load the data
data("example_disease_list", package = "DTSEA")
data("example_drug_target_list", package = "DTSEA")
data("example_ppi", package = "DTSEA")

# Perform random walk
p0 <- calculate_p0(nodes = example_ppi, disease = example_disease_list)
pt <- random.walk(network = example_ppi, p0 = p0)

# Perform GSEA analysis
# ....

# If you have obtained the supplemental data, then you can do random walk
# with restart in the real data set
# supp_data <- get_data(c("graph", "disease_related"))
# p0 <- calculate_p0(nodes = supp_data[["graph"]],
#                    disease = supp_data[["disease_related"]])
# pt <- random.walk(network = supp_data[["graph"]],
#                   p0 = p0)
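The defaults pt.post.processing = "log" and pt.align = "median" suggest that the raw probabilities are log-scaled and then centered on their median, so that zero separates the two halves of the ranking. The base R sketch below shows one plausible reading of that step on made-up values. It is an assumption and is not taken from the package source.

```r
# Made-up stationary probabilities from a random walk (they sum to 1).
pt <- c(A = 0.5, B = 0.25, C = 0.125, D = 0.0625, E = 0.0625)

# Assumed post-processing: log-scale, then subtract the median.
score <- log(pt)
score <- score - median(score)

# With zero as the cutoff, positive scores mark the disease-related side.
related <- names(score)[score > 0]
print(related)
```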
Calculates the separation of two sets of nodes on a network. The metric is calculated as in Menche et al. (2015).
separation(graph, set_a, set_b)
graph |
The input graph object. It should be either an igraph object or an edge list matrix/data frame. |
set_a |
The first gene set |
set_b |
The second gene set |
The separation and distance measurement of the specified two modules.
library(DTSEA)

# Load the data
data("random_graph", package = "DTSEA")

# Compute the separation metric
separation <- separation(
  graph = random_graph,
  set_a = c("4", "6", "8", "13"),
  set_b = c("8", "9", "10", "15", "18")
)
cat(separation, "\n")
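The separation measure of Menche et al. (2015) is s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2, where each term is a mean nearest-node shortest distance. The base R sketch below computes it on a toy path graph. The helpers `shortest_paths()`, `mean_within()`, and `mean_between()` are hypothetical, and the sketch assumes a connected, unweighted graph with at least two nodes per set.

```r
# Toy path graph 1 - 2 - 3 - 4 - 5 - 6 as an adjacency matrix.
ids <- as.character(1:6)
adj <- matrix(0, 6, 6, dimnames = list(ids, ids))
for (i in 1:5) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}

# All-pairs shortest path lengths (Floyd-Warshall on unit edge weights).
shortest_paths <- function(adj) {
  d <- matrix(Inf, nrow(adj), ncol(adj), dimnames = dimnames(adj))
  d[adj > 0] <- 1
  diag(d) <- 0
  for (k in seq_len(nrow(d))) {
    d <- pmin(d, outer(d[, k], d[k, ], `+`))
  }
  d
}

# Mean distance from each node of a set to its nearest other node in the set.
mean_within <- function(d, set) {
  sub <- d[set, set, drop = FALSE]
  diag(sub) <- Inf
  mean(apply(sub, 1, min))
}

# Mean distance from each node to the nearest node of the other set.
# Nodes shared by both sets contribute a distance of zero.
mean_between <- function(d, a, b) {
  a_to_b <- apply(d[a, b, drop = FALSE], 1, min)
  b_to_a <- apply(d[b, a, drop = FALSE], 1, min)
  mean(c(a_to_b, b_to_a))
}

d <- shortest_paths(adj)
set_a <- c("1", "2")
set_b <- c("5", "6")
s_ab <- mean_between(d, set_a, set_b) -
  (mean_within(d, set_a) + mean_within(d, set_b)) / 2
print(s_ab)
```

A positive value indicates that the two sets occupy separate network neighborhoods, while a negative value indicates overlap.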