Somatic mutations play an important role in cancer development and disease progression, and mutational profiling is used far more commonly than other ‘omics’ analyses in clinical practice. With the development of PD-1 immunotherapy technology, many experiments show that patients with high TMB have been shown more clinical benefits of immunotherapy (PD-1/PD-L1). However, to date, WES remains confined to research settings, due to the high cost of the large genomic space sequenced. In the clinical setting, instead, targeted enrichment panels (gene panels) of various genomic sizes are emerging as the routine technology for TMB assessment. This package attempts to develop a new pathway-based gene panel for TMB assessment (pathway-based tumor mutational burden, PTMB), using somatic mutations files in an efficient manner from either TCGA sources or any in-house studies as long as the data is in MAF format. Besides, we develop a multiple machine learning method using sample PTMB profiles to identify cancer-specific dysfunction pathways, which can be a biomarker of prognostic and predictive for cancer immunotherapy.
MAF files contain many fields ranging from chromosome names to cosmic annotations. However, most of the analysis in PTMB uses the following fields.
Complete specification of MAF files can be found on NCI GDC documentation page.
The pathwayTMB package is a Bioinformatics tool to
identify Cancer-specific pathways. And pathwayTMB
functions can be categorized into mainly Visualization and Analysis
modules. Each of these functions and a short description is summarized
as shown below:
1.Obtain survival benefit mutations frequency matrix.
2.Calculate the length of encoded genes.
3.Infer pathway-based tumor mutational burden profiles (PTMB).
4.Identify Cancer-specific dysfunction pathways.
5.Visualization results:
5.1 Plot Patients’ Kaplan-Meier Survival Curves.
5.2 Plot patient-specific dysfunction pathways and user-interested
geneset mutually exclusive and co-occurrence plots.
5.3 Plot patient-specific dysfunction pathways’ waterfall plots.
5.4 Plot the ROC curve.
The function get_gene_length can extract coding
genes’ length from gene transfer format (GTF) file. The GTF file can
download from GENCODE.
To identify cancer-specific pathways, PTMB requires
sample mutation data with survival information. For each pathway,
PTMB is defined as pathway tumor mutational burden
corrected by genes’ length and number, the calculation formula is as
follows:
$$PTMB_{ij}=\frac{\sum_{k=1}^N\frac{mut_{ijk}}{length_{jk}}}{N}\tag{1}$$
where mutij
is the i th sample’s number of
mutations of the k th gene in
the j th pathway; lengthjk
is the length of k th gene in
the j th pathway; N is the total number of genes in
pathway j.The PTMBij
denotes the pathway j tumor
mutational burden for sample i, which was calculated in the
context of survival-related genes’ mutation frequency matrix.
#perform the function `get_PTMB`
#PTMB_matrix<-get_PTMB(freq_matrix=mut_matrix,genesmbol=genesmbol,gene_path=gene_path)
#show the first six lines of PTMB matrix
head(PTMB_matrix)[1:6,1:6]
#> Pat03 Pat06 Pat07 Pat08
#> Hematopoietic cell lineage 0.0000000 0.00000 0.0000000 0
#> Complement and coagulation cascades 0.0000000 0.00000 4.1867281 0
#> Platelet activation 0.0000000 0.00000 2.1018828 0
#> Toll-like receptor signaling pathway 0.0000000 0.00000 0.8930421 0
#> NOD-like receptor signaling pathway 0.4252838 2.25137 2.8148339 0
#> RIG-I-like receptor signaling pathway 0.0000000 0.00000 2.6852846 0
#> Pat100 Pat101
#> Hematopoietic cell lineage 0.000000 0.000000
#> Complement and coagulation cascades 3.145643 0.000000
#> Platelet activation 3.938654 0.000000
#> Toll-like receptor signaling pathway 0.000000 0.000000
#> NOD-like receptor signaling pathway 1.962122 0.000000
#> RIG-I-like receptor signaling pathway 0.000000 3.281809
In order to identify cancer-specific dysfunction pathways, we develop
a multiple machine learning method using pathway-based tumor mutational
burden profiles. The multiple machine learning method consists of three
sequential steps, as follows:
The function get_final_signature in the
pathwayTMB package can implement the above
process.
# filter the survival-related pathways
#set.seed(1)
#final_character<-get_final_signature(PTMB=PTMB_matrix,sur=sur)
#view the final_character
final_character
#> a1
#> "Natural killer cell mediated cytotoxicity"
#> a4
#> "Antigen processing and presentation"
#Drawing Kaplan Meier Survival Curves using the final survival-related PTMB.
plotKMcurves(t(PTMB_matrix[final_character,]),sur=sur,returnAll = FALSE,risk.table = TRUE)
#a mutually exclusive co-occurrence chart showing the top 20 genes for mutation rates
gene_fre<-apply(mut_matrix,1,function(x){length(which(x!=0))/length(x)})
genes<-names(sort(gene_fre,decreasing = TRUE))[1:20]
plotMutInteract(freq_matrix=mut_matrix, genes=genes,returnAll = FALSE)
#pathways' mutually exclusive co-occurrence chart
plotMutInteract(freq_matrix=PTMB_matrix,genes=final_character,
nShiftSymbols =0.3,returnAll = FALSE)
#calculate the PTMB-related riskscore
riskscore<-plotKMcurves(t(PTMB_matrix[final_character,]),sur=sur,plots = FALSE)$risk_score
cut_off<-median(riskscore)
#draw an GenePathwayOncoplots
GenePathwayOncoplots(maffile=maf,freq_matrix =mut_matrix,risk_score=riskscore,
cut_off=cut_off,final_character=final_character,gene_path = gene_path,removeNonMutated = FALSE)
#> Drawing upto top 3 mutated pathways
#> -Reading
#> -Validating
#> -Silent variants: 26
#> -Summarizing
#> --Possible FLAGS among top ten genes:
#> MUC16
#> USH2A
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 0.161s elapsed (0.213s cpu)
#> -Reading
#> -Validating
#> -Silent variants: 26
#> -Summarizing
#> --Possible FLAGS among top ten genes:
#> MUC16
#> USH2A
#> -Processing clinical data
#> -Finished in 0.217s elapsed (0.274s cpu)
#> --Possible FLAGS among top ten genes:
#> MUC16
#> USH2A
#> -Processing clinical data
#get the path of samples' immunotherapy response data
res_path<- system.file("extdata","response.csv",package = "pathwayTMB")
response<-read.csv(res_path,header=TRUE,stringsAsFactors =FALSE,row.name=1)
plotROC(riskscore=riskscore,response=response,main="Objective Response",print.auc=TRUE,grid = TRUE)
#> Setting levels: control = 0, case = 1
#> Setting direction: controls > cases