State Key Laboratory of Digital Medical Engineering, Southeast
University, China
Laboratoire d’Optique et Biosciences (LOB), Ecole
Polytechnique, France
G-quadruplexes (G4s) are non-canonical secondary structures that usually form in G-rich regions (Varshney et al. 2020). Bioinformatic studies have revealed the preferential enrichment of G4s in human functional regions, including promoters, telomeres, and enhancers (Huppert and Balasubramanian 2007; Phan and Mergny 2002; Zhang et al. 2024). Notably, G4s arise from specific sequence motifs that enable their structure formation. These motifs can be disrupted by single nucleotide variants (SNVs) — the most common genetic variants.
G4SNVHunter
is designed to evaluate the impact of SNVs
on G4 formation, but it can also assess insertions and deletions
(indels) and other short nucleotide variants. This package leverages one
of the most popular G4 structure prediction algorithms, G4Hunter (Bedrat, Lacroix, and Mergny 2016),
conceptualized by Mergny et al. (Bedrat,
Lacroix, and Mergny 2016), to evaluate the propensity of G4
formation before and after the introduction of variants. Users can then
design ‘wet’ experiments based on G4SNVHunter
output to explore and validate how variants affect biological functions
through their impact on G4 structures.
As the first step, you need to install G4SNVHunter
,
which can be done with a simple command, please ensure that your network
connection is stable.
# Currently, install from github is recommended,
# as it provides the latest version.
devtools::install_github("rongxinzh/G4SNVHunter", dependencies = TRUE)
NOTE: alternatively, G4SNVHunter
can be
installed from Bioconductor. However, we currently do not
recommend using the Bioconductor release version, as significant
updates have been made that will not be included until the second half
of 2025.
Then load G4SNVHunter
to access its functions.
library(G4SNVHunter)
During this tutorial, we need to use a few additional packages. Since
we specified dependencies = TRUE
when installing
G4SNVHunter
package, these additional packages have already
been installed.
We can load them directly.
library(BSgenome.Hsapiens.UCSC.hg19)
library(GenomicRanges)
library(DT)
library(rtracklayer)
library(dplyr)
The first step is to load your variant file into a
GRanges
object.
# Example VCF file; both VCF and MAF are supported
vcf_file <- system.file("extdata", "example_variants_chr16.vcf", package = "G4SNVHunter")
# Load VCF; file_type must be specified
variants <- loadVariant(vcf_file, file_type = "vcf")
## VCF loading summary:
## - Initial variants: 5000
## - Filtered out by type: 0
## - Final variants retained: 5000
seqlevels(variants) <- paste0("chr", seqlevels(variants))
Then, load your sequence file into a DNAStringSet
object.
# Sequence file; chromosome 16 (hg19) as an example
hg19 <- BSgenome.Hsapiens.UCSC.hg19
chr16_seq <- DNAStringSet(hg19$chr16)
# Chromosome names are needed for analysis
names(chr16_seq) <- "chr16"
Next step is to detect G4s based on your sequence object.
# Predict G4s
G4_detected <- G4HunterDetect(chr16_seq)
## Start processing...
## Now process: chr16.
## Merging predicted G4s...
## Done!
Afterwards, we can calculate the impact of variants on G4 formation.
# Predict variants impact on G4 formation
result <- G4VarImpact(G4 = G4_detected,
variants = variants,
ref_col = "REF",
alt_col = "ALT")
## Start processing...
## Processing finished!
Since most of the variants may have limited impact on G4 formation, we should filter the prediction results.
# Filter for those with substantial changes in structural formation
filtered_mutG4s <- filterVarImpact(result,
mut_score_threshold = 1.2,
score_diff_threshold = -0.35)
print(filtered_mutG4s)
## GRanges object with 4 ranges and 13 metadata columns:
## seqnames ranges strand | G4.info.score G4.info.max_score
## <Rle> <IRanges> <Rle> | <numeric> <numeric>
## [1] chr16 57500155-57500179 + | 1.56 1.56
## [2] chr16 28507628-28507653 - | -1.58 -1.56
## [3] chr16 47215047-47215069 - | -1.70 -1.52
## [4] chr16 55153357-55153382 - | -1.62 -1.64
## G4.info.sequence variant.info.ranges.start variant.info.ranges.end
## <character> <integer> <integer>
## [1] GGGTGGAATGTGAGGTTTGG.. 57500178 57500178
## [2] CCCTTTCCCCAAACAGCCCC.. 28507644 28507644
## [3] CGCCGCTGCCCCACCGCCCC.. 47215063 47215063
## [4] CACCCCGCGCTCACACCCCC.. 55153360 55153360
## variant.info.ranges.width variant.info.ID variant.info.REF variant.info.ALT
## <integer> <character> <character> <character>
## [1] 1 hsa4564360 G A
## [2] 1 hsa4476984 C G
## [3] 1 hsa4516572 C G
## [4] 1 hsa4551412 C A
## mutated.G4.seq mutated.G4.anno.seq mutated.max_score score.diff
## <character> <character> <numeric> <numeric>
## [1] GGGTGGAATGTGAGGTTTGG.. GGGTGGAATGTGAGGTTTGG.. 1.12 -0.44
## [2] CCCTTTCCCCAAACAGGCCC.. CCCTTTCCCCAAACAG[C>G.. -1.16 -0.40
## [3] CGCCGCTGCCCCACCGGCCC.. CGCCGCTGCCCCACCG[C>G.. -1.16 -0.36
## [4] CACACCGCGCTCACACCCCC.. CAC[C>A]CCGCGCTCACAC.. -1.20 -0.44
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Perfectly, we have now obtained the potential variants that can disrupt G4 formation.
Let’s visualize an example. Since this G4 is on the minus strand, we
specify the keep_gstrand
as TRUE
to ensure
that the displayed sequence corresponds to the G-rich strand.
plotImpactedG4(filtered_mutG4s[3], keep_gstrand = TRUE)
To run G4SNVHunter
, you need to provide two inputs:
the genomic sequences (a DNAStringSet
object)
the set of variants (a GRanges
object)
While the input formats for these data are quite flexible, they must
ultimately be converted into the appropriate formats:
DNAStringSet
for sequences and GRanges
for
variants.
The initial design of G4SNVHunter
only focused on SNVs;
however, starting from version 1.1.0
, it also supports
indels, MNVs, and other simple variants.
Detailed descriptions on how to prepare these inputs are provided below.
The genomic sequences refer to the chromosome sequences or fragments
where the variants are located. These can be entire chromosomes or large
segments extracted from them. G4SNVHunter
will predict G4
structures from the provided sequences and assess whether these variants
can affect their formation.
Please note that the variant coordinates must be relative to the
genomic sequence you provide and should be in 1-based
coordinates.
The genomic sequences must be formatted as a
DNAStringSet
object. For convenience, we provide the
built-in loadSequence
function to facilitate this process.
Alternatively, Alternatively, you may load a custom sequence file and
convert it into a DNAStringSet
object without using the
loadSequence
function.
The loadSequence
function accepts three input types:
A two-column data.frame
, with the first column
containing the sequence identifiers and the second column containing
their corresponding sequences. This should be specified using the
genome_seq
parameter.
The path to a stored FASTA file. The FASTA file must have a
.fa
, .fna
, or .fasta
extension.
This should be specified using the seq_path
parameter.
A text file (.txt
) that stores the sequence
identifiers and their corresponding sequences. The first column should
contain the sequence identifiers, and the second column should contain
the sequences. This should also be specified using the
seq_path
parameter.
Note: Please do not include column names in the file!
Here are some examples for you to load sequences into a
DNAStringSet
object using loadSequence
function:
Load from a data.frame
object
seq_df <- data.frame(chr = c("seq1", "seq2"),
sequence = c(paste0(rep("G", 100), collapse = ""),
paste0(rep("A", 100), collapse = "")))
seq <- loadSequence(genome_seq = seq_df)
Load from a fasta
file
# File path to the sequences in fasta format
fa_path <- system.file("extdata", "seq.fa", package = "G4SNVHunter")
seq <- loadSequence(seq_path = fa_path)
Load from a txt
file
# File path to the sequences in txt format
txt_path <- system.file("extdata", "seq.txt", package = "G4SNVHunter")
seq <- loadSequence(seq_path = txt_path)
We can also retrieve genome sequences from Bioconductor Annotation Packages. While this is convenient, it requires you to install some related packages in advance. For example:
# Load sequence for chromosome 21 (hg19)
hg19 <- BSgenome.Hsapiens.UCSC.hg19
chr21_seq <- DNAStringSet(hg19$chr21)
# Chromosome names are needed for analysis
names(chr21_seq) <- "chr21"
The variant data needs to be processed into a GRanges
object. The loadVariant
function can be used to load the
VCF
or MAF
file into a GRanges
object.
# File path to the VCF file; file_type must be specified
vcf_file <- system.file("extdata", "example_variants_chr16.vcf", package = "G4SNVHunter")
# Load variants from the VCF file
var_vcf <- loadVariant(vcf_file, file_type = "vcf")
seqlevels(var_vcf) <- paste0("chr", seqlevels(var_vcf))
print(var_vcf)
## GRanges object with 5000 ranges and 3 metadata columns:
## seqnames ranges strand | ID REF ALT
## <Rle> <IRanges> <Rle> | <character> <DNAStringSet> <DNAStringSetList>
## hsa4580922 chr16 60712602 * | hsa4580922 T C
## hsa4511691 chr16 35270160 * | hsa4511691 C G,A
## hsa4420479 chr16 15115434 * | hsa4420479 C T
## hsa4402848 chr16 11905849 * | hsa4402848 G A
## hsa4567387 chr16 58072153 * | hsa4567387 C T
## ... ... ... ... . ... ... ...
## hsa4445679 chr16 21313291 * | hsa4445679 C A
## hsa4462089 chr16 25367052 * | hsa4462089 C T
## hsa4679986 chr16 80173446 * | hsa4679986 C A
## hsa4695174 chr16 82172241 * | hsa4695174 C T
## hsa4523500 chr16 49187596 * | hsa4523500 C T
## -------
## seqinfo: 1 sequence from an unspecified genome
Another example to load a MAF
file.
# File path to the MAF file; file_type must be specified
maf_file <- system.file("extdata", "example_variants_chr16.maf", package = "G4SNVHunter")
# Load variants from the MAF file
var_maf <- loadVariant(maf_file, file_type = "maf")
# To simplify the dataset, we can select only the columns we needed for our analysis.
var_maf <- subset(var_maf, select = c("Reference_Allele", "Tumor_Seq_Allele2"))
seqlevels(var_maf) <- paste0("chr", seqlevels(var_maf))
print(var_maf)
## GRanges object with 13 ranges and 2 metadata columns:
## seqnames ranges strand | Reference_Allele Tumor_Seq_Allele2
## <Rle> <IRanges> <Rle> | <character> <character>
## [1] chr16 3299712 * | A G
## [2] chr16 30004597 * | C T
## [3] chr16 31093071 * | G T
## [4] chr16 57918254 * | G T
## [5] chr16 618163 * | C G
## ... ... ... ... . ... ...
## [9] chr16 47732491 * | G A
## [10] chr16 83994365 * | C T
## [11] chr16 3107541 * | C A
## [12] chr16 29818528 * | C A
## [13] chr16 1308058 * | A C
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
If you have custom variant data, you can easily convert it to a
GRanges
object yourself. For example,
# Path to your custom variant file
snp_path <- system.file("extdata", "snp.txt", package = "G4SNVHunter")
# Load your variants into memory
snp <- read.table(snp_path, sep = "\t", header = FALSE)
# Convert them to GRanges
snp <- GRanges(seqnames = snp$V1,
ranges = IRanges(start = snp$V2, width = 1),
rsid = snp$V3,
ref = snp$V4,
alt = snp$V5)
print(snp)
## GRanges object with 15 ranges and 3 metadata columns:
## seqnames ranges strand | rsid ref alt
## <Rle> <IRanges> <Rle> | <character> <character> <character>
## [1] chr21 15397335 * | rs1244782663 G A
## [2] chr21 39931215 * | rs910271214 G A
## [3] chr21 22915361 * | rs1210166884 A G
## [4] chr21 42644592 * | rs554035836 A G
## [5] chr21 10743898 * | rs955290928 A G
## ... ... ... ... . ... ... ...
## [11] chr21 30836340 * | rs1212424980 T C
## [12] chr21 36426864 * | rs1478442431 G T
## [13] chr21 27584238 * | rs1191559536 T C
## [14] chr21 36106373 * | rs561521189 C T
## [15] chr21 21840835 * | rs1301205088 T C
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
You can directly predict G4s from a DNAStringSet
object
using the G4HunterDetect
function. This function is based
on the G4Hunter algorithm.
For example,
# Sequence file in fasta file format
fa_path <- system.file("extdata", "seq.fa", package = "G4SNVHunter")
# Load sequences
seq <- loadSequence(seq_path = fa_path)
# Predict G4s
G4_detected <- G4HunterDetect(seq)
## Start processing...
## Processing seq No.4/5 [==================>-----] 80% in 0sProcessing seq No.5/5 [========================] 100% in 0s
## Merging predicted G4s...
## Done!
Then, we can examine the prediction results, which are stored in a
GRanges
object.
You can use functions like print
to directly view the
results. However, in this instance, we’ll leverage the datatable
function
to view the predicted G4s, as this method provides a more user-friendly
display interface.
Let’s take a look at the G4s,
datatable(as.data.frame(G4_detected), options = list(scrollX = TRUE))
This function will return a GRanges
object containing
all potential G4s predicted by the G4Hunter algorithms. Fow each row, it
mainly includes:
The sequence identifier where the G4 is located
(seqnames
),
The position of the G4 relative to the sequence you provided
(ranges
, 1-based
coordinate),
The strand on which the G4 is located (strand
), with
+
indicating the G4 is on the positive strand, and
-
indicating it is on the negative strand,
The G4Hunter score for the entire G4 sequence
(score
), this G4 may represent a merged sequence of
multiple overlapping G4s.
The maximum G4Hunter score of the windows covering that G4 during
the execution of the G4Hunter algorithm (max_score
, used to
determine if the G4 surpasses the threshold),
The predicted G4 sequence on the positive strand
(sequence
, with a C-rich sequence means G4 on the reverse
strand).
Please note that score
represents the overall score of
the entire G4 sequence, which may include contributions from multiple
overlapping G4s. In contrast, max_score
indicates the
highest score within this region, calculated by sliding a window across
the sequence. Starting with version 1.1.0
, we will use the
change in max_score
to evaluate alterations in G4
formation.
See the illustration below.
Additionally, the prediction parameter settings are stored in the
global metadata
of the GRanges
object, which
can be accessed using:
# Predicting parameters
print(metadata(G4_detected))
## $threshold
## [1] 1.5
##
## $window_size
## [1] 25
##
## $include_sequences
## [1] TRUE
##
## $strands
## [1] "b"
Users can customize several parameters for prediction. For example,
# Predict G4s by customizing parameters
G4_detected <- G4HunterDetect(seq, threshold = 1.5, window_size = 20)
## Start processing...
## Merging predicted G4s...
## Done!
threshold
: G4Hunter will search for G4s in windows
above this threshold (absolute value). Default is 1.5
.
Unless there are special needs, we do not recommend setting the
threshold below 1.2
.
window_size
: The window size (bp) for G4Hunter
prediction. Default is 25
. Another commonly used window
size is 20
. However, 25
is generally
preferred.
include_sequences
: Whether to include the predicted
G4 sequences in the output. Default is TRUE
.
strands
: Indicate which strand requires prediction,
with b
for both strands and p
for positive
strand only. Please note that if your genome is single-stranded, this
parameter should be set to p
to prevent the program from
predicting G4s on a non-existent negative strand.
We generally do not recommend modifying certain parameters, such as
window_size
and threshold
, as their default
settings are already optimal.
We provide the exportG4
function for you to easily
export the predicted G4s. You don’t need to specify the file type, as
our program will determine it automatically; however, only
.txt
, .csv
, and .xlsx
formats are
supported.
For example, you can export the predicted G4s to a CSV file,
out_csv <- file.path(tempdir(), "results.csv")
# export as csv format
exportG4(G4_detected, out_csv, revcomp_antisense = FALSE)
or export to other formats as needed,
out_txt <- file.path(tempdir(), "results.txt")
out_xlsx <- file.path(tempdir(), "results.xlsx")
# export as .txt format
exportG4(G4_detected, out_txt, include_metadata = FALSE)
# export as .xlsx format
exportG4(G4_detected, out_xlsx)
You can use plotG4Info
function to visualize the
statistical information of G4s predicted by the
G4HunterDetect
function.
plotG4Info(G4_detected)
The left panels (A
, C
, E
)
display the overall distributions of the absolute values of
max_score
(absolute value), score
(absolute
value), and G4 length, respectively. The right panels (B
,
D
, F
) show the distributions of
max_score
, score
, and length separately for
G4s on the positive and negative strands. G4s on the positive strand
have positive scores, while those on the negative strand have negative
scores, resulting in a bimodal distribution for both
max_score
and score
. Please note that the sign
of the score simply denotes the G4’s location on either the positive or
negative strand; the absolute score value indicates its actual
G4-forming potential.
We provide two modes in G4VarImpact
function for users
to assess the potential impact of variants on G4 formation:
Single-Variant Mode (s
mode) -
assess the impact of each variant individually.
Multi-Variant Mode (m
mode) - for
each sample, if multiple variants are located within the same G4 region,
this mode will assess their combined effect on that G4.
In short, the single-variant mode assesses the impact of each variant on the G4 structure individually, whereas the multi-variant mode evaluates the combined effects of multiple variants on a given G4 in a sample-specific manner.
See the illustration below.
We have prepared the example variant data for human chromosome 21 (hg19). You can easily load them,
data(snv_gr)
You can quickly and conveniently predict the G4 sequences on
chromosome 21 using the G4HunterDetect
function.
# Predict the G4s in human chr 21 (hg19)
chr21_G4 <- G4HunterDetect(chr21_seq)
## Start processing...
## Now process: chr21.
## Merging predicted G4s...
## Done!
The default mode of G4VarImpact
is single-variant mode
(mode = 's'
), which requires only the G4 data in
GRanges
format (as returned by the
G4HunterDetect
function), the variant data in
GRanges
format, and the column names for the reference and
alternate bases.
Then, the assessment can be easily done by,
snv_eff_s <- G4VarImpact(chr21_G4,
snv_gr,
ref_col = "ref",
alt_col = "alt")
Let’s view the first three entries,
datatable(as.data.frame(snv_eff_s[1:3]), options = list(scrollX = TRUE))
This function will return a GRanges
object with detailed
information, including: the original G4s (G4.info.*), variants
(variant.info.*), the mutated sequence (mutated.G4.seq), the annotated
mutation sequence (mutated.G4.anno.seq), the new G4Hunter max_score
(mutated.max_score), and the score difference (score.diff).
Please be aware that G4s with no overlap to any of the provided variants will not be included in the output.
You can set the mode
parameter in
G4VarImpact
function to 'm'
to enable
multi-variant mode. This mode is particularly useful for specific
analyses, such as examining variants derived from cancer patients.
When using this mode, you must specify the column name for sample IDs
(sampleid_col
).
# Column names of the Sample ID and SNV ID must be specified
snv_eff_m <- G4VarImpact(chr21_G4,
snv_gr,
ref_col = "ref",
alt_col = "alt",
mode = "m",
sampleid_col = "sampleid")
Let’s view the first three entries,
datatable(as.data.frame(snv_eff_m[1:3]), options = list(scrollX = TRUE))
Under 'm'
mode, the G4VarImpact
function
will return a GRanges
object containing: the original G4s
(G4.info.*), the (combined) variant information (variant.info.*), the
mutated sequence with all variants incorporated (mutated.G4.seq), the
annotated mutation sequence (mutated.G4.anno.seq), the new G4Hunter
max_score (mutated.max_score), and the score difference
(score.diff).
Please note that in multi-variant mode, the combined impact of multiple variants within the same G4 region will be evaluated for each sample.
To illustrate, the following G4 is overlapped by two variants
(id_3608
and id_49857
) in sample
GQ3Y
, and consequently, the resulting mutant maximum
G4Hunter score is calculated based on their combined effect.
datatable(as.data.frame(snv_eff_m[528]), options = list(scrollX = TRUE))
The variant data may contain alternate alleles, In 's'
mode, G4SNVHunter
will separate these alleles and evaluate
each alternate form individually.
For example, let’s create the variants,
variants_aa <- GRanges(
seqnames = "CHR",
ranges = IRanges(start = c(2, 5, 13, 45), end = c(2, 5, 14, 47)),
strand = "*",
rsid = c("ID1", "ID2", "ID3", "ID4"),
sid = "samplex",
ref = c("G", "T", "GG", "GCT"),
alt = c("C", "C,G", "G", "G")
)
print(variants_aa)
## GRanges object with 4 ranges and 4 metadata columns:
## seqnames ranges strand | rsid sid ref alt
## <Rle> <IRanges> <Rle> | <character> <character> <character> <character>
## [1] CHR 2 * | ID1 samplex G C
## [2] CHR 5 * | ID2 samplex T C,G
## [3] CHR 13-14 * | ID3 samplex GG G
## [4] CHR 45-47 * | ID4 samplex GCT G
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
also create the G4,
test_seq <- "GGGATGGGATGTGGTAGGGATGCGGGTGACATCAGCTAGCATCAGCTACGA"
test_seq <- DNAStringSet(test_seq)
names(test_seq) <- "CHR"
test_G4 <- G4HunterDetect(test_seq)
## Start processing...
## Now process: CHR.
## Merging predicted G4s...
## Done!
print(test_G4)
## GRanges object with 1 range and 3 metadata columns:
## seqnames ranges strand | score max_score sequence
## <Rle> <IRanges> <Rle> | <numeric> <numeric> <character>
## [1] CHR 1-26 + | 1.58 1.52 GGGATGGGATGTGGTAGGGA..
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Let’s see what happens in 's'
mode,
G4_var_impact <- G4VarImpact(test_G4, variants_aa,
ref_col = "ref", alt_col = "alt")
## Start processing...
## Processing finished!
We have three variants overlapping the predicted G4, but obtained four evaluations because variant #ID2 is an alternate allele with two reference-to-alternate substitutions.
datatable(as.data.frame(G4_var_impact), options = list(scrollX = TRUE))
However, in 'm'
mode, the situation would become more
complex and different. In general, G4SNVHunter
will
consider all possible combinations within a particular sample.
G4_var_impact_m <- G4VarImpact(test_G4, variants_aa,
ref_col = "ref", alt_col = "alt",
mode = "m", sampleid_col = "sid")
## Start processing...
## Processing finished!
We will obtain two evaluations, which are coming from the combinations of mutant alleles:
[G>C]-[T>C](alt1)-[GG>G]
[G>C]-[T>G](alt2)-[GG>G]
Obviously, the combination of
[G>C]-[T>C](alt1)-[GG>G]
shows higher effect on G4
formation than [G>C]-[T>G](alt2)-[GG>G]
datatable(as.data.frame(G4_var_impact_m), options = list(scrollX = TRUE))
Given that some variants may have minimal effects on G4 formation, we need to filter out those variants that are worth investigating further for experimental validation or additional analysis.
We provide a convenient function, filterVarImpact
, to
help with this filtering process.
There are three threshold parameters for users to adjust:
raw_score_threshold
, mut_score_threshold
, and
score_diff_threshold
.
If raw_score_threshold
(a positive number, but no more
than 4) is specified, filterVarImpact
will filter out
entries where the absolute value of the original maximum G4Hunter score
is below this threshold.
If mut_score_threshold
(a positive number, but no more
than 4) is specified, filterVarImpact
will retain entries
where the maximum G4Hunter score of the mutated G4 sequences does not
exceed this threshold.
If score_diff_threshold
(a negative number, but no less
than -4) is specified, filterVarImpact
will retain entries
where the decrease in the maximum G4Hunter score after mutation exceeds
this threshold. For example, if
score_diff_threshold = -0.2
, those entries will be
retained: \(\left|
\text{G4HunteMaxScore}_{\text{mut_seq}} \right| - \left|
\text{G4HunterMaxScore}_{\text{raw_seq}} \right| \leq -0.2\)
We recommend setting raw_score_threshold
above 1.5 and
mut_score_threshold
below 1.2 (at least). This is an
empirically based guideline, and you are certainly free to adjust them
to be more stringent or to use these parameters for flexible
customization as needed.
Please note that you must specify at least one threshold parameter, but specifying all of them is not required.
For example, using the results returned in mode m
filtered_snv_eff <- filterVarImpact(snv_eff_m,
mut_score_threshold = 1.2,
score_diff_threshold = -0.35)
We can examine the filtered entries to identify variants that have substantial impact on G4 formation.
datatable(as.data.frame(filtered_snv_eff), options = list(scrollX = TRUE))
The mutant G4s can be exported using the exportMutG4
function. Similar to the exportG4
function, the file
extension can be
.txt
, .csv
, or .xlsx
. Please note
that explicit specification of the file extension is not required, as
the program will automatically determine it.
For example,
out_csv <- file.path(tempdir(), "mut_G4.csv")
# export as csv format
exportMutG4(filtered_snv_eff, out_csv)
You can use the plotVarImpact
function to visualize
changes in G4Hunter scores. It supports both single-variant
and multi-variant
mode outputs, including their filtered
results.
We can observe a general trend of decreased formation potential for the mutated G4s.
plotVarImpact(snv_eff_m)
Now, let’s take a look at how the G4Hunter scores have changed in the
filtered_snv_eff
object.
plotVarImpact(filtered_snv_eff)
Finally, you can use plotImpactedG4
to visualize the G4
sequences affected by the variants.
Please note that plotImpactedG4
supports only one entry
at a time. To visualize multiple entries, consider using a custom loop
(e.g., for
or lapply
).
plotImpactedG4(filtered_snv_eff[1])
The author would like to thank Dr. Wenyong Zhu and Dr. Xiao Sun from Southeast University for their contributions: Dr. Zhu for providing the illustrations and testing the package, and Dr. Sun for reviewing this Vignettes.
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.5
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] tools stats4 stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.1.4 DT_0.33
## [3] BSgenome.Hsapiens.UCSC.hg19_1.4.3 BSgenome_1.74.0
## [5] rtracklayer_1.66.0 BiocIO_1.16.0
## [7] Biostrings_2.74.1 XVector_0.46.0
## [9] GenomicRanges_1.58.0 GenomeInfoDb_1.42.3
## [11] IRanges_2.40.1 S4Vectors_0.44.0
## [13] BiocGenerics_0.52.0 G4SNVHunter_1.1.0
## [15] testthat_3.2.3
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.17.1 jsonlite_2.0.0
## [4] magrittr_2.0.3 GenomicFeatures_1.58.0 farver_2.1.2
## [7] rmarkdown_2.29 fs_1.6.6 zlibbioc_1.52.0
## [10] vctrs_0.6.5 memoise_2.0.1 Rsamtools_2.22.0
## [13] RCurl_1.98-1.17 ggpointdensity_0.1.0 RcppRoll_0.3.1
## [16] htmltools_0.5.8.1 S4Arrays_1.6.0 usethis_3.1.0
## [19] progress_1.2.3 curl_6.2.2 SparseArray_1.6.2
## [22] sass_0.4.10 bslib_0.9.0 htmlwidgets_1.6.4
## [25] desc_1.4.3 cachem_1.1.0 GenomicAlignments_1.42.0
## [28] mime_0.13 lifecycle_1.0.4 pkgconfig_2.0.3
## [31] Matrix_1.7-3 R6_2.6.1 fastmap_1.2.0
## [34] GenomeInfoDbData_1.2.13 MatrixGenerics_1.18.1 shiny_1.10.0
## [37] digest_0.6.37 AnnotationDbi_1.68.0 rprojroot_2.0.4
## [40] pkgload_1.4.0 crosstalk_1.2.1 RSQLite_2.3.11
## [43] labeling_0.4.3 httr_1.4.7 abind_1.4-8
## [46] compiler_4.4.2 remotes_2.5.0 bit64_4.6.0-1
## [49] withr_3.0.2 BiocParallel_1.40.2 viridis_0.6.5
## [52] DBI_1.2.3 pkgbuild_1.4.7 MASS_7.3-65
## [55] DelayedArray_0.32.0 sessioninfo_1.2.3 rjson_0.2.23
## [58] ggdensity_1.0.0 zip_2.3.2 httpuv_1.6.16
## [61] ggseqlogo_0.2 glue_1.8.0 restfulr_0.0.15
## [64] promises_1.3.2 grid_4.4.2 generics_0.1.4
## [67] isoband_0.2.7 gtable_0.3.6 tidyr_1.3.1
## [70] data.table_1.17.0 hms_1.1.3 pillar_1.10.2
## [73] later_1.4.2 lattice_0.22-7 bit_4.6.0
## [76] tidyselect_1.2.1 miniUI_0.1.2 knitr_1.50
## [79] gridExtra_2.3 SummarizedExperiment_1.36.0 xfun_0.52
## [82] Biobase_2.66.0 devtools_2.4.5 brio_1.1.5
## [85] matrixStats_1.5.0 stringi_1.8.7 UCSC.utils_1.2.0
## [88] yaml_2.3.10 evaluate_1.0.3 codetools_0.2-20
## [91] tibble_3.2.1 cli_3.6.5 xtable_1.8-4
## [94] jquerylib_0.1.4 dichromat_2.0-0.1 Rcpp_1.0.14
## [97] png_0.1-8 XML_3.99-0.18 parallel_4.4.2
## [100] ellipsis_0.3.2 ggplot2_3.5.2 blob_1.2.4
## [103] prettyunits_1.2.0 profvis_0.4.0 urlchecker_1.0.1
## [106] bitops_1.0-9 viridisLite_0.4.2 VariantAnnotation_1.52.0
## [109] scales_1.4.0 openxlsx_4.2.8 purrr_1.0.4
## [112] crayon_1.5.3 rlang_1.1.6 cowplot_1.1.3
## [115] KEGGREST_1.46.0