0. Overview

wGRN (http://wheat.cau.edu.cn/wGRN/) is a free-accessible interactive platform for guiding functional gene discovery using wheat integrative regulatory networks. The platform assembles regulatory relationships from large-scale functional datasets, and provides a series of versatile analysis tools for the community to discovery functional genes for crop improvement. We will update the platform regularly.


An updated annotation of the reference genome

The wGRN platform is implemented using the genome assembly IWGSC RefSeq v2.1 of Triticum aestivum cv. Chinese Spring.

To construct high-quality regulatory networks, we updated the IWGSC RefSeq v2.1 genome annotation and generated a new annotation, wGRN_v1, which contains 133,804 protein-coding genes, 12,022 lncRNAs, and 16,016 transcripts from unmapped sequences. Compare to the previous annotation, the wGRN_v1 annotation corrected 58,367 gene models, assigned 14,839 genes from low confidence to high confidence, and added 10,297 new genes. The wGRN_v1 annotation is based on the IWGSC RefSeq v2.1 assembly (Zhu et al., 2021; IWGSC 2018). A total of 474 RNA-seq data and 132 PacBio data were used.

Protein-coding gene: Three approaches, including ab initio prediction, homology search and transcriptomics data including RNA-seq and Iso-Seq data, were used to predict the protein-coding gene models. All gene model evidence from three approaches were combined using EVM and updated using PASA. The gene models with homology or expression evidence were retained. Gene models overlapping with TEs were excluded. We assigned the previous gene IDs for the gene models overlapping with the IWGSC RefSeq v2.1 annotation.

lncRNA: We used a strict identification pipeline to uncover lncRNAs in wheat. Stringtie and TACO was utilized to assemble the transcripts into a unified set of transcripts. The transcripts with an TPM of more than 0.5 and a length of more than 200 bp were retained. The transcripts overlapping with protein-coding genes and TEs were excluded. The Swiss-prot and wheat protein database, CPC, LGC, and CNCI were used to further remove the transcripts with protein-coding potential.

Novel sequence: We assembled transcripts from unmapped RNA-seq and Iso-Seq sequences. We first used Trinity to assemble unmapped sequences, and then used CD-HIT-EST to reduce redundancy. Transcripts with similar BLAST hits to the genome were further excluded. DeconSeq was further used to remove sequence contamination from human, bacteria, virus, and fungi genomes. The non-redundant transcript fragments are further merged using CAP3. Only transcript with expression evndence were retained.


The development of wGRN platform

We constructed a wheat integrative regulatory network (wGRN) by combining diverse complementary functional datasets including gene expression, sequence motif, transcription factor (TF) binding, chromatin accessibility, and evolutionarily conserved regulation. wGRN contains 7.2 million genome-wide interactions covering 5,947 TFs and 127,439 target genes. We implemented the wGRN platform for the community to explore regulation relationships and discovery phenotype-associated functional genes. The platform integrates multiple types of datasets, including integrative GRN, gene expression, gene annotation, eQTL, miRNA-target interaction, and epigenetic data and so on, into more than ten versatile analysis tools for the advancement of functional gene exploration.

Gene expression regulatory network:

GENIE3 network The GENIE3 network was constructed using the GENIE3 tool, based on feature selection with tree-based ensemble methods.

PCC network The PCC network was constructed using Pearson correlation coefficient. Targets for each TF were further filtered in the subsequent integration procedures.

cCOE network We introduced coexpression-based interactions in cereal crop rice using the PCC and GENIE3 method. The cCOE network was used to identify evolutionarily conserved regulation in wheat.

Physical regulatory network:

PWM network The PWM network was constructed by mapping non-redundancy and high-quality TF binding sites on an extended locus of 2.5 Kb upstream or downstream from the TSS of targets.

cPWM network Evolutionarily conserved TF binding sites were used to generate the cPWM network.

TF binding network DAP-seq data from urartu wheat or bread wheat were used to identify TF binding peaks.

Open chromatin network Chromatin accessibility is considered as the evidence of potential TF binding to targets.

Integrative inference of regulatory network We applied a weighted integration approach to combine seven independent networks and produce directed interaction scores for TF-target pairs. The interactions obtained by only the homolog-map approach were removed to reduce the false positives. All scores in independent networks were normalized to reflect the contribution of supporting evidence. A dynamic parameter α were introduced to optimize predictive performance for TF-target interactions.


The input and output of wGRN

To make users easy access to wGRN, we provided a flowchart to show what functional information wGRN can generate using different types of input data.


1. Homepage

1. The top navigation menu contains links to different analysis tools

2. Brief introduction about wGRN.

3. Quick links of four main tools, including Search, Regulator prediction, Pathway network, and QTG miner.

4. The footer shows the sponsors and website visit statistics. Please contact us if questions or feedbacks.


2. Search GRN

Using this tool, you can query regulation information of the interested gene.

1. In the input panel, you can select three search methods, including all interactions (both upstream and downstream interactions), upstream regulators, and downstream targets. If input the gene IDs in the IWGSC RefSeq v1.1 annotation, it will be automatically converted to that in the wGRN_v1 annotation. In addition, different parameter α were used to generate optimal network, strict network (less interactions), relaxed network (more interactions), and more relaxed network (most interactions). You can share this page by clicking on the button Share.

2. Gene information including gene ID, gene location, NCBI NR description, Swiss-prot description, transcription factor (TF) family, and external links. The gene ID and location were obtained from the wGRN_v1 annotation. NCBI (https://www.ncbi.nlm.nih.gov/) non-redundant (NR) and UniProt (https://www.uniprot.org/) Swiss-prot protein database were used to perform the functional annotation. PlantTFdb (http://planttfdb.gao-lab.org/) and iTAK (http://bioinfo.bti.cornell.edu/tool/itak) were used to identify TFs. In addition, this panel provides quick links to the TGT database (Chen et al., 2020) and the wGRN JBrowse.

3. Expression levels of genes across diverse tissues. A total of 407 RNA-seq data were used to analyse the expression level of genes in different tissues. The bar show mean and SE of expression levels (i.e. TPMs). The expression levels were calculated from Kallisto.

4. The table shows TF-target interactions, alongside function description of genes. Available TF motifs from Arabidopsis were also shown. The detailed desctiption of supporting evidence can be found in the ‘The integrative inference of wheat regulatory network’ section. A higher score represents higher confidence for the interaction.

5. GRN visualization. Colors represent different gene types (TF family or target).

6. Statistics of interactions supported by diverse interaction evidences, and degrees of nodes in GRN. Degrees reflect the importance of nodes (genes) within network. Network metrics were calculated using igraph standard functions.

7. Genome browser JBrowse. This panel shown five tracks, inlcuding the annotated gene model from wGRN_v1, . Click “Full-screen view” for more.

In addition, in each interaction table in wGRN, you can view more details of the selected regulation by clicking on each regulation. If you select a regulation (i.e. a row), a link will appear at the bottom. Then, you can click on the ‘View more details of the regulation’ button to view the comprehensive information of the regulation, including gene ID, location, annotation, family, links to other pages, supporting evidence, and binding sites. The binding site table contains the predicted sequences of the TF motif that occurred near the transcription start site of a target gene in a 2.5 kb upstream or downstream interval.


3. Regulator prediction

Using this tool, You can identify key regulators that function in regulating the expression of the input gene set. All interactions in wGRN are used to perform regulator enrichment analysis. For example, taking a set of differentially expressed genes in response to heat stress as input, HSF family could be significantly enriched.

1. Genes of interest. For example, the genes could be differentially expressed genes or the manually collected genes that are functionally related. This tool supports two ways to input genes.

2. Optional parameters. By default, this tool uses all annotated genes, and considers the regulator with a Padjust value of less than 0.01 and a odds ratio of more than 2 as the significant enrichment.

3. Regulator prediction result. Each row represent a predicted regulator. The regulators were sorted according to P value. The R package clusterProflier were used. The table can be downloaded by clicking the download button.

4. Visualization of the regulator enrichment. X axis represent -log10(FDR), and Y axis represents enrichment fold (odds ratio). Colors represents TF families.


4. Function inference

Using this tool, you can query the function prediction of each TF based on gene ontology (GO), plant trait ontology (TO), and plant ontology (PO) annotation of its targets. PO and TO annotations were acquired from the Planteome (https://planteome.org/), plantGSAD (http://systemsbiology.cau.edu.cn/PlantGSEAv2/), and Oryzabase (https://shigen.nig.ac.jp/rice/oryzabase/) database.

1. In this parameter panel, you can input a gene ID. Three types of function prediction, including GO, PO, and TO, were supported here.

2. This panel shows the annotated GO, TO, and PO term of the interested genes.

3. This panel shows the function prediction of the interested gene. Taking GO prediction as the example, lollipop plot shows enriched GO biological precesses. Color represents total number of genes. Only terms with FDR of less than 0.05 were considered as significant enrichment. Only the top 20 most significant terms were used to plot. In addition, the table shows all predicted terms. The user can download all predicted terms by clicking the download button.


5. Pathway network

Using this tool, you can query the regulatory network representing the interested GO, TO, or PO pathway. The pathway network representing “response to heat” GO term was shown here. The pathway networks were constructed using the annotated genes, the function prediction of TFs, and interactions in wGRN.

1. In this parameter panel, you can select a interested pathway from GO, TO, and PO prediction. 3,169 GO networks, 508 TO networks, and 214 PO networks are available here. If select the option ‘Only TF’, only the predicted TFs will be shown.

2. The TF list with function prediction in the pathway. Only the term with a P value of less than 0.05 was shown.

3. Visualization of the pathway network. Colors represent different TF families and targets.

4. The table shows TF-target interaction relationships, alongside information about function description of genes, TF motif, and supporting evidence.


6. Coexpression

Using this tool, you can perform coexpression analysis for the interested genes. Pearson correlation coefficient (PCC) will be calculated using the cor function in R. TPM values from 407 RNA-seq data will be used.

1. In the input panel, you can input a gene list and set a PCC cutoff to remove lowly correlated gene pairs. If you want to analyse unmapped transcripts from the wGRN_v1 annotation, you should select the ‘include transcript not in the genome’ option. The TPMs will be used.

2. The heatmap shows Pearson correlation coefficient among gene pairs.

4. The Table shows coexpressed gene pairs with PCCs of more than cutoff. The coexpressed gene pairs were sorted according to PCC values.

5. The scatter plot shows the correlation of gene expression between pair-wise genes.


7. Homoeolog traid

Using this tool, You can query expression and regulation bias among wheat homoeologs.

1. In this parameter panel, you can input gene ID, and selected to analyse upstream TFs or downstream targets of the interest gene.

2. Gene IDs in homoeolog group. Homoeolog triad list were inferred using RBH table from GeneTribe (Chen et al., 2020).

3. Regulation bias among homoeologs. The plot shows relative total number of interactions of the interested gene. Only the interactions supported by at least two homoeologs will be used. The visualization method was referred to (Ramírez-González et al., 2018)

4. Expression bias among homoeologs. A total of 407 RNA-seq data were used. All expression data were presented as mean. The relative expression among homoeologs were shown.

5. Pairwise comparsion for interactions of homoeologs. The higher the Jaccard similarity, the more similar the two sets of regulation. P value to decide the enrichment significance is calculated through hypergeometric distribution.

6. Counts of homoeologs that interactes with the target.


8. QTL miner

Using this tool, you can prioritize candidate genes for gene mapping studies including GWAS, map-based cloning, and BSA. Taking QTLs from drought stress GWAS study (Mao et al., 2022) and a set of known genes responsing to drought stress as input, the tools accurately prioritized candidates.

1. In the input panel, you can input QTL regions. If select to input QTL regions from the IWGSC RefSeq v1.1 assembly, all gene models in the IWGSC RefSeq v1.1 annotation will be transferred to gene models in the wGRN_v1 annotation by homology search and then used for subsequent analysis. In addition, the QTL-related genes should be genes functionally related to the genes within QTL regions. In this example, the inputted QTL-related genes are drought responsive genes. GeneTribe was used to perform homology inference.

2. Scores, node degrees, QTG location, and functional annotation of predicted QTG (quantitative trait gene). The node degree reflect the importance of genes from QTL regions. The score were calculated as (degree-min_degree)/(max_degree-min_degree).

3. Chromosome distribution and priority score of QTGs. Green, A subgenome. Purple, B subgenome. Orange, D subgenome.

4. The table shows interaction between QTGs and QTL-related genes.


9. Gene browser

Using this tool, You can quickly search annotation of the interested genes. GeneTribe was used to perform homology inference.

1. Search gene information by keyword. The novel transcripts from unmapped sequences can also be supported here.

2. Search gene information by genome location.

3. Gene information table containing gene ID, location, NCBI NR description, Swiss-prot description, TF family, rice homolog, rice description, Arabidopsis homolog, Arabidopsis description.


10. ID conversion

Using this tool, you can submit one or more gene IDs to obtain its corresponding IDs in different annotation versions of wheat cv. Chinese Spring. This tool supports the gene models from wGRN_v1, IWGSC RefSeqv2.1, IWGSC RefSeq v1.1, IWGSC RefSeq v1.0, TGAC v1, and IWGSC v0p4. GeneTribe was used to perform homology inference.

1. Input gene IDs.

2. gene IDs in different annotation versions of wheat cv. Chinese Spring.


11. GRN extraction

Using this tool, you can obtain regulatory networks from the input gene.

1. In the input panel, you can input gene list of interest. In addition, different parameter α were used to obtained optimal network, relaxed network (more interactions), and strict network (less interactions).

2. GRN plot. Colors represent different gene families (TF famililes or target).

3. The table shows TF-target interaction relationships, alongside information about function description of genes, TF motif, and supporting evidence.


12. GRN comparsion

Using this tool, you can compare interactions and GRNs of two genes.

1. In this parameter panel, you can select the interaction type and input gene IDs. Two analysis methods, including upstream regulators and downstream targets, are supported here.

2. Statistics of shared interactions, P value to decide the enrichment significance is calculated through hypergeometric distribution. Share score is defined as: ( ( # shared genes ) / # targets_1 + ( # shared genes ) / # targets_2 ) / 2.

3. GRN plot showing shared edges between GRNs from two genes. Colors represent different gene types (TF famililes or target).

4. Visualization of the respective GRN. Colors represent different gene famililes (TF famililes or target).

5. The table contains TF-target interaction relationships, alongside information about function description of genes, TF motif, and supporting evidence.


13. eQTL

Using this tool, you can search for eQTL (expression quantitative trait loci) from 2-week-old seedlings and double-ridge spikes (He et al., 2022; Wang et al., 2022).

1. Search eQTL by gene ID or location. You can download the table by clicking the download button.

2. Full table of eQTL information, including gene, eQTL location, significance, eQTL type, and tissue.


14. miRNA-target interaction

Using this tool, you can search for miRNA-target interactions. The psRNATarget (https://www.zhaolab.org/psRNATarget/) and psRobot (http://omicslab.genetics.ac.cn/psRobot/) were used to predict interactions relationships. All gene models in the wGRN_v1 annotation and wheat miRNAs from sRNAanno database (http://www.plantsrna.org/) were used as input.

1. Search miRNA-target interactions by gene ID or miRNA ID.

2. Full table of miRNA-target interaction information, including miRNA, target, alignment, and supporting evidence from two tools.


15. TF enrichment

Using this tool, you can perform TF enrichment ananlysis. For example, taking the DEGs responsing to heat stress as the input, HSF family have most significantly enrichment.

1. Genes of interest.

2. Optional parameters including custom background genes, cutoff of significant value and odds ratio, and multi-test method.

3. TF enrichment analysis result. Each row represent a enriched TF. P value to decide the enrichment significance is calculated through hypergeometric distribution.

4. Visualization of TF enrichment. Y axis represent -log10(FDR). Colors represents gene numbers.


16. BLAST

Using this tool, you can search homologous sequences in the wGRN_v1 annotation. The blast database is constructed from mRNA, CDS, and protein sequences of the wGRN annotation. In addition, novel transcripts from unmapped sequences were also supported. The sequenceserver tool (https://sequenceserver.com/) was used.

1. In this panel, you can input sequences. The sequences can be of nucleotides or proteins.

2. Select one or more database to perform BLAST analysis. The tool will automatically judge a suitable tool according to the input sequence and the selected databases.

3. BLAST parameters. You can click the questionmark icon for the detailed description.


17. Expression

Using this tool, you can submit one or more gene IDs to query expression levels of genes in diverse tissues and developmental stages.

1. In this input panel, you can input gene IDs. Two normalization options are supported here. If you use gene models from unmapped transcripts in the wGRN_v1 annotation, you should select the ‘include transcript not in the genome’ option.

2. Information and references of RNA-seq samples. Three time-series transcriptome from developing grain, developing spike, and leaf senescence were used.

4. Visualization of expression patterns of genes across samples. A total of 407 RNA-seq data were used to show expression levels of genes across five tissues.


18. JBrowse

Using this tool, you can browse multi-omics data track of interested gene. Notably, we also implemented the wGRN_v1 annotation for users to screen interested gene models. In this example, wGRN_v1 corrected CDS and added UTRs of TraesCS1A03G0652000. The JBrowse tool (https://jbrowse.org/jbrowse1.html) was used here.

1. In the left panel, you can select tracks of interest, including chromatin accessibility (ATAC-seq/DNase-seq), histone modification (H3K27me3, H3K4me3, and H3K9ac ChIP-seq), and expression levels (RNA-seq). STAR and BWA were used to align sequencing data to the genome.

2. Right panel shows selected tracks.