CanProVar

Description: CanProVar is designed to store and display single amino acid alterations, including both germline and somatic variations, in the human proteome, especially those related to the genesis or development of human cancer according to the published literature. Cancer-related variations and corresponding annotations can be queried through the web interface using protein IDs from the Ensembl, IPI, RefSeq, and UniProt/Swiss-Prot databases, or using gene names and Entrez Gene IDs. FASTA files with variation information are also available for download.

URLs:  http://canprovar.zhang-lab.org/ (version 1); http://canprovar2.zhang-lab.org/ (version 2)

Reference: Jing Li, Dexter T Duncan, Bing Zhang. CanProVar: a human cancer proteome variation database. Hum Mutat. 31(3):219-28, 2010.

CEA

Description: CEA (Complex-Enrichment Analysis) uses a protein interaction network-assisted approach to improve protein identification in shotgun proteomics. A large proportion of possible proteins are eliminated as a result of insufficient experimental evidence in shotgun proteomics data analysis. CEA can be used to rescue the eliminated proteins based on a simple assumption: possible proteins are more likely to be present in the original sample if they exist in a complex enriched with confidently identified proteins. In various data sets tested, CEA increased protein identification by 10-30 percent with an estimated accuracy of 85 percent.

URL: http://bioinfo.vanderbilt.edu/cea

Reference: Jing Li, Lisa J Zimmerman, Byung-Hoon Park, David L Tabb, Daniel C Liebler, Bing Zhang. Network-assisted protein identification and data interpretation in shotgun proteomics. Mol Syst Biol. 5:303, 2009.
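
The rescue idea behind CEA can be illustrated with a small Python sketch. It is not the published implementation: the hypergeometric test, the significance cutoff `alpha`, and all protein and complex names are assumptions made for illustration only.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n proteins are drawn from N, of which K are confident."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def rescue_candidates(complexes, confident, candidates, universe_size, alpha=0.05):
    """Return candidate proteins that sit in a complex enriched with confident ones.

    complexes: dict mapping a complex name to a set of member proteins.
    alpha is an assumed cutoff, not a value taken from the CEA paper.
    """
    rescued = set()
    for members in complexes.values():
        overlap = len(members & confident)
        p = hypergeom_sf(overlap, universe_size, len(confident), len(members))
        if p < alpha:
            rescued |= members & candidates
    return rescued

# Hypothetical data: 10 confident proteins in a universe of 100.
confident = {f"P{i}" for i in range(1, 11)}
candidates = {"X1", "X2"}
complexes = {
    "complex_A": {"P1", "P2", "P3", "X1"},        # 3 of 4 members are confident
    "complex_B": {"P4", "X2", "Q1", "Q2", "Q3"},  # 1 of 5 members is confident
}
rescued = rescue_candidates(complexes, confident, candidates, universe_size=100)
```

In this toy example only `X1` is rescued, because only `complex_A` is significantly enriched with confidently identified proteins.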

customProDB

Description: customProDB is an R package that enables the easy generation of sample-specific protein databases from RNA-Seq data for proteomics search.

URL: http://www.bioconductor.org/packages/devel/bioc/html/customProDB.html

Reference: Xiaojing Wang, Bing Zhang. customProDB: an R package to generate customized protein databases from RNA-Seq data for proteomics search. Bioinformatics. 29:3235-7, 2013.

GLAD4U

Description: Gene List Automatically Derived For You (GLAD4U) searches the scientific literature (PubMed) to retrieve the list of publications matching a user’s query. The algorithm then translates the list of publications into a list of genes referenced in these publications. Finally, it presents the user with a prioritized gene list, ordered from the most to the least referenced genes within the search space.

URL: http://glad4u.zhang-lab.org

Reference: Jerome Jourquin, Dexter Duncan, Zhiao Shi, Bing Zhang. GLAD4U: deriving and prioritizing gene lists from PubMed literature. BMC Genomics. 13(Suppl 8):S20, 2012.
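
The publication-to-gene translation and ranking steps of GLAD4U can be sketched in Python as follows. This is a simplified illustration that ranks by raw publication counts; the scoring used by the actual tool may differ, and the publication IDs and the `pub_to_genes` mapping are hypothetical.

```python
from collections import Counter

def prioritize_genes(publication_ids, pub_to_genes):
    """Rank genes by the number of retrieved publications that reference them.

    publication_ids: IDs returned by a literature query (assumed given).
    pub_to_genes: dict mapping a publication ID to the genes it references.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for pub_id in publication_ids:
        # Count each gene at most once per publication.
        counts.update(set(pub_to_genes.get(pub_id, ())))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical query result and mapping.
pub_to_genes = {
    1: ["TP53", "BRCA1"],
    2: ["TP53"],
    3: ["EGFR", "TP53"],
    4: ["KRAS"],  # not retrieved by the query, so it is ignored
}
ranking = prioritize_genes([1, 2, 3], pub_to_genes)
```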

GPU-FAN

Description: Network analysis plays an important role in systems biology. However, network analysis algorithms are usually computationally intensive. General-purpose computation on graphics processing units (GPGPU) provides a cost-effective platform for this type of application. We have initiated a project to enable fast network analysis on GPUs. The first version of the software package gpu-fan (GPU-based Fast Analysis of Networks) includes methods for computing four shortest path-based centrality metrics on NVIDIA’s CUDA platform. Speedups of 10x to 50x were observed for simulated scale-free networks and for real-world protein interaction and gene co-expression networks.

URL: http://gpu-fan.zhang-lab.org

Reference: Zhiao Shi, Bing Zhang. Fast network centrality analysis using GPUs. BMC Bioinformatics. 12:149, 2011.
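
As an illustration of what a shortest path-based centrality metric computes, here is a plain CPU sketch in Python of closeness centrality on an unweighted network. It is not the gpu-fan CUDA code, and closeness is chosen only as a familiar example; the text above does not say which four metrics the package implements.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(adj):
    """Closeness = reachable nodes / sum of shortest-path distances to them."""
    result = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        reachable = len(dist) - 1
        result[node] = reachable / total if total > 0 else 0.0
    return result

# Hypothetical three-node path network: a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
closeness = closeness_centrality(adj)
```

Each node needs its own breadth-first search, so the work grows with the number of nodes times the number of edges; the searches are independent of one another, which is what makes this kind of computation a natural fit for parallel hardware.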

GOTM (merged into WebGestalt in 2010)

Description: GOTM is a Gene Ontology (GO) enrichment analysis tool. It compares a user-uploaded gene list with all GO categories to identify those that contain significantly more of the uploaded genes than expected. The result is visualized as a directed acyclic graph (DAG) in order to preserve the relationships among the enriched GO categories. It is designed for the quick analysis of gene lists generated from microarray, proteomics, and other large-scale studies.

URL: http://www.webgestalt.org

Reference: Bing Zhang, Denise Schmoyer, Stefan Kirov, Jay Snoddy. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies. BMC Bioinformatics. 5:16, 2004.
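
One way to picture the DAG view is the Python sketch below, which collects the enriched categories together with their ancestors so that the relationships among them stay connected. Keeping all ancestors is an assumption about the display, not a documented detail of GOTM, and the term names are made up.

```python
def enriched_subdag(parents, enriched):
    """Return (nodes, edges) of the sub-DAG spanned by enriched terms and ancestors.

    parents: dict mapping a term to its direct parent terms.
    enriched: iterable of terms flagged as enriched by some upstream test.
    Edges are (child, parent) pairs.
    """
    nodes = set(enriched)
    edges = set()
    stack = list(nodes)
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            edges.add((term, parent))
            if parent not in nodes:
                nodes.add(parent)
                stack.append(parent)
    return nodes, edges

# Hypothetical ontology fragment: root <- term_b <- term_c, root <- term_d <- term_e.
parents = {
    "term_c": ["term_b"],
    "term_b": ["root"],
    "term_d": ["root"],
    "term_e": ["term_d"],
}
nodes, edges = enriched_subdag(parents, {"term_c"})
```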

ICE

Description: The Iterative Clique Enumeration (ICE) algorithm identifies relatively independent co-expression modules from gene co-expression networks in order to facilitate further analyses of the transcriptional mechanisms encoded in the networks.

URL: http://ice.zhang-lab.org

Reference: Zhiao Shi, Catherine K Derow, Bing Zhang. Co-expression module analysis reveals biological processes, genomic gain, and regulatory mechanisms associated with breast cancer progression. BMC Systems Biology. 4:74, 2010.

NetGestalt

Description: NetGestalt is a data integration framework that allows simultaneous presentation of large-scale experimental and annotation data from various sources in the context of a biological network to facilitate data visualization, analysis, interpretation, and hypothesis generation.

URL: http://www.netgestalt.org 

Reference: Zhiao Shi, Jing Wang, Bing Zhang. NetGestalt: integrating multidimensional omics data over biological networks. Nature Methods. 10:597-8, 2013.

NetSAM

Description: NetSAM (Network Seriation and Modularization) is an R package that takes an edge-list representation of a network as input and generates files that can be used as input for the one-dimensional network visualization tool NetGestalt or for other network analyses. NetSAM uses random walk distance-based hierarchical clustering to identify the hierarchical modules of the network (network modularization) and then uses the optimal leaf ordering (OLO) method to optimize the one-dimensional ordering of the genes in each module by minimizing the sum of the pairwise random walk distances between adjacent genes in the ordering (network seriation).

URL: http://www.bioconductor.org/packages/release/bioc/html/NetSAM.html

Reference: Zhiao Shi, Jing Wang, Bing Zhang. NetGestalt: integrating multidimensional omics data over biological networks. Nature Methods. 10:597-8, 2013.
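
The quantity minimized in the seriation step of NetSAM can be written as a short Python sketch. It only evaluates the objective for a given ordering; it does not implement OLO or the random walk distance itself, and the gene names and distance values are hypothetical.

```python
def adjacent_distance_sum(order, dist):
    """Sum of pairwise distances between neighbours in a one-dimensional ordering.

    order: sequence of gene names.
    dist: dict keyed by frozenset({gene_a, gene_b}) holding a symmetric distance.
    """
    return sum(dist[frozenset((a, b))] for a, b in zip(order, order[1:]))

# Hypothetical distances among three genes of one module.
dist = {
    frozenset(("g1", "g2")): 1.0,
    frozenset(("g2", "g3")): 1.5,
    frozenset(("g1", "g3")): 4.0,
}
cost_good = adjacent_distance_sum(["g1", "g2", "g3"], dist)  # g1-g2, then g2-g3
cost_bad = adjacent_distance_sum(["g2", "g1", "g3"], dist)   # g2-g1, then g1-g3
```

A lower value means that genes placed next to each other are closer in the network, so the first ordering would be preferred over the second.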

NetWalker

Description: NetWalker takes a network and a list of nodes from the network as input and calculates steady-state probabilities (final scores) for all nodes in the network using a random walk technique. Statistical analysis is implemented to evaluate the significance of the final scores. Specifically, for each node, a global p value is calculated to evaluate the overall significance of the node with regard to the input nodes, while a local p value is calculated to ensure that the significance is not simply due to network topology.

URL: http://netwalker.zhang-lab.org

Reference: Bing Zhang, Zhiao Shi, Dexter T. Duncan, Naresh Prodduturi, Lawrence J Marnett, Daniel C Liebler. Relating protein adduction to gene expression changes: a systems approach. Molecular Biosystems. 2011.
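
The steady-state scoring in NetWalker can be sketched in Python with a random walk with restart computed by power iteration. This is an assumed formulation for illustration: the restart probability, the convergence tolerance, and the toy network are not taken from NetWalker, and the global and local p value calculations are not included.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Approximate steady-state visiting probabilities for every node.

    adj: dict mapping each node to a list of neighbours (undirected network).
    seeds: set of input nodes where the walk starts and restarts.
    Nodes without neighbours pass nothing on, so the scores sum to 1 only
    when every node that receives probability has at least one neighbour.
    """
    nodes = sorted(adj)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            if not adj[u]:
                continue
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical path network a - b - c with a single input node "a".
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = random_walk_with_restart(adj, {"a"})
```

In this toy network the scores fall off with distance from the input node, which is the behaviour the final scores are meant to capture.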

WebGestalt

Description: WebGestalt is a “WEB-based GEne SeT AnaLysis Toolkit”. It is designed for functional genomic, proteomic, and large-scale genetic studies from which large numbers of gene lists (e.g., differentially expressed gene sets, co-expressed gene sets, etc.) are continuously generated. WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense of gene lists.

URL: http://www.webgestalt.org

References:
Bing Zhang, Stefan A. Kirov, Jay R. Snoddy. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 33(Web Server issue), W741-8, 2005.

Jing Wang, Dexter Duncan, Zhiao Shi, Bing Zhang. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res. 41(Web Server issue), W77-83, 2013.