Accelerating Data Driven Discovery

The Multi-Omics Data Analysis group assists with analysis of data generated by the major technology platforms of the CPRIT Core Facility, including mass spectrometry (MS)-based metabolomics, MS-based proteomics, and reverse phase protein array (RPPA) proteomics. In addition, we accept and process high-throughput sequencing data, including:

  • Coding and noncoding transcriptomics: RNA-Seq, smallRNA-Seq
  • Single-cell transcriptomics: scRNA-Seq
  • Cistrome and DNA methylome: ChIP-Seq, Reduced Representation Bisulfite Sequencing (RRBS), and Whole Genome Bisulfite Sequencing (WGBS)
  • Genomics: Whole Genome Sequencing (WGS) or Whole Exome Sequencing (WES)

Using a consistent data workflow, we assist with Primary/Tier 1 data analysis in close collaboration with the Core Facility leadership. Additionally, as an independent support service, we provide Integrative/Tier 2 analysis of multi-omics data to yield systems-biology-level insight and to generate robust, testable hypotheses. Integrative analysis can further draw on data sets from national and international projects, such as The Cancer Genome Atlas (TCGA), ENCODE, the NIH Roadmap Epigenomics Program, the International Human Epigenome Consortium (IHEC), and the Metabolomics Workbench, as well as from community repositories such as the NIH Gene Expression Omnibus (GEO) and the NIH Sequence Read Archive (SRA).

Major Services

1. Consultation

The Multi-Omics Data Analysis Core provides consultation on multiple topics before and after analysis:

1)  Consultation on experimental design

2)  Consultation on integration of CPRIT and other core facilities data

3)  Consultation on integration of publicly available data

4)  Review of results with the principal investigator after completion of analysis

2. Primary analysis

The Multi-Omics Data Analysis group performs primary analysis for the following data types generated in the CPRIT-funded cores:

  • MS Metabolomics and Lipidomics
  • MS Proteomics
  • RPPA Proteomics

In addition, it performs primary analysis for other data types generated in Baylor core labs:

  • Coding and noncoding transcriptomics: RNA-Seq, smallRNA-Seq
  • Single-cell transcriptomics: scRNA-Seq
  • Cistrome and DNA methylome: ChIP-Seq, Reduced Representation Bisulfite Sequencing (RRBS), and Whole Genome Bisulfite Sequencing (WGBS)
  • Genomics: Whole Genome Sequencing (WGS) or Whole Exome Sequencing (WES)

3. Integrative Data Analysis

Integrative analysis typically involves pathway enrichment, computed using methods specific to each omics platform. Further, integration of different omics technologies, such as metabolomics and transcriptomics, or cistromics and transcriptomics, can be carried out. In addition to investigator data generated at the Baylor core facilities, we can assist with analysis and integration using publicly available datasets.
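Pathway enrichment can be computed in several ways. One common approach for a list of significant features (genes, proteins, or metabolites) is over-representation analysis: a one-sided hypergeometric test per pathway, followed by Benjamini-Hochberg correction across pathways. The sketch below illustrates that generic technique only, assuming a simple set-based feature list and hypothetical pathway definitions; it is not necessarily the method the core applies to any particular platform, and all function and pathway names are hypothetical.

```python
"""Minimal over-representation analysis (ORA) sketch.

Assumption: a generic hypergeometric ORA for illustration, not the core's
actual per-platform pipeline. Function and pathway names are hypothetical.
"""
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_hits, n_overlap):
    """One-sided P(X >= n_overlap) for X ~ Hypergeometric.

    n_universe: total features measured (e.g., all quantified genes)
    n_pathway:  features in the universe annotated to the pathway
    n_hits:     significant features (e.g., differentially expressed)
    n_overlap:  significant features that fall in the pathway
    """
    total = comb(n_universe, n_hits)
    upper = min(n_pathway, n_hits)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(universe, hits, pathways):
    """Test each pathway (name -> iterable of features) for over-representation.

    Returns {pathway_name: (raw_p, bh_adjusted_p)}.
    """
    universe = set(universe)
    hits = set(hits) & universe
    names, pvals = [], []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(members & hits)
        names.append(name)
        pvals.append(
            hypergeom_enrichment_p(len(universe), len(members), len(hits), overlap)
        )
    return dict(zip(names, zip(pvals, benjamini_hochberg(pvals))))


if __name__ == "__main__":
    # Toy example with made-up feature IDs and pathway sets.
    universe = [f"g{i}" for i in range(100)]
    hits = ["g0", "g1", "g2", "g3", "g4", "g50"]
    pathways = {
        "pathway_A": [f"g{i}" for i in range(10)],
        "pathway_B": ["g50", "g60", "g70", "g80", "g90"],
    }
    for name, (p, q) in enrich(universe, hits, pathways).items():
        print(f"{name}: p={p:.3g}, BH q={q:.3g}")
```

In the toy example, pathway_A contains five of the six hits and is strongly over-represented, while pathway_B contains only one. Platform-specific pipelines may instead use rank-based methods or topology-aware tests; the choice depends on the data type and study design.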

4. Data Deposition

Raw and processed data will be deposited in repositories such as GEO and the Metabolomics Workbench. The core will collect sample metadata from investigators, fill in XML-format metadata sheets, and deposit the data and metadata in public repositories. Our goal is to meet, and where possible exceed, state or federally mandated data deposition standards, both to better serve the scientific community and to enable rigorous, reproducible data analysis.

5. Facilities

The bioinformatics analysis will be performed using the infrastructure of the Dan L. Duncan Cancer Center Computing Facilities. In addition to up-to-date desktop computers for all faculty and staff, including both 32-bit and 64-bit personal computers for most bioinformatics-oriented members, we have two major computing facilities: one in the Breast SPORE facilities on the main BCM campus and one at the Energy Transfer Data Center approximately one mile away. Both sit inside the BCM firewall, have nightly offsite backups, are protected by non-aqueous fire suppression, have redundant power and, for high- and moderate-capability machines, are accessible via a 10 Gb Ethernet switched local area network (LAN). Having two physically separate facilities substantially improves resilience by allowing more rapid recovery from a disaster, such as a fire or flood, that incapacitates one facility.

We have a 35-node high-performance compute cluster (each with 2-8 cores; newer nodes with 96 or 128 GB RAM) comprising a total of 375 CPUs, with 34 TB of fast-access SAN storage for high-performance computing needs. We are in the process of expanding and substantially upgrading this capacity for 2013 with an extensible NetApp storage appliance. For archival storage, tens to hundreds of TB can be readily leased at very low cost from BCM Information Technology. Four cluster nodes are reserved for interactive jobs; the remaining nodes are available for batch jobs. Queues are managed by Sun Grid Engine, and the system is administered by an expert system architect with more than 10 years of HPC experience. Access to these resources is supported by partial chargeback commensurate with the level of use. Because the cluster is self-contained (i.e., housed in a single location without an identical sister cluster offsite), the compute nodes do not fully benefit from two-site disaster recovery, whereas the archival storage does.
Beyond the HPC cluster, there are three Sun Fire X4170 virtualization servers at SUDC and two Cisco UCS C210 M2 virtualization servers at the Breast facilities. Servers at both sites use VMware to create virtual servers that can run virtually any operating system and accommodate varying system requirements. Each location’s virtualization servers have 37 TB of attached NetApp storage, with vMotion in place to manage failover of virtual machines from one site to the other should a disaster occur. In addition, there are two HP servers with 96 TB of direct-attached storage for Oracle 11g (backed up offsite nightly) and four Sun physical servers with 37 TB of storage running the Solaris operating system, among other resources.

Charge Back Rates

Charge back rates for primary analysis of data generated by the CPRIT Core Facility are included as a package in the price of each technology platform. Charge back rates for secondary and integrative data analysis and other bioinformatics services are determined individually with investigators based on the number of samples and the type of analysis.

References (Core Supported Publications)


1. Suter MA, Aagaard KM, Coarfa C, Robertson M, Zhou G, Jackson BP, Thompson D, Putluri V, Putluri N, Hagan J, Wang L, Jiang W, Lingappan K, Moorthy B. Association between elevated placental polycyclic aromatic hydrocarbons (PAHs) and PAH-DNA adducts from Superfund sites in Harris County, and increased risk of preterm birth (PTB). Biochem Biophys Res Commun. 2019 Aug. PMID: 31208719

2. Bader DA, Hartig SM, Putluri V, Foley C, Hamilton MP, Smith EA, Saha PK, Panigrahi A, Walker C, Zong L, Martini-Stoica H, Chen R, Rajapakshe K, Coarfa C, Sreekumar A, Mitsiades N, Bankson JA, Ittmann MM, O'Malley BW, Putluri N, McGuire SE. Mitochondrial pyruvate import is a metabolic vulnerability in androgen receptor-driven prostate cancer. Nat Metab. 2019 Jan. PMID: 31198906

3. Vantaku V, Dong J, Ambati CR, Perera D, Donepudi SR, Amara CS, Putluri V, Ravi SS, Robertson MJ, Piyarathna DWB, Villanueva M, von Rundstedt FC, Karanam B, Ballester LY, Terris MK, Bollag RJ, Lerner SP, Apolo AB, Villanueva H, Lee M, Sikora AG, Lotan Y, Sreekumar A, Coarfa C, Putluri N. Multi-omics Integration Analysis Robustly Predicts High-Grade Patient Survival and Identifies CPT1B Effect on Fatty Acid Metabolism in Bladder Cancer. Clin Cancer Res. 2019 Jun. PMID: 30846479

4. Dasgupta S, Putluri N, Long W, Zhang B, Wang J, Kaushik AK, Arnold JM, Bhowmik SK, Stashi E, Brennan CA, Rajapakshe K, Coarfa C, Mitsiades N, Ittmann MM, Chinnaiyan AM, Sreekumar A, O'Malley BW. Coactivator SRC-2-dependent metabolic reprogramming mediates prostate cancer survival and metastasis. J Clin Invest. 2015;125(3):1174-1188. PMID: 25664849. PMCID: PMC4362260.

5. Kettner NM, Voicu H, Finegold MJ, Coarfa C, Sreekumar A, Putluri N, Katchy CA, Lee C, Moore DD, Fu L. Circadian Homeostasis of Liver Metabolism Suppresses Hepatocarcinogenesis. Cancer Cell. 2016 Dec 12;30(6):909-924. PMID: 27889186.

6. Rundstedt FV, Kimal R, Ma J, Arnold J, Gohlke J, Putluri V, Krishnapuram R, Piyarathna DB, Lotan Y, Gödde D, Roth S, Störkel S, Levitt JM, Michailidis G, Lerner SP, Sreekumar A, Coarfa C, Putluri N. Integrative pathway analysis of metabolic signature in bladder cancer - a linkage to the Cancer Genome Atlas Project and prediction of survival. J Urol. 2016 Jan 20. PMID: 26802582. PMCID: PMC4693629

7. Park JH, Vithayathil S, Kumar S, Sung PL, Dobrolecki LE, Putluri V, Bhat VB, Bhowmik SK, Gupta V, Arora K, Wu D, Tsouko E, Zhang Y, Maity S, Donti TR, Graham BH, Frigo DE, Coarfa C, Yotnda P, Putluri N, Sreekumar A, Lewis MT, Creighton CJ, Wong LJ, Kaipparettu BA. Fatty Acid Oxidation-Driven Src Links Mitochondrial Energy Reprogramming and Oncogenic Properties in Triple-Negative Breast Cancer. Cell Rep. 2016 Mar 8;14(9):2154-65. PMID: 26923594. PMCID: PMC4809061

RPPA Proteomics

1. Jayaraman P, Parikh F, Newton JM, Hanoteau A, Rivas C, Krupar R, Rajapakshe K, Pathak R, Kanthaswamy K, MacLaren C, Huang S, Coarfa C, Spanos C, Edwards DP, Parihar R, Sikora AG. TGF-β1 programmed myeloid-derived suppressor cells (MDSC) acquire immune-stimulating and tumor killing activity capable of rejecting established tumors in combination with radiotherapy. Oncoimmunology. 2018 Jul. PMID: 30288358

2. Ware MJ, Nguyen LP, Law JJ, Krzykawska-Serda M, Taylor KM, Cao HST, Anderson AO, Pulikkathara M, Newton JM, Ho JC, Hwang R, Rajapakshe K, Coarfa C, Huang S, Edwards D, Curley SA, Corr SJ. A new mild hyperthermia device to treat vascular involvement in cancer surgery. Sci Rep. 2017 Sep. PMID: 289001263.

3. Holdman XB, Welte T, Rajapakshe K, Pond A, Coarfa C, Mo Q, Huang S, Hilsenbeck SG, Edwards DP, Zhang X, Rosen JM. Upregulation of EGFR signaling is correlated with tumor stroma remodeling and tumor recurrence in FGFR1-driven breast cancer. Breast Cancer Res. 2015 Nov. PMID: 26581390

MS Proteomics

1. Fleet T, Zhang B, Lin F, Zhu B, Dasgupta S, Stashi E, Tackett B, Thevananther S, Rajapakshe KI, Gonzales N, Dean A, Mao J, Timchenko N, Malovannaya A, Qin J, Coarfa C, DeMayo F, Dacso CC, Foulds CE, O'Malley BW, York B. SRC-2 orchestrates polygenic inputs for fine-tuning glucose homeostasis. Proc Natl Acad Sci U S A. 2015. PMID: 26487680. PMCID: PMC4640775.


1. Fan Q, Mao H, Angelini A, Coarfa C, Robertson MJ, Lagor WR, Wehrens XHT, Martin JF, Pi X, Xie L. Depletion of Endothelial Prolyl Hydroxylase Domain Protein 2 and 3 Promotes Cardiomyocyte Proliferation and Prevents Ventricular Failure Induced by Myocardial Infarction. Circulation. 2019 Jul. PMID: 31356139

2. Hai L, Szwarc MM, Lonard DM, Rajapakshe K, Perera D, Coarfa C, Ittmann M, Fernandez-Valdivia R, Lydon JP. Short-term RANKL exposure initiates a neoplastic transcriptional program in the basal epithelium of the murine salivary gland. Cytokine. 2019 Nov. PMID: 31226438

3. Yuan X, Chang CY, You R, Shan M, Gu BH, Madison MC, Diehl G, Perusich S, Song LZ, Cornwell L, Rossen RD, Wetsel R, Kimal R, Coarfa C, Eltzschig HK, Corry DB, Kheradmand F. Cigarette smoke-induced reduction of C1q promotes emphysema. JCI Insight. 2019 May. PMID: 31112138

4. Taneja G, Maity S, Jiang W, Moorthy B, Coarfa C, Ghose R. Transcriptomic profiling identifies novel mechanisms of transcriptional regulation of the cytochrome P450 (Cyp)3a11 gene. Sci Rep. 2019 Apr. PMID: 31040347

5. Villanueva H, Grimm S, Dhamne S, Rajapakshe K, Visbal A, Davis CM, Ehli EA, Hartig SM, Coarfa C, Edwards DP. The Emerging Roles of Steroid Hormone Receptors in Ductal Carcinoma in Situ (DCIS) of the Breast. J Mammary Gland Biol Neoplasia. 2018 Dec. PMID: 30338425