Statistics and Machine Learning

R for Big Data

Bioconductor | R package development | Tidyverse


July 2016 – Present
Connecticut, USA

Full Computational Scientist

The Jackson Laboratory

Responsibilities include:

  • Leading the development of a computational pipeline to analyze hybrid-capture long-read sequencing data.
  • Analyzing genomic datasets such as RNA-seq, ATAC-seq and PacBio long-read SMRT-seq.
  • Applying machine learning to identify RNA-based neoantigens for cancer immunotherapy.
  • Mentoring lab members, including students and postdocs.
  • Teaching R, Bioconductor and genomics to scientists at JAX.

October 2012 – June 2016
Montreal, Canada

Bioinformatics Postdoctoral Fellow

Institute for Immunology and Cancer Research, University of Montreal


  • Worked on the genomics of T-cell acute lymphoblastic leukemia (T-ALL), a prevalent childhood blood cancer.
  • Analyzed genomic data such as RNA-seq, exome-seq and ChIP-seq.
  • Analyzed mass spectrometry (proteomics) data.
  • Developed a methodology to identify active regulators from RNA-seq and ChIP-seq datasets, which led to the discovery of a network of genes that initiate T-cell leukemia.
  • Developed a Shiny app to study pathway regulation in ChIP-seq data.

August 2007 – July 2012
Houston, USA

Fulbright Doctoral Fellow

University of Texas MD Anderson Cancer Center


  • Developed methods to characterize the transcriptional and non-coding RNA networks of Mycobacterium tuberculosis (Mtb) during macrophage infection.
  • Participated in the TB PANNET Consortium (EU/USA), an international effort that comprehensively identified non-coding RNAs in Mtb using a combination of custom microarrays and RNA-seq.
  • Developed methods to identify targets and pathways regulated by cis-encoded and trans-encoded non-coding RNAs.

March 2006 – May 2007
Rio de Janeiro, Brazil


National Laboratory for Scientific Computing


  • Reconstruction of transcriptional networks in Escherichia coli.
  • Developed a neural network to predict motif structures in transcriptional networks.
  • Analysis of microarrays for gene expression.

March 2004 – February 2006
Recife, Brazil

Master's Research

Center for Informatics, Federal University of Pernambuco


  • Application of Bayesian networks and partial correlation analysis to the reconstruction of gene regulatory networks (protein-DNA interactions) in yeast.
  • Analysis of microarrays for gene expression.

June 2002 – February 2004
Florianopolis, Brazil

Undergraduate Research Fellow

Genomic Engineering Group, Federal University of Santa Catarina


  • Design and implementation of computational tools for modeling of metabolic and regulatory pathways using RDF graphs.
  • Design and implementation of object-oriented XML databases for genomic data.

August 2000 – December 2000
Florianopolis, Brazil

Teaching Assistant

Mathematics Department, Federal University of Santa Catarina


  • Linear Algebra and Analytical Geometry.

Selected Publications

The Journal of Clinical Investigation, 2018

The Journal of Clinical Investigation, 2016

PLoS Pathogens, 2012

Recent Publications


Ribonuclease inhibitor 1 regulates erythropoiesis by controlling GATA1 translation. The Journal of Clinical Investigation, 2018.

F1000 Prime recommended

High-throughput screening in niche-based assay identifies compounds to target preleukemic stem cells. The Journal of Clinical Investigation, 2016.

EurekAlert! | eCancer News

The LMO2 oncogene regulates DNA replication in hematopoietic cells. PNAS, 2016.


SCL, LMO1 and Notch1 reprogram thymocytes into self-renewing cells. PLoS Genetics, 2014.


An Unexpected Role for Ribonuclease Inhibitor (RNH1) in Erythropoiesis. Blood, 2014.



Here are the students I’ve mentored recently at JAX.


  • Qianchang Wang, Academic Year Intern at the Jackson Laboratory. Project: A novel method for clustering and quantification of cancer isoforms detected using hybrid-capture long-read sequencing.


  • Mary Accurso, Co-Op Associate/Summer Intern at the Jackson Laboratory.
    Project: Prediction of soluble and membrane isoforms in the immune transcriptome sequenced using PacBio long-read technology.


  • maser R/Bioconductor package.
    This package provides functionality for the analysis, annotation and visualization of alternative splicing events.




  • 860 837 2103
  • diogo.veiga
  • 10 Discovery Drive, The Jackson Laboratory for Genomic Medicine, Connecticut, 06032, USA