Namespace: kundajelab

PI: Anshul Kundaje
Institution: Stanford University
Project description:

Deep learning models in genomics:
We explore new deep learning architectures to improve the classification accuracy of models in genomics. More generally, the work combines deep learning with interpretation techniques to extract novel insights about regulatory genomics.
Decoding regulatory DNA sequence in keratinocyte differentiation:
Development and differentiation are biological processes in which cascades of transcription factors interact with dynamic chromatin landscapes to produce cell-type-specific transcriptional programs. Epidermal differentiation, in which a self-renewing progenitor keratinocyte becomes a terminally differentiated keratinocyte, is well suited to studying fine-grained changes in chromatin and transcription and to addressing fundamental questions about the dynamic combinatorial logic of regulation. To answer these questions, we profiled transcriptional state (using 3’ RNA-seq) and chromatin state (using ATAC-seq and ChIP-seq on histone marks) at 12-hour intervals across 6 days of in vitro differentiation of primary keratinocytes. We inferred transcriptional and epigenetic trajectories across time to elucidate dynamically coordinated modules of genes and regulatory elements. We then developed deep, multi-task convolutional neural networks to learn the DNA sequence drivers that predict chromatin dynamics. To discover motifs and coordinated motif sets (grammars) from the neural networks, we used backpropagation-based methods to derive nucleotide-level importance scores in regulatory elements across time, and then used these scores to extract grammars that are predictive of accessibility. We used these grammars in conjunction with expression and chromosome conformation assays to annotate functional modules that define known and novel differentiation programs. The resulting framework provides a generalizable approach to dissecting dynamic maps of combinatorial regulation encoded in DNA sequence.
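
As a rough illustration of what a backpropagation-derived, nucleotide-level importance score looks like, the sketch below computes "gradient x input" scores for a toy, hand-written model: one convolutional filter, a ReLU, and a sum. This is not the project's model or its attribution method, which the description does not specify. Gradient x input is swapped in here as one simple member of that family, and the names one_hot, forward, input_gradient, importance_scores, and MOTIF_FILTER are all hypothetical. Pure Python is used so that the backward pass is visible. A real model would be built in Keras/TensorFlow and differentiated automatically.

```python
# Toy illustration only: a single hand-written filter, not a trained network.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def forward(x, filt):
    """Slide one filter over x, apply ReLU, and sum the activations.

    Returns the scalar output and the list of pre-ReLU activations.
    """
    width = len(filt)
    pre = []
    for start in range(len(x) - width + 1):
        s = sum(x[start + k][c] * filt[k][c]
                for k in range(width) for c in range(4))
        pre.append(s)
    output = sum(max(0.0, s) for s in pre)
    return output, pre


def input_gradient(x, filt):
    """Backpropagate d(output)/d(x) through sum -> ReLU -> convolution."""
    width = len(filt)
    _, pre = forward(x, filt)
    grad = [[0.0] * 4 for _ in x]
    for start, s in enumerate(pre):
        if s > 0:  # ReLU passes gradient only where the pre-activation is positive
            for k in range(width):
                for c in range(4):
                    grad[start + k][c] += filt[k][c]
    return grad


def importance_scores(seq, filt):
    """Gradient x input, reduced to one score per nucleotide position."""
    x = one_hot(seq)
    grad = input_gradient(x, filt)
    return [sum(g * v for g, v in zip(grow, xrow))
            for grow, xrow in zip(grad, x)]


# Hypothetical filter that rewards the 3-mer "TGA" and penalizes other bases.
MOTIF_FILTER = [
    [-1.0, -1.0, -1.0, 1.0],   # position 0 prefers T
    [-1.0, -1.0, 1.0, -1.0],   # position 1 prefers G
    [1.0, -1.0, -1.0, -1.0],   # position 2 prefers A
]

scores = importance_scores("CCTGACC", MOTIF_FILTER)
print(scores)
```

In this toy case only the three positions covered by the motif receive non-zero scores. Contiguous runs of high-scoring positions like this are the kind of signal from which motifs, and combinations of motifs, could then be extracted.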

Software: Conda, Keras, TensorFlow, Scikit-learn, SciPy
