<!DOCTYPE html>

<html lang="en">
    <head>

        <link rel="stylesheet" type="text/css" href="templates/slate.css">
        <meta http-equiv="CONTENT-TYPE" content="text/html; charset=UTF-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <meta name="viewport" content="initial-scale=1, width=device-width">
        <meta content="4DNucleome Hackathon" name="description">
        <meta content="Projects" name="author">

        <!-- Global site tag (gtag.js) - Google Analytics -->
        <script async src="https://www.googletagmanager.com/gtag/js?id=UA-169392828-1"></script>
        <script>
            window.dataLayer = window.dataLayer || [];
            function gtag(){dataLayer.push(arguments);}
            gtag('js', new Date());
            gtag('config', 'UA-169392828-1');
        </script>

        <title>4D Nucleome Hackathon</title>
        <style>
            a:link {color: lightseagreen;}
            a:visited {color: lightseagreen;}
            a:hover {color: white;}
            b {color: lightseagreen;}
            body {font-family: 'Book Antiqua', Palatino, serif; font-size: 25px;}
        </style>
    </head>

    <body role="document" style="font-family: 'Book Antiqua', Palatino, serif;">

        <div class="jumbotron" style="display: flex; align-items: center;">
            <img src="4dn_logo_raster.png" style="border-radius: 50%; height: 100px; margin-right: 20px;">
            <div>
                <h1 style="font-size: 50px; margin: 0;">4D Nucleome Hackathon</h1>
                <hr>
            </div>
        </div>
        <div class="jumbotron">
        <p style="font-size:18px" align="justify">
        <a href="#Project 1">Project 1</a> &nbsp;&bull;&nbsp;
        <a href="#Project 2">Project 2</a> &nbsp;&bull;&nbsp;
        <a href="#Project 3">Project 3</a> &nbsp;&bull;&nbsp;
        <a href="#Project 4">Project 4</a> &nbsp;&bull;&nbsp;
        <a href="#Project 5">Project 5</a> &nbsp;&bull;&nbsp;
        <a href="#Project 6">Project 6</a> &nbsp;&bull;&nbsp;
        <a href="#Project 7">Project 7</a> &nbsp;&bull;&nbsp;
        <a href="index.html" target="_self" class="aaastandout"><b>Back to main</b></a>
        </p>
        </div>

        <a name="Project 1" id="Project 1"><div class="jumbotron" name="Project 1" id="Project 1"><a></a>

                <div class="page-header">
                <h2 style="font-size:26px"><b>Project 1. Find sequence elements implicated in tissue-specific Hi-C contacts through machine learning.</b></h2>

                <p style="font-size:17px" align="justify">
                    <b>Leading lab</b>: Noble Lab at University of Washington <br>
                    <b>Stakeholders</b>: Anupama Jha (<a href="mailto:anupamaj@uw.edu">anupamaj@uw.edu</a>), Justin Sanders (<a href="mailto:jsander1@uw.edu">jsander1@uw.edu</a>), Gang Li (<a href="mailto:gangliuw@uw.edu">gangliuw@uw.edu</a>) and William Stafford Noble (<a href="mailto:wnoble@uw.edu">wnoble@uw.edu</a>) <br>
                    <b>Desired deliverable</b>: A benchmark of current Hi-C machine learning models for finding known and novel sequence elements relevant to tissue-specific Hi-C contacts. <br>
                    <b>Expected coding experience level</b>: Intermediate to advanced. <br>
                    <b>Motivation</b>: Two major classes of machine learning models exist for predicting Hi-C contacts. The first class predicts Hi-C contacts from DNA sequence alone and is evaluated on held-out chromosomes in the same tissue. Such models can be used to study the impact of sequence variation on Hi-C contacts within a tissue; examples include Akita, Orca, and DeepC. The second class predicts Hi-C contacts by combining DNA sequence with epigenomic measurements such as ATAC-seq, DNase-seq, and TF ChIP-seq. These models can predict Hi-C contacts on held-out chromosomes within a tissue, and in new tissues provided that the epigenomic tracks are available; examples include Epiphany and Origami. While the predictive performance of these models has been thoroughly studied, evaluations of their interpretability, especially of whether they capture sequence elements relevant to tissue-specific gene regulation, are lacking. <br>
                </p>
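                <p style="font-size:17px" align="justify">One common way to ask which sequence elements a sequence-based Hi-C model relies on is occlusion (in silico masking): blank out a window of the input, re-run the model, and measure how much the predicted contact map changes. The sketch below illustrates that idea with NumPy. The function names, the window and stride values, and <code>toy_contact_model</code> are hypothetical; the toy model is only a runnable placeholder for a real predictor such as Akita or Orca, whose actual interfaces are not shown here.</p>
                <pre style="font-size:14px"><code class="language-python">
```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """One-hot encode DNA; N (or any non-ACGT base) becomes an all-zero row."""
    arr = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            arr[i, j] = 1.0
    return arr

def toy_contact_model(x: np.ndarray, n_bins: int = 4) -> np.ndarray:
    """Hypothetical stand-in for a sequence-to-Hi-C model such as Akita or Orca.

    Scores each bin by its GC fraction and returns the outer product as a
    symmetric n_bins x n_bins 'contact map'. It exists only so the sketch runs.
    """
    bins = np.array_split(x, n_bins)
    gc = np.array([b[:, 1:3].sum() / max(len(b), 1) for b in bins])  # columns C and G
    return np.outer(gc, gc)

def occlusion_scores(seq: str, model, window: int = 4, stride: int = 4) -> list:
    """Mask each window with N and record the mean absolute change in the predicted map."""
    x = one_hot(seq)
    ref = model(x)
    scores = []
    for start in range(0, len(seq) - window + 1, stride):
        masked = x.copy()
        masked[start:start + window] = 0.0  # same as replacing the window with N
        scores.append((start, float(np.abs(model(masked) - ref).mean())))
    return scores
```
                </code></pre>
                <p style="font-size:17px" align="justify">For example, <code>occlusion_scores("AAAAGGGGCCCCTTTT", toy_contact_model)</code> gives non-zero scores only for the GGGG and CCCC windows, because only those bases change the toy model's output. Gradient-based saliency is a common alternative; occlusion is shown here only because it needs no access to model internals.</p>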

        </div></div></a>

        <a name="Project 2" id="Project 2"><div class="jumbotron" name="Project 2" id="Project 2"><a></a>

                <div class="page-header">
                <h2 style="font-size:26px"><b>Project 2. Inference of chromatin looping status from live cell imaging data. </b></h2>

                <p style="font-size:17px" align="justify">
                    <b>Leading lab</b>: Li lab at University of North Carolina, Chapel Hill (UNC-CH) and Hu lab at Cleveland Clinic <br>
                    <b>Stakeholders</b>: Hongyu Yu (<a href="mailto:hongyuyu@email.unc.edu">hongyuyu@email.unc.edu</a>) and Shreya Mishra (<a href="mailto:mishras10@ccf.org">mishras10@ccf.org</a>) <br>
                    <b>Desired deliverable</b>: (1) Develop a computational pipeline to estimate the frequency and duration of chromatin looping events from live cell imaging data. (2) Characterize the cell-to-cell variability in the kinetics of chromatin looping events. (3) Compare chromatin looping dynamics among different genomic loci. <br>
                    <b>Expected coding experience level</b>: Basic familiarity with Python and R. <br>
                    <b>Motivation</b>: Recently developed live cell imaging technologies (PMID: 31124784, 30038397, 33310227) provide a powerful tool for studying the kinetics of chromatin spatial organization in living cells, enabling a deeper understanding of chromatin folding dynamics and gene regulation. In contrast to the rapid development of experimental technologies, computational methods tailored to analyzing live cell imaging data are still lacking. The goal of this project is to develop a stand-alone, user-friendly computational pipeline that infers the underlying chromatin looping status from live cell imaging data. The software can be applied to characterize the temporal dynamics of both CTCF-CTCF loops and enhancer-promoter interactions, and has the potential to provide new insights into transcriptional bursting and gene regulation.<br>
                </p>
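                <p style="font-size:17px" align="justify">A minimal way to turn an imaged anchor-to-anchor distance trajectory into looping kinetics is to call each frame "looped" when the 3D distance falls below a cutoff, then summarize the looped fraction and the lengths of consecutive looped runs. The sketch below assumes per-frame distances in nanometres have already been extracted; the 150 nm cutoff, the frame interval, and the function names are hypothetical. A single hard threshold is sensitive to localization noise, so the pipeline developed here may prefer a probabilistic model such as a hidden Markov model.</p>
                <pre style="font-size:14px"><code class="language-python">
```python
from itertools import groupby

def call_loops(distances_nm, threshold_nm=150.0):
    """Label each frame looped (True) when the anchor-anchor distance is below threshold_nm."""
    return [d < threshold_nm for d in distances_nm]

def looping_kinetics(distances_nm, frame_interval_s, threshold_nm=150.0):
    """Summarize one cell's trajectory: looped fraction, number of loop events, durations."""
    states = call_loops(distances_nm, threshold_nm)
    # Each run of consecutive True frames is one loop event.
    durations = [
        sum(1 for _ in run) * frame_interval_s
        for looped, run in groupby(states)
        if looped
    ]
    return {
        "looped_fraction": sum(states) / len(states),
        "n_events": len(durations),
        "durations_s": durations,
    }
```
                </code></pre>
                <p style="font-size:17px" align="justify">For a trajectory sampled every 20 s with distances [300, 120, 100, 280, 90, 95, 110, 400] nm, <code>looping_kinetics</code> returns a looped fraction of 0.625 and two loop events lasting 40 s and 60 s. Events already in progress at the first frame or still ongoing at the last frame are truncated, so their durations are underestimates. Running the function per cell and comparing the resulting distributions is one way to approach cell-to-cell variability.</p>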

        </div></div></a>

    <a name="Project 3" id="Project 3"><div class="jumbotron" name="Project 3" id="Project 3"><a></a>

            <div class="page-header">
            <h2 style="font-size:26px"><b>Project 3. In silico variant prioritization using sequence-based predictive models. </b></h2>

            <p style="font-size:17px" align="justify">
                <b>Leading lab</b>: Pollard Lab at UCSF <br>
                <b>Stakeholders</b>: Katie Gjoni (<a href="mailto:katie.gjoni@gladstone.ucsf.edu">katie.gjoni@gladstone.ucsf.edu</a>), Shu Zhang (<a href="mailto:shu.zhang@gladstone.ucsf.edu">shu.zhang@gladstone.ucsf.edu</a>) and Katie Pollard (<a href="mailto:katherine.pollard@gladstone.ucsf.edu">katherine.pollard@gladstone.ucsf.edu</a>) <br>
                <b>Desired deliverable</b>: Implementations of various machine learning models that score variants by how much they disrupt the models' predictions. <br>
                <b>Expected coding experience level</b>: Strong background in Python and experience working in bash. <br>
                <b>Motivation</b>: Predictive bioinformatics algorithms that take DNA sequences as input are frequently used to test the effects of genetic variants in high-throughput in silico perturbation experiments. Until recently, there was no standard procedure for formatting individual variants into a set of ready-to-use model inputs. To address this gap, our lab developed SuPreMo, a computational tool that converts individual genetic variants into model-ready DNA sequences. In addition, SuPreMo is directly integrated with Akita, a model that predicts genome folding. This project intends to extend this work into a set of SuPreMo-based tools that comprehensively score variants for effects on various chromatin profiles and gene expression. This entails three coding days of work: 1) implementing machine learning models, 2) processing their outputs to allow for fair comparison, and 3) evaluating statistical methods for comparing outputs from reference and mutated sequences. These tools would be important for prioritizing putative pathogenic variants for experimental studies, decoding the grammar of noncoding DNA sequences, discovering new sequence motifs, designing tissue-specific enhancers, and uncovering novel roles of sequence elements.<br>
            </p>
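            <p style="font-size:17px" align="justify">Comparing reference and mutated sequences means writing the variant into the reference window, running both sequences through a model, and comparing the two predicted contact maps. The sketch below covers the first and last of those steps for a single-nucleotide variant, using two simple map-comparison scores: the mean squared difference and one minus the Pearson correlation over the upper triangle. The function names are hypothetical and this is not SuPreMo's API; SuPreMo's own scoring options may differ.</p>
            <pre style="font-size:14px"><code class="language-python">
```python
import numpy as np

def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the single-nucleotide variant ref>alt applied at 0-based pos."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]} != {ref}")
    return seq[:pos] + alt + seq[pos + 1:]

def disruption_scores(ref_map: np.ndarray, alt_map: np.ndarray) -> dict:
    """Compare contact maps predicted from the reference and mutated sequences.

    Only the upper triangle (diagonal excluded) is used, because Hi-C maps are
    symmetric and each contact would otherwise be counted twice.
    """
    iu = np.triu_indices_from(ref_map, k=1)
    r, a = ref_map[iu], alt_map[iu]
    mse = float(np.mean((r - a) ** 2))
    corr = float(np.corrcoef(r, a)[0, 1])
    return {"mse": mse, "one_minus_corr": 1.0 - corr}
```
            </code></pre>
            <p style="font-size:17px" align="justify">For example, <code>apply_snv("ACGT", 2, "G", "A")</code> returns <code>"ACAT"</code>, and identical reference and alternate maps give scores of 0 (up to floating-point rounding). Checking the reference allele before substituting catches coordinate or genome-build mismatches early.</p>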

    </div></div></a>

    <a name="Project 4" id="Project 4"><div class="jumbotron" name="Project 4" id="Project 4"><a></a>

            <div class="page-header">
            <h2 style="font-size:26px"><b>Project 4. Integrative analysis of single-cell Hi-C datasets. </b></h2>

            <p style="font-size:17px" align="justify">
                <b>Leading lab</b>: Jian Ma Lab at CMU <br>
                <b>Stakeholders</b>: Akanksha Sachan (<a href="mailto:akanksha.11.05.07@gmail.com">akanksha.11.05.07@gmail.com</a>) and Wendy Yang (<a href="mailto:muyuy@andrew.cmu.edu">muyuy@andrew.cmu.edu</a>) <br>
                <b>Desired deliverable</b>: In this project, we aim to build a workflow for scHi-C data analysis using existing software (Higashi and Fast-Higashi) to achieve two goals: (1) generate cell embeddings that reflect the cell types in the dataset; (2) identify single-cell 3D genome features from the imputed scHi-C contact maps and study their cell-to-cell variability. We will primarily focus on the first goal. <br>
                <b>Expected coding experience level</b>: Python and bash/terminal experience. <br>
                <b>Motivation</b>: Cellular heterogeneity can be revealed by embedding cells based on their 3D genome architecture features. Such embeddings also make it possible to connect genome architecture with other epigenomic data modalities and to assess its functional roles comprehensively. <br>
            </p>
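            <p style="font-size:17px" align="justify">Before or alongside running Higashi and Fast-Higashi, a simple baseline can show whether cell-type structure is visible at all. The sketch below flattens each cell's upper-triangle contacts, normalizes for sequencing depth, log-transforms, and projects cells with PCA computed through NumPy's SVD. It is a hypothetical baseline, not how Higashi or Fast-Higashi compute embeddings, and the scale factor of 10,000 is an arbitrary choice.</p>
            <pre style="font-size:14px"><code class="language-python">
```python
import numpy as np

SCALE = 1e4  # arbitrary depth-normalization target (hypothetical choice)

def embed_cells(contact_maps: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Baseline scHi-C embedding via PCA.

    contact_maps: raw contact counts with shape (n_cells, n_bins, n_bins).
    Returns principal-component scores with shape (n_cells, n_components).
    """
    _, n_bins, _ = contact_maps.shape
    rows, cols = np.triu_indices(n_bins, k=1)        # off-diagonal upper triangle only
    X = contact_maps[:, rows, cols].astype(float)    # (n_cells, n_bin_pairs)
    depth = np.maximum(X.sum(axis=1, keepdims=True), 1.0)
    X = np.log1p(X / depth * SCALE)                  # depth-normalize, then log-transform
    X -= X.mean(axis=0)                              # center each feature across cells
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]
```
            </code></pre>
            <p style="font-size:17px" align="justify">With real data, coloring the first two components by known cell-type labels (where available) gives a quick sanity check against which the Higashi embeddings can be compared.</p>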

    </div></div></a>

    <a name="Project 5" id="Project 5"><div class="jumbotron" name="Project 5" id="Project 5"><a></a>

            <div class="page-header">
            <h2 style="font-size:26px"><b>Project 5. Integrating 3D genome architecture for enhanced gene expression predictions. </b></h2>

            <p style="font-size:17px" align="justify">
                <b>Leading lab</b>: Christina Leslie Lab at MSKCC <br>
                <b>Stakeholders</b>: Alireza Karbalayghareh (<a href="mailto:karbalayghareh@gmail.com">karbalayghareh@gmail.com</a>) and Rui Yang (<a href="mailto:ruy4001@med.cornell.edu">ruy4001@med.cornell.edu</a>) <br>
                <b>Desired deliverable</b>: In this project, we aim to explore the impact of 3D genome structure on gene expression. GraphReg, a deep learning model, leverages 3D interactions along with 1D epigenomic data or genomic DNA sequence to predict gene expression. During the hackathon, participants will have the opportunity to engage in: <br>
a) Biological application of the model: Participants will apply GraphReg to germinal center B cell data. Using feature attributions, they will investigate potential enhancer-promoter interactions or distal regulatory elements that influence gene expression. <br>
b) Technical benchmarking: Epiphany and ChromaFold are two deep learning models that predict Hi-C contact maps from 1D epigenomic tracks or scATAC-seq matrices. Participants will benchmark GraphReg by comparing its predictions when given experimental Hi-C data with its predictions when given Hi-C maps predicted by these models, and will analyze the results.
<br>
                <b>Expected coding experience level</b>: Basic knowledge of Python and bash/terminal experience. <br>
                <b>Motivation</b>: In this project, we hope to explore two directions: a) a biological application of GraphReg to a novel dataset; b) benchmarking GraphReg with real experimental Hi-C data versus Hi-C data predicted by previously published models. <br>
            </p>
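            <p style="font-size:17px" align="justify">Because Hi-C contact frequency decays strongly with genomic distance, a single correlation pooled over a whole map is dominated by that trend and can look high even for poor predictions. A common alternative is to correlate experimental and predicted maps separately at each genomic distance, that is, diagonal by diagonal. The sketch below computes such a distance-stratified Pearson correlation with NumPy, assuming both maps are square, cover the same bins, and are identically normalized; the function name and the three-pair minimum are hypothetical choices.</p>
            <pre style="font-size:14px"><code class="language-python">
```python
import numpy as np

def stratified_correlation(exp_map: np.ndarray, pred_map: np.ndarray,
                           max_offset: int) -> list:
    """Pearson r between experimental and predicted Hi-C maps at each genomic distance.

    Offset k compares the k-th diagonals, i.e. all bin pairs that are k bins apart.
    Offsets with fewer than 3 bin pairs, or with constant values, yield NaN.
    """
    out = []
    for k in range(1, max_offset + 1):
        e = np.diagonal(exp_map, offset=k)
        p = np.diagonal(pred_map, offset=k)
        if len(e) < 3 or np.std(e) == 0 or np.std(p) == 0:
            out.append(float("nan"))
        else:
            out.append(float(np.corrcoef(e, p)[0, 1]))
    return out
```
            </code></pre>
            <p style="font-size:17px" align="justify">A predicted map that is an exact linear rescaling of the experimental map gives r = 1 at every offset with at least three bin pairs. HiCRep's stratum-adjusted correlation coefficient is a published metric built on the same distance-stratification idea.</p>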

    </div></div></a>

        <a name="Project 6" id="Project 6"><div class="jumbotron" name="Project 6" id="Project 6"><a></a>

        <div class="page-header">
        <h2 style="font-size:26px"><b>Project 6. Deep polymer model benchmark. </b></h2>

        <p style="font-size:17px" align="justify">
            <b>Leading lab</b>: Plewczynski Laboratory at the Center of New Technologies (CeNT), University of Warsaw, Warsaw, Poland <br>
            <b>Stakeholders</b>: NA <br>
            <b>Desired deliverable</b>: Collections of 3D models for selected cell lines and experimental assays. <br>
            <b>Expected coding experience level</b>: Basic familiarity with Python and R. <br>
            <b>Motivation</b>: The aim of this project is to identify software that can be recommended as state-of-the-art practice for modeling spatial chromatin organization. Epigenomic modifications, such as methylation of histones (the proteins around which the DNA strand is wrapped to form nucleosomes), are among the factors that affect chromatin structure. In recent years, various approaches have been developed to predict chromatin structure, most of which use epigenomic modification data as model input. The proposed project therefore consists of two parts: the evaluation of existing deep learning models that predict epigenomic modifications from DNA sequence, and the evaluation of models that predict chromatin structure from DNA sequence and epigenomic modifications. Each approach has its own advantages and drawbacks, and for the research community to benefit from this software, it is important to identify opportunities to improve it. During the hackathon, we will validate the models against experimental data and compare them on metrics such as prediction accuracy, inference time, interpretability, and consumption of computational resources. The goal of our project is to provide direction toward a high level of confidence in modeling chromatin structure.<br>
        </p>
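        <p style="font-size:17px" align="justify">Predictive performance, inference time, and resource use can be collected with a small harness that runs every model on the same inputs. The sketch below uses only the Python standard library: <code>time.perf_counter</code> for wall-clock time and <code>tracemalloc</code> for peak Python-heap memory. The model dictionary, the metric, and the function name are hypothetical placeholders. <code>tracemalloc</code> does not see GPU memory or memory allocated outside Python's tracked allocators, so real deep learning models would need framework-specific profiling.</p>
        <pre style="font-size:14px"><code class="language-python">
```python
import statistics
import time
import tracemalloc
from typing import Any, Callable, Dict, Sequence

def benchmark(models: Dict[str, Callable[[Any], Any]],
              inputs: Sequence[Any],
              targets: Sequence[Any],
              metric: Callable[[Any, Any], float]) -> Dict[str, dict]:
    """Run each model on the same inputs; record mean score, time per input, peak memory."""
    results = {}
    for name, predict in models.items():
        tracemalloc.start()
        start = time.perf_counter()
        preds = [predict(x) for x in inputs]
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        results[name] = {
            "score": statistics.fmean(metric(p, t) for p, t in zip(preds, targets)),
            "seconds_per_input": elapsed / len(inputs),
            "peak_bytes": peak,
        }
    return results
```
        </code></pre>
        <p style="font-size:17px" align="justify">Memory tracing slows execution, so for precise timings it may be preferable to run separate timing and memory passes.</p>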

        </div></div></a>

        <a name="Project 7" id="Project 7"><div class="jumbotron" name="Project 7" id="Project 7"><a></a>

        <div class="page-header">
        <h2 style="font-size:26px"><b>Project 7. Unveiling the role of chromatin accessibility on RNA splicing during neural development. </b></h2>

        <p style="font-size:17px" align="justify">
            <b>Leading lab</b>: Yin Shen Lab at UCSF <br>
            <b>Stakeholders</b>: Jing Wang (<a href="mailto:Jing.Wang4@ucsf.edu">Jing.Wang4@ucsf.edu</a>), Ian Jones and Yifan Sun (<a href="mailto:Yin.Shen@ucsf.edu">Yin.Shen@ucsf.edu</a>) <br>
            <b>Desired deliverable</b>: The deliverable of this study will be a comprehensive analysis of correlations between chromatin accessibility (ATAC-seq) and chromatin interaction (PLAC-seq) profiles and the corresponding RNA splicing patterns extracted from RNA-seq data. This may include visualizations, statistical models, and key regulatory elements identified as influencing splicing events. <br>
            <b>Expected coding experience level</b>: Basic familiarity with Python and R. <br>
            <b>Motivation</b>: This study aims to elucidate the relationship between chromatin accessibility and RNA splicing using multi-omics data obtained from human brain samples. Leveraging RNA-seq, ATAC-seq, and PLAC-seq datasets, our primary objective is to analyze the influence of chromatin accessibility patterns on the splicing landscape across multiple subtypes of the human brain.<br>
        </p>

    </div></div></a>

    </body>

</html>