<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Diptesh Das - Projects</title>
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <header>
        <h1>Diptesh Das</h1>
        <p>Researcher at the Department of Computational Biology and Medical Sciences, The University of Tokyo.</p>
        <nav>
            <ul>
                <li><a href="index.html">Home</a></li>
                <li><a href="https://www.tsudalab.org/en/members/" target="_blank">Lab</a></li>
                <li><a href="project.html">Projects</a></li>
                <li><a href="https://scholar.google.co.in/citations?user=8DWrGBwAAAAJ&hl=en" target="_blank">Google Scholar</a></li>
            </ul>
        </nav>
    </header>
    <section id="projects">
        <h2>Projects (Selected)</h2>
        <div class="project">
            <h3>Statistically Robust Sparse High-order Interaction Model</h3>
            <p style="text-align:justify;">Deep learning models often achieve high accuracy but lack interpretability, which makes them unsuitable for critical applications such as medical diagnosis, biomolecule design, and criminal justice.
                The Sparse High-Order Interaction Model (SHIM)
                addresses this limitation by providing both transparency and predictive reliability. However, real-world data often contain outliers, which can distort model performance. To overcome this, we
                propose Huberized-SHIM, an extension of SHIM
                that integrates Huber-loss-based robust regression to mitigate the impact of outliers. We introduce a homotopy-based exact regularization path
                algorithm and a novel tree-pruning criterion to
                efficiently manage interaction complexity. Additionally, we incorporate the conformal prediction
                framework to enhance statistical reliability. Empirical evaluations on synthetic and real-world datasets
                demonstrate the superior robustness and accuracy of Huberized-SHIM in high-stakes decision-making contexts.
            </p>
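            <p style="text-align:justify;">The Huber loss that Huberized-SHIM builds on is quadratic for small residuals and linear for large ones, so a single outlier contributes only linearly to the objective. The minimal Python sketch below covers the loss alone (not the SHIM homotopy path or tree pruning); the threshold name <code>delta</code> is an assumption for illustration.</p>
<pre><code class="language-python">def huber_loss(residual: float, delta: float = 1.0) -> float:
    # Quadratic for |r| up to delta, linear beyond it.
    # Clipping |r| at delta gives both branches in one expression:
    #   |r| within delta -> a = |r|   -> |r| * (|r| - 0.5*|r|) = 0.5 * r**2
    #   |r| beyond delta -> a = delta -> delta * (|r| - 0.5*delta)
    a = min(abs(residual), delta)
    return a * (abs(residual) - 0.5 * a)


def squared_loss(residual: float) -> float:
    return 0.5 * residual ** 2


# An outlier residual of 10 dominates the squared loss but not the Huber loss.
for r in (0.5, 2.0, 10.0):
    print(r, squared_loss(r), huber_loss(r, delta=1.0))
# 0.5 0.125 0.125
# 2.0 2.0 1.5
# 10.0 50.0 9.5
</code></pre>
            <p style="text-align:justify;">For the residual of 10, the squared loss grows to 50 while the Huber loss is 9.5, which is why a Huber objective is less sensitive to outliers.</p>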
            <a href="https://openreview.net/pdf?id=BL6vK8GeoR" target="_blank">[paper]</a>
            <!-- <a href="" target="_blank">[code]</a>             -->
        </div>
        <div class="project">
            <h3> DT-sampler: A SAT-based Decision Tree Ensemble</h3>
            <p style="text-align:justify;">Interpretable (or explainable) machine learning models, such as decision trees, play a crucial role in trustworthy AI. However, finding optimal decision trees (i.e., trees of minimum size and maximum accuracy) is not a simple task and remains an active area of research. While a single decision tree has limited expressivity, an ensemble of decision trees can effectively capture the complex structures found in many real-world applications. Many existing tree ensemble methods are greedy and suboptimal, and often suffer from randomness in the tree generation process. In this paper, we introduce DT-sampler, a SAT-based decision tree ensemble that allows explicit control over both the size and the accuracy of the sampled trees. We develop a novel SAT-based encoding method that uses only branch nodes, resulting in a compact representation of the decision tree space. Additionally, standard point predictions made with decision tree ensembles offer no statistical guarantee on the miscoverage rate. We employ conformal prediction (CP), a distribution-free statistical framework that provides a valid finite-sample coverage guarantee, to demonstrate that DT-sampler is statistically more efficient and produces more stable results than a random forest classifier. We demonstrate the effectiveness of our method on several benchmark and real-world datasets.
            </p>
            <a href="https://dl.acm.org/doi/10.1145/3787470.3787484" target="_blank">[paper]</a>
            <a href="https://github.com/tsudalab/DT-sampler-CP" target="_blank">[code]</a>
        </div>
        <div class="project">
            <h3> CRYSIM: Prediction of Symmetric Structures of Large Crystals with GPU-based Ising Machines</h3>
            <p style="text-align:justify;">Solving black-box optimization problems with Ising machines is increasingly common in materials science. However, their application to crystal structure prediction (CSP) is still ineffective due to the symmetry-agnostic encoding of atomic coordinates. We introduce CRYSIM, an algorithm that encodes the space group, the combination of Wyckoff positions, and the coordinates of independent atomic sites as separate variables. This encoding substantially reduces the search space by exploiting the symmetry of space groups. When CRYSIM is interfaced with Fixstars Amplify, a GPU-based Ising machine, its prediction performance is competitive with CALYPSO and Bayesian optimization for crystals containing more than 150 atoms in a unit cell. Although it is not yet realistic to interface CRYSIM with current small-scale quantum devices, it has the potential to become a standard CSP algorithm in the coming quantum age.
            </p>
            <a href="https://www.nature.com/articles/s42005-025-02380-y" target="_blank">[paper]</a>
            <a href="https://github.com/tsudalab/CRYSIM" target="_blank">[code]</a>
        </div>
        <div class="project">
            <h3> Molecule Graph Networks with Many-Body Equivariant Interactions</h3>
            <p style="text-align:justify;">Message passing neural networks have demonstrated significant efficacy in predicting molecular interactions. Introducing equivariant vectorial representations augments expressivity by capturing geometric data symmetries, thereby improving model accuracy. However, opposing two-body bond vectors may cancel each other out during message passing, causing the loss of directional information on their shared node. In this study, we develop Equivariant N-body Interaction Networks (ENINet), which explicitly integrate <i>l</i> = 1 equivariant many-body interactions to enhance directional symmetric information in the message passing scheme. We provide a mathematical analysis demonstrating the necessity of incorporating many-body equivariant interactions and generalize the formulation to N-body interactions. Experiments indicate that integrating many-body equivariant representations improves prediction accuracy across diverse scalar and tensorial quantum chemical properties.
            </p>
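            <p style="text-align:justify;">The cancellation problem described above can be seen with plain vector arithmetic. The Python sketch below sums unit bond vectors around a central atom, which is the simplest two-body, <i>l</i> = 1 message; it only illustrates the problem ENINet targets, not ENINet's many-body layers, and the coordinates are illustrative values rather than data from the paper.</p>
<pre><code class="language-python">import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def unit(v: Vec) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def summed_bond_message(center: Vec, neighbors: List[Vec]) -> Vec:
    # Two-body, l = 1 message: sum of unit bond vectors from center to each neighbor.
    total = [0.0, 0.0, 0.0]
    for nb in neighbors:
        u = unit((nb[0] - center[0], nb[1] - center[1], nb[2] - center[2]))
        for i in range(3):
            total[i] += u[i]
    return (total[0], total[1], total[2])


origin = (0.0, 0.0, 0.0)
# Linear arrangement (like O=C=O): opposing bonds cancel to the zero vector,
# so the central node receives no directional information.
print(summed_bond_message(origin, [(1.16, 0.0, 0.0), (-1.16, 0.0, 0.0)]))
# Bent arrangement: the summed vector keeps a nonzero direction (along +y).
print(summed_bond_message(origin, [(0.76, 0.59, 0.0), (-0.76, 0.59, 0.0)]))
</code></pre>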
            <a href="https://pubs.acs.org/doi/full/10.1021/acs.jctc.5c00466" target="_blank">[paper]</a>
            <a href="https://github.com/tsudalab/ENINet" target="_blank">[code]</a>
        </div>
        <div class="project">
            <h3>Preference-Optimized Pareto Set Learning (PO-PSL) for Blackbox Optimization</h3>
            <p style="text-align:justify;">Multi-Objective Optimization (MOO) presents a significant challenge in various
                real-world applications. For complex problems, it is usually impossible to find a single solution that optimizes
                all objectives simultaneously. In experimental design scenarios, obtaining the entire Pareto set (PS) is beneficial
                as it allows for flexible exploration of the design space. We have developed an efficient Pareto set learning (PSL)
                algorithm that learns the continuous manifold of the Pareto front (PF). This enables a robot or a domain expert to
                explore the PF in real-time, eliminating the need to reconstruct the PF for new trade-off preferences among
                objectives.
            </p>
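            <p style="text-align:justify;">The Pareto set and Pareto front are defined through dominance: a solution dominates another if it is no worse in every objective and strictly better in at least one. The minimal Python sketch below filters a finite set of candidate objective vectors down to its non-dominated subset, assuming all objectives are minimized. PO-PSL itself learns a continuous model of the front instead of enumerating points, so this only illustrates the definition; the function names are hypothetical.</p>
<pre><code class="language-python">from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # a dominates b (minimization): a is no worse in every objective
    # and strictly better in at least one.
    no_worse = all(y >= x for x, y in zip(a, b))
    strictly_better = any(y > x for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    # Keep every point that no other point dominates (simple O(n^2) scan).
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print(pareto_front(candidates))  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
</code></pre>
            <p style="text-align:justify;">Here (3.0, 4.0) is removed because (2.0, 3.0) is better in both objectives; the remaining points each trade one objective against the other.</p>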
            <a href="https://arxiv.org/abs/2408.09976" target="_blank">[paper]</a>
            <a href="https://github.com/tsudalab/POPSL" target="_blank">[code]</a>
        </div>
        <div class="project">
            <h3> A Confidence Machine for Sparse High-order Interaction Model</h3>
            <p style="text-align:justify;">The Sparse High-order Interaction Model (SHIM) is an interpretable yet non-linear
                machine learning model. It can capture interactions among many features, which is
                crucial in many real-world applications, such as modeling gene-gene interactions and identifying groups of mutations.
                However, a point prediction in regression is often not enough, and many real-world high-stakes
                decision-making problems demand a prediction band (or interval) that encloses the point prediction. We
                developed an efficient algorithm that produces statistically efficient (narrow) prediction intervals
                containing the point prediction of a SHIM.
            </p>
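            <p style="text-align:justify;">CP-SHIM computes its intervals with an algorithm specific to SHIM. The simpler <em>split conformal</em> procedure sketched below is a different, swapped-in technique that shows what a distribution-free prediction interval looks like: absolute residuals of a fitted model on a held-out calibration set give a quantile <code>q</code>, and the interval is the point prediction plus or minus <code>q</code>. Under exchangeability, such an interval covers the true response with probability at least 1 - alpha. The model, the data, and the function name are hypothetical.</p>
<pre><code class="language-python">import math
from typing import Callable, List, Tuple


def split_conformal_interval(
    predict: Callable[[float], float],
    x_cal: List[float],
    y_cal: List[float],
    x_new: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    # Nonconformity scores: absolute residuals on the held-out calibration set.
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    y_hat = predict(x_new)
    return (y_hat - q, y_hat + q)


# Hypothetical fitted model and calibration data, for illustration only.
def model(x: float) -> float:
    return 2.0 * x


x_cal = [float(i) for i in range(10)]
noise = [0.1, -0.3, 0.2, 0.5, -0.1, 0.4, -0.2, 0.3, -0.6, 0.05]
y_cal = [model(x) + e for x, e in zip(x_cal, noise)]
print(split_conformal_interval(model, x_cal, y_cal, x_new=5.0, alpha=0.2))
</code></pre>
            <p style="text-align:justify;">With 10 calibration points and alpha = 0.2, the 9th smallest residual (0.5) sets the half-width, so the interval around the prediction 10.0 is roughly (9.5, 10.5). Smaller alpha values demand more calibration points before the interval becomes finite.</p>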
            <a href="https://onlinelibrary.wiley.com/doi/pdf/10.1002/sta4.633" target="_blank">[paper]</a>
            <a href="https://github.com/DipteshDas/CP-SHIM" target="_blank">[code]</a>
        </div>
        <!-- <div class="project">
            <h3> Feature Importance Measurement based on Decision Tree Sampling</h3>
            <p style="text-align:justify;">Random forest is effective for prediction tasks but the randomness of tree
                generation hinders interpretability in feature importance analysis. To address this, we proposed DT-Sampler,
                a SAT-based method for measuring feature importance in a tree-based model. Our method has fewer parameters
                than random forest and provides higher interpretability and stability for the analysis of real-world problems.
            </p>
            <a href="https://openreview.net/forum?id=Mn4AXZwJIZ#all" target="_blank">[paper]</a>
            <a href="https://github.com/tsudalab/DT-sampler" target="_blank">[code]</a>
        </div>        -->
        <div class="project">
            <h3>Fast and More Powerful Selective Inference for Sparse High-order Interaction Model</h3>
            <p style="text-align:justify;">Finding statistically significant (low p-value) high-order
                feature interactions is challenging because of the intrinsic high
                dimensionality of the combinatorial effects. Another problem
                in data-driven modeling is the effect of “cherry-picking” (i.e.,
                selection bias). We developed a fast algorithm using a branch-and-bound tree pruning strategy that corrects
                for this selection bias and identifies statistically valid high-order feature interactions, i.e., interactions
                with selection-bias-corrected p-values.</p>
            <a href="https://ojs.aaai.org/index.php/AAAI/article/view/21238" target="_blank">[paper]</a>
            <a href="https://github.com/DipteshDas/SI-SHIM" target="_blank">[code]</a>
        </div>
        <div class="project">
            <h3>Sparse High-order Interaction Model with Rejection option (SHIMR)</h3>
            <p style="text-align:justify;">SHIMR is an interpretable, non-linear machine learning model that includes a
                rejection option, which is essential for high-stakes decision-making, such as in medical diagnosis.
                This model can identify uncertain areas within the data and has the ability to refrain from making a
                decision when it lacks confidence. For instance, it can automatically pinpoint samples that are close
                to the decision boundary and choose not to make a decision for those instances. SHIMR is equipped to
                address class imbalance issues, and its visualization module illustrates the relationships between model
                scores, feature interactions, and their importance, making it a valuable tool for promoting trustworthy AI. </p>
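            <p style="text-align:justify;">The rejection idea can be sketched independently of how SHIMR is trained: a classifier with real-valued scores abstains whenever a sample's score falls within a margin around the decision boundary at 0. The Python sketch below is not SHIMR's learning objective; the threshold name <code>tau</code> and the scores are hypothetical.</p>
<pre><code class="language-python">from typing import List, Optional


def predict_with_reject(scores: List[float], tau: float) -> List[Optional[int]]:
    # Scores near the decision boundary (0) are too uncertain to act on:
    # return None (reject) unless |score| reaches the margin tau.
    labels: List[Optional[int]] = []
    for s in scores:
        if abs(s) >= tau:
            labels.append(1 if s > 0 else -1)
        else:
            labels.append(None)
    return labels


print(predict_with_reject([2.3, -0.1, 0.4, -1.7, 0.05], tau=0.5))
# [1, None, None, -1, None]
</code></pre>
            <p style="text-align:justify;">Raising <code>tau</code> rejects more samples and typically raises accuracy on the samples that are still classified; choosing that trade-off is what a rejection option makes explicit.</p>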
            <a href="https://peerj.com/articles/6543/" target="_blank">[paper]</a>
            <a href="https://github.com/tsudalab/SHIMR?tab=readme-ov-file#shimr-sparse-high-order-interaction-model-with-rejection-option" target="_blank">[code]</a>
        </div><br/><br/><br/><br/>
        <!-- Add more projects as needed -->
    </section>
    <footer>
        <p>Contact: [firstname].[lastname]@edu.k.u-tokyo.ac.jp</p>
    </footer>
</body>
</html>