\documentclass[11pt]{article}

\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amssymb,physics}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{booktabs}

\title{Machine Learning Analysis of DCA-Z Distributions in ALICE Data:\\
From Discriminative Classification to Generative Modeling}

\author{}
\date{}

\begin{document}

\maketitle

\begin{abstract}

In high-energy nuclear collision experiments such as ALICE at CERN,
the longitudinal distance of closest approach (DCA-Z) of reconstructed
tracks provides a powerful observable for distinguishing between
single-vertex events and pileup events originating from multiple
interactions. Traditional approaches rely on parametric fits or
statistical indicators such as the bimodality coefficient, which can be
computationally expensive and ambiguous for large-scale
datasets. This proposal outlines a comprehensive machine learning
framework for analyzing DCA-Z distributions using both discriminative
and generative methods. We propose convolutional neural networks
(CNNs) as primary classifiers, complemented by autoencoders,
variational autoencoders (VAEs), and diffusion models to capture
underlying structure, perform anomaly detection, and generate
synthetic data. The project aims to deliver scalable, robust, and
physically interpretable methods for vertex multiplicity
classification in ALICE data.

\end{abstract}

\section{Introduction and Motivation}

In high-energy nuclear collision experiments, identifying whether an
event originates from a single interaction vertex or multiple
overlapping interactions (pileup) is a central problem. The DCA-Z
distribution of reconstructed tracks provides a sensitive probe of
this structure.

Traditional approaches rely on:
\begin{itemize}
\item Parametric fitting of peaks
\item Statistical measures such as skewness, kurtosis, and bimodality coefficients
\end{itemize}
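As a concrete point of comparison for the learned methods, the bimodality coefficient can be computed directly from sample moments. A minimal sketch in Python/NumPy, using Sarle's definition $b = (g_1^2 + 1)/g_2$ with non-excess kurtosis and no finite-sample correction (the $5/9$ threshold, the value for a uniform distribution, is the usual heuristic):

```python
import numpy as np

def bimodality_coefficient(x):
    """Sarle's bimodality coefficient b = (skew^2 + 1) / kurtosis,
    with non-excess (Pearson) kurtosis. b > 5/9 hints at bimodality."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s2 = ((x - m) ** 2).mean()                # biased variance
    skew = ((x - m) ** 3).mean() / s2 ** 1.5
    kurt = ((x - m) ** 4).mean() / s2 ** 2    # non-excess kurtosis
    return (skew ** 2 + 1.0) / kurt

rng = np.random.default_rng(0)
unimodal = rng.normal(0.0, 1.0, 10_000)
bimodal = np.concatenate([rng.normal(-3, 1, 5_000), rng.normal(3, 1, 5_000)])
print(bimodality_coefficient(unimodal))   # near 1/3 for a Gaussian
print(bimodality_coefficient(bimodal))    # above the 5/9 heuristic
```

This is exactly the kind of single-number summary whose ambiguity (well-separated peaks inflate $b$, overlapping ones may not) motivates the learned approaches below.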

However, these methods suffer from:
\begin{itemize}
\item Ambiguity in peak definition
\item Sensitivity to noise and detector resolution
\item Poor scalability for large datasets
\end{itemize}

This motivates a transition toward machine learning approaches that:
\begin{itemize}
\item Learn directly from data
\item Capture complex peak structures
\item Scale efficiently to large datasets
\end{itemize}

\section{Problem Formulation}

Each event is represented by a DCA-Z distribution, discretized into a histogram:
\[
\mathbf{x} = (x_1, x_2, \dots, x_N),
\]
where $x_i$ represents the track count in bin $i$.

The goal is to learn a mapping
\[
f(\mathbf{x}) \rightarrow y,
\]
where:
\begin{itemize}
\item $y = 0$: single-vertex (unimodal)
\item $y = 1$: multi-vertex (pileup)
\end{itemize}

Extensions include:
\begin{itemize}
\item Regression: predicting the number of vertices
\item Unsupervised learning: discovering latent structure
\end{itemize}
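The representation above can be sketched with a toy simulation; the binning, track counts, vertex spread, and resolution below are illustrative assumptions, not ALICE values:

```python
import numpy as np

N_BINS, DCA_RANGE = 64, (-10.0, 10.0)   # illustrative binning, in cm

def make_event(n_vertices, rng, tracks_per_vertex=200, resolution=0.5):
    """Simulate a DCA-Z histogram x = (x_1, ..., x_N) for an event
    with the given number of vertices (positions drawn uniformly)."""
    centers = rng.uniform(-5.0, 5.0, size=n_vertices)
    dca = np.concatenate(
        [rng.normal(c, resolution, tracks_per_vertex) for c in centers])
    hist, _ = np.histogram(dca, bins=N_BINS, range=DCA_RANGE)
    return hist.astype(np.float32)

rng = np.random.default_rng(1)
x_single = make_event(1, rng)   # label y = 0
x_pileup = make_event(2, rng)   # label y = 1
print(x_single.shape)           # (64,): one input vector per event
```

The number of vertices used to generate each event doubles as the regression target in the extension above.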

\section{Discriminative Machine Learning Approaches}

\subsection{Fully Connected Neural Networks}

A baseline approach is a multilayer perceptron:
\[
f(\mathbf{x}) = \sigma(W_L \cdots \sigma(W_1 \mathbf{x} + \mathbf{b}_1) \cdots + \mathbf{b}_L).
\]

Advantages:
\begin{itemize}
\item Simple implementation
\item Fast inference
\end{itemize}

Limitations:
\begin{itemize}
\item No explicit modeling of local structure
\end{itemize}

\subsection{Convolutional Neural Networks (CNNs)}

We propose CNNs as the primary model.

The convolution operation
\[
y_i = \sum_{j} w_j x_{i+j}
\]
captures:
\begin{itemize}
\item Peak shapes
\item Local correlations
\item Peak separation
\end{itemize}

Advantages:
\begin{itemize}
\item Physically meaningful (matched filtering)
\item Robust to noise
\item Efficient parameter sharing
\end{itemize}
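The matched-filtering interpretation can be illustrated directly: cross-correlating a histogram with a Gaussian-shaped kernel (a stand-in for a learned CNN filter) responds most strongly at peak locations. A minimal NumPy sketch of $y_i = \sum_j w_j x_{i+j}$ (as written, this is a cross-correlation, which \texttt{np.correlate} computes):

```python
import numpy as np

# Toy two-peak histogram: Gaussian bumps at bins 20 and 44.
bins = np.arange(64)
x = (np.exp(-0.5 * ((bins - 20) / 2.0) ** 2)
     + np.exp(-0.5 * ((bins - 44) / 2.0) ** 2))

# Gaussian kernel of matching width: a matched filter for such peaks.
j = np.arange(-4, 5)
w = np.exp(-0.5 * (j / 2.0) ** 2)

# y_i = sum_j w_j x_{i+j}: cross-correlation, centered output.
y = np.correlate(x, w, mode="same")
print(np.argsort(y)[-2:])   # the two strongest responses sit at the peaks
```

A CNN learns such kernels from data rather than fixing them by hand, which is why it can adapt to asymmetric or resolution-smeared peak shapes.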

\subsection{Recurrent Neural Networks (RNNs)}

RNNs treat the histogram as a sequence:
\[
h_t = f(x_t, h_{t-1}).
\]

However:
\begin{itemize}
\item No natural temporal structure exists in a histogram
\item Less efficient than CNNs
\end{itemize}

Thus, RNNs are not expected to outperform CNNs, and we do not plan to devote significant effort to them.

\subsection{Autoencoders}

Autoencoders learn compressed representations:
\[
\mathbf{x} \rightarrow \mathbf{z} \rightarrow \hat{\mathbf{x}}.
\]

Applications:
\begin{itemize}
\item Anomaly detection (pileup as deviation from single-vertex structure)
\item Feature extraction
\end{itemize}
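As a sketch of reconstruction-error anomaly detection: a linear autoencoder with tied weights is equivalent to PCA, so it can be "trained" in closed form on single-vertex histograms only; pileup events, lying off the learned manifold, then reconstruct poorly. The toy peak shapes and the latent dimension of 8 are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
bins = np.linspace(-10, 10, 64)

def event(centers):
    """Noisy toy DCA-Z histogram built from Gaussian peaks."""
    x = sum(np.exp(-0.5 * ((bins - c) / 0.8) ** 2) for c in centers)
    return x + rng.normal(0, 0.02, bins.size)

# Training set: single-vertex events only.
X = np.stack([event([rng.uniform(-4, 4)]) for _ in range(500)])
mu = X.mean(axis=0)

# Linear autoencoder with tied weights = PCA:
# encode z = V^T (x - mu), decode x_hat = mu + V z.
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
V = Vt[:8].T                      # 8-dimensional latent space (assumed)

def recon_error(x):
    z = V.T @ (x - mu)
    return np.linalg.norm(x - (mu + V @ z))

normal = event([1.0])             # single vertex: on-manifold
pileup = event([-3.0, 3.0])       # two vertices: off-manifold
print(recon_error(normal), recon_error(pileup))
```

Thresholding the reconstruction error then yields an unsupervised pileup flag; a nonlinear (deep) autoencoder follows the same recipe with a learned encoder and decoder.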

\section{Generative Modeling Approaches}

\subsection{Variational Autoencoders (VAEs)}

VAEs introduce a probabilistic latent space:
\[
z \sim \mathcal{N}\!\left(\mu(\mathbf{x}), \sigma^2(\mathbf{x})\right).
\]

The objective is the evidence lower bound:
\[
\mathcal{L} = \mathbb{E}_{q(z|\mathbf{x})}\!\left[\log p(\mathbf{x}|z)\right] - D_{\text{KL}}\!\left(q(z|\mathbf{x}) \,\|\, p(z)\right).
\]

Advantages:
\begin{itemize}
\item Interpretable latent variables
\item Semi-supervised learning
\item Synthetic data generation
\end{itemize}
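For a diagonal-Gaussian encoder and a standard-normal prior $p(z)$, the KL term in the objective has the closed form $D_{\text{KL}} = \tfrac{1}{2}\sum_i \left(\mu_i^2 + \sigma_i^2 - 1 - \log \sigma_i^2\right)$, which is what makes the ELBO cheap to optimize. A minimal check:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL( N(mu, sigma^2) || N(0, I) ), the regularizer
    in the VAE objective, for a diagonal-Gaussian encoder."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var)

# The KL vanishes exactly when the posterior matches the prior ...
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))   # 0.0
# ... and grows as the posterior drifts away from it.
print(kl_to_standard_normal(np.ones(4), np.zeros(4)))    # 2.0
```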

\subsection{Diffusion Models}

Diffusion models learn data distributions through a gradual noising process:
\[
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\]
where $\bar{\alpha}_t$ is the cumulative product of the per-step noise schedule.

Applications:
\begin{itemize}
\item Generating realistic DCA-Z distributions
\item Denoising detector effects
\item Modeling uncertainties
\end{itemize}
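The forward (noising) process takes only a few lines; the linear $\beta_t$ schedule below is the common DDPM choice, used here purely for illustration on a stand-in normalized histogram:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # standard linear schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)     # \bar{alpha}_t from the text

def q_sample(x0, t):
    """Forward process: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.normal(size=64)                # stand-in normalized histogram
xT = q_sample(x0, T - 1)
print(alpha_bar[-1])                    # tiny: by t = T the signal is gone
```

The generative direction, learning to invert these steps with a neural denoiser, is what Phase 4 would implement.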

\subsection{Normalizing Flows}

Flows provide exact likelihoods through the change-of-variables formula:
\[
p(\mathbf{x}) = p(\mathbf{z}) \left|\det \frac{\partial \mathbf{z}}{\partial \mathbf{x}}\right|.
\]

Applications:
\begin{itemize}
\item Likelihood-based classification
\item Model comparison
\end{itemize}
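The change-of-variables formula can be verified on the simplest invertible map, an elementwise affine flow $z = (x - b)/s$, whose Jacobian is diagonal; the exact log-likelihood it yields matches the corresponding Gaussian density, as it must:

```python
import numpy as np

def affine_flow_logpdf(x, shift, scale):
    """Exact log-likelihood under one affine flow z = (x - shift)/scale
    with a standard-normal base: log p(x) = log p(z) + log|det dz/dx|."""
    z = (x - shift) / scale
    log_pz = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi))
    log_det = -np.sum(np.log(scale))        # dz/dx is diagonal
    return log_pz + log_det

x = np.array([1.0, -2.0, 0.5])
shift, scale = np.zeros(3), np.full(3, 2.0)
# Cross-check against the N(shift, scale^2) density written directly:
direct = np.sum(-0.5 * ((x - shift) / scale) ** 2
                - np.log(scale * np.sqrt(2.0 * np.pi)))
print(np.isclose(affine_flow_logpdf(x, shift, scale), direct))   # True
```

Stacking many such learnable invertible layers gives a flow expressive enough for likelihood-based classification of DCA-Z histograms.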

\section{Physics Considerations}

\subsection{Correlation Structure}

The problem is fundamentally about detecting:
\begin{itemize}
\item Peak multiplicity
\item Peak overlap
\item Detector smearing
\end{itemize}

\subsection{Label Ambiguity}

Peak definitions depend on:
\begin{itemize}
\item Minimum width
\item Peak separation
\end{itemize}

This introduces:
\begin{itemize}
\item Systematic uncertainties
\item Label noise
\end{itemize}

\subsection{Class Imbalance}

Pileup events are typically rare, which calls for:
\begin{itemize}
\item Weighted loss functions
\item Focal loss to emphasize hard examples
\end{itemize}
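One concrete option is the binary focal loss, $\mathrm{FL} = -\alpha_t \,(1 - p_t)^{\gamma} \log p_t$, which down-weights easy, confident examples and keeps the gradient focused on the rare pileup class. The $\gamma$ and $\alpha$ values below are conventional illustrative defaults, not tuned for this data:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.75):
    """Binary focal loss FL = -alpha_t (1 - p_t)^gamma log(p_t).
    p is the predicted pileup probability, y the true label;
    alpha (illustrative) up-weights the rare pileup class y = 1."""
    p_t = np.where(y == 1, p, 1.0 - p)
    a_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -a_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy, confidently-correct background example is nearly ignored ...
print(focal_loss(np.array([0.05]), np.array([0])))
# ... while a misclassified pileup event keeps a large loss.
print(focal_loss(np.array([0.05]), np.array([1])))
```

In practice either this or a simple per-class weight in the cross-entropy would be compared during Phase 1.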

\section{Proposed Work Plan}

\subsection{Phase 1: Baseline Models}
\begin{itemize}
\item Implement MLP and CNN classifiers
\item Evaluate classification accuracy
\end{itemize}

\subsection{Phase 2: Enhanced Models}
\begin{itemize}
\item CNN with regression output (number of peaks)
\item Uncertainty estimation
\end{itemize}

\subsection{Phase 3: Generative Models}
\begin{itemize}
\item Train VAE for latent structure learning
\item Use autoencoders for anomaly detection
\end{itemize}

\subsection{Phase 4: Advanced Generative Modeling}
\begin{itemize}
\item Implement diffusion models
\item Generate synthetic datasets
\item Perform denoising and uncertainty quantification
\end{itemize}

\section{Expected Outcomes}

\begin{itemize}
\item Fast and scalable classification of pileup events
\item Improved robustness compared to parametric methods
\item Interpretable latent representations of vertex structure
\item Generative models for simulation and uncertainty analysis
\end{itemize}

\end{document}