\documentclass[11pt]{article}

\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amssymb,physics}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{booktabs}

\title{Machine Learning Analysis of DCA-Z Distributions in ALICE Data:\\
From Discriminative Classification to Generative Modeling}
\author{}
\date{}

\begin{document}

\maketitle

\begin{abstract}
In high-energy nuclear collision experiments such as ALICE at CERN, the longitudinal distance of closest approach (DCA-Z) of reconstructed tracks provides a powerful observable for distinguishing between single-vertex events and pileup events originating from multiple interactions. Traditional approaches rely on parametric fits or statistical indicators such as the bimodality coefficient, which can be computationally expensive and ambiguous for large-scale datasets. This proposal outlines a comprehensive machine learning framework for analyzing DCA-Z distributions using both discriminative and generative methods. We propose convolutional neural networks (CNNs) as primary classifiers, complemented by autoencoders, variational autoencoders (VAEs), and diffusion models to capture underlying structure, perform anomaly detection, and generate synthetic data. The project aims to deliver scalable, robust, and physically interpretable methods for vertex multiplicity classification in ALICE data.
\end{abstract}
\section{Introduction and Motivation}

In high-energy nuclear collision experiments, identifying whether an event originates from a single interaction vertex or from multiple overlapping interactions (pileup) is a central problem. The DCA-Z distribution of reconstructed tracks provides a sensitive probe of this structure.

Traditional approaches rely on:
\begin{itemize}
\item Parametric fitting of peaks
\item Statistical measures such as skewness, kurtosis, and bimodality coefficients
\end{itemize}
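As a concrete reference point for these statistical measures, the following sketch computes Sarle's bimodality coefficient $b = (g_1^2 + 1)/g_2$ from sample moments, with $g_2$ the non-excess kurtosis; values above the uniform-distribution benchmark of $5/9$ hint at bimodality. The synthetic samples and thresholds are illustrative, not ALICE specifics.

```python
import numpy as np

def bimodality_coefficient(x):
    """Sarle's bimodality coefficient b = (skew^2 + 1) / kurtosis.

    Uses the non-excess (Pearson) kurtosis; b > 5/9 (the value for a
    uniform distribution) is a common heuristic hint of bimodality.
    """
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s2 = ((x - m) ** 2).mean()                  # second central moment
    skew = ((x - m) ** 3).mean() / s2 ** 1.5
    kurt = ((x - m) ** 4).mean() / s2 ** 2
    return (skew ** 2 + 1.0) / kurt

rng = np.random.default_rng(0)
unimodal = rng.normal(0.0, 1.0, 10_000)               # single-vertex-like
bimodal = np.concatenate([rng.normal(-3, 1, 5_000),
                          rng.normal(+3, 1, 5_000)])  # pileup-like
print(bimodality_coefficient(unimodal))  # ~1/3, below 5/9
print(bimodality_coefficient(bimodal))   # ~0.7, above 5/9
```

As the section notes, the threshold itself is ambiguous for overlapping peaks, which is precisely the limitation the learned models below aim to avoid.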
53+
54+
However, these methods suffer from:
55+
\begin{itemize}
56+
\item Ambiguity in peak definition
57+
\item Sensitivity to noise and detector resolution
58+
\item Poor scalability for large datasets
59+
\end{itemize}
60+
61+
This motivates a transition toward machine learning approaches that:
62+
\begin{itemize}
63+
\item Learn directly from data
64+
\item Capture complex peak structures
65+
\item Scale efficiently to large datasets
66+
\end{itemize}
\section{Problem Formulation}

Each event is represented by its DCA-Z distribution, discretized into a histogram
\[
\mathbf{x} = (x_1, x_2, \dots, x_N),
\]
where $x_i$ denotes the track count in bin $i$.

The goal is to learn a mapping
\[
f(\mathbf{x}) \rightarrow y,
\]
where
\begin{itemize}
\item $y = 0$: single-vertex event (unimodal distribution)
\item $y = 1$: multi-vertex event (pileup)
\end{itemize}

Extensions include:
\begin{itemize}
\item Regression: predicting the number of vertices
\item Unsupervised learning: discovering latent structure
\end{itemize}
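To make the input representation concrete, here is a minimal sketch of turning per-event track DCA-Z values into a fixed-length, normalized histogram vector; the bin count and range are illustrative placeholders, not ALICE defaults.

```python
import numpy as np

N_BINS = 100
DCA_RANGE = (-10.0, 10.0)   # illustrative range, not an ALICE default

def event_to_histogram(dca_z_values):
    """Discretize one event's track DCA-Z values into a normalized histogram x."""
    counts, _ = np.histogram(dca_z_values, bins=N_BINS, range=DCA_RANGE)
    total = counts.sum()
    # Normalize to unit sum so events with different track multiplicities
    # are comparable; keep zeros if no tracks fall in range.
    return counts / total if total > 0 else counts.astype(float)

rng = np.random.default_rng(1)
single_vertex = rng.normal(0.0, 0.5, size=200)              # y = 0
pileup = np.concatenate([rng.normal(-2, 0.5, 120),
                         rng.normal(+3, 0.5, 80)])          # y = 1
X = np.stack([event_to_histogram(single_vertex),
              event_to_histogram(pileup)])
y = np.array([0, 1])
print(X.shape)  # (2, 100)
```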
\section{Discriminative Machine Learning Approaches}

\subsection{Fully Connected Neural Networks}

A baseline approach is a multilayer perceptron,
\[
f(\mathbf{x}) = \sigma(W_L \cdots \sigma(W_1 \mathbf{x})),
\]
with bias terms omitted for brevity.

Advantages:
\begin{itemize}
\item Simple to implement
\item Fast inference
\end{itemize}

Limitations:
\begin{itemize}
\item No explicit modeling of local structure in the histogram
\end{itemize}
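A forward pass of this baseline can be sketched in a few lines of NumPy; the layer sizes and random weights are placeholders (a real classifier would be trained, e.g.\ against a cross-entropy loss).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, weights, biases):
    """f(x) = sigma(W_L ... sigma(W_1 x)): alternate affine maps and sigmoids."""
    h = x
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h

rng = np.random.default_rng(2)
sizes = [100, 32, 16, 1]          # histogram bins -> hidden -> P(pileup)
weights = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = rng.random(100)               # dummy DCA-Z histogram
p = mlp_forward(x, weights, biases)
print(p.shape)   # (1,) -- interpreted as the pileup probability
```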
\subsection{Convolutional Neural Networks (CNNs)}

We propose CNNs as the primary model.

The convolution operation
\[
y_i = \sum_{j} w_j x_{i+j}
\]
captures:
\begin{itemize}
\item Peak shapes
\item Local correlations
\item Peak separation
\end{itemize}

Advantages:
\begin{itemize}
\item Physically meaningful (analogous to matched filtering)
\item Robust to noise
\item Efficient parameter sharing
\end{itemize}
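The operation above (cross-correlation, as is conventional in CNNs) can be checked directly against NumPy; the Gaussian kernel below plays the role of a matched filter for a peak of assumed width, and the peak positions and noise level are illustrative.

```python
import numpy as np

def conv1d_valid(x, w):
    """y_i = sum_j w_j * x_{i+j}  (the CNN 'convolution', i.e. cross-correlation)."""
    N, K = len(x), len(w)
    return np.array([np.dot(w, x[i:i + K]) for i in range(N - K + 1)])

# A Gaussian kernel acts like a matched filter for peaks of similar width.
t = np.arange(-3, 4)
w = np.exp(-0.5 * t**2)

rng = np.random.default_rng(3)
x = np.zeros(50)
x[12] = x[37] = 1.0               # two well-separated peaks
x += rng.normal(0, 0.05, 50)      # detector-like noise

y = conv1d_valid(x, w)
# Agrees with NumPy's built-in valid-mode cross-correlation.
assert np.allclose(y, np.correlate(x, w, mode="valid"))
```

The filter response is largest where the kernel lines up with a peak, which is why learned kernels of this form pick up peak shape and separation.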
\subsection{Recurrent Neural Networks (RNNs)}

RNNs treat the histogram as a sequence,
\[
h_t = f(x_t, h_{t-1}).
\]

However:
\begin{itemize}
\item The bins carry no natural temporal ordering that an RNN could exploit
\item Sequential processing is less efficient than convolution
\end{itemize}

Thus, RNNs are not expected to outperform CNNs here, and we do not plan to invest significant effort in them.
\subsection{Autoencoders}

Autoencoders learn compressed representations,
\[
\mathbf{x} \rightarrow \mathbf{z} \rightarrow \hat{\mathbf{x}}.
\]

Applications:
\begin{itemize}
\item Anomaly detection (pileup as a deviation from reconstructions learned on single-vertex events)
\item Feature extraction
\end{itemize}
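Structurally, such a model pairs an encoder $\mathbf{x}\to\mathbf{z}$ with a decoder $\mathbf{z}\to\hat{\mathbf{x}}$ and scores anomalies by reconstruction error. The sketch below only shows the shapes and the score with untrained random weights; a real model would be trained to minimize this error on single-vertex events so that pileup events score high.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

class Autoencoder:
    """x -> z -> x_hat with a low-dimensional bottleneck z."""

    def __init__(self, n_bins=100, n_latent=8, seed=4):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0, 0.1, (n_latent, n_bins))
        self.W_dec = rng.normal(0, 0.1, (n_bins, n_latent))

    def encode(self, x):
        return relu(self.W_enc @ x)

    def decode(self, z):
        return self.W_dec @ z

    def anomaly_score(self, x):
        # Mean squared reconstruction error; after training on single-vertex
        # events, large values flag unfamiliar (e.g. pileup) distributions.
        x_hat = self.decode(self.encode(x))
        return float(np.mean((x - x_hat) ** 2))

ae = Autoencoder()
x = np.random.default_rng(5).random(100)    # dummy DCA-Z histogram
z = ae.encode(x)
print(z.shape, ae.anomaly_score(x) >= 0.0)  # (8,) True
```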
\section{Generative Modeling Approaches}

\subsection{Variational Autoencoders (VAEs)}

VAEs introduce a probabilistic latent space,
\[
z \sim \mathcal{N}\!\left(\mu(\mathbf{x}), \sigma^2(\mathbf{x})\right),
\]
with encoder-predicted mean $\mu(\mathbf{x})$ and diagonal variance $\sigma^2(\mathbf{x})$.

The training objective is the evidence lower bound,
\[
\mathcal{L} = \mathbb{E}_{q(z|\mathbf{x})}[\log p(\mathbf{x}|z)] - D_{\text{KL}}\!\left(q(z|\mathbf{x}) \,\|\, p(z)\right).
\]

Advantages:
\begin{itemize}
\item Interpretable latent variables
\item Semi-supervised learning
\item Synthetic data generation
\end{itemize}
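For a standard normal prior and diagonal Gaussian posterior, the KL term of this objective has a closed form. The sketch below computes the negative ELBO for one event using a unit-variance Gaussian reconstruction term; the encoder/decoder outputs are random placeholders.

```python
import numpy as np

def kl_diag_gaussian(mu, logvar):
    """D_KL( N(mu, diag(exp(logvar))) || N(0, I) ), in closed form."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

def neg_elbo(x, x_hat, mu, logvar):
    """Negative ELBO = reconstruction error + KL (unit-variance Gaussian p(x|z))."""
    recon = 0.5 * np.sum((x - x_hat) ** 2)   # -log p(x|z) up to a constant
    return recon + kl_diag_gaussian(mu, logvar)

rng = np.random.default_rng(6)
x = rng.random(100)                          # dummy DCA-Z histogram
x_hat = x + rng.normal(0, 0.01, 100)         # placeholder decoder output
mu = rng.normal(0, 0.1, 8)                   # placeholder encoder outputs
logvar = rng.normal(0, 0.1, 8)

# The KL term vanishes exactly when the posterior equals the prior.
assert kl_diag_gaussian(np.zeros(8), np.zeros(8)) == 0.0
print(neg_elbo(x, x_hat, mu, logvar) > 0.0)  # True
```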
\subsection{Diffusion Models}

Diffusion models learn data distributions by reversing a gradual noising process,
\[
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),
\]
where $\bar{\alpha}_t$ decreases from one toward zero over the diffusion steps.

Applications:
\begin{itemize}
\item Generating realistic DCA-Z distributions
\item Denoising detector effects
\item Modeling uncertainties
\end{itemize}
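The forward (noising) half of this process is only a few lines; the linear $\beta$ schedule below is a common illustrative choice in DDPM-style models. With $\epsilon$ known, the noising step is exactly invertible, which is what a trained model approximates by predicting $\epsilon$ from $x_t$.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule (illustrative)
alpha_bar = np.cumprod(1.0 - betas)     # \bar{alpha}_t, decreasing toward 0

def forward_diffuse(x0, t, eps):
    """x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

rng = np.random.default_rng(7)
x0 = rng.random(100)                    # dummy normalized histogram
eps = rng.normal(0, 1, 100)
x_t = forward_diffuse(x0, t=500, eps=eps)

# Given eps, the step inverts exactly -- the quantity a trained model
# estimates by predicting eps from x_t.
ab = alpha_bar[500]
x0_rec = (x_t - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
assert np.allclose(x0_rec, x0)
print(alpha_bar[0] > alpha_bar[-1])     # True: signal decays over time
```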
\subsection{Normalizing Flows}

Flows provide exact likelihoods through the change-of-variables formula,
\[
p(\mathbf{x}) = p(z) \left|\det \frac{\partial z}{\partial \mathbf{x}}\right|, \qquad z = g(\mathbf{x}),
\]
for an invertible map $g$.

Applications:
\begin{itemize}
\item Likelihood-based classification
\item Model comparison
\end{itemize}
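As a minimal check of the formula, an elementwise affine map $z_i = (x_i - \mu_i)/s_i$ has Jacobian determinant $\prod_i 1/s_i$, and the resulting flow likelihood must agree exactly with the Gaussian density $\mathcal{N}(\mu, s^2)$ it represents:

```python
import numpy as np

LOG2PI = np.log(2.0 * np.pi)

def affine_flow_logpdf(x, mu, s):
    """log p(x) = log N(z; 0, I) + log|det dz/dx|  with z = (x - mu) / s."""
    z = (x - mu) / s
    log_base = -0.5 * np.sum(z**2 + LOG2PI)   # standard normal log-density
    log_det = -np.sum(np.log(s))              # |det dz/dx| = prod_i 1/s_i
    return log_base + log_det

def gaussian_logpdf(x, mu, s):
    """Direct log-density of N(mu, diag(s^2)), for comparison."""
    return -0.5 * np.sum(((x - mu) / s) ** 2 + LOG2PI + 2.0 * np.log(s))

rng = np.random.default_rng(8)
x, mu = rng.random(10), rng.normal(0, 1, 10)
s = rng.random(10) + 0.5                      # positive scales

assert np.isclose(affine_flow_logpdf(x, mu, s), gaussian_logpdf(x, mu, s))
```

Stacking many such invertible layers with learned, input-dependent parameters yields the expressive flows used for likelihood-based classification.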
\section{Physics Considerations}

\subsection{Correlation Structure}

The problem is fundamentally about detecting:
\begin{itemize}
\item Peak multiplicity
\item Peak overlap
\item Detector smearing
\end{itemize}

\subsection{Label Ambiguity}

Peak definitions depend on:
\begin{itemize}
\item Minimum peak width
\item Minimum peak separation
\end{itemize}

This introduces:
\begin{itemize}
\item Systematic uncertainties
\item Label noise
\end{itemize}
\subsection{Class Imbalance}

Pileup events are typically rare, which:
\begin{itemize}
\item Requires weighted loss functions
\item Motivates the use of focal loss
\end{itemize}
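The focal loss down-weights easy, well-classified examples via a modulating factor $(1-p_t)^\gamma$ on the cross-entropy, so training concentrates on the rare, hard pileup events; the $\alpha$ and $\gamma$ values below are common illustrative defaults, not tuned choices.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted probability of the positive (pileup) class; y: label in {0, 1}.
    gamma = 0 recovers the alpha-weighted cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy example (confident, correct) is down-weighted far more strongly
# than a hard, misclassified one.
easy, hard = focal_loss(0.95, 1), focal_loss(0.3, 1)
assert hard > easy
# With gamma = 0 the modulating factor vanishes:
assert np.isclose(focal_loss(0.7, 1, gamma=0.0), -0.25 * np.log(0.7))
```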
\section{Proposed Work Plan}

\subsection{Phase 1: Baseline Models}
\begin{itemize}
\item Implement MLP and CNN classifiers
\item Evaluate classification accuracy
\end{itemize}

\subsection{Phase 2: Enhanced Models}
\begin{itemize}
\item CNN with a regression output (number of peaks)
\item Uncertainty estimation
\end{itemize}

\subsection{Phase 3: Generative Models}
\begin{itemize}
\item Train a VAE for latent structure learning
\item Use autoencoders for anomaly detection
\end{itemize}

\subsection{Phase 4: Advanced Generative Modeling}
\begin{itemize}
\item Implement diffusion models
\item Generate synthetic datasets
\item Perform denoising and uncertainty quantification
\end{itemize}
\section{Expected Outcomes}

\begin{itemize}
\item Fast and scalable classification of pileup events
\item Improved robustness compared to parametric methods
\item Interpretable latent representations of vertex structure
\item Generative models for simulation and uncertainty analysis
\end{itemize}

\end{document}
