<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="UTF-8">
<title>Updates | TJ Machine Learning Club</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" type="text/css" href="stylesheets/normalize.css" media="screen">
<link href='https://fonts.googleapis.com/css?family=Open+Sans:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" href="stylesheets/stylesheet.css" media="screen">
<link rel="stylesheet" type="text/css" href="stylesheets/github-light.css" media="screen">
<link rel="stylesheet" type="text/css" href="css/demo.css" />
<link rel="stylesheet" type="text/css" href="css/component.css" />
<link rel="stylesheet" type="text/css" href="css/style1.css" />
<script src="js/modernizr.custom.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.0/jquery.min.js"></script>
<script src="js/expand.js"></script>
<link rel="apple-touch-icon" sizes="180x180" href="apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="favicon-16x16.png">
<link rel="manifest" href="manifest.json">
<link rel="mask-icon" href="safari-pinned-tab.svg" color="#5bbad5">
<meta name="theme-color" content="#ffffff">
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-105333430-1', 'auto');
ga('send', 'pageview');
</script>
</head>
<body>
<div class="mobile-menu">
<button id="trigger-overlay" type="button" style="background:none; border:none; position: fixed;">
<img src="img/menu.png" alt="Menu" style="max-width:2em">
</button>
</div>
<section class="page-header">
<h1 class="project-name" style="color:#fff">TJ Machine Learning Club</h1>
<h2 class="project-tagline" style="color:#fff">Making AI more accessible</h2>
<a href="https://docs.google.com/forms/d/e/1FAIpQLSe10g6jeb8k39RG4RZCrRRyWnltXBFEN6--q4t2xgAo6ddJQA/viewform?usp=sf_link" class="btn" style="color:#fff">Join Us Today</a>
<!-- <a href="https://github.com/nikhilsardana/tjmachinelearning/zipball/master" class="btn">Download .zip</a>
<a href="https://github.com/nikhilsardana/tjmachinelearning/tarball/master" class="btn">Download .tar.gz</a> -->
</section>
<div class="container">
<section class="section section--menu" id="Alonso">
<span class="link-copy"></span>
<nav class="menu menu--alonso">
<ul class="menu__list">
<li class="menu__item"><a href="index.html" class="menu__link">Home</a></li>
<li class="menu__item"><a href="schedule.html" class="menu__link">Lectures</a></li>
<li class="menu__item"><a href="rankings.html" class="menu__link">Rankings</a></li>
<li class="menu__item"><a href="resources.html" class="menu__link">Resources</a></li>
<li class="menu__item"><a href="projects.html" class="menu__link">Projects</a></li>
<li class="menu__item menu__item--current"><a href="#" class="menu__link">Updates</a></li>
<li class="menu__line"></li>
</ul>
</nav>
</section>
</div>
<div class="overlay overlay-hugeinc">
<button type="button" class="overlay-close">Close</button>
<nav>
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="schedule.html">Lectures</a></li>
<li><a href="rankings.html">Rankings</a></li>
<li><a href="resources.html">Resources</a></li>
<li><a href="projects.html">Projects</a></li>
<li><a href="#">Updates</a></li>
</ul>
</nav>
</div>
<section class="main-content">
<div class="instructions">
<blockquote><h1 id=comptitle>Competition Instructions</h1></blockquote>
<section id="comp" style="display:none">
<h3 id="kaggletitle">Iceberg Competition Instructions</h3>
<section id="kaggle" style="display:none">
<h4>Competition Introduction</h4>
<p>For the next few weeks, in lieu of our own internal competitions, you have the opportunity to participate in a public <a href="https://kaggle.com">Kaggle</a> competition. Your performance on this competition will be counted in our club rankings.</p>
<p>The competition we have chosen is the <a href="https://www.kaggle.com/c/statoil-iceberg-classifier-challenge">Statoil/C-CORE Iceberg Classifier Challenge</a>. Essentially, the goal is to classify images as ships or icebergs. As in our previous competitions, you aim to achieve the highest accuracy and be at the top of the leaderboard.
The procedure for participating is also similar to our in-class competitions. Download the training and testing data, train on the training data, generate a submission file from the testing data, and upload the submission file for grading. The private leaderboard results determine final rankings and are made available after the final deadline.</p>
<p>Since this competition is difficult and long, you are allowed to work individually or in teams of 2.</p>
<h4>Differences between Public and Private Kaggle Competitions</h4>
<p>However, there are a few differences between public Kaggle competitions and our private classroom competitions:</p>
<ul>
<li>Anyone can enter a public Kaggle competition (obviously). There are currently 2,000+ teams in the Iceberg competition.</li>
<li>There are monetary prizes! They range from $10,000 to >$1,000,000. The iceberg competition in particular has prizes of $25,000 for 1st, $15,000 for 2nd, and $10,000 for 3rd.</li>
</ul>
<p>One might ask: Why are such large sums of money being given out for classifying ships vs. icebergs? Well, this competition is sponsored by a shipping company. Telling an iceberg from another ship more accurately could prevent a collision and save them millions of dollars. However, they clearly don't want to hire data scientists, so they are essentially crowdsourcing their solution through Kaggle. They take the winner's model and use it for their business.</p>
<p>Some more differences between a public Kaggle competition and our in-class competitions:</p>
<ul>
<li>Data size tends to be much larger (sometimes on the order of 1 TB). We partially chose this competition due to the low data size (~1 GB).</li>
<li>Even with this relatively low data size for a public competition, you will need a complex model. If the answer was trivial and could be achieved simply, they wouldn't be offering $50,000.</li>
<li>Because you will need a complex model, you will need a GPU to run your models. You can begin writing your code now, and we will try to get more machines with GPUs running in the syslab soon. The machine "infosphere" currently has the only high-performance GPUs in the syslab.</li>
<li>Rather than viewing our website for information about the data and competition procedures, everything is directly on the competition site itself. The competition link is below.</li>
<li>Public competitions last months, not one week. This competition ends on January 23rd, 2018. Do not delay. Begin as soon as possible. Everything takes far longer than expected to complete, especially when working with large amounts of data.</li>
<li>People discuss solutions and post code in the <a href="https://www.kaggle.com/c/statoil-iceberg-classifier-challenge/discussion">Discussion tab</a>. Of course, the leaders of the competition aren't going to give the world their winning solution while the competition is ongoing, but you can often find a half-decent model available. I recommend viewing and understanding what people have done and made available.</li>
</ul>
<h4>Tips</h4>
<p>Consider all the techniques we have recently covered: Convolutional networks, transfer learning, Inception and ResNet, image preprocessing, image normalization, data augmentation, etc. Some may be useful, others will not.</p>
<p>If you choose to use PyTorch, which is faster and lower-level but has less documentation, <a href="http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html">this tutorial</a> on transfer learning may be helpful when starting out. If you use Keras, <a href="https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html">this tutorial</a> may be useful for beginners. If you're stuck and can't figure out how to do something, Google it or check the documentation.</p>
<h4>Final words</h4>
<p>I will leave you with these final words: Public competitions are very hard. Don't be discouraged if you don't do well initially. Training, retraining, and tweaking your models are essential for your success. If you do well, say, in the top 5%, it is extremely impressive. I'm not aware of a high school student who has won any significant money from Kaggle competitions. Think about it: just as you will be working on this in your free time, so are PhDs, graduate students, and data scientists.</p>
<p>Good Luck!</p>
<a href="https://www.kaggle.com/c/statoil-iceberg-classifier-challenge">Competition Link</a>
<hr>
</section>
<h3 id=cnntitle>Convolutional Neural Networks Competition Instructions</h3>
<section id="cnn" style="display:none">
<p>Each predicted keypoint is specified by an (x,y) real-valued pair in the space of pixel indices. There are 15 keypoints, which represent the following elements of the face:</p>
<p>left_eye_center, right_eye_center, left_eye_inner_corner, left_eye_outer_corner, right_eye_inner_corner, right_eye_outer_corner, left_eyebrow_inner_end, left_eyebrow_outer_end, right_eyebrow_inner_end, right_eyebrow_outer_end, nose_tip, mouth_left_corner, mouth_right_corner, mouth_center_top_lip, mouth_center_bottom_lip</p>
<p>Left and right here refer to the point of view of the subject.</p>
<p>The input image is given in the last field of the data files, and consists of a list of pixels (ordered by row), as integers from 0 to 255. The images are 96x96 pixels.</p>
<h2>Data files</h2>
<ul>
<li><strong>train.csv</strong>: list of 5000 training images. Each row contains the (x,y) coordinates for 15 keypoints, and image data as a row-ordered list of pixels. The first row is the header and associates each column with a feature. There are 31 values in each row: the first 30 are the x value for feature 1, the y value for feature 1, and so on, and the 31st value is the image. The first 30 should be the outputs of your network and the 31st (image) should be the input.</li>
<li><strong>test.csv</strong>: list of 2049 test images. Each row contains ImageId and image data as row-ordered list of pixels.</li>
<li><strong>samplesubmission.csv</strong>: list of keypoints to predict. Each row has an Id and a value. The Id corresponds to the image and feature in the format "ImageId.FeatureId" where featureId is based on the order in which they are sorted in the train file. For example, the first feature is left_eye_center_x so to predict left_eye_center_x for image 1, I would have "1.1 34.555" as the first row.</li>
</ul>
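<p>As a rough illustration of the "ImageId.FeatureId" format described above, here is a minimal sketch that turns a list of per-image predictions into (Id, value) pairs. The function name <code>keypoint_submission_rows</code> is made up for this example, and it assumes that images are numbered 1, 2, 3, ... in the order they appear in test.csv and that each prediction lists its 30 values in the same column order as train.csv. If the ImageId column in test.csv is numbered differently, use those values instead.</p>
<pre>
```python
def keypoint_submission_rows(predictions):
    """Yield (Id, value) pairs; predictions[i] holds the outputs for image i + 1."""
    # Both ImageId and FeatureId start at 1, not 0 (assumed numbering, see above).
    for image_id, keypoints in enumerate(predictions, start=1):
        for feature_id, value in enumerate(keypoints, start=1):
            yield ("%d.%d" % (image_id, feature_id), value)


if __name__ == "__main__":
    # Two fake images with two values each, just to show the Id pattern.
    for row in keypoint_submission_rows([[34.555, 36.0], [1.0, 2.0]]):
        print(row)
```
</pre>
<p>The exact header line and column separator are not shown here, so copy them from samplesubmission.csv when you write the rows to a file.</p>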
<h2>Helpful tips</h2>
<ul>
<li><a href="https://tjmachinelearning.com/contests/cnn/IO_sample.py.txt">Sample code</a></li>
<li>Use Keras. Documentation available <a href="https://keras.io">here</a>. The first half of <a href="https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html">this tutorial</a> may come in handy. Our introductory lecture is available <a href="https://tjmachinelearning.com/lectures/library/keras.html">here</a>.</li>
<li>Start with a standard neural network before moving on to a convolutional one. When you do use a convolutional network,
make sure to reshape your input to 96x96 instead of 1x9216 as it is right now. This is best done using numpy.</li>
<li>Use a linear activation in the final layer because the challenge is regression, not classification. There should be 30 output nodes.</li>
<li>The Ids start at 1, not 0. Don't mix this up.</li>
<li>Don't forget your header in the submission file.</li>
</ul>
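<p>The reshape tip above can be sketched with numpy as follows. This is only one way to do it: the helper names <code>pixels_to_image</code> and <code>stack_for_cnn</code> are hypothetical, the pixel field is assumed to be whitespace-separated, and the trailing channel axis assumes your framework expects channels-last input.</p>
<pre>
```python
import numpy as np

IMAGE_SIDE = 96  # each image is 96x96, so one row holds 9216 pixel values


def pixels_to_image(pixel_field):
    """Parse one pixel field and reshape it from 1x9216 to 96x96."""
    # Assumes the pixel values are separated by whitespace.
    flat = np.array([float(v) for v in pixel_field.split()], dtype=np.float32)
    if flat.size != IMAGE_SIDE * IMAGE_SIDE:
        raise ValueError("expected 9216 pixel values, got %d" % flat.size)
    # Pixels are ordered by row, so a plain row-major reshape restores the image.
    return flat.reshape(IMAGE_SIDE, IMAGE_SIDE)


def stack_for_cnn(pixel_fields):
    """Stack parsed images into one array of shape (n, 96, 96, 1)."""
    images = np.stack([pixels_to_image(field) for field in pixel_fields])
    return images[..., np.newaxis]  # trailing axis is the single grayscale channel
```
</pre>
<p>For a standard (non-convolutional) network you can skip the reshape and feed the flat 9216-value vector directly.</p>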
<p>To better understand the format, open the files in Excel. Feel free to ask any clarifying format questions to the officers.</p>
<p><a href="https://www.kaggle.com/c/tjmlfacialfeatures">Competition link</a></p>
<hr>
</section>
<h3 id=nntitle>Neural Networks Competition Instructions</h3>
<section id="nn" style="display:none">
<p><i>11/01/17 - </i>Your job is to write the code to create a neural network, train it on the training data, and use it to predict the classes of the testing data. We are trying to classify images of handwritten digits. The data we are using is from the famous MNIST dataset. Your neural network is supposed to classify which digit (0 through 9) the image represents, and the inputs are the 784 values that make up the 28x28 images.</p>
<p>The training data looks like this:</p>
<code>label, pixel 11, pixel 12, pixel 13, pixel 14, etc. (784 pixel values)<br>
label, pixel 21, pixel 22, pixel 23, pixel 24, etc. (784 pixel values)<br>
label, pixel 31, pixel 32, pixel 33, pixel 34, etc. (784 pixel values)<br>
etc. (60,000 lines)</code>
<p>The testing data looks like this:</p>
<p><code>id, pixel 11, pixel 12, pixel 13, pixel 14, etc. (784 pixel values)<br>
id, pixel 21, pixel 22, pixel 23, pixel 24, etc. (784 pixel values)<br>
id, pixel 31, pixel 32, pixel 33, pixel 34, etc. (784 pixel values)<br>
etc. (10,000 lines)</code></p>
<p>Where <code>pixel ij</code> is the pixel value in the <code>ith</code> row and <code>jth</code> column. Each image is 28x28. Each pixel value ranges from 0 (black) to 255 (white). The MNIST dataset is grayscale, which is why each pixel value is a single value instead of an (R,G,B) triple.</p>
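<p>To make the training format concrete, here is a minimal numpy sketch that splits rows of the form "label, 784 pixel values" into network inputs and one-hot targets. The function name <code>split_training_rows</code> is made up for this example, and scaling the pixels to the range 0 to 1 and using one-hot targets are common choices rather than requirements of the contest.</p>
<pre>
```python
import numpy as np

NUM_CLASSES = 10  # digits 0 through 9
NUM_PIXELS = 784  # 28 * 28 pixel values per image


def split_training_rows(rows):
    """Split rows of [label, 784 pixels] into scaled inputs and one-hot targets."""
    rows = np.asarray(rows, dtype=np.float64)
    if rows.ndim != 2 or rows.shape[1] != NUM_PIXELS + 1:
        raise ValueError("each row needs a label plus 784 pixel values")
    labels = rows[:, 0].astype(int)
    inputs = rows[:, 1:] / 255.0  # scale 0..255 down to 0..1 (optional choice)
    # Row d of the identity matrix is the one-hot vector for digit d.
    targets = np.eye(NUM_CLASSES)[labels]
    return inputs, targets
```
</pre>
<p>The rows could come from something like <code>np.loadtxt("train.csv", delimiter=",")</code>, where the file name is a placeholder; add <code>skiprows=1</code> if your copy of the file has a header line.</p>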
<p>Your end goal is to create a file which looks like:</p>
<p><code>id, solution <br>
1, predicted_label<br>
2, predicted_label<br>
3, predicted_label<br>
4, predicted_label<br>
etc. (10,000 lines)</code></p>
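<p>A small sketch of producing that file format is shown below. The helper name <code>format_submission</code> is hypothetical, and the header is written as <code>id,solution</code> with no space after the comma, which is an assumption; match whatever the sample submission file on the competition page uses.</p>
<pre>
```python
def format_submission(ids, predicted_labels):
    """Return the submission text: a header line, then one 'id,label' line per test row."""
    if len(ids) != len(predicted_labels):
        raise ValueError("need exactly one prediction per test id")
    lines = ["id,solution"]  # assumed header spelling, check the sample file
    for row_id, label in zip(ids, predicted_labels):
        lines.append("%d,%d" % (row_id, label))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Three fake predictions, just to show the layout.
    print(format_submission([1, 2, 3], [7, 0, 4]), end="")
```
</pre>
<p>To save the result, write the returned string to a file, for example with <code>open("submission.csv", "w")</code> (the file name is a placeholder).</p>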
<p>All <a href="#standard">standard competition rules</a> apply. You are only allowed to use the numpy library. We highly recommend you use the library for vectors (bias vectors, etc.) and matrices (weights, partial matrices).</p>
<p>We've written a <a href="lectures/nn3/nn_shell.py.txt">small shell</a>. The shell has a network class, and each network is made up of a list of layers (which are a separate class). Each layer is designed to have its own vectors and matrices (biases and weights, etc.). You don't have to structure your network this way by any means, or have a layer class at all. Most of the time when people write neural networks from scratch, they only have a single network class.</p>
<p>The competition ends in two weeks, at 11:59:59 p.m. on 11/14/17, since our next meeting is 11/15. Also, since writing a neural network from scratch is an involved process, the competition will be worth double in our rankings.</p>
<p><a href="https://www.kaggle.com/c/nncontest3">Competition link</a></p>
<hr>
</section>
<h3 id=svmtitle>Support Vector Machine Competition Instructions</h3>
<section id="svm" style="display:none">
<p><i>10/04/17 - </i>Your job is to write the code to create an SVM, train it on the training data, and use it to predict the classes of the testing data.
We are trying to classify survival of passengers on the Titanic. The data we are using are from actual passengers on the ship.
Your SVM is supposed to classify whether a passenger survived [RIP (0) or Survived (1)], based on 11 different metrics (features).
</p>
<p>
The purpose of this contest is not to test your ability to write an SVM. Instead, we are using this opportunity to test two abilities:
</p>
<ol>
<li>Your ability to learn how to use Scikit-Learn</li>
<li>Your ability to work with real-world data</li>
</ol>
<p>
The second is far more important (and difficult) than the first. With that in mind, the training data now looks like this:
</p>
<p>
<code>
feature 1, survival, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9, feature 10, feature 11<br>
feature 1, survival, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9, feature 10, feature 11<br>
feature 1, survival, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9, feature 10, feature 11<br>
etc. (636 lines) <br></code>
</p>
<p>
and the testing data looks like this: </p>
<p>
<code>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9, feature 10, feature 11<br>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9, feature 10, feature 11<br>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9, feature 10, feature 11<br>
etc. (255 lines) <br>
</code>
</p>
<p>
Your end goal is to create a file which looks like: <br>
<code>
id, solution <br>
1, predicted_class <br>
2, predicted_class <br>
3, predicted_class <br>
4, predicted_class <br>
etc. (255 lines) <br>
</code>
For every set of features in line N in the testing data, you should have a line <code>N, predicted_class</code> in your submission file.
</p>
<p>
<strong>There are missing data points. Some features are not useful. The difficult part of this contest is formatting the data given,
determining which features are useful,
which ones should be trained on, and how to deal with the missing data.</strong>
</p>
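<p>One possible way to deal with missing numeric values is mean imputation: fill each gap with the average of the values that are present in that column. The sketch below is only an illustration, not the expected solution. The function name <code>impute_column_means</code> is made up, and it assumes you have already converted the features you want to keep into numbers, with missing entries stored as NaN.</p>
<pre>
```python
import numpy as np


def impute_column_means(features):
    """Replace each NaN with the mean of the non-missing values in its column."""
    features = np.asarray(features, dtype=np.float64)
    column_means = np.nanmean(features, axis=0)  # per-column mean, ignoring NaN
    # Broadcasting lines each column mean up with the gaps in that column.
    return np.where(np.isnan(features), column_means, features)
```
</pre>
<p>If you use this approach, it is usually better to compute the means on the training data and reuse them for the testing data. A column that is entirely missing has no mean to fill in and is probably a feature to drop instead.</p>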
<p>
You are allowed to (and should) use <a href="http://www.scikit-learn.org">Scikit-Learn</a> to create your SVM.
Scikit-Learn has detailed instructions on how to write an SVM using the library <a href="http://scikit-learn.org/stable/modules/svm.html">here</a>.
This is the trivial portion of the competition.
</p>
<p>We highly recommend you open both the training and testing csv files in a program like Excel, which will help you modify columns of data and perform calculations quickly.
</p>
<p>
<a href="contests/rf/IO_sample.py.txt">The Data I/O code</a> from the decision tree lecture may still be useful.
</p>
<p><a href="https://www.kaggle.com/t/84362979276c4a219871ba63161aadcc">Competition Link</a></p>
<p><a href="#standard">Standard competition rules</a> apply, except for Rule #1, since we are using Scikit-Learn.</p>
<hr>
</section>
<h3 id=rftitle>Random Forests Competition Instructions</h3>
<section id="rf" style="display:none">
<p><i>9/27/17 - </i>Your job is to write the code to create a random forest, train it on the training data, and use it to predict the classes of the testing data.
We are trying to classify survival of passengers on the Titanic. The data we are using are from actual passengers on the ship.
Your random forest is supposed to classify whether a passenger survived [RIP (0) or Survived (1)], based on 7 different metrics (features).
The training data looks like this: </p>
<p>
<code>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, survival<br>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, survival<br>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, survival<br>
etc. (500 lines) <br></code>
</p>
<p>
and the testing data looks like this: </p>
<p>
<code>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7<br>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7<br>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7<br>
etc. (214 lines) <br>
</code>
</p>
<p>
Your end goal is to create a file which looks like: <br>
<code>
id, solution <br>
1, predicted_class <br>
2, predicted_class <br>
3, predicted_class <br>
4, predicted_class <br>
etc. (214 lines) <br>
</code>
For every set of features in line N in the testing data, you should have a line <code>N, predicted_class</code> in your submission file.
</p>
<p><a href="#standard">Standard competition instructions and rules</a> apply.</p>
<p><a href="https://www.kaggle.com/t/458d4811158b4d15a0c4e337189608ac">Competition Link</a></p>
<p>In case you care, the features correspond to:</p>
<p>
<code>
pclass - Ticket class: 1 = 1st, 2 = 2nd, 3 = 3rd <br>
sex - Sex: 0 = Male, 1 = Female <br>
Age - Age in years <br>
sibsp - # of siblings / spouses aboard the Titanic <br>
parch - # of parents / children aboard the Titanic <br>
fare - passenger fare <br>
embarked - Port of Embarkation: 0 = Southampton, 1 = Cherbourg, 2 = Queenstown<br>
survival - Survival: 0 = No, 1 = Yes<br>
</code>
</p>
<p>Shell code is available <a href="contests/rf/shell.py.txt">here</a>.
<a href="contests/dct/IO_sample.py.txt">The Data I/O code</a> from the decision tree lecture will still be useful, but will need to be adapted for this data.
</p>
<hr>
</section>
<h3 id=firsttitle>2017-2018 First Competition Instructions</h3>
<section id="first" style="display:none">
<h4>The Data</h4>
<p><i>9/20/17 - </i>Welcome to the first contest of the year! Your job is to write the code to create a decision tree, train it on the training data, and use it to predict the classes of the testing data.
We are trying to classify breast cancer data. The data we are using is actual data from breast cancer patients.
Your decision tree is supposed to classify the type of breast cancer they have (benign (0) or malignant (1)), based on 9 different metrics (features).
The training data looks like this: </p>
<p>
<code>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9, class <br>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9, class <br>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9, class <br>
etc. (533 lines) <br></code>
</p>
<p>
and the testing data looks like this: </p>
<p>
<code>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9 <br>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9 <br>
feature 1, feature 2, feature 3, feature 4, feature 5, feature 6, feature 7, feature 8, feature 9 <br>
etc. (150 lines) <br>
</code>
</p>
<p>
Your end goal is to create a file which looks like: <br>
<code>
id, solution <br>
1, predicted_class <br>
2, predicted_class <br>
3, predicted_class <br>
4, predicted_class <br>
etc. (150 lines) <br>
</code>
For every set of features in line N in the testing data, you should have a line <code>N, predicted_class</code> in your submission file. </p>
<p>Shell code is available on the <a href="schedule.html">lecture schedule page</a>.
The Data I/O code from the decision tree lecture will still be useful, but will need to be adapted for this data.
</p>
<h4>How to participate</h4>
<p>
Our contests will be held on <a href="https://kaggle.com">Kaggle</a>, using <a href="https://inclass.kaggle.com">Kaggle InClass</a>. This allows us to upload data and competition instructions, as well as impose submission deadlines. It also ranks submissions automatically!
To participate:</p>
<p>
<ol><li>Create a Kaggle account by clicking "sign up" in the top right.
</li><li>Click on <a href="https://www.kaggle.com/t/cb7c483774b8408582fd138a3a49b0d4">this link</a> (the competition link).
</li><li>Download the training and testing data.
</li><li>Download the I/O Code.
</li><li>Write your algorithm and train it on the training data.
</li><li>Then, test it on the testing data, creating a submission file with the predicted ground truth in the format shown in the sample submission file.
</li><li>Upload your submission file and see your results!
</li><li>Tweak your code, repeating steps 5-7 to improve your accuracy and move up the leaderboard.</li>
</ol>
</p>
<p>
Some More Rules:
<ol>
<li>Use Python. Everything we do is in Python this year. Don't use a library, other than numpy. For this contest only, if you have no Python experience, you can use whatever language you are comfortable with.
</li><li>The Competition ends at 11:59:00 PM next Monday, 9/25.
</li><li>The leaderboard on Kaggle that you can see is the Public Leaderboard, which is your accuracy for 50% of the testing data. Your final rankings will be based on the Private leaderboard, which is based on the other 50% of the testing data and will become public as soon as the competition ends. This is to prevent you from just writing a decision tree that overfits the testing data, which defeats the purpose.
</li></ol>
</p>
<hr>
</section>
</section>
</div>
<blockquote><h1 id=standardtitle>Competition Rules and Procedures</h1></blockquote>
<section id="standard" style="display:none">
<p>These instructions are common to almost every competition, so we're only listing them once.</p>
<p><strong>To Participate:</strong></p>
<ol>
<li>Create a <a href="https://kaggle.com">Kaggle</a> account if you don't already have one by clicking "sign up" in the top right.
</li><li>Click on the competition link, which will be posted in the lecture table on <a href="schedule.html">this page</a>.
</li><li>Download the training and testing data.
</li><li>Download any shell code from the website.
</li><li>Write your algorithm and train it on the training data.
</li><li>Then, test it on the testing data, creating a submission file with the predicted ground truth in the format shown in the sample submission file.
</li><li>Upload your submission file and see your results!
</li><li>Tweak your code, repeating steps 5-7 to improve your accuracy and move up the leaderboard.</li>
</ol>
<p><strong>Rules and Procedures</strong></p>
<ol>
<li>Use Python. Everything we do is in Python this year. Unless otherwise specified, do not use any ML library other than numpy.
</li><li>The Competition ends at 11:59:00 PM the following Tuesday.
</li><li>The leaderboard on Kaggle that you can see is the Public Leaderboard, which is your accuracy for some percentage of the testing data.
Your final rankings will be based on the Private leaderboard, which is based on the other part of the testing data and will become public as soon as the competition ends. This is to prevent you from just writing a decision tree that overfits the testing data, which defeats the purpose.
</li>
</ol>
</section>
<blockquote><h1 id=updatestitle>Club Updates</h1></blockquote>
<section id="updates" style="display:none">
<h3>
Code Submission
</h3>
<p><i>7/1/17 - </i>Last year, we used <a href="https://www.dropitto.me">dropitto.me</a> as a simple way of submitting your code for grading. Since the service shut down this summer, we
are currently working on finding a replacement. We'll update this page before competitions start in October.
As of now, we are planning on setting up a Kaggle classroom.
Kaggle provides a seamless interface for creating competitions, posting data, and grading submissions.
More information can be found on <a href="https://inclass.kaggle.com/">Kaggle's website</a>.</p>
<h3>
<a id="welcome-to-tjhsst-machine-learning" class="anchor" href="#welcome-to-tjhsst-machine-learning" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Next Year
</h3>
<p><i>6/21/17 - </i>The year is finally over! We've got two months of summer ahead of us, but your ML Club officers already have a few plans for September.
First and foremost, we plan on expanding into a second room so that everyone can come. </p>
<p>Secondly, we will split into two groups. This will allow the club to cover complex topics with advanced members while not alienating newcomers.
Higher-level lectures with less mathematical rigor will be given to Freshmen/Sophomores in APCS with little Python or AI experience.
These lectures will be similar to this past year's, covering standard ML algorithms and the basics of Deep Learning.
For those already familiar with machine learning, we're making new, rigorous lectures, covering topics from object detection to language translation and combating networks.
</p>
<p>Thirdly, we're formalizing our mentoring initiative.
We've already helped a variety of projects off the books, but the ever-growing popularity of machine learning as a research tool allows us to make this a permanent part of the club.
Look for forms in September.
</p>
<p>
Finally, our website will undergo heavy renovation as we upgrade all of the lectures.
We look forward to seeing everyone in the fall!</p>
<h3>
Election Results
</h3>
<p><i>6/15/17 - </i>
This year's elections were quite close.
We are pleased to announce the new officers for next year:
<br>Justin Zhang, Teaching Coordinator
<br>Sylesh Suresh, Teaching Coordinator
<br> Our incumbents, Mihir Patel and Nikhil Sardana, won re-election and retained their positions.
Thanks to all who ran and voted, and we wish the best to Rohan and Nathaniel as they head off to college.
</p>
<h3>
Competition Due Dates
</h3>
<p><i>12/14/16 - </i>Unless otherwise specified, your code is due the Monday after the competition is introduced at 11:59:59 PM. No late submissions will be accepted.</p>
<h3>
48-Hour Sign-Up
</h3>
<p><i>11/30/16 - </i>We are sorry to everyone who couldn't sign up because all the slots were filled. We have gotten a 48-hour sign-up restriction, so sign up first thing Monday each week.
</p>
</section>
<!--close main content-->
</section>
<script src="js/classie.js"></script>
<script src="js/demo1.js"></script>
</body>
</html>