{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Times New Roman;}{\f1\fnil\fcharset0 Calibri;}{\f2\fnil\fcharset2 Symbol;}}
{\*\generator Riched20 10.0.22621}\viewkind4\uc1
\pard\sa200\sl276\slmult1\qc\f0\fs28\lang9 Data Flow\fs40\par

\pard{\pntext\f2\'B7\tab}{\*\pn\pnlvlblt\pnf2\pnindent0{\pntxtb\'B7}}\fi-360\li720\sa200\sl276\slmult1\fs28 First, video data is converted to a custom .ALIGN format or downloaded directly. .ALIGN files represent aligned frames of video data.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 Data pipelines are then created.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 The data is then fed into a Spatiotemporal Convolutional Neural Network architecture.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 The first three stages will be conventional combinations of CONV and MAX POOL layers.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 After the FLATTEN layer, BIDIRECTIONAL layers will be used to interpret the frame sequence and predict phonemes.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 Finally, a SOFTMAX layer is used as the output layer.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 Once the model development phase is over, the model can be evaluated or used to make predictions.\f1\fs22\par
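\pard\sa200\sl276\slmult1\f0\fs28 The layer sequence above can be checked with a little shape arithmetic before any model is built. The sketch below is only an illustration under stated assumptions: the input size (75 grayscale frames of 48 x 96 pixels), the 3x3x3 kernels with padding 1, the 2x2 spatial pooling, the channel counts, the number of recurrent units and the 40 output classes are all hypothetical values chosen for the example, not settings taken from this project.\par

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution or pooling window.
    return (size + 2 * padding - kernel) // stride + 1


def conv_pool_stage(shape, channels_out):
    # shape is (frames, height, width, channels).
    # Assumed stage: a 3x3x3 convolution with padding 1 (size preserving),
    # followed by 2x2 spatial max pooling that keeps the frame count.
    frames, height, width, _ = shape
    frames = conv_out(frames, 3, 1, 1)
    height = conv_out(conv_out(height, 3, 1, 1), 2, 2)
    width = conv_out(conv_out(width, 3, 1, 1), 2, 2)
    return (frames, height, width, channels_out)


def trace_shapes(input_shape, stage_channels, rnn_units, vocab_size):
    # Follow one video clip through the pipeline and record each shape.
    shapes = [("input", input_shape)]
    shape = input_shape
    for index, channels in enumerate(stage_channels, start=1):
        shape = conv_pool_stage(shape, channels)
        shapes.append(("conv_pool_%d" % index, shape))
    frames, height, width, channels = shape
    # Flatten every frame separately so the time axis survives.
    shapes.append(("flatten_per_frame", (frames, height * width * channels)))
    # A bidirectional layer concatenates forward and backward outputs.
    shapes.append(("bidirectional", (frames, 2 * rnn_units)))
    # Softmax gives one probability per output class for every frame.
    shapes.append(("softmax", (frames, vocab_size)))
    return shapes


# Hypothetical sizes, for illustration only.
shapes = trace_shapes((75, 48, 96, 1), (32, 64, 96), 256, 40)
for name, shape in shapes:
    print(name, shape)
```

\pard\sa200\sl276\slmult1\f0\fs28 With these assumed sizes, each frame is flattened to 6912 features after the third stage, and the output holds 40 class probabilities for each of the 75 frames. Flattening per frame rather than across the whole clip is what lets the BIDIRECTIONAL layers see the clip as a sequence.\par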

\pard\sa200\sl276\slmult1\par
\par

\pard\sa200\sl276\slmult1\qc\f0\fs28 Methodology\par

\pard{\pntext\f2\'B7\tab}{\*\pn\pnlvlblt\pnf2\pnindent0{\pntxtb\'B7}}\fi-360\li720\sa200\sl276\slmult1 Dataset Collection - The dataset will be collected from web sources for English; (optional) a Hindi or Hinglish dataset will be developed.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 Data Preprocessing - This will be implemented as functions for the individual algorithms, such as BILINEAR INTERPOLATION.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 Data Distribution - This step finalizes the data so that it can be fed to the model.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 Model Architecture Development - Spatiotemporal Convolutional Neural Network Architecture\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 Model Training and Development - Running training iterations to achieve the target accuracy.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 Model Evaluation - Validation and Testing\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 Model Deployment (optional) - Deployment on localhost as a web application.\f1\fs22\par
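\pard\sa200\sl276\slmult1\f0\fs28 As an example of the kind of preprocessing function mentioned above, the following is a minimal sketch of bilinear interpolation for resizing a single grayscale frame. The function name bilinear_resize, the list-of-rows frame representation and the corner-aligned coordinate mapping are assumptions made for this illustration; the project may use a different convention or a library routine instead.\par

```python
def bilinear_resize(image, new_height, new_width):
    # image is a list of rows of numbers (one grayscale frame).
    old_height, old_width = len(image), len(image[0])
    # Map output coordinates onto the input grid so the corners line up.
    row_scale = (old_height - 1) / (new_height - 1) if new_height > 1 else 0.0
    col_scale = (old_width - 1) / (new_width - 1) if new_width > 1 else 0.0
    result = []
    for i in range(new_height):
        y = i * row_scale
        y0 = int(y)
        y1 = min(y0 + 1, old_height - 1)
        dy = y - y0
        row = []
        for j in range(new_width):
            x = j * col_scale
            x0 = int(x)
            x1 = min(x0 + 1, old_width - 1)
            dx = x - x0
            # Interpolate along the row above and the row below,
            # then blend the two results vertically.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        result.append(row)
    return result


# Upscale a tiny 2x2 frame to 3x3.
frame = [[0.0, 10.0], [20.0, 30.0]]
resized = bilinear_resize(frame, 3, 3)
for row in resized:
    print(row)
```

\pard\sa200\sl276\slmult1\f0\fs28 Each new pixel is a weighted average of its four nearest input pixels, so the centre of the 3x3 result is the mean of the four original values.\par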

\pard\sa200\sl276\slmult1\par
\par

\pard\sa200\sl276\slmult1\qc\f0\fs28 Objective\par

\pard{\pntext\f2\'B7\tab}{\*\pn\pnlvlblt\pnf2\pnindent0{\pntxtb\'B7}}\fi-360\li720\sa200\sl276\slmult1 To build a lipreading model that predicts subtitles from video without audio, using a phoneme dictionary.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 To achieve an accuracy of more than 60%, which is better than human-level lipreading performance.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 To deepen our understanding of the underlying algorithms.\f1\fs22\par
{\pntext\f2\'B7\tab}\f0\fs28 (optional) To work on different languages, for example Indian English, Hinglish, and Hindi (Hinglish and Hindi models would be an innovation).\f1\fs22\par
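\pard\sa200\sl276\slmult1\f0\fs28 The accuracy target above needs a concrete measurement. One common option for predicted subtitles, shown here only as a possible choice and not as the metric this project has settled on, is word-level accuracy computed as one minus the word error rate. The function names and the sample sentences below are made up for the illustration.\par

```python
def edit_distance(reference, hypothesis):
    # Levenshtein distance between two word lists, computed row by row.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def word_accuracy(reference_text, predicted_text):
    # Accuracy defined as 1 - WER; it can drop below zero when the
    # prediction contains many extra words.
    reference = reference_text.split()
    hypothesis = predicted_text.split()
    if not reference:
        raise ValueError("reference must contain at least one word")
    return 1.0 - edit_distance(reference, hypothesis) / len(reference)


# Made-up reference and prediction with one substituted word out of six.
accuracy = word_accuracy("bin blue at f two now", "bin blue at s two now")
print(round(accuracy, 4))
```

\pard\sa200\sl276\slmult1\f0\fs28 Averaging this value over a held-out test set gives a single number that can be compared against the 60% goal.\par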
}