`--gemm-precision` and the `--int*` flags are two ways to do the same thing; the functionality would still work and be accessible without the following options (marian-dev/src/common/config_parser.cpp, lines 933 to 947 at 844800e):
```cpp
cli.add<bool>("--int8",
    "Optimize speed even more aggressively sacrificing memory or precision by using 8bit integer GEMM with intgemm instead of floats. Only available on CPU. Corresponds to --gemm-precision int8");
cli.add<bool>("--int8Alpha",
    "Use precomputed quantisation multipliers for the activations. Requires a special model. Corresponds to --gemm-precision int8Alpha");
cli.add<bool>("--int8shift",
    "Use a faster, shifted integer 8bit GEMM implementation. Corresponds to --gemm-precision int8shift");
cli.add<bool>("--int8shiftAlpha",
    "Use a faster, shifted integer 8bit GEMM implementation, with precomputed alphas. Corresponds to --gemm-precision int8shiftAlpha");
cli.add<bool>("--int8shiftAll",
    "Use a faster, shifted integer 8bit GEMM implementation even for matrices that don't have a bias. Beneficial on VNNI. Corresponds to --gemm-precision int8shiftAll");
cli.add<bool>("--int8shiftAlphaAll",
    "Use a faster, shifted integer 8bit GEMM implementation even for matrices that don't have a bias, with precomputed alphas. Should be the fastest option. Corresponds to --gemm-precision int8shiftAlphaAll");
cli.add<std::string>("--gemm-precision",
    "Use lower precision for the GEMM operations only. Supported values: float32, int16, int8, int8Alpha, int8shift, int8shiftAlpha, int8shiftAll, int8shiftAlphaAll",
    "float32");
cli.add<bool>("--dump-quantmult",
```