This section explains the parameters used in MGIZA++, grouped into their original categories (input, output, EM, and so on).
A parameter may have several aliases. If more than one alias of the same parameter appears on the command line or in the configuration file, the one that appears last is kept. A value given on the command line overrides one given in the configuration file.
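A minimal sketch of this precedence rule follows. It assumes the parameters have already been parsed into ordered (name, value) pairs from the configuration file and from the command line. The alias map is a hypothetical excerpt built from the aliases in the tables below, not MGIZA++'s own code.

```python
# Hypothetical excerpt of an alias map: every alias (and the full name)
# points to the canonical parameter name.
ALIASES = {
    "m1": "model1iterations",
    "noiterationsmodel1": "model1iterations",
    "model1iterations": "model1iterations",
    "mh": "hmmiterations",
    "numberofiterationsforhmmalignmentmodel": "hmmiterations",
    "hmmiterations": "hmmiterations",
}


def resolve_parameters(config_pairs, cmdline_pairs):
    """Merge (name, value) pairs into {canonical_name: value}.

    Config-file pairs are applied first and command-line pairs second, so a
    later occurrence overwrites an earlier one and the command line always
    wins over the configuration file.
    """
    resolved = {}
    for name, value in list(config_pairs) + list(cmdline_pairs):
        canonical = ALIASES.get(name, name)  # unknown names are kept as-is
        resolved[canonical] = value
    return resolved


# m1 and noiterationsmodel1 are the same parameter; the command line wins.
params = resolve_parameters(
    config_pairs=[("m1", "5"), ("noiterationsmodel1", "4")],
    cmdline_pairs=[("model1iterations", "3")],
)
print(params)  # {'model1iterations': '3'}
```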
The following parameters define the training sequence, i.e., how many iterations are performed for each model. Simpler models are used to initialize more complex ones, so a number of iterations of the simpler models is necessary. HMM is generally used as a replacement for model 2. The most widely used training sequence is:

`-m1 5 -m2 0 -mh 5 -m3 3 -m4 3 -m5 0 -m6 0`
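To make the sequence concrete, the sketch below expands such an argument string into the ordered list of (stage, iteration) steps. It assumes the stages run in the order model 1, model 2, HMM, model 3, 4, 5, 6, and that a stage with 0 iterations is skipped. The helper names are hypothetical.

```python
# Assumed stage order; each key is the short alias of an iteration parameter.
STAGE_ORDER = ["m1", "m2", "mh", "m3", "m4", "m5", "m6"]


def parse_iterations(argline):
    """Parse a string such as '-m1 5 -m2 0' into {'m1': 5, 'm2': 0}."""
    tokens = argline.split()
    if len(tokens) % 2:
        raise ValueError("expected '-name value' pairs")
    return {tokens[k].lstrip("-"): int(tokens[k + 1]) for k in range(0, len(tokens), 2)}


def training_schedule(iterations):
    """List (stage, iteration) steps; stages with 0 iterations are skipped."""
    return [
        (stage, it)
        for stage in STAGE_ORDER
        for it in range(1, iterations.get(stage, 0) + 1)
    ]


schedule = training_schedule(parse_iterations("-m1 5 -m2 0 -mh 5 -m3 3 -m4 3 -m5 0 -m6 0"))
print(len(schedule))  # 16
print(schedule[0], schedule[-1])  # ('m1', 1) ('m4', 3)
```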
| Name | Aliases | Type | Meaning |
|---|---|---|---|
| model1iterations | m1, noiterationsmodel1 | INT | Number of iterations when training model 1 |
| model2iterations | m2, noiterationsmodel2 | INT | Number of iterations when training model 2 |
| model3iterations | m3, noiterationsmodel3 | INT | Number of iterations when training model 3 |
| model4iterations | m4, noiterationsmodel4 | INT | Number of iterations when training model 4 |
| model5iterations | m5, noiterationsmodel5 | INT | Number of iterations when training model 5 |
| model6iterations | m6, noiterationsmodel6 | INT | Number of iterations when training model 6 |
| hmmiterations | mh, numberofiterationsforhmmalignmentmodel | INT | Number of iterations when training the HMM |
| Name | Aliases | Type | Meaning |
|---|---|---|---|
| compactalignmentformat | | BOOL | Whether to output alignments in the compact format. The compact format omits the text of the source/target sentences and contains only the alignment links, such as 1 1 2 5 |
| countoutputprefix | | STRING | Mostly used in distributed training. Similar to outputfileprefix, but instead of the normalized models it dumps the count tables before normalization |
| dumpcount | | BOOL | Dump counts in addition to the normalized models. Note that counts are dumped ONLY in the final step |
| dumpcountusingwordstring | | BOOL | By default, word IDs appear in the count tables; if set to true, word surface forms are used instead |
| logfile | l | STRING | Path of the log file |
| outputfileprefix | o | STRING | Prefix for output files |
| outputpath | | STRING | Not used; will be removed in the next version |
| model1dumpfrequency | t1 | INT | Dump the model/alignment every t1 iterations of model 1; 0 means no dumps, 1 means a dump after every iteration. Setting t1 equal to m1 dumps only the last iteration |
| model2dumpfrequency | t2 | INT | Dump frequency of model 2 |
| transferdumpfrequency | t2to3 | INT | When model 2 is used instead of HMM, an additional iteration is needed to transfer from model 2 to model 3; this parameter triggers dumping of that iteration |
| model345dumpfrequency | t345 | INT | Dump frequency of models 3/4/5/6 |
| hmmdumpfrequency | th | INT | Dump frequency of HMM |
| nbestalignments | | BOOL | Whether to dump N-best alignments instead of the Viterbi alignment |
| nodumps | | BOOL | If true, no model files or alignment files are dumped |
| onlyaldumps | | BOOL | If true, only alignments are dumped (this forces models 3/4/5 to output the alignment of the last step, i.e., *.A3.final) |
Output file names are generated as follows:

`outputname = outputprefix + "." + modelType + trainingStage + "." + iteration`

where outputprefix is the output prefix given by the o parameter, the model type is one of t, a, d, n, D, and h (for example, t is the lexical table and n the fertility table, as in the restart table below), and the training stage is 1 (model 1), 2 (model 2), 3 (models 3/4/5, for the d model), 4, 5, or hmm.
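The sketch below shows this naming scheme together with the dump rule described for model1dumpfrequency (t1). It assumes an iteration is dumped when its number is a multiple of the dump frequency, and that the last step may be labeled `final` (as in `*.A3.final`) in place of a number. The function names are hypothetical.

```python
MODEL_TYPES = {"t", "a", "d", "n", "D", "h"}
STAGES = {"1", "2", "3", "4", "5", "hmm"}


def output_name(prefix, model_type, stage, iteration):
    """Build prefix.<type><stage>.<iteration>, e.g. 'run.t1.5'."""
    if model_type not in MODEL_TYPES or str(stage) not in STAGES:
        raise ValueError(f"unknown model type/stage: {model_type!r}, {stage!r}")
    return f"{prefix}.{model_type}{stage}.{iteration}"


def dumped_iterations(total, frequency):
    """Iterations that get dumped: every `frequency` iterations, none if 0."""
    if frequency <= 0:
        return []
    return [it for it in range(1, total + 1) if it % frequency == 0]


print(output_name("run", "t", 1, 5))      # run.t1.5
print(output_name("run", "h", "hmm", 5))  # run.hhmm.5
print(dumped_iterations(5, 5))            # [5]  (t1 = m1: only the last)
print(dumped_iterations(5, 1))            # [1, 2, 3, 4, 5]
```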
| Name | Aliases | Type | Meaning |
|---|---|---|---|
| corpusfile | c | STRING | Input corpus file (the .snt file) |
| testcorpusfile | tc | STRING | Test corpus file (.snt). It is only aligned; its counts do not affect the models |
| sourcevocabularyfile | s | STRING | Source vocabulary file (.vcb). Note that in Moses the definitions of source and target are reversed, so when training ch-en, ch is the target and en is the source |
| targetvocabularyfile | t | STRING | Target vocabulary file (.vcb) |
| restart | | INT | (MGIZA only) Restart training from a given level, explained below |
| previousa/d/d4/d42/hmm/n/t | | STRING | Previous model files used to resume training (previousa, previousd, previousd4, previousd42, previoushmm, previousn, previoust) |
MGIZA supports resuming training, which requires loading the previous models and setting the correct restart level. The table below lists the restart levels and the previous model tables needed for each level.
| Restart level | Meaning | t (lex) | a (dist) | hmm | d (dist) | n (fert) | d4/d42 |
|---|---|---|---|---|---|---|---|
| 0 | normal training | | | | | | |
| 1 | continue model 1 from model 1 | | | | | | |
| 2 | initialize model 2 from model 1 | | | | | | |
| 3 | continue model 2 from model 2 | | | | | | |
| 4 | initialize HMM from model 1 | | | | | | |
| 5 | initialize HMM from model 2 | | | | | | |
| 6 | continue HMM from HMM | | | | | | |
| 7 | initialize model 3 from HMM | | | | | | |
| 8 | initialize model 3 from model 2 | | | | | | |
| 9 | continue model 3 from model 3 | | | | | | |
| 10 | initialize model 4 from model 3 | | | | | | |
| 11 | continue model 4 from model 4 | | | | | | |
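Choosing a level amounts to looking up the stage you want to train and the stage whose tables you start from. The minimal sketch below simply indexes the meaning column of the restart table. The names are hypothetical, and it does not check which previous table files are present.

```python
# (stage to train, stage it is initialized or continued from) -> restart level,
# transcribed from the meaning column of the restart table.
RESTART_LEVELS = {
    ("model1", "model1"): 1,
    ("model2", "model1"): 2,
    ("model2", "model2"): 3,
    ("hmm", "model1"): 4,
    ("hmm", "model2"): 5,
    ("hmm", "hmm"): 6,
    ("model3", "hmm"): 7,
    ("model3", "model2"): 8,
    ("model3", "model3"): 9,
    ("model4", "model3"): 10,
    ("model4", "model4"): 11,
}


def restart_level(train, start_from=None):
    """Return the restart level; start_from=None means normal training (0)."""
    if start_from is None:
        return 0
    try:
        return RESTART_LEVELS[(train, start_from)]
    except KeyError:
        raise ValueError(f"no restart level trains {train} from {start_from}") from None


print(restart_level("model3", "hmm"))     # 7
print(restart_level("model4", "model4"))  # 11
```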
| Name | Aliases | Type | Meaning |
|---|---|---|---|
| countincreasecutoff | | FLOAT | When accumulating individual counts, an increment smaller than this value is not added. (This variable appears to have NO EFFECT; it is overwritten by mincountincrease.) |
| mincountincrease | | FLOAT | Same as countincreasecutoff, but this is the one that actually takes effect |
| countincreasecutoffal | | FLOAT | Similar to mincountincrease, but it applies only to model 3 and later, not to model 1, model 2, or HMM |
| probcutoff | | FLOAT | When writing model files, all entries smaller than this value are omitted |
| probsmooth | | FLOAT | The value used when a probability entry cannot be found in a model |
| peggedcutoff | | FLOAT | For models 3, 4, and 5, training first finds the Viterbi alignment and then perturbs a few links to "sample" new alignments. This is the relative threshold for accepting an alignment in that sampling: $a'$ is accepted if $\mathrm{score}(a') > \mathrm{score}(a^*) \times \text{peggedcutoff}$, where $a^*$ is the Viterbi alignment |
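The sketch below illustrates how these thresholds act, following the table's descriptions. Count increments below mincountincrease are dropped, entries below probcutoff are omitted when a model is written, probsmooth is the fallback for missing entries, and peggedcutoff is a relative acceptance test against the Viterbi score. Plain dictionaries stand in for MGIZA++'s tables, and all function names are hypothetical.

```python
def add_count(counts, key, increment, mincountincrease):
    """Accumulate a fractional count, ignoring increments below the cutoff."""
    if increment < mincountincrease:
        return
    counts[key] = counts.get(key, 0.0) + increment


def prune_for_output(probs, probcutoff):
    """Keep only the entries that would be written to a model file."""
    return {key: p for key, p in probs.items() if p >= probcutoff}


def lookup(probs, key, probsmooth):
    """Probability of an entry, falling back to probsmooth when it is missing."""
    return probs.get(key, probsmooth)


def accept_sample(score_candidate, score_viterbi, peggedcutoff):
    """Relative test: accept a' if score(a') > score(a*) * peggedcutoff."""
    return score_candidate > score_viterbi * peggedcutoff


counts = {}
add_count(counts, ("house", "haus"), 1e-9, 1e-7)  # too small, dropped
add_count(counts, ("house", "haus"), 0.4, 1e-7)
print(counts)  # {('house', 'haus'): 0.4}
```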
| Name | Aliases | Type | Meaning |
|---|---|---|---|
| emalsmooth | | FLOAT | HMM only. When enabled via emSmoothHMM, the jump probabilities are interpolated with a uniform distribution, and this value is the weight of the uniform distribution (1.0 means always use the uniform value, 0 means no interpolation) |
| emSmoothHMM | | SHORT | Bit flags selecting the HMM smoothing method: the first bit toggles the "modified counts" method, and the second bit toggles smoothing with the emalsmooth parameter |
| model23smoothfactor | | FLOAT | Smoothing factor for the distortion model: probabilities are interpolated with a uniform distribution, and the factor is the weight of the uniform distribution (1.0 means always use the uniform value, 0 means no interpolation) |
| model4smoothfactor | | FLOAT | Same as model23smoothfactor, for the model 4 distortion model |
| model5smoothfactor | | FLOAT | Same as model23smoothfactor, for the model 5 distortion model |
| nsmooth | | FLOAT | Smoothing factor for the word-length-dependent fertility probability |
| nsmoothgeneral | | FLOAT | Smoothing factor for the global fertility probability |
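The interpolation shared by emalsmooth and the model*smoothfactor parameters can be written as $p' = \lambda u + (1 - \lambda) p$, where $\lambda$ is the weight of the uniform distribution $u$. The minimal sketch below assumes the uniform distribution is taken over the same outcomes as the row being smoothed, and that the "first bit" and "second bit" of emSmoothHMM are the values 1 and 2. The function names are hypothetical.

```python
def smooth_with_uniform(probs, weight):
    """Interpolate a distribution with the uniform one over the same outcomes.

    weight is the weight of the uniform part: 1.0 gives a fully uniform
    distribution, 0.0 leaves the input unchanged.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    uniform = 1.0 / len(probs)
    return [weight * uniform + (1.0 - weight) * p for p in probs]


def em_smooth_hmm_flags(value):
    """Decode emSmoothHMM, assuming bit 1 = value & 1 and bit 2 = value & 2."""
    return {
        "modified_counts": bool(value & 1),
        "uniform_interpolation": bool(value & 2),
    }


print(smooth_with_uniform([1.0, 0.0], 0.5))  # [0.75, 0.25]
print(em_smooth_hmm_flags(2))  # {'modified_counts': False, 'uniform_interpolation': True}
```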
nsmooth and nsmoothgeneral are used when renormalizing the fertility table. The fertility table holds the probability $P(N \mid e_i)$, where $N$ is the number of words that $e_i$ translates to. Some words appear only once or twice, so it is impossible to collect sufficient statistics for every possible $N$. During renormalization the probability is therefore smoothed by interpolating it with two global distributions: $P(N \mid \mathrm{Len}(e_i))$, where $\mathrm{Len}(e_i)$ is the byte length of $e_i$, and $P(N)$. Because the byte length depends on the character encoding, remember that DIFFERENT ENCODINGS will give DIFFERENT RESULTS when training word alignment models.
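The sketch below illustrates both points. The smoothing form is an assumption: an additive, pseudo-count style interpolation in which nsmooth and nsmoothgeneral weight the two global distributions. It illustrates the idea only, and MGIZA++'s exact weighting may differ. The byte-length part uses only Python's standard codecs and shows why the encoding matters: the same Chinese character has a different byte length in UTF-8 and GBK.

```python
def byte_length(word, encoding):
    """Len(e_i): number of bytes of the word in the given encoding."""
    return len(word.encode(encoding))


def smoothed_fertility(counts, p_given_len, p_global, nsmooth, nsmoothgeneral):
    """Renormalize fertility counts for one word e_i (ASSUMED smoothing form).

    counts[N]      expected count of e_i having fertility N
    p_given_len[N] P(N | Len(e_i)), pooled over words of the same byte length
    p_global[N]    P(N), pooled over all words
    """
    denom = sum(counts) + nsmooth + nsmoothgeneral
    return [
        (c + nsmooth * pl + nsmoothgeneral * pg) / denom
        for c, pl, pg in zip(counts, p_given_len, p_global)
    ]


print(byte_length("中", "utf-8"), byte_length("中", "gbk"))  # 3 2

# A rare word seen once with fertility 1; fertilities 0..3.
probs = smoothed_fertility([0.0, 1.0, 0.0, 0.0],
                           [0.1, 0.6, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1],
                           nsmooth=1.0, nsmoothgeneral=1.0)
print(round(sum(probs), 10))  # 1.0
```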
| Name | Aliases | Type | Meaning |
|---|---|---|---|
| compactadtable | | BOOL | Whether to use a 3-dimensional (=1) or 4-dimensional (=0) distortion/alignment table |
| deficientdistortionforemptyword | | SHORT | |
| depm4 | | SHORT | Bit flags for the dependencies in model 4 |
| depm5 | | SHORT | Bit flags for the dependencies in model 5 |
| emalignmentdependencies | | SHORT | Bit flags for the dependencies in the HMM alignment model |
| emprobforempty | | FLOAT | Probability of the empty (NULL) word during forward-backward training of the HMM; it affects the NULL-word alignments in the HMM |
| m5p0 | | FLOAT | Probability of the empty word for model 5; if set to -1, it is trained |
| p0 | | FLOAT | Probability of the empty word for models 3/4; if set to -1, it is trained. Training it is generally a bad idea because models 3/4 are deficient; in practice it is usually fixed to a hand-picked value, typically around 0.975 |
| maxfertility | | INT | Maximum fertility allowed in the fertility models. It affects training in two ways: 1. the fertility table holds only maxfertility+1 probabilities per word (fertilities 0 to maxfertility); 2. the input corpus is preprocessed, and a sentence pair whose length ratio exceeds what maxfertility allows (for example, one word on the source side and maxfertility+1 words on the target side) is truncated |
About compactadtable: the original distortion model has three dependencies, $P(i \mid j, L, M)$, where $i$ and $j$ are positions in the source and target sentences and $L$ and $M$ are the source and target sentence lengths. Because this table has too many dimensions, $M$ is dropped when compactadtable is 1.
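To see why dropping $M$ helps, the sketch below counts the distinct keys of $P(i \mid j, L, M)$ for sentence lengths up to a limit, with and without $M$. It assumes source position 0 is the empty word (so $i$ runs from 0 to $L$) and that $j$ runs from 1 to $M$. Real tables store only the entries seen in training, so these are upper bounds, and the function names are hypothetical.

```python
def distortion_key(i, j, L, M, compact):
    """Key of P(i | j, L, M); the target length M is dropped when compact."""
    return (i, j, L) if compact else (i, j, L, M)


def count_keys(max_len, compact):
    """Distinct table keys for source/target lengths L, M in 1..max_len."""
    keys = set()
    for L in range(1, max_len + 1):
        for M in range(1, max_len + 1):
            for i in range(0, L + 1):      # 0 = empty word (assumed)
                for j in range(1, M + 1):
                    keys.add(distortion_key(i, j, L, M, compact))
    return len(keys)


print(count_keys(2, compact=True), count_keys(2, compact=False))    # 10 15
print(count_keys(20, compact=True), count_keys(20, compact=False))  # 4600 48300
```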
The bits of emalignmentdependencies (HMM alignment model) are:

| Bit | Meaning |
|---|---|
| 1 | sentence length |
| 2 | (target) previous class |
| 3 | (target) previous position |
| 4 | (source) previous position |
| 5 | (source) previous class |
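The sketch below decodes a flag value. It assumes the table above gives the bits of emalignmentdependencies and that bit $k$ corresponds to the value $2^{k-1}$ (bit 1 = 1, bit 2 = 2, bit 3 = 4, and so on). The helper is hypothetical.

```python
HMM_DEPENDENCY_BITS = {
    1: "sentence length",
    2: "(target) previous class",
    3: "(target) previous position",
    4: "(source) previous position",
    5: "(source) previous class",
}


def decode_flags(value, bit_names):
    """Names of the bits set in value; bit k (1-based) has value 2**(k - 1)."""
    return [name for bit, name in sorted(bit_names.items()) if value & (1 << (bit - 1))]


print(decode_flags(5, HMM_DEPENDENCY_BITS))
# ['sentence length', '(target) previous position']
```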
The bits of depm4 and depm5 (model 4/5 dependencies) are:

| Bit | Meaning |
|---|---|
| Dependencies for first word | |
| 1 | source position |
| 2 | target position |
| 3 | source class |
| 4 | target class |
| Dependencies for second word and on | |
| 5 | source position |
| 6 | target position |
| 7 | source class |
| 8 | target class |
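Conversely, a flag value can be assembled from the desired dependencies. The sketch below assumes the table above gives the bits of depm4/depm5 and uses the same bit-numbering assumption (bit $k$ = $2^{k-1}$). Bits 5 to 8 repeat bits 1 to 4 for the second word and on. The helper is hypothetical.

```python
# Bit numbers for the first word; the second word and on use the same bit + 4.
FIRST_WORD_BITS = {
    "source position": 1,
    "target position": 2,
    "source class": 3,
    "target class": 4,
}


def encode_model45_dependencies(first_word, later_words):
    """Combine dependency names into one integer flag value."""
    value = 0
    for dep in first_word:
        value |= 1 << (FIRST_WORD_BITS[dep] - 1)
    for dep in later_words:
        value |= 1 << (FIRST_WORD_BITS[dep] + 4 - 1)
    return value


print(encode_model45_dependencies(["source class", "target class"], ["target class"]))  # 140
```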
The following parameters relate to sentence probability. In most pipelines the sentence occurrence is simply set to 1, so they have no effect at all and can be ignored. It is not clear why these parameters exist; they come from GIZA++.

| Name | Default | Meaning |
|---|---|---|
| manlexfactor1 | 0 | Used when the sentence occurrence is set to -1.0 |
| manlexfactor2 | 0 | Used when the sentence occurrence is set to -2.0 |
| manlexmaxmultiplicity | 20 | |
The only parameter that controls multi-threading is ncpus. For example, setting ncpus to 5 enables 5 threads, and the final output is also split into 5 parts:

`prefix.A3.final.part0` ... `prefix.A3.final.part4`
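Because the final alignment is split across part files, a merge step is usually needed. The minimal sketch below assumes each record in a `*.A3.final.part*` file is three lines whose first line begins with `# Sentence pair (N)`, and that records should be ordered by N. The function names are hypothetical, and the code is not taken from MGIZA++.

```python
import re

# Assumed header line of each three-line alignment record.
HEADER = re.compile(r"^# Sentence pair \((\d+)\)")


def part_names(prefix, ncpus):
    """File names of the parts written with ncpus threads (numbered from 0)."""
    return [f"{prefix}.A3.final.part{k}" for k in range(ncpus)]


def read_records(lines):
    """Split a part into (sentence_number, [header, target, alignment]) records."""
    lines = list(lines)
    if len(lines) % 3:
        raise ValueError("expected three lines per alignment record")
    records = []
    for k in range(0, len(lines), 3):
        match = HEADER.match(lines[k])
        if match is None:
            raise ValueError(f"unexpected header line: {lines[k]!r}")
        records.append((int(match.group(1)), lines[k:k + 3]))
    return records


def merge_parts(parts):
    """Merge parts (iterables of lines) into one list ordered by sentence number."""
    records = [rec for part in parts for rec in read_records(part)]
    records.sort(key=lambda rec: rec[0])
    return [line for _, block in records for line in block]


print(part_names("prefix", 5)[-1])  # prefix.A3.final.part4

part0 = ["# Sentence pair (2) ...", "b", "NULL ({ }) y ({ 1 })"]
part1 = ["# Sentence pair (1) ...", "a", "NULL ({ }) x ({ 1 })"]
print(merge_parts([part0, part1])[1])  # a
```

With real files, each part can be passed as an open file object (for example from `open(name, encoding="utf-8")`); the returned lines keep their original line endings and can be written out unchanged.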