MGIZA++ is a multi-threaded word alignment tool based on GIZA++. It extends GIZA++ in multiple ways:
Multi-threading
MGIZA++ makes efficient use of multi-core platforms. A quad-core machine typically achieves about a three-fold speedup over single-threaded GIZA++.
Memory optimization
By eliminating duplicated tables, MGIZA++ uses considerably less memory than GIZA++.
Resume training
MGIZA++ can resume training from any stage. For example, you can reuse previously trained models and continue training directly from IBM Model 4 instead of starting over from Model 1.
Integrated with Chaski
MGIZA++ can be integrated into Chaski and run on clusters, which gives you an even larger speedup.
The latest version of MGIZA++ can be downloaded here:
| Version | Date | Link | Comment | Release Note |
|---|---|---|---|---|
| Version 0.6.3.1 | 2010-01-23 | Download | Minor code clean and move download to Sf | Release Note |
| Version 0.6.3 | 2010-01-11 | Download | Memory optimization and bug fix | Release Note |
| Version 0.6.2 | 2009-12-07 | Download | Minor interface change to keep compatibility with Chaski 0.2.2 | Release Note |
| Version 0.6.1 | 2009-11-17 | Download | Unnecessary dependencies removed | Release Note |
| Version 0.6 | 2009-11-10 | Download | | |
To compile MGIZA++ you need the Boost library installed. As of version 0.6.1, Berkeley DB is no longer a dependency. Once the dependencies are installed, go to the source directory and run:
./configure --prefix=${QMT_HOME}
make
make install
If you want to use MGIZA++ with Chaski, you need to add the environment variable QMT_HOME to your .bashrc.
You can either download the Boost library from http://www.boost.org or install the header package provided by your Linux distribution.
The basic usage of MGIZA++ is easy, given that you know how to run GIZA++. MGIZA++ is compatible with GIZA++'s parameters, and you can run:
${QMT_HOME}/bin/mgiza -ncpu 5 [ALL-YOUR-GIZA-PARAMETERS]
to tell mgiza to run five threads.
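If you launch mgiza from a script, the command line above can be assembled programmatically. The following is only a minimal sketch: `build_mgiza_command` is a hypothetical helper, not part of MGIZA++, and it assumes nothing beyond the `${QMT_HOME}/bin/mgiza` location and the `-ncpu` option shown above. Your own GIZA++ parameters are passed through unchanged.

```python
from typing import List, Sequence


def build_mgiza_command(qmt_home: str, ncpu: int, giza_args: Sequence[str]) -> List[str]:
    """Return the argument list for an mgiza run (hypothetical helper).

    qmt_home  -- installation prefix, i.e. the value of QMT_HOME
    ncpu      -- number of threads to request with -ncpu
    giza_args -- your usual GIZA++ parameters, passed through untouched
    """
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    # The binary location follows the ${QMT_HOME}/bin/mgiza layout shown above.
    binary = qmt_home.rstrip("/") + "/bin/mgiza"
    return [binary, "-ncpu", str(ncpu), *giza_args]
```

The returned list can be handed to `subprocess.run(cmd, check=True)` so that a failed alignment run raises an error instead of passing silently.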
The alignment output of MGIZA++ is somewhat different from that of GIZA++. Given n threads, the alignment output will be:
prefix.A3.final.part0
prefix.A3.final.part1
...
prefix.A3.final.part(n-1)
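Before merging, it can be useful to check that every thread actually wrote its part file. The sketch below derives the expected file names from the naming pattern above and reports any that are missing. Both function names are hypothetical and exist only for illustration.

```python
import os
from typing import List


def expected_part_files(prefix: str, ncpu: int) -> List[str]:
    """List the part files an n-thread run should produce: part0 .. part(n-1)."""
    return [f"{prefix}.A3.final.part{i}" for i in range(ncpu)]


def missing_part_files(prefix: str, ncpu: int) -> List[str]:
    """Return the expected part files that do not exist on disk."""
    return [path for path in expected_part_files(prefix, ncpu) if not os.path.exists(path)]
```

An empty list from `missing_part_files` suggests the run completed for all threads. It does not verify that the files are complete.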
To combine the alignments you need to run:
${QMT_HOME}/scripts/merge_alignment.py ${prefix}.A3.final.part* > ${prefix}.A3.final
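To illustrate what combining the parts involves, here is a rough sketch of the idea. It is not the code of `merge_alignment.py`. It assumes the usual GIZA++ A3 layout, where each sentence pair is a three-line record whose first line starts with `# Sentence pair (N)`, and it assumes that the merged file should list the records in sentence order. The function names are made up for this example.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Header line of an A3 record, e.g. "# Sentence pair (12) source length ..."
HEADER = re.compile(r"^# Sentence pair \((\d+)\)")


def read_records(lines: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (sentence_number, [header, target, alignment]) for each record."""
    it = iter(lines)
    for line in it:
        match = HEADER.match(line)
        if match is None:
            continue  # ignore anything that is not a record header
        target = next(it, None)
        alignment = next(it, None)
        if target is None or alignment is None:
            raise ValueError("truncated record after: " + line.strip())
        record = [line.rstrip("\n"), target.rstrip("\n"), alignment.rstrip("\n")]
        yield int(match.group(1)), record


def merge_parts(parts: Iterable[Iterable[str]]) -> List[str]:
    """Collect the records of all parts and return their lines in sentence order."""
    records: List[Tuple[int, List[str]]] = []
    for part in parts:
        records.extend(read_records(part))
    records.sort(key=lambda item: item[0])
    return [text for _, record in records for text in record]
```

Each element of `parts` can be an open file handle, since iterating over a file yields its lines. This sketch holds all records in memory, which may be impractical for very large corpora.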
For advanced usage please refer to the following “HOWTOs”