====== MGIZA ======
MGIZA++ is a multi-threaded word alignment tool based on GIZA++. It extends GIZA++ in multiple ways:
** Multi-threading **
MGIZA++ makes efficient use of multi-core platforms. A quad-core machine typically achieves about a three-fold speedup over single-threaded GIZA++.
** Memory optimization **
By eliminating duplicated tables, MGIZA++ uses considerably less memory than GIZA++.
** Resume training **
MGIZA++ can resume training from any stage. For example, you may be able to reuse previously trained models and continue training directly from IBM Model 4 instead of starting again from Model 1.
** Integrated with Chaski **
MGIZA++ can be integrated into Chaski and run on clusters, which gives an even larger speedup.
===== Download =====
The latest version of MGIZA++ can be downloaded here:
^ Version ^ Date ^ Link ^ Comment ^ Release Note ^
| Version 0.6.3.1 | 2010-01-23 | [[https://sourceforge.net/projects/mgizapp/files/mgizapp-0.6.3.tar.gz/download|Download]] | Minor code cleanup; download moved to SourceForge | [[Release Note MGiza#0.6.3 | Release Note]] |
| Version 0.6.3 | 2010-01-11 | [[http://www.cs.cmu.edu/~qing/release/mgiza-0.6.3-10-01-11.tar.gz|Download]] | Memory optimization and bug fix | [[Release Note MGiza#0.6.3 | Release Note]] |
| Version 0.6.2 | 2009-12-07 | [[http://www.cs.cmu.edu/~qing/release/mgiza-0.6.2-09-12-07.tar.gz|Download]] |Minor interface change to keep compatibility with [[chaski:Release Note Chaski#0.2.2|Chaski 0.2.2]] | [[Release Note MGiza#0.6.2 | Release Note]] |
| Version 0.6.1 | 2009-11-17 | [[http://www.cs.cmu.edu/~qing/release/mgiza-0.6.1-09-11-17.tar.gz|Download]] |Unnecessary dependencies removed | [[Release Note MGiza#0.6.1 | Release Note]] |
| Version 0.6 | 2009-11-10 | [[http://www.cs.cmu.edu/~qing/release/qmt-0.6-chc-09-11-10.tar.gz|Download]] | | |
===== Installation =====
To compile MGIZA++ you need the following packages installed:
  - Berkeley DB (libdb)
  - Berkeley DB++ (libdb++)
  - Boost library (regex, string)
As of version 0.6.1, Berkeley DB is no longer required, but the Boost library still is. Once the dependencies are installed, go to the source directory and run:
  ./configure --prefix=${QMT_HOME}
  make
  make install
If you want to use MGIZA++ with Chaski, you need to add the environment variable ''QMT_HOME'' to your ''.bashrc''.
You can either download the Boost library from [[http://www.boost.org]] or install the header package provided by your Linux distribution.
===== Usage =====
Basic usage of MGIZA++ is straightforward if you know how to run GIZA++. MGIZA++ accepts GIZA++'s parameters, so you can run:
  ${QMT_HOME}/bin/mgiza -ncpu 5 [ALL-YOUR-GIZA-PARAMETERS]
to tell mgiza to run with five threads.
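If you launch mgiza from a script, the command line above can be assembled programmatically. The following is a minimal sketch, not part of MGIZA++: the helper name ''build_mgiza_command'', the install path ''/opt/qmt'' and the ''-o prefix'' arguments are illustrative placeholders for your own GIZA++ parameters.

```python
import os
import shlex


def build_mgiza_command(ncpu, giza_args, qmt_home=None):
    """Return the argument list for an mgiza run with `ncpu` threads.

    `giza_args` is whatever you would normally pass to GIZA++.
    This helper is hypothetical and only mirrors the command shown above.
    """
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    if qmt_home is None:
        # Fall back to the QMT_HOME environment variable used at install time.
        qmt_home = os.environ["QMT_HOME"]
    binary = os.path.join(qmt_home, "bin", "mgiza")
    return [binary, "-ncpu", str(ncpu), *giza_args]


# Placeholder arguments; substitute your real GIZA++ parameters.
cmd = build_mgiza_command(5, ["-o", "prefix"], qmt_home="/opt/qmt")
print(shlex.join(cmd))
```

The resulting list can be handed to ''subprocess.run'' as is, which avoids shell quoting problems with paths that contain spaces.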
The alignment output of MGIZA++ differs somewhat from that of GIZA++. With n threads, the alignment is written to n part files:
  prefix.A3.final.part0
  prefix.A3.final.part1
  ...
  prefix.A3.final.part(n-1)
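Before combining the parts, it can be useful to check that every expected file was actually written. The sketch below only follows the naming pattern listed above; the function names ''part_files'' and ''missing_parts'' are hypothetical and not part of MGIZA++.

```python
import os


def part_files(prefix, n_threads):
    """Names of the alignment parts expected from a run with n_threads threads."""
    return [f"{prefix}.A3.final.part{i}" for i in range(n_threads)]


def missing_parts(prefix, n_threads):
    """Return the expected part files that do not exist on disk."""
    return [path for path in part_files(prefix, n_threads)
            if not os.path.exists(path)]


# Example: a five-thread run should have produced part0 through part4.
for name in part_files("prefix", 5):
    print(name)
```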
To combine the alignments you need to run:
  ${QMT_HOME}/scripts/merge_alignment.py ${prefix}.A3.final.part* > ${prefix}.A3.final
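To illustrate what such a merge involves, here is a minimal sketch. It is not the bundled ''merge_alignment.py'' and may differ from it. It assumes the usual GIZA++ A3 layout, in which every sentence pair takes three lines and the first line starts with ''# Sentence pair (N)'', and it simply reorders all records by sentence number.

```python
import re
import sys

# Assumed header of a GIZA++-style A3 record: "# Sentence pair (N) ..."
HEADER = re.compile(r"^# Sentence pair \((\d+)\)")


def read_records(lines):
    """Yield (sentence_id, [header, sentence, alignment]) from A3-style lines."""
    lines = iter(lines)
    for header in lines:
        match = HEADER.match(header)
        if match is None:
            raise ValueError(f"expected a record header, got: {header!r}")
        try:
            record = [header, next(lines), next(lines)]
        except StopIteration:
            raise ValueError("truncated record at end of input") from None
        yield int(match.group(1)), record


def merge_parts(parts):
    """Merge part files (each an iterable of lines) into sentence-id order."""
    records = {}
    for part in parts:
        for sentence_id, record in read_records(part):
            records[sentence_id] = record
    merged = []
    for sentence_id in sorted(records):
        merged.extend(records[sentence_id])
    return merged


if __name__ == "__main__":
    # Usage sketch: python merge_sketch.py prefix.A3.final.part* > prefix.A3.final
    parts = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            parts.append(handle.readlines())
    sys.stdout.writelines(merge_parts(parts))
```

Sorting by the sentence number in the header, rather than concatenating files in name order, keeps the result independent of how the shell expands ''part*'' (for example, ''part10'' sorts before ''part2'').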
For advanced usage, please refer to the following HOWTOs:
  * Force alignment and resume training: [[mgiza:forcealignment]]