This page describes how to install Chaski.
Go to the download page and download the latest release.
If you only want to run phrase extraction with Chaski, you can proceed directly. If you also want to run distributed word alignment, you need to download and compile the MGIZA package from its download page.
To compile Chaski you must have Java SDK 1.6+ installed. Chaski also requires Ant for compilation.
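A quick way to confirm the build prerequisites are on your PATH is sketched below. The helper name have_tool is made up for this illustration and is not part of Chaski; the check only confirms the commands exist and does not verify the Java version.

```shell
# Return 0 if the named command can be found on PATH.
# (have_tool is a hypothetical helper, not part of Chaski.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each tool needed for the build.
for tool in java javac ant; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

If any tool is reported as missing, install it before running the build.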
Decompress the tarball you downloaded and go to the directory it created, say
Chaski_HOME=/home/goodboy/ChaskiV3
then type
ant
A successful build message means the jar has been built.
To use Chaski you must set the Chaski_HOME environment variable in your .bashrc file:
export Chaski_HOME=/home/goodboy/ChaskiV3
or, for csh
setenv Chaski_HOME /home/goodboy/ChaskiV3
It is also recommended that you add $Chaski_HOME/scripts/ to your PATH variable so you can run the Chaski scripts easily.
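For bash, the two settings together might look like the lines below in .bashrc. The install path is the example location used on this page; replace it with your own.

```shell
# Example install location from this page; adjust to your own.
export Chaski_HOME=/home/goodboy/ChaskiV3
# Prepend the scripts directory so the Chaski scripts can be run by name.
export PATH="$Chaski_HOME/scripts:$PATH"
```

Open a new shell, or source .bashrc, for the change to take effect.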
To run distributed word alignment, you also need to set up the environment variable QMT_HOME, pointing to the directory where the MGIZA binaries and scripts can be found. To check that the setup is correct, type
${QMT_HOME}/bin/mgiza
That should print out the options for MGIZA.
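If you prefer a check that does not depend on reading the printed options, a small sketch like the following tests whether the binary exists and is executable. The function name check_mgiza is hypothetical; the bin/mgiza location is the one given above.

```shell
# Return 0 if <dir>/bin/mgiza exists and is executable.
# (check_mgiza is a hypothetical helper, not part of MGIZA or Chaski.)
check_mgiza() {
    [ -n "$1" ] && [ -x "$1/bin/mgiza" ]
}

if check_mgiza "${QMT_HOME:-}"; then
    echo "mgiza found under $QMT_HOME"
else
    echo "mgiza not found; check QMT_HOME" >&2
fi
```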
You must also check that Hadoop works: if you can run the Hadoop example tasks, it is working.
Now go to the $Chaski_HOME/sample directory, where a small sample is provided. Type
cd $Chaski_HOME/sample
gedit sample.config
You now need to modify the HDFS-specific parameters for the working directories.
The following parameters need to be changed:
# Merged corpus file on HDFS
corpus=/user/qing/ChaskiTest2/corpus
# Output dir for extracted phrases
phrase=/user/qing/ChaskiTest2/extract
# Output dir for lexicon
lexicon=/user/qing/ChaskiTest2/lexicon
# Output dir for scored phrase/reorder table
ptable=/user/qing/ChaskiTest2/ptable
# Output directory of Moses Phrase table
moses-p=/user/qing/ChaskiTest2/moses-phrase
# Output directory of Moses lexiconized reorder table
moses-r=/user/qing/ChaskiTest2/moses-reorder
You can simply replace /user/qing/ChaskiTest2/ with an HDFS directory to which you have write access.
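Instead of editing each line by hand, the prefix can be swapped in one pass with sed. This is only a sketch: retarget_config is a hypothetical helper, the target directory in the example is a placeholder, and the new directory must not contain the characters # or &.

```shell
# Rewrite the example HDFS prefix in a config file, keeping a .bak copy.
# $1 = config file, $2 = new HDFS directory (a placeholder you choose).
# (retarget_config is a hypothetical helper, not part of Chaski.)
retarget_config() {
    # ${2%/} drops one trailing slash so paths are not doubled up.
    sed -i.bak "s#/user/qing/ChaskiTest2#${2%/}#g" "$1"
}

# Example (the target directory is a placeholder):
# retarget_config sample.config /user/yourname/ChaskiTest
```

The original file is kept as sample.config.bak in case you need to revert.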
Now you can run
$Chaski_HOME/scripts/extract sample.config
Wait several minutes, and then run
hadoop dfs -cat /user/qing/ChaskiTest2/moses-phrase/* | wc -l
If the output is 74712, your installation is complete.
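The comparison can also be scripted so the result is reported explicitly. The sketch below assumes a hypothetical helper named verify_line_count; the command and the expected count in the example are the ones given on this page.

```shell
# Count lines on stdin and compare with the expected value in $1.
# (verify_line_count is a hypothetical helper, not part of Chaski.)
verify_line_count() {
    # tr strips the padding some wc implementations add.
    actual=$(wc -l | tr -d ' ')
    if [ "$actual" -eq "$1" ]; then
        echo "OK: $actual lines"
    else
        echo "Mismatch: expected $1, got $actual" >&2
        return 1
    fi
}

# Example, using the command and count from this page:
# hadoop dfs -cat /user/qing/ChaskiTest2/moses-phrase/* | verify_line_count 74712
```

If you changed the HDFS directory in sample.config, use your own moses-phrase path in the command.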