Installation of Chaski

This page describes how to install Chaski.

1. Download the latest release

Go to the download page and download the latest release.

If you only want to run phrase extraction with Chaski, you can proceed directly to the next step. If you also want to run word alignment in a distributed fashion, you must also download and compile the MGIZA package (see its download page).

2. Untar and compile

To compile Chaski, you must have Java SDK 1.6 or later installed. Chaski also requires Apache Ant for compilation.
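Before building, you can check these prerequisites from the shell. The sketch below is an illustration only and is not part of Chaski: the function names check_prereqs, java_major and installed_java_major are hypothetical, and the version parsing assumes the usual "java -version" output, e.g. java version "1.6.0_45" (old scheme, where 1.6 means Java 6) or openjdk version "11.0.2".

```shell
# Hypothetical helpers (not shipped with Chaski) for checking build prerequisites.

# Succeed if every command named in the arguments is on PATH;
# otherwise report the missing ones on stderr and fail.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Print the major version from a Java version string:
# "1.6.0_45" -> 6 (old "1.x" scheme), "11.0.2" -> 11, "17" -> 17.
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # strip the legacy "1." prefix
  esac
  echo "${v%%[._-]*}"     # keep everything before the first '.', '_' or '-'
}

# Major version of the java on PATH, parsed from the first line of
# "java -version" output that mentions "version" (it prints to stderr).
installed_java_major() {
  java_major "$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
}
```

For example, after sourcing the snippet: check_prereqs java javac ant && [ "$(installed_java_major)" -ge 6 ] && echo ready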

Decompress the tarball you downloaded and go to the directory it creates, for example

 cd /home/goodboy/ChaskiV3

This directory is referred to as Chaski_HOME below. Then type

 ant

If ant reports BUILD SUCCESSFUL, the Chaski jar has been built.
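If you prefer to script the unpack-and-build step, here is a minimal sketch. It assumes the release is a gzip-compressed tarball whose entries all sit under one top-level directory; unpack_release and build_chaski are hypothetical names, not Chaski commands.

```shell
# Hypothetical helpers (not shipped with Chaski); assumes a .tar.gz release
# whose entries all live under one top-level directory.

# Unpack TARBALL into DEST and print the path of the directory it created,
# which is the directory to use as Chaski_HOME.
unpack_release() {
  local tarball="$1" dest="$2" top
  # Name of the top-level directory, taken from the first archive entry.
  top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  [ -n "$top" ] || return 1
  tar -xzf "$tarball" -C "$dest" || return 1
  echo "$dest/$top"
}

# Run ant inside the unpacked directory; the function's exit status is
# ant's, so a non-zero status means the build failed.
build_chaski() {
  (cd "$1" && ant)
}
```

Usage, with a placeholder tarball name: home=$(unpack_release ~/chaski-release.tar.gz /home/goodboy) && build_chaski "$home"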

3. Set the environment

To use Chaski, you must set the environment variable Chaski_HOME. For bash, add this line to your .bashrc file:

   export Chaski_HOME=/home/goodboy/ChaskiV3

or, for csh, add this line to your .cshrc file:

   setenv Chaski_HOME /home/goodboy/ChaskiV3

It is also recommended that you add $Chaski_HOME/scripts to your PATH variable so that you can run the Chaski scripts easily.
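The bash setup can also be done non-interactively. This sketch (add_chaski_env is a hypothetical name) appends the Chaski_HOME export and the recommended PATH entry to an rc file, defaulting to ~/.bashrc, and skips any line that is already present, so running it twice does no harm. csh users would need the setenv form instead.

```shell
# Hypothetical helper (not shipped with Chaski): append the Chaski
# environment settings to a bash rc file, skipping lines already present.
# Usage: add_chaski_env CHASKI_DIR [RC_FILE]   (RC_FILE defaults to ~/.bashrc)
add_chaski_env() {
  local home="$1" rc="${2:-$HOME/.bashrc}" line
  # The PATH line is single-quoted so it is written literally and
  # expanded only when the rc file is sourced.
  for line in "export Chaski_HOME=$home" \
              'export PATH="$Chaski_HOME/scripts:$PATH"'; do
    # -x: match whole lines, -F: treat the line as a fixed string
    grep -qxF "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
  done
}
```

For example: add_chaski_env /home/goodboy/ChaskiV3 && . ~/.bashrc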

To run distributed word alignment, you also need to set the environment variable QMT_HOME to the directory that contains the MGIZA binaries and scripts. To check whether the setup is correct, type

   ${QMT_HOME}/bin/mgiza

This should print the MGIZA command-line options.
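If you want this check inside a script, the following sketch (check_mgiza is a hypothetical name) only verifies that QMT_HOME is set and that $QMT_HOME/bin/mgiza exists and is executable; running mgiza by hand as above remains the real test.

```shell
# Hypothetical helper (not shipped with Chaski or MGIZA): succeed only if
# QMT_HOME is set and $QMT_HOME/bin/mgiza is an executable file.
check_mgiza() {
  if [ -z "${QMT_HOME:-}" ]; then
    echo "QMT_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$QMT_HOME/bin/mgiza" ]; then
    echo "no executable mgiza under $QMT_HOME/bin" >&2
    return 1
  fi
  return 0
}
```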

4. Test

You must also verify that Hadoop works: if you can run the Hadoop example jobs, your Hadoop setup is fine.

Now go to the $Chaski_HOME/sample directory, which contains a small sample, and open its configuration file. Type

   cd $Chaski_HOME/sample
   gedit sample.config

You now need to set the HDFS working directories. Modify the following parameters:

   # Merged corpus file on HDFS
   corpus=/user/qing/ChaskiTest2/corpus

   # Output dir for extracted phrases
   phrase=/user/qing/ChaskiTest2/extract

   # Output dir for lexicon
   lexicon=/user/qing/ChaskiTest2/lexicon

   # Output dir for scored phrase/reorder table
   ptable=/user/qing/ChaskiTest2/ptable

   # Output directory of Moses Phrase table
   moses-p=/user/qing/ChaskiTest2/moses-phrase

   # Output directory of Moses lexiconized reorder table
   moses-r=/user/qing/ChaskiTest2/moses-reorder

You can simply replace /user/qing/ChaskiTest2/ with an HDFS directory to which you have write access.
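One way to do the replacement without an editor is a sed substitution. The sketch below is hypothetical (set_hdfs_dir is not part of Chaski): it rewrites every occurrence of /user/qing/ChaskiTest2 in the given config file, keeps the original as a .bak copy, and assumes the new path contains no '#' or '&' characters, since sed treats those specially here.

```shell
# Hypothetical helper (not shipped with Chaski): point every HDFS path in a
# Chaski config file at NEW_DIR instead of /user/qing/ChaskiTest2.
# The original file is kept as CONFIG.bak. NEW_DIR must not contain '#' or '&'.
set_hdfs_dir() {
  local config="$1" newdir="${2%/}"   # drop one trailing slash, if any
  # '#' is the sed delimiter so the slashes in the paths need no escaping.
  sed -i.bak "s#/user/qing/ChaskiTest2#${newdir}#g" "$config"
}
```

For example, with a placeholder directory: set_hdfs_dir sample.config /user/$USER/ChaskiTest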

Now you can run

 $Chaski_HOME/scripts/extract sample.config

Wait several minutes (see note 1 below), then run

 hadoop dfs -cat /user/qing/ChaskiTest2/moses-phrase/* | wc -l

If the output is 74712, your installation is complete.
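The check can be wrapped in a small script. In this sketch, check_line_count and verify_sample are hypothetical names; the expected count 74712 comes from the step above, and the HDFS glob is quoted so that Hadoop, rather than the local shell, expands it. On later Hadoop releases, "hadoop dfs" is deprecated in favour of "hdfs dfs".

```shell
# Hypothetical helpers (not shipped with Chaski).

# Read lines on stdin and succeed only if there are exactly $1 of them.
check_line_count() {
  local expected="$1" count
  count=$(wc -l | tr -d ' ')   # tr strips the padding some wc versions add
  if [ "$count" -eq "$expected" ]; then
    echo "OK: $count lines"
  else
    echo "FAIL: expected $expected lines, got $count" >&2
    return 1
  fi
}

# Check the sample's Moses phrase table directory on HDFS.
# Usage: verify_sample /user/qing/ChaskiTest2/moses-phrase
verify_sample() {
  hadoop dfs -cat "$1/*" | check_line_count 74712
}
```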

1) The sample will look ridiculously slow for one thousand sentences, which usually take about 15 seconds when run locally. This is due to the overhead of allocating nodes and similar setup costs, so we suggest using Chaski only for reasonably large systems.
chaski/install.txt · Last modified: 2009/11/28 11:39 by edwardgao
License: CC Attribution-Noncommercial-Share Alike 3.0 Unported