
407 South Craig Street
Pittsburgh, PA 15213
TEL: 412-268-5634
Email: q...@cs.cmu.edu
I am a Ph.D. student in the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. I currently work with Stephan Vogel on the GALE project, improving word alignment for phrase-based machine translation systems and parallelizing the machine translation training pipeline.
I also work with Stephan Vogel and Noah Smith on the INCA project, building an open-source distributed statistical machine translation system for clusters, especially for MapReduce frameworks such as Hadoop.
I help organize the Large-Scale Lunch, a monthly seminar on the practical use of parallel and distributed computing for solving problems in applied computer science.
If you know me, please find me on Facebook. Please also visit www.kyloo.net, where my wife and I post all kinds of things.
I am editing several pages on Chinese Wikipedia. If you know Chinese, please help me improve them: Statistical Machine Translation, Machine Translation.
Research
I focus on parallelizing the machine translation pipeline and have released software packages such as Multi-Threaded GIZA++ (MGIZA++) and Parallel GIZA++ (PGIZA++). An internal version of the parallel phrase extraction tool Chaski is being developed and tested on Yahoo's M45 cluster. All of this work will be part of the INCA project, funded by NSF.
I am also interested in improving word alignment quality for machine translation, and I look forward to integrating syntactic information into phrase-based machine translation.
Before joining LTI in 2007, I worked on speech recognition, mainly on ASR decoders. Please refer to my CV for more details on my research.
Education
- Master's Student
  - Language Technologies Institute, Carnegie Mellon University
  - August 2007 – August 2009
  - GPA: 3.91
- Master of Engineering
  - National Key Laboratory for Machine Perception, Peking University
  - September 2004 – June 2007
  - GPA: 3.82
  - Diploma Thesis: Research and Implementation of Chinese Spoken Document Retrieval System
  - Graduated with First Honor
- Bachelor of Science
  - School of Mathematical Sciences, Peking University
  - Major: Mathematics, Scientific and Engineering Computing
  - September 2000 – June 2004
  - GPA: 3.25
  - Graduate Research: Automatic Spoken English Quality Evaluation System
  - Second Major: Economics
Software
Please visit http://geek.kyloo.net/software for a list of my software.
- Chaski: A software package for training phrase-based machine translation systems on Hadoop clusters. Together with MGIZA, it can train large-scale models in hours.
- MGIZA: Multi-threaded GIZA. It is an extended and optimized version of GIZA++ that runs multi-threaded and provides additional functionality and optimizations such as:
  - Resume training from previous models. You may restart training from any step, given a previous model.
  - Memory usage optimization. Duplicated tables in memory are eliminated, which may save hundreds of megabytes of memory. This is crucial for distributed alignment.
  - Integration with Chaski. This version is fully integrated with Chaski and can therefore run on Hadoop clusters. (Currently works only with Hadoop 0.20.1+.)
Publications
Journal Papers
- Qin Gao, Stephan Vogel. Training phrase-based machine translation models on the cloud: Open source machine translation toolkit Chaski. The Prague Bulletin of Mathematical Linguistics No. 93, 2010, pp. 37–46. ISBN 978-80-904175-4-0. doi: 10.2478/v10108-010-0004-8.
- Jonathan H. Clark, Jonathan Weese, Byung Gyu Ahn, Andreas Zollmann, Qin Gao, Kenneth Heafield, Alon Lavie. The Machine Translation Toolpack for LoonyBin: Automated Management of Experimental Machine Translation HyperWorkflows. The Prague Bulletin of Mathematical Linguistics No. 93, 2010, pp. 117–126. ISBN 978-80-904175-4-0. doi: 10.2478/v10108-010-0002-x.
Conference Papers
- Nguyen Bach, Qin Gao, Stephan Vogel, "Source-side Dependency Tree Reordering Models with Subtree Movements and Constraints", MT Summit XII, 2009.
- Francisco Guzman, Qin Gao, Stephan Vogel, "Reassessment of the Role of Phrase Extraction in SMT", MT Summit XII, 2009.
- Qin Gao, Stephan Vogel, "Parallel Implementations of Word Alignment Tool", Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 49-57, June, 2008.
- Nguyen Bach, Qin Gao, Stephan Vogel, "Improving Word Alignment with Language Model Based Confidence Scores", Proceedings of the Third Workshop on Statistical Machine Translation, pp. 151-154, June, 2008.
- Almut Silja Hildebrand, Kay Rottmann, Mohamed Noamany, Qin Gao, Sanjika Hewavitharana, Nguyen Bach, Stephan Vogel, "Recent Improvements in the CMU Large Scale Chinese-English SMT System", Proceedings of ACL-08: HLT, Short Papers, pp. 77-80, June, 2008.
- Qin Gao, Xiaojun Lin, Xihong Wu, "Just-in-time Latent Semantic Adaptation on Language Model for Chinese Speech Recognition Using Web Data", International Workshop on Spoken Language Technology (SLT), pp. 50-53, September, 2006.
- Runqiang Han, Pei Zhao, Qin Gao, Zhiping Zhang, Hao Wu, Xihong Wu, "CASA Based Speech Separation for Robust Speech Recognition", International Conference on Spoken Language Processing (ICSLP), pp. 78-81, September, 2006.
Slides & Reports
- Large Scale Machine Translation Architectures, Reading report for Advanced Machine Translation Seminar, Spring 2008
- Parallelizing the Training Procedure of Statistical Phrase-based Machine Translation, Student Research Symposium, 2008
My Gadgets
(From oldest to newest)
- Palm Treo 650 (Used to be my phone, my notebook, and my Game Boy emulator)
- N800 (My favorite. Although I do not carry it around, I use it to read Wikipedia every night, and it killed the Treo with GarnetVM)
- MIO 520 (PNA, running MIO Mapper as well as iGo 8; hacked and used as a Skype phone when I forget to bring the N800. An SDIO Wi-Fi card is required, of course)
- EeePC 1000h (It is a laptop, but small enough to replace most gadgets. I am now playing around with it, though it is less fun with Windows installed)
Some links about me
Well, I have to say that information from the media is not always accurate. I must clarify that although I work on cloud computing, I do not work directly on the TransTac project, which does English-Iraqi translation.
The Tartan Online: CMU does research on cloud computing
Pittsburgh TRIBUNE-REVIEW: CMU pushes into frontiers of 'cloud computing'
CMU works on processing remote computer data
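Pursuing Excellence, Achieving Great Results: Peking University Wins the "Winner's Cup" at the 9th "Challenge Cup" Academic Science and Technology Works Competition (a Chinese-language article about my work on TV caption alignment)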