Confidence Estimation for Machine Translation

We present a detailed study of confidence estimation for machine translation. Various methods for determining whether MT output is correct are investigated, for both whole sentences and individual words. Since the notion of correctness is not intuitively clear in this context, we propose several ways of defining it. We report results on data from the NIST 2003 Chinese-to-English MT evaluation. We introduce a sentence-level QE (quality estimation) system in which a threshold on the confidence score is used to classify MT output as good or bad, and we study sentence- and word-level features for translation error prediction.
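To make the threshold-based classification concrete, here is a minimal sketch, assuming per-sentence confidence scores in [0, 1]; the scores, the threshold value, and the function name are illustrative assumptions, not the paper's actual model or features.

```python
# Minimal sketch of threshold-based sentence-level QE classification.
# Scores and threshold below are hypothetical, not taken from the paper.

def classify_sentences(confidence_scores, threshold=0.5):
    """Label each MT output sentence 'good' if its confidence
    score meets the threshold, 'bad' otherwise."""
    return ["good" if score >= threshold else "bad"
            for score in confidence_scores]

# Hypothetical per-sentence confidence estimates.
scores = [0.91, 0.42, 0.73, 0.18]
print(classify_sentences(scores, threshold=0.5))
# -> ['good', 'bad', 'good', 'bad']
```

In practice, the threshold would be tuned on held-out data to trade off precision against recall for the "good" class.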