Word sense disambiguation is often cast as a problem in supervised learning, where a disambiguator is induced from a corpus of manually sense-tagged text using methods from statistics or machine learning. These approaches typically represent the context in which each sense-tagged instance of a word occurs with a set of linguistically motivated features. A learning algorithm induces a representative model from these features, which is then employed as a classifier to perform disambiguation.

This paper presents a corpus-based approach that achieves high accuracy by combining a number of very simple classifiers into an ensemble that performs disambiguation via a majority vote. The approach is motivated by the observation that enhancing the feature set or learning algorithm used in a corpus-based approach does not usually improve disambiguation accuracy beyond what can be attained with shallow lexical features and a simple supervised learning algorithm.

For example, a Naive Bayesian classifier (Duda and Hart, 1973) is based on a blanket assumption about the interactions among features in a sense-tagged corpus and does not learn a representative model. Despite making such an assumption, it proves to be among the most accurate techniques in comparative studies of corpus-based word sense disambiguation methodologies (e.g., Leacock et al., 1993; Mooney, 1996; Ng and Lee, 1996; Pedersen and Bruce, 1997). These studies represent the context in which an ambiguous word occurs with a wide variety of features. However, when the contribution of each type of feature to overall accuracy is analyzed (e.g., Ng and Lee, 1996), shallow lexical features such as co-occurrences and collocations prove to be stronger contributors to accuracy than deeper, linguistically motivated features such as part-of-speech and verb-object relationships.

It has also been shown that the combined accuracy of an ensemble of multiple classifiers is often significantly greater than that of any of the individual classifiers that make up the ensemble (e.g., Dietterich, 1997). In natural language processing, ensemble techniques have been successfully applied to part-of-speech tagging (e.g., Brill and Wu, 1998) and parsing (e.g., Henderson and Brill, 1999). Combined with a history of disambiguation success using shallow lexical features and Naive Bayesian classifiers, these findings suggest that word sense disambiguation might best be improved by combining the output of a number of such classifiers into an ensemble.

This paper begins with an introduction to the Naive Bayesian classifier. The features used to represent the context in which ambiguous words occur are presented, followed by the method for selecting the classifiers to include in the ensemble. Then the line and interest data are described. Experimental results disambiguating these words with an ensemble of Naive Bayesian classifiers are shown to rival previously published results. The paper closes with a discussion of the choices made in formulating this methodology and plans for future work.

This work extends ideas that began in collaboration with Rebecca Bruce and Janyce Wiebe. A preliminary version of this paper appears in (Pedersen, 2000).
A Naive Bayesian classifier assumes that all the feature variables representing a problem are conditionally independent given the value of a classification variable. In the ensemble presented here, each of the nine member classifiers votes for the most probable sense given the particular context represented by that classifier; the ensemble disambiguates by assigning the sense that receives a majority of the votes. This approach was evaluated using the widely studied nouns line and interest, which are disambiguated with accuracies of 88% and 89% respectively, rivaling the best previously published results.
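Under the conditional independence assumption, each member classifier assigns the sense that maximizes the product of the sense prior and the per-feature conditional probabilities. The following is the standard Naive Bayesian decision rule; the notation is ours rather than drawn verbatim from the sources cited above:

\[
\hat{s} \;=\; \operatorname*{arg\,max}_{s} \; P(s) \prod_{i=1}^{n} P(f_i \mid s)
\]

where s ranges over the senses of the ambiguous word and f_1, ..., f_n are the feature variables representing its context.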
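The majority-vote step itself is straightforward. Below is a minimal sketch of an ensemble of this kind, assuming scikit-learn-style bag-of-words Naive Bayesian members; the (2, 10, 50) window sizes, helper names, and data handling are illustrative assumptions rather than details taken from this paper:

```python
# Minimal sketch of a majority-vote ensemble of Naive Bayesian
# classifiers, each trained on a different window of context.
# Window sizes and helper names are illustrative assumptions.
from collections import Counter

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def window(tokens, position, left, right):
    """Words within an asymmetric window around the ambiguous word."""
    lo, hi = max(0, position - left), min(len(tokens), position + right + 1)
    return " ".join(tokens[lo:position] + tokens[position + 1:hi])

# One member per (left, right) configuration: nine members in all.
WINDOWS = [(l, r) for l in (2, 10, 50) for r in (2, 10, 50)]

def train_ensemble(instances, positions, senses):
    """instances: token lists; positions: index of the ambiguous word."""
    members = []
    for left, right in WINDOWS:
        contexts = [window(toks, pos, left, right)
                    for toks, pos in zip(instances, positions)]
        member = make_pipeline(CountVectorizer(binary=True), MultinomialNB())
        member.fit(contexts, senses)
        members.append((member, left, right))
    return members

def disambiguate(members, tokens, position):
    """Each member votes for its most probable sense; the ensemble
    assigns the sense that receives the most votes."""
    votes = Counter(member.predict([window(tokens, position, l, r)])[0]
                    for member, l, r in members)
    return votes.most_common(1)[0][0]
```

Each member sees only the co-occurring words in its own window, so the members make partially independent errors; the majority vote exploits this diversity rather than relying on any single context representation.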