A Practical Part-of-Speech Tagger

We present an implementation of a part-of-speech tagger based on a hidden Markov model (HMM). The methodology enables robust and accurate tagging with few resource requirements: only a lexicon and some unlabeled training text are required. Accuracy exceeds 96%. We describe implementation strategies and optimizations that result in high-speed operation. Three applications of tagging are described: phrase recognition, word sense disambiguation, and grammatical function assignment. Our semi-supervised model makes use of both labeled training text and some amount of unlabeled text. We train statistical models on unlabeled data with the expectation-maximization (EM) algorithm. We report high accuracy (96% on the Brown corpus) for unsupervised POS tagging with HMMs by exploiting hand-built tag dictionaries and equivalence classes.
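The abstract combines three ingredients: an HMM over tags, a lexicon (tag dictionary) that restricts which tags each word may take, and EM training on unlabeled text. The following is a minimal Python sketch of how these can fit together, assuming each word is replaced by its ambiguity class (the set of tags the lexicon allows), which is one common reading of "equivalence classes". The toy lexicon, the tag names, and the functions `amb_class`, `init_params`, `em_step`, and `viterbi` are hypothetical illustrations, not the tagger's actual implementation; the sketch also omits probability scaling, smoothing, and the speed optimizations the abstract mentions.

```python
import math
from collections import defaultdict

# Hypothetical toy tag dictionary (lexicon): word -> tags the word may take.
LEXICON = {
    "the": {"DET"}, "a": {"DET"},
    "dog": {"NOUN", "VERB"}, "man": {"NOUN", "VERB"},
    "can": {"AUX", "NOUN", "VERB"}, "run": {"NOUN", "VERB"},
}
TAGS = sorted({t for tags in LEXICON.values() for t in tags})


def amb_class(word):
    # Equivalence (ambiguity) class: the set of tags the lexicon allows.
    return frozenset(LEXICON[word])


def init_params(corpus):
    # Uniform start/transition probabilities; P(class | tag) is uniform over
    # the observed classes containing the tag and zero for all others.
    classes = {amb_class(w) for sent in corpus for w in sent}
    start = {t: 1.0 / len(TAGS) for t in TAGS}
    trans = {s: {t: 1.0 / len(TAGS) for t in TAGS} for s in TAGS}
    emit = {}
    for t in TAGS:
        allowed = [c for c in classes if t in c]
        emit[t] = {c: 1.0 / len(allowed) for c in allowed}
    return start, trans, emit


def forward_backward(obs, start, trans, emit):
    # Unscaled passes; adequate for short toy sentences (no underflow guard).
    alpha = [{t: start[t] * emit[t].get(obs[0], 0.0) for t in TAGS}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({t: emit[t].get(o, 0.0) * sum(prev[s] * trans[s][t] for s in TAGS)
                      for t in TAGS})
    beta = [{t: 1.0 for t in TAGS}]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {s: sum(trans[s][t] * emit[t].get(o, 0.0) * nxt[t] for t in TAGS)
                        for s in TAGS})
    return alpha, beta, sum(alpha[-1].values())


def em_step(corpus, start, trans, emit):
    # One Baum-Welch (EM) iteration on unlabeled text. Returns the re-estimated
    # parameters and the corpus log-likelihood under the *old* parameters.
    s_cnt = defaultdict(float)
    t_cnt = {s: defaultdict(float) for s in TAGS}
    e_cnt = {s: defaultdict(float) for s in TAGS}
    loglik = 0.0
    for sent in corpus:
        obs = [amb_class(w) for w in sent]
        alpha, beta, z = forward_backward(obs, start, trans, emit)
        loglik += math.log(z)
        for i, o in enumerate(obs):
            for s in TAGS:
                gamma = alpha[i][s] * beta[i][s] / z  # P(tag_i = s | sentence)
                e_cnt[s][o] += gamma
                if i == 0:
                    s_cnt[s] += gamma
                if i + 1 < len(obs):  # expected count of transition s -> t at i
                    for t in TAGS:
                        t_cnt[s][t] += (alpha[i][s] * trans[s][t] * emit[t].get(obs[i + 1], 0.0)
                                        * beta[i + 1][t] / z)
    new_start = {t: s_cnt[t] / len(corpus) for t in TAGS}
    new_trans, new_emit = {}, {}
    for s in TAGS:  # normalize; keep old rows for tags with zero expected count
        row, tot = sum(t_cnt[s].values()), sum(e_cnt[s].values())
        new_trans[s] = {t: t_cnt[s][t] / row for t in TAGS} if row > 0 else trans[s]
        new_emit[s] = {c: v / tot for c, v in e_cnt[s].items()} if tot > 0 else emit[s]
    return (new_start, new_trans, new_emit), loglik


def viterbi(sent, start, trans, emit):
    # Most probable tag sequence; each cell holds (probability, backpointer).
    obs = [amb_class(w) for w in sent]
    best = [{t: (start[t] * emit[t].get(obs[0], 0.0), None) for t in TAGS}]
    for o in obs[1:]:
        prev, row = best[-1], {}
        for t in TAGS:
            p, s = max((prev[s][0] * trans[s][t], s) for s in TAGS)
            row[t] = (p * emit[t].get(o, 0.0), s)
        best.append(row)
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]
```

In use, one would build `params = init_params(corpus)` from a list of tokenized sentences, call `params, ll = em_step(corpus, *params)` repeatedly until the log-likelihood `ll` stops improving, and then tag a sentence with `viterbi(sentence, *params)`. Because the model emits ambiguity classes rather than individual words, EM only has to estimate one emission probability per (tag, class) pair instead of one per (tag, word) pair, and the lexicon keeps every tag outside a word's class at zero probability.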