We evaluate ExDisco by comparing the performance of discovered patterns against that of manually constructed systems on actual extraction tasks.

0 Introduction

Information extraction is the selective extraction of specified types of information from natural language text. The information to be extracted may consist of particular semantic classes of objects (entities), relationships among these entities, and events in which these entities participate. The extraction system places this information into a database for retrieval and subsequent processing.

In this paper we shall be concerned primarily with the extraction of information about events. In the terminology which has evolved from the Message Understanding Conferences (MUC, 1995; MUC, 1993), we shall use the term subject domain to refer to a broad class of texts, such as business news, and the term scenario to refer to the specification of the particular events to be extracted. For example, the "Management Succession" scenario for MUC-6, which we shall refer to throughout this paper, involves information about corporate executives starting and leaving positions.

The fundamental problem we face in porting an extraction system to a new scenario is to identify the many ways in which information about a type of event may be expressed in the text. Typically, there will be a few common forms of expression which will quickly come to mind when a system is being developed. However, the beauty of natural language (and the challenge for computational linguists) is that there are many variants which an imaginative writer can use, and which the system needs to capture. Finding these variants may involve studying very large amounts of text in the subject domain. This has been a major impediment to the portability and performance of event extraction systems.
We present in this paper a new approach to finding these variants automatically from a large corpus, without the need to read or annotate the corpus. This approach has been evaluated on actual event extraction scenarios.

In the next section we outline the structure of our extraction system, and describe the discovery task in the context of this system.

Automatic Acquisition of Domain Knowledge for Information Extraction

Roman Yangarber, Ralph Grishman
Courant Institute of Mathematical Sciences
New York University
{roman|grishman}@cs.nyu.

Pasi Tapanainen
Conexor oy
Helsinki, Finland