User:Alvations/Semeval-unwikified

SemEval
Academics
Disciplines:	Natural Language Processing ; Computational Linguistics ; Semantics
Umbrella Organization:	ACL-SIGLEX
Workshop Overview
Founded (Origin):	1998 (Senseval)
Latest:	SemEval-2 ; Summer 2010 (Ended) ; ACL @ Uppsala, Sweden
Upcoming:	SemEval-3 ; Summer 2012 (tentative) ; ACL @ Jeju Island, Korea
History
Senseval-1	1998 @ Sussex
Senseval-2	2001 @ Toulouse
Senseval-3	2004 @ Barcelona
SemEval-1 / Senseval-4	2007 @ Prague
SemEval-2	2010 @ Uppsala

SemEval (originally Senseval) is a series of workshops conducted to evaluate semantic analysis systems. Traditionally, computational semantic analysis focused on Word Sense Disambiguation (WSD) tasks. WSD is an open problem in natural language processing: identifying which sense (i.e. meaning) of a word is used in a sentence when the word has multiple meanings (polysemy).
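To make the WSD problem concrete, the following is a minimal sketch of one simple approach: pick the sense whose dictionary gloss shares the most words with the context (a simplified form of the Lesk algorithm, used here only as an illustration and not as the method of any Senseval system). The sense labels, glosses and stopword list are hypothetical toy data.

```python
from typing import Dict, Set

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "to", "and", "that", "by", "at", "is", "he", "she"}

# Hypothetical two-sense inventory for the word "bank" (toy glosses).
SENSES: Dict[str, str] = {
    "bank/finance": "an institution that accepts deposits of money and lends money",
    "bank/river": "sloping land beside a river or lake",
}


def content_words(text: str) -> Set[str]:
    """Lower-case the text, strip basic punctuation and drop stopwords."""
    return {w.strip(".,;").lower() for w in text.split()} - STOPWORDS


def disambiguate(context: str, senses: Dict[str, str]) -> str:
    """Return the sense whose gloss overlaps most with the context words."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))


print(disambiguate("She sat on the bank of the river and watched the water", SENSES))
print(disambiguate("He went to the bank to deposit money", SENSES))
```

The first sentence shares "river" with the second gloss and the second sentence shares "money" with the first gloss, so the two occurrences of "bank" receive different senses.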

ACL-SIGLEX (the Special Interest Group on the Lexicon of the Association for Computational Linguistics) is the umbrella organization for the SemEval semantic evaluations and the Senseval word-sense evaluation exercises. The first three evaluation workshops, Senseval-1, Senseval-2 and Senseval-3, focused on Word Sense Disambiguation (WSD) systems. Senseval later became SemEval, a series of evaluation exercises for semantic annotation involving a much larger and more diverse set of tasks.^[1] Beginning with the fourth workshop, SemEval-1, the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation.

The framework of the SemEval/Senseval evaluation workshops emulates the Message Understanding Conferences (MUCs) and other evaluation workshops run by ARPA (the Advanced Research Projects Agency, later renamed the Defense Advanced Research Projects Agency, DARPA).

Stages of SemEval/Senseval evaluation workshops^[2]

First, all likely participants were invited to express their interest and participate in the exercise design.
A timetable towards a final workshop was worked out.
A plan for selecting evaluation materials was agreed.
'Gold standards' for the individual tasks were acquired; human annotations were often taken as the gold standard against which the precision and recall of computer systems were measured. These gold standards are what the computational systems strive towards. (In WSD tasks, human annotators were given the task of generating a set of correct WSD answers, i.e. the correct sense for a given word in a given context.)
The gold standard materials, without answers, were released to participants, who then had a short time to run their programs over them and return their sets of answers to the organizers.
The organizers then scored the answers, and the scores were announced and discussed at a workshop.
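The scoring step above can be sketched as follows. This is a minimal illustration, assuming that precision is the share of attempted instances answered correctly and recall is the share of all gold-standard instances answered correctly; the instance identifiers and sense labels are hypothetical, and the function is not the official scoring software.

```python
from typing import Dict, Tuple


def score_answers(gold: Dict[str, str], answers: Dict[str, str]) -> Tuple[float, float]:
    """Return (precision, recall) of system answers against a gold standard.

    precision = correct / attempted, recall = correct / all gold instances.
    """
    # Only answers for instances that exist in the gold standard are counted.
    attempted = [inst for inst in answers if inst in gold]
    correct = sum(1 for inst in attempted if answers[inst] == gold[inst])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard produced by human annotators.
gold = {
    "bank.1": "bank%finance",
    "bank.2": "bank%river",
    "bank.3": "bank%finance",
    "bank.4": "bank%river",
}
# Hypothetical system output; the system declined to answer "bank.4".
system = {
    "bank.1": "bank%finance",
    "bank.2": "bank%finance",
    "bank.3": "bank%finance",
}

precision, recall = score_answers(gold, system)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A system that skips instances it is unsure about can raise its precision at the cost of recall, which is why both figures are reported.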

History

"-Eval" Etymology

"-Eval" is a fairly recent morpheme used in the names of conferences, workshops and algorithms related to computational evaluation. The "-Eval" innovation originates from an evaluation metric for computational grammar systems: the Grammar Evaluation Interest Group (GEIG) evaluation metric, also known as the Parseval metric,^[3] a blend of grammatical "pars"ing and system "eval"uation. A series of well-intended puns then motivated the popular use of the "-eval" morpheme:

Parseval recalls Percival (the more common spelling), one of King Arthur's legendary Knights of the Round Table, whose involvement in the quest for the Holy Grail symbolizes computational linguists' ultimate quest for computers to understand natural language.

Parseval also coincides with Parseval's theorem (a theorem on Fourier series that most computer scientists are familiar with).

Pre-WSD evaluations

From the earliest days, assessing the quality of WSD algorithms had been primarily a matter of intrinsic evaluation, and “almost no attempts had been made to evaluate embedded WSD components”.^[4] Only very recently have extrinsic evaluations begun to provide some evidence for the value of WSD in end-user applications.^[5] Until 1990 or so, discussions of the sense disambiguation task focused mainly on illustrative examples rather than comprehensive evaluation. The early 1990s saw the beginnings of more systematic and rigorous intrinsic evaluations, including more formal experimentation on small sets of ambiguous words.^[6]

Senseval to SemEval

In April 1997, a workshop entitled Tagging with Lexical Semantics: Why, What, and How? was held in conjunction with the Conference on Applied Natural Language Processing.^[7] At the time, there was a clear recognition that manually annotated corpora had revolutionized other areas of NLP, such as part-of-speech tagging and parsing, and that corpus-driven approaches had the potential to revolutionize automatic semantic analysis as well.^[8] Kilgarriff recalls that there was “a high degree of consensus that the field needed evaluation,” and several practical proposals by Resnik and Yarowsky kicked off a discussion that led to the creation of the Senseval evaluation exercises.^[9]

Senseval-1 took place in the summer of 1998 for English, French, and Italian, culminating in a workshop held at Herstmonceux Castle, Sussex, England on September 2–4.

Senseval-2 took place in the summer of 2001, and was followed by a workshop held in July 2001 in Toulouse, in conjunction with ACL 2001. Senseval-2 included tasks for Basque, Chinese, Czech, Danish, Dutch, English, Estonian, Italian, Japanese, Korean, Spanish, and Swedish.

Senseval-3 took place in March–April 2004, followed by a workshop held in July 2004 in Barcelona, in conjunction with ACL 2004. Senseval-3 included 14 different tasks for core word sense disambiguation, as well as identification of semantic roles, multilingual annotations, logic forms, and subcategorization acquisition.

SemEval-1/Senseval-4 took place in 2007, followed by a workshop held in conjunction with ACL in Prague. SemEval-1 included 18 different tasks targeting the evaluation of systems for the semantic analysis of text.

SemEval-2 took place in 2010, followed by a workshop held in conjunction with ACL in Uppsala. SemEval-2 included 18 different tasks targeting the evaluation of semantic analysis systems.

Senseval & SemEval Tasks

Senseval-1 and Senseval-2 focused on evaluating WSD systems for major languages for which corpora and computerized dictionaries were available. Senseval-3 looked beyond the lexeme and started to evaluate systems addressing wider areas of semantics, viz. semantic roles (technically known as theta roles in formal semantics) and logic form transformation (the semantics of phrases, clauses or sentences is commonly represented in first-order logic forms); Senseval-3 also explored the performance of semantic analysis in machine translation.

As the variety of computational semantic systems grew beyond the coverage of WSD, Senseval evolved into SemEval, in which more aspects of computational semantic systems were evaluated. The tables below (1) reflect the growth of the workshops from Senseval to SemEval and (2) give an overview of which areas of computational semantics were evaluated across the Senseval/SemEval workshops.

Senseval & SemEval Tasks Overview

Workshop	No. of Tasks	Areas of study	Languages of Data Evaluated
Senseval-1	3	Word Sense Disambiguation (WSD) - Lexical Sample WSD tasks	English, French, Italian
Senseval-2	12	Word Sense Disambiguation (WSD) - Lexical Sample, All Words, Translation WSD tasks	Basque, Chinese, Czech, Danish, Dutch, English, Estonian, Italian, Japanese, Korean, Spanish, Swedish
Senseval-3	16 (including 2 cancelled tasks)	Logic Form Transformation, Machine Translation (MT) Evaluation, Semantic Role Labelling, WSD	Basque, Catalan, Chinese, English, Italian, Romanian, Spanish
SemEval-1	19 (including 1 cancelled task)	Cross-lingual, Frame Extraction, Information Extraction, Lexical Substitution, Lexical Sample, Metonymy, Semantic Annotation, Semantic Relations, Semantic Role Labelling, Sentiment Analysis, Time Expression, WSD	Arabic, Catalan, Chinese, English, Spanish, Turkish
SemEval-2	18 (including 1 cancelled task)	Coreference, Cross-lingual, Ellipsis, Information Extraction, Lexical Substitution, Metonymy, Noun Compounds, Parsing, Semantic Relations, Semantic Role Labeling, Sentiment Analysis, Textual Entailment, Time Expressions, WSD	Catalan, Chinese, Dutch, English, French, German, Italian, Japanese, Spanish

Areas of Evaluation

Areas of Study	Brief Description	Senseval-1	Senseval-2	Senseval-3	SemEval-1	SemEval-2
Coreference	Co-reference occurs when multiple expressions in a sentence or document refer to the same thing; or in linguistic jargon, they have the same "referent". The main goal is to perform and evaluate coreference resolution for six different languages with the help of other layers of linguistic information and using different evaluation metrics (MUC, B-CUBED, CEAF and BLANC).					✓
Cross-Lingual	The goal of this task is to provide a framework for the evaluation of systems for cross-lingual lexical substitution. Given a paragraph and a target word, the goal is to provide several correct translations for that word in a given language, with the constraint that the translations fit the given context in the source language.				✓	✓
Ellipsis	Verb Phrase Ellipsis (VPE) occurs in the English language when an auxiliary or modal verb abbreviates an entire verb phrase recoverable from the linguistic context. The study is envisioned in two subtasks: (1) automatically detecting VPE in free text; and (2) selecting the textual antecedent of each found VPE.					✓
Keyphrase Extraction (Information Extraction)	Keyphrases are words that capture the main topic of the document. The systems' goal is to produce the keyphrases for each article, given a set of scientific articles.					✓
Metonymy	Metonymy is a figure of speech used in rhetoric in which a thing or concept is not called by its own name. The goal is to identify whether the entity in that argument position satisfies the type expected by the predicate, given an argument of a predicate.				✓	✓
Noun Compounds	A noun compound is a sequence of nouns acting as a single noun. Given a compound and a set of paraphrasing verbs and prepositions, the participants' goal is to provide a ranking that is as close as possible to the one proposed by human raters.					✓
Semantic Relations	The goal is to improve deep semantic analysis through automatic recognition of semantic relations between pairs of words.				✓	✓
Semantic Role Labeling	The goal is to take Semantic role labelling (SRL) of nominal and verbal predicates beyond the domain of isolated sentences by linking local semantic argument structures to the wider discourse context.			✓	✓	✓
Sentiment Analysis	The basic task in sentiment analysis^[10] is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence or an entity feature/aspect is positive, negative or neutral.				✓	✓
Time Expression	The goal is to identify the temporal structure of the text by (i) identification of events, (ii) identification of time expressions and (iii) identification of temporal relations.				✓	✓
Textual Entailment	Entailment is the relationship between two sentences where the truth of one (A) requires the truth of the other (B). The aim is to train and evaluate semantic parsers using textual entailments. "Correct parse decisions are captured by textual entailments; thus systems are to decide which entailments are implied based on the parser output only, i.e. there will be no need for lexical semantics, anaphora resolution etc." ^[11]					✓
Word Sense Disambiguation	WSD requires two things: a dictionary to specify the senses to be disambiguated and a corpus of language data to be disambiguated (in some methods, a training corpus of language examples is also required). The goal is to develop computational algorithms that replicate the human ability to identify the correct meaning (sense) of a word in a given context.	✓	✓	✓	✓	✓

Senseval-1

Senseval-1 was the first attempt to run an ARPA-like competition between WSD systems. It was held under the auspices of ACL-SIGLEX, EURALEX (European Association for Lexicography), ELSNET, ECRAN (Extraction of Content Research At Near market) and SPARKLE (Shallow Parsing and Knowledge extraction for Language Engineering). There were two variants of computational WSD tasks, viz. "all-words" and "lexical-sample". In the all-words variant, participating systems have to disambiguate all words (or all open-class words) in a set of texts. In the lexical-sample variant, a sample of words is selected first; then, for each sample word, a number of corpus instances are selected. Participating systems then have to disambiguate just the sample-word instances.
For Senseval-1, the lexical-sample variant was chosen for the following reasons:^[12]

Cost-effectiveness of producing "gold standards" (human annotation of sense tags).
Unavailability of a full dictionary at low or no cost.
Many systems interested in participating were not ready for the all-words task.
The lexical-sample task would be more informative about the strengths and failings of WSD research at that point in time. (The all-words task would provide too little data about the problems presented by any particular word.)
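The difference between the two task variants described above can be sketched in terms of which tokens become test instances. This is only an illustration: the closed-class word list and the two-sentence corpus are hypothetical, and real exercises would rely on part-of-speech information rather than a word list.

```python
from typing import List, Set, Tuple

Instance = Tuple[int, int, str]  # (sentence index, token index, word)

# Hypothetical closed-class word list used to approximate "open-class words".
CLOSED_CLASS = {"the", "a", "of", "on", "to", "in", "and", "was"}


def all_words_instances(sentences: List[List[str]]) -> List[Instance]:
    """All-words variant: every open-class token is a test instance."""
    return [
        (i, j, word)
        for i, sentence in enumerate(sentences)
        for j, word in enumerate(sentence)
        if word.lower() not in CLOSED_CLASS
    ]


def lexical_sample_instances(sentences: List[List[str]], sample: Set[str]) -> List[Instance]:
    """Lexical-sample variant: only occurrences of the sampled words are test instances."""
    return [
        (i, j, word)
        for i, sentence in enumerate(sentences)
        for j, word in enumerate(sentence)
        if word.lower() in sample
    ]


corpus = [
    "The bank raised interest rates".split(),
    "Children played on the bank of the river".split(),
]

print(len(all_words_instances(corpus)))
print(lexical_sample_instances(corpus, {"bank"}))
```

With the sample restricted to "bank", only two instances need gold-standard sense tags, whereas the all-words variant requires a tag (and therefore dictionary coverage) for every open-class token.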

Senseval-1 Tasks

Tasks no.	Senseval-1 Tasks	Description	Languages
01 - 03	Lexical Sample	The lexicon was first sampled, then instances in context of the sample words were found and the evaluation was on those instances only.	English, French, Italian

Senseval-2

Senseval-2 evaluated WSD systems on three types of task over 12 languages. In the "all-words" task, the evaluation was on almost all of the content words in a sample of texts. In the "lexical sample" task, a sample of the lexicon was first selected, then corpus instances of the sample words were selected, and WSD systems competed to disambiguate the sense in these instances. In the "translation" task (Japanese only), senses corresponded to distinct translations of a word into another language.

Senseval-2 Tasks

Tasks no.	Senseval-2 Tasks	Description	Languages
01 - 04	All-words	The evaluation of word sense disambiguation was on almost all of the content words in a sample of texts.	Czech, Dutch, English, Estonian
05 - 11	Lexical sample	The lexicon was first sampled, then instances in context of the sample words were found and the evaluation was on those instances only.	Basque, Chinese, Danish, English, Italian, Japanese, Korean, Spanish, Swedish
12	Translation	In the translation task, the senses corresponded to distinct translations of a word into another language, as opposed to corpus instances of the words as in the "all-words" and "lexical sample" tasks.	Japanese
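The idea behind the translation task, that distinct translations of a word can play the role of its senses, can be sketched as follows. The example is only an illustration: the English-French word pairs are hypothetical and are not drawn from the Senseval-2 Japanese data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical examples: (sentence containing "bank", translation chosen for "bank").
aligned: List[Tuple[str, str]] = [
    ("He deposited cash at the bank", "banque"),
    ("They walked along the bank", "rive"),
    ("The bank approved the loan", "banque"),
]


def build_inventory(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group example sentences by the translation of the target word.

    Each distinct translation acts as one sense label.
    """
    inventory: Dict[str, List[str]] = defaultdict(list)
    for sentence, translation in pairs:
        inventory[translation].append(sentence)
    return dict(inventory)


inventory = build_inventory(aligned)
print(sorted(inventory))
```

Under this view a system is correct when it selects the translation that a human translator would choose for the word in that context.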

Senseval-3

Senseval-3 was a follow-up to Senseval-1 and Senseval-2. It included 14 different tasks for core word sense disambiguation, as well as identification of semantic roles, multilingual annotations, logic forms, and subcategorization acquisition.

Senseval-3 Tasks

Tasks no.	Senseval-3 Tasks	Description	Languages
01 - 02	All words	The evaluation of word sense disambiguation was on almost all of the content words in a sample of texts.	English, Italian
03 - 09, 15 (cancelled)	Lexical Sample	The lexicon was first sampled, then instances in context of the sample words were found and the evaluation was on those instances only.	Basque, Catalan, Chinese, English, Italian, Romanian, Spanish, Swedish (cancelled)
10	Automatic subcategorization acquisition	This task involved evaluating word sense disambiguation (WSD) systems in the context of automatic subcategorization acquisition.	English
11	Multilingual lexical sample	The task was very similar to the lexical sample task, except that, rather than using the sense inventory from a dictionary, it used the translations of the target words into a second language as the "inventory".	English-French, English-Hindi
12	WSD of WordNet glosses	In this task, systems sense-tagged WordNet glosses automatically; all hand-tagged glosses from eXtended WordNet served as the test set, with the hand-tagging also serving as the gold standard for evaluation. The task was performed as an "all-words" task, except that no context was provided.	English
13	Semantic Roles	This task called for the development of systems for the automatic labeling of semantic roles.^[13]	English
14	Logic Forms	This task was complementary to the mainstream task in Senseval. The goal was to transform English sentences into a first order logic notation.	English
16	Semantic Role Identification	(cancelled task)	Swedish

SemEval-1

Beginning with the fourth workshop, SemEval-2007 (SemEval-1), the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation. SemEval-1 included 18 different tasks targeting the evaluation of systems for the semantic analysis of text. The tasks were more elaborate than those of Senseval, as they cut across different areas of study in NLP.

SemEval-1 Tasks

Tasks no.	SemEval-1 Tasks	Area of Study	Description	Languages
01.	Evaluating WSD on Cross Language Information Retrieval	Cross-lingual, Information Retrieval, WSD	This was an application-driven task, where the application was a fixed cross-lingual information retrieval system.	English
02.	Evaluating Word Sense Induction and Discrimination Systems	Word Sense Induction	The goal of this task was to allow for comparison across sense-induction and discrimination systems, and also to compare these systems to other supervised and knowledge-based systems.	English
03.	Pronominal Anaphora Resolution in the Prague Dependency Treebank 2.0 (cancelled task)	Anaphora	(cancelled task)	Czech (cancelled)
04.	Classification of Semantic Relations between Nominals	Semantic relations	The goal of this task was the classification of semantic relations between simple nominals (nouns or base noun phrases) other than named entities; "honey bee", for example, shows an instance of the Product-Producer relation.	English
05.	Multilingual Chinese-English Lexical Sample Task	Cross-lingual, WSD-lexical sample	The goal of this task was to create a framework for the evaluation of word sense disambiguation in Chinese-English machine translation systems.	Chinese, English
06.	Word-Sense Disambiguation of Prepositions	WSD	The task was carried out in the same manner as previous Senseval lexical sample tasks, following the same evaluation methodology (including the use of the same evaluation scripts, with sense tagging available for both fine-grained and coarse-grained disambiguation).	English
07.	Coarse-grained English all-words	WSD-coarse grained	This was a coarse-grained English all-words WSD task. One of the major obstacles to effective WSD is the fine granularity of the adopted computational lexicon: the lexicon often encodes sense distinctions that are too subtle even for human annotators.^[14]	English
08.	Metonymy Resolution at SemEval-2007	Metonymy	The task was a lexical sample task for English. Participants had to automatically classify preselected expressions of a particular semantic class (such as country names) as having a literal or a metonymic reading, given a four-sentence context.	English
09.	Multilevel Semantic Annotation of Catalan and Spanish	Semantic Annotation, Cross-lingual	In this task, the aim was to evaluate and compare automatic systems for semantic annotation at several levels for the Catalan and Spanish languages.	Catalan, Spanish
10.	English Lexical Substitution Task for SemEval-2007	Lexical Substitution	A substitution task in which both annotators and systems had to find a substitute for the target word in the test sentence.	English
11.	English Lexical Sample Task via English-Chinese Parallel Text	WSD-Lexical Sample, Cross-lingual	It was an English lexical sample task for word sense disambiguation (WSD), where the sense-annotated examples were (semi)-automatically gathered from word-aligned English-Chinese parallel texts.	English, Chinese
12.	Turkish Lexical Sample Task	WSD-Lexical Sample	This was a Turkish WSD-Lexical Sample Task. The lexicon was first sampled, then instances in context of the sample words were found and the evaluation was on those instances only.	Turkish
13.	Web People Search	WSD-Named Entity Recognition	This task focused on the disambiguation of person names in a Web search scenario.	English
14.	Affective Text	WSD, Sentiment Analysis	The goal of this task was to explore the connection between emotions and lexical semantics. Given short texts (news headlines), the objective was to annotate each text for emotions using a predefined list of emotions (e.g. joy, fear, surprise), and/or for polarity orientation (positive/negative).	English
15.	TempEval: A proposal for Evaluating Time-Event Temporal Relation Identification	Time Expression	Text comprehension involves the capability to identify the events described in a text and locate them in time. This task was to identify event-time and event-event temporal relations in texts.	English
16.	Evaluation of wide coverage knowledge resources	WSD	The goal of this task was to measure the relative quality of the knowledge resources submitted for the task by performing an indirect evaluation by using all the resources delivered as Topic Signatures (TS).	English
17.	English Lexical Sample, English SRL and English All-Words Tasks	WSD-Lexical Sample, WSD-All Words	This task consisted of lexical-sample-style training and testing data for 35 nouns and 65 verbs in the WSJ Penn Treebank II as well as the Brown corpus.	English
18.	Arabic Semantic Labeling	Semantic Role Labelling	The tasks spanned both the WSD and semantic role labeling processes. Both sets of tasks were evaluated on data derived from the same data set, the test set.	Arabic
19.	Frame Semantic Structure Extraction	Semantic Relation	This task consisted of recognizing words and phrases that evoke semantic frames of the sort defined in the FrameNet project (http://framenet.icsi.berkeley.edu), and their semantic dependents, which were usually, but not always, their syntactic dependents (including subjects).	English
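Tasks 06 and 07 in the table above distinguish fine-grained from coarse-grained disambiguation. The sketch below illustrates the difference in scoring, assuming that coarse-grained scoring maps each fine-grained sense to a cluster before comparing answers; the sense identifiers and clusters are hypothetical, and this is not the evaluation script used at SemEval-1.

```python
from typing import Dict, Optional

# Hypothetical fine-grained sense ids mapped to hypothetical coarse clusters.
COARSE: Dict[str, str] = {
    "bank#1": "FINANCE",  # the financial institution
    "bank#2": "FINANCE",  # the building that houses it
    "bank#3": "LAND",     # sloping land beside water
}


def accuracy(gold: Dict[str, str], answers: Dict[str, str],
             to_coarse: Optional[Dict[str, str]] = None) -> float:
    """Share of gold instances answered correctly, optionally after sense clustering."""
    def norm(sense: str) -> str:
        # Without a mapping, senses are compared at the fine-grained level.
        return to_coarse.get(sense, sense) if to_coarse else sense

    correct = sum(
        1 for inst, sense in gold.items()
        if inst in answers and norm(answers[inst]) == norm(sense)
    )
    return correct / len(gold)


gold = {"d1": "bank#1", "d2": "bank#3", "d3": "bank#2"}
system = {"d1": "bank#2", "d2": "bank#3", "d3": "bank#2"}

fine = accuracy(gold, system)
coarse = accuracy(gold, system, COARSE)
print(f"fine-grained={fine:.2f} coarse-grained={coarse:.2f}")
```

The answer for "d1" is wrong at the fine-grained level but falls in the same cluster as the gold sense, so it counts as correct under coarse-grained scoring.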

SemEval-2

SemEval-2010 (SemEval-2) was the fifth workshop on semantic evaluation. SemEval-2 added tasks from new areas of study in computational semantics, viz. coreference, ellipsis, keyphrase extraction, noun compounds and textual entailment. The first three workshops, Senseval-1 through Senseval-3, focused on word sense disambiguation, each time growing in the number of languages offered in the tasks and in the number of participating teams. In the fourth workshop, SemEval-2007, the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation.

SemEval-2 Tasks

Tasks no.	SemEval-2 Tasks	Area of Study	Description	Languages
01.	Coreference Resolution in Multiple Languages	Coreference	This task was concerned with intra-document coreference resolution for six different languages. The complete task was divided into two subtasks for each of the languages: (1) detection of full coreference chains, composed of named entities, pronouns, and full noun phrases; (2) pronominal resolution, i.e. finding the antecedents of the pronouns in the text.	Catalan, Dutch, English, German, Italian, Spanish
02.	Cross-Lingual Lexical Substitution	Cross-Lingual, Lexical Substitution	The goal of this task was to provide a framework for the evaluation of systems for cross-lingual lexical substitution. Given a paragraph and a target word, the goal was to provide several correct translations for that word in a given language, with the constraint that the translations fit the given context in the source language.	English, Spanish
03.	Cross-Lingual Word Sense Disambiguation	Cross-lingual, WSD	This task was an unsupervised Word Sense Disambiguation task for English nouns by means of parallel corpora. The sense label was composed of translations in the different languages and the sense inventory was built by three annotators on the basis of the Europarl parallel corpus by means of a concordance tool.	Dutch, French, German, Italian, Spanish
04.	VP Ellipsis - Detection and Resolution	Ellipsis	Verb Phrase Ellipsis (VPE) occurs in the English language when an auxiliary or modal verb abbreviates an entire verb phrase recoverable from the linguistic context (e.g. "He spends his days [sketching passers-by] (antecedent), or trying to (VPE)"). The proposed shared task consisted of two subtasks: (1) automatically detecting VPE in free text; and (2) selecting the textual antecedent of each found VPE.	English
05.	Automatic Keyphrase Extraction from Scientific Articles	Information Extraction	Keyphrases are words that capture the main topic of the document. Participating systems were provided with a set of scientific articles and had to produce the keyphrases for each article.	English
06.	Classification of Semantic Relations between MeSH Entities in Swedish Medical Texts (cancelled task)	Information Extraction	(cancelled)	Swedish
07.	Argument Selection and Coercion	Metonymy	This task involved identifying the compositional operations involved in argument selection. The task was defined as follows: for each argument of a predicate, identify whether the entity in that argument position satisfies the type expected by the predicate.	English
08.	Multi-Way Classification of Semantic Relations Between Pairs of Nominals	Semantic Relations, Information Extraction	This was a deep semantic analysis task: automatically recognizing semantic relations between pairs of words. The task was designed to compare different approaches to the problem and to provide a standard testbed for future research, which can benefit many applications in Natural Language Processing.^[15]	English
09.	Noun Compound Interpretation Using Paraphrasing Verbs	Noun Compounds	Each noun compound comes with a set of paraphrasing verbs and prepositions that interpret it. Given the compound and the set of paraphrasing verbs and prepositions, the participants had to provide a ranking as close as possible to the one proposed by human raters.	English
10.	Linking Events and their Participants in Discourse	Semantic Role Labelling, Information Extraction	The task involved two subtasks, which were evaluated independently (participants could choose to enter either or both). For the Full Task, the target predicates in the test data set were annotated with gold standard word senses (frames). For the NIs-only task, participants were supplied with a test set that was already annotated with gold standard local semantic argument structure; only the referents for null instantiations (NIs) had to be found.	English
11.	Event Detection in Chinese News Sentences	Semantic Role Labelling, WSD	The goal of the task was to detect and analyze some basic event contents in real-world Chinese news texts. It consisted of finding key verbs or verb phrases that describe these events in the Chinese sentences after word segmentation and part-of-speech tagging, selecting a suitable situation description formula for them, and anchoring different situation arguments with suitable syntactic chunks in the sentence.	Chinese
12.	Parser Training and Evaluation using Textual Entailment	Textual Entailment	This was a targeted textual entailment task designed to train and evaluate parsers. The proposed task was desirable for several reasons: (1) entailments focus on the semantically meaningful parser decisions; (2) no formal system training was required.	English
13.	TempEval 2	Time Expression	Text comprehension requires the capability to identify the events described in a text and to locate them in time. The three subtasks of TempEval were relevant to understanding the temporal structure of a text: (i) identification of events, (ii) identification of time expressions and (iii) identification of temporal relations.	English
14.	Word Sense Induction	Word Sense Induction	Word Sense Induction (WSI) is defined as the process of identifying the different senses (or uses) of a target word in a given text in an automatic and fully unsupervised manner. The goal of this task was to allow comparison of unsupervised sense induction and disambiguation systems. A secondary outcome of this task was to provide a comparison with current supervised and knowledge-based methods for sense disambiguation. This task was a continuation of the WSI task in SemEval-1 with some significant changes to the evaluation setting.	English
15.	Infrequent Sense Identification for Mandarin Text to Speech Systems	WSD	This task was a little different from traditional WSD. The WSD methodology was applied to solve homograph ambiguity in grapheme-to-phoneme (GTP) conversion in text-to-speech (TTS) systems. In this task, two or more senses may correspond to one pronunciation; that is, the sense granularity was coarser than in WSD.	Chinese (Mandarin)
16.	Japanese WSD	WSD	This task can be considered an extension of the Senseval-2 Japanese Lexical Sample (monolingual dictionary-based) task. Word senses were defined according to the Iwanami Kokugo Jiten, a Japanese dictionary published by Iwanami Shoten.	Japanese
17.	All-words Word Sense Disambiguation on a Specific Domain (WSD-domain)	WSD	WSD systems trained on general corpora are known to perform worse when moved to specific domains. This task offered a testbed for domain-specific WSD systems and allowed domain portability issues to be tested.	Chinese, Dutch, English, Italian
18.	Disambiguating Sentiment Ambiguous Adjectives	WSD, Sentiment Analysis	Some adjectives are neutral in sentiment polarity out of context, but show positive, neutral or negative meaning within a specific context. Such words can be called dynamic sentiment ambiguous adjectives. This task aimed to create a benchmark dataset for disambiguating dynamic sentiment ambiguous adjectives.	Chinese

External links

Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL)
Semeval - Semantic Evaluation Workshop (endorsed by SIGLEX)
Senseval - international organization devoted to the evaluation of Word Sense Disambiguation Systems (endorsed by SIGLEX)
SemEval Portal on the Wiki of the Association for Computational Linguistics

References

[1] Agirre, E., Màrquez, L., & Wicentowski, R. (2009), Computational semantic analysis of language: SemEval-2007 and beyond. Language Resources and Evaluation 43(2):97–104.
[2] Kilgarriff, A. (1998), SENSEVAL: An Exercise in Evaluating Word Sense Disambiguation Programs. Proceedings of LREC, Granada, May 1998, 581–588.
[3] http://www.grsampson.net/RLeafAnc.html
[4] Palmer, M., Ng, H.T., & Dang, H.T. (2006), Evaluation of WSD systems, in Eneko Agirre & Phil Edmonds (eds.), Word Sense Disambiguation: Algorithms and Applications, Text, Speech and Language Technology, vol. 33. Amsterdam: Springer, 75–106.
[5] Resnik, P. (2006), WSD in NLP applications, in Eneko Agirre & Phil Edmonds (eds.), Word Sense Disambiguation: Algorithms and Applications. Dordrecht: Springer, 299–338.
[6] Yarowsky, D. (1992), Word-sense disambiguation using statistical models of Roget’s categories trained on large corpora. Proceedings of the 14th Conference on Computational Linguistics, 454–60. http://dx.doi.org/10.3115/992133.992140
[7] Palmer, M., & Light, M. (1999), ACL SIGLEX workshop on tagging text with lexical semantics: what, why, and how? Natural Language Engineering 5(2):i–iv.
[8] Ng, H.T. (1997), Getting serious about word sense disambiguation. Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How? 1–7.
[9] Resnik, P., & Lin, J. (2010), Evaluation of NLP Systems, in Alexander Clark, Chris Fox & Shalom Lappin (eds.), The Handbook of Computational Linguistics and Natural Language Processing. Wiley-Blackwell, 11:271.
[10] de Haaff, M. (2010), Sentiment Analysis, Hard But Worth It!, CustomerThink, retrieved 2010-03-12.
[11] http://semeval2.fbk.eu/semeval2.php?location=tasks
[12] Kilgarriff, A., & Rosenzweig, J. (2000), Framework and results for English SENSEVAL. Computers and the Humanities 34(1–2):15–48.
[13] Gildea, D., & Jurafsky, D. (2002), Automatic Labeling of Semantic Roles. Computational Linguistics 28(3):245–288.
[14] Edmonds, P., & Kilgarriff, A. (2002), Introduction to the Special Issue on Evaluating Word Sense Disambiguation Systems. Journal of Natural Language Engineering 8(4).
[15] Hendrickx, I., Kim, S.N., Kozareva, Z., Nakov, P., Ó Séaghdha, D., Padó, S., Pennacchiotti, M., Romano, L., & Szpakowicz, S. (2010), SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals. 5th SIGLEX Workshop.
