Semantic Modification of the Mitkov Algorithm for Anaphora Resolution
The article is dedicated to modern algorithm of pronominal anaphora resolution. Anaphora resolution should be considered in a wider range of problems related with language ambiguity resolution, for instance: entity recognition, reference analysis and in general case, of course, semantic analysis of...
Saved in:
| Published in: | Штучний інтелект |
|---|---|
| Date: | 2012 |
| Main Author: | |
| Format: | Article |
| Language: | English |
| Published: |
Інститут проблем штучного інтелекту МОН України та НАН України
2012
|
| Subjects: | |
| Online Access: | https://nasplib.isofts.kiev.ua/handle/123456789/57080 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Journal Title: | Digital Library of Periodicals of National Academy of Sciences of Ukraine |
| Cite this: | Semantic Modification of the Mitkov Algorithm for Anaphora Resolution / O.O. Marchenko // Штучний інтелект. — 2012. — № 3. — С. 106-110. — Бібліогр.: 7 назв. — англ. |
Institution
Digital Library of Periodicals of National Academy of Sciences of Ukraine| _version_ | 1859814605436485632 |
|---|---|
| author | Marchenko, O.O. |
| author_facet | Marchenko, O.O. |
| citation_txt | Semantic Modification of the Mitkov Algorithm for Anaphora Resolution / O.O. Marchenko // Штучний інтелект. — 2012. — № 3. — С. 106-110. — Бібліогр.: 7 назв. — англ. |
| collection | DSpace DC |
| container_title | Штучний інтелект |
| description | The article is dedicated to modern algorithm of pronominal anaphora resolution. Anaphora resolution should be considered in a wider range of problems related with language ambiguity resolution, for instance: entity recognition, reference analysis and in general case, of course, semantic analysis of natural language text. We can render conclusion from stated above that anaphora resolution is possible only on semantic level of natural language analysis. The main purpose of this work is development of semantic heuristics for finding the most probable antecedent corresponding to anaphora with analysis of sentence context. The proposed algorithm gives about 5% improvements in comparison to the standard Mitkov algorithm.
Робота присвячена аналізу алгоритму розв’язання займенникової анафори. Розв’язання анафори має бути розглянуто в рамках широкого кола проблем лінгвістичної неоднозначності, наприклад: розпізнавання сутностей тексту, аналіз посилань та, в загальному випадку, семантичний аналіз текстів природною мовою. Із зазначеного вище можна зробити висновок, що розв’язання анафори можливе лише на семантичному рівні аналізу природної мови. Головною метою цієї роботи є розробка семантичної евристики для пошуку найбільш імовірного антецедента, що відповідає анафорі, із застосуванням аналізу контексту речень. Запропонований алгоритм дає покращення близько 5% порівняно зі стандартним алгоритмом Міткова.
Работа посвящена анализу алгоритма решения местоименной анафоры. Решение анафоры должно быть рассмотрено в рамках широкого круга проблем лингвистической неоднозначности, например: распознавание сущностей текста, анализ ссылок и, в общем случае, семантический анализ текстов на естественном языке. Из указанного выше можно сделать вывод, что решение анафоры возможно только на семантическом уровне
анализа естественного языка. Главной целью этой работы является разработка семантической эвристики для поиска наиболее вероятного антецедента, соответствующего анафоре, с использованием анализа контекста предложений. Предложенная модификация алгоритма дает улучшение около 5% по сравнению со стандартным алгоритмом Миткова.
|
| first_indexed | 2025-12-07T15:20:58Z |
| format | Article |
| fulltext |
«Искусственный интеллект» 3’2012106
3М
УДК 681.3
O.O. Marchenko
Taras Shevchenko National University of Kyiv
Ukraine, 03680, Kyiv, Glushkova Ave., 4-d
Semantic Modification of the Mitkov Algorithm
for Anaphora Resolution
О.О. Марченко
Київський національний університет імені Тараса Шевченка
Україна, 03680, м. Київ, просп. Глушкова, 4-д
Семантична модифікація алгоритму Міткова
для розв’язання анафор
А.А. Марченко
Киевский национальный университет имени Тараса Шевченко
Украина, 03680, г. Киев, просп. Глушкова 4-д
Семантическая модификация алгоритма Миткова
для решения анафор
The article is dedicated to modern algorithm of pronominal anaphora resolution. Anaphora resolution should
be considered in a wider range of problems related with language ambiguity resolution, for instance: entity
recognition, reference analysis and in general case, of course, semantic analysis of natural language text. We
can render conclusion from stated above that anaphora resolution is possible only on semantic level of natural
language analysis. The main purpose of this work is development of semantic heuristics for finding the most
probable antecedent corresponding to anaphora with analysis of sentence context. The proposed algorithm
gives about 5% improvements in comparison to the standard Mitkov algorithm.
Key Words: natural language text processing, anaphora resolution, semantic analysis.
Робота присвячена аналізу алгоритму розв’язання займенникової анафори. Розв’язання анафори має
бути розглянуто в рамках широкого кола проблем лінгвістичної неоднозначності, наприклад:
розпізнавання сутностей тексту, аналіз посилань та, в загальному випадку, семантичний аналіз текстів
природною мовою. Із зазначеного вище можна зробити висновок, що розв’язання анафори можливе
лише на семантичному рівні аналізу природної мови. Головною метою цієї роботи є розробка
семантичної евристики для пошуку найбільш імовірного антецедента, що відповідає анафорі, із
застосуванням аналізу контексту речень. Запропонований алгоритм дає покращення близько 5%
порівняно зі стандартним алгоритмом Міткова.
Ключові слова: обробка текстів природною мовою, розв’язання анафори, семантичний аналіз.
Работа посвящена анализу алгоритма решения местоименной анафоры. Решение анафоры должно быть
рассмотрено в рамках широкого круга проблем лингвистической неоднозначности, например: распознавание
сущностей текста, анализ ссылок и, в общем случае, семантический анализ текстов на естественном языке.
Из указанного выше можно сделать вывод, что решение анафоры возможно только на семантическом уровне
анализа естественного языка. Главной целью этой работы является разработка семантической эвристики для
поиска наиболее вероятного антецедента, соответствующего анафоре, с использованием анализа контекста
предложений. Предложенная модификация алгоритма дает улучшение около 5% по сравнению со
стандартным алгоритмом Миткова.
Ключевые слова: обработка текстов на естественном языке, решение анафоры,
семантический анализ.
Semantic Modification of the Mitkov Algorithm for Anaphora Resolution
«Штучний інтелект» 3’2012 107
3М
Introduction
Existing rule-based algorithms for anaphora resolution based on analysis of syntax
properties have already reached their limit in quality aspects. Statistics shows that
probability of connecting right antecedent with anaphora is about 85-90% [1], [2]. Further
optimization is complicated by conflicts that occur due to big number of syntax rules.
Attempts to optimize coefficients that determine rules priorities (so that no conflicts will
occur) lead to decrease of probability for correct anaphora resolution [3]. Possible progress
is not significant (within 0.1-0.001%).
Anaphora reference resolution is impossible without semantics of candidates to
antecedent and analysis of how its semantic meaning consistent with semantic of words
that close neighbor to anaphora. As it has been stated before, the main purpose of this
article is creation of semantic heuristics for correct antecedent determination with usage of
semantic meaning of sentences. We decided to do it with modification of existing methods
through adding semantic rules into the Mitkov algorithm [2].
There are wide arrays of approaches for solving this problem. The modern
approaches are: Lapin and Leass algorithm, centering algorithm, Hobb’s algorithm, Mitkov
algorithm [1], [2]. Mitkov algorithm is known to be quite flexible and adjustable. Ordinary
realization of Mitkov algorithm doesn’t solve dubious cases and can cause wrong answer.
Semantic rules like “semantic triplet match” together with syntax restrictions can give us
improvement in ambiguous cases. This helps to extend a “bottleneck” of Mitkov algorithm.
In the next sections, we discuss our new resolution algorithm and statistics about some
resolution.
Mitkov algorithm
Mitkov algorithm to pronominal anaphora can be described as a set of rules that weight
candidates to antecedents and after that the best candidate is antecedent with greatest salience.
The set of rules is:
1. Definiteness: All defined nouns have weights +1, candidates that don’t have any
defined nouns : -1;
2. Giveness: candidates that represent the following topic: +1;
3. Indicator words: candidates after verbs: {discuss, present, illustrate, identify,
summarize, examine, describe, define} have +1;
4. Lexical reiteration: repeated candidates have salience +1 if they are repeated
once, +2 if they are repeated twice, and so on;
5. Non-pronominal phrases: candidates that enter NP have salience +1;
6. Collocation pattern preference: +2 to salience of candidates that have syntactic
position the same that a pronoun;
7. Connective pattern: in case like: “you V1 NP or con((you) V2 it) con ((you) V3
it)” NP candidates have +2 to it’s salience;
8. Reminder indicator: candidates that are reminded in previous sentences have +1,
in the same sentence : +2;
9. Field indicator: candidates that concern the same field that antecedent have +1;
10. Boost pronoun: candidates that have more references to pronouns have more
salience;
11. Syntax parallelism: in case of same syntactic position, candidates have +1;
12. Reference indicator: in case of the most referred antecedents, they have +1.
O.O. Marchenko
«Искусственный интеллект» 3’2012108
3М
А
Analysis and improvements of the Mitkov algorithm’s
The Mitkov algorithm is a rule-based approach. It is based on syntax rules that can conflict
with each other. The main "bottleneck" in this algorithm in some relations must be solved on
semantic level but this can not be done, because we have only syntax-based rules. In this case pro-
bability of finding right antecedent is very low. This problem can be solved only with semantic
rules implementation. If we try to extend the set of syntax rules this could only increase conflicts
between the rules and will lead to wrong antecedent as a result. We can also strengthen pronominal
anaphora algorithm with semantic measurement implementation. The main advantage of the
Mitkov algorithm is that it can be easily adjusted so that we can process a set of sentences. In our
realization we look backwards for four sentences.
Let us consider rule #6: Collocation match pattern. We can extend this rule adding
some semantic sense to it. We can increase probability of choosing right antecedent
modifying this indicator. For instance one of possible modifications can be done: we can
see that semantic position of candidate is not strictly the same as semantic position of
pronoun but close enough. Close enough means that semantic distance less or equal than
value that was specified before. In our case we used semantic distance by Leacock and
Chodorow: ( 1 , 2 ) log
( 1, 2 )
2 *
, where len is the number of edges on the
shortest path in the taxonomy between the two concepts(words) and MAX is the depth of
the taxonomy [4], [5].
Another possible modification is to create triplets in a form like: VERB verb NOUN
noun1 NOUN noun2. With this triplet we can use semantic distance for noun1 and noun2.
There can be more modifications done. They are building layer by layer so that chances of
finding right antecedent will raise a lot.
We can also use syntactic restrictions when composing weights for antecedents -
NPs.
Let us consider the following example:
“To avoid data loss on devices, we should avoid storage of critical information on them”.
Let’s take a look for possible outcome of our algorithm using the Mitkov approach
without any syntactic restrictions:
,1
,2
, 3
as we can see that our most probable antecedent is “data”. But if we use restriction like
“Anaphora can not refer on co-argument” [6], [7] then algorithm cut off “data” case, and
we can have “devices” as the most probable antecedent.
We can use syntactic restrictions like:
Pronoun P and noun phrase N are non-coreferential if any of the following conditions are hold:
1.P and N have incompatible agreement features.
2.P is in the arguments domain of N.
3.P is the adjunct domain of N.
4.P is an argument of head noun, N is not pronoun, and N is contained in head noun.
5.P is in the NP domain of N.
6.P is determiner of a noun Q, and N is contained in Q.
Main idea
The main concept is eliciting semantic information that concern anaphora and try to
find noun with a similar semantic context. In our algorithm we look backwards, up to five-
Semantic Modification of the Mitkov Algorithm for Anaphora Resolution
«Штучний інтелект» 3’2012 109
3М
six sentences, and try to find closest semantic triplet. In cases when antecedent is a
pronoun we can substitute anaphora that we are resolving with that pronoun and find
antecedent for new pronoun. When we will have a situation that antecedent is noun, we can
trace back to our original pronoun.
For instance let us consider example:
“John likes to solve extraordinary problems. Last set seams to be quite difficult, it took
almost all day him to solve them. His extraordinary talent gives him advantage, that’s why
he is obsessed in solving them”.
Let us try to solve anaphora for “them”:
First triplet: VERB(solving) NOUN1(them) NOUN2(he)
Seconds triplet: VERB(took) NOUN1(his) NOUN2(them)
Third triplet: VERB(solve)NOUN1(John)NOUN2(problems)
Together with semantic approach, pronoun substitution we can come that “he” refers
to “John” and “them” – to “problems”.
Let us consider another example:
“There was seen a tail of a fox. It has stolen a chicken. It was red, furry and with a white tip”.
In this case syntax structure is identical, and it’s impossible to determine with only
syntax rules antecedent for last “it”. With usage of semantic rules we can determine that
“tail” can not steal something, so first “it” is reference to “fox”. Second “it” cannot be
reference to a fox, because “tip” is not a property of a fox but of a tail. So second “it” will
correspond to “tail”.
Experiments
Here are some statistics demonstrated improvements of a new algorithm over the Mitkov
standard.
Fig. 1 – Improvements of the New Algorithm
On the graphic above the statistical information is presented: dark area is results of
the Mitkov algorithm without any modifications, light area is the results of the Mitkov
algorithm with modifications (semantic measures, semantic approach) on different types of
sentences. On Y axis, there is probability that antecedent correctly matched with pronoun.
On X axis, we can see the sets of sentences that were used to test and compare the
improved Mitkov algorithm and the standard Mitkov algorithm.
Type A sentences is complex sentences that can have more that one NP in them.
Type B is sequence of sentences, maximum length is two sentences. Type C is the sequen-
ce where maximum number of sentences is three, Type D is the sequence with the length
of four sentences. On each type, there were about ten sentences.
As we can see, modification gives us improvement about 4-5%, on every type of a sentence.
Conclusions
Analysis of existing algorithms has convinced us that it’s impossible to solve anaphora
resolution problem just by syntax means. Semantic rules are needed so that it’s possible to
determine which of candidates to antecedents is closer to words – context neighbours of
anaphora. Semantic rules were created so and added to the modern Mitkov algorithm.
O.O. Marchenko
«Искусственный интеллект» 3’2012110
3М
А
In current realization we were able to improve rule-based algorithm and we have provided
evidence with statistical data. Using of semantic approach allows us to solve some cases when in
syntax approach we have to use priority of rules that was determined in empirical way. With usage
of semantic rules we can determine antecedent more precisely.
Usage of semantic distance metrics that have been constructed on the base of global
ontology networks can give some improvements to procedure of context linkage of
candidate to antecedent to anaphora place.
References
1. Carbonell J.G. Anaphora: analysis, algorithms and applications / J.G. Carbonell, J. Siekmann. – Springer , 2007. –
P. 125-150.
2. Antonio Branco. Anaphora processing. Linguistic, cognitive and computational modelling (Book style) /
Antonio Branco Tony McEnergy, Ruslan Mitkov. – 2005. – ch 2, 3.
3. Chierchia G. Dynamic of Meaning: Anaphora, Presupposition and the Theory of Grammar / G. Chierchia. –
University of Chicago Press, 1995. – ch. 4.
4. Leacock C. Using Corpus Statistics and WordNet Relations for Sense Identification, Computational
Linguistics - Special issue on word sense disambiguation Claudia Leacock, George A. Miller , Martin
Chodorow – March 1998. – Vol. 24, Is.1. – P. 147-165.
5. Budanitsky Al. Semantic distance in WordNet: An experimental, application-oriented evaluation of five
measures, Workshop on wordnet and other lexical resources / Alexander Budanitsky, Graeme Hirst. – 2001.
6. Mitkov R. A New, Fully Automatic Version of Mitkov's Knowledge-Poor Pronoun Resolution Method.
CICLing '02 / Ruslan Mitkov, Richard Evans, and Constantin Orasan // Proc. of the Third International
Conference on Computational Linguistics and Intelligent Text Processing. – 2002. P. 168-186.
7. Patrick Sturt A New Look at the Syntax-Discourse Interface: The Use of Binding Principles in Sentence
Processing, Journal of psycholinguistic research,Volume 32, Number 2 (2003), pp.125-139.
RESUME
O.O. Marchenko
Semantic Modification of the Mitkov Algorithm
for Anaphora Resolution
The article is dedicated to modern algorithm of pronominal anaphora resolution. Anaphora
resolution should be considered in a wider range of problems related with language ambiguity
resolution, for instance: entity resolution, reference analysis and in general case, of course, semantic
analysis of natural language text. Analysis of existing algorithms has convinced us that it’s
impossible to solve anaphora resolution problem just by syntax means. We can render conclusion
from stated above that anaphora resolution is possible only on semantic level of natural language
analysis. The main purpose of this work is development of semantic heuristics for finding the most
probable antecedent corresponding to anaphora with analysis of sentence context.
Semantic rules are needed so that it’s possible to determine which of candidates to
antecedents is closer to words - context neighbours of anaphora. Semantic rules were created so
and added to modern Mitkov algorithm.
In current realization we were able to improve rule-based algorithm and we have provided
evidence with statistical data. Using semantic approach allows us to solve some cases when in
syntax approach we have to use priority of rules that was determined in empirical way. With usage
of semantic rules we can determine antecedent more precisely.
Usage of semantic distance metrics that have been constructed on the base of global ontology
networks can give some improvements to procedure of context linkage of candidate to antecedent
to anaphora place. The proposed algorithm gives about 5% improvements in comparison to the
standard Mitkov algorithm.
Статья поступила в редакцию 30.05.2012.
|
| id | nasplib_isofts_kiev_ua-123456789-57080 |
| institution | Digital Library of Periodicals of National Academy of Sciences of Ukraine |
| issn | 1561-5359 |
| language | English |
| last_indexed | 2025-12-07T15:20:58Z |
| publishDate | 2012 |
| publisher | Інститут проблем штучного інтелекту МОН України та НАН України |
| record_format | dspace |
| spelling | Marchenko, O.O. 2014-03-03T14:21:31Z 2014-03-03T14:21:31Z 2012 2012 Semantic Modification of the Mitkov Algorithm for Anaphora Resolution / O.O. Marchenko // Штучний інтелект. — 2012. — № 3. — С. 106-110. — Бібліогр.: 7 назв. — англ. 1561-5359 https://nasplib.isofts.kiev.ua/handle/123456789/57080 681.3 The article is dedicated to modern algorithm of pronominal anaphora resolution. Anaphora resolution should be considered in a wider range of problems related with language ambiguity resolution, for instance: entity recognition, reference analysis and in general case, of course, semantic analysis of natural language text. We can render conclusion from stated above that anaphora resolution is possible only on semantic level of natural language analysis. The main purpose of this work is development of semantic heuristics for finding the most probable antecedent corresponding to anaphora with analysis of sentence context. The proposed algorithm gives about 5% improvements in comparison to the standard Mitkov algorithm. Робота присвячена аналізу алгоритму розв’язання займенникової анафори. Розв’язання анафори має бути розглянуто в рамках широкого кола проблем лінгвістичної неоднозначності, наприклад: розпізнавання сутностей тексту, аналіз посилань та, в загальному випадку, семантичний аналіз текстів природною мовою. Із зазначеного вище можна зробити висновок, що розв’язання анафори можливе лише на семантичному рівні аналізу природної мови. Головною метою цієї роботи є розробка семантичної евристики для пошуку найбільш імовірного антецедента, що відповідає анафорі, із застосуванням аналізу контексту речень. Запропонований алгоритм дає покращення близько 5% порівняно зі стандартним алгоритмом Міткова. Работа посвящена анализу алгоритма решения местоименной анафоры. Решение анафоры должно быть рассмотрено в рамках широкого круга проблем лингвистической неоднозначности, например: распознавание сущностей текста, анализ ссылок и, в общем случае, семантический анализ текстов на естественном языке. Из указанного выше можно сделать вывод, что решение анафоры возможно только на семантическом уровне анализа естественного языка. Главной целью этой работы является разработка семантической эвристики для поиска наиболее вероятного антецедента, соответствующего анафоре, с использованием анализа контекста предложений. Предложенная модификация алгоритма дает улучшение около 5% по сравнению со стандартным алгоритмом Миткова. en Інститут проблем штучного інтелекту МОН України та НАН України Штучний інтелект Анализ и синтез коммуникационной информации Semantic Modification of the Mitkov Algorithm for Anaphora Resolution Семантична модифікація алгоритму Міткова для розв’язання анафор Семантическая модификация алгоритма Миткова для решения анафор Article published earlier |
| spellingShingle | Semantic Modification of the Mitkov Algorithm for Anaphora Resolution Marchenko, O.O. Анализ и синтез коммуникационной информации |
| title | Semantic Modification of the Mitkov Algorithm for Anaphora Resolution |
| title_alt | Семантична модифікація алгоритму Міткова для розв’язання анафор Семантическая модификация алгоритма Миткова для решения анафор |
| title_full | Semantic Modification of the Mitkov Algorithm for Anaphora Resolution |
| title_fullStr | Semantic Modification of the Mitkov Algorithm for Anaphora Resolution |
| title_full_unstemmed | Semantic Modification of the Mitkov Algorithm for Anaphora Resolution |
| title_short | Semantic Modification of the Mitkov Algorithm for Anaphora Resolution |
| title_sort | semantic modification of the mitkov algorithm for anaphora resolution |
| topic | Анализ и синтез коммуникационной информации |
| topic_facet | Анализ и синтез коммуникационной информации |
| url | https://nasplib.isofts.kiev.ua/handle/123456789/57080 |
| work_keys_str_mv | AT marchenkooo semanticmodificationofthemitkovalgorithmforanaphoraresolution AT marchenkooo semantičnamodifíkacíâalgoritmumítkovadlârozvâzannâanafor AT marchenkooo semantičeskaâmodifikaciâalgoritmamitkovadlârešeniâanafor |