All the IC models proposed herein share the same
In Resnik [48], the author introduces the most broadly accepted corpus-based IC model for the evaluation of semantic similarity tasks, which is shown in Table 2. The Resnik method is based on the estimation of concept probabilities through the frequency counting of concept occurrences in a training corpus. Each occurrence in the corpus of a word contained in WordNet is counted as an occurrence of every concept subsuming it. In [41, p. 34], Pedersen describes the Resnik frequency counting method used to build the WordNet-based frequency files employed in our experiments [39], as well as the corpus-based IC models evaluated in his series of papers on similarity measures in WordNet. Following the notation of Pedersen to define the $IC_{Resnik}$ model, each concept frequency $f(c_i)$ is defined as the sum of the term-frequency (TF) occurrences of the concept $c_i$ plus the inherited frequency (IF) of each subsumed child concept. The estimated probability $\hat{p}(c_i)$ of each taxonomic concept $c_i \in C$ is defined as the ratio of the concept frequency to the root frequency, where $N$ is the total number of occurrences of any noun within the corpus and its value matches the frequency of the root concept $\Gamma$. This frequency counting does not take the word senses into account, although Resnik suggests that a sense-tagged corpus could be used to address this issue. In another work [40], Pedersen and co-authors show that the IC models derived from a non-sense-tagged corpus perform better than those derived from a sense-tagged one. Like most IC models, the Resnik method does not satisfy the axioms for a well-founded IC model described in Section 4, which encouraged the proposal of the CondProbCorpus IC model in order to complete the family proposed herein.
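As an illustration of the counting scheme described above, the sketch below computes Resnik IC values, $IC_{Resnik}(c_i) = -\log\left(f(c_i)/N\right)$, over a toy taxonomy. The taxonomy, the term-frequency counts and the function names are illustrative assumptions rather than the Pedersen frequency files, and a tree-shaped taxonomy (no multiple inheritance) is assumed for simplicity.

```python
import math

# Minimal sketch of the Resnik corpus-based IC model, assuming a toy
# tree-shaped taxonomy given as a parent -> children map plus raw
# term-frequency (TF) counts per concept. Names are illustrative only.

def concept_frequencies(children, tf):
    """f(c) = TF(c) + sum of f(child) for every subsumed child concept."""
    freq = {}

    def f(c):
        if c not in freq:
            freq[c] = tf.get(c, 0) + sum(f(ch) for ch in children.get(c, ()))
        return freq[c]

    for c in set(children) | set(tf):
        f(c)
    return freq

def resnik_ic(children, tf, root):
    """IC(c) = -log(f(c) / N), with N equal to the root concept frequency."""
    freq = concept_frequencies(children, tf)
    n = freq[root]  # total noun occurrences == frequency of the root
    return {c: -math.log(f / n) for c, f in freq.items() if f > 0}

# Toy example: root 'entity' subsumes 'animal', which subsumes 'dog' and 'cat'.
children = {"entity": ["animal"], "animal": ["dog", "cat"]}
tf = {"entity": 1, "animal": 4, "dog": 10, "cat": 5}
print(resnik_ic(children, tf, root="entity"))  # IC('entity') == 0.0
```

Note that the root obtains IC = 0, since its frequency equals $N$ by construction, while leaf concepts with few occurrences receive the largest IC values.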
