Download A Theory of Indexing by Gerard Salton PDF

By Gerard Salton

Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.

This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.

Similar probability books

Statistics: A Very Short Introduction (Very Short Introductions)

Statistical ideas and methods underlie just about every aspect of modern life. From randomized clinical trials in medical research, to statistical models of risk in the banking and hedge fund industries, to the statistical tools used to probe vast astronomical databases, the field of statistics has become centrally important to how we understand our world.

Probability and Schroedinger's mechanics

Addresses some of the problems of interpreting Schrodinger's mechanics, the most complete and explicit theory falling under the umbrella of 'quantum theory'. For physical scientists interested in quantum theory, philosophers of science, and students of scientific philosophy.

Statistical Design for Research

The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists.

Extra resources for A Theory of Indexing

Sample text

Furthermore, a direct correlation exists between discrimination value order and document frequency Bk. 3 (b) The terms with average discrimination ranks and discrimination values around zero are those with quite low document frequencies ranging from 1 to 5 for the test collections of Table 17. 025 and 0 in Table 17) are characterized by the highest document frequencies ranging up to 270 for the collections of 450 documents. The data of Table 17 also show that the class of high-frequency, negative discriminators is fairly small in each case.
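
The relation between discrimination value and document frequency described above can be illustrated with a small sketch. The excerpt does not define the discrimination value, so this sketch assumes the density-based definition usually associated with the term discrimination model: the value for term k is the space density with term k removed minus the density with all terms present, where density is taken here as the average cosine similarity between each document and the collection centroid. The function names and the toy collection are hypothetical and are not taken from the book.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(docs):
    """Average similarity between each document and the centroid (assumed density measure)."""
    n_terms = len(docs[0])
    centroid = [sum(d[k] for d in docs) / len(docs) for k in range(n_terms)]
    return sum(cosine(d, centroid) for d in docs) / len(docs)


def discrimination_values(docs):
    """Assumed definition: density without term k minus density with all terms."""
    base = density(docs)
    values = []
    for k in range(len(docs[0])):
        without_k = [d[:k] + d[k + 1:] for d in docs]
        values.append(density(without_k) - base)
    return values


def document_frequencies(docs):
    """Number of documents in which each term occurs."""
    return [sum(1 for d in docs if d[k] > 0) for k in range(len(docs[0]))]


# Toy collection: term 0 occurs in every document, terms 1 and 2 in half of them.
docs = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
dv = discrimination_values(docs)
df = document_frequencies(docs)
```

In this toy collection the term that occurs in every document receives a negative value, since removing it spreads the documents apart, while the two less frequent terms receive positive values. That matches the qualitative pattern in the excerpt for high-frequency terms, though real collections are far larger and sparser.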

While different subject areas are covered in each case, the relevance properties are identical for the three collections; in particular, the probability that a given document is relevant to a query is the same throughout the test base. The basic collection statistics are shown in Table 7. 02 [12, Chap. 3]. The basic indexing statistics are shown for the three collections in Table 8. It may be seen that the total number of distinct terms (word stems) used to index the three collections increases from CRAN to MED, and from MED to Time.

The newly generated phrases can be assigned to documents and queries in various combinations. Singles, pairs, and triples can all be used together (SPT); 4 In a practical implementation, the phrase formation model of Table 18 need not of course be followed precisely. In fact, it is unnecessary physically to form any phrases at all; instead in each query or document, the high-frequency nondiscriminators can be flagged appropriately, and the formation of the corresponding pairs and triples can be made implicitly.
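
The idea of flagging high-frequency nondiscriminators and forming pairs and triples implicitly can be sketched as follows. Table 18 is not reproduced in this excerpt, so the combination rule used here (every unordered pair and triple of flagged terms that co-occur in the same document or query) is an assumption, and `implicit_phrases` is a hypothetical helper name. The actual phrase formation model may impose further restrictions.

```python
from itertools import combinations


def implicit_phrases(doc_terms, flagged, sizes=(2, 3)):
    """Yield pairs and triples of flagged terms present in one document or query.

    Assumption: any flagged terms co-occurring in the same text may be
    combined; no phrase is stored ahead of time.
    """
    present = sorted(set(doc_terms) & set(flagged))
    phrases = []
    for size in sizes:
        phrases.extend(combinations(present, size))
    return phrases


# Hypothetical flagged set of high-frequency nondiscriminators.
flagged = {"computer", "information", "system"}
doc = ["computer", "system", "retrieval", "information"]
phrases = implicit_phrases(doc, flagged)
```

Because the phrases are derived on demand from the flagged single terms, nothing beyond the flags needs to be stored with each document, which is the point made in the passage above.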
