Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.

This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.

Furthermore, a direct correlation exists between discrimination value order and document frequency Bk. 3 (b) The terms with average discrimination ranks and discrimination values around zero are those with quite low document frequencies, ranging from 1 to 5 for the test collections of Table 17. 025 and 0 in Table 17) are characterized by the highest document frequencies, ranging up to 270 for the collections of 450 documents. The data of Table 17 also show that the class of high-frequency, negative discriminators is fairly small in each case.
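The relation between discrimination value and document frequency can be illustrated with a small sketch. It assumes one common formulation of the discrimination value: the change in the average document-to-centroid cosine similarity (the space density) when a term is removed from all document vectors. That definition, the function names, and the toy collection below are assumptions made for illustration; they are not taken from the excerpt or from Table 17.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def density(docs):
    """Average similarity between each document and the collection centroid."""
    n = len(docs)
    centroid = [sum(col) / n for col in zip(*docs)]
    return sum(cosine(d, centroid) for d in docs) / n

def discrimination_values(docs):
    """Assumed definition: density without term k minus density with all terms.

    A positive value means removing the term packs the documents closer
    together, so the term helps to tell documents apart.
    """
    base = density(docs)
    values = []
    for k in range(len(docs[0])):
        reduced = [d[:k] + d[k + 1:] for d in docs]  # drop column k
        values.append(density(reduced) - base)
    return values

def document_frequencies(docs):
    """Number of documents in which each term occurs."""
    return [sum(1 for d in docs if d[k] > 0) for k in range(len(docs[0]))]

# Hypothetical toy collection: term 0 occurs in every document,
# terms 1 to 3 occur in exactly one document each.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
dv = discrimination_values(docs)
df = document_frequencies(docs)
```

In this toy case the term with the highest document frequency receives a negative discrimination value, while the low-frequency terms receive small positive ones, which is the kind of pattern the passage describes for high-frequency negative discriminators.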

While different subject areas are covered in each case, the relevance properties are identical for the three collections; in particular, the probability that a given document is relevant to a query is the same throughout the test base. The basic collection statistics are shown in Table 7. 02 [12, Chap. 3]. The basic indexing statistics are shown for the three collections in Table 8. It may be seen that the total number of distinct terms (word stems) used to index the three collections increases from CRAN to MED, and from MED to Time.

The newly generated phrases can be assigned to documents and queries in various combinations. Singles, pairs, and triples can all be used together (SPT). In a practical implementation, the phrase formation model of Table 18 need not, of course, be followed precisely. In fact, it is unnecessary physically to form any phrases at all; instead, in each query or document, the high-frequency nondiscriminators can be flagged appropriately, and the formation of the corresponding pairs and triples can be made implicitly.
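A minimal sketch of the implicit approach follows. It assumes that a phrase is any unordered combination of two or three flagged terms occurring in the same query or document; the actual rules of Table 18 are not reproduced in this excerpt, so that rule, the function names, and the sample terms are hypothetical.

```python
from itertools import combinations

def implicit_phrases(terms, flagged):
    """Derive pairs and triples on demand from the flagged terms in one text.

    No phrase is stored with the document; only the flags are needed.
    """
    present = sorted(set(terms) & set(flagged))
    pairs = set(combinations(present, 2))
    triples = set(combinations(present, 3))
    return pairs, triples

def shared_phrases(query_terms, doc_terms, flagged):
    """Pairs and triples that a query and a document have in common."""
    q_pairs, q_triples = implicit_phrases(query_terms, flagged)
    d_pairs, d_triples = implicit_phrases(doc_terms, flagged)
    return q_pairs & d_pairs, q_triples & d_triples

# Hypothetical flagged high-frequency nondiscriminators.
flagged = {"computer", "system", "program"}
query = ["computer", "system", "design"]
doc = ["program", "system", "computer", "memory"]

common_pairs, common_triples = shared_phrases(query, doc, flagged)
```

Sorting the flagged terms before combining them gives each phrase a canonical order, so the same pair is recognized regardless of the order in which its terms appear in the text.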

