The concept of access to information has evolved, as Borgman (2000, 79) shows, from varied areas such as library systems, telecommunication systems and so on. According to her (Borgman 2000, 57), access to information is a process through which the user is able to retrieve the information s/he seeks from the internetwork of computers, provided that,
i. The user has the basic technical knowledge and skills,
ii. The technology is viable,
iii. The information is relevant and usable.
The whole process of access to digital libraries is dependent on these three factors: the knowledgeable user, the technology, and the nature and quality of the data. In other words, the user should have a minimum level of technical knowledge in order to retrieve data of better quality. Since I take teachers and students of English literature to be the readers, I will discuss the search methods briefly.
Generally speaking, we come across the following information models in the digital libraries on the internet:
- The Boolean Model
- The Vector Space Model
- The Probabilistic Model
- The Natural Language Processing Model
- The Hypertext Model
The first three models function by matching search terms with index terms to generate search results. “One of the major criticisms of them is”, as Chowdhury and Chowdhury point out (Chowdhury and Chowdhury 2003), “that they look at individual search terms; they do not consider the search or index terms as part of a sentence or document.” That is why the last two models have been put forward to overcome the limitation of the first three.
Boolean Search Model
This search model is the oldest and functions in accordance with set theory and Boolean algebra. It operates by matching a set of search terms against a set of index terms. Multiple search terms are processed on the basis of logical product (AND logic), logical sum (OR logic) and logical difference (NOT logic). The processes of its functioning are described later in this chapter.
The Vector Space Model
This model is based on the calculation of non-binary weights. It functions by assigning these weights to index terms in queries as well as in documents, and then computing the degree of similarity between the query and each document in a collection on the basis of the term weights. Thus a ranked list of output can be produced, with items that match the query fully as well as partially. While this model produces a ranked list, its major weakness lies in its assumption that index terms are mutually independent.
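The weighting and ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-document collection is invented, and the TF-IDF weighting and cosine similarity used here are one common choice of non-binary weight and similarity measure, not necessarily the ones a given digital library uses.

```python
import math
from collections import Counter

# Hypothetical toy collection (not taken from any real library).
DOCS = {
    "d1": "shakespeare fool king lear fool",
    "d2": "marlowe faustus tragedy",
    "d3": "shakespeare tragedy hamlet",
}

def tokenize(text):
    return text.lower().split()

def idf_table(docs):
    """Inverse document frequency: rarer terms receive larger weights."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def weight_vector(text, idf):
    """Non-binary weight per term: term frequency times IDF."""
    tf = Counter(tokenize(text))
    return {t: f * idf[t] for t, f in tf.items() if t in idf}

def cosine(u, v):
    """Degree of similarity between two weighted term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (doc_id, score) pairs, best match first."""
    idf = idf_table(docs)
    q = weight_vector(query, idf)
    scores = {d: cosine(q, weight_vector(t, idf)) for d, t in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank("shakespeare fool", DOCS)
```

For the query "shakespeare fool", the document containing both terms is ranked first, the document containing only one of them comes second with a lower score, and the document containing neither scores zero. This is the full and partial matching the model is meant to provide.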
The Probabilistic Model
Probabilistic models are based on the principles of probability theory. According to Answer.com, they “treat the process of document retrieval as a probabilistic inference. Similarities are computed as probabilities that a document is relevant for a given query. Probabilistic theorems like the Bayes’ theorem are often used in these models.” (Answer.com)
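As a rough illustration of how Bayes’ theorem enters such a model, the sketch below estimates the probability that a document is relevant given that it contains a query term. The function name and the numbers are hypothetical, chosen only to show the arithmetic; real systems estimate these probabilities from the collection and combine many terms.

```python
def posterior_relevance(prior, p_term_given_rel, p_term_given_nonrel):
    """Bayes' theorem: P(relevant | term) from a prior and two likelihoods.

    prior               -- assumed P(relevant) before looking at the document
    p_term_given_rel    -- assumed P(term occurs | document is relevant)
    p_term_given_nonrel -- assumed P(term occurs | document is not relevant)
    """
    evidence = p_term_given_rel * prior + p_term_given_nonrel * (1.0 - prior)
    if evidence == 0:
        return 0.0
    return p_term_given_rel * prior / evidence

# Hypothetical figures: 10% of documents are relevant; the term appears in
# 80% of relevant documents but in only 10% of non-relevant ones.
p = posterior_relevance(prior=0.1, p_term_given_rel=0.8, p_term_given_nonrel=0.1)
```

With these assumed figures, the presence of the term raises the estimated probability of relevance from 0.1 to about 0.47. Documents can then be ranked by such probabilities.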
The Natural Language Processing Model
This model (also known as computational linguistics) is an attempt at processing search items not simply in terms of keywords, but also in terms of sentences, taking into consideration syntactic, semantic and pragmatic analyses. Webopedia defines it as “a branch of artificial intelligence that deals with analyzing, understanding and generating the languages that humans use naturally in order to interface with computers in both written and spoken contexts using natural human languages instead of computer languages” (Webopedia). In other words, it tries to make the computer understand how human beings learn and use language.
The Hypertext Model
This model evolved as a system to overcome the limitations of the fixity and linearity of conventional documents. It does so by putting in hyperlinks to other parts of a document (a sentence, a paragraph or the entire document) on a local machine and to other domains and sub-domains on the web. The hyperlinks are made indexable and searchable by search programmes. Owing to its flexibility, this model has played a major role in the design of websites and in the functioning of the internet. It should be noted that the hypertext model has been largely instrumental in the making of Hypertext Markup Language (HTML) and Hypertext Transfer Protocol (HTTP).
It has been generally found that teachers and students search the web for resources just by using the major search engines with certain keywords or phrases, which lead them to particular digital libraries and web resources. Since they are not familiar with the search techniques, they cannot get optimum access to the resources. Added to this is their deep-seated phobia of viruses and distrust of unknown sites. While the virus threat can be effectively minimised by using good anti-virus software, better access can be achieved by becoming familiar with the ways digital libraries and the web function.
Boolean search employs special logic to produce search results. Without knowing its basic functions, a user cannot apply the logic to retrieve information in the digital environment. The search operators may vary from library to library, but the basic functions are simple and intuitive. For instance, if a user applies the logical product (AND logic) and enters the search terms “Shakespeare and fool”, the search will retrieve all those documents in which both terms appear. The second operator, ‘OR logic’, “allows the user to combine two or more search terms in order to retrieve all those items that contains either one or all of the constituent terms” (p. 188). Accordingly, the search terms “Shakespeare or Marlowe” will retrieve all those documents i) where the term ‘Shakespeare’ occurs, ii) where the term ‘Marlowe’ occurs and iii) where both terms occur. This logic broadens the scope of the search. ‘NOT logic’, on the other hand, is used to restrict the search results by excluding a particular term. For instance, “Elizabethan dramatist not Marlowe” will retrieve all the records on Elizabethan dramatists except those that mention Marlowe.
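The three operators correspond directly to set intersection, union and difference. The following Python sketch shows this on an invented four-document collection. The index structure and function names are illustrative assumptions, not the interface of any particular digital library.

```python
# Hypothetical sample collection: document id -> text.
DOCS = {
    1: "Shakespeare gives the fool some of the wisest lines in King Lear",
    2: "Marlowe wrote Doctor Faustus",
    3: "Shakespeare and Marlowe were both Elizabethan dramatists",
    4: "Jonson was an Elizabethan dramatist and poet",
}

def build_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

INDEX = build_index(DOCS)

def postings(term):
    """Set of documents in which the term occurs (empty if none)."""
    return INDEX.get(term.lower(), set())

def search_and(a, b):
    return postings(a) & postings(b)   # logical product: both terms

def search_or(a, b):
    return postings(a) | postings(b)   # logical sum: either or both terms

def search_not(a, b):
    return postings(a) - postings(b)   # logical difference: a without b

both = search_and("Shakespeare", "fool")
either = search_or("Shakespeare", "Marlowe")
without = search_not("Elizabethan", "Marlowe")
```

Here the AND search returns only document 1, the OR search returns documents 1, 2 and 3, and the NOT search returns document 4 alone, because document 3 mentions Marlowe and is therefore excluded.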
Truncation signals a search engine to retrieve information relating to different terms that share a common root. The user can perform this kind of search by placing an operator like ‘*’ or ‘?’ (which may vary with different search engines) on the left-hand side of a root, on the right-hand side of a root or in the middle of a word. For instance, left-hand truncation like “*logy” will retrieve terms ending in ‘logy’, like ‘philology’, ‘psychology’, ‘biology’ etc. Right-hand truncation like “philo*” will produce search results beginning with the same characters, like ‘philosophy’, ‘philology’, ‘philomel’ etc. Similarly, middle truncation (“humo*r”) retrieves terms matching the characters on either side of the operator (like ‘humour’ and ‘humor’).
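One way to picture what a search engine does with a truncated term is to translate the ‘*’ operator into a pattern that matches any run of characters. The sketch below does this with Python’s standard `re` module. The word list and the function name are made up for illustration, and a real engine would match against its own index rather than a short list.

```python
import re

# Hypothetical vocabulary standing in for a library's index terms.
VOCAB = ["philology", "psychology", "biology", "philosophy",
         "philomel", "humour", "humor", "humid"]

def truncation_search(pattern, vocabulary):
    """Return the words matching a pattern in which '*' stands for
    zero or more word characters (left, right or middle truncation)."""
    parts = [re.escape(part) for part in pattern.lower().split("*")]
    regex = re.compile(r"\w*".join(parts))
    return [word for word in vocabulary if regex.fullmatch(word.lower())]

left = truncation_search("*logy", VOCAB)     # words ending in 'logy'
right = truncation_search("philo*", VOCAB)   # words beginning with 'philo'
middle = truncation_search("humo*r", VOCAB)  # 'humour' and 'humor'
```

Note that ‘humid’ is not returned by the middle-truncation query, since it does not end in ‘r’.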
Proximity search is performed in order to specify the distance between two terms in the retrieved results. In principle, it is similar to the Boolean ‘AND’ search, but the difference is that it makes the search more restricted and more closely oriented to the user’s query. The operators used for this vary with different digital libraries. In the ACM digital library (http://portal.acm.org/dl.cfm) the ‘NEAR’ operator is used to retrieve terms that occur in close proximity to each other.
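The idea of restricting an ‘AND’ search by distance can be sketched as follows. This is a hypothetical illustration only: the function `near` and its `max_distance` parameter are assumptions made for the example and do not reproduce the exact behaviour of the ‘NEAR’ operator in the ACM digital library or any other system.

```python
def near(text, term_a, term_b, max_distance=3):
    """True if term_a and term_b both occur in the text and at least one
    pair of occurrences is no more than max_distance words apart."""
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

# Invented sample sentence.
SENTENCE = "In King Lear the fool speaks truth to the old king."

close = near(SENTENCE, "fool", "Lear", max_distance=3)   # two words apart
far = near(SENTENCE, "fool", "king", max_distance=2)     # nearest 'king' is three words away
```

A plain ‘AND’ search would accept this sentence for both pairs of terms. The distance limit is what makes the second query fail.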
Field or Meta Tag Search
This search is performed when a user wants to restrict searches to more specific results. This is done by selecting an appropriate field (area) before proceeding to search for a particular item in the collection. This is called field or meta tag search because the fields in digital collections are specified by meta tags. For instance, in the “Advanced Search” wizard of the Project Gutenberg library, the user can restrict search results by selecting appropriate fields from ‘Language’, ‘Category’, ‘LoCC’ and ‘File Type’, where the items are expected to be found. In the Bartleby library the user is given the option of choosing a particular field in the “Select Search” option before performing a search.
A digital collection in a particular library may contain many items with similar index terms. In this case, a simple search may result in hundreds of retrieved items. It is then necessary to limit searches by choosing appropriate criteria such as language, year of publication, type of information, file type etc. This type of action is also useful in searching the entire web.
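In code, a field search amounts to filtering catalogue records on their metadata before, or along with, matching the keyword. The sketch below uses invented records whose field names loosely echo the Project Gutenberg fields mentioned above. The record layout and the `field_search` function are assumptions for illustration, not the library’s actual data model.

```python
# Hypothetical catalogue records with metadata fields.
RECORDS = [
    {"title": "Hamlet", "author": "Shakespeare",
     "language": "English", "file_type": "txt"},
    {"title": "Hamlet", "author": "Shakespeare",
     "language": "German", "file_type": "epub"},
    {"title": "Doctor Faustus", "author": "Marlowe",
     "language": "English", "file_type": "epub"},
]

def field_search(records, keyword, **fields):
    """Return records whose title contains the keyword and whose
    metadata match every field=value restriction supplied."""
    keyword = keyword.lower()
    hits = []
    for record in records:
        if keyword not in record["title"].lower():
            continue
        if all(str(record.get(name, "")).lower() == str(value).lower()
               for name, value in fields.items()):
            hits.append(record)
    return hits

unrestricted = field_search(RECORDS, "hamlet")
restricted = field_search(RECORDS, "hamlet", language="English")
```

The unrestricted search returns both editions of Hamlet, while adding the language restriction narrows the result to the single English record.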