Class One Notes

Information Access - To What?

This class is about electronic access to information: how it is structured, accessed, and manipulated.

Type	Examples
Electronic records for print books	Classic Catalog, Encore Catalog, Prospector Classic Catalog, Prospector Encore Catalog
Electronic records for print journals	Periodicals Index Online, Alternative Press Index, C19: The Nineteenth Century Index
Electronic records for print newspapers	Wall Street Journal
Electronic records for print documents (federal, international)	ProQuest Congressional, AccessUN
Electronic books (E-books)	eBook Collection (formerly netLibrary), ebrary
Electronic Journals (E-journals)	Periodicals Archive Online
Electronic Newspapers	New York Times Historical (1851- ), Washington Post Historical (1877-1990)
Electronic government documents	U.S. Congressional Serial Set
Business and company directories	ReferenceUSA
Financial data	Key Business Ratios
Statistical data	Historical Statistics of the United States, ICPSR
Public opinion polling	Roper Center for Public Opinion Research
Images	ARTstor, AP Images
Citation Indexes	Web of Science
Book Reviews	Book Review Index, Book Review Digest
Advertisements	Ad*Access, Advertising Redbooks
Dissertations	Dissertations & Theses (PQDT)
Directories	Archives Unbound
Biographies	Biography Index Retrospective (1946-1983)
College Catalogs	CollegeSource
Poetry (Index)	Columbia Granger's World of Poetry

What is being searched?

Type	Resource	Access Points
Indexing Only	Reader's Guide to Periodical Literature (print) [example]	Author, Title, Subject
	Poole's Index to Periodical Literature (1802-1906) (print) [example]	Subject (requires separate name index and subject index to use effectively)
	C19 (online)	Author, Title, Source Title, Subject, Limit by Date, Keyword
Indexing and Abstracting	Biological Abstracts (print) [example]	Author, Title, Subject
	Agricola (online)	Author, Title, Source Title, Subject, Descriptors, Abstract, Notes, Keyword, and many others
	Bibliography of the History of Art (online)	Author, Title, Source Title, Abstract, Classification, Notes, Keyword, and many others
	Biological Abstracts (online)	Author, Title, Source Title, "Topic", Controlled Vocabulary
Indexing and Abstracting plus Full Text Searching	Academic Search Complete (online)	Author, Title, Subject, Source Title, Abstract, Full Text (optional search)
	U.S. Congressional Serial Set (1817-1980) (online)	Author, Title, Subject, Bibliographic Numbers, Personal Names, Geographic Location, Publication Category, Full Text
	Early English Books Online (1475-1700) (online)	Author, Title, Subject, Full Text
	Emerald Library (online)	Author, Title, Subject, Source Title, Abstract, Full Text

Why would you ever want to search just surrogate records rather than full text?

Full-text scanning is not always as reliable as it sounds. Look at this example.

Information Retrieval - What can we do with all this information?

Download it.

Data pulls - citation managers

Bibliographic citation software

RefWorks
ProCite
EndNote
Reference Manager
Papyrus
Library Master
Biblioscape
Nota Bene
Citation Machine
Scholar's Aid
Citation
WebClarity
Zotero
Mendeley
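Most of the tools above can import records pulled from a database in a tagged exchange format; RIS is one widely supported example. The Python sketch below shows roughly what happens on import: each line carries a two-letter tag and a value, and an ER line closes the record. It is a minimal illustration under those assumptions, not any product's actual importer; the function name parse_ris is hypothetical, and the sample record is simply the Sasse and Winkler citation from these notes written out in RIS tags.

```python
"""Minimal RIS reader: a sketch of what a citation manager does on import."""

from collections import defaultdict

# Sample export built from a citation in these notes; real database
# exports usually carry many more tags per record.
SAMPLE_RIS = """\
TY  - JOUR
AU  - Sasse, Margo
AU  - Winkler, B. Jean
TI  - Electronic Journals: A Formidable Challenge for Libraries
T2  - Advances in Librarianship
VL  - 17
PY  - 1993
SP  - 149
EP  - 173
ER  -
"""


def parse_ris(text: str) -> list[dict[str, list[str]]]:
    """Split RIS text into records; each tag maps to a list of values."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        # RIS lines look like "XX  - value": 2-char tag, two spaces, hyphen.
        if len(line) < 5 or line[2:5] != "  -":
            continue  # skip blank or malformed lines
        tag, value = line[:2], line[6:].strip()
        if tag == "ER":  # end of record: store it and start a new one
            records.append(dict(current))
            current = defaultdict(list)
        else:
            current[tag].append(value)  # repeatable tags (e.g. AU) accumulate
    return records


if __name__ == "__main__":
    for rec in parse_ris(SAMPLE_RIS):
        print(rec["TI"][0], "|", "; ".join(rec["AU"]))
```

Keeping every tag as a list is a deliberate choice: fields such as authors (AU) and keywords repeat, and a citation manager has to preserve all of them.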

Information Seeking Behavior

My theories of typical user behavior

1. Lack of awareness.

Most Web users have no idea what is happening to them when they surf the Web.

2. Lack of discernment.

Most users don't apply critical thinking and evaluative principles to the content they encounter on the Web.

3. Path of least resistance.

Users give up if they don't find full text right away; if they encounter only citations, they simply stop. They prefer a less credible resource that is available now in full text over a more credible resource that is not readily available.

History of Indexing

Orality Period

In the early days of the book it was nearly possible to have read every extant book and to remember everything one had read. Fast-forward to today, when a search engine provides the “memory” and nearly all extant publications are retrievable with an instant search. What lies in between is the history of indexing, and it is a complex history.

Walter Ong, Orality and Literacy.

Walter Ong has noted the differences between the ways oral cultures and literate cultures manage knowledge. Indexes are essentially lists. Lists did not exist in oral cultures; there was no need for them. When writing came about, lists eventually became necessary. Alphabetic indexes developed first for manuscripts, but the obvious problem was how to refer to a location within a manuscript. Once printing came about, page numbers let an index refer to the same places in every copy of the same imprint (see Ong, p. 123ff.).

Literary Period

In oral cultures memory was king.

In literate cultures, the index was king.

In our Internet culture the search engine is king.

Modern periodical indexing began in the early 19th century. William Frederick Poole began indexing reviews and periodicals during the 1840s, and by 1853 he had published his Index to Periodical Literature. This evolved into Poole's Index to Periodical Literature, with coverage extending back to 1802. It was a remarkable work for its time. Its interesting “Chronological Conspectus of the Serials Indexed” gives year-by-year, title-by-title coverage of each volume indexed.

As data increased, methods of accessing that data became necessary. Two sciences arose to make this possible: classification and indexing (Kuhr 1993). The difference between them is significant. Classification is a hierarchical system, often based on numbers, that groups items into broad divisions and subdivisions. Indexing is sometimes based on the language of the original author (as in a book index) and at other times on a controlled vocabulary (as in a periodical index).
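A small Python sketch can make the contrast concrete. Everything in it is hypothetical: the notation, captions, and vocabulary are invented for illustration and are not taken from Dewey, LC, or any real thesaurus. Classification is modeled as notation in which each subdivision extends its parent's number; controlled-vocabulary indexing is modeled as a lookup that sends variant words to one preferred heading.

```python
"""Classification vs. controlled-vocabulary indexing.
All notation, captions, and terms are hypothetical illustrations."""

from typing import Optional

# Classification: each subdivision's notation extends its parent's,
# so an item's place in the hierarchy can be read from its number.
SCHEDULE = {
    "9": "History",
    "97": "History of North America",
    "978": "History of the western United States",
}


def class_path(notation: str) -> list[str]:
    """Captions from the broadest division down to the given class."""
    prefixes = (notation[:i] for i in range(1, len(notation) + 1))
    return [SCHEDULE[p] for p in prefixes if p in SCHEDULE]


# Indexing with a controlled vocabulary: variant words used by authors or
# searchers all point to one preferred heading, gathering a topic together.
PREFERRED = {"automobiles": "Automobiles", "cars": "Automobiles",
             "motorcars": "Automobiles"}


def preferred_heading(word: str) -> Optional[str]:
    """Map a free-text word to its preferred heading, if there is one."""
    return PREFERRED.get(word.lower())


if __name__ == "__main__":
    print(" > ".join(class_path("978")))
    print(preferred_heading("Cars"))
```

A book index built from the author's own language would skip the PREFERRED lookup entirely, which is exactly why its entries vary from book to book.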

Digital Age

The computer age grew out of many early experimental applications of machines to information. From the Census Bureau's use of Hollerith punch-card tabulating machines to count the population in 1890, through early computers such as ENIAC and the evolution of the analog computer, to the development and growth of digital computing, computers have made their mark on information storage, access, and retrieval.

Not surprisingly, paralleling the development of computational power was the explosion of publishing. More publications meant more indexing.

My favorite indexing humor.

Osborn estimated that in 1980 there were 500,000 serials in the world (p. 45).

Gale Directory of Publications and Broadcast Media (141st ed., 2006) covers approximately 52,000 newspapers, magazines, journals, and other periodicals.

Growth of Electronic Serials

Year	E-Journals and Newsletters
1991	110
1992	133
1993	240
1994	443
1995	675
1996	1,689

From http://db.arl.org/dsej/2000/mogge.html
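The growth shown in the table can be quantified with a little arithmetic. This Python sketch uses only the counts above; the function names are illustrative, and "compound annual rate" here means the constant yearly rate that would carry 110 titles in 1991 to 1,689 in 1996.

```python
"""Growth in e-journals and newsletters, from the ARL counts above."""

COUNTS = {1991: 110, 1992: 133, 1993: 240, 1994: 443, 1995: 675, 1996: 1689}


def yearly_growth(counts: dict[int, int]) -> dict[int, float]:
    """Percent change from the previous year, keyed by the later year."""
    years = sorted(counts)
    return {year: 100 * (counts[year] - counts[prev]) / counts[prev]
            for prev, year in zip(years, years[1:])}


def compound_annual_rate(counts: dict[int, int]) -> float:
    """Constant yearly growth (as a fraction) from first to last year."""
    first, last = min(counts), max(counts)
    return (counts[last] / counts[first]) ** (1 / (last - first)) - 1


if __name__ == "__main__":
    for year, pct in yearly_growth(COUNTS).items():
        print(f"{year}: {pct:+.1f}%")
    print(f"Compound annual growth 1991-1996: {compound_annual_rate(COUNTS):.1%}")
```

On these figures the compound rate is roughly 73% a year, and the largest single jump (about 150%) came between 1995 and 1996.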

In 1992 it was estimated that over thirty electronic publications could be considered scholarly journals.
Sasse, Margo and B. Jean Winkler. "Electronic Journals: A Formidable Challenge for Libraries." Advances in Librarianship 17 (1993): 149-173.

Penrose Library subscriptions to unique electronic journals:

Date	Unique E-Journals
1/1/07	44,647
1/15/08	59,308
11/25/08	85,166
1/5/12	100,657
3/23/13	129,615

Growth of Electronic Journals at DU University Libraries

So let's take a look at various kinds of indexes throughout the ages.

Alphabetic Index. The most basic and earliest indexes were alphabetic. The early idea of a simple list of ideas, people, or places evolved into nested indexes, in which subentries are grouped under a main entry.

Book Index. We are all familiar with the structure of the modern book, with a title page in the front, followed by the table of contents, then the main body of the work, and lastly the index.

Periodical Index. Serial publications generally contain articles by a variety of authors, and from early on periodicals often issued indexes to their contents, sometimes annually, sometimes less frequently.

Cumulative Indexes. To save users from checking many separate indexes (such as annual ones), a cumulative index combines the entries for a longer period into a single sequence.
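Cumulating is essentially a merge. The Python sketch below folds several annual indexes into one alphabetical sequence, tagging each locator with its year so the user consults a single list. The headings, years, and "volume:page" locators are hypothetical, and real cumulations also handle cross-references and heading changes, which this sketch ignores.

```python
"""Sketch: folding several annual indexes into one cumulative index.
Headings, years, and locators are hypothetical examples."""

from collections import defaultdict

# Each annual index maps a heading to its locators ("volume:page").
ANNUAL_INDEXES = {
    1901: {"Road repair": ["1:14"], "Libraries": ["1:88"]},
    1902: {"Libraries": ["2:5", "2:61"], "Railways": ["2:40"]},
    1903: {"Road repair": ["3:9"]},
}


def cumulate(annuals: dict[int, dict[str, list[str]]]) -> dict[str, list[tuple[int, str]]]:
    """Merge annual indexes into one alphabetical heading list,
    keeping locators in year order under each heading."""
    merged: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for year in sorted(annuals):
        for heading, locators in annuals[year].items():
            merged[heading].extend((year, loc) for loc in locators)
    return {heading: merged[heading] for heading in sorted(merged)}


if __name__ == "__main__":
    for heading, refs in cumulate(ANNUAL_INDEXES).items():
        print(heading, refs)
```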

See: http://www2.sims.berkeley.edu/courses/is245/s03/verbal.html for KWIC, KWAC, KWOC

If it Quacks Like a Duck: KWIC, KWAC, and KWOC

KWIC means keyword in context. Under this scheme, each word in a title that is not a stop word becomes an index entry, printed together with the words around it. Take this title as an example: A handbook for road repair crews.

[insert scan from KWIC Index: A Bibliography of Computer Management]
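A short Python sketch shows how the example title would be filed under each of its keywords. The stop-word list is a hypothetical minimum, and the fixed-width column layout is only one way to print the entries. For comparison it also produces KWOC (keyword out of context) entries, in which the keyword is lifted out as a heading and the full title is printed beside it.

```python
"""KWIC and KWOC entries for a single title.
The stop-word list is a hypothetical minimal one."""

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def kwic(title: str, width: int = 30) -> list[str]:
    """One line per non-stop word: the preceding words right-aligned in a
    fixed-width column, then the keyword and the words after it."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            continue
        left = " ".join(words[:i])[-width:]  # context before the keyword
        right = " ".join(words[i:])          # keyword plus following context
        entries.append((word.lower(), f"{left:>{width}}  {right}"))
    # Entries are filed alphabetically by keyword.
    return [line for _, line in sorted(entries)]


def kwoc(title: str) -> list[str]:
    """KWOC: each keyword becomes a heading; the full title follows it."""
    keywords = sorted({w.lower() for w in title.split()} - STOP_WORDS)
    return [f"{k.upper():<10} {title}" for k in keywords]


if __name__ == "__main__":
    example = "A handbook for road repair crews"
    print("\n".join(kwic(example)))
    print()
    print("\n".join(kwoc(example)))
```

Four non-stop words yield four entries, which is why KWIC indexes were cheap to generate by machine but bulky in print.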

These indexes were justified by the speed with which they could be produced and their low cost of production, all motivated, of course, by the fact that the technology made them possible. They began to appear in the mid- to late 1950s.

H.W. Wilson; print indexes; early computing

Key Dates:
1889: Halsey Wilson and Henry Morris start a Minnesota bookstore.
1898: Wilson buys out Morris and begins publishing the Cumulative Book Index.
1901: Reader's Guide to Periodical Literature is first published.
1903: H.W. Wilson is incorporated.
1913: Wilson sells the bookstore and moves to White Plains, New York.
1917: The company is relocated to The Bronx.
1954: Halsey Wilson dies.
1985: The company's first electronic product, a version of the Reader's Guide, debuts.
1997: The WilsonWeb web site is launched.
2011: The company merges with EBSCO Publishing.

From: http://www.fundinguniverse.com/company-histories/The-HW-Wilson-Company-Company-History.html

History of Indexing – see: http://www.asindexing.org/site/history.shtml

The Natural History of Pliny contained an index in volume 1. See: http://mirlyn.lib.umich.edu:80/F/?func=direct&doc_number=001769341&local_base=MIU01_PUB

1st encyclopedia in alphabetical order – ca. 900 A.D.

Ancient Indexing

Early Modern Indexing

Early Computer Indexing

Modern Print Indexing

Future of Computer Indexing

Kuhr, Patricia S. 1993. Abstracting and indexing. In World encyclopedia of library and information services. Third ed., 1-5. Chicago: American Library Association.

Why We Can't Find It

We have libraries filled with hundreds of thousands to millions of books. Yet students continue to approach the academic reference desk asking, “Why doesn't your library have anything on x?” No wonder they ask this: our access points are deficient.

Students seem to be a bit happier when they search for journal articles. Why the difference? I call this problem “the information access anomaly.” It becomes visible when we compare the size, structure, and extent of the surrogate bibliographic record for each type of material with the full text of the item itself.

“Information wants to be found.”

The Information Access Anomaly: Books vs. Periodicals

	Book (average)	Journal Article (average)
Typical length of full text (FT)	200 pages x 400 words [1] = 80,000 words	15 pages x 400 words [1] = 6,000 words
Surrogate record (SR)	50-100 words (75 avg.)	300-500 words (400 avg.)
SR to FT ratio	about 1 to 1,067	1 to 15

[1] Average of 400 words per page (http://www.writersservices.com/wps/p_word_count.htm)
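The ratios in the table follow from simple division. This Python sketch recomputes them from the table's page counts and average surrogate-record lengths, assuming 400 words per page; the function name is illustrative.

```python
"""Recompute the SR-to-FT ratios in the table from its stated averages."""

WORDS_PER_PAGE = 400  # the table's assumed average


def sr_to_ft_ratio(pages: int, surrogate_words: int) -> float:
    """Full-text words per surrogate-record word."""
    return pages * WORDS_PER_PAGE / surrogate_words


book = sr_to_ft_ratio(pages=200, surrogate_words=75)     # 80,000 / 75
article = sr_to_ft_ratio(pages=15, surrogate_words=400)  # 6,000 / 400

print(f"Book:    1 to {book:,.0f}")
print(f"Article: 1 to {article:,.0f}")
print(f"Book vs. article: about {book / article:.0f}x more text per SR word")
```

On these averages, each word of a book's surrogate record stands in for about 71 times as much full text as each word of an article's surrogate record.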

How do the differences in ratios affect search strategies?

It's all about what you are searching and how you are searching it.

We go to Google and type in election statistics for colorado and we get 223,000 results. We type the same words in an academic library's online catalog and we get 10 results. We conclude that libraries are not helpful.

Let's say that we analyze the search and conclude that it was flawed: the user should have typed election statistics AND colorado. Now we get 14 results. Not much better. The explanation lies in the ratio of full text to the words in the bibliographic record.
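To see how that ratio bites, here is a Python sketch of a plain Boolean AND match (no stemming, no phrase searching) run against two hypothetical texts for the same book: a short catalog record and one sentence standing in for its full text. The surrogate record fails because its few words include neither election (only Elections, which an unstemmed match misses) nor statistics; a full text of tens of thousands of words is far more likely to contain all three terms somewhere.

```python
"""The same Boolean AND query against a short surrogate record and
against full text. Both texts are hypothetical."""

import re


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens; no stemming, no phrase handling."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_all(terms: list[str], text: str) -> bool:
    """Boolean AND: every term must occur somewhere in the text."""
    words = tokens(text)
    return all(term.lower() in words for term in terms)


# A catalog record: title plus one subject heading, only a handful of words.
surrogate = "Colorado votes: a century of ballots. Subject: Elections--Colorado--History."

# One sentence standing in for a full text of tens of thousands of words.
full_text = ("Table 4 gives county-level election statistics for every "
             "Colorado general election since 1902.")

query = ["election", "statistics", "colorado"]
print("surrogate record:", matches_all(query, surrogate))
print("full text:       ", matches_all(query, full_text))
```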

Who handles the syntax?

Who handles the classification? Pre-coordination, post-coordination

Who handles the logical operators? Boolean
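A Python sketch of the two answers to these questions. In pre-coordination the indexer combines concepts into one heading (here Elections--Colorado--Statistics) when the record is created, and the searcher must match that string. In post-coordination single concepts are indexed separately and the searcher combines them at search time with Boolean AND, OR, and NOT, which reduce to set intersection, union, and difference over an inverted index. The record IDs, headings, and function names are hypothetical.

```python
"""Pre-coordinated headings vs. post-coordinated Boolean searching.
Record IDs and headings are hypothetical."""

from collections import defaultdict

# Pre-coordination: the indexer joins concepts into one heading string.
RECORDS = {
    1: ["Elections--Colorado--Statistics"],
    2: ["Elections--Colorado--History"],
    3: ["Elections--Utah--Statistics"],
}


def search_precoordinated(records: dict[int, list[str]], heading: str) -> set[int]:
    """The searcher must match the indexer's combined heading exactly."""
    return {rid for rid, headings in records.items() if heading in headings}


def build_inverted_index(records: dict[int, list[str]]) -> dict[str, set[int]]:
    """Post-coordination: index each single concept separately."""
    index: dict[str, set[int]] = defaultdict(set)
    for rid, headings in records.items():
        for heading in headings:
            for concept in heading.split("--"):
                index[concept.strip().lower()].add(rid)
    return dict(index)


def search_and(index: dict[str, set[int]], *terms: str) -> set[int]:
    """Boolean AND = set intersection: items carrying every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index: dict[str, set[int]], *terms: str) -> set[int]:
    """Boolean OR = set union: items carrying any term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index: dict[str, set[int]], hits: set[int], term: str) -> set[int]:
    """Boolean NOT = set difference: drop items carrying the term."""
    return hits - index.get(term.lower(), set())


if __name__ == "__main__":
    idx = build_inverted_index(RECORDS)
    print(search_precoordinated(RECORDS, "Elections--Colorado--Statistics"))
    print(search_and(idx, "colorado", "statistics"))
    print(search_or(idx, "colorado", "utah"))
    print(search_not(idx, search_and(idx, "elections", "colorado"), "history"))
```

The trade-off is visible in the code: pre-coordination puts the syntax burden on the indexer and rewards an exact match, while post-coordination hands the logical operators to the searcher.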