36
LIS654 lecture 2 history Thomas Krichel 2011-09-16

LIS654 lecture 2 history Thomas Krichel 2011-09-16

Embed Size (px)

Citation preview

Page 1: LIS654 lecture 2 history Thomas Krichel 2011-09-16

LIS654 lecture 2

history

Thomas Krichel2011-09-16

Page 2: LIS654 lecture 2 history Thomas Krichel 2011-09-16

contents

• some known old thinkers– Vannevar Bush– Joseph Carl Robnett Licklider, aka “Lick”

• the birth of the repository

Page 3: LIS654 lecture 2 history Thomas Krichel 2011-09-16

background

• Vannevar Bush (1890—1974) directed the US office of Science Research and Development during WW2.

• As the war ended he saw two problems– how to make the war time scientific reports

available– find a new challenge for the scientists

• He proposed a solution in “As we may think”.

Page 4: LIS654 lecture 2 history Thomas Krichel 2011-09-16

As we may think

• It remains to date one of the most frequently cited papers in Library and Information Science.

• I think this fame is somewhat undeserved.

Page 5: LIS654 lecture 2 history Thomas Krichel 2011-09-16

the scientific record

• As scientists do more work, the “record” extends. This is good.

• Recent advances in microfilm also made is possible to store more of the record in microfilm.

• But with much research and increased specialization, “significant attainments become lost in the mass of the inconsequential”.

Page 6: LIS654 lecture 2 history Thomas Krichel 2011-09-16

the memex

• The memex was a proposed desktop machine that would store millions of books in microfilm.

• It would have a mechanism that would allow any known item from the collection rapidly.

• But the problem is what items to look at?

Page 7: LIS654 lecture 2 history Thomas Krichel 2011-09-16

organizing the collection

• Still today collections are organized hierarchically from class to subclass. Think of standard classification schemes.

• Or in a list of words, in a mechanical form from letter to letter.

• Bush rejected this, claiming that the brain does not work in that way.

Page 8: LIS654 lecture 2 history Thomas Krichel 2011-09-16

the brain, by Bush

• Bush thought that the brain works by association.

• “With one item in its grasp, [the brain] snaps instantly to the next that is suggested by association of thought”.

• This is done “in accordance with some intricate web of trails carried by the cells of the brain.”

Page 9: LIS654 lecture 2 history Thomas Krichel 2011-09-16

memex as a brain

• Every time a document is added to the memex it is given an identifier.

• Every time an item is consulted the user can associate with it other items. These associations are recorded.

• Trails of associations can be annotated and copied.

• Selection by association replaces indexing.

Page 10: LIS654 lecture 2 history Thomas Krichel 2011-09-16

sharing

• An annotated trail between items can form a new item. That item can be shared.

• Bush envisioned that there would be a way for each memex to learn from all other memexes.

• Memex users would improve their thinking ability by its use.

• This would greatly increase the speed of scientific discoveries.

Page 11: LIS654 lecture 2 history Thomas Krichel 2011-09-16

implementation

• There is no evidence that anything like the memex was ever built.

• Microfilm was replaced by digitization.• But the idea of associative trails or associative

indexing has something to do with the hypermedia.

• The later goes back to Ted Nelson.

Page 12: LIS654 lecture 2 history Thomas Krichel 2011-09-16

Licklider

• Joseph Carl Robnett Licklider (1915—1990) trained as a mathematician and psychologist and worked mainly at the MIT.

• The Council of Library Resources got funding from the Ford Foundation to examine how technology could help libraries.

• Work was undertaken by Bolt, Beranek and Newman (BBN) of later ARPA fame

Page 13: LIS654 lecture 2 history Thomas Krichel 2011-09-16

the basic idea

• The idea was that one could store all knowledge in a single or distributed machine.

• How this should be done?• Well first Lick estimated the corpus would be

10^14 bytes by the year 2000.• That’s about 500 20TB systems. • It could a be a central system with thin clients.

Page 14: LIS654 lecture 2 history Thomas Krichel 2011-09-16

the system

• The system was call “procognitive” meaning for the advancement of knowledge.

• It would not be based on documents, metadata and retrieval.

• It would process information into knowledge and questions into answers.

• Users transmit their knowledge to the system.

Page 15: LIS654 lecture 2 history Thomas Krichel 2011-09-16

information to knowledge

• To see how information can be processed into knowledge, Lick, looked at the human brain. He had studied cat brains in his PhD work.

• If it is possible to the process the body of information into knowledge structures, then questions can be answered by knowledge rather than be documents.

Page 16: LIS654 lecture 2 history Thomas Krichel 2011-09-16

Lick on the brain

• The brain receives stimuli and stores representations of them.

• The brain finds answers to question by processing stored memory on the basis of “schemata”, which are ways in which stored stimuli representations can be processed.

Page 17: LIS654 lecture 2 history Thomas Krichel 2011-09-16

human processing• Lick understood that current and foreseeable

technology would not allow processing of documents into knowledge.

• This would be the job of set of librarian called “procognitive system specialists”.

• The would encode contents of documents in a knowledge language.

• They would watch for ambiguity warnings.• Users would also provide feedback.

Page 18: LIS654 lecture 2 history Thomas Krichel 2011-09-16

encoding

• Surprisingly Lick still imagined the procognitive system be based on natural language.

• The hope was that artificial intelligence (AI) methods would be developed to extract information from documents.

• That hope seemed justified in the 60s when AI was in its infancy.

Page 19: LIS654 lecture 2 history Thomas Krichel 2011-09-16

steps to implementation

• The first attempts, in the 60s, tried to find the citation string in a database of citations.

• Thus this was more information retrieval on a small set of metadata than actual digital library work.

• Librarians preparing bibliographies for researchers were the prime users.

Page 20: LIS654 lecture 2 history Thomas Krichel 2011-09-16

into 80s

• In the 80s the personal computer “came back”.

• Searching could be done of the full-texts of document.

• Browsing became available.

Page 21: LIS654 lecture 2 history Thomas Krichel 2011-09-16

90s

• In the 90s the Internet and the search engine came along.

• Initially search engines followed standard information retrieval principles.

• My first work, about 1993, was based on gopher access and WAIS indexing.

Page 22: LIS654 lecture 2 history Thomas Krichel 2011-09-16

the semantic web• The semantic web is the actual successor to

Lick’s vision.• It’s still not done.• I speculate it will not be done for a long time.• The reason is that while Lick thought

Psychology and Computers, he did not think through the economics of operating such system as the ones that he proposed.

• He also had too optimistic a vision about AI.

Page 23: LIS654 lecture 2 history Thomas Krichel 2011-09-16

back in the trenches

• As we have seen early digital library visions have been inspired by the concern of access to scientific documents.

• The academic digital library was synonymous with the digital library.

• So all the progress was there, and pretty much is.

Page 24: LIS654 lecture 2 history Thomas Krichel 2011-09-16

academic documents

• There are basically two types of academic documents.

• There are academic books and academic articles or papers.

• Both of them have been treated in different ways in the past, and continue to be treated differently at this time, maybe not for long.

Page 25: LIS654 lecture 2 history Thomas Krichel 2011-09-16

academic books (monographs)

• Books are (were) purchased by libraries. • They were cataloged into the local integrated

library system– locally or– through shared cataloging

Page 26: LIS654 lecture 2 history Thomas Krichel 2011-09-16

academic papers

• Most of them published in serials.• Libraries never catalogued them locally.• They relied on third party services to provide

abstracting and indexing services for them.

Page 27: LIS654 lecture 2 history Thomas Krichel 2011-09-16

publishing academic papers

• Publishing academic papers go through a process of peer review.

• Papers are written for free by some academics.

• They are being reviewed for free by other academics.

• The profits from publishing go to publisher. Academic publishing is very profitable.

Page 28: LIS654 lecture 2 history Thomas Krichel 2011-09-16

non-formal publication

• Some academic disciplines have a tradition of informal publication of papers that have not peer-reviewed.

• These are – mathematicians and physicists have preprints.– computer scientists and economists have working

papers.

Page 29: LIS654 lecture 2 history Thomas Krichel 2011-09-16

preprints vs working papers• Preprints were sent by academics to

colleagues.• Working papers are issued by departments

and sent other department by an exchange agreement.

• Whatever the mode of working, non-formal publication channels enabled librarians to build really digital libraries.

• Actually they were more built by their users.

Page 30: LIS654 lecture 2 history Thomas Krichel 2011-09-16

xxx.lanl.gov• This was/is a preprint server started by Paul

Ginsparg at Los Alamos National Archives.• It has been popular with physicists and

mathematicians. • It’s coverage with sub-disciplines varies.• It became arXiv.org.• It moved Cornell University in 2001.• It is now run by Cornell University libraries.

Page 31: LIS654 lecture 2 history Thomas Krichel 2011-09-16

NCSTRL• was the network computer science technical

report library, a DARPA/NSF funded project that built an infrastructure for publishing computer science working papers.

• Starting in 1993, it was built on a formal protocol called Dienst. This enables local and remote services.

• Implementation software was deployed at participating institutions.

• Collapsed completely when funding was gone.

Page 32: LIS654 lecture 2 history Thomas Krichel 2011-09-16

RePEc

• is a federated system based on metadata (ReDIF) and a transport protocol (Guildford Protocol), both written by yours truly.

• It can be run on a standard ftp or http software.

• RePEc archives don’t offer local services to end users.

Page 33: LIS654 lecture 2 history Thomas Krichel 2011-09-16

UPSPROTO• In 2000, Herbert Van de Sompel started work

to build a prototype system to provide the existing discipline-based digital libraries.

• The experience lead to the formation of a working group that created an interoperabilty protocol called the Open Archives Protocol for Public Metadata Harvesting. (OAI-PMH).

• I was part of that group.

Page 34: LIS654 lecture 2 history Thomas Krichel 2011-09-16

repositories

• OAI-PMH has been so widely implemented in repositories that we can say that a repository is a collection of documents on a server that implements this protocol.

• The is no official lists, but counts for institutional repositories now go over 2000.

Page 35: LIS654 lecture 2 history Thomas Krichel 2011-09-16

institutional repositories• The initial purpose of the institutional

repositories has been to make institutional research papers available.

• This would create open access to research papers.

• But the success of deposit of real scientific work has been muted.

• In the meantime there are other type of contents in IR.

Page 36: LIS654 lecture 2 history Thomas Krichel 2011-09-16

http://openlib.org/home/krichel

Please shutdown the computers whenyou are done.

Thank you for your attention!