Class 2 – Information systems as boundary objects
Last week we began investigating the core concepts of information infrastructures, information
systems, information institutions and information organization. This week we are going to explore in
more detail the relationships between information systems, users, documents and institutions by
exploring a case study of HTML documents used to deliver text-based encyclopedic information. In
this exercise we will experiment with some of the technical elements that currently influence
information visualization and use and explore the connection between the intellectual content of
documents and their organizational structures.
Instructions:
Work individually or in groups to complete the worksheet. When you get to a section that requires
you to select a resource to explore – pick one resource (please don’t always choose the first one!).
When asked to ‘discuss as a group’, consider your response and continue completing the worksheet.
We’re going to work with computer code today, so here’s an important note as you follow the exercises. Computer code is shown on numbered lines and is enclosed in boxes. The
line numbers are simply a reference aid during instruction and should not be copied into
your program. For example, a line that reads 56. p { visibility:hidden; } should simply be typed in as
p { visibility:hidden; }
Suggested Readings
1. Mitchell, E. (2015). Chapter 1 in Metadata Standards and Web Services in Libraries, Archives, and Museums.
Libraries Unlimited. Santa Barbara, CA.
2. Listen: With modern makeovers, America's libraries are branching out. http://www.npr.org/2013/09/01/217211315/with-modern-makeovers-americas-libraries-are-branching-out
3. Listen: Computers Are The Future, But Does Everyone Need To Code? NPR News Story 1/25, 2014. http://www.npr.org/2014/01/25/266162832/computers-are-the-future-but-does-everyone-need-to-code
4. Listen: Do we really need Libraries (2015 – NPR story): http://www.npr.org/sections/npr-history-dept/2015/05/05/403529103/do-we-really-need-libraries
5. Kernighan, B. (2011). D is for Digital. Chapter 6: Software systems, Chapter 2: Bits, Bytes and Representation of Information
6. Read - “User-centered models of information retrieval.” Introduction to modern information retrieval. Pp 249-261. –
Metadata Standards and Web Services Page 1
Erik Mitchell
7. Explore: DCC Curation Lifecycle Model. (2012). http://www.dcc.ac.uk/resources/curation-lifecycle-model
8. Explore: Records and Information Life Cycle Management. http://www.bac-lac.gc.ca/eng/services/government-information-resources/information-management/Pages/records-information-life-cycle/introduction.aspx
Optional readings
9. M. K. Buckland. (1997). What is a document? 48, 804-809. http://people.ischool.berkeley.edu/~buckland/whatdoc.html
Discussion of readings
Information needs/seeking, retrieval and use
On the surface it may appear easy to design and implement information systems. All it takes is
creating information resources and making them available to a user. Behind the scenes, however,
there are complex information systems that store the documents, store representations of documents
to facilitate retrieval and use, and create indexes that help match a user's information query. These
information systems run on computers that are designed to serve thousands of users at a time.
Combined, these two elements, the software and the hardware, can be considered an "information
system."
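The idea of storing document representations and an index that matches a user's query can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular retrieval system works; the three sample documents, the function names build_index and search, and the simple whitespace tokenizer are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample collection: document id -> text.
DOCUMENTS = {
    "doc1": "library catalog of books and journals",
    "doc2": "recipe for chocolate cake",
    "doc3": "books about cake decorating",
}


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


INDEX = build_index(DOCUMENTS)
print(search(INDEX, "books"))       # ['doc1', 'doc3']
print(search(INDEX, "cake books"))  # ['doc3']
```

The index is the "representation" here: a query never scans the documents themselves, only the word-to-document map built from them.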
These systems have to be designed with users in mind and need to take into consideration the client
platform (e.g. a laptop, smartphone, tablet), knowledge level (e.g. information literacy) and
information need (e.g. find a book) of the user. Two of these items, the user's knowledge level (or
cognitive state) and information need, inform the information seeking behavior of the user. The final
piece, the client technology, influences the interaction between the human and the computer (the
subject of Human Computer Interaction, or HCI).
Let's group each of these items into a simple visual model:
In the above model we have a hardware environment on top of which a set of software is deployed.
The entire information system is developed to implement the abstract model that matches a user's
information need with available documents through the interaction between the user's query and an
information system. The core abstract model is pulled from an information retrieval model that we will
be reading more about later in the semester.
The interaction between each of these elements can be visualized differently. For example, in our
reading you will find Saracevic's Stratified Interaction model. The model expands on our simple
model by introducing situational factors and by breaking down the component parts of an
information seeking activity (e.g. query characteristics, interface elements, system engineering).
Figure 1 Saracevic's Stratified Interaction model
At the moment you might find some of the acronyms and systems mentioned in these
models confusing. Don't get too caught up in understanding each system in depth yet. In this class
we will focus on three of these elements: computing infrastructure, information seeking behavior and
digital document structure. Let's start by exploring the technology building blocks of an information
system.
Explore computing infrastructure
Before we move on to understanding documents and document representations and the roles they
play in our information system, let's briefly consider the physical building blocks of computers and
the software stack that make up an information system.
In class 1 we learned about the building blocks of computers including hard drives, RAM,
motherboards, CPUs and peripherals. It turns out that, for the most part, the only differences between
the servers that Google uses and the laptop on your desk are the form factor of the machine (e.g. it
fits neatly in a rack mount), the amount of RAM, the power of the CPU and the number of machines
combined to make up a server farm. There are other differences, including the robustness of the
power supply and the number of redundant parts (e.g. extra hard drives, power supplies and RAM for
failure tolerance). In general, however, information systems often get their robustness by using lots of
smaller computers rather than one really big computer. This is also known as "commodity
computing" and is sometimes referred to as "computing as a utility."
In addition to the physical hardware, information systems have a base operating system (e.g.
Windows 7, OSX, Linux, Unix), general software that is designed to be re-usable across the computer
(e.g. libraries) and applications that serve specific purposes (e.g. web servers, word processors,
indexing software). The model below provides a view of the relationship between these four building
blocks.
Exploring your computer
Let's spend a moment exploring our own computer including its hardware, operating system and
running applications. To explore your computer, open Task Manager (Windows: press Ctrl-Alt-Del and
click on Task Manager), Activity Monitor (OSX: search for Activity Monitor) or About This Mac (OSX:
Apple menu, then About This Mac). Look around the Activity Monitor/Task Manager application. Can
you find out how fast your CPU is? How much RAM do you have?
Table 1 Activity monitors for Windows (Left) and OSX (Right)
Figure: The four building blocks of an information system (top layer first)
Application: native, web-based; user-facing service
Libraries: date/time, GUI; bundles of re-usable functions
Operating system: resource manager, kernel; Windows, OSX, Linux
Physical hardware: CPU, RAM, HDD
Question 1. CPU speed and type
Question 2. Amount of RAM in system
Question 3. Operating system
Question 4. Running applications
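Some of the same facts can also be read programmatically. The sketch below uses only Python's standard library; the function name describe_system and the dictionary keys are made up for this example. Note that the standard library does not offer a portable way to read CPU clock speed or total RAM, so for Questions 1 and 2 the graphical tools above remain the easier route.

```python
import os
import platform


def describe_system():
    """Collect a few basic facts about the machine running this script."""
    return {
        "operating_system": platform.system(),   # e.g. "Windows", "Darwin" (OSX), "Linux"
        "os_release": platform.release(),        # version string reported by the OS
        "cpu_architecture": platform.machine(),  # e.g. "x86_64" or "arm64"
        "logical_cpus": os.cpu_count(),          # may be None if it cannot be determined
    }


if __name__ == "__main__":
    for name, value in describe_system().items():
        print(f"{name}: {value}")
```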
Turning your computer from a word-processing focused machine into a web server or document
indexing server is actually quite easy. In later classes we will use virtual machines to explore these
types of applications. While it may seem complex, the world of system administration revolves
around tuning machines and software using tools similar to the application monitoring tools we just
explored.
Understanding the logical structure of a hard-drive
In addition to understanding the building blocks of your computer it is also worth understanding the
logical structure of your hard drive. Modern operating systems tend to interface with the user using
skeuomorphic principles. In skeuomorphic design "a derivative object retains ornamental design cues
to a structure that was necessary in the original"
(http://en.wikipedia.org/wiki/Skeuomorphism). Skeuomorphic design is a popular design principle
because it helps users infer functionality by recognizing objects that they are familiar with.
Take for example the toolbar of Word:
Figure 2 Word toolbar icons
The toolbar shows a new document (physical paper), a file folder to indicate opening a document, a
floppy disk to indicate the "save" function, a clipboard to represent "copy" functions and a paint brush
to represent formatting options. Critics of skeuomorphic design counter that objects lose relevance
over time and are meaningless to users who, for example, have never seen a floppy disk. In addition,
critics of skeuomorphic design assert that using these constructs in design is a barrier to a deeper
or different understanding of system functionality.
This explanation of skeuomorphic design is a long-winded way of pointing out the rather obvious fact
that our hard drives do not actually have "folders" and "documents" on them. As we learned in our
reading this week, files and folders are ultimately represented as bits on the physical media on our
disk drives. Take a few minutes and browse through Finder (OSX) or My Computer (Windows).
Question 5. What Skeuomorphic design elements do you see used to represent the information
stored on your hard drive?
Question 6. Reflect back on the classification structures we touched on in Class 1 (e.g.
Enumerative, Analytico-Synthetic, Faceted). Is there a classification structure that seems to fit
the folder structure of your hard drive? Why?
Step 1: Take a few minutes and revisit Kernighan's discussion of bits and bytes (p28-30) and
answer the following questions.
Question 7. A byte is comprised of how many bits of information?
Question 8. How many bytes are in a kilobyte?
Question 9. How many bytes are in a megabyte?
Question 10. How many bytes are in a gigabyte?
Question 11. A hard drive has 10,000 images, each 30 megabytes in size. How many of
those files will fit on a 1 gigabyte flash drive?
Question 12. How much space (in gigabytes) do the images in the previous question take
up on the hard drive?
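If you want to check this kind of arithmetic, a short Python sketch can help. It assumes the binary convention (each unit is 1024 of the next smaller unit); the decimal convention (1000) is also in common use, so adjust the constants if your reading uses it. The function names and the sample numbers (songs rather than the images in the questions) are invented for illustration.

```python
# Assumed convention: binary multiples (1 KB = 1024 bytes).
BYTES_PER_KB = 1024
BYTES_PER_MB = 1024 * BYTES_PER_KB
BYTES_PER_GB = 1024 * BYTES_PER_MB


def files_that_fit(file_size_mb, drive_size_gb):
    """Number of whole files of a given size (MB) that fit on a drive (GB)."""
    return (drive_size_gb * BYTES_PER_GB) // (file_size_mb * BYTES_PER_MB)


def total_size_gb(file_count, file_size_mb):
    """Total space in GB taken by file_count files of file_size_mb each."""
    return file_count * file_size_mb * BYTES_PER_MB / BYTES_PER_GB


# Hypothetical example: 2,500 songs of 8 MB each and a 4 GB flash drive.
print(files_that_fit(8, 4))    # 4 GB = 4096 MB, and 4096 // 8 = 512
print(total_size_gb(2500, 8))  # 20000 MB / 1024 = 19.53125
```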
Explore information seeking behavior
Step 2: Chowdhury looked at a number of information behavior and information seeking models.
While the reading does a good job of describing the models, it contains few visual models
to help. To enhance your understanding, read the Chowdhury reading
alongside the ppt slides for this class. Rather
than focusing on every model, I would recommend selecting one or two.
a. Models
i. Wilson’s problem solving model
ii. Dervin’s sense-making approach
iii. Ellis’s information seeking process
iv. Kuhlthau’s information seeking model
v. Ingwersen’s model
vi. Belkin’s ASK model
vii. Saracevic’s stratified interaction model
b. Questions
i. Briefly review the model’s components – be prepared to answer the question
“what process does this model describe?” Can you think of a real-world situation
in which this model applies?
ii. Is your model focused on user behavior, cognition, an information seeking
process or the interaction between a user and a system?
iii. Does this model fit with your own view of how you seek information? Why or why
not?
Understanding document structure
In later classes we will explore the indexing and querying processes in more detail. In order to
understand documents and the role that their representations play in information systems let us focus
on a particular type of digital document - HTML documents. How we structure documents is central
to our use of them. For example, recipes tend to be structured in a specific way to help us
differentiate between ingredients and cooking instructions. Nearly every text or multi-media based
document has its own model, or general structure, that helps us recognize how to use it.
Explore the recipe – circle and label different types of information (e.g. quantities, procedures,
ingredients, etc.).
Consider the structure of this recipe and answer the following questions:
1. What are the main sections of the recipe?
2. What terms/formatting are used to indicate each section?
3. What assumptions does the recipe make about your knowledge level?
Although this recipe is relatively simple, it is actually rather complex to represent in a digital document
form. We need ways to group sections including ingredients, directions, submission information and
prep data. We need a way to include a picture with the document and give it a title, and we need a
presentation model that makes sense to cooks. In fact, there are two equally important problems here.
First, we need the intellectual content of the recipe to be captured and made available. Second, we
need that content to be presented in such a way as to be usable by our readers.
Consider the impact of some of the information seeking models that we explored. How would a clear
layout help with the Sense-making process? How might the design and presentation of the content
influence your level of need satisfaction in Belkin’s ASK model? This screen shot accomplishes all of this
through a suite of technologies including HTML, CSS, data modeling and programming. In this class
we are primarily focused on HTML, so let's spend a bit more time exploring the HTML standard.
Case study in HTML
HTML (HyperText Markup Language) is the primary document encoding scheme of the web. HTML
is a text-based document format that serves as the foundation of every webpage. HTML has seen
quite a few versions and is managed by a consortium known as the World Wide Web Consortium
(W3C). In the remainder of this worksheet we will explore the structure of HTML documents and
consider their relationship to information organization.
Document structure overview
HTML documents are primarily composed of elements and attributes. For each element/attribute
there is a name (e.g. element name, attribute name) and value (e.g. the value of the
element/attribute). These elements and attributes are arranged in a hierarchical manner. Exact
element and attribute names and the rules governing their values are defined by HTML standards
maintained by the W3C. Figure 3 shows us an example HTML page. Take a moment to review the
HTML document and acquaint yourself with the following concepts:
1. Elements: In HTML, element names are surrounded by <>. An element must be “opened” (e.g.
<html>) and “closed” (e.g. </html>) and must follow hierarchical rules (more on this below).
Take a look at line 2. The element defined on line 2 is <html>. For this element the name is
“html” and the value is all of the sub-elements (and their values) of the html element.
2. Attributes: Attributes are declared inside an element's opening tag. An example of this is
seen on line 5: the element <div> has an attribute “id”.
3. Values: Attribute values are the text in quotes after the = sign (e.g. title="Sample
page"). Element values are the text and all of the child elements contained in between the
opening and closing tags of an element. For example, let's look at line 7. Find the
element <p>. The value of the element <p> is “This is a very simple webpage”.
In the software development world these name/value combinations are also called variables. A
variable is a name/value combination such as title="Sample page". Although not represented
explicitly, this also applies to elements. For example, in regard to line 7, <p> = “This is a very simple
webpage”. Likewise, on line 5 the element <div> has an attribute id which has the value header
(id="header").
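To see these name/value pairs pulled out by a program, here is a minimal sketch using Python's built-in html.parser module. The class name NameValueCollector and the sample fragment are invented for this illustration (the fragment is deliberately not taken from Figure 3). As a simplification, each record stores only the text directly inside an element, whereas the worksheet's definition of an element value also includes its child elements.

```python
from html.parser import HTMLParser


class NameValueCollector(HTMLParser):
    """Record each element's name, attributes and directly contained text."""

    def __init__(self):
        super().__init__()
        self.records = []  # one dict per element, in document order
        self._open = []    # stack of records for elements not yet closed

    def handle_starttag(self, tag, attrs):
        record = {"element": tag, "attributes": dict(attrs), "text": ""}
        self.records.append(record)
        self._open.append(record)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to the innermost element that is still open.
        if self._open and data.strip():
            self._open[-1]["text"] += data.strip()


# Hypothetical fragment used only for this example.
SNIPPET = '<p class="intro">Welcome to the <em>library</em></p>'

collector = NameValueCollector()
collector.feed(SNIPPET)
collector.close()
for record in collector.records:
    print(record)
```

For the fragment above, the first record has the element name p, the attribute name class with the value intro, and the text "Welcome to the"; the second record is the child element em with the text "library".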
Question 13. Briefly review line 5 in Figure 3 to identify the element, attribute, attribute value
and element value and fill out Table 2.
Table 2: Map of element / attribute names and values
Element Name Attribute Name Attribute Value Element Value
Figure 3: Example HTML document
1. <!DOCTYPE html>
2. <html>
3. <head><title>Sample page</title></head>
4. <body>
5. <div id="header">Hello World</div>
6. <div id="body">
7. <p>This is a very simple webpage</p>
8. <ul> <li>It has just a few basic elements</li>
9. <li>It has a meta tag to provide descriptive metadata</li>
10. <li>It has div elements to facilitate styling with cascading stylesheets</li>
11. </ul>
12. </div>
13. <div id="footer">
14. <p>2011</p>
15. </div>
16. </body>
17. </html>
One additional important feature of HTML documents is that they follow a hierarchy. The hierarchical
nature of HTML may be familiar from our work last week with hierarchical classification systems.
To review, some of the features of hierarchy include (Adapted from Kwasnik, 1999 -
https://www.ideals.illinois.edu/bitstream/handle/2142/8263/librarytrendsv48i1d_opt.pdf ):
1. Inclusiveness: The top element of the hierarchy contains all sub-classes
2. Super / Sub – class distinctions: Elements interact via a super/sub class distinction.
Also known as parent/child/sibling relationships, super/sub-class distinctions describe “is-a”
relationships (e.g. head “is a” child of html)
3. Inheritance: Sub-elements are members not only of their parent class but also of every
ancestor class above that parent.
In HTML documents these rules have some special implications. First, HTML document elements
must respect the idea of inclusiveness. This means that each element that is opened (e.g. <html>) must
also be closed (e.g. </html>). The concept of super/sub-class distinctions means that an element can
have only one parent. This is represented by opening and closing a child element inside of the parent
element (e.g. <html><head></head></html>). Finally, as we will see in our next worksheet, the
concept of inheritance is applied when HTML documents are processed by web browsers.
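The parent/child rules above can be made concrete with a small Python sketch that walks a document and records, for every element, its single parent and its full chain of ancestors. It uses the standard html.parser module; the class name HierarchyTracker and the tiny recipe page are assumptions made for this example and are not the document in Figure 3.

```python
from html.parser import HTMLParser


class HierarchyTracker(HTMLParser):
    """Record (element, parent, ancestors) for every opened element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, outermost first
        self.rows = []   # (element, parent or None, tuple of all ancestors)

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        self.rows.append((tag, parent, tuple(self.stack)))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Well-formed input closes the most recently opened element first.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


# Hypothetical recipe page used only for this illustration.
RECIPE = (
    "<html><head><title>Pancakes</title></head>"
    "<body><h1>Pancakes</h1><ol><li>Mix</li><li>Fry</li></ol></body></html>"
)

tracker = HierarchyTracker()
tracker.feed(RECIPE)
tracker.close()
for element, parent, ancestors in tracker.rows:
    print(f"{element}: parent={parent}, ancestors={ancestors}")
```

Each row has exactly one parent (the super/sub-class distinction), the first row has no parent at all (inclusiveness: everything sits inside the top element), and the ancestors tuple lists every class an element belongs to through inheritance.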
Using Figure 3 as a guide answer the following questions.
Key Questions
Question 14. What is the top element of the hierarchical document in Figure 3?
Question 15. What element is the parent of the element <li>?
Question 16. In the <div> elements, what attributes are defined?
Question 17. On line 6, what is the value of the attribute “id”?
Question 18. On line 5, what is the value of the element <div>?
HTML document structure
You may have already noticed that HTML documents are a bit odd. For example, while we say that
each element can have only one parent, you may have noticed that the <p> element occurs under
two parents (see lines 7 and 14 and find the parents of these two elements). The HTML standard
allows an element type to be repeated, but each individual occurrence must still sit under a single
parent element. While not comprehensive, Table 3 provides us with a quick overview of how the HTML
standard uses each of the elements defined in Figure 3.
Table 3: Brief table of HTML elements
HTML element Function
html This is the root element of HTML documents. This element helps the web-browser
understand that the document follows the HTML standard
head The head element stands for “header” and contains information about the page
rather than content displayed on it (e.g. style sheets, JavaScript, document meta tags)
body The body element is designed to contain all of the HTML contents that will be
shown in your web-browser when the page is displayed
div The div element has little default functionality but is often used as a container for
other elements (more on this later!)
ul The ul (or Unordered List) element helps create a bullet list of content in your
HTML document. This element is used in conjunction with multiple li (list item)
elements to show individual bullets. Another element that behaves similarly to
the unordered list is ol (ordered list). When you use an ol element in place of a ul
element, the list is created with numbers instead of bullets. Note: Both <ul> and
<ol> elements represent individual items with repeating <li> elements.
li See the ul element for use. The li element is used to contain the text that is
shown in individual bullets
p The p (or paragraph) element is used to contain larger blocks of text that are
typically represented in paragraph form. While the <p> element has some
default behavior, it, like all other elements, can have its behavior modified through
the use of cascading style sheets.
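As a small illustration of how ul, ol and li work together, the following Python sketch builds either kind of list from a sequence of strings. The function name html_list and the sample items are invented for the example; html.escape is a standard-library function that replaces characters such as < and & so they display as text.

```python
from html import escape


def html_list(items, ordered=False):
    """Build a <ul> (bullets) or <ol> (numbers) with one <li> per item."""
    tag = "ol" if ordered else "ul"
    rows = "".join(f"<li>{escape(item)}</li>" for item in items)
    return f"<{tag}>{rows}</{tag}>"


print(html_list(["flour", "eggs"]))             # <ul><li>flour</li><li>eggs</li></ul>
print(html_list(["Mix", "Fry"], ordered=True))  # <ol><li>Mix</li><li>Fry</li></ol>
```

Only the outer element changes between the two calls; the repeating li elements are identical, which matches the note in the table.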
The role of web-browsers
Although HTML documents are text-based they create GUI (graphical user interface) web-sites when
they are displayed in a web-browser like Chrome, Firefox or Internet Explorer. In the software
development world the role that the web-browser performs is known as an “interpreter.” There
are interpreters for many different programming languages. This semester we will be working with a
few different interpreters, each designed to work with a different type of document.
Schemas, encoding and interpreters. . .
HTML is an excellent example of the new realm of digital documents that follow a metadata schema
that is implemented using a variation of the encoding system known as XML (Extensible Markup
Language). Digital documents are often text-based but behave differently when accessed through an
interpreter. As we conclude, let's make sure we are on the same page with these three concepts:
Metadata schema: A system that defines the intellectual structure of a document (e.g. title, author).
An example of a metadata schema is HTML. HTML is a metadata schema whose purpose is to
enable the publishing and viewing of documents using web protocols.
Encoding system: An encoding system is a system whose purpose is to define how a metadata
schema will be implemented. The HTML metadata schema can be implemented using an encoding
system known as XML (eXtensible Markup Language). When implemented this way, HTML is
said to conform to the XHTML (eXtensible HTML) standard. XHTML conforms to some basic rules:
1. All XHTML elements must be properly nested (e.g. <ul><li></li></ul> NOT <ul><li></ul>)
2. All XHTML elements must have a closing tag (e.g. <html></html>)
3. All XHTML attributes must enclose values in quotations (e.g. href="http://umd.edu")
4. All XHTML element and attribute names must be in lower case
Although XHTML has these additional encoding rules that do not apply to HTML, for the most part
HTML and XHTML share the same element definitions and uses. As we will see when we begin
working with other XML-based standards, HTML is one of a range of metadata schemas that are
important in library and information science. In addition, while XHTML has some interesting rules,
the first three of these requirements are universal across all XML-based encoding systems (XML itself
is case-sensitive, and XHTML additionally requires lower case). They are important to follow
to ensure that every interpreter that processes your document behaves predictably.
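The nesting, closing and quoting rules can be checked mechanically with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this example. It tests only whether a fragment is well-formed XML. It does not check the lower-case rule or whether the element names belong to the XHTML schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<ul><li>one</li></ul>"))           # True: nested and closed
print(is_well_formed("<ul><li>one</ul>"))                # False: <li> is never closed
print(is_well_formed("<a href=http://umd.edu>UMD</a>"))  # False: unquoted attribute value
```

An upper-case fragment such as <UL><LI>one</LI></UL> also passes this check, which shows that the lower-case requirement comes from XHTML rather than from XML itself.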
HTML and XHTML can be confusing to discuss in terms of metadata schemas and encoding systems
because the standard conflates the two concepts in the definition of the HTML standard. Because of
this, the HTML standard includes both metadata schema and encoding system requirements.
Interpreter: An interpreter is an application (e.g. a web-browser) that works with specific types of
files to run the instructions contained in that file. The instructions must correspond to both a
metadata schema (HTML) and an encoding system (HTML/XML). Web-browsers are unique in that
they are capable of interpreting a number of metadata schemas and encoding systems. For example,
web-browsers can interpret JavaScript, Cascading Style Sheets, XML documents, HTML documents and
XSL documents (just to name a few!). In our next class we will explore the differences between these
three concepts in more detail and consider the implications that they have for information
organization.
Create your own web-page
Open a text editor and type the HTML code in Figure 3 into the text editor. Save the file with the
file extension .html. In your web-browser of choice open the file (e.g. in Chrome click on File >>
Open File and navigate to your file). Figure 4 contains a screen shot of the web-page rendered using
Chrome.
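If you prefer to create the file from a script instead of a text editor, the same steps can be sketched in Python. The page content, the file name sample.html and the function name save_page are all placeholders chosen for this example; the function writes the file and returns a file:// address that can be pasted into a browser's address bar.

```python
from pathlib import Path

# Hypothetical minimal page used only for this sketch.
PAGE = """<!DOCTYPE html>
<html>
<head><title>Sample page</title></head>
<body>
<p>This is a very simple webpage</p>
</body>
</html>
"""


def save_page(directory, filename="sample.html", content=PAGE):
    """Write the page to disk and return a file:// URL a browser can open."""
    path = Path(directory).resolve() / filename
    path.write_text(content, encoding="utf-8")
    return path.as_uri()
```

Calling save_page(".") writes sample.html into the current folder. Passing the returned address to webbrowser.open from the standard library should open it in your default browser, although the exact behavior depends on your system.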
Figure 4: Rendered HTML page
Validate your HTML page
Now that you have some html (or have copied the html in this document), you can make sure that
your document conforms to the HTML standard.
Step 3: Validate your HTML document
a. In a web browser, go to http://validator.w3.org/check
b. Click on the “Validate by Direct Input” tab
c. Copy and paste the code from this worksheet (Figure 3 – remember to remove the
line numbers!) or from your own document into the direct input box and click “Check”
d. Did your document validate? What errors / warnings did you see?
When you validate your document using the W3C validator you are checking for syntactic validation. This means that you are checking to see if your webpage follows the grammar of HTML
and that there are no missed syntactic structures. In addition to syntactic validation there is also
semantic validation that makes sure that your documents (and computer code) have the intended
meaning (remember disambiguation?). We will spend more time on semantic validation later in the
semester.
Validation is an important part of digital documents as a validated document is ready to be used by
humans and computers. If you got lots of errors from the validator, don’t worry about that too much for
right now. At the moment, the important idea is to understand the purpose of a validator (to check a
digital document’s conformance to a metadata standard and encoding scheme). We will be working
with validators for different standards throughout the semester.
Case study wrap-up
Digital documents are different from print documents in that they are designed to be re-used and re-
purposed in any number of ways. There is a natural tension between the need to design a document
for a specific use and the need to be able to re-purpose that document in the future. In other words,
the decisions you make now have implications for uses you have not even thought about yet.
In exploring HTML we have considered how metadata and encoding standards form the foundation of
digital documents and have briefly explored the necessary elements of these documents. Our goal in
this class is to feel comfortable with the XHTML standard. In our next class we will consider the
differences between document representation and display in more detail by learning more about
cascading style sheets and considering the implications of different approaches to HTML document
creation.
Key Questions
Question 19. HTML stands for:
Question 20. Which organization is responsible for maintaining the HTML standard?
Question 21. Which HTML element is used to begin an un-ordered list?
Question 22. Which HTML element is used for each un-ordered list item?
Question 23. True or False: HTML elements have set behaviors that cannot be modified.
Question 24. Which HTML element is used to create an ordered list?
Question 25. Is XHTML an encoding system, a metadata schema or both?
Question 26. Which of the following is true:
a. An interpreter parses a document and executes or transforms it according to the
rules associated with that document type
b. An interpreter translates a document from one language to another
c. Web-browsers are only capable of interpreting HTML documents.
Question 27. True or False: HTML and XHTML differ in encoding rules
Question 28. True or False: HTML and XHTML differ in the use of core HTML elements
Question 29. If an HTML document is “Valid” according to a validator what does that mean?
In this class we touched on three aspects of an information system: system design, information
seeking behavior and document representation standards and technologies. We created a web page
represented in HTML that we viewed using a web-browser. In later classes we will expand on our
understanding of document representation standards and technologies, explore indexing and retrieval
systems and become more fluent with user interaction technologies.