Class 2 – Information systems as boundary objects

Last week we began investigating the core concepts of information infrastructures, information

systems, information institutions and information organization. This week we are going to explore in

more detail the relationships between information systems, users, documents and institutions by

exploring a case study of HTML documents used to deliver text-based encyclopedic information. In

this exercise we will experiment with some of the technical elements that currently influence

information visualization and use and explore the connection between the intellectual content of

documents and their organizational structures.

Instructions:

Work individually or in groups to complete the worksheet. When you get to a section that requires

you to select a resource to explore – pick one resource (please don’t always choose the first one!).

When asked to ‘discuss as a group’, consider your response and continue completing the worksheet.

We’re going to work with computer code today, so here is an important note as you follow the exercises. Computer code is shown on numbered lines and is enclosed in boxes. The

line numbers are simply a reference aid during instruction and should not be copied into

your program. For example, a line that reads 56. p { visibility:hidden; } should simply be typed in as

p { visibility:hidden; }

Suggested Readings

1. Mitchell, E. (2015). Chapter 1 in Metadata Standards and Web Services in Libraries, Archives, and Museums.

Libraries Unlimited. Santa Barbara, CA.

2. Listen: With modern makeovers, America's libraries are branching out. http://www.npr.org/2013/09/01/217211315/with-modern-makeovers-americas-libraries-are-branching-out

3. Listen: Computers Are The Future, But Does Everyone Need To Code?. NPR News Story 1/25, 2014. http://www.npr.org/2014/01/25/266162832/computers-are-the-future-but-does-everyone-need-to-code

4. Listen: Do we really need Libraries (2015 – NPR story): http://www.npr.org/sections/npr-history-dept/2015/05/05/403529103/do-we-really-need-libraries

5. Kernighan, B. (2011). D is for Digital. Chapter 6: Software systems, Chapter 2: Bits, Bytes and Representation of Information

6. Read - “User-centered models of information retrieval.” In Chowdhury, Introduction to modern information retrieval. Pp 249-261.

Metadata Standards and Web Services Page 1

Erik Mitchell

7. Explore: DCC Curation Lifecycle Model. (2012). http://www.dcc.ac.uk/resources/curation-lifecycle-model

8. Explore: Records and Information Life Cycle Management. http://www.bac-lac.gc.ca/eng/services/government-information-resources/information-management/Pages/records-information-life-cycle/introduction.aspx

Optional readings

9. Buckland, M. K. (1997). What is a “document”? Journal of the American Society for Information Science, 48(9), 804-809. http://people.ischool.berkeley.edu/~buckland/whatdoc.html

Discussion of readings

Information needs/seeking, retrieval and use

On the surface it may appear easy to design and implement information systems. All it takes is

creating information resources and making them available to a user. Behind the scenes however

there are complex information systems that store the documents, store representations of documents

to facilitate retrieval and use and create indexes that help match a user's information query. These

information systems run on computers that are designed to serve thousands of users at a time.

Combined, these two elements, the software and the hardware, can be considered an "information

system."

These systems have to be designed with users in mind and need to take into consideration the client

platform (e.g. a laptop, smartphone, tablet), knowledge level (e.g. information literacy) and

information need (e.g. find a book) of the user. Two of these items, the user's knowledge level (or

cognitive state) and information need, inform the information seeking behavior of the user. The final

piece, the client technology, influences the interaction between the human and the computer

(the subject of Human-Computer Interaction, or HCI).

Let's group each of these items into a simple visual model:



In the above model we have a hardware environment on top of which a set of software is deployed.

The entire information system is developed to implement the abstract model that matches a user's

information need with available documents through the interaction between the user's query and an

information system. The core abstract model is pulled from an information retrieval model that we will

be reading more about later in the semester.

The interaction between each of these

elements can be visualized differently. For

example, in our reading you will find

Saracevic's Information Stratification model.

The model expands on our simple model by

introducing situational factors and by

breaking down the component parts of an

information seeking activity (e.g. - query

characteristics, interface elements, system

engineering).

Figure 1 Saracevic's Stratified Interaction model

At the moment you might find some of the

acronyms and systems mentioned in these

models confusing. Don't get too caught up in understanding each system in depth yet. In this class

we will focus on three of these elements: computing infrastructure, information seeking behavior and

digital document structure. Let's start by exploring the technology building blocks of an information

system.

Explore computing infrastructure

Before we move on to understanding documents and document representations and the roles they

play in our information system, let's briefly consider the physical building blocks of computers and

the software stack that make up an information system.

In class 1 we learned about the building blocks of computers including hard drives, RAM,

motherboards, CPUs and peripherals. It turns out that, for the most part, the main differences between

the servers that Google uses and the laptop on your desk are the form factor of the machine (e.g. it

fits neatly in a rack mount), the amount of RAM, the power of the CPU and the number of machines

combined to make up a server farm. There are other differences, including the robustness of the

power supply and the number of redundant parts (e.g. extra hard drives, power supplies and RAM for

failure tolerance). In general, however, information systems often get their robustness by using lots of

smaller computers rather than one really big computer. This approach is known as "commodity

computing" and is related to the idea of "computing as a utility."

In addition to the physical hardware, information systems have a base operating system (e.g.

Windows 7, OSX, Linux, Unix), general software that is designed to be re-usable across the computer

(e.g. libraries) and applications that serve specific purposes (e.g. Webservers, Word Processors,

Indexing software). The model below provides a view of the relationship between these four building

blocks.


Exploring your computer

Let's spend a moment exploring our own computer including its hardware, operating system and

running applications. To explore your computer, open Task Manager (Windows: press Ctrl-Alt-Del and click

Task Manager), Activity Monitor (OSX: search for Activity Monitor) or About This Mac (Apple menu > About

This Mac). Look around the Activity Monitor/Task Manager applications. Can you find out how fast

your CPU is? How much RAM do you have?

Table 1 Activity monitors for Windows (Left) and OSX (Right)


Figure text: building blocks of an information system

Application: native, web-based; user-facing service

Libraries: date/time, GUI; bundles of re-usable functions

Operating system: resource manager, kernel; Windows, OSX, Linux

Physical hardware: CPU, RAM, HDD

Question 1. CPU speed and type

Question 2. Amount of RAM in system

Question 3. Operating system

Question 4. Running applications

Turning your computer from a word-processing focused machine into a web server or document

indexing server is actually quite easy. In later classes we will use virtual machines to explore these

types of applications. While it may seem complex, the world of system administration revolves

around tuning machines and software using tools similar to the application monitoring tools we just

explored.

Understanding the logical structure of a hard-drive

In addition to understanding the building blocks of your computer, it is also worth understanding the

logical structure of your hard drive. Modern operating systems tend to interface with the user using

skeuomorphic principles. In skeuomorphic design "a derivative object retains ornamental design cues to a structure that

was necessary in the original" (http://en.wikipedia.org/wiki/Skeuomorphism). Skeuomorphic design is a popular design principle

because it helps users infer functionality by recognizing objects that they are familiar with.

Take for example the toolbar of Word:


Figure 2 Word toolbar icons

The toolbar shows a new document (physical paper), a file folder to indicate opening a document, a

floppy disk to indicate the "save" function, a clipboard to represent "copy" functions and a paint brush

to represent formatting options. Critics of skeuomorphic design counter that objects lose relevance

over time and are meaningless to users who, for example, have never seen a floppy disk. In addition,

critics of skeuomorphic design assert that using these constructs in design is a barrier to a deeper

or different understanding of system functionality.

This explanation of skeuomorphic design is a long-winded way of pointing out the rather obvious fact

that our hard drives do not actually have "folders" and "documents" on them. As we learned in our

reading this week, files and folders are ultimately represented as bits on the physical media on our

disk drives. Take a few minutes and browse through Finder (OSX) or My Computer (Windows).

Question 5. What skeuomorphic design elements do you see used to represent the information

stored on your hard drive?

Question 6. Reflect back on the classification structures we touched on in Class 1 (e.g.

Enumerative, Analytico-Synthetic, Faceted). Is there a classification structure that seems to fit

the folder structure of your hard drive? Why?

Step 1: Take a few minutes to revisit Kernighan's discussion of bits and bytes (pp. 28-30) and

answer the following questions.

Question 7. A byte is composed of how many bits of information?

Question 8. How many bytes are in a kilobyte?

Question 9. How many bytes are in a megabyte?

Question 10. How many bytes are in a gigabyte?


Question 11. A hard drive has 10,000 images, each 30 megabytes in size. How many of

those files will fit on a 1 gigabyte flash drive?

Question 12. How much space (in gigabytes) do the images in the previous question take

up on the hard drive?
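
A short worked example may help with this kind of storage arithmetic. The sketch below is in Python and uses made-up numbers (2,500 photos of 4 megabytes each and a 16 gigabyte flash drive), so it does not answer the questions above. It assumes the binary convention, in which each unit is 1,024 times the previous one; many sources (and most drive manufacturers) use 1,000 instead, so check which convention your reading uses. The function names are illustrative only.

```python
# Illustrative numbers only; they are not taken from the worksheet questions.
BYTES_PER_KB = 1024                  # binary convention (an assumption)
BYTES_PER_MB = 1024 * BYTES_PER_KB
BYTES_PER_GB = 1024 * BYTES_PER_MB


def files_that_fit(file_size_mb: int, drive_size_gb: int) -> int:
    """Return how many whole files of the given size fit on the drive."""
    return (drive_size_gb * BYTES_PER_GB) // (file_size_mb * BYTES_PER_MB)


def total_size_gb(file_count: int, file_size_mb: int) -> float:
    """Return the combined size of the files, in gigabytes."""
    return file_count * file_size_mb * BYTES_PER_MB / BYTES_PER_GB


print(files_that_fit(4, 16))    # 4 MB photos on a 16 GB drive
print(total_size_gb(2500, 4))   # space used by 2,500 such photos
```

Converting everything to bytes first, and only then dividing, avoids mixing units by accident.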

Explore information seeking behavior

Step 2: Chowdhury looked at a number of information behavior and information seeking models.

While the reading does a good job of describing the models, it contains few visual models

to help. To enhance your understanding, read the Chowdhury reading

alongside the PowerPoint slides for this class. Rather

than focusing on every model, I would recommend selecting one or two.

a. Models

i. Wilson’s problem solving model

ii. Dervin’s sense-making approach

iii. Ellis’s information seeking process

iv. Kuhlthau’s information seeking model

v. Ingwersen’s model

vi. Belkin’s ASK model

vii. Saracevic’s stratified interaction model

b. Questions

i. Briefly review the model’s components – be prepared to answer the question

“what process does this model describe?” Can you think of a real-world situation

in which this model applies?

ii. Is your model focused on user behavior, cognition, an information seeking

process or the interaction between a user and a system?

iii. Does this model fit with your own view of how you seek information? Why or why

not?


Understanding document structure

In later classes we will explore the indexing and querying processes in more detail. In order to

understand documents and the role that their representations play in information systems let us focus

on a particular type of digital document - HTML documents. How we structure documents is central

to our use of them. For example, recipes tend to be structured in a specific way to help us

differentiate between ingredients and cooking instructions. Nearly every text or multi-media based

document has its own model, or general structure, that helps us recognize how to use it.

Explore the recipe – circle and label different types of information (e.g. quantities, procedures,

ingredients).

Consider the structure of this recipe and answer the following questions:

1. What are the main sections of the recipe?

2. What terms/formatting are used to indicate each section?

3. What assumptions does the recipe make about your knowledge level?

Although this recipe is relatively simple, it is actually rather complex to represent in a digital document

form. We need ways to group sections including ingredients, directions, submission information and

prep data. We need a way to include a picture with the document, give it a title and need a

presentation model that makes sense to cooks. In

fact, there are two equal problems here. First, we

need the intellectual content of the recipe to be

captured and made available. Second, we need that

content to be presented in such a way as to be

useable by our readers.

Consider the impact of some of the information

seeking models that we explored. How would a clear

layout help with the Sense-making process? How

might the design and presentation of the content

influence your level of need satisfaction in Belkin’s

ASK model? This screen shot accomplishes all of this

through a suite of technologies including HTML, CSS, data modeling and programming. In this class

we are primarily focused on HTML so let's spend a bit more time exploring the HTML standard.

Case study in HTML

HTML (HyperText Markup Language) is the primary document encoding scheme of the web. HTML

is a text-based document format that serves as the foundation of every webpage. HTML has seen

quite a few versions and is managed by a consortium known as the World Wide Web Consortium

(W3C). In the remainder of this worksheet we will explore the structure of HTML documents and

consider their relationship to information organization.

Document structure overview

HTML documents are composed primarily of elements and attributes. Each element or attribute

has a name (e.g. element name, attribute name) and a value (e.g. the value of the

element/attribute). These elements and attributes are arranged in a hierarchical manner. The exact

element and attribute names and the rules governing their values are defined by HTML standards

maintained by the W3C. Figure 3 shows us an example HTML page. Take a moment to review the

HTML document and acquaint yourself with the following concepts:

1. Elements: In HTML, element tags are enclosed in angle brackets (<>). An element must be “opened” (e.g.

<html>) and “closed” (e.g. </html>) and must follow hierarchical rules (more on this below).

Take a look at line 2. The element defined on line 2 is <html>. For this element the name is

“html” and the value is all of the sub-elements (and their values) of the html element.

2. Attributes: Attributes are declared inside an element's opening tag. An example of this is seen on line 5, where the element <div> has

an attribute “id”.

3. Values: Attribute values are the text in quotes after the = sign (e.g. id="header").

Element values are the text and all of the child elements contained between the

opening and closing tags of an element in an HTML page. For example, let's look at line 7. Find the

element <p>. The value of the element <p> is “This is a very simple webpage”.


In the software development world these name/value combinations are also called variables. A

variable is a name/value combination such as title="Sample page". Although not represented

explicitly, this also applies to elements. For example, with regard to line 7, <p> = “This is a very simple

webpage”. Likewise, on line 5 the element <div> has an attribute id which has the value header

(id="header").
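
To see these name/value pairs pulled out by a program, here is a minimal sketch using the HTMLParser class from Python's standard library. The class name NameValueLister and the one-line snippet (a link to http://umd.edu with made-up link text) are illustrative assumptions and are not part of the worksheet's example page.

```python
from html.parser import HTMLParser


class NameValueLister(HTMLParser):
    """Collect (kind, name, value) triples while reading an HTML snippet."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # names of the elements that are currently open

    def handle_starttag(self, tag, attrs):
        self._open.append(tag)
        for name, value in attrs:
            self.pairs.append(("attribute", name, value))

    def handle_data(self, data):
        # Text belongs to the innermost element that is still open.
        if self._open and data.strip():
            self.pairs.append(("element", self._open[-1], data.strip()))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()


lister = NameValueLister()
lister.feed('<a href="http://umd.edu">University of Maryland</a>')
for kind, name, value in lister.pairs:
    print(kind, name, "=", value)
```

The sketch only records text values; an element's full value also includes its child elements, which this simplified version does not collect.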

Question 13. Briefly review line 5 in Figure 3 to identify the element, attribute, attribute value

and element value and fill out Table 2.

Table 2: Map of element / attribute names and values

Element Name | Attribute Name | Attribute Value | Element Value

Figure 3: Example HTML document

1. <!DOCTYPE html>

2. <html>

3. <head><title>Sample page</title></head>

4. <body>

5. <div id="header">Hello World</div>

6. <div id="body">

7. <p>This is a very simple webpage</p>

8. <ul> <li>It has just a few basic elements</li>

9. <li>It has a meta tag to provide descriptive metadata</li>

10. <li>It has div elements to facilitate styling with cascading stylesheets</li>

11. </ul>

12. </div>


13. <div id="footer">

14. <p>2011</p>

15. </div>

16. </body>

17. </html>

One additional important feature of HTML documents is that they follow a hierarchy. The hierarchical

nature of HTML may be familiar from our work last week with hierarchical classification systems.

To review, some of the features of hierarchy include (Adapted from Kwasnik, 1999 -

https://www.ideals.illinois.edu/bitstream/handle/2142/8263/librarytrendsv48i1d_opt.pdf ):

1. Inclusiveness: The top element of the hierarchy contains all sub-classes

2. Super / Sub – class distinctions: Elements interact via a super/sub class distinction.

Also known as parent/child/sibling relationships, super/sub-class distinctions describe “is-a”

relationships (e.g. head “is a” child of html)

3. Inheritance: Sub-elements are members of not only their parent class but also of all the

super classes above that parent.

In HTML documents these rules have some special implications. First, HTML document elements

must respect the idea of inclusiveness. This means that each element that is opened (e.g. <html>) must

also be closed (e.g. </html>). The concept of super/sub-class distinctions means that an element can

have only one parent. This is represented by opening and closing a child element inside of the parent

element (e.g. <html><head></head></html>). Finally, as we will see in our next worksheet, the

concept of inheritance is applied when HTML documents are processed by web browsers.
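
The one-parent rule can be sketched in code. The example below, again using Python's standard HTMLParser, keeps a stack of the elements that are currently open; whatever is on top of the stack when a new element opens is that element's parent. The class name ParentFinder and the small recipe page are hypothetical and deliberately differ from Figure 3, and the sketch assumes well-nested input.

```python
from html.parser import HTMLParser


class ParentFinder(HTMLParser):
    """Record each element's parent by tracking the currently open elements."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements, outermost first
        self.parents = []   # (element, parent) pairs in document order

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        self.parents.append((tag, parent))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Assumes well-nested input: the closing tag matches the top of the stack.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


finder = ParentFinder()
finder.feed("<html><head><title>Recipe</title></head>"
            "<body><h1>Pancakes</h1><p>Mix and fry.</p></body></html>")
for element, parent in finder.parents:
    print(element, "is a child of", parent)
```

The root element is the only one whose parent is reported as None, which mirrors the idea of inclusiveness: everything else sits inside it.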

Using Figure 3 as a guide answer the following questions.

Key Questions

Question 14. What is the top element of the hierarchical document in Figure 3?


Question 15. What element is the parent of the element <li>?

Question 16. In the <div> elements, what attributes are defined?

Question 17. On line 6, what is the value of the attribute “id”?

Question 18. On line 5, what is the value of the element <div>?

HTML document structure

You may have already noticed that HTML documents are a bit odd. For example, while we say that

each element can have only one parent, you may have noticed that the <p> element occurs under

two parents (see lines 7 and 14 and find the parents of these two elements). The HTML standard

allows an element type to be repeated, but each individual occurrence still exists under exactly one parent element.

While not comprehensive, Table 3 provides us with a quick overview of how the HTML standard uses

each of the elements defined in Figure 3.

Table 3: Brief table of HTML elements

HTML element Function

html This is the root element of HTML documents. This element helps the web-

browser understand that the document follows the HTML standard

head The head element stands for “header” and contains information about the page

(e.g. style sheets, JavaScript,

document meta tags)

body The body element is designed to contain all of the HTML contents that will be

shown in your web-browser when the page is displayed

div The div element has little default functionality but is often used as a container for

other elements (more on this later!)

ul The ul (or Unordered List) element helps create a bullet list of content in your

HTML document. This element is used in conjunction with multiple li (list item)

elements to show individual bullets. Another element that behaves similarly to

the unordered list is ol (ordered list). When you use an ol element in place of a ul

element, the list is created with numbers instead of bullets. Note: Both <ul> and

<ol> elements represent individual items with repeating <li> elements.

li See the ul element for use. The li element is used to contain the text that is

shown in individual bullets

p The p (or paragraph) element is used to contain larger blocks of text that are

typically represented in paragraph form. While the <p> element has some

default behavior, it, like all other elements, can have its behavior modified through

the use of cascading style sheets.

The role of web-browsers

Although HTML documents are text-based, they create graphical user interface (GUI) web-sites when

they are displayed in a web-browser like Chrome, Firefox or Internet Explorer. In the software

development world the role that the web-browser performs is known as that of an “interpreter.” There

are interpreters for many different programming languages and document types. This semester we will be working with a

few different interpreters, each designed to work with a different type of document.
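
To make the idea of an interpreter concrete, here is a toy text renderer written with Python's standard HTMLParser. It is far simpler than a real web-browser and is only a sketch: the class name TextRenderer, the shopping-list snippet and the choice of "* " as a bullet are all illustrative assumptions. It reads the markup and decides how each piece of text should be displayed based on the element that contains it.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """A toy 'interpreter': turns a few HTML elements into plain-text lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._prefix = ""   # text placed before the next piece of content

    def handle_starttag(self, tag, attrs):
        # Only li gets special treatment; text in other elements is shown as-is.
        self._prefix = "* " if tag == "li" else ""

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append(self._prefix + text)
            self._prefix = ""


renderer = TextRenderer()
renderer.feed("<p>Shopping list</p><ul><li>Flour</li><li>Eggs</li></ul>")
print("\n".join(renderer.lines))
```

The same text file produces different displays depending on the rules the interpreter applies, which is why the choice of interpreter matters.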

Schemas, encoding and interpreters. . .

HTML is an excellent example of the new realm of digital documents that follow a metadata schema

and are written in a markup syntax closely related to the encoding system known as XML (Extensible Markup

Language). Digital documents are often text-based but behave differently when accessed through an

interpreter. As we conclude, let's make sure we are on the same page with these three concepts:


Metadata schema: A system that defines the intellectual structure of a document (e.g. title, author).

An example of a metadata schema is HTML. HTML is a metadata schema whose purpose is to

enable the publishing and viewing of documents using web protocols.

Encoding system: An encoding system is a system whose purpose is to define how a metadata

schema will be implemented. The HTML metadata schema is written in a markup syntax that closely

resembles the encoding system known as XML (eXtensible Markup Language). When an HTML document also follows XML's stricter rules, it can be

said to conform to the XHTML (eXtensible HTML) standard. XHTML follows some basic rules:

1. All XHTML elements must be properly nested (e.g. <ul><li></li></ul> NOT <ul><li></ul>)

2. All XHTML elements must have a closing tag (e.g. <html></html>)

3. All XHTML attributes must enclose values in quotation marks (e.g. href="http://umd.edu")

4. All XHTML element and attribute names must be in lower case
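
The first three rules can be checked mechanically with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for illustration. It does not check the lower-case rule, because a general XML parser accepts upper-case names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<ul><li></li></ul>"))             # properly nested
print(is_well_formed("<ul><li></ul>"))                  # li is never closed
print(is_well_formed('<a href="http://umd.edu"></a>'))  # quoted attribute value
print(is_well_formed("<a href=http://umd.edu></a>"))    # unquoted attribute value
```

A check like this only tells you whether the syntax rules were followed; it says nothing about whether the element names used are ones that HTML defines.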

Although XHTML has these additional encoding rules that do not apply to HTML, for the most part

HTML and XHTML share the same element definitions and uses. As we will see when we begin

working with other XML-based standards, HTML is one of a range of metadata schemas that are

important in library and information science. In addition, while XHTML has some interesting rules,

the first three of these requirements apply across all XML-based encoding systems (the lower-case rule is specific to XHTML) and are important to follow

to ensure that every interpreter that processes your document behaves predictably.

HTML and XHTML can be confusing to discuss in terms of metadata schemas and encoding systems

because the standard conflates the two concepts in the definition of the HTML standard. Because of

this, the HTML standard includes both metadata schema and encoding system requirements.

Interpreter: An interpreter is an application (e.g. a web-browser) that works with specific types of

files to run the instructions contained in those files. The instructions must correspond to both a

metadata schema (HTML) and an encoding system (HTML/XML). Web-browsers are unusual in that

they are capable of interpreting a number of metadata and encoding schemes. For example, web-

browsers can interpret JavaScript, Cascading Style Sheets, XML documents, HTML documents and

XSL documents (just to name a few!). In our next class we will explore the differences between these

three concepts in more detail and consider the implications that they have for information

organization.


Create your own web-page

Open a text editor and type the HTML code in Figure 3 (without the line numbers) into the text editor. Save the file with the

file extension .html. In your web-browser of choice open the file (e.g. in Chrome click on File >>

Open File and navigate to your file). Figure 4 contains a screen shot of the web-page rendered using

Chrome.

Figure 4: Rendered HTML page

Validate your HTML page

Now that you have some HTML (or have copied the HTML in this document), you can make sure that

your document conforms to the HTML standard.

Step 3: Validate your HTML document

a. In a web browser, go to http://validator.w3.org/check

b. Click on the “Validate by Direct Input” tab

c. Copy and paste the code from this worksheet (Figure 3 – remember to remove the

line numbers!) or from your own document into the direct input box and click Check

d. Did your document validate? What errors / warnings did you see?

When you validate your document using the W3C validator you are performing syntactic validation. This means that you are checking to see if your webpage follows the grammar of HTML

and that there are no missing syntactic structures. In addition to syntactic validation there is also

semantic validation, which makes sure that your documents (and computer code) have the intended

meaning (remember disambiguation?). We will spend more time on semantic validation later in the

semester.

Validation is an important part of digital documents as a validated document is ready to be used by

humans and computers. If you got lots of errors in your validator don’t worry about that too much for

right now. At the moment, the important idea is to understand the purpose of a validator (to check a

digital document’s conformance to a metadata standard and an encoding scheme). We will be working

with validators for different standards throughout the semester.

Case study wrap-up

Digital documents are different from print documents in that they are designed to be re-used and re-

purposed in any number of ways. There is a natural tension between the need to design a document

for a specific use and the need to be able to re-purpose that document in the future. In other words,

the decisions you make now have implications for uses you have not even thought about yet.

In exploring HTML we have considered how metadata and encoding standards form the foundation of

digital documents and have briefly explored the necessary elements of these documents. Our goal in

this class is to feel comfortable with the XHTML standard. In our next class we will consider the

differences between document representation and display in more detail by learning more about

cascading style sheets and considering the implications of different approaches to HTML document

creation.

Key Questions

Question 19. HTML stands for:

Question 20. Which organization is responsible for maintaining the HTML standard?

Question 21. Which HTML element is used to begin an un-ordered list?


Question 22. Which HTML element is used for each un-ordered list item?

Question 23. True or False: HTML elements have set behaviors that cannot be modified.

Question 24. Which HTML element is used to create an ordered list?

Question 25. Is XHTML an encoding system, a metadata schema or both?

Question 26. Which of the following is true:

a. An interpreter parses a document and executes or transforms it according to the

rules associated with that document type

b. An interpreter translates a document from one language to another

c. Web-browsers are only capable of interpreting HTML documents.

Question 27. True or False: HTML and XHTML differ in encoding rules

Question 28. True or False: HTML and XHTML differ in the use of core HTML elements

Question 29. If an HTML document is “Valid” according to a validator what does that mean?

In this class we touched on three aspects of an information system: system design, information

seeking behavior and document representation standards and technologies. We created a web page


represented in HTML that we viewed using a web-browser. In later classes we will expand on our

understanding of document representation standards and technologies, explore indexing and retrieval

systems and become more fluent with user interaction technologies.

