Chapter 11: Advanced Text Techniques: Web and Information

Networks: Two or more computers communicating

Networks are formed when distinct computers communicate via some mechanism. Rarely does the communication take the form of raw 0/1 voltages over a wire; that is too hard to make work over long distances.

More common is encoding the 0’s and 1’s as frequencies (sometimes in the sound range, sometimes not).

For example, a modem (modulator-demodulator) translates your computer’s 0’s and 1’s into sound frequencies that can travel over a telephone line, and a modem on the other side decodes them back into bits.
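To make the modem idea concrete, here is a minimal Python sketch of one simple scheme, frequency-shift keying, in which each bit becomes a short burst of one of two tones. The sample rate, the number of samples per bit, and the two tone frequencies are illustrative assumptions, not a description of any particular modem standard. The decoder estimates each burst’s frequency by counting how often the signal crosses zero.

```python
import math

SAMPLE_RATE = 8000                 # samples per second (assumed)
SAMPLES_PER_BIT = 80               # 8000 / 80 = 100 bits per second (assumed)
FREQ_FOR_BIT = {0: 1070, 1: 1270}  # tone used for each bit, in Hz (illustrative)

def bits_to_samples(bits):
    """Modulate: turn each bit into a short burst of its tone."""
    samples = []
    for bit in bits:
        freq = FREQ_FOR_BIT[bit]
        for n in range(SAMPLES_PER_BIT):
            t = n / SAMPLE_RATE
            samples.append(math.sin(2 * math.pi * freq * t))
    return samples

def count_zero_crossings(chunk):
    """Count how often consecutive samples change sign."""
    return sum(1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0))

def samples_to_bits(samples):
    """Demodulate: estimate each burst's frequency and pick the closer tone."""
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        chunk = samples[start:start + SAMPLES_PER_BIT]
        # A tone of f Hz crosses zero about 2 * f times per second.
        estimate = count_zero_crossings(chunk) * SAMPLE_RATE / (2 * SAMPLES_PER_BIT)
        bits.append(min(FREQ_FOR_BIT, key=lambda b: abs(FREQ_FOR_BIT[b] - estimate)))
    return bits
```

For example, `samples_to_bits(bits_to_samples([1, 0, 1]))` recovers `[1, 0, 1]`; a real modem also has to cope with noise and timing drift, which this sketch ignores.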

Networks, networks everywhere

If you’re driving a newer car, you probably have a network in there. There are lots of computers in your car (controlling air flow and gas flow, making the air bag work), and they communicate.

You can have a network in your own home, or even on an airplane. The computers can use radio signals to communicate (wireless), or you can string a cable between them.

Networks have layers

Networks have several layers to them.

At the bottom level is the physical substrate. What kind of signals are being passed, and over what medium?

Higher levels determine how data is encoded. Do we use sound frequencies to represent 0’s and 1’s, or radio waves?

Do we send a bit at a time? A byte at a time? Or in packets larger than that?

Still higher levels determine the protocol of communication. How do I address a particular computer I want to talk to? Or many computers?

How do I tell a computer that I want to talk to it? That I’m starting to send it data? What it’s supposed to do with it? When we’re done?
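One way to picture layering is as nested envelopes: each layer wraps the data from the layer above in its own header, and the receiver unwraps them in the reverse order. The sketch below is a toy model under that assumption; the layer names and header fields (a port, an IP address, a hardware address) are simplified stand-ins, not real packet formats.

```python
def wrap(layer, header, payload):
    """Put payload inside an 'envelope' labeled with this layer's header."""
    return {"layer": layer, "header": header, "payload": payload}

def send(message):
    # Each layer adds its own header around what the layer above produced.
    segment = wrap("transport", {"port": 80}, message)
    packet = wrap("network", {"to": "10.1.0.5"}, segment)
    frame = wrap("link", {"to_hardware": "00:11:22:33:44:55"}, packet)
    return frame

def receive(frame):
    # Peel the envelopes off, outermost (lowest layer) first.
    layers_seen = []
    unit = frame
    while isinstance(unit, dict):
        layers_seen.append(unit["layer"])
        unit = unit["payload"]
    return layers_seen, unit
```

Calling `receive(send("hello"))` gives back the layers in the order they were unwrapped (link, network, transport) along with the original message.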

Ethernet: A common mid-level protocol

Ethernet is a common mid-level protocol. It specifies some aspects of how data is encoded and how computers are addressed. For example, each computer on an Ethernet network has a deep-down, inside-the-computer hardware address that identifies it uniquely.

But Ethernet can work over a variety of physical substrates. For example, it runs over coaxial cable and over twisted-pair cable (where you hear terms like “10baseT”, which means 10-megabit Ethernet over twisted pair), and Ethernet-style addressing is also used on wireless (radio) networks.

Internet: A collection of networks

The Internet is a network of networks. If you put a device in your home so that your computers can talk to one another, perhaps a wireless base station or an Ethernet router, you have a network.

You can probably reach printers on your network, or copy files between computers.

If you then connect your network to the global Internet through an Internet Service Provider (ISP), your network becomes one more part of the whole Internet.

Internet is based on agreements on encodings

The Internet is built on a set of agreements about:

How computers will be addressed
- A set of four numbers, each one byte (0 to 255), separated by periods, e.g., 10.1.0.5. (This is IPv4; its successor, IPv6, uses much longer 128-bit addresses.)
- A way of associating domain names with these numbers, using domain name servers. For example, www.cnn.com is really a name that resolves to a set of four numbers.

How computers will communicate
- Data will be put into packets with various pieces in them (such as the sender’s and receiver’s addresses, along with a chunk of the data).
- Computers will format their data and talk to one another using TCP/IP.
- Packets will be routed around the network to find their destination.
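Since each of the four numbers in an IPv4 address is one byte, the whole address fits in 32 bits. This small Python sketch converts between the dotted form and a single integer. Python’s standard `ipaddress` module does the same job (the test below uses it as a cross-check); the hand-written version just makes the byte arithmetic visible.

```python
def ip_to_int(dotted):
    """Turn '10.1.0.5' into one 32-bit number (each part is one byte)."""
    parts = [int(part) for part in dotted.split(".")]
    if len(parts) != 4 or not all(0 <= p <= 255 for p in parts):
        raise ValueError("not an IPv4 address: " + dotted)
    number = 0
    for p in parts:
        number = number * 256 + p   # shift left by one byte, then add the next byte
    return number

def int_to_ip(number):
    """Turn a 32-bit number back into dotted form."""
    parts = []
    for _ in range(4):
        parts.append(str(number % 256))   # lowest byte first
        number //= 256
    return ".".join(reversed(parts))
```

For example, `ip_to_int("10.1.0.5")` is 10×256³ + 1×256² + 0×256 + 5.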

The Internet is not new

The Internet agreements date back to the late 1960s.

It was originally set up for military applications. One of the features of the Internet is that packets find their destination even if part of the network is destroyed, damaged, or subject to censorship.

The Internet originally had only a handful of computers (nodes) on it, but it has grown dramatically since then.

Protocols on the Internet

But all that just lets us pass data back and forth. What does the data say? What does the data do?

One of the first applications placed on top of the Internet was electronic mail. The mail protocols have evolved over time into their standard forms today.

The File Transfer Protocol (FTP) allows computers to move files between each other. It defines what one side says to the other when copying a file over (e.g., “STOR filename”) and how the file will be encoded.

Then there’s the Web

The Web dates back only to the end of the 1980s, and it predates the popular graphical browsers (like NCSA Mosaic, the first of these to be widely used, and later Netscape Navigator and Internet Explorer).

The Web is (again) a set of agreements, started by Tim Berners-Lee:

On how to refer to everything on the Internet: the URL (Uniform Resource Locator)

On how to request and deliver documents and other resources across the Internet: HTTP (HyperText Transfer Protocol)

On how those documents will be formatted, including how they refer to things all over the Internet: HTML (HyperText Markup Language)

HyperText: Non-linear text

Hypertext is a term coined by Ted Nelson in the 1960s. It refers to text that is non-linear, which the computer makes possible.

You’re familiar with this on the Web: read a little on a page, click, and continue reading on some other page anywhere on the Internet.

The point of the Web is Hypertext

Tim Berners-Lee wanted a way to create readable documents that could reference material anywhere on the Internet in a hypertext format.

There are technical flaws in what he did. For example, the phenomenon of “dead links” (links to pages that no longer exist) couldn’t happen in some other hypertext systems that came before the Web.

But it worked and has become a worldwide standard.

HyperText Transfer Protocol (HTTP)

HTTP defines a very simple protocol for how to exchange information between computers.

It defines the pieces of the communication: What resource do you want? Where is it? Okay, here’s the type of thing it is (JPEG, HTML, whatever), and here it is.

And it defines the words that the computers say to one another: simple words like “GET”, “PUT”, and “OK”.
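The “words” HTTP uses are plain text. The sketch below builds a minimal HTTP/1.1 GET request and picks a response apart into its status code, headers, and body, without touching the network. The sample response in the usage is made up for illustration; real responses carry many more headers.

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET " + path + " HTTP/1.1",   # request line: word, resource, version
        "Host: " + host,
        "Connection: close",
        "",                            # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)

def parse_response(raw):
    """Split a response into (status code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    head_lines = head.split("\r\n")
    version, status_code, reason = head_lines[0].split(" ", 2)   # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in head_lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return int(status_code), reason, headers, body
```

Given a response such as `"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>Hello</p>"`, `parse_response` reports status 200, reason `"OK"`, the type `text/html`, and the HTML body. The blank line (`\r\n\r\n`) separates headers from body in both directions.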

Uniform Resource Locators (URL)

URLs allow us to reference any material anywhere on the Internet. Strictly speaking, only material on a computer that offers it through a protocol a URL can name (such as HTTP or FTP).

Just putting your computer on the Internet does not mean that all of your files are accessible to everyone on the Internet.

URLs have four parts:
- The protocol to use to reach this resource,
- The domain name of the computer where the resource is,
- The path on the computer to the resource,
- And the name of the resource.

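http://www.cc.gatech.edu/index.html
- Protocol: http
- Domain name: www.cc.gatech.edu
- Path: none (the file is in the server’s top directory)
- Filename: index.html

ftp://cleon.cc.gatech.edu/pub/guzdial/papers/sigcse2003.pdf
- Protocol: ftp
- Domain name: cleon.cc.gatech.edu
- Path: /pub/guzdial/papers/
- Filename: sigcse2003.pdf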
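Python’s standard `urllib.parse` module can split a URL into these pieces. `urlsplit` returns the protocol (`scheme`), the domain name (`netloc`), and the full path; the `url_parts` helper below is a hypothetical convenience that further splits the path into directory and filename.

```python
import posixpath
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into protocol, domain name, path, and filename."""
    pieces = urlsplit(url)
    directory, filename = posixpath.split(pieces.path)
    return {
        "protocol": pieces.scheme,
        "domain": pieces.netloc,   # would also include a port such as :8080, if given
        "path": directory,
        "filename": filename,
    }
```

For `url_parts("ftp://cleon.cc.gatech.edu/pub/guzdial/papers/sigcse2003.pdf")` the path comes back as `/pub/guzdial/papers` and the filename as `sigcse2003.pdf`. For the first URL, which has no path, the helper reports the path as `/`, the server’s top directory.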

What if there is no path?

Web servers (programs that understand the HTTP protocol) typically have a special directory that they serve from. Files in that special directory can be referred to directly, without specifying a path.

Sub-directories within the server directory can be reached by giving a path, but the path always starts from the server directory, so not everything on your computer is accessible.
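Here is a minimal sketch of how a server might turn the path part of a URL into a file on disk. The server directory `/var/www` is a hypothetical choice, and real servers do considerably more checking; the point is that every path is resolved relative to the server directory, and anything that would climb out of it (for example with `..`) is refused.

```python
import posixpath

DOCUMENT_ROOT = "/var/www"   # hypothetical server directory

def url_path_to_file(url_path):
    """Map a URL path like '/papers/a.pdf' to a file under DOCUMENT_ROOT, or None."""
    # Join under the root, then collapse '.' and '..' components.
    full = posixpath.normpath(posixpath.join(DOCUMENT_ROOT, url_path.lstrip("/")))
    # Refuse anything that ends up outside the server directory.
    if full != DOCUMENT_ROOT and not full.startswith(DOCUMENT_ROOT + "/"):
        return None
    return full
```

When the result is the server directory itself (a URL with no path), a server would typically look for a default file such as `index.html` there.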

A browser is a client

Your Web browser is called a client: it accesses a Web server.

Programs like Internet Explorer, Firefox, or Safari understand a lot about Internet protocols. They know how to interpret HTML and display it graphically.

If the HTML references other resources, like JPEG pictures, the client fetches them and displays them where appropriate.

Your client knows the details of the HTTP (and maybe FTP, mailto, gopher…) protocols so that it can fetch the resources you ask for.

You don’t need a browser to use the Internet

Your mail program also understands some Internet protocols. JES even knows a little about one of the mail protocols, SMTP (Simple Mail Transfer Protocol), so that it can email homework to your instructor (if it’s set up).

Python (like other languages) has modules that allow you to use these protocols. In Python, we can read any URL as if it were a file.

Opening a URL and reading it

>>> import urllib
>>> connection = urllib.urlopen("http://www.cnn.com")
>>> weather = connection.read()
>>> connection.close()
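The session above uses JES’s Python 2 style `urllib.urlopen`. In Python 3 the same function lives in `urllib.request`; a minimal equivalent, assuming the page is UTF-8 text, looks like this. A `with` block closes the connection automatically.

```python
from urllib.request import urlopen

def read_url(url):
    """Fetch a URL and return its contents as text (assumes UTF-8)."""
    with urlopen(url) as connection:   # closed automatically at the end of the block
        data = connection.read()       # the raw bytes of the page
    return data.decode("utf-8", errors="replace")
```

Usage: `page = read_url("http://www.cnn.com")`. The same function also accepts `file:` URLs, which is handy for trying it out on a saved page.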

Let’s do something useful: Read from Facebook

Reading Facebook via Program

Here’s an example of how to read Facebook activities and interests, but it won’t work anymore:

1. Facebook changed its interface to “Timeline,” so the way we “scraped” the screen would have to change.

2. Facebook changed how it responds to “robots” (access from programs, not browsers), disallowing access to activities and interests.

def fbActivitiesInterests(name):
    import urllib  # Could go above, too
    interests = ""
    # Get the Facebook page for a name
    connection = urllib.urlopen("http://www.facebook.com/"+name+"?ref=pb")
    fb = connection.read()
    connection.close()
    # Let's find if "activities_and_interests" are there
    actStart = fb.find("activities_and_interests")
    pageLoc = actStart  # Starting place to look for pages

We’ll look for a Facebook entry with a given “name.” If it’s there, we’ll look to see if we can find “activities_and_interests” (a code in a URL) on that page.

    if actStart != -1:  # Did we find Activities and Interests?
        # Now, a loop to find all the activities and interests
        pageLoc = fb.find("http://www.facebook.com/pages/",pageLoc)
        while (pageLoc != -1):
            interestStart = pageLoc+30  # Skip the 30 characters of "http://www.facebook.com/pages/"
            interestEnd = fb.find("/",interestStart+1)
            interest = fb[interestStart:interestEnd]
            # Did we get to the last page, which is actually a footer?
            if not ("footer" in interest):
                interests = interests+interest+","
                pageLoc = fb.find("http://www.facebook.com/pages/",pageLoc+1)
            else:
                pageLoc = -1
    return interests[:-1]

Now use a WHILE loop to keep searching for more references to an interest with a Facebook “page.” Wherever we find one, grab the interest/activity name and add it to our list. Stop when we get to the “footer” page reference.
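The scanning technique itself still works on any text. Here is the same find-and-loop idea as an offline Python 3 sketch that takes the page contents as a string instead of fetching them, so it can be tried on a saved page. It uses `len()` instead of the hard-coded 30, returns a list instead of a comma-separated string, and stops at the footer entry as the original does. The sample HTML in the test is made up.

```python
PAGE_PREFIX = "http://www.facebook.com/pages/"

def extract_interests(html):
    """Return the page names that follow 'activities_and_interests' in html."""
    interests = []
    act_start = html.find("activities_and_interests")
    if act_start == -1:
        return interests
    loc = html.find(PAGE_PREFIX, act_start)
    while loc != -1:
        start = loc + len(PAGE_PREFIX)      # skip past the prefix
        end = html.find("/", start + 1)     # the name ends at the next slash
        if end == -1:
            break
        name = html[start:end]
        if "footer" in name:                # the last "page" link is the footer
            break
        interests.append(name)
        loc = html.find(PAGE_PREFIX, loc + 1)
    return interests
```

`",".join(extract_interests(saved_html))` produces the same kind of comma-separated string the original returned.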

When this was working…

>>> fbActivitiesInterests("mark.guzdial")
'Running,Running,Beer,Beer,Engineers-Bookstore,POLISH-CLUB-OF-ATLANTA,Napoleons,Sprig-Restaurant-and-Bar,The-Barrelhouse,National-Science-Foundation,Running-of-the-Ears,Cognitive-Science-Society,Ludoliteracy,Seymour-Papert,Georgia-Techs-Computational-Media-progam,ACM-Association-for-Computing-Machinery,Herbert-Simon,Georgia-Computes,Wait-Wait-Dont-Tell-Me,Sublime-Doughnuts,Terry-Pratchetts-Discworld,Dr-Horribles-Sing-Along-Blog,Firefly,Doug'
>>> fbActivitiesInterests("alison.clear")
'Grandchilden,Christmas,Travel,Scary-washing-machine,Engage-Learning,Tea-Total,ACM-SIGCSE,Jeremy-the-Sign-Language-Guy,Bob-Parker-Deserves-a-Knighthood-for-all-the-hard-work-he-has-done,Leo-the-Neo,LightORama,Desparate-Housewives'
>>> fbActivitiesInterests("alfredtwo")
'Association-for-Computing-Machinery,Firearms,FIRST-Robotics,Reading,Imagine-Cup-2010,Microsoft-TeachTec,Ron-Charette,Christinas-Country-Cafe,NH-TechFest,BookHampton,KinectEDucation,Microsoft-New-England-Research-and-Development,First-Robotics-For-Inspiration-and-Recognition-of-Science-and-Technology,Microsoft-Canada-Partners-in-Learning,ISTE-SIGCT,DreamSpark,EduConnect-Microsoft,Bishop-Guertin-High-School,Microsoft-Innovation-Center,Bytes-by-MSDN,Bill-Miller-FRC-Team,Cool-Cat-Teacher,Brooklyn-Technical-High-School,Bookey-Consulting-Inc,Photosynth,Smooth-Fusion,Microsoft-Technology-Center-Boston,Computer-Science-Teachers-Association,HP-CodeWars,ISTE'

WHILE loop

while sometest:
    # Do something here

Looks like an IF, but it keeps going until the test is false.

In other words, you can create infinite loops:

while 1 == 1:
    print "This is the song that never ends..."
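A while loop is useful when you can’t know in advance how many repetitions you need. In this small example (a hypothetical function, written in Python 3), the test is checked before every pass, and the loop ends because the body keeps shrinking n toward 1.

```python
def halvings(n):
    """Count how many times n can be halved (rounding down) before reaching 1."""
    count = 0
    while n > 1:          # checked before each pass; stops once n is 1 or less
        n = n // 2
        count = count + 1
    return count
```

If the body forgot to change n, the test would never become false, and we would have the song that never ends.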

Can we write any programs that query Facebook?

Yes, if they do only things that Facebook allows.

You can ask if someone is on Facebook.

Grabbing a Facebook page

>>> import urllib
>>> con = urllib.urlopen("http://www.facebook.com/mark.guzdial")
>>> fb = con.read()
>>> con.close()

>>> print fb
<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>Mark Guzdial</title><meta name="description" content="Facebook helps you connect and share with the people in your life." /><noscript><meta http-equiv="X-Frame-Options" content="deny" /></noscript><link id="OHOA2" type="text/css" rel="stylesheet" href="http://static.ak.fbcdn.net/rsrc.php/v2/y3/r/4C3Lvn3dMWv.css" /></head><body><a href="/home.php"><img src="http://static.ak.fbcdn.net/rsrc.php/v2/yA/r/-a7RweL8dOj.gif" width="76" height="20" alt="facebook" /></a><br /><strong>Mark Guzdial</strong><br /><strong>Mark Guzdial is on Facebook.</strong> To connect with Mark, sign up for Facebook today.<br /><br /><a href="/r.php?next=http%3A%2F%2Fm.facebook.com%2Fmark.guzdial">Sign Up</a> <a href="/login.php?next=http%3A%2F%2Fm.facebook.com%2Fmark.guzdial&amp;refsrc=http%3A%2F%2Fwww.facebook.com%2Fmark.guzdial">Log in</a><hr /><img src="http://profile.ak.fbcdn.net/hprofile-ak-snc4/161216_12806007_1498517380_n.jpg" width="200" height="188" /><br /><br /><strong>Mark Guzdial</strong> likes:<br /><br /><table><tr><th align="left"><b>Other</b></th></tr><tr><td><a href="/WaltDisneyWorld">Walt Disney World</a><br /><a href="/Amazon">Amazon.com</a><br /><a href="/DalaiLama">Dalai Lama</a><br /><a href="/SmallBusinessSaturday">Small Business Saturday</a><br /><a href="/georgehtakei">George Takei</a><br /></td></tr></table><br /><small><b>English (UK)</b>&nbsp;·&nbsp;<a href="/a/language.php?l=cy_GB&amp;gfid=AQCQNsyspUPrl-QE">Cymraeg</a>&nbsp;·&nbsp;<a href="/a/language.php?l=en_US&amp;gfid=AQD_Ymisv0ZHn_Gt">English (US)</a>&nbsp;·&nbsp;<a href="/language.php">More…</a></small><br /><small><a href="/login.php?refsrc=http%3A%2F%2Fwww.facebook.com%2Fmark.guzdial">Log in to Facebook</a>&nbsp;·&nbsp;Facebook ©2012</small><br /><input type="hidden" id="m_user_DEPRECATED" value="0" /><br /></body></html>

Is someone on Facebook?

def isOnFacebook(name):
    import urllib  # Could go above, too
    namewords = name.split()
    lastname = namewords[-1]
    combined = namewords[0]+"."+namewords[-1]
    # Get the Facebook page for name
    connection = urllib.urlopen("http://www.facebook.com/"+name)
    fb = connection.read()
    connection.close()
    checkOn = fb.find("is on Facebook")
    if checkOn != -1:
        print name,"is on Facebook at http://www.facebook.com/"+name
        return True
    # Get the Facebook page for combined name
    connection = urllib.urlopen("http://www.facebook.com/"+combined)
    fb = connection.read()
    connection.close()
    checkOn = fb.find("is on Facebook")
    if checkOn != -1:
        print name,"is on Facebook at http://www.facebook.com/"+combined
        return True
    # Get the Facebook page for just the last name
    connection = urllib.urlopen("http://www.facebook.com/"+lastname)
    fb = connection.read()
    connection.close()
    checkOn = fb.find("is on Facebook")
    if checkOn != -1:
        print name,"*may* be on Facebook at http://www.facebook.com/"+lastname
    return False

Is the name on? Is the name on in the combined first.last form?

If only the last name (surname) is on, it might be the person, or it might not.

Return True/False, so you can use the result in a program.

Running it

>>> isOnFacebook("Mark Guzdial")
Mark Guzdial is on Facebook at http://www.facebook.com/Mark.Guzdial
1
>>> isOnFacebook("Carole Moore")
Carole Moore *may* be on Facebook at http://www.facebook.com/Moore
0
>>> isOnFacebook("Barbara Ericson")
Barbara Ericson is on Facebook at http://www.facebook.com/Barbara.Ericson
1
>>> isOnFacebook("Erin Tone")
Erin Tone is on Facebook at http://www.facebook.com/Erin.Tone
1

How would you use this?

if isOnFacebook(somename):
    email.write("Please friend us on Facebook!")

(Here email is a hypothetical, already-open file holding the message you are composing.)

Tailoring your advertising to your customers. Deciding whether to use email or Facebook messages to contact someone.

Storing a file is different

It is possible to send information to a Web server. That’s how search functions, forms, etc. work.

But it’s more complicated than just reading, and it requires an accepting program on the Web server.
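For a sense of what sending involves, here is a minimal Python 3 sketch that prepares (but does not send) a form submission. The search URL and the field name `q` are hypothetical; a real form defines its own. Passing `data` to `urllib.request.Request` makes it a POST request, and `urlencode` produces the name=value text a form would send.

```python
from urllib.parse import urlencode
from urllib.request import Request

def make_search_request(search_text):
    """Build a POST request carrying one form field (hypothetical URL and field name)."""
    form_data = urlencode({"q": search_text}).encode("ascii")   # e.g. b"q=jython+media"
    return Request("http://www.example.com/search", data=form_data)
```

Passing the result to `urllib.request.urlopen` would actually send it, and it would only do something useful if a program on that server were set up to accept it.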

It isn’t hard to send information to an FTP server, though.

But first, let’s make our temperature-finding function useful by directly reading the Weather page…

FTP and HTTP Servers

FTP allows us to move files between computers on the Internet, including between our computer and the computer hosting our HTTP server.

Computers running HTTP servers often also run FTP servers to allow for manipulation of the Web files.

You can do this with specialized FTP clients, or with Python/Jython.

Uploading to an FTP server

>>> import ftplib
>>> connect = ftplib.FTP("cleon.cc.gatech.edu")
>>> connect.login("guzdial","mypassword")
'230 User guzdial logged in.'
>>> connect.storbinary("STOR barbara.jpg",open(getMediaPath("barbara.jpg"),"rb"))
'226 Transfer complete.'
>>> connect.storlines("STOR JESintro.txt",open("JESintro.txt"))
'226 Transfer complete.'
>>> connect.close()
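In Python 3, `ftplib` works the same way, but files for `storbinary` must be opened in binary mode, and `FTP` objects can be used in a `with` block so the connection is closed for you. This `upload_file` helper is a hypothetical sketch; its `ftp_class` parameter exists only so the function can be tried without a real server.

```python
import ftplib
import os

def upload_file(host, user, password, local_path, ftp_class=ftplib.FTP):
    """Upload local_path to the login directory on an FTP server; return the reply."""
    remote_name = os.path.basename(local_path)
    with ftp_class(host) as ftp:            # connection closed when the block ends
        ftp.login(user, password)
        with open(local_path, "rb") as f:   # binary mode for storbinary
            return ftp.storbinary("STOR " + remote_name, f)
```

Against a real server, `upload_file("cleon.cc.gatech.edu", "guzdial", "mypassword", "barbara.jpg")` would send the file and return the server’s reply, such as `'226 Transfer complete.'`.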

The Interactive Web

The first use of HTTP was just to send around static pages and images (and sounds and…).

Later extensions allowed users to provide input to the server (such as for doing searches). Originally, this was done only with “CGI” (Common Gateway Interface) scripts.

Later came servlets and applets and PHP and…

Interactive Web requires programs to generate HTML

Typically, a Web server will have some directory designated as “special.” Files referenced there aren’t just returned to the client.

Instead, the files are executed, and the result is returned to the client.

There’s even a mechanism where the client can provide input to the executed files, e.g., a search string.

Those special files generate HTML. The generated HTML might be based on up-to-the-minute information like stock quotes, temperature sensors, and database queries.

Thus, to have an interactive Web, we need to write programs that write HTML.
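Here is a minimal Python 3 sketch of such a program: it takes the client’s input as a query string (under CGI, for example, the server hands this text to the program in the `QUERY_STRING` environment variable) and writes an HTML page. The sensor names and readings are hypothetical stand-ins for up-to-the-minute data. `html.escape` keeps whatever the user typed from being treated as HTML.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical "up-to-the-minute" data; a real server might read a sensor or a database.
READINGS = {"lab": 21.5, "roof": 30.0}

def sensor_page(query_string):
    """Generate an HTML page for a query string like 'sensor=lab'."""
    params = parse_qs(query_string)
    name = params.get("sensor", [""])[0]
    if name in READINGS:
        body = "<p>" + escape(name) + ": " + str(READINGS[name]) + " degrees</p>"
    else:
        body = "<p>No sensor named " + escape(name) + "</p>"
    return ("<html><head><title>Sensor reading</title></head><body>"
            + body + "</body></html>")
```

The output is generated fresh on every request, so changing `READINGS` changes the page without anyone editing an HTML file.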

Using text to map between any media

We can map anything to text. We can map text back to anything. This allows us to do all kinds of transformations:
- Sounds into Excel, and back again.
- Sounds into pictures.
- Pictures and sounds into lists (formatted text), and back again.

Why care about media transformations?

Transformed digital media can be more easily transmitted. For example, binary files are often sent over email by first converting them to text.

We can encode additional information to check for and even correct errors in transmission.

It may allow us to use the media in new contexts, like storing it in databases.

Some transformations of media are made easier when the media are in new formats.
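Python’s standard library covers the first two ideas above: `base64` turns arbitrary bytes into plain ASCII text (a common way to send binary files through email), and `zlib.crc32` computes a checksum that the receiver can use to detect, though not correct, transmission errors. The text format used here, the encoded data, a space, then the checksum in hex, is an assumption made for this sketch.

```python
import base64
import zlib

def to_text(data):
    """Encode bytes as ASCII text followed by a CRC-32 checksum."""
    checksum = zlib.crc32(data) & 0xFFFFFFFF
    return base64.b64encode(data).decode("ascii") + " " + format(checksum, "08x")

def from_text(text):
    """Decode text made by to_text, raising ValueError if the data was damaged."""
    encoded, checksum_hex = text.split(" ")
    data = base64.b64decode(encoded)
    if zlib.crc32(data) & 0xFFFFFFFF != int(checksum_hex, 16):
        raise ValueError("checksum mismatch: data was corrupted in transit")
    return data
```

Correcting errors, not just detecting them, takes more redundancy than a single checksum provides.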

Mapping sound to text

Sound is simply a series of numbers (sample values).

Converting a sound to text just means writing out that long series of numbers.

We can store them to a file to manipulate them elsewhere.

Copying a sound to text

def soundToText(sound,filename):
    file = open(filename,"wt")
    for s in getSamples(sound):
        file.write(str(getSample(s))+"\n")
    file.close()
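Outside JES, the same idea works on any list of numbers. This Python 3 sketch writes one sample value per line using a `with` block, which closes the file even if an error occurs; the plain list `samples` stands in for a JES sound.

```python
def samples_to_text(samples, filename):
    """Write each sample value on its own line, like soundToText."""
    with open(filename, "w") as f:   # closed automatically at the end of the block
        for value in samples:
            f.write(str(value) + "\n")
```

The resulting file has exactly one number per line, which is the shape a spreadsheet expects when it opens a .txt file.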

What to do with sound as text

What this leaves us with is a long file, containing just numbers.

What knows how to deal with long lists of numbers? EXCEL!

We can simply open our text (.txt) file in Excel.

We can process the sound in Excel

We can graph the sound.

A signal view is simply the graph of the sample values!

We can add a column that does some modification to the original sound (use Fill Down to apply it to every row). We can increase the volume that way.
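The spreadsheet trick amounts to a formula such as =A1*2 filled down a new column. The Python 3 sketch below does the same to a list of samples. It also clamps each result to the range -32768 to 32767, assuming 16-bit samples, since a doubled sample can otherwise exceed what a 16-bit sound can store.

```python
def louder(samples, factor):
    """Multiply every sample by factor, clamped to the 16-bit range."""
    result = []
    for s in samples:
        value = int(s * factor)
        value = max(-32768, min(32767, value))   # keep within 16-bit limits
        result.append(value)
    return result
```

For example, `louder([1000, -20000, 20000], 2)` doubles the first sample but has to clip the other two.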

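Some versions of Excel may not work. Older versions, for example, limit a worksheet to 65,536 rows, fewer than the number of samples in three seconds of sound at 22,050 samples per second.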

Reading text back into a sound

After we process the sound (as text) in Excel, we can save it back to a sound. First, copy the column you want into a new worksheet.

Then, save the worksheet as a .txt file. Get the full pathname of the new .txt file to use in JES.

Issues in reading the text back into a sound

We can’t be sure how many numbers are in the file.

We can’t be sure that the numbers will all fit into the sound we’ve chosen to serve as our target.

What we want to do is: AS LONG AS we’re not out of numbers in the file, and AS LONG AS we still have room in the sound,
- Copy a number out of the file,
- And put it into a sample in the sound,
- Then go to the next number and the next sample.

Reading the text back as a sound

def textToSound(filename):
    # Set up the sound
    sound = makeSound(getMediaPath("sec3silence.wav"))
    soundIndex = 0  # the first sample is at index 0
    # Set up the file
    file = open(filename,"rt")
    contents = file.readlines()
    file.close()
    fileIndex = 0
    # Keep going until we run out of sound space or run out of file contents
    while (soundIndex < getLength(sound)) and (fileIndex < len(contents)):
        sample = float(contents[fileIndex])  # Get the file line
        setSampleValueAt(sound,soundIndex,sample)
        fileIndex = fileIndex + 1
        soundIndex = soundIndex + 1
    return sound
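Here is the same two-condition loop as a plain Python 3 sketch, with a list of zeros standing in for the silent target sound. It converts each line with `int(float(...))`, on the assumption that a spreadsheet may have written values such as 6757.5 with a decimal point.

```python
def text_to_samples(filename, capacity):
    """Read up to `capacity` numbers (one per line) into a list of samples."""
    samples = [0] * capacity          # stands in for the silent target sound
    with open(filename) as f:
        contents = f.readlines()
    sound_index = 0
    file_index = 0
    # Stop when either the "sound" is full or the file runs out of numbers.
    while sound_index < len(samples) and file_index < len(contents):
        samples[sound_index] = int(float(contents[file_index]))
        file_index = file_index + 1
        sound_index = sound_index + 1
    return samples
```

With a three-line file and room for five samples, the last two stay silent; with room for only two, the third number is never read.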

while (soundIndex < getLength(sound)) and (fileIndex < len(contents)):

Let’s explain this statement:

while – keeps executing the block until the logical expression is false.

(soundIndex < getLength(sound)) – while the index is not yet at the end of the sound, so there’s still room for more numbers.

and – both parts have to be true for the whole thing to be true.

(fileIndex < len(contents)) – while there are any numbers left in the file, i.e., the fileIndex is before the length of the contents of the file.

We could do pictures, but it’s more complicated

Pictures aren’t just a single number for each pixel.

To recreate a picture in text, we need to record, for each pixel:
- The X and Y positions
- The R, G, and B component values

That requires more structured text than simply a long line of numbers.

Let’s do that in just a few minutes.

Mapping from text to anything

Once we’ve converted to text (or numbers), we can do anything we want.

Like, mapping from sound to…pictures!

We simply decide on a representation: How do we map sample values to colors?

def soundToPicture(sound):
    picture = makePicture(getMediaPath("640x480.jpg"))
    soundIndex = 0
    for p in getPixels(picture):
        if soundIndex == getLength(sound):
            break
        sample = getSampleValueAt(sound,soundIndex)
        if sample > 1000:
            setColor(p,red)
        if sample < -1000:
            setColor(p,blue)
        if sample <= 1000 and sample >= -1000:
            setColor(p,green)
        soundIndex = soundIndex + 1
    return picture

Here’s one:

- Greater than 1000 is red

- Less than -1000 is blue

- Everything else is green
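The same mapping can be written without JES. In this Python 3 sketch a plain list stands in for the sound, RGB tuples stand in for colors, and `num_pixels` for the size of the picture. Using `elif` makes the three cases explicitly exclusive, and `break` stops early when the samples run out, just as in soundToPicture.

```python
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

def sample_to_color(sample, threshold=1000):
    """Map one sample value to a color using the thresholds above."""
    if sample > threshold:
        return RED
    elif sample < -threshold:
        return BLUE
    else:
        return GREEN

def sound_to_colors(samples, num_pixels):
    """Return one color per pixel, stopping early if the samples run out."""
    colors = []
    for i in range(num_pixels):
        if i == len(samples):   # out of samples before out of pixels
            break
        colors.append(sample_to_color(samples[i]))
    return colors
```

A sample of exactly 1000 or -1000 counts as “everything else” and becomes green.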

Break

break is yet another new statement. It literally means “Exit the current loop.”

It’s most often used in the block of an if: “If something extraordinary happens, leave the loop immediately.”

In this case, “If we run out of samples before we run out of pixels, STOP!”

Representing “This is a test”

Any visualization of sound is merely an encoding

Any visualization of any kind is merely an encoding

A line chart? A pie chart? A scatterplot? These are just lines and pixels set to correspond to some mapping of the data.

Sometimes data is lost: recall the mapping to grayscale.

Sometimes data is not lost, even if it looks like a dramatic change: recall creating a negative of an image, then taking the negative of the negative to get back to the original.
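A quick way to see the difference: applying a negative twice gives back the original color, but grayscale can send different colors to the same gray, after which there is no way to tell which color it came from. The sketch works on (r, g, b) tuples and uses a simple average for gray, which is one common choice among several.

```python
def negative(color):
    """Invert each component; applying this twice restores the original."""
    r, g, b = color
    return (255 - r, 255 - g, 255 - b)

def grayscale(color):
    """Replace each component with the (rounded-down) average: information is lost."""
    r, g, b = color
    gray = (r + g + b) // 3
    return (gray, gray, gray)
```

Pure red and pure blue both become the same gray (85, 85, 85), so grayscale cannot be undone; the negative of a negative always can.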

Lists can do anything!

Going from sound to lists is easy:

def soundToList(sound):
    list = []
    for s in getSamples(sound):
        list = list + [getSample(s)]
    return list

This really does work

>>> list = soundToList(sound)
>>> print list[0]
6757
>>> print list[1]
6852
>>> print list[0:100]
[6757, 6852, 6678, 6371, 6084, 5879, 6066, 6600, 7104, 7588, 7643, 7710, 7737, 7214, 7435, 7827, 7749, 6888, 5052, 2793, 406, -346, 80, 1356, 2347, 1609, 266, -1933, -3518, -4233, -5023, -5744, -7394, -9255, -10421, -10605, -9692, -8786, -8198, -8133, -8679, -9092, -9278, -9291, -9502, -9680, -9348, -8394, -6552, -4137, -1878, -101, 866, 1540, 2459, 3340, 4343, 4821, 4676, 4211, 3731, 4359, 5653, 7176, 8411, 8569, 8131, 7167, 6150, 5204, 3951, 2482, 818, -394, -901, -784, -541, -764, -1342, -2491, -3569, -4255, -4971, -5892, -7306, -8691, -9534, -9429, -8289, -6811, -5386, -4454, -4079, -3841, -3603, -3353, -3296, -3323, -3099, -2360]

Can we go from pictures into lists?

Of course! We just have to decide on a representation. We’ll make each element of the list a pixel-list, one per pixel.

The numbers in the pixel-list will represent the X and Y positions and the Red, Green, and Blue component values.

Pictures to Lists

def pictureToList(picture):
    list = []
    for p in getPixels(picture):
        list = list + [[getX(p),getY(p),getRed(p),getGreen(p),getBlue(p)]]
    return list

Why the double brackets? Because we’re putting a sub-list in the list, not just adding a component as we were with sound.
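For comparison, here is a Python 3 sketch of the same representation without JES, assuming a picture is stored as rows of (r, g, b) tuples with coordinates starting at 0. It uses `append`, which adds to the existing list, instead of `list = list + [...]`, which builds a whole new list on every pass and gets slow for large pictures. It walks the picture row by row, so the order of the pixel-lists may differ from the JES version.

```python
def picture_to_list(rows):
    """rows[y][x] is an (r, g, b) tuple; return [[x, y, r, g, b], ...]."""
    result = []
    for y, row in enumerate(rows):
        for x, (r, g, b) in enumerate(row):
            result.append([x, y, r, g, b])   # one sub-list per pixel
    return result
```

Each element is still a five-number pixel-list, so the double-bracket structure is the same as in pictureToList.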

Running pictureToList

>>> picture = makePicture(pickAFile())
>>> piclist = pictureToList(picture)
>>> print piclist[0:5]
[[1, 1, 168, 131, 105], [1, 2, 168, 131, 105], [1, 3, 169, 132, 106], [1, 4, 169, 132, 106], [1, 5, 170, 133, 107]]

Can we go back again? Sure!

def listToPicture(list):
    picture = makePicture(getMediaPath("640x480.jpg"))
    for p in list:
        if p[0] <= getWidth(picture) and p[1] <= getHeight(picture):
            setColor(getPixel(picture,p[0],p[1]),makeColor(p[2],p[3],p[4]))
    return picture

We need to make sure that the X and Y fit within our canvas, but other than that, it’s pretty simple code.
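The reverse direction as a Python 3 sketch, under the same assumptions as before (rows of (r, g, b) tuples, coordinates starting at 0). A blank canvas of a given size stands in for 640x480.jpg, and pixel-lists whose X or Y fall outside it are skipped.

```python
def list_to_picture(pixel_list, width, height, background=(0, 0, 0)):
    """Build rows of (r, g, b) tuples from [[x, y, r, g, b], ...]."""
    rows = [[background] * width for _ in range(height)]   # blank canvas
    for x, y, r, g, b in pixel_list:
        if 0 <= x < width and 0 <= y < height:   # skip pixels that don't fit
            rows[y][x] = (r, g, b)
    return rows
```

Because the numbers only have to be five per pixel-list, they can come from anywhere, not just from another picture.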

The numbers could have come from anywhere

The numbers in the list came from another picture, but we know that they could have come from anywhere!
- From multiple sounds, one for each of Red, Green, and Blue.
- From random numbers.
- From stock market data.
- From solar radiation.

All we’re doing is changing encodings

The basic information isn’t changing at all here.

What’s changing is our encoding. Different encodings afford us different capabilities. If we go to numbers, we can use Excel. If we go to lists, we can represent structure more easily.

Kurt Gödel

One of Time magazine’s 100 greatest thinkers of the 20th century.

Proved the “Incompleteness Theorem.”

By mapping mathematical statements to numbers (Gödel numbers), he was able to show that in any consistent formal system powerful enough to express ordinary arithmetic, there are true statements that cannot be proven within that system.

In this way, he showed that no such system of logic can prove all true statements.
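Gödel’s encoding was, at heart, a media transformation like the ones above. A classic scheme of the kind he used writes a sequence of symbol codes c1, c2, c3, … as the single number 2^c1 × 3^c2 × 5^c3 × …, and because every number factors into primes in only one way, the sequence can be recovered. The sketch below assumes the symbols have already been given positive integer codes; the symbol table itself is left out.

```python
def first_primes(count):
    """Return the first `count` prime numbers."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def godel_number(codes):
    """Encode positive integer codes [c1, c2, ...] as 2**c1 * 3**c2 * 5**c3 * ..."""
    number = 1
    for p, c in zip(first_primes(len(codes)), codes):
        number *= p ** c
    return number

def decode_godel(number):
    """Recover the codes by counting how many times each prime divides number."""
    codes = []
    primes = []
    candidate = 2
    while number > 1:
        if all(candidate % p != 0 for p in primes):   # candidate is the next prime
            primes.append(candidate)
            exponent = 0
            while number % candidate == 0:
                number //= candidate
                exponent += 1
            codes.append(exponent)
        candidate += 1
    return codes
```

For example, the codes [1, 2, 3] become 2 × 9 × 125 = 2250, and factoring 2250 gives them back.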

Hiding Text in a Picture

Steganography is hiding information in ways that can’t be easily detected.

One form of steganography is hiding text information inside a picture.

Our Algorithm for Hiding Text

We’ll draw our message in black pixels on a message picture.

We’ll hide our message in a picture of the same size.

First: Make sure that all red values are even.

Second: For every pixel where the message picture is black, add one to the red value at the corresponding x,y.

Function to encode the message

def encode(msgPic,original):
    # Assume msgPic and original have same dimensions
    # First, make all red values even
    for pxl in getPixels(original):
        # Using modulo operator to test oddness
        if (getRed(pxl) % 2) == 1:
            setRed(pxl, getRed(pxl) - 1)
    # Second, wherever there's black in msgPic,
    # make odd the red in the corresponding original pixel
    for x in range(0, getWidth(original)):
        for y in range(0, getHeight(original)):
            msgPxl = getPixel(msgPic,x,y)
            origPxl = getPixel(original,x,y)
            if (distance(getColor(msgPxl),black) < 100.0):
                # It's a message pixel! Make the red value odd.
                setRed(origPxl, getRed(origPxl)+1)
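Without JES, the same two steps can be sketched in Python 3 on pictures stored as rows of (r, g, b) tuples. This version builds a new picture rather than changing the original, and treats a message pixel as “black” when its color is within distance 100 of black, mirroring the test above. It uses ordinary straight-line distance in RGB space, which is assumed here to match what JES’s distance computes.

```python
import math

def is_near_black(color, limit=100.0):
    """True if color is within `limit` of black in RGB space."""
    r, g, b = color
    return math.sqrt(r * r + g * g + b * b) < limit

def encode_red(original, message):
    """Hide message (same size as original) in the red values; return a new picture."""
    encoded = []
    for orig_row, msg_row in zip(original, message):
        new_row = []
        for (r, g, b), msg_color in zip(orig_row, msg_row):
            r = r - (r % 2)              # first: make every red value even
            if is_near_black(msg_color):
                r = r + 1                # second: an odd red marks a message pixel
            new_row.append((r, g, b))
        encoded.append(new_row)
    return encoded
```

Since every red value is first made even (at most 254), adding one can never push it past 255.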

Doing the encoding

>>> beach = makePicture(getMediaPath("beach.jpg"))
>>> explore(beach)
>>> msg = makePicture(getMediaPath("msg.jpg"))
>>> encode(msg,beach)
>>> explore(beach)
>>> writePictureTo(beach,getMediaPath("beachHidden.png"))

[Figure: Original and Encoded beach pictures]

It’s really important to save the encoded picture as .PNG or .BMP, not JPEG. JPEG is lossy, so pixel color values might change, and with them the odd/even red values that carry the message. PNG and BMP are lossless formats.

Decoding: Getting the message back

Create a new “message” picture of the same size as the encoded image.

For each pixel, if the red value is odd, make the pixel in the message at the same x,y black.

def decode(encodedImg):
    # Takes in an encoded image. Returns the hidden message
    message = makeEmptyPicture(getWidth(encodedImg),getHeight(encodedImg))
    for x in range(0,getWidth(encodedImg)):
        for y in range(0,getHeight(encodedImg)):
            encPxl = getPixel(encodedImg,x,y)
            msgPxl = getPixel(message,x,y)
            if (getRed(encPxl) % 2) == 1:
                setColor(msgPxl,black)
    return message
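The matching decoder for pictures stored as rows of (r, g, b) tuples needs only one check per pixel. The white background is an assumption standing in for the empty picture that decode starts from.

```python
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)

def decode_red(encoded):
    """Return a message picture: black wherever the red value is odd, white elsewhere."""
    return [[BLACK if r % 2 == 1 else WHITE for (r, g, b) in row]
            for row in encoded]
```

Only the red values matter, so the green and blue components of the encoded picture are ignored.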