SIGCSE 2008 It Sounded Like a Good Idea at the Time… Manipulated by Strings Margaret Menzin Simmons College

Page 1: SIGCSE 2008 It Sounded Like a Good Idea at the Time… Manipulated by Strings

SIGCSE 2008 It Sounded Like a Good Idea at the Time…

Manipulated by Strings

Margaret Menzin

Simmons College

Page 2

A Data Structures Course

The assignment:

Read a file of names like President George Washington

Identify the titles from a list and strip them

Isolate the last name and invert it to Washington, George

Alphabetize the list
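As a point of reference, the assignment as first stated can be sketched in a few lines of Python. This is only an illustration of the naive approach, not code from the course: the title list, the function names, and the sample input are all assumptions made for the example.

```python
# Hypothetical, deliberately short title list; a real list would be far longer.
TITLES = {"president", "dr.", "mr.", "mrs.", "ms."}


def strip_leading_titles(name: str) -> str:
    """Drop known titles from the front of the name, one word at a time."""
    words = name.split()
    while len(words) > 1 and words[0].lower() in TITLES:
        words = words[1:]
    return " ".join(words)


def invert(name: str) -> str:
    """Split at the last blank and emit 'Last, First'."""
    first, sep, last = name.rpartition(" ")
    return f"{last}, {first}" if sep else name


def process(lines):
    """Strip titles, invert each name, and sort without regard to case."""
    cleaned = (strip_leading_titles(line.strip()) for line in lines if line.strip())
    return sorted((invert(n) for n in cleaned), key=str.casefold)


print(process(["President George Washington", "Dr. grace hopper", "Mr. John Adams"]))
# ['Adams, John', 'hopper, grace', 'Washington, George']
```

The sketch assumes every title sits at the front of the name and that the last name is always the final word.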

Page 3

Known issues for students to handle:

Equivalence of upper and lower cases for purposes of alphabetization

Generating a list of titles

Matching from the list

Isolating the last name by looking backwards from the end of the name for the last blank

Usual file handling

Use of a simple sort

Page 4

Some surprises:

Some titles are at the beginning, but also some are at the end

Titles must be stripped recursively:

Hon. Father Robert F. Drinan, S.J., L.L.D.

Rev. Dr. Martin Luther King, jr.

Augusta Ada Byron King, Lady Lovelace

Major General Stanley

Some titles occur in the middle: Bernard Cardinal Law

Some of these titles can also be first, middle and last names – a problem which is exacerbated when we add other languages

Jr, II, etc. must be handled
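One way to picture the repeated stripping is a loop that keeps removing a known title from either end until neither end matches. The sketch below is a minimal illustration under assumed, abbreviated title lists; the names `PREFIXES`, `SUFFIXES`, `normalize`, and `strip_titles` are hypothetical.

```python
# Hypothetical, abbreviated lists; matching ignores case, periods, and a trailing comma.
PREFIXES = {"hon", "father", "rev", "dr", "major", "general"}
SUFFIXES = {"sj", "lld", "jr", "sr", "ii", "iii"}


def normalize(word: str) -> str:
    """Reduce 'S.J.,' to 'sj' so it can be looked up in the lists."""
    return word.replace(".", "").rstrip(",").lower()


def strip_titles(name: str) -> str:
    """Remove titles from both ends until neither end is a known title."""
    words = name.split()
    changed = True
    while changed and len(words) > 1:
        changed = False
        if normalize(words[0]) in PREFIXES:
            words = words[1:]
            changed = True
        elif normalize(words[-1]) in SUFFIXES:
            words = words[:-1]
            changed = True
    return " ".join(words).rstrip(",")


print(strip_titles("Hon. Father Robert F. Drinan, S.J., L.L.D."))  # Robert F. Drinan
print(strip_titles("Rev. Dr. Martin Luther King, jr."))            # Martin Luther King
print(strip_titles("Augusta Ada Byron King, Lady Lovelace"))       # unchanged
```

The third call comes back unchanged because "Lady" is not at either end, and a list-based match has no way to tell a title from a first, middle, or last name that happens to be spelled the same way.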

Page 5

More surprises:

In alphabetizing, apostrophes and hyphens are ignored (O’Reilly and OReilly are equivalent)

We need to worry about alphabetical order using other alphabets

Alphabetize using first the Latin alphabet and then other alphabets in the order of their names in English (Cyrillic before Greek)
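These ordering rules can be expressed as a sort key. The sketch below is an assumption-laden illustration: it guesses each name's script from the Unicode character name of its first letter (via the standard `unicodedata` module), uses a hypothetical `SCRIPT_ORDER` list, and compares by code point within a script, which is not true dictionary order for any particular language.

```python
import unicodedata

# Assumed ranking: Latin first, then other scripts by their English names.
SCRIPT_ORDER = ["LATIN", "ARABIC", "CYRILLIC", "GREEK", "HEBREW"]


def script_of(name: str) -> str:
    """Guess the script from the Unicode name of the first letter."""
    for ch in name:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC CAPITAL LETTER CHE".
            return unicodedata.name(ch, "").split(" ")[0]
    return ""


def sort_key(name: str):
    """Rank by script, then compare with apostrophes, hyphens, and case ignored."""
    script = script_of(name)
    rank = SCRIPT_ORDER.index(script) if script in SCRIPT_ORDER else len(SCRIPT_ORDER)
    letters = "".join(ch for ch in name if ch not in "'’-")
    return (rank, letters.casefold())


names = ["Πλάτων", "O’Reilly", "Чехов", "OReilly", "Oakes"]
print(sorted(names, key=sort_key))
# ['Oakes', 'O’Reilly', 'OReilly', 'Чехов', 'Πλάτων']
```

O’Reilly and OReilly produce the same key, so Python's stable sort leaves them in their input order.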

Page 6

Simplification:

Ignore titles in the middle

Use an abbreviated list of titles

Ignore other alphabets

Page 7

Still more surprises – where does the last name of these people begin:

Leonardo da Vinci

Catherine de Medici

Ponce de Leon

Vasco da Gama

Jean de la Fontaine

Gabriel Garcia Marquez

Vicente Fox Quesada

Wernher von Braun

Elizabeth Alexandra May Windsor

Thomas a Beckett

Mao Tse-tung (Mao Zedong)

Page 8

The answers

Leonardo da Vinci

Catherine de Medici

Juan Ponce de Leon

Vasco da Gama

Jean de La Fontaine

Gabriel Garcia Marquez

Vicente Fox Quesada

Wernher von Braun

Elizabeth (Alexandra May Windsor) II

Thomas (a) Beckett

Mao Tse-tung

Page 9

The solution

Use the alphabetization standards of the American Library Association

According to the A.L.A. you alphabetize using the rules of the language the person wrote/spoke in

There are special rules for monarchs and saints – they are alphabetized by first name

Note: The A.L.A. keeps the name as first_name last_name and has another field to specify the character where the last name begins!
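The idea in the note, storing the name in its natural order together with the position where the filing name begins, might look like the sketch below. The record layout is hypothetical: the field names `full_name` and `surname_start` and the helper `record` are assumptions for illustration, not the A.L.A.'s actual format. The position is supplied by a person who knows the name; nothing here tries to compute it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    full_name: str      # stored in natural order, never inverted
    surname_start: int  # index of the character where the filing name begins

    def filing_form(self) -> str:
        """Build 'Family, Given' from the stored offset."""
        given = self.full_name[:self.surname_start].strip()
        family = self.full_name[self.surname_start:].strip()
        return f"{family}, {given}" if given else family


def record(full_name: str, filing_name: str) -> NameRecord:
    """Hypothetical helper: the cataloguer says which part files first."""
    return NameRecord(full_name, full_name.index(filing_name))


records = [
    record("George Washington", "Washington"),
    record("Gabriel Garcia Marquez", "Garcia Marquez"),
    record("Mao Tse-tung", "Mao Tse-tung"),  # family name already comes first
]

for r in sorted(records, key=lambda r: r.filing_form().casefold()):
    print(r.filing_form())
# Garcia Marquez, Gabriel
# Mao Tse-tung
# Washington, George
```

Keeping the offset as data moves the language-specific judgment out of the string-handling code, which is exactly the part the last-blank rule gets wrong.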

Page 10

Conclusion

Internationalization is much harder than it looks!

p.s. The British use different rules for alphabetization than the U.S. does; surely other countries use other rules.