Telecooperation
Technische Universität Darmstadt
Copyrighted material; for TUD student use only
E-Mail Q&A
Telecooperation Group, TU Darmstadt
© Prof. Dr. M. Mühlhäuser, Telekooperation
Interoperability
• No need to implement everything from RFCs 2045-2047
– Way too much work
– Correctly implemented, you would out-standard most common e-mail clients
• Your implementation should have this functionality
– 7Bit encoding
– Quoted-printable & Base64 encoding with all charsets Java can handle (i.e. every charsetName that does not cause an UnsupportedEncodingException)
– Multipart messages are recognized and decoded correctly
– Robustness: Do not choke on unrecognized headers
• Programs will be tested with public test cases + secret ones
– Secret test cases only use above mentioned functionality, too
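The charset requirement above can be checked at runtime. The following is a minimal sketch, not part of the assignment material: the class name CharsetCheck and the method isSupported are hypothetical. It relies only on the fact that the String(byte[], String) constructor throws UnsupportedEncodingException for a charset name Java cannot handle.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper: reports whether Java can decode the given charset.
class CharsetCheck {
    static boolean isSupported(String charsetName) {
        try {
            // Decoding an empty byte array is enough to trigger the lookup.
            new String(new byte[0], charsetName);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSupported("us-ascii"));
        System.out.println(isSupported("no-such-charset-xyz"));
    }
}
```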
Headers
• Multiline-Headers
– Line continuations start with a “folding whitespace”, which may be a space or a tab (\t)
• Ignore every header you do not know
– If you want, you can also display additional headers like BCC – but required are only those mentioned in milestone 3.1
• Case-sensitivity
– Header names are always case-insensitive
• cf. RFC 2822, section 1.2.2: “Characters will be specified […] by a case-insensitive literal value enclosed in quotation marks”
– Header values used in the assignment are usually case-insensitive, e.g. Content-Transfer-Encoding: Base64 and base64 are both possible
• Exceptions: the multipart boundary; all header values displayed to the user
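The header rules above (folded lines, unknown headers, case-insensitive names) can be sketched as follows. This is only one possible approach: the class HeaderParser is hypothetical, folded lines are joined with a single space as a simplification, and a header that occurs twice simply overwrites the earlier value.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: parses a raw header block into name/value pairs.
class HeaderParser {
    static Map<String, String> parse(String headerBlock) {
        // Case-insensitive keys: "subject" and "Subject" find the same entry.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String currentName = null;
        for (String line : headerBlock.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header section
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && currentName != null) {
                // Folding whitespace: this line continues the previous header.
                headers.put(currentName, headers.get(currentName) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                currentName = null; // malformed line: skip it instead of failing
                continue;
            }
            currentName = line.substring(0, colon).trim();
            headers.put(currentName, line.substring(colon + 1).trim());
        }
        return headers;
    }
}
```

Unknown headers end up in the map like all others and are simply never looked up. Values keep their original case, so a value such as the transfer encoding is best compared with equalsIgnoreCase.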
Date
• Look into the documentation of SimpleDateFormat
– No need to parse each item yourself; it even recognizes “GMT” and “UTC” as timezones
– Construct the parser with Locale.US so that it can parse English month names like “May”
• Output via DateFormat.getDateTimeInstance()
• Timezone
– Setting it via SimpleDateFormat or Calendar#setTimeZone is preferred to manual time manipulation
– Reason: DateFormat may be configured to display the timezone
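The date handling described above might look like the following sketch. The class MailDate, the pattern string "EEE, d MMM yyyy HH:mm:ss Z" and the sample date are assumptions made for illustration; the headers in the test cases may need a more tolerant pattern.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for parsing and displaying a Date header.
class MailDate {
    static Date parse(String headerValue) {
        // Locale.US lets the parser understand "Mon" and "May".
        SimpleDateFormat parser =
                new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss Z", Locale.US);
        try {
            return parser.parse(headerValue);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + headerValue, e);
        }
    }

    static String display(Date date, TimeZone zone) {
        DateFormat out = DateFormat.getDateTimeInstance(DateFormat.MEDIUM, DateFormat.LONG);
        // Set the zone on the formatter instead of shifting the time by hand.
        out.setTimeZone(zone);
        return out.format(date);
    }

    public static void main(String[] args) {
        Date d = parse("Mon, 12 May 2008 14:30:00 +0200");
        System.out.println(display(d, TimeZone.getTimeZone("Europe/Berlin")));
    }
}
```

The exact output text depends on the default locale and the Java version, so it is not shown here.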
Attachments
• Base64 encoded lines are always 76 characters wide – only exception is the last line
• If numberofchars % 4 != 0, you may just throw an exception and terminate
• Do not use javax.mail.internet.MimeUtility or similar additional libraries for decoding
• Use the Content-Disposition header to suggest a name for saving
• Attachments that are not of type text/… don’t have and don’t need a charset
– Just treat as stream of bytes/byte array
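Suggesting a file name from the Content-Disposition header could be sketched like this. The class AttachmentName, its regular expression and the fallback parameter are assumptions for illustration; the sketch handles only the simple quoted and unquoted filename forms and strips any directory part so that a suggested name cannot point outside the chosen folder.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts a file name suggestion from Content-Disposition.
class AttachmentName {
    // Matches filename="some name.txt" or filename=name.txt, case-insensitively.
    private static final Pattern FILENAME = Pattern.compile(
            "filename\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))", Pattern.CASE_INSENSITIVE);

    static String suggest(String contentDisposition, String fallback) {
        Matcher m = FILENAME.matcher(contentDisposition);
        if (!m.find()) {
            return fallback; // header has no filename parameter
        }
        String name = m.group(1) != null ? m.group(1) : m.group(2);
        // Keep only the last path component.
        int slash = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        name = name.substring(slash + 1);
        return name.isEmpty() ? fallback : name;
    }
}
```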
Base64-Example
• Take a group of 4 characters: S W 4 g
• Decode according to RFC
– S = 0x12; W = 0x16; 4 = 0x38; g = 0x20
– Decoding may be done in groups: A-Z = char - ‘A’; a-z = char - ‘a’ + 26; 0-9 = char - ‘0’ + 26*2; +, / and = must be treated separately
• Combine to 24 bit number, shift according to index (big endian)
– 0x12 << 18 | 0x16 << 12 | 0x38 << 6 | 0x20 << 0 = 0x496e20
• Shift number back in 8 bit blocks (also big endian)
– Byte 0 = 0x496e20 >> 16 & 0xff = 0x49
– Byte 1 = 0x496e20 >> 8 & 0xff = 0x6e
– Byte 2 = 0x496e20 >> 0 & 0xff = 0x20
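The worked example above translates into Java roughly as follows. This is a sketch under simplifying assumptions, with a hypothetical class Base64Sketch: the input must already be free of line breaks, and padding characters are only expected at the very end.

```java
// Hypothetical sketch of the decoding steps from the example.
class Base64Sketch {
    // Maps one Base64 character to its 6-bit value.
    static int valueOf(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 26 * 2;
        if (c == '+') return 62;
        if (c == '/') return 63;
        throw new IllegalArgumentException("Not a Base64 character: " + c);
    }

    static byte[] decode(String encoded) {
        if (encoded.length() % 4 != 0) {
            throw new IllegalArgumentException("Length must be a multiple of 4");
        }
        // Each trailing '=' means one byte less in the last group.
        int padding = encoded.endsWith("==") ? 2 : encoded.endsWith("=") ? 1 : 0;
        byte[] out = new byte[encoded.length() / 4 * 3 - padding];
        int pos = 0;
        for (int i = 0; i < encoded.length(); i += 4) {
            int bits = 0;
            for (int j = 0; j < 4; j++) {
                char c = encoded.charAt(i + j);
                // Combine four 6-bit values into one 24-bit number (big endian).
                bits = bits << 6 | (c == '=' ? 0 : valueOf(c));
            }
            // Split the 24-bit number into up to three bytes (big endian).
            for (int shift = 16; shift >= 0 && pos < out.length; shift -= 8) {
                out[pos++] = (byte) (bits >> shift & 0xff);
            }
        }
        return out;
    }
}
```

Calling decode("SW4g") yields the bytes 0x49, 0x6e, 0x20 from the example, which is the ASCII text "In " (with a trailing space).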
Decoding
• Your own input stream
– Elegant way of decoding Base64 and Quoted-Printable data (you can do it differently, this is only a suggestion)
1. Extend java.io.InputStream
2. Take a character array of undecoded data as parameter
3. Override read()
– Decode the character data when read() is called
– Return -1 if the end of the data is reached
4. Let the InputStreamReader deal with the nasty problem of decoding charsets
• Sample application has only 50 LoC for decoding quoted-printable, 100 LoC for Base64
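The four steps above can be sketched for quoted-printable as follows. This is not the sample application, only an illustration: the class QuotedPrintableInputStream and its decode helper are hypothetical, and malformed "=" escapes are passed through unchanged as a robustness choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical stream that decodes quoted-printable data byte by byte.
class QuotedPrintableInputStream extends InputStream {
    private final char[] data;
    private int pos = 0;

    QuotedPrintableInputStream(char[] undecoded) {
        this.data = undecoded;
    }

    @Override
    public int read() {
        while (pos < data.length) {
            char c = data[pos++];
            if (c != '=') {
                return c & 0xff; // literal character
            }
            if (pos < data.length && (data[pos] == '\r' || data[pos] == '\n')) {
                // Soft line break "=CRLF" (or "=LF"): produces no output.
                if (data[pos] == '\r') pos++;
                if (pos < data.length && data[pos] == '\n') pos++;
                continue;
            }
            if (pos + 1 < data.length) {
                int hi = Character.digit(data[pos], 16);
                int lo = Character.digit(data[pos + 1], 16);
                if (hi >= 0 && lo >= 0) {
                    pos += 2;
                    return hi << 4 | lo; // "=XX" escape
                }
            }
            return '='; // malformed escape: pass the '=' through
        }
        return -1; // end of data
    }

    // Lets InputStreamReader turn the decoded bytes into characters.
    static String decode(String quotedPrintable, String charsetName) {
        try (Reader reader = new InputStreamReader(
                new QuotedPrintableInputStream(quotedPrintable.toCharArray()), charsetName)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the stream only produces bytes, the same class works for every charset: the charset name from the Content-Type header is handed to the InputStreamReader unchanged.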
Regular Expressions
• Regular expressions are a nice way for filtering out substrings
• A bit like file name patterns (*, ?), but more powerful
– Letters, Numbers remain the same
– Punctuation characters usually have a special meaning; to match them literally, escape them with a \
• to use the character [, use \[
• Attention: you need to escape the backslash in Java strings, so the expression \[ is written as "\\["
– Alternatives: use []
• [abc] matches a or b or c
• [A-Z] matches A or B or … or Z
• Negation: [^abc] matches everything but a or b or c
– Wildcard . matches any single character
– Repetition
• * means “the previous element zero or more times”
• + means “the previous element one or more times”
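The rules above can be tried out with a few one-line checks. The class RegexBasics and its fullMatch helper are hypothetical; the helper wraps Pattern.matches, which tests whether the whole input matches the expression.

```java
import java.util.regex.Pattern;

// Hypothetical playground for the regular expression basics.
class RegexBasics {
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\[", "["));     // true: escaped punctuation
        System.out.println(fullMatch("[abc]", "b"));   // true: alternative
        System.out.println(fullMatch("[A-Z]", "q"));   // false: lower case is outside the range
        System.out.println(fullMatch("[^abc]", "d"));  // true: negation
        System.out.println(fullMatch("a.c", "abc"));   // true: wildcard
        System.out.println(fullMatch("ab*", "a"));     // true: zero or more b
        System.out.println(fullMatch("ab+", "a"));     // false: at least one b required
    }
}
```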
Regular Expressions with Java
• Part of java.util.regex
• First, compile the pattern to search:
– Pattern p = Pattern.compile("charset=[^ ]*")
– The compile method has a variant that takes flags; use Pattern.CASE_INSENSITIVE for case-insensitivity
• Next, make a Matcher for a String out of it
– Matcher m = p.matcher("Content-Type: text/plain; charset=\"us-ascii\"")
• Be sure to call the Matcher’s find method
– m.find()
• m.group(0) now contains everything that matches
– charset="us-ascii"
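Put together, the calls above form the following small program. The class CharsetFinder and its method are hypothetical wrappers; the pattern and the sample header come from the slide, with the case-insensitivity flag added.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the compile / matcher / find / group sequence.
class CharsetFinder {
    static String findCharsetParameter(String header) {
        Pattern p = Pattern.compile("charset=[^ ]*", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(header);
        // group(0) is only valid after a successful find().
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        // prints charset="us-ascii"
        System.out.println(findCharsetParameter(
                "Content-Type: text/plain; charset=\"us-ascii\""));
    }
}
```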
Grouping
• You need the thing after “charset=”
– Solution 1: parse for yourself
– Solution 2: add groups to the expression
• Groups are signified by () and counted from 1
– Pattern p = Pattern.compile("charset=([^ ]*)")
• After matching, group(1) contains "\"us-ascii\"" (the value, including its quotation marks)
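A sketch of the grouping approach follows. The class CharsetValue is hypothetical, and stripping the surrounding quotation marks is an extra step that is not on the slide; it is included because a quoted value cannot be passed to Java as a charset name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the charset value via a capturing group.
class CharsetValue {
    static String extract(String header) {
        Pattern p = Pattern.compile("charset=([^ ]*)", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(header);
        if (!m.find()) {
            return null; // no charset parameter present
        }
        String value = m.group(1); // e.g. "us-ascii" including the quotation marks
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return value;
    }
}
```

Note that the pattern stops only at a space, so in a header like charset=utf-8; format=flowed the group would also capture the trailing semicolon. Excluding ; from the character class is one way to handle that.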
Debugging
• Mail clients should be able to connect to the server and fetch the mail
• Always helpful: try to connect to the POP server via telnet and issue POP commands manually
– For closer examination, you may unzip the JAR file and have a look at “mailbox.xml”