RSCTC 2008 © 2008 ZL Technologies, Inc. Email Archiving Arvind Srinivasan Gaurav Baone

Page 1: Email Archiving

RSCTC 2008 © 2008 ZL Technologies, Inc.

Email Archiving

Arvind Srinivasan

Gaurav Baone

Page 2: Email Archiving

Imagine this is what happens to your business records at the end of every month …

Page 3: Email Archiving

If this looks absurd …

That’s exactly what we do to email!

Regulators now treat email like hard-copy records

Practically every major transaction, project, and contract is recorded in email

– SEC 17a-4
– NASD 3010, 3110
– HIPAA
– FDA 21 CFR 11
– DoD 5015.2
– Sarbanes-Oxley

Non-compliance fines and legal liabilities are rising ...

And the courts agree (FRCP, Dec 2006)

Page 4: Email Archiving

Just How Much Scalability Does Archiving Require?

Assume:

– 25,000 employees averaging 70 mails/day
– 7 years retention

Result: 4.47 billion emails for the archive system to index and search

versus

4.28 billion web pages indexed by Google (source: Google press release, Feb 17, 2004)

Functionality needs to scale to these volumes
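The 4.47 billion figure can be reproduced with simple arithmetic. The sketch below assumes a 365-day year with mail flowing every calendar day; the slide does not state this, but it is the assumption that matches the quoted number. The function and constant names are illustrative only.

```python
# Back-of-the-envelope archive volume, using the slide's stated assumptions.
EMPLOYEES = 25_000
MAILS_PER_DAY = 70
RETENTION_YEARS = 7
DAYS_PER_YEAR = 365  # assumption: mail flows every calendar day


def archive_volume(employees: int, mails_per_day: int, years: int,
                   days_per_year: int = DAYS_PER_YEAR) -> int:
    """Total number of emails held over the whole retention period."""
    return employees * mails_per_day * days_per_year * years


total = archive_volume(EMPLOYEES, MAILS_PER_DAY, RETENTION_YEARS)
print(f"{total:,} emails (~{total / 1e9:.2f} billion)")
```

Counting only working days would give a noticeably smaller total, so the comparison with the web index depends on this assumption.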

Page 5: Email Archiving

ZL Technologies, Inc.

CONFIDENTIAL

ZLTI Unified Archival

RSCTC 2008

Outline

– Email Capture Methods
– Business Drivers
– Archive Functionality
– Retention & Deletion
– Surveillance & Compliance
– E-Discovery
– Conclusion

Page 6: Email Archiving

Email Capture Methods

Active Capture Methods – PRO-ACTIVE Archiving

– Journaling
– Mailbox crawling
– SMTP Gateway Capture

Historical Capture Methods – REACTIVE Archiving

– Restore from backup tapes
– Crawl for PST / NSF files from desktops
– Forensic captures

Page 7: Email Archiving

Journaling – 100% Capture

Page 8: Email Archiving

Mailbox Crawling – Policy Based

Page 9: Email Archiving

Reactive Archiving

Page 10: Email Archiving

Not Just Email

Page 11: Email Archiving

Primary Business Drivers - Regulations and Laws

Investment Advisors Act

Canada PIPEDA

Gramm-Leach-Bliley Act

NASD 3010

NASD 3011

HIPAA

SEC 17a-4

Sarbanes-Oxley Act

CA SB1386

Mutual Funds Rule 38a-1

Hedge Funds Rule 203(b)

UK Freedom of Information Act

US Freedom of Information Act

Japan Personal Information Protection Act

Florida Sunshine Law

Basel II

FRCP

DoD 5015.2

Page 12: Email Archiving

Functional Requirements

– Retention
– Surveillance and Compliance
– E-Discovery

Common Theme - Classification

Page 13: Email Archiving

Retention & Deletion

Conflicting Requirements: laws and regulations (retain for “x” years) vs. company liability/risk and cost

Real-time Categorization of Mail

– Sender/recipients
– Content (subject, body, attachment)
– User input (which folder it was found in, manual tagging)

Retention Periods and Policies

Regulation | Type of Record | Retention Period
Age Discrimination in Employment Act | Hiring documents | One year from date of decision
Fair Labor Standards | Payroll, sales and personal records | Three years
Rehabilitation Act | Handicap discrimination records | Three years
Civil Rights Act | Records | One year
Occupational Safety and Health Act | Health records | 30 years
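A retention table like the one above is typically applied as a lookup from classification to expiry date. The following is a minimal sketch, assuming that when several regulations apply to one record the longest period wins. That rule, and the names `RETENTION_YEARS` and `retention_expiry`, are illustrative assumptions and not taken from the slides.

```python
from datetime import date

# Retention periods in years, keyed by regulation, copied from the table above.
RETENTION_YEARS = {
    "Age Discrimination in Employment Act": 1,
    "Fair Labor Standards": 3,
    "Rehabilitation Act": 3,
    "Civil Rights Act": 1,
    "Occupational Safety and Health Act": 30,
}


def retention_expiry(regulations: list[str], start: date) -> date:
    """Earliest date a record may be deleted.

    `start` is the triggering date (for example the date of a hiring decision).
    Assumption: the longest applicable retention period governs.
    """
    years = max(RETENTION_YEARS[r] for r in regulations)
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # start was Feb 29 and the target year is not a leap year
        return start.replace(year=start.year + years, day=28)


print(retention_expiry(["Fair Labor Standards",
                        "Occupational Safety and Health Act"],
                       date(2008, 10, 23)))
```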

Page 14: Email Archiving

Retention & Deletion (cont’d)

– "A priori" and "a posteriori" based retention.
– Event driven – deletion of mail from user folder, reclassification of mail by end user.
– Legal hold – court orders to retain evidence relating to certain subject matters.
– Single instance storage
  – Same email in multiple mailboxes
  – Same attachment in multiple emails
  – Significant storage savings
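Single instance storage can be illustrated with content addressing: each distinct message or attachment is stored once under a digest of its bytes, and mailboxes keep only references. This is a minimal in-memory sketch, not a description of any particular product; the class and method names are hypothetical, and SHA-256 is an assumed choice of digest.

```python
import hashlib


class SingleInstanceStore:
    """Stores each distinct blob once; mailboxes hold only digests."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}          # digest -> content, stored once
        self.mailboxes: dict[str, list[str]] = {}  # mailbox -> list of digests

    def archive(self, mailbox: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)     # no-op if already stored
        self.mailboxes.setdefault(mailbox, []).append(digest)
        return digest

    def stored_bytes(self) -> int:
        """Bytes actually kept, after de-duplication."""
        return sum(len(blob) for blob in self.blobs.values())


store = SingleInstanceStore()
memo = b"Quarterly results attached."
for mailbox in ("alice", "bob", "carol"):
    store.archive(mailbox, memo)
print(len(store.blobs), store.stored_bytes())
```

The same message delivered to three mailboxes occupies the space of one copy, which is where the storage savings come from.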

Page 15: Email Archiving

Surveillance

Conflicting Requirements: regulations require review of documents vs. the effort spent reviewing the documents

Examples of Compliance Categories

Category | Content | Action
Adult | Offensive language | Post-Review
Confidential | SSNs, bank account numbers | Pre-Review to prevent confidential information from going out
Legal Issues | Words like attorney, charge*; phrases like breach* and agreement within 6 words | Post/Pre Review
Compliance Hype | Stocks and sell within 3 words of each other | Pre-Review in Financial Industries

Real-time Flagging of Mail

– Lexical based – key words, word associations, wildcards
– Policy based – e.g. mail from WallStreetJournal.com is a newsletter
– Custom code – detect vacation responses, read receipts, DSNs
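The lexical and policy-based rules above can be sketched as small deterministic checks. The example below is an illustration under stated assumptions: "within 3 words" is read as a position difference of at most 3, the SSN pattern is the common NNN-NN-NNNN form, and the policy rule is implemented as a check on the sender's domain. The function names are hypothetical.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # assumed SSN format: NNN-NN-NNNN


def tokens(text: str) -> list[str]:
    """Lower-cased word tokens of the message body."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(words: list[str], a: str, b: str, distance: int) -> bool:
    """True if some occurrence of `a` and `b` are at most `distance` words apart."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


def flag(sender: str, body: str) -> list[str]:
    """Return the compliance categories that a message falls into."""
    # Policy based: mail from this domain is treated as a newsletter.
    if sender.lower().endswith("@wallstreetjournal.com"):
        return ["Newsletter"]
    categories = []
    if SSN.search(body):                                # lexical: pattern match
        categories.append("Confidential")
    if within(tokens(body), "stocks", "sell", 3):       # lexical: word association
        categories.append("Compliance Hype")
    return categories


print(flag("broker@example.com", "You should sell all your stocks now"))
```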

Page 16: Email Archiving

Surveillance (cont’d)

– Real-time flagging is a categorization problem.
– Current systems suffer from a lot of false positives.
– Transparent and deterministic rules are preferred over black boxes.
– Disclaimers (internal and external) tend to get flagged, as they contain the very terms that we try to flag.
– Use reviewer feedback to adapt the rules.
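One simple way to reduce the disclaimer false positives mentioned above is to remove known disclaimer text before the rules run. The sketch below assumes the disclaimers are known verbatim in advance; the disclaimer wording and the function names are hypothetical, and detecting unknown disclaimers is a harder problem that this does not address.

```python
import re

# Hypothetical disclaimer boilerplate; a real deployment would maintain its own list.
KNOWN_DISCLAIMERS = [
    "This message is confidential and may be subject to attorney-client privilege.",
]


def normalize(text: str) -> str:
    """Collapse whitespace so line wrapping does not defeat the match."""
    return re.sub(r"\s+", " ", text).strip()


def strip_disclaimers(body: str) -> str:
    """Remove every known disclaimer from the message body."""
    cleaned = normalize(body)
    for disclaimer in KNOWN_DISCLAIMERS:
        cleaned = cleaned.replace(normalize(disclaimer), "")
    return normalize(cleaned)


def mentions(body: str, term: str) -> bool:
    """Whole-word, case-insensitive keyword match."""
    return re.search(rf"\b{re.escape(term)}\b", body, re.IGNORECASE) is not None


body = ("Lunch at noon?\n\nThis message is confidential and may be subject to\n"
        "attorney-client privilege.")
print(mentions(body, "attorney"), mentions(strip_disclaimers(body), "attorney"))
```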

Page 17: Email Archiving

E-Discovery

Conflicting Requirements: produce electronic documents to satisfy court orders vs. providing insufficient, irrelevant, or privileged information

Search Type | Court-dictated Required Search
Full text | "acidosis"
Boolean | "cardiac" OR "respiratory"
Phrase | "in-custody death"
Proximity | "pre-existing" within 10 words of "condition"
Wildcard | "epilep*"
Wildcard proximity | "mental*" within 5 words of "condition"
Dual wildcard proximity | "continu*" within 10 words of "discharg*"
Wildcard sentence-level | "caus*" within same sentence as "death"

Source: Williams v. Taser Int’l, Inc., 2007 WL 1630875 (N.D. Ga. June 4, 2007)

Discovery Request

– Certain number of custodians
– Date range
– Pertaining to certain subject matter; usually described by a set of search terms.
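A discovery request of this shape (custodians, date range, search terms) can be sketched as a filter over archived mail. The example below implements the wildcard-proximity search type using shell-style wildcards from Python's `fnmatch` module. The `Mail` record, the function names, and the sample data are hypothetical, and a production archive would answer such queries from an index rather than by scanning message bodies.

```python
import re
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase


@dataclass
class Mail:
    custodian: str
    sent: date
    body: str


def proximity(body: str, pat_a: str, pat_b: str, distance: int) -> bool:
    """True if a word matching pat_a lies within `distance` words of one matching pat_b."""
    words = re.findall(r"[\w-]+", body.lower())
    pos_a = [i for i, w in enumerate(words) if fnmatchcase(w, pat_a)]
    pos_b = [i for i, w in enumerate(words) if fnmatchcase(w, pat_b)]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


def responsive(mail: Mail, custodians: set[str], start: date, end: date) -> bool:
    """Custodian and date-range filter plus one search term from the table above."""
    return (mail.custodian in custodians
            and start <= mail.sent <= end
            and proximity(mail.body, "mental*", "condition", 5))


mail = Mail("officer1", date(2006, 3, 1), "Notes on his mentally unstable condition")
print(responsive(mail, {"officer1"}, date(2006, 1, 1), date(2006, 12, 31)))
```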

Page 18: Email Archiving

E-Discovery (cont’d)

– Landmark case: Zubulake v. UBS Warburg (2003).
– Primarily driven by the amendments to the Federal Rules of Civil Procedure (FRCP) that took effect in December 2006.
– Litigants are entitled to obtain electronic information from the adverse party.
– Voluntary initial disclosures need to be made pertaining to each litigant.
– Today, almost all cases have some sort of electronic documents as evidence.

Page 19: Email Archiving

E-Discovery (cont’d)

– Parties face sanctions if they do not provide all the relevant documents (numerous precedents, e.g. Metrokane v. Built NY, 2008). Validation occurs when the receiving party can prove the existence of another document through a hard-copy printout or other means.

– Lawyers from both parties routinely negotiate keywords to define search concepts.

– Manual review of documents for relevance and privilege. Numerous products cluster similar documents (near-deduplication) and present them to reviewers together to improve efficiency.
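The slides do not say how such products detect near-duplicates. One common technique, shown here purely as an illustration, is to compare sets of overlapping word sequences (shingles) using Jaccard similarity and to group documents greedily. The shingle size, the 0.5 threshold, and all function names are assumptions.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(docs: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedy clustering: a document joins the first cluster whose
    first member is at least `threshold` similar, else starts a new one."""
    signatures = [shingles(d) for d in docs]
    clusters: list[list[int]] = []
    for i, sig in enumerate(signatures):
        for members in clusters:
            if jaccard(sig, signatures[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "please review the attached contract before friday",
    "please review the attached contract before monday",
    "lunch menu for the cafeteria this week",
]
print(cluster(docs))
```

A reviewer who has already judged the first document in a cluster can then be shown the rest of that cluster next, which is the efficiency gain the slide describes.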

– Chain of custody: to prove that the document has not been tampered with or altered.
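Tamper evidence for a chain of custody is often built from cryptographic hashes. The sketch below keeps a custody log in which every entry includes the hash of the previous entry, so altering any earlier entry invalidates the chain. It illustrates the idea only, is not a statement of what courts require, and uses hypothetical field and function names.

```python
import hashlib
import json


def _entry_hash(body: dict) -> str:
    """Hash of an entry's fields in a canonical (sorted-key) JSON form."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(log: list[dict], actor: str, action: str, doc_digest: str) -> None:
    """Add a custody event that is chained to the previous entry."""
    prev = log[-1]["entry_hash"] if log else ""
    entry = {"actor": actor, "action": action, "doc": doc_digest, "prev": prev}
    entry["entry_hash"] = _entry_hash(entry)  # computed before the key is added
    log.append(entry)


def log_is_intact(log: list[dict]) -> bool:
    """Recompute every hash and check each link to the previous entry."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev"] != prev or entry["entry_hash"] != _entry_hash(body):
            return False
        prev = entry["entry_hash"]
    return True


doc_digest = hashlib.sha256(b"raw message bytes").hexdigest()
log: list[dict] = []
append_event(log, "archiver", "captured", doc_digest)
append_event(log, "reviewer1", "exported", doc_digest)
print(log_is_intact(log))
```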

Page 20: Email Archiving

Palin’s e-mail at $15m per request

NBC's price quote for e-mails sent to Todd Palin: $15 million.

AP's price quote for e-mails between state employees and the campaign headquarters of Sen. John McCain: $15 million.

AP's price quote for e-mails between state employees and the National Park Service: $15 million.

Cost to retrieve e-mail for 1 mailbox

6 hours | to assemble email for 1 employee mailbox
2 hours | for “security” checks
5 hours | to filter by requested keyword or topic
13 hours | total per mailbox
$73.87 | hourly rate
$960.31 | cost to retrieve e-mail for 1 mailbox

Cost to retrieve e-mail for all employees

$960.31 | cost to retrieve e-mail for 1 mailbox
16,000 | full-time employees
$15.3 million | cost to retrieve e-mail for all employees
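The quoted price follows directly from the figures in the two tables. The short calculation below reproduces it with exact decimal arithmetic; the variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hours per mailbox, as itemized on the slide.
HOURS = {"assemble": 6, "security checks": 2, "filter": 5}
HOURLY_RATE = Decimal("73.87")
EMPLOYEES = 16_000

hours_per_mailbox = sum(HOURS.values())
cost_per_mailbox = (HOURLY_RATE * hours_per_mailbox).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP)
total_cost = cost_per_mailbox * EMPLOYEES

print(f"{hours_per_mailbox} hours, ${cost_per_mailbox} per mailbox, ${total_cost:,} in total")
```

The product comes to $15,364,960, which the slide reports as $15.3 million.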

Page 21: Email Archiving

Conclusion

– Most challenges in archiving can be reduced to a classification problem.
– Segmentation problems: detect internal and external disclaimers.
– Detect change in email behavior through email profile analysis.
– Understanding mails: need to develop analysis techniques to understand the contents.
– Visualization and grouping of similar mails: control the order in which mails and documents are viewed.
– Consistent way of defining subject matters, beyond just a set of keywords.
– Extract more metadata about attachments such as images, audio and video files.
– All of the above are required in multiple languages: English, Japanese, Spanish, Chinese, and others.