11/16/2017
© 2017 COMMVAULT SYSTEMS, INC. ALL RIGHTS RESERVED.
• Patrick McGrath – Director Solutions Marketing – Archive, Search, Analytics
Data Protection and Information Governance Across Data Silos
What are these?
Creating new value from old content
UC Berkeley Prosopography Services
Common info themes with many organizations?
• Information expensive to manage, conserve/preserve and use
• Most information is dark or inaccessible, constrained by medium, volume or steward
• Understanding of information constrained by issues such as language, context and timeliness
• Information managed, governed and stored centrally within the organization
• Decisions influenced by information from within the organization
However, times have changed
Goldmine or Minefield?
Everyone approaches information from a different angle
Records, Archivists, Librarians: “I won’t accept this without my 193 fields of metadata!”
Legal: “What do you mean you can’t find it?”
Information Technology: “I don’t care. Stop using so much storage!”
Content Creators, Researchers: “Get out of my way – I have work to do!”
• Data Sprawl
Digital Transformation and the Internet of Things
Forces fueling the move to the cloud
Explosion of Digital Data
• 90% of all digital data ever created was created in the last 2 years. Source: Sintef ITC
Connected Devices Skyrocketing
• 20.8 billion things could be connected to the Internet by 2020, about 5.5 million devices added every day. Source: Gartner
Cloud Usage on the Rise
• 70% of organizations are managing some of their data in cloud infrastructure. Source: IDC
Data sprawl
Data subject areas
• Customers
• Vendors/Partners
• Employees
• Products
• Intellectual Property
• Contracts
• Providers, SLAs
Spread across…
• Applications: ERP, CRM, ECM, EDW, Archive, Backups
• Devices (e.g. laptops, IoT)
• Locations/Jurisdictions
• Different parts of the organization
Containing sensitive information
How to manage visibility and consistency of data handling?
Heart of the problem
“… while the average large UK business now uses 24 systems to manage and store personal data, 1 in 5 use over 40 systems to do so.”
- Nick Ismail, Information Age (citing 2017 OnePoll Survey)
“There is one application for every 5-10 employees generating copies of the same files leading to massive amounts of duplicate, idle data…”
- Michael Vizard, ITBusinessEdge.com
Data copies and silos
[Diagram: end-user, mail server and file data copied and replicated across mailbox archives, mailbox backups, compliance archive mailboxes, compliance copies and replicas, Outlook PSTs, archive backups, multiple backups, datacentre, departmental and remote file servers, file archives, file analytics, endpoint and server backups, personal cloud and devices, and Salesforce.]
Complexity hinders compliance and increases risk
LEGACY SYSTEMS | DATA CENTERS | CLOUD DATA | SaaS
PAIN: LACK OF CONTROL AND ANALYSIS
• Archive and Search systems create silos
• Lack of common search and collation
• Multiple access controls to manage
• Gaps in coverage present risk
• Drives demand for more ‘data lakes’ projects
PAIN: VISIBILITY OF EXTERNAL DATA
• Data held externally is difficult to track
• Protection managed by 3rd party
• Limited ability to archive or manage retention
• Risk of data on unsanctioned Clouds
• Mobile and Shadow IT
PAIN: BACKUP AND RECOVERY RISKS
• Too many siloed solutions & repositories
• Impossible to set common policies
• Reporting is a challenge
• Variable controls for access & audit
• Complexity leads to gaps in coverage
Change drivers
Ransomware
• Hack leads to data encryption, loss or copying
• Unless price paid, could lead to:
• Halt of business operations for critical data
• Publication of sensitive data
• Could also lead to notifiable loss incident
Data Privacy / GDPR
• EU personal data privacy
• Serious consequences (€£₽$)
• Focus on EU resident personal data
• Global companies also liable
• Process and technology change for many
• Consent, requests, breach notification, etc.
• Customer demands
• External competition
• Workforce competition
• Compliance and security
Key Takeaways: Know your critical and sensitive data.
Get rid of it if you don’t need it!
Where is this highly controlled data?
Source: AIIM Report – Understanding GDPR in 2017
Control over the controlled data impacted by…
Source: AIIM Report – Understanding GDPR in 2017
UC Berkeley data breach incidents (timeline, 2003 to 2016)
• Jul 2003: California SB1386 enacted
• Mar 2005: UCB Grad Division incident
• May 2006: UCB RSSP incident
• May 2009: UCB UHS incident
• Aug 2009: UCB J-School incident
• Aug 2014: UCB Cap Proj / Real Estate incident
• May 2015: UCB E&I incident
• Feb 2016: UCB BFS incident
Data Breach Response Phases
• Breach Detected
• Declare Incident
• Assessment
• Planning
• Execution
• Remediation
GDPR
• 72 hours to notify authorities
• “Without undue delay” to notify victims
• YOU are responsible for the data handling of your providers
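The 72-hour window can be turned into a concrete deadline with simple date arithmetic. The sketch below is illustrative only: it assumes the clock starts at the moment the organization becomes aware of the breach, and the function names `authority_deadline` and `hours_remaining` are hypothetical, not part of any product.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 72-hour clock starts when the organization becomes
# aware of the breach (the "Breach Detected" moment).
NOTIFICATION_WINDOW = timedelta(hours=72)


def authority_deadline(detected_at: datetime) -> datetime:
    """Latest time at which the authority must have been notified."""
    if detected_at.tzinfo is None:
        # Naive timestamps are ambiguous across jurisdictions, so reject them.
        raise ValueError("detected_at must be timezone-aware")
    return detected_at + NOTIFICATION_WINDOW


def hours_remaining(detected_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    return (authority_deadline(detected_at) - now).total_seconds() / 3600


# Hypothetical incident detected at 09:00 UTC on 16 November 2017.
detected = datetime(2017, 11, 16, 9, 0, tzinfo=timezone.utc)
deadline = authority_deadline(detected)
```

A response plan could surface `hours_remaining` on an incident dashboard so that the assessment and planning phases are sized against the time actually left.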
• So, how to deal with landmines and goldmines?
Data Platform
Primary Data Source: Mailboxes, Data Center, Cloud, Endpoints, Apps & Databases, IoT & External
1. Ingest
• Unstructured (Files, Social)
• Structured Data (DB)
• Metadata & Usage Info
• Dedupe
• IoT
• Big Data
• Backup/Archive (Store) – OR
• In Place Indexing (No-Store)
2. Understand
• Contents
• Usage
• Meaning and Context
• Data Profiling/Entity Extraction
• Recommendations
3. Govern
• Access Rules
• Wipe/Erasure
• Encryption
• Movement
• Synchronization
• Retention/Disposition
• Monitor
• Alerts/Notifications
• Process Initiation/Automation
4. Recover
• Operational Recovery
• DR/Hotsite
• Dev/Test
5. Use
• Big Data, Analytics, 360 Dashboards & Reporting, Business Intelligence
• Collaboration
• eDiscovery, Search, Research, Investigations, Case
6. Extend
• Applications & Ecosystem
File system data source example – Storage optimization
Duplicates
Orphaned files
Sensitive data
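As a rough illustration of the duplicate-detection part of storage optimization, the sketch below groups files by a hash of their contents and estimates how much space the extra copies occupy. It is a minimal sketch using only the Python standard library, not the product's method; `find_duplicates` and `reclaimable_bytes` are hypothetical names, and SHA-256 is an assumed choice of digest.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Return groups of files under root whose contents are identical."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


def reclaimable_bytes(groups: List[List[Path]]) -> int:
    """Bytes freed if every group keeps one copy and drops the rest."""
    return sum(p.stat().st_size for paths in groups for p in paths[1:])
```

Calling `find_duplicates(Path("/some/share"))` on a file share (the path here is only a placeholder) yields candidate groups for review; hashing every file is expensive at scale, so a real scan would likely pre-filter by file size first.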
Sensitive Data Detection
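Sensitive data detection is often built from pattern matching plus a validation step that cuts down false positives. The sketch below is an assumption about how such a detector might look, not a description of the product: it looks for email addresses and for payment card numbers that pass the Luhn checksum. The patterns and the function name `find_sensitive` are illustrative.

```python
import re

# Illustrative patterns only; production detectors are far more extensive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return email addresses and Luhn-valid card numbers found in text."""
    emails = EMAIL_RE.findall(text)
    cards = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            cards.append(digits)
    return {"emails": emails, "cards": cards}


# 4111 1111 1111 1111 is a widely used test card number, not a real account.
sample = "Card 4111 1111 1111 1111 on file, contact jane.doe@example.com."
result = find_sensitive(sample)
```

The checksum step matters: a 16-digit order number matches the card pattern but is discarded unless it also passes the Luhn test.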
Data Analytics Applications
• Architecture to enable content-aware applications
• User profile based applications
• Fine tuned for the specific knowledge and use case for a desired outcome
• Core capabilities
• Data indexing
• Data detection
• Visualizations / Reports
• Workflow
• Data policy automation
• API access
• Audit trails
Data collection, indexing, analytics visualization and action!
[Architecture diagram. Labels: Infrastructure (Traditional, Mixed & Converged, Software Defined, Cloud); Live Data (Files, AD, Apps, SaaS) and Stored Data (SAN, Files, Edge), each with an Ingest path; Data Index / Virtual Repository (Content Index, Federation, Virtualize, Enrich); Data Services; Profile-Based Applications.]
Search & Analytics Unlocks Sensitive Data Management
DISCOVER: Discover risk data, across file, endpoint, email and structured data, and present for risk evaluation and action taking, removal or retention by defined policy.
PROTECT: Minimize use of risk data and protect from loss, breach or damage.
MANAGE: Ensure risk data is always managed to standards with ongoing risk assessments.
• Map Enterprise Information
• Data Center Cleanup
• Automate Classification & Retention
• Optimize Accessibility
• Accelerate Breach Notification Planning
• Automate Storage Tiering & Disposition
• Encrypt & Protect End User Computer Data
• Optimize Business Continuity
• Demonstrate Compliance
• Detect Ransomware
• Identify Anomalous Access
• Monitor for Personal Data in Unauthorized Locations
• Simplify Response to Access, Rectification and Erasure Requests
• Getting to the right information quickly
• Search and machine learning
Compliance search – Commvault and Lucidworks
Beta Coming Soon!
AI intelligence with ease of use
Prioritize and reduce costs with Machine Learning
• Do it at scale: >5M documents/hour
• Increase relevancy: find and review what matters
• Lower costs: 80-90% reduction
• Integrated AI: powerful but easy to use
Leveraging the AI ecosystem
Phases: Initiate Investigation, Archive Process Build, Analyze, Analytics and Sync
• Define Review Set to Analyze: Investigation starts by defining the search parameters of a Review Set.
• Brainspace Pulls Information from the Review Set: The plugin streams data from the Review Set into Brainspace using the templated field map.
• Execute Visual Analytics: Brainspace receives streamed text and metadata to create a Cluster Wheel, perform Communication Analysis, and display a dashboard.
• Create a Collection in Brainspace: User creates a collection in Brainspace using Visual Analytics.
• Overlay Full Report: Full report data is synced into Commvault when build is complete.
• Collection Sync to Review Set: User can perform actions on the Review Set based on the synchronized Brainspace tags.
Integration Steps
First Step: Define Review Set to Analyze
Second Step: Transfer Review Set to Brainspace
Third Step: Perform Brainspace Analytics
Fourth Step: Create a Collection from your Analytics Result and Sync back to Commvault
Fifth Step: Take Action in Review Set from Brainspace Tag
Second Step: Transfer Review Set to Brainspace
Pick Review Set
Build Process
Ingestion Fields Preconfigured
Third Step: Perform Brainspace Analytics
• Analytics Dashboard
• Transparent Concept Search
• Cluster Wheel
• Conversation Analysis
• Communication Analysis
• Advanced Document Classification ‐ Predictive Coding
• Advanced Document Classification ‐ Continuous Multi‐Modal Learning (CMML)
Analytics Dashboard
The overview dashboard is completely interactive and provides insight at a glance for the entire dataset, including:
o Duplicates & Near‐Duplicates
o Timeline
o Faceted Lists
o Concept Search
o Document Results
Transparent Concept Search
Brainspace’s next‐generation Transparent Concept Search provides the advantages of concept searching without the traditional drawbacks.
Transparent Concept Search significantly reduces the time and expense resulting from over‐inclusive document retrieval by allowing users to interact with the concept expansion to boost or eliminate concepts. No black box.
Cluster Wheel
The Brainspace Cluster Wheel showcases our dynamic learning by organizing all documents into conceptually similar clusters.
The wheel is animated and interactive, showing neighborly populations of documents, making early assessment intuitive even for extremely large datasets.
Conversation Analysis
Using Conversations allows you to visualize email activities within a dataset.
Users can track the flow of information throughout an organization by exploring which emails have been sent to whom and determining which email domains have been most accessed.
Communication Analysis
Who said what to whom?
Brainspace’s communication analysis view adapts to any active query and provides interactive exploration of email conversations, including:
o Interactive Social Graph
o To, CC and BCC filtering
o Sender/Recipient Volume
o Top relationships
o Top Terms
o Alias Consolidation
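Sender/recipient volume and top relationships are, at their core, counts over message headers. The sketch below shows that idea with the standard library only; it says nothing about how Brainspace computes its views. The message dictionaries, their field names and the addresses are invented for illustration, and lowercasing addresses is only a crude stand-in for alias consolidation.

```python
from collections import Counter


def relationship_counts(messages):
    """Count messages per (sender, recipient) pair across To, CC and BCC."""
    pairs = Counter()
    for msg in messages:
        recipients = msg.get("to", []) + msg.get("cc", []) + msg.get("bcc", [])
        for recipient in recipients:
            pairs[(msg["sender"].lower(), recipient.lower())] += 1
    return pairs


def sender_volume(messages):
    """Count how many messages each sender wrote."""
    return Counter(msg["sender"].lower() for msg in messages)


# Hypothetical sample data; field names are assumptions for this sketch.
messages = [
    {"sender": "Ann@example.com", "to": ["bob@example.com"], "cc": ["cat@example.com"]},
    {"sender": "ann@example.com", "to": ["bob@example.com"]},
    {"sender": "bob@example.com", "to": ["ann@example.com"], "bcc": ["cat@example.com"]},
]

pairs = relationship_counts(messages)
top_relationship = pairs.most_common(1)[0]
volumes = sender_volume(messages)
```

The pair counts are the edge weights of a social graph: each address is a node and each (sender, recipient) count is the strength of the link between two nodes.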
Advanced Document Classification ‐ Predictive Coding
Brainspace’s Predictive Coding uses our patented machine learning technology together with Logistic Regression and Active Learning to help you review less and decrease your associated costs.
Brainspace gives you more control by allowing you to set your target recall at the beginning and to adjust it throughout the process, using feedback on the review depth needed to reach that recall.
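The relationship between target recall and review depth can be shown with a small calculation. The sketch below assumes documents have already been ranked by a model score and that the true relevance labels are known, which in practice is only the case for a labeled validation sample; it is a generic illustration, not Brainspace's algorithm, and `review_depth_for_recall` is a hypothetical name.

```python
import math


def review_depth_for_recall(ranked_relevance, target_recall):
    """Number of top-ranked documents to review to reach the target recall.

    ranked_relevance: booleans ordered by descending model score,
    True meaning the document is relevant.
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return 0  # nothing to find, so no review is needed
    needed = math.ceil(target_recall * total_relevant)
    found = 0
    for depth, relevant in enumerate(ranked_relevance, start=1):
        found += relevant
        if found >= needed:
            return depth
    return len(ranked_relevance)


# Hypothetical ranking of 8 documents, 4 of which are relevant.
ranked = [True, True, False, True, False, False, True, False]
depth_75 = review_depth_for_recall(ranked, 0.75)
depth_100 = review_depth_for_recall(ranked, 1.0)
```

In this toy ranking, 75% recall is reached after reviewing 4 documents while 100% recall requires 7, which is why the choice of target recall drives review cost so directly.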
Advanced Document Classification ‐ Continuous Multi‐Modal Learning (CMML)
The Continuous Multi‐Modal Learning, or CMML, workflow can be carried out entirely in Brainspace, and integrates supervised learning with Brainspace’s tagging system.
CMML focuses on finding target documents during training, rather than on producing a predictive model to identify documents for later review.
Fourth Step: Create a Notebook from the Analytics Result and Sync Back to Commvault
Create Notebook
Select tag
Sync
Fifth Step: Take Action in Review Set from Brainspace Tag
Conclusion
Given explosive growth of
• Data volumes
• Number of silos
• Security threats
• Compliance requirements
Considerations
• Develop Information Governance as a core capability
• Align data protection to the needs of the data and the business
• Increase data intelligence
• Drive data visibility across silos
• Automate data policy
• Questions? Discussion
PROTECT. ACCESS. COMPLY. SHARE.
COMMVAULT.COM | 888.746.3849
Thank you.
Patrick McGrath, Director, Solutions Marketing, Content
@patrickiest