Data Science
Jeanina (Nina) Worden
Director, Statistical Programming
Agenda
Managing Expectations
Managing Data Security
Managing the People & Processes
Managing to Change
Expectations
Four key departmental goals:
Compliant system/platform
Content security
Global access
Integrated environment for all users
Highlights
SAS Life Science Analytics Framework (LSAF)
Cloud-based SAS environment
Built to be compliant with regulatory requirements
Integrated analytic tools
Integrated library of CDISC standards
LSAF Dashboard
Compliance
LSAF IS a compliant system because:
Two-level system access
Account locking after failed attempts
Audit log records user activities
Redundant backups maintained by SAS
Robust change controls followed by SAS
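The account-locking and audit-log points above can be sketched in a few lines. This is only an illustration of the general technique, not LSAF's implementation: the `LoginService` class, the event names, and the threshold of three failed attempts are all assumptions, since the slides do not state them.

```python
from dataclasses import dataclass, field

MAX_FAILED_ATTEMPTS = 3  # assumed threshold; the slides do not state one


@dataclass
class Account:
    username: str
    password: str  # plain text only to keep the sketch short
    failed_attempts: int = 0
    locked: bool = False


@dataclass
class LoginService:
    accounts: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)  # (username, event) tuples

    def add_account(self, username, password):
        self.accounts[username] = Account(username, password)
        self.audit_log.append((username, "account_created"))

    def login(self, username, password):
        account = self.accounts[username]
        if account.locked:
            # A locked account is rejected even with the right password.
            self.audit_log.append((username, "login_rejected_locked"))
            return False
        if password == account.password:
            account.failed_attempts = 0
            self.audit_log.append((username, "login_ok"))
            return True
        account.failed_attempts += 1
        self.audit_log.append((username, "login_failed"))
        if account.failed_attempts >= MAX_FAILED_ATTEMPTS:
            account.locked = True
            self.audit_log.append((username, "account_locked"))
        return False
```

Every attempt, successful or not, leaves an entry in `audit_log`, which is the property an auditor would look for.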
Performance Testing
Processing speed (in seconds) using a 440,000 record dataset
Program                      Server    LSAF
Simple SET                   0.7020    0.4297
PROC SQL query (with MEAN)   0.6550    0.3915
PROC FREQ                    0.4830    0.6455
PROC LOGISTIC                8.2840    4.4321
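The timings above are easier to compare as ratios. The sketch below assumes the "Server" column is the previous in-house environment and the "LSAF" column is the new one; the `speedup` helper is a hypothetical name introduced here.

```python
# Seconds per program on the 440,000-record dataset, copied from the slide.
# Each tuple is (server_seconds, lsaf_seconds).
TIMINGS = {
    "Simple SET": (0.7020, 0.4297),
    "PROC SQL query (with MEAN)": (0.6550, 0.3915),
    "PROC FREQ": (0.4830, 0.6455),
    "PROC LOGISTIC": (8.2840, 4.4321),
}


def speedup(server_seconds, lsaf_seconds):
    """Return server time divided by LSAF time; above 1 means LSAF was faster."""
    return server_seconds / lsaf_seconds


for program, (server, lsaf) in TIMINGS.items():
    print(f"{program:28s} {speedup(server, lsaf):.2f}x")
```

By this measure three of the four programs ran faster on LSAF, while PROC FREQ ran slower.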
Access Requirements
Granular access controls
Familiar SAS environment
Built in management and audit tools
SAS maintenance relieves in-house IT needs
Controlling Access
Santen controls access through account settings:
Creating user accounts appropriate to each user's function or task
Controlling access to the files contained on the system
User Access
Santen has different types of users based on their roles:
Programmers/Analysts
Data managers
Data “users” (e.g., executives, scientists)
User Access
Different types of users based on their employment:
FTEs
Contractors located at the Santen office
CRO partners
Group Accounts
Defined needs for each type of user
User “groups” provide consistency
Exceptions applied individually
[Diagram labels: FTEs Only, Programming Group, Privileges]
Access Privileges
Can the user download out of LSAF?
Will the user be allowed to change the Data Standards?
Should the user be able to lock/unlock accounts?
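One way to picture group accounts with individually applied exceptions is a per-group default that is overridden per user. The sketch below is illustrative only: the group names, privilege names, and default values are assumptions loosely based on the user types and questions on these slides, not LSAF settings.

```python
# Hypothetical default privileges per user group.
GROUP_PRIVILEGES = {
    "programmers": {"download": True, "change_standards": False, "lock_accounts": False},
    "data_managers": {"download": True, "change_standards": True, "lock_accounts": False},
    "data_users": {"download": False, "change_standards": False, "lock_accounts": False},
}


def effective_privileges(group, exceptions=None):
    """Start from the group defaults, then apply one user's exceptions."""
    privileges = dict(GROUP_PRIVILEGES[group])  # copy so the group stays unchanged
    for name, allowed in (exceptions or {}).items():
        if name not in privileges:
            raise KeyError(f"unknown privilege: {name}")
        privileges[name] = allowed
    return privileges


# Example: a lead programmer who may also lock and unlock accounts.
lead = effective_privileges("programmers", {"lock_accounts": True})
```

Keeping exceptions separate from the group definition means the group stays consistent for everyone else.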
Content Access
Content differences also exist
Types of content include:
Ongoing study files
Archived study files
Ad hoc/testing data
Content Access
Content access is not automatic
Access can be controlled at different levels:
Study level
Subfolder level
File level
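The three levels above can be thought of as rules where the most specific one wins and nothing is granted by default. The sketch below illustrates that idea under stated assumptions: the `can_access` function, the rule format, and the example paths are hypothetical and do not describe how LSAF stores permissions.

```python
from pathlib import PurePosixPath


def can_access(user, path, rules):
    """Return the decision of the most specific rule covering `path`.

    `rules` maps (user, path) to True or False. The file itself is checked
    first, then each parent folder up to the root. No rule means no access.
    """
    current = PurePosixPath(path)
    while True:
        decision = rules.get((user, str(current)))
        if decision is not None:
            return decision
        if current == current.parent:  # reached the root without a match
            return False
        current = current.parent


# Hypothetical rules: study-level grant, subfolder-level denial, file-level grant.
rules = {
    ("cro_user", "/study01"): True,
    ("cro_user", "/study01/unblinded"): False,
    ("cro_user", "/study01/unblinded/summary.pdf"): True,
}
```

With these rules the user can read most of `/study01`, is blocked from the `unblinded` subfolder, and still reaches the single file that was granted explicitly.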
Study Level
Subfolder/File Level
Access History
Account creation and access modifications are logged
Audit history used as documentation for regulatory agencies
Going Global
Key considerations in building the global team
Efficient resourcing
Effective communication
Standard processes
Where We Were
Global Resources
Single work environment
Increase our team in a cost-efficient manner
Full utilization of current resources
Capitalize on lower-cost markets
[Chart legend: US-based programmers, Japan-based programmers, New programmers]
Where We Are Now
[Chart values: 6, 8, 18!]
Global Resources
Increase productivity by capitalizing on the differing time zones
[Diagram: US, Japan, China]
Global Communication
Communication is the primary challenge
Study communication:
Single-source “living” documents
Task status
Global Communication
Common study documents
Monitoring Progress
Global Libraries
Single environment = consistency
Global folders or libraries:
Best practice/guidance documents
Data standard implementation guides
Training templates
Global Libraries
Global Standards
Standards = efficiency
CDISC mapping documents
Standard programs
Dataset programs
Utility programs
Global Libraries
Change
Change is dependent on the starting point
Changes for Santen
UNIX environment
Jobs (Batch execution)
Working with files between areas
Storage management
Change - Jobs
Change - Jobs
Dependencies are accounted for and are access-based, not user-based
Change – Working with Files
Check-in - Adding files
Checkout/Moving
Versioning
Tracks changes
Ability to revert to a previous version
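The check-out, check-in, and versioning cycle above can be sketched as a small class. This is a generic illustration of the workflow, not LSAF's API: `VersionedFile` and its method names are hypothetical, and the sketch assumes only one user can hold a check-out at a time.

```python
class VersionedFile:
    """Minimal check-out/check-in model that keeps every version."""

    def __init__(self, name, content, comment="initial check-in"):
        self.name = name
        self.versions = [(content, comment)]  # version 1 is at index 0
        self.checked_out_by = None

    @property
    def current(self):
        return self.versions[-1][0]

    def check_out(self, user):
        if self.checked_out_by is not None:
            raise RuntimeError(f"{self.name} is checked out by {self.checked_out_by}")
        self.checked_out_by = user
        return self.current

    def check_in(self, user, content, comment):
        if self.checked_out_by != user:
            raise PermissionError(f"{user} has not checked out {self.name}")
        self.versions.append((content, comment))
        self.checked_out_by = None
        return len(self.versions)  # new version number

    def revert(self, version_number):
        if not 1 <= version_number <= len(self.versions):
            raise ValueError(f"no version {version_number}")
        content, _ = self.versions[version_number - 1]
        # Reverting adds a new version, so the history is never rewritten.
        self.versions.append((content, f"reverted to version {version_number}"))
        return len(self.versions)
```

Because a revert is stored as a new version, the audit trail stays complete, at the cost of extra storage.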
Change – Working with Files
Copy = no change
Check-in/Checkout = changes allowed
Change - Versioning
Guidance is needed on when to version, how to define version “types,” and how to use the comment field
Change – Versioning (Access)
Note: You cannot compare versions
Change - Storage Management
Understanding and managing storage is key
Storage allocation between development and production areas
Each version stored
Change - Storage Management
Do’s
Check-out only what is needed
Delete unneeded files often
Don’ts
Create “archived” folders
Duplicate files in multiple places
SAS can provide reports that identify offenders!
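Because each version is stored, a file's footprint is the sum of all its versions, which is why duplicates and frequent check-ins add up. The sketch below shows the arithmetic behind a simple "largest offenders" list. It is not the SAS report mentioned above: `storage_report`, the file names, and the sizes are made up for illustration, and it assumes every version is kept as a full copy.

```python
def storage_report(version_sizes_mb, top_n=3):
    """Rank files by total stored size across all of their versions.

    `version_sizes_mb` maps a file path to a list with one size (in MB)
    per stored version. Returns the `top_n` largest (path, total) pairs.
    """
    totals = {path: sum(sizes) for path, sizes in version_sizes_mb.items()}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical example: a duplicated test copy outweighs the real dataset.
sizes = {
    "adsl.sas7bdat": [50, 50, 52],
    "adae.sas7bdat": [10],
    "test_copy.sas7bdat": [200, 200],
}
report = storage_report(sizes, top_n=2)
```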
Future Change
Value of data
[Chart: Study vs. Repository]
Future Change
Potential to build therapeutic data repository
Add value to data beyond submission usage
Cluster analysis for marketing targets
Preferred site locations
Defining better studies
Summary
Account options provide wide access and granular security
The centralized work environment allows increased, efficient use of global resources
Summary
Changes do occur but can be managed and have the potential to add value
Questions?
Thank you to SAS for giving me the opportunity to present this topic!
Thank You!
Jeanina (Nina) Worden
Director, Statistical Programming
6400 Hollis
Emeryville, CA
[email protected]