Security Methods for Statistical Databases
by Karen Goodwin
Introduction
Statistical databases containing medical information are often used for research
Some of the data is protected by laws to help protect the privacy of the patient
Proper security precautions must be implemented to comply with laws and respect the sensitivity of the data
Accuracy vs. Confidentiality
Accuracy: Researchers want to extract accurate and meaningful data
Confidentiality: Patients, laws and database administrators want to maintain the privacy of patients and the confidentiality of their information
Laws
Health Insurance Portability and Accountability Act (HIPAA), Privacy Rule
Covered organizations must comply by April 14, 2003
Designed to improve the efficiency of the healthcare system by using electronic exchange of data and maintaining security
Covered entities (health plans, healthcare clearinghouses, healthcare providers) may not use or disclose protected information except as permitted or required
The Privacy Rule establishes a “minimum necessary standard” for the purpose of making covered entities evaluate their current regulations and security precautions
HIPAA Compliance
Companies offer 3rd party certification of covered entities
Such companies will check your company and associated companies for compliance with HIPAA
Can help with rapid implementation of and compliance with HIPAA regulations
Types of Statistical Databases
Static: a static database is made once and never changes
  Example: U.S. Census
Dynamic: changes continuously to reflect real-time data
  Example: most online research databases
Security Methods
Access Restriction
Query Set Restriction
Microaggregation
Data Perturbation
Output Perturbation
Auditing
Random Sampling
Access Restriction
Databases normally have different access levels for different types of users
User IDs and passwords are the most common methods for restricting access
In a medical database:
  Doctors/Healthcare Representatives: full access to information
  Researchers: access only to partial information (e.g. aggregate information)
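The access levels above can be sketched as a role check in front of the data. This is only an illustration: the records, the role names `doctor` and `researcher`, and the `fetch` function are hypothetical and do not come from the presentation.

```python
from statistics import mean

# Hypothetical records and role names, invented for illustration only.
RECORDS = [
    {"patient": "A", "age": 34, "diagnosis": "asthma"},
    {"patient": "B", "age": 51, "diagnosis": "diabetes"},
    {"patient": "C", "age": 47, "diagnosis": "asthma"},
]


def fetch(role):
    """Return whatever the given role is allowed to see."""
    if role == "doctor":
        # Full access: copies of the individual records.
        return [dict(r) for r in RECORDS]
    if role == "researcher":
        # Partial access: aggregate information only, no individual rows.
        return {
            "count": len(RECORDS),
            "mean_age": mean(r["age"] for r in RECORDS),
        }
    # Any other role gets nothing.
    raise PermissionError(f"role {role!r} has no access")
```

A real system would authenticate the user (for example with a user ID and password) before mapping them to a role; that step is left out here.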
Query Set Restriction
A query-set size control sets the minimum number of records that must be in the result set
Allows the query results to be displayed only if the size of the query set satisfies the condition
Setting a minimum query-set size can help protect against the disclosure of individual data
Let K represent the minimum number of records that must be present in the query set
Let R represent the size of the query set
The query set can only be displayed if
K ≤ R
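The size check (display results only when the query set holds at least K records) might look like the following minimal sketch. The function name `restricted_query` and the choice to return `None` for a refused query are assumptions made for illustration.

```python
def restricted_query(records, predicate, k):
    """Return the query set only if its size R satisfies K <= R."""
    query_set = [r for r in records if predicate(r)]
    r_size = len(query_set)  # R: size of the query set
    if r_size < k:
        # Too few records: showing the result could expose an individual.
        return None
    return query_set


# Hypothetical data: patient ages.
ages = [23, 35, 35, 41, 58, 62, 77]

# Seven records match, so with K = 3 the result is displayed.
print(restricted_query(ages, lambda a: a > 20, k=3))

# Only one record matches, so with K = 3 the query is refused.
print(restricted_query(ages, lambda a: a > 70, k=3))
```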
[Figure: Query Set Restriction. Query 1 and Query 2 run against the Original Database; each set of query results is checked against K before results are returned.]
Microaggregation
Raw (individual) data is grouped into small aggregates before publication
The average value of the group replaces each individual value
Records with the most similar values are grouped together to maintain data accuracy
Helps to prevent disclosure of individual data
Example: the National Agricultural Statistics Service (NASS) publishes data about farms
To protect against data disclosure, data is only released at the county level
Farms in each county are averaged together to preserve as much accuracy as possible while still protecting against disclosure
Microaggregation example (each age is replaced by the average of its group)

Age    Microaggregated Age
10     11.67
12     11.67
13     11.67
57     56.67
54     56.67
59     56.67
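The age example can be reproduced with a short sketch. It assumes one simple grouping rule that the slides do not spell out: sort the values, cut them into fixed-size groups of neighbours, and replace each value by its group average. The function name `microaggregate` and the `group_size` parameter are illustrative.

```python
def microaggregate(values, group_size):
    """Replace each value by the average of its group of similar values."""
    # Indices of the values in ascending order, so neighbours are similar.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [None] * len(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        avg = sum(values[i] for i in group) / len(group)
        for i in group:
            # Write the group average back at the record's original position.
            result[i] = round(avg, 2)
    return result


ages = [10, 12, 13, 57, 54, 59]
print(microaggregate(ages, 3))
# [11.67, 11.67, 11.67, 56.67, 56.67, 56.67]
```

With this rule the last group can be smaller than `group_size` when the number of records is not a multiple of it; a production scheme would usually merge such a remainder into a neighbouring group.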
[Figure: Microaggregation. Original Data is averaged into Microaggregated Data; the user's query and its results go through the microaggregated data.]
Data Perturbation
Perturbed data is raw data with noise added
Pro: With perturbed databases, if unauthorized data is accessed, the true value is not disclosed
Con: Data perturbation runs the risk of presenting biased data
[Figure: Data Perturbation. Noise is added to the Original Database to produce the Perturbed Database; User 1 and User 2 each send a query and receive results.]
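As a rough sketch of data perturbation, noise can be added once to every stored value, and all later queries run against the perturbed copy. The zero-mean Gaussian noise, the `noise_scale` parameter, and the function name are assumptions for illustration; the slides do not say which noise distribution is used.

```python
import random


def perturb_database(values, noise_scale, seed=None):
    """Return a perturbed copy of the raw values; the originals are untouched."""
    rng = random.Random(seed)
    # Assumption: independent zero-mean Gaussian noise on every value.
    return [v + rng.gauss(0, noise_scale) for v in values]


# Hypothetical raw measurements.
original = [120.0, 135.0, 150.0, 142.0, 128.0]
perturbed = perturb_database(original, noise_scale=5.0, seed=42)

# Queries see only the perturbed copy, so an exposed value is not the true one.
true_mean = sum(original) / len(original)
reported_mean = sum(perturbed) / len(perturbed)
print(true_mean, reported_mean)
```

The gap between the two printed means is the accuracy cost of this method, and it is where the bias risk named above shows up in practice.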
Output Perturbation
Instead of the raw data being transformed as in Data Perturbation, only the output or query results are perturbed
The bias problem is less severe than with data perturbation
[Figure: Output Perturbation. User 1 and User 2 query the Original Database; noise is added to the results before they are returned.]
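For contrast with data perturbation, here is a minimal sketch of output perturbation: the stored values stay exact, and noise is added only to the computed answer. Gaussian noise and the names `perturbed_mean` and `noise_scale` are again illustrative assumptions.

```python
import random
from statistics import mean


def perturbed_mean(values, noise_scale, rng=random):
    """Compute the true mean on the raw data, then add noise to the answer."""
    true_result = mean(values)  # computed on unmodified data
    # Assumption: zero-mean Gaussian noise applied to the query result only.
    return true_result + rng.gauss(0, noise_scale)


# Hypothetical ages; the list itself is never modified.
ages = [10, 12, 13, 57, 54, 59]
print(perturbed_mean(ages, noise_scale=1.0))
```

Fresh noise is drawn on every call here, so repeating the same query and averaging the answers would wear the noise down; a deployed system would need to account for that.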
Auditing
Auditing is the process of keeping track of all queries made by each user
Usually done with up-to-date logs
Each time a user issues a query, the log is checked to see if the user is querying the database maliciously
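The log check can be sketched as follows. The slides do not say what counts as a malicious query, so this example assumes one possible rule: refuse a query whose record set differs from one of the same user's earlier query sets by only a few records, since comparing the two answers could isolate those individuals. The class name, the `min_difference` threshold, and the rule itself are illustrative.

```python
class QueryAuditor:
    """Keeps a per-user log of answered query sets and screens new queries."""

    def __init__(self, min_difference):
        self.min_difference = min_difference
        self.log = {}  # user -> list of frozensets of record ids

    def allow(self, user, query_set_ids):
        new = frozenset(query_set_ids)
        for old in self.log.get(user, []):
            # Records that appear in exactly one of the two query sets.
            difference = len(new ^ old)
            if 0 < difference < self.min_difference:
                # Differencing the two answers would expose too few records.
                return False
        # Only answered queries are logged in this sketch.
        self.log.setdefault(user, []).append(new)
        return True


auditor = QueryAuditor(min_difference=3)
print(auditor.allow("user1", {1, 2, 3, 4, 5}))  # True: first query
print(auditor.allow("user1", {1, 2, 3, 4}))     # False: isolates record 5
print(auditor.allow("user2", {1, 2, 3, 4}))     # True: separate log
```

Because every new query is compared with the user's whole history, the per-query work grows with the size of the log.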
Random Sampling
Only a sample of the records meeting the requirements of the query is shown
Must maintain consistency by giving exactly the same results to the same query
Weakness: logically equivalent queries can result in a different query set
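One way to get a sample that is random-looking yet consistent is to derive each keep/drop decision from a hash of the query text and the record id. The sketch below makes that assumption; the function name, the `rate` parameter, and the use of SHA-256 are illustrative choices, not something the slides specify.

```python
import hashlib


def sampled_query_set(record_ids, query_text, rate):
    """Keep roughly `rate` of the matching records, deterministically per query."""
    selected = []
    for rid in record_ids:
        digest = hashlib.sha256(f"{query_text}|{rid}".encode()).digest()
        # Treat the first 8 bytes as an integer in [0, 2**64).
        bucket = int.from_bytes(digest[:8], "big")
        if bucket < rate * 2**64:
            selected.append(rid)
    return selected


# Hypothetical record ids that match the query.
matching = list(range(1, 21))

# The same query text always yields the same sample (consistency).
first = sampled_query_set(matching, "age > 50 AND sex = 'F'", rate=0.8)
again = sampled_query_set(matching, "age > 50 AND sex = 'F'", rate=0.8)
print(first == again)

# A logically equivalent query written differently is hashed differently,
# so it will generally receive a different sample (the weakness noted above).
other = sampled_query_set(matching, "sex = 'F' AND age > 50", rate=0.8)
print(first, other)
```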
Comparison Methods
The following criteria are used to determine the most effective methods of statistical database security:
Security: possibility of exact disclosure, partial disclosure, robustness
Richness of Information: amount of non-confidential information eliminated, bias, precision, consistency
Costs: initial implementation cost, processing overhead per query, user education
A Comparison of Methods

Method                  Security        Richness of Information   Costs
Query-set Restriction   Low             Low (1)                   Low
Microaggregation        Moderate        Moderate                  Moderate
Data Perturbation       High            High-Moderate             Low
Output Perturbation     Moderate        Moderate-Low              Low
Auditing                Moderate-Low    Moderate                  High
Sampling                Moderate        Moderate-Low              Moderate

(1) Quality is low because a lot of information can be eliminated if the query does not meet the requirements
Sources
This presentation is posted on http://www.cs.jmu.edu/users/aboutams
Adam, Nabil R.; Wortmann, John C.; Security-Control Methods for Statistical Databases: A Comparative Study; ACM Computing Surveys, Vol. 21, No. 4, December 1989 (http://delivery.acm.org/10.1145/80000/76895/p515-adam.pdf?key1=76895&key2=1947043301&coll=portal&dl=ACM&CFID=4702747&CFTOKEN=83773110)
Official HIPAA (http://cms.hhs.gov/hipaa/)
Bernstein, Stephen W.; Impact of HIPAA on BioTech/Pharma Research: Rules of the Road (http://www.privacyassociation.org/docs/3-02bernstein.pdf)
Service Bureau; 3rd Party Testing (http://hipaatesting.com/service_bureau.html)