STEGANOGRAPHY: Data Mining
SOUNDARARAJAN EZEKIEL
Department of Computer Science
Indiana University of Pennsylvania
Indiana, PA 15705
Steganography, Cryptography, Data Mining
Steganography
– The art of hiding information in ways that prevent the detection of hidden messages
– The existence of the message is not known
Cryptography
– The science of writing in secret code
– It encodes a message so that it cannot be understood
Data Mining
– Discovering hidden values in your data warehouse
– That is, the extraction of hidden predictive information from large databases
– A knowledge discovery method: the extraction of implicit and interesting patterns from large data collections
Data Mining: Introduction
It started when we (businesses) began to store data in computers
Continued improvements: technology that navigates through data in real time
Examples:
– Single case:
  • A web server collects data for every single click
  • Logs are too big and contain gibberish
  • Lots of data and statistics
  • What we collected is not really useful
– Multiple case:
  • A collection of web servers with large bandwidth
  • Think about the size of the data we collect
Data Mining: Continued
It helps to design better and more intelligent businesses (and e-learning environments) because it is supported by
– Massive data collection
– Powerful multiprocessor computers
– Good data mining algorithms
It has existed for at least 10 years, but it has become popular only recently
Example: Winter Corporation report
• Data warehouses with as much as 100 to 200 terabytes of raw data will be operational by next year, performing nearly 2,000 concurrent queries and occupying nearly 1 petabyte (1,000 terabytes) of disk space. In the same time period, transaction-processing databases will handle workloads of nearly 66,000 transactions per second.
Evolution of Data Mining
Each evolutionary step is listed with its business question, enabling technology, product providers, and characteristics.
Data collection (1960s)
– Question: What was my total revenue over the last few years?
– Technology: computers, tapes, disks
– Product providers: IBM, CDC
– Characteristics: retrospective, static data delivery
Data access (1980s)
– Question: What were unit sales in India last January?
– Technology: RDBMS (relational databases), SQL (Structured Query Language), ODBC
– Product providers: Oracle, Sybase, Informix, IBM, Microsoft
– Characteristics: dynamic data delivery
Data warehouse and decision support (1990s)
– Question: What was the unit sales price in India last March?
– Technology: on-line analytic processing (OLAP), multidimensional databases, data warehouses
– Product providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
– Characteristics: dynamic data delivery at multiple levels
Data mining (now)
– Question: What will the unit price be in India next month? Why?
– Technology: advanced algorithms, multiprocessor computers, massive databases
– Product providers: Pilot, Lockheed, IBM, SGI, many more…
– Characteristics: prospective, proactive information delivery
The Scope of Data Mining
It is similar to sifting gold from an immense amount of dirt: searching for valuable information in gigabytes of data
Automated prediction of trends and behaviors: data mining automates the process of finding predictive information in a large database.
• Example: a question related to target marketing. Data mining can use mailing list data and other previous data to identify the solution
• Another example: forecasting bankruptcy by identifying segments of a population likely to respond similarly to given events
Automated discovery of previously unknown patterns: it sweeps through the database and identifies previously hidden patterns in one step
– Example: unrelated items purchased together in a store
• Detecting fraudulent credit card transactions, etc.
Databases can be larger in both depth and breadth
– High-performance data mining needs to analyze the full depth of a database without pre-selecting subsets
– Larger samples yield lower estimation errors and variances
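The "items purchased together" example can be sketched as a simple co-occurrence count. This is a minimal illustration, not the method used in the talk: the baskets, the function name pair_support, and the 0.75 cutoff are all invented for the example.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Return the fraction of transactions that contain each unordered item pair."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) drops duplicate items and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical baskets, invented for illustration only
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]

support = pair_support(baskets)
# Keep only the pairs that appear in at least 75% of the baskets
frequent = {pair: s for pair, s in support.items() if s >= 0.75}
print(frequent)
```

Counting every pair is quadratic in basket size, which is acceptable for a sketch; real data warehouses would need a pruning strategy.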
Research Rank
2001: according to MIT's Technology Review, data mining is a top-10 research area
Recently: according to a Gartner Group Advanced Technology Research Note, data mining and AI are among the top 5 key research areas
Multi-disciplinary Field with Broad Applicability
Has several applications
– Market-based analysis
– Customer relationship management
– Fraud detection
– Network intrusion detection
– Non-destructive evaluation
– Astronomy (look-up data)
– Remote sensing data
  • (look-down data)
– Text and multimedia mining
– Medical imaging
– Automated target recognition
Combines ideas from several different fields
– Steganography
– Cryptography
My Point of View of Data Mining
Borrowing ideas from
• Machine learning
• Artificial intelligence
• Statistics
• High-performance computing
• Signal and image processing
• Mathematical optimization
• Pattern recognition
• Natural language processing
• Steganography
• Cryptography
General View of Data Mining
[Figure: processing pipeline] Raw data -> Target data -> Preprocessed data -> Transformed data -> Pattern -> Knowledge
Data processing
– Data fusion, sampling, MRA
– De-noising, object identification, feature extraction, normalization
– Dimension reduction
Pattern recognition
– Classification, clustering, regression
Interpreting results
– Visualization, validation
An iterative and interactive process
Our Research Is Based On
Data preprocessing
– Multiresolution analysis
– De-noising (wavelet-based methods)
– Object classification
– Feature extraction
Pattern recognition
– Classification
– Clustering
Visualization and validation
– Steganography
– Cryptography
Where We Are Going from Here
More robust, accurate, scalable algorithms
– For pre-processing and pattern recognition
– Wavelets and fractals
Newer data types
– Video and multimedia
– Multi-sensor data
More complex problems
– Dynamic tracking in video
– Mining text, audio, video, images
Investigating steganography in images: analysis of data hiding methods, attacks against hidden information, and countermeasures to attacks against digital watermarking (detection and distortion)
How Does Data Mining Work?
How exactly is data mining able to tell you important things that you did not know, or what is going to happen next?
The technique used to perform these feats in data mining is called modeling
– Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't
– Example 1: a sunken treasure ship. Take the Bermuda shore, other ships, and their paths; keep all this information and build the model. If the model is good, you find the treasure in the ocean
– Example 2: identifying telephone customers. Suppose your model says that 98% of customers who make $60K per year spend more than $80 per month on long distance
  • With this model, new customers can be selectively targeted
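The telephone-customer example can be sketched in a few lines: build the rule from customers whose spending is known, then apply it to prospects whose spending is not. This is a minimal sketch under stated assumptions. The records, field names, function names, and the 0.7 confidence cutoff are hypothetical; only the $60K and $80 cutoffs come from the slide.

```python
def fit_rule(records, income_cutoff=60_000, spend_cutoff=80):
    """Estimate, from customers with known spending, how often an income of at
    least income_cutoff goes with monthly long-distance spend above spend_cutoff."""
    high_income = [r for r in records if r["income"] >= income_cutoff]
    if not high_income:
        return 0.0
    hits = sum(1 for r in high_income if r["monthly_spend"] > spend_cutoff)
    return hits / len(high_income)


def select_targets(prospects, confidence, income_cutoff=60_000, min_confidence=0.7):
    """Apply the rule to prospects whose spending is unknown."""
    if confidence < min_confidence:
        return []  # the rule is too weak to act on
    return [p["name"] for p in prospects if p["income"] >= income_cutoff]


# Hypothetical historical data: the situation where we know the answer
known_customers = [
    {"income": 72_000, "monthly_spend": 95},
    {"income": 65_000, "monthly_spend": 88},
    {"income": 61_000, "monthly_spend": 120},
    {"income": 90_000, "monthly_spend": 60},
    {"income": 38_000, "monthly_spend": 25},
]
# Hypothetical prospects: the situation where we do not
prospects = [
    {"name": "A", "income": 58_000},
    {"name": "B", "income": 75_000},
]

confidence = fit_rule(known_customers)
targets = select_targets(prospects, confidence)
print(confidence, targets)
```

In this toy data, three of the four high-income customers are heavy long-distance spenders, so the rule is kept and only prospect B is targeted.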
Most Commonly Used Techniques
Artificial neural networks: nonlinear predictive models that learn through training and resemble biological neural networks in structure
Decision trees: tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID)
Genetic algorithms: optimization techniques that use processes such as genetic combination, mutation, and selection in a design based on the concept of evolution
Nearest neighbor method:
Rule induction:
OUR METHODS WILL BE BASED ON WAVELETS, FRACTALS, STEG, AND CRYPT
Steganography Methods
Let us discuss a few methods and their advantages and disadvantages
1. Least significant bit (LSB) method
– Idea: hide the message in the least significant bits of the pixels
– Example:
– Advantage: quick and easy; works well in gray-scale images
– Disadvantage: inserting into 8-bit images changes colors, giving a noticeable change; vulnerable to image processing such as cropping and compression
2. Redundant method
– Store the message more than once, so it can withstand cropping
3. Spread spectrum
– Store the hidden message everywhere
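The least significant bit idea can be sketched on a flat list of gray-scale pixel values. This is a minimal illustration, not the implementation used in the talk: the function names and the cover pixels are hypothetical, and the sketch writes the message into the first pixels in order, with no redundancy or spreading.

```python
def text_to_bits(text):
    """Turn text into a list of bits, most significant bit of each byte first."""
    return [(byte >> shift) & 1
            for byte in text.encode("utf-8")
            for shift in range(7, -1, -1)]


def bits_to_text(bits):
    """Inverse of text_to_bits; len(bits) must be a multiple of 8."""
    data = bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[pos:pos + 8]))
        for pos in range(0, len(bits), 8)
    )
    return data.decode("utf-8")


def embed_lsb(pixels, bits):
    """Overwrite the least significant bit of the first len(bits) pixels."""
    if len(bits) > len(pixels):
        raise ValueError("message does not fit in the cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read the least significant bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


# Hypothetical 64-pixel gray-scale cover, values 100..163
cover = list(range(100, 164))
message_bits = text_to_bits("Hi")
stego = embed_lsb(cover, message_bits)
recovered = bits_to_text(extract_lsb(stego, len(message_bits)))
print(recovered)
```

Each pixel changes by at most one gray level, which is why the method is hard to see in gray-scale images. Cropping or lossy compression destroys the low-order bits, which matches the disadvantage listed above.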
STEGANALYSIS
Detection: the analyst observes various relationships between the cover, message, stego-media, and steganography tool
Distortion: the analyst manipulates the stego-media to render the embedded information useless or remove it altogether
Seeing the Unseen
DCT: Discrete Cosine Transformation
– Encode
  • Take the image
  • Divide it into 8x8 blocks
  • Apply the 2-D DCT to obtain the DCT coefficients
  • Apply a threshold value T
  • Store the hidden message in those places (the coefficients below the threshold)
  • Take the inverse DCT and store the result as an image
– Decode
  • Start with the modified image
  • Apply the DCT
  • Find the coefficients less than T
  • Extract the bits
  • Combine the bits to make the message
Example 8x8 block of pixel values:
219 215 214 216 218 218 217 216
219 216 216 216 215 215 215 215
217 217 218 216 212 212 213 215
215 215 215 215 211 212 214 216
217 216 214 216 215 215 217 218
216 216 215 214 215 215 215 216
215 214 210 210 211 215 215 216
218 215 211 211 213 214 216 216
Its 2-D DCT coefficients:
1720     1.524    7.683    1.234    1.625    0.9234  -0.07047 -1.055
5.667    3.475   -4.181   -1.524    1.152    1.637    1.016    0.3802
0.3711  -1.442    1.067    5.944    0.3943  -0.4591   0.1313   0.7812
3.888   -3.356   -1.97     3.265    0.5632  -0.939   -0.2434   0.2354
1.625   -2.279    0.4735   1.392    1.375    0.6552  -1.143    0.03459
-4.049  -1.223    0.5466  -0.5425  -1.013   -0.2651   0.5696  -0.9296
1.876    1.924   -1.369   -1.132   -0.02802 -0.4646   0.1831   0.9729
0.8995  -0.7233   0.667    0.436    0.1325  -0.03665 -0.3141  -0.4749
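The encode and decode steps can be sketched on the 8x8 pixel block shown in the slides. This is a sketch under stated assumptions, not the talk's implementation. The slides do not say how a bit is written into a coefficient, so the sketch assumes a sign convention: each coefficient with magnitude below T, taken in row-major order and skipping the DC term, is set to +T/2 for a 1 and -T/2 for a 0. The threshold T = 1.0 and all function names are hypothetical. Pixel values are kept as floats; rounding them back to integers would perturb the coefficients and is not handled here.

```python
import math

N = 8
# Orthonormal DCT-II basis: D[u][x] = a(u) * cos((2x + 1) * u * pi / (2N))
D = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2-D DCT: C = D * P * D^T."""
    return matmul(matmul(D, block), transpose(D))


def idct2(coeffs):
    """Inverse 2-D DCT: P = D^T * C * D (D is orthogonal)."""
    return matmul(matmul(transpose(D), coeffs), D)


def small_positions(coeffs, threshold):
    """Row-major positions of non-DC coefficients with magnitude below threshold."""
    return [(u, v) for u in range(N) for v in range(N)
            if (u, v) != (0, 0) and abs(coeffs[u][v]) < threshold]


def embed(block, bits, threshold):
    """Assumed convention: write each bit as the sign of a small coefficient."""
    coeffs = dct2(block)
    slots = small_positions(coeffs, threshold)
    if len(bits) > len(slots):
        raise ValueError("message too long for this block")
    for (u, v), bit in zip(slots, bits):
        coeffs[u][v] = threshold / 2 if bit else -threshold / 2
    return idct2(coeffs)  # float-valued stego block


def extract(stego_block, n_bits, threshold):
    coeffs = dct2(stego_block)
    slots = small_positions(coeffs, threshold)
    return [1 if coeffs[u][v] > 0 else 0 for (u, v) in slots[:n_bits]]


# The 8x8 pixel block from the slides
BLOCK = [
    [219, 215, 214, 216, 218, 218, 217, 216],
    [219, 216, 216, 216, 215, 215, 215, 215],
    [217, 217, 218, 216, 212, 212, 213, 215],
    [215, 215, 215, 215, 211, 212, 214, 216],
    [217, 216, 214, 216, 215, 215, 217, 218],
    [216, 216, 215, 214, 215, 215, 215, 216],
    [215, 214, 210, 210, 211, 215, 215, 216],
    [218, 215, 211, 211, 213, 214, 216, 216],
]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego_block = embed(BLOCK, bits, threshold=1.0)
decoded = extract(stego_block, len(bits), threshold=1.0)
print(decoded == bits)
```

Because the modified coefficients stay below T and the untouched ones stay at or above it, the decoder recovers the same positions the encoder used. With the orthonormal scaling, the DC coefficient equals the pixel sum divided by 8, which agrees with the leading 1720 in the coefficient listing.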
Wavelet Transformation
Wavelets are basis functions $w_{jk}(t)$ in continuous time. A basis is a set of linearly independent functions that can be used to produce all admissible functions $f(t)$ as a combination of basis functions:
\[ f(t) = \sum_{j,k} b_{jk}\, w_{jk}(t) \]
The special feature of the wavelet basis is that all functions $w_{jk}(t)$ are constructed from a single mother wavelet $w(t)$. This wavelet is a small wave (a pulse). Normally it starts at time $t = 0$ and ends at time $t = N$.
Compressed $j$ times:
\[ w_{j0}(t) = w(2^{j} t) \]
Shifted $k$ times:
\[ w_{0k}(t) = w(t - k) \]
Combining both, we have
\[ w_{jk}(t) = w(2^{j} t - k) \]
Haar wavelet: 1909, Haar; 1984, theory; 1988, Daubechies; 1989, Mallat (2-D, MRA); 1992, bi-orthogonal
Haar=
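As a concrete instance of $w_{jk}(t) = w(2^{j} t - k)$, the sketch below uses the standard textbook Haar mother wavelet (+1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere), which is an assumption here since the slide's own definition did not survive extraction. It also shows one level of the discrete Haar transform with orthonormal scaling. The function names are hypothetical, and the sample signal is the first row of the pixel block shown earlier.

```python
import math


def haar_mother(t):
    """Textbook Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1:
        return -1.0
    return 0.0


def w(j, k, t):
    """Compressed and shifted wavelet w_jk(t) = w(2^j t - k)."""
    return haar_mother(2 ** j * t - k)


def haar_step(signal):
    """One level of the orthonormal discrete Haar transform (even-length input)."""
    s = 1 / math.sqrt(2)
    averages = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    details = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the signal from one level of averages and details."""
    s = 1 / math.sqrt(2)
    signal = []
    for a, d in zip(averages, details):
        signal.extend([(a + d) * s, (a - d) * s])
    return signal


signal = [219, 215, 214, 216, 218, 218, 217, 216]
averages, details = haar_step(signal)
rebuilt = haar_inverse(averages, details)
print(details)
```

The detail coefficients are small wherever neighboring samples are close, which is what makes thresholding them attractive for compression and data hiding.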
[Figure: wavelet-based data hiding. Panel labels: message to be hidden; carrier; wavelet transformation; thresholding; compression; stego image; error image; inverse transformation; extract the hidden message]
Information Security and Data Mining
Goal of intrusion detection: discover intrusions into a computer or network
With the Internet and readily available tools for attacking networks, security becomes a critical component of the network
Misuse detection: finds intrusions by looking for activity corresponding to known intrusion techniques
Anomaly detection: the system defines the expected behavior of the network in advance
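The two detection styles can be contrasted in a small sketch. Everything here is hypothetical: the signature strings, the request log, the traffic counts, and the three-standard-deviation rule. Flagging deviations from the baseline is the usual reading of anomaly detection, assumed here rather than stated on the slide.

```python
import statistics

# Hypothetical signatures of known attack techniques
KNOWN_SIGNATURES = ("/etc/passwd", "' OR '1'='1")


def misuse_alerts(requests, signatures=KNOWN_SIGNATURES):
    """Misuse detection: flag requests that contain a known attack signature."""
    return [r for r in requests if any(sig in r for sig in signatures)]


def fit_baseline(counts):
    """Anomaly detection, step 1: define expected behavior in advance."""
    return statistics.mean(counts), statistics.stdev(counts)


def is_anomalous(value, baseline, k=3.0):
    """Anomaly detection, step 2: flag values more than k standard deviations away."""
    mean, std = baseline
    return abs(value - mean) > k * std


# Hypothetical request log
requests = [
    "GET /index.html",
    "GET /../../etc/passwd",
    "GET /login?user=admin' OR '1'='1",
]
alerts = misuse_alerts(requests)

# Hypothetical requests-per-minute counts observed during normal operation
normal_counts = [100, 104, 98, 101, 97, 102, 99, 103]
baseline = fit_baseline(normal_counts)
print(alerts, is_anomalous(180, baseline), is_anomalous(101, baseline))
```

Misuse detection only catches techniques already on the signature list, while anomaly detection can flag something new at the cost of false alarms when normal behavior shifts.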
What We Want
Tools to filter and classify information
Tools to find and retrieve the relevant information when you need it
Tools that adapt to your pace and needs
Tools to predict information needs
Tools to recommend tasks and information sources
Tools that can be personalized, manually or automatically
The Tools Should Be…
Non-intrusive
Secure
Integrated
Adaptable
Controllable
Automatic or semi-automatic
Useful
– For learners
– For educators
Integrate operational data with customers, suppliers, and market --
Profitable Applications
A wide range of companies have deployed successful applications of data mining
Some application areas include
– A pharmaceutical company can analyze its recent sales force activity and results to improve its targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months
– A credit card company can leverage its vast warehouse of customer transaction data to identify the customers most likely to be interested in a new credit product
– A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services
– A large consumer packaged goods company can apply data mining to improve its sales process to retailers
Conclusion
In this talk, we have discussed data mining and related topics
Our goals
– Research
– Software and algorithms
– Applications
Our main focus is science data, though the work is applicable to other data sets as well
More information: check out the website http://www.cosc.iup.eud/sezekiel
Contact: [email protected]