The Information Sieve
Greg Ver Steeg and Aram Galstyan
Soup = data
“Main ingredient” extracted at each layer
Factorial code
• Carry the recipe instead of the soup → Compression
• Missing ingredients? → Prediction
• Make more soup → Generative model
Recipe = Ingredient 1, Ingredient 2, …
A factorial code: an invertible transform that makes the components independent
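Stated with the total correlation measure defined below, a factorial code is an invertible map from X to Y = (Y_1, ..., Y_m) whose components share no information:

  TC(Y) = \sum_j H(Y_j) - H(Y) = 0,  equivalently  p(y) = \prod_j p(y_j).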
Finding such a transform is, in general, an intractable problem. We use a sequence of transforms that incrementally removes dependence.
Two Steps
1. Find the most informative function, Y_k, of the input data
2. Transform the data to remove the information in Y_k, then repeat (see the sketch below)
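A minimal runnable Python sketch of this loop on binary data. Both helpers are crude placeholders (a majority vote for step 1, a per-column XOR for step 2), not the paper's method, which optimizes TC(X;Y) directly and constructs a remainder with the guarantees described later:

import numpy as np

def learn_latent_factor(X):
    # Placeholder for step 1: the paper optimizes TC(X; Y) over all
    # probabilistic functions; here Y is just a majority vote.
    return (X.mean(axis=1) > 0.5).astype(int)

def remove_information(X, y):
    # Placeholder for step 2: XOR-ing each binary column with y is
    # invertible given y (x_i = xbar_i ^ y), though unlike the paper's
    # remainder it is not guaranteed to remove all information about y.
    return X ^ y[:, None]

def sieve(X, n_layers):
    """Alternate the two steps: extract a factor, sift out its information."""
    factors, remainder = [], X
    for _ in range(n_layers):
        y = learn_latent_factor(remainder)            # step 1
        remainder = remove_information(remainder, y)  # step 2
        factors.append(y)
    return factors, remainder

# Toy data: 8 noisy binary copies of one hidden bit.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=(200, 1))
X = z ^ (rng.random((200, 8)) < 0.1).astype(int)
factors, remainder = sieve(X, n_layers=2)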
Soup
Main ingredient
The main ingredient: multivariate information
• Multivariate mutual information, or Total Correlation (Watanabe, 1960)
• TC(X|Y) = 0 if and only if Y “explains” all the dependence in X
• So we search for Y that minimizes TC(X|Y)
• Equivalently, we define the total correlation explained by Y as TC(X;Y):
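These are the standard definitions used throughout the CorEx line of work, for X = (X_1, ..., X_n):

  TC(X) = \sum_i H(X_i) - H(X)
  TC(X | Y) = \sum_i H(X_i | Y) - H(X | Y)
  TC(X ; Y) = TC(X) - TC(X | Y) = \sum_i I(X_i ; Y) - I(X ; Y)

TC(X) is nonnegative and equals zero exactly when the X_i are independent, which is why TC(X|Y) = 0 means Y explains all of the dependence.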
The main ingredient: Total Correlation Explanation (CorEx)
• Optimize over all probabilistic functions
• The solution has a special form that makes it tractable (sketched below)
• Computational complexity is linear in the number of variables
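A hedged sketch of that special form, as reported in the CorEx papers (e.g. arXiv:1406.1222): the optimizing p(y|x) satisfies a self-consistent equation that factors over the individual variables,

  p(y | x) = (1/Z(x)) \, p(y) \prod_i [ p(y | x_i) / p(y) ],

so one update pass touches each variable once, which is where the linear scaling in the number of variables comes from.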
Sift out the main ingredient: remainder info
The remainder is a transformation of the inputs with two properties (formalized below):
1. The remainder contains no info about Y
2. The transformation is invertible
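A direct formalization of the two bullets above, writing \bar{X} for the remainder:

  I(\bar{X} ; Y) = 0      (the remainder carries no information about Y)
  H(X | \bar{X}, Y) = 0   (X is recoverable from the remainder plus Y, so X -> (\bar{X}, Y) is invertible)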
Iterative sifting as a decomposition of information:

  TC(X) = \sum_{k=1}^{r} TC(X^{(k-1)} ; Y_k) + TC(X^{(r)})

where TC(X) is the multivariate mutual information in the data (the total correlation), each term TC(X^{(k-1)} ; Y_k) is the contribution from layer k of the sieve (the quantity optimized at that layer), X^{(k)} is the remainder after layer k with X^{(0)} = X, and TC(X^{(r)}) is the dependence left in the remainder at layer r.
Iterative sifting as progress toward independence: the dependence remaining at layer r, TC(X^{(r)}), decreases at each layer of the sieve until it reaches zero, i.e. complete independence.
Extracting dependence
Recovering spatial clusters from fMRI data: an example of recovering spatial clusters in brain data from temporal activation patterns. [Figure: spatial cluster maps compared across ground truth, ICA, and the sieve.]
Lossy compression and in-painting
• Sieve representation with 12 layers/bits/binary latent factors on MNIST digits
We can use the sieve for standard prediction and generative model tasks
Lossless compression (on MNIST)
• Same-size codebooks for the random and sieve-based codes
• (gzip is sequence-based, shown for reference)
Proof of principle for lossless compression, though specialized compression techniques are better on MNIST.

Method:          Naive   gzip   Random codebook   Sieve codebook
Bits per digit:    784    328               267              243
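Why a factorial code compresses well: coding each component of a representation Z separately costs about \sum_i H(Z_i) bits per sample, which exceeds the joint optimum H(Z) by exactly TC(Z), so a representation with TC near zero makes cheap per-component coding near-optimal. A minimal numpy sketch of that per-column estimate (an illustration of the principle, not the codebook procedure behind the table above):

import numpy as np

def bits_per_sample(Z):
    """Estimated bits per sample when each column of Z is entropy-coded
    independently; this matches the joint optimum H(Z) only when TC(Z) = 0."""
    total = 0.0
    for col in Z.T:
        _, counts = np.unique(col, return_counts=True)
        p = counts / counts.sum()
        total += -(p * np.log2(p)).sum()  # empirical marginal entropy
    return total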
Conclusion
• Incrementally decomposing multivariate information is useful, practical, and delicious
• Could improve with joint optimization and better transformations for the remainder info
Link to all papers and code: http://bit.ly/corex_info
Contact: [email protected], [email protected]
• The extension to continuous random variables is nontrivial but more practical and demonstrates connections to “common information”: “Sifting Common Information from Many Variables”, arXiv:1606.02307.