Story Compression: Aggregating News Feeds
Joseph W. Barker
Advisor: James W. Davis
Ohio State University

What is Story Compression?
- News broadcasts from multiple sources tend to cover the same stories
- Stories have content overlap
  - General content: covered by multiple sources
  - Specific content: covered by one source
- Information gathering
  - Waste time if you view all broadcasts (general content redundancy)
  - Miss information if you view only one broadcast (specific content)
- Answer: Story Compression
  - Detect general vs. specific content and create a single story from all broadcasts with no redundancy

Overview
- Divide story into content segments (i.e., a single idea)
  - Video shot (continuous scene) detection
- Compare segments
  - Speech/text contains most of the informational content
  - Word similarity
  - Segment similarity
- Detect specific vs. general segments

[Figure: word-similarity example using the terms Object, Mammal, Feline, Canine, Cat, Poodle]

Segment Similarity
- Sentence similarity?
  - Segments range from sub-sentence to multiple sentences
  - Also, sentence boundaries (when there are multiple sentences) are poorly defined
  - Sentence similarity emphasizes grammar/word order; won't work
- If ordering is problematic, use unordered groups instead
- Solution: Graph collapsing
  - A group of nodes is collapsed to a single node by summing edge weights
  - Inspired by spectral clustering and the notion of a random walk on graphs
  - Random walk between groups is equivalent to random walk between collapsed nodes

[Figure labels: Segment Similarity, Word Similarity]

Most Unique Segments
- Manual segmentation employed
- Specific content
  - Uniqueness: overall dissimilarity
  - Perfect dissimilarity: similarity matrix rows/columns are zero except for the diagonal
  - Thus, the sum of a row/column should approach zero for the most dissimilar segments

[Figure panels: Perfect dissimilarity, Somewhat dissimilar, Perfect similarity, Somewhat similar]

Automatic Segment Detection
- How to decide boundaries between segments?
- No sentence boundaries, so text is not a strong indicator
- Shot detection: detect visual change from one scene to another
- Common techniques:
  - Temporal extent
    - Consecutive: compare sequential pairs of frames
    - Key frame: compare to key frame of previous segment
  - Distance measures
    - Pixel-based: Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), Normalized Cross-Correlation (NCC)
    - Color-based (histograms): χ² (chi-squared), Bhattacharyya
    - Texture-based: Scale Invariant Feature Transform (SIFT)

[Table: shot detection results; columns Method, F, TP, FP, FN; rows SAD, SSD, NCC, BATTA-H, CHI2-H; values missing from the source]

Conclusion and Future Work
- Graph collapsing can be used to derive group similarity from the similarity of group members
- Additionally, it can be used to evaluate uniqueness of objects and relatedness of groups
- Tested with text, working on video
- Future work
  - Finalize graph collapsing video segmentation
  - Expand word similarity to include multiple languages
  - Investigate sub-image feature extraction/matching
  - Examine other sources (e.g., YouTube)

[Figure text: example segment pairs, labeled by network]
#1) "declaring a public health emergency." (ABC, NBC)
#2) "after the virus killed." / "sadly had claimed 18 lives." (NBC, CBS)
#3) "declaring a public health emergency." / "to repeat, declared a public health emergency." (ABC, NBC)
#4) "they've set up a special tent." / "a tent has been setup." (ABC, CBS)

[Figure text: example single-network segments]
#1) "In Boston today, the mayor sounded the alarm" (ABC)
#2) "moved onto the upper respiratory, which is a lot of coughing" (ABC)
#3) "stay home when you are sick" (ABC)
#4) "I've never been hit by a Mack truck" (ABC)
#5) "is on the panel that decides what goes in the vaccine" (CBS)
#6) "after confirmed cases of flu reach 700" (CBS)

[Figure: Consecutive Shot Detection Across All Stories]
[Figure: Shot Detection on story FLU; panels: Video similarity, Sum of diagonal blocks; axis: Frame; markers: Block Start, Block End; series: ABC, CBS, NBC]
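
Sketch: graph collapsing and uniqueness

The graph-collapsing step from the Segment Similarity slide and the row-sum uniqueness measure from the Most Unique Segments slide can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy word-similarity values, and the choice to leave the diagonal out of the row sum are all hypothetical. The slides do not say how word similarity is computed or whether collapsed weights are normalized, so the matrix here is simply given and no normalization is applied.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]

# Toy symmetric word-similarity graph (values are made up for illustration).
# Word order: cat, feline, dog, poodle, tent
WORD_SIM: List[List[float]] = [
    [1.00, 0.75, 0.50, 0.25, 0.00],
    [0.75, 1.00, 0.50, 0.25, 0.00],
    [0.50, 0.50, 1.00, 0.75, 0.00],
    [0.25, 0.25, 0.75, 1.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 1.00],
]

# Each segment is an unordered group of word indices (hypothetical segments).
SEGMENTS: List[List[int]] = [[0, 1], [2, 3], [4]]


def collapse_graph(weights: Matrix, groups: Sequence[Sequence[int]]) -> List[List[float]]:
    """Collapse each group of nodes into one node by summing edge weights.

    Entry [a][b] of the result is the sum of weights[i][j] over every
    node i in group a and every node j in group b.
    """
    collapsed: List[List[float]] = []
    for group_a in groups:
        row = []
        for group_b in groups:
            row.append(sum(weights[i][j] for i in group_a for j in group_b))
        collapsed.append(row)
    return collapsed


def off_diagonal_row_sums(similarity: Matrix) -> List[float]:
    """Sum each row of a similarity matrix, skipping the diagonal entry.

    Assumption: the diagonal (self-similarity) is excluded so that a
    perfectly dissimilar segment scores exactly zero.
    """
    return [
        sum(value for j, value in enumerate(row) if j != i)
        for i, row in enumerate(similarity)
    ]


def rank_most_unique(similarity: Matrix) -> List[int]:
    """Return segment indices ordered from most to least unique.

    A smaller off-diagonal row sum means the segment is less similar
    to every other segment, i.e. more unique.
    """
    sums = off_diagonal_row_sums(similarity)
    return sorted(range(len(sums)), key=lambda i: sums[i])


if __name__ == "__main__":
    segment_sim = collapse_graph(WORD_SIM, SEGMENTS)
    for row in segment_sim:
        print(row)
    print("most unique first:", rank_most_unique(segment_sim))
```

In this toy data the third segment shares no similarity with the other two, so its off-diagonal row sum is zero and it ranks as the most unique. Because raw sums grow with segment length, a real system would probably need some normalization; that choice is left open here.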