Upload
voquynh
View
219
Download
1
Embed Size (px)
Citation preview
G. Specht: Information Systems 4-1
4-1
4. Content-Based Search in Music
4-2
4.1 Audio Technology4.1.1 Human Ear: Acoustic Sound Waves
• Frequency: ~ Pitch• Amplitude: ~ Volume• Subsonic noise 0 Hz up to 20 Hz• Audible sound 20 Hz up to 20 KHz• Intensive domain: 600 Hz - 6.000 Hz• Ultrasound: 20 KHz up to 1 GHz
• In general no pure sinus waves: overlaps (harmonic, “tone”)
G. Specht: Information Systems 4-2
4-3
On Storage: Three Possibilities
1. Analog storage: compare to vinyl records, audio tapes.• Out of interest here
2. Digital storage: e.g. audio CDs
3. Symbolic storage: MIDI-technology, e.g. synthesizer
4-4
4.1.2 Digital Audio Storage
• Sampling Theorem:– The sampling frequency has to be greater than twice the highest
frequency occurring in the signal.
informationloss
=>
no noteworthyinformation loss
G. Specht: Information Systems 4-3
4-5
Digital Audio Storage
• Example:– Audible domain: 20 Hz - 20.000 Hz
– Telephone: 3.000 Hz > 6.000 sample points / sec.
– Hifi: 22.000 Hz > 44.000 sample points / sec.(per channel!)
– CD technology: Pulse Code Modulation (PCM-Coding):16 Bit (= 2 Byte) per sampling value
4-6
Digital Audio Storage
• Memory demands:– Audio data rate =
– Compare to phone data rate:
– Goal: via compressione.g. for mobile communication
2 Bytesampling rate
x 44.000 sampling ratesec x channel
x 2 channels x 1 KByte1024 Byte
= KBytesec
8KBitsec
64
= 1,6 KBytesec
KBitsec13
~~ 172 KBytesec
G. Specht: Information Systems 4-4
4-7
Digital Audio Storage
• Example: Audio CD: 74 min. of length
– Storage requirements
– Additional storage requirement for:• Error correction, index information, file descriptors
– Today’s Audio-CDs: 700 up to 880 MByte
> 74 Min. x 60 secMin
172 KBytesecx x
1 MByte1024 kByte
= 746 MByte
4-8
4.1.3 Symbolic Storage: MIDI-Technology
• For music only (not suitable for language) • Basic concept:
– Instead of digitalization of the sound, the score itself is stored (compare to synthesizer)
– Tuple: (Instrument, begin of a note, end of a note, fundamental frequency,
volume)– MIDI enables:
• 10 octave = 128 notes (= 27 = 7 Bit)• 128 instruments (incl. noises) altogether• Simultaneously: 16 channels = 16 instruments• Simultaneous notes per channel: 3-16 (quality feature of synthesizers)
(needed e.g. for organs, not for flutes)
G. Specht: Information Systems 4-5
4-9
Symbolic Storage: MIDI-Technology
• Example: key of a piano, begin of a note, end of a note, release of the key, key = note, velocity.
– 10 min. music ~ 200 KByte =>
• Result:
– Compact description of music files, which enables lifelike playback.
KBytesec0,33 data rate
4-10
Symbolic Storage: MIDI
• I/O:– Input via keyboard (piano keyboard) or notes on the screen– Previewer: synthesizer, PC
• Advantages of MIDI:– Compact presentation– Easily editable ( sequencer: editor, converts the sheet of music into
an internal MIDI format)– Extendable by intra document anchors (and so there exist links)
• Disadvantages of MIDI:– Not suitable for language– Artistic interpretation for one play? (only limited)– No documentary footage
G. Specht: Information Systems 4-6
4-11
Language
Acoustic and phoneticanalysis
(recognition of sound patterns)
Token
Syntactic analysis
Semantic analysis
Meaning of the text + (Gr.- Analysis)
Voice Synthesizer
Language
Sound patternSpoken referencesWord models
Word lexicon Morpheme lexiconSyntax grammar rules
Semantic word lexiconSemantic grammar rules
Fast access on mediumto high amounts of data
-Content bases search possible-- or e.g. translation (all steps backwards into the target language)
4.1.4 Outlook: Speech input and output
4-12
Outlook: Voice Input and Output
• Today’s technology:
– Speaker bound recognition of up to 20.000 words (trained software)
– Not bound to the speaker: recognition of up to 500 words!(troublesome: dialect, emotional pronunciation, background noise)
– At ca. 5% erroneously recognized words i.e. a sentence with three words: probability for a correct recognition:0,95 * 0,95 * 0,95 = 0,86
G. Specht: Information Systems 4-7
4-13
4.2. Content Based Search in Music Databases
• Goals :
– Melody Piece of music– Similar pieces of music, plagiarism etc.– Assignment, classification, genre identification
4-14
4.2.1. Data Source / Preprocessing
• MIDI -> Extraction of melody tracks
R U D U R U D (DUR)
G. Specht: Information Systems 4-8
4-15
4.2.2. Suitable Indexing
• Approches– String matching (-)– DUR – QPD (Query by Pitch Dynamic)
Window |w| = 2k = 23 = 8 Vector 2k-1
Point in a 2k-1-dim. space Storage in spatial data structure
4-16
4.2.3. Query by Pitch Dynamics(QPD)
• Example7531
P-structure: 1 1 3 1 5 5 7 5
∑-tree: 2 4 10 12
6 22
G. Specht: Information Systems 4-9
4-17
QPD
• Example (continuation)P-structure: 1 1 3 1 5 5 7 5
∑-tree: 2 4 10 12
6 22
hierarchical differences:
0 2 0 2
-2 -2
-16
4-18
QPD
• Example (continuation)
Haar-Wavelet Transformation (-16, -2, -2, 2, 0, 2, 0)
1
-1
G. Specht: Information Systems 4-10
4-19
QPD
Vectors respectively points in the2k-1-dim. space
Distance metric:
dE = (v1 - v2) o (v1 - v2)
Storage e.g. in an R-tree:
Melody search: Point searchSimilarity search: Bound-box searchClassifications: Cluster search
4-20
4.3 MusicDB
• Lyrics– Easy => performing a text query
• Metadata Tags– iTunes
• caches information from the audio format's tag (ID3)• Music Genome Project
– Discover new music– collecting details on every song
• melody, harmony, instrumentation, rhythm, vocals, lyricshttp://en.wikipedia.org/wiki/List_of_Music_Genome_Project_attributes
– find songs with interesting musical similarities• cf. Pandora – a subproject of the Music Genome Project
G. Specht: Information Systems 4-11
4-21
4.3.1 Introduction
• Query by Humming-System (QbH)
– Allows user to find a song by humming the tune– Serveral issues pose significant challenges:
• No perfect queries• Capturing pitches and notes from user is difficult• Melody extraction of a pre-recorded music file is difficult
4-22
Introduction
• QbH research project at NYU– D. Shasha, Y. Zhu– Demo available at http://querybyhum.cs.nyu.edu/
• MusicDB– Massachusetts Institute of Technology, 2005– http://web.mit.edu/edmond/www/projects/musicdb/
• Musicline– Fraunhofer Institut – Demo available at http://www.musicline.de/de/melodiesuche/input
• Musipedia– Demo available at http://www.musipedia.org/query_by_humming.0.html
G. Specht: Information Systems 4-12
4-23
4.3.2 MusicDB Concepts
• Based on MIDI files– No complicated pitch extraction necessary
• Indexing possibilities– Simple string-matching– DUR (UDS) – CubyHum (Extension to UDS)– QPD
4-24
MusicDB Concepts
• Architecture of MusicDB
MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On
G. Specht: Information Systems 4-13
4-25
MusicDB Concepts
• Melody Extractor
– Conversion of MIDI files into time series
MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On
4-26
MusicDB Concepts
– Isolation of the channel containing the main melody• Melody is a time series characterized by the highest pitched
(connected) notes in a song• MIDI data of a song:
Screenshots taken from Sonar 4
G. Specht: Information Systems 4-14
4-27
MusicDB Concepts
– Remove certain tracks and isolate the channels with the top 3 averagepitch values & choose channel with most single notes
Screenshots taken from Sonar 4
4-28
MusicDB Concepts
– Note duration and start-time get quantized and overlapping notes will bedeleted (Skyline-Alg)
Screenshots taken from Sonar 4
G. Specht: Information Systems 4-15
4-29
MusicDB Concepts
MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On
4-30
MusicDB Concepts
• Loading Pipeline
MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On
G. Specht: Information Systems 4-16
4-31
Melody Time Series
Time-series Slices Normalized Time-Series
To account off-key hummingnormalizing
MusicDB Concepts
4-32
Time-Series Database
PAA –dimensionality reduction
Transformed Database
R-Tree
Insert
Insert Into R-tree with N dimensions
MusicDB Concepts
PAA: Piecewise Aggregate Approximation(Resulting transform is similar to a Haar wavelet transform)
PAA
G. Specht: Information Systems 4-17
4-33
MusicDB Concepts
MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On
4-34
MusicDB Concepts
• Query
– Same procedure for user-humming as for the MIDI files in thedatabase
– Comparison of both time series (user & R-tree) according to a distance metric
– Results get ordered
G. Specht: Information Systems 4-18
4-35
4.3.3 Conclusion (of MusicDB Concepts)
- MusicDB fuses 2 techniques together (Melody extraction1 and a QbH system2)
- System uses widely available multi-channel polyphonic MIDI files
- Utilizes an open source tool to transcribe notes (Wave to MIDI)
1 Manipulation of Music For Melody Matching, Uitdenbogerd and J. Zobel 2 Query by Humming: A Time-Series Database Approach, Y. Zhu and D. Shasha
MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On
4-36
4.3.4 Demos
• QbH research project at NYUhttp://querybyhum.cs.nyu.edu/
• Musiclinehttp://www.musicline.de/de/melodiesuche/input
• Musipediahttp://www.musipedia.org/query_by_humming.0.html
G. Specht: Information Systems 4-19
4-37
4.3.5 References
• MusicDB - A Query by Humming System– http://web.mit.edu/edmond/www/projects/musicdb/
• Demos– http://querybyhum.cs.nyu.edu/– http://www.musicline.de/de/melodiesuche/input– http://www.musipedia.org/query_by_humming.0.html
• Pandora– www.pandora.com