...A Probabilistic Approach to Querying on Music--688IT编程网

Eric Brochu Department of Computer Science University of British Columbia Vancouver,BC,Canada

ebrochu@cs.ubc.ca

Nando de Freitas Department of Computer Science University of British Columbia Vancouver,BC,Canada

nando@cs.ubc.ca

Abstract

We present a novel,ﬂexible statistical approach for modelling music and

text jointly.The approach is based on multi-modal mixture models and

maximum a posteriori estimation.The learned models can be used to

browse databases with documents containing music and text,to search

for music using queries consisting of music and text(lyrics and other

contextual information),to annotate text documents with music,and to

automatically recommend or identify similar songs.

1Introduction

Variations on“name that song”-types of games are popular on radio programs.DJs play a short excerpt from a song and listeners phone in to guess the name of the song.It is not surprizing to anyone that callers often get it right when DJs provide extra contextual clues (such as lyrics,or a piece of trivia about the song or band).In this paper,we attempt to reproduce this ability for carrying out information retrieval(IR)by presenting a method for querying with words and/or music.

We focus on monophonic and polyphonic musical pieces of known structure(MIDIﬁles, full music notation,etc.).Retrieving these pieces in multimedia databases,such as the Web,is a problem of growing interest[1,2,3].A signiﬁcant step was taken by Downie[4], who applied standard text IR techniques to retrieve music by,initially,converting music to text format.Most research(including[4])has,however,focused on plain music retrieval. To the best of our knowledge,there has been no attempt to model text and music jointly. We propose a joint probabilistic model for documents with music and/or text.This model is simple,easily extensible,ﬂexible and powerful.It allows users to query multimedia databases using text and/or musi

c as input.It is well suited for browsing applications as it organizes the documents into“soft”clusters.The document of highest probability in each cluster serves as a music thumbnail for automated music summarisation.The model allows one to query with an entire text document to automatically annotate the document with musical pieces.It can be used to automatically recommend or identify similar songs. Finally,it allows for the inclusion of different types of text,including website content, lyrics,and meta-data such as hyper-text links.

2Model speciﬁcation

The data consists of documents with text(lyrics or information about the song)and musical scores in GUIDO notation[1].(GUIDO is a powerful language for representing musical scores in an HTML-like notation.MIDIﬁles,plentiful on the World Wide Web,can be easily converted to this format.)We model the data with a Bayesian multi-modal mixture model.Words and scores are assumed to be conditionally independent given the mixture component label.

We model musical scores withﬁrst-order Markov chains,in which each state corresponds to a note,rest,or the start of a new voice.Notes’pitches are represented by the interval change(in semitones)from the previous note,rather than by absolute pitch,so that a score or query transposed t

o a different key will still have the same Markov chain.Rhythm is represented using the standard fractional musical measurement of whole-note,half-note, quarter-note,etc.Rest states are represented similarly,save that pitch is not represented. See Figure1for an example.

Polyphonic scores are represented by chaining the beginning of a new voice to the end of a previous one.In order to ensure that theﬁrst note in each voice appears in both the row and column of the Markov transition matrix,a special“new voice”state with no interval or rhythm serves as a dummy state marking the beginning of a new voice.Theﬁrst note of a voice has a distinguishing“ﬁrst note”interval value.

[

0newvoice0

htmlradio的text出不来1rest

2firstnote

3+1

4+2

5-2

6-2

7+3

8-5

Figure1:Sample melody–the opening notes to“The Yellow Submarine”by The Beatles –in different notations.From top:GUIDO notation,standard musical notation(generated automatically from GUIDO notation),and as a series of states in aﬁrst-order Markov chain(also generated automatically from GUIDO notation).

The Markov chain representation of a piece of music is then mapped to a transition fre-quency table,where denotes the number of times we observe the transition from state to state in document.We use to denote the initial state of the Markov chain.The associated text is modeled using a standard term frequency vector,where denotes the number of times word appears in document.For notational simplic-ity,we group the music and text variable as follows:.In essence,this Markovian approach is akin to a text bigram model,save that the states are musical notes and rests rather than words.

Our multi-modal mixture model is as follows:

(1) where encompasses all the model parameters and where if theﬁrst entry of belongs to state and is otherwise.The three-dimensional matrix denotes the estimated probability of transitioning from state to state in cluster,the matrix denotes the initial probabilities of being in state ,given membership in cluster.The vector denotes the probability of each cluster. The matrix denotes the probability of the word in cluster.The mixture model is deﬁned on the standard probability simplex for all and. We introduce the latent allocation variables to indicate that a particular sequence belongs to a speciﬁc cluster.These indicator variables

correspond to an i.i.d.sample from the distribution.

This simple model is easy to extend.For browsing applications,we might prefer a hierar-chical structure with levels:

(2)

This is still a multinomial model,but by applying appropriate parameter constraints we can produce a tree-like browsing structure[5].It is also easy to formulate the model in terms of aspects and clusters as suggested in[6,7].

2.1Prior speciﬁcation

We follow a hierarchical Bayesian strategy,where the unknown parameters and the al-location variables are regarded as being drawn from appropriate prior distributions.We acknowledge our uncertainty about the exact form of the prior by specifying it in terms of some unknown parameters(hyperparameters).The allocation variables are assumed to be drawn from a multinomial distribution,.We place a conjugate Dirichlet prior on the mixing coefﬁcients.Similarly,we place Dirich-let prior distributions on each,on each,on each ,and assume that these priors are independent.

The posterior for the allocation variables will be required.It can be obtained easily using Bayes’rule:

(3) 3Computation

The parameters of the mixture model cannot be computed analytically unless one knows the mixture indicator variables.We have to resort to numerical methods.One can imple-ment a Gibbs sampler to compute the parameters and allocation variables.This is done by sampling the parameters from their Dirichlet posteriors and the allocation variables from their multinomial posterior.However,this algorithm is too computationally intensive for

the applications we have in mind.Instead we opt for expectation maximization(EM)algo-rithms to compute the maximum likelihood(ML)and maximum a posteriori(MAP)point estimates of the mixture model.

3.1Maximum likelihood estimation with the EM algorithm

After initialization,the EM algorithm for ML estimation iterates between the following two steps:

1.E step:Compute the expectation of the complete log-likelihood with respect to the dis-

tribution of the allocation variables ML old,

where old represents the value of the parameters at the previous time step.

2.M step:Maximize over the parameters:new ML

The ML function expands to

In the E step,we have to compute using equation(3).The corresponding M step requires that we maxi

mize ML subject to the constraints that all probabilities for the pa-rameters sum up to1.This constrained maximization can be carried out by introducing Lagrange multipliers.The resulting parameter estimates are:

(6)

(8)

3.2Maximum a posteriori estimation with the EM algorithm

The EM formulation for MAP estimation is straightforward.One simply has to augment the objective function in the M step,ML,by adding to it the log prior densities.That is, the MAP objective function is

MAP

old

The MAP parameter estimates are:

(10)

(12)

CLUSTER SONG

4.1Demonstrating the utility of multi-modal queries

A major intended use of the text-score model is for searching documents on a combination of text and music.Consider a hypothetical example,using our database:A music fan is struggling to recall a dimly-remembered song with a strong repeating single-pitch,dotted-eight-note/sixteenth-note bass line,and lyrics containing the words come on,come on,get down.A search on the text portion alone turns up four documents which contain the lyrics.

A search on the notes alone returns seven documents which have matching transitions.But a combined search returns only the correct document(see Figure3).This conﬁrms the hypothesis that integrating different sources of information in the query can result in more precise results.

QUERY RETRIEVED SONGS

The Beatles–Got to Get You Into My Life

The Beatles–I’m Only Sleeping

The Beatles–Yellow Submarine

Moby–Bodyrock

Moby–Porcelain

Gary Portnoy–‘Cheers’theme song

Rodgers&Hart–Blue Moon

688IT编程网

...A Probabilistic Approach to Querying on Music

发表评论

推荐文章

java正则表达式选择题

一种基于正则表达式的DBC文件解析及报文分析方法[发明专利]

工龄小数点提取

非零金额正则表达式

提取文本中数字的函数

热门文章

利用正则表达式实现文本数据提取与处理

正则表达式零宽断言详解

文本匹配规则

excel中使用正则

1-31正则表达式

anki之高级筛选

BUAA_OO_2021_第一单元总结

insert语句递增写法

sublime text 3在行前插入递增数字序号的方法

字符串只允许数字和英文的正则

powerbuilder 正则表达式

Shell脚本编写的高级技巧利用正则表达式进行字符串匹配

JAVA正则表达式的三种模式:贪婪,勉强和占有的讨论

go regexp匹配规则

oracle regexp_substr 实现原理

基本的元字符回溯引用和前后查匹配模式

elasticsearch query dsl正则

oracle sql正则表达式

GA-设置目标

仅匹配全角片假名的正则表达式

最新文章

java正则表达式选择题

工龄小数点提取

非零金额正则表达式

提取文本中数字的函数

vue数字相加小数点变长-概述说明以及解释

vue validate 正则验证小数长度

标签列表

688IT编程网

...A Probabilistic Approach to Querying on Music

发表评论

推荐文章

java正则表达式 选择题

一种基于正则表达式的DBC文件解析及报文分析方法[发明专利]

工龄小数点提取

非零金额 正则表达式

提取文本中数字的函数

热门文章

利用正则表达式实现文本数据提取与处理

正则表达式零宽断言详解

文本匹配规则

excel中使用正则

1-31正则表达式

anki之高级筛选

BUAA_OO_2021_第一单元总结

insert语句递增写法

sublime text 3在行前插入递增数字序号的方法

字符串只允许数字和英文的正则

powerbuilder 正则表达式

Shell脚本编写的高级技巧利用正则表达式进行字符串匹配

JAVA正则表达式的三种模式:贪婪,勉强和占有的讨论

go regexp匹配规则

oracle regexp_substr 实现原理

基本的元字符 回溯引用和前后查 匹配模式

elasticsearch query dsl正则

oracle sql正则表达式

GA-设置目标

仅匹配全角片假名的正则表达式

最新文章

java正则表达式 选择题

工龄小数点提取

非零金额 正则表达式

提取文本中数字的函数

vue数字相加小数点变长-概述说明以及解释

vue validate 正则验证小数长度

标签列表

java正则表达式选择题

非零金额正则表达式

基本的元字符回溯引用和前后查匹配模式

java正则表达式选择题

非零金额正则表达式