
ROAM: Recurrently Optimizing Tracking Model

Tianyu Yang1,2 Pengfei Xu3 Runbo Hu3 Hua Chai3 Antoni B. Chan2

1 Tencent AI Lab 2 City University of Hong Kong 3 Didi Chuxing

Abstract

In this paper, we design a tracking model consisting of response generation and bounding box regression, where the first component produces a heat map to indicate the presence of the object at different positions and the second part regresses the relative bounding box shifts to anchors mounted on sliding-window locations. Thanks to the resizable convolutional filters used in both components to adapt to the shape changes of objects, our tracking model does not need to enumerate anchors of different sizes, thus saving model parameters. To effectively adapt the model to appearance variations, we propose to offline train a recurrent neural optimizer, in a meta-learning setting, that updates the tracking model and converges it in a few gradient steps. This improves the convergence speed of updating the tracking model while achieving better performance. We extensively evaluate our trackers, ROAM and ROAM++, on the OTB, VOT, LaSOT, GOT-10K and TrackingNet benchmarks, and our methods perform favorably against state-of-the-art algorithms.

1. Introduction

Generic visual object tracking is the task of estimating the bounding box of a target in a video sequence given only its initial position. Typically, the preliminary model learned from the first frame needs to be updated continuously to adapt to the target's appearance variations caused by rotation, illumination, occlusion, deformation, etc. However, it is challenging to optimize the initially learned model efficiently and effectively as tracking proceeds. Training samples for model updating are usually collected based on estimated bounding boxes, which could be inaccurate. These small errors accumulate over time, gradually degrading the model.

To avoid model updating, which may introduce unreliable training samples that ruin the model, several approaches [4, 43] investigate tracking by only comparing the first frame with subsequent frames, using a similarity function based on a learned discriminative and invariant deep Siamese feature embedding. However, training such a deep representation is difficult due to the drastic appearance variations that commonly emerge in long-term tracking. Other methods either update the model via an exponential moving average of templates [16, 44], which only marginally improves performance, or optimize the model with hand-designed SGD methods [33, 41], which need numerous iterations to converge and thus prevent real-time speed. Limiting the number of SGD iterations allows near real-time speeds, but at the expense of poor-quality model updates, since the loss function is not optimized sufficiently.

In recent years, much effort has been devoted to localizing the object using a robust online learned classifier, while little attention has been paid to designing accurate bounding box estimation. Most trackers simply resort to multi-scale search, assuming that the object's aspect ratio does not change during tracking, which is often violated in the real world. Recently, SiamRPN [23] borrowed the idea of region proposal networks [37] from object detection to decompose the tracking task into two branches: 1) classifying the target from the background, and 2) regressing the accurate bounding box with reference to anchor boxes mounted on different positions. As shown on the VOT benchmarks [19, 20], SiamRPN achieves higher precision in bounding box estimation but suffers from lower robustness compared with state-of-the-art methods [8, 9, 26] due to the lack of online model updating. Furthermore, SiamRPN mounts anchors with different aspect ratios on every spatial location of the feature map to handle possible shape changes, which is redundant in both computation and storage.
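The single-anchor alternative hinges on resizing a convolutional filter's spatial support rather than enumerating anchor shapes. Below is a minimal NumPy sketch of bilinear filter resizing; `bilinear_resize` is an illustrative reimplementation of the idea, not the authors' code.

```python
import numpy as np

def bilinear_resize(kernel, new_h, new_w):
    """Resize a 2-D filter to (new_h, new_w) by bilinear interpolation
    (align-corners sampling, so corner weights are preserved)."""
    h, w = kernel.shape
    ys = np.linspace(0, h - 1, new_h)          # sample rows in the source grid
    xs = np.linspace(0, w - 1, new_w)          # sample cols in the source grid
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]                    # vertical blend weights
    wx = (xs - x0)[None, :]                    # horizontal blend weights
    top = kernel[np.ix_(y0, x0)] * (1 - wx) + kernel[np.ix_(y0, x1)] * wx
    bot = kernel[np.ix_(y1, x0)] * (1 - wx) + kernel[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

k = np.arange(25, dtype=float).reshape(5, 5)   # filter sized for the initial target
k_wide = bilinear_resize(k, 3, 7)              # target became shorter and wider
```

Applying the same resize to every channel adapts one filter, and hence one anchor, to a new aspect ratio without storing per-shape parameters.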

In this paper, we propose a tracking framework composed of two modules: response generation and bounding box regression, where the first component produces a response map to indicate the possibility of anchor boxes mounted on sliding-window positions covering the object, and the second part predicts bounding box shifts from the anchors to obtain refined rectangles. Instead of enumerating anchors of different aspect ratios as in SiamRPN, we propose to use only one anchor size at each position and adapt it to shape changes by resizing its corresponding convolutional filter using bilinear interpolation, which saves model parameters and computing time. To effectively adapt the tracking model to appearance changes during tracking, we propose a recurrent model optimization method that learns a more effective gradient descent, converging the model update in one or two steps and generalizing better to future frames. The key idea is to train a neural optimizer that can converge the tracking model to a good solution in a few gradient steps. During the training phase, the tracking model is first updated using the neural optimizer, and then it is applied on future frames to obtain an error signal for minimization. Under this particular setting, the resulting optimizer converges the tracking classifier significantly faster than SGD-based optimizers, especially for learning the initial tracking model. In summary, our contributions are:

• We propose a tracking model consisting of a resizable response generator and bounding box regressor, where only one anchor size is used at each spatial position and its corresponding convolutional filter can be adapted to shape variations by bilinear interpolation.

• We propose a recurrent neural optimizer, trained in a meta-learning setting, that recurrently updates the tracking model with faster convergence.

• We conduct comprehensive experiments on large-scale datasets including OTB, VOT, LaSOT, GOT-10k and TrackingNet, and our trackers achieve favorable performance compared with the state-of-the-art.

2. Related Work

Visual Tracking. Predicting a heat map to indicate the position of the object is common in the visual tracking community [3, 4, 9, 13, 16, 45]. Among them, SiamFC [4] is one of the most popular methods due to its fast speed and good performance. However, most response-generation-based trackers, including SiamFC, estimate the bounding box via a simple multi-scale search mechanism, which is unable to handle aspect ratio changes. To address this issue, the recent SiamRPN [23] and its extensions [11, 22, 52] propose to train a bounding box regressor as in object detection [37], showing impressive performance. Different from these algorithms, which enumerate a set of predefined anchors with different aspect ratios at each spatial position, we adopt a resizable anchor to adapt to the shape variations of the object, which saves model parameters and computing time.

Online model updating is another important module that SiamFC lacks. Recent works improve SiamFC [4] by introducing various model updating strategies, including recurrent generation of target template filters through a convolutional LSTM [48], a dynamic memory network [49, 50], where object information is written into and read from an addressable external memory, and distractor-aware incremental learning [53], which makes use of hard-negative templates around the target to suppress distractors. It should be noted that all these algorithms essentially achieve model updating by linearly interpolating old target templates with the newly generated one, where the major difference lies in how the combination weights are controlled. This is far from optimal compared with optimization methods using gradient descent, which minimize the tracking loss directly to adapt to new target appearances. Instead of using a Siamese network to build the convolutional filter, other methods [26, 34, 41] generate the filter by performing gradient descent on the first frame, and the filter can be continuously optimized during subsequent frames. In particular, [34] proposes to train the initial tracking model in a meta-learning setting, which shows promising results. However, it still uses traditional SGD to optimize the tracking model during the subsequent frames, which is not effective at adapting to new appearances and is slow in updating the model. In contrast to these trackers, our offline-learned recurrent neural optimizer applies meta-learning to both the initial model and the model updates, which allows model initialization and updating in only one or two gradient steps, resulting in much faster runtime speed and better accuracy.
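The distinction drawn above can be made concrete: template-combination updaters reduce to a linear interpolation, while optimization-based updaters take a gradient step on the tracking loss directly. A toy sketch with vector "templates" and a squared-error loss (all names and constants are illustrative, not from the paper):

```python
import numpy as np

def ema_update(template_old, template_new, gamma=0.01):
    """Linear template interpolation: the updating strategies discussed
    above differ mainly in how gamma is controlled."""
    return (1 - gamma) * template_old + gamma * template_new

def gradient_update(template, target, lr=0.5):
    """One gradient step on a squared-error tracking loss 0.5*(t - target)^2,
    which moves the template toward the new appearance directly."""
    grad = template - target
    return template - lr * grad

old, new = np.array([1.0, 0.0]), np.array([0.0, 1.0])
t_ema = ema_update(old, new)       # barely moves toward the new appearance
t_gd = gradient_update(old, new)   # covers half the distance in one step
```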

Learning to Learn. Learning to learn, or meta-learning, has a long history [2, 32, 38]. With the recent successes of applying meta-learning to few-shot classification [31, 36] and reinforcement learning [12, 39], it has regained attention. The pioneering work [1] designs an offline-learned optimizer using gradient descent and shows promising performance compared with traditional optimization methods. However, it does not generalize well for large numbers of descent steps. To mitigate this problem, [28] proposes several training techniques, including parameter scaling and combination with convex functions, to coordinate the learning process of the optimizer. [46] also addresses this issue by designing a hierarchical RNN architecture with dynamically adapted input and output scaling. In contrast to these works, which output an increment for each parameter update and are thus prone to overfitting due to different gradient scales, we instead associate an adaptive learning rate, produced by a recurrent neural network, with the computed gradient for fast convergence of the model update.
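This update scheme can be sketched as θ ← θ − λ ⊙ ∇ℓ, where λ is an elementwise learning rate predicted by the recurrent network. In the toy NumPy version below, `predict_lr` is a hypothetical stand-in for that network, included only to show the shape of the update:

```python
import numpy as np

def meta_update(theta, grad, predict_lr):
    """One update step with per-parameter adaptive learning rates,
    elementwise-multiplied with the computed gradient."""
    lam = predict_lr(grad)        # one rate per parameter, same shape as theta
    return theta - lam * grad

# toy quadratic loss 0.5*||theta||^2 has gradient equal to theta itself
theta = np.array([4.0, -2.0, 1.0])
grad = theta.copy()

# stand-in rate predictor: bounded rates that grow with gradient magnitude
predict_lr = lambda g: 0.5 + 0.4 * np.tanh(np.abs(g))
theta_next = meta_update(theta, grad, predict_lr)
```

Because λ is bounded, the step stays stable across parameters with very different gradient scales, which is the motivation for predicting rates instead of raw increments.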

3. Proposed Algorithm

Our tracker consists of two main modules: 1) a tracking model that is resizable to adapt to shape changes; and 2) a neural optimizer that is in charge of model updating. The tracking model contains two branches, where the response generation branch determines the presence of the target by predicting a confidence score map, and the bounding box regression branch estimates the precise box of the target by regressing coordinate shifts from the box anchors mounted on the sliding-window positions. The offline-learned neural optimizer is trained using a meta-learning framework to update the tracking model online in order to adapt to appearance variations. Note that both response generation and bounding box regression are built on the feature map computed from the backbone CNN. The whole framework is illustrated in Fig. 1.
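To make the two-branch output concrete, here is a toy sketch of picking the most confident sliding-window position from the response map and decoding its single anchor with RPN-style offsets. The stride, anchor size, and offset parameterization here are assumptions for illustration, not the paper's exact settings:

```python
import numpy as np

H, W, stride = 5, 5, 16                    # response-map resolution and stride
anchor_w, anchor_h = 64.0, 64.0            # the single anchor size per position

ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
centers = np.stack([xs, ys], axis=-1) * float(stride)   # anchor centers (x, y)

response = np.zeros((H, W)); response[2, 3] = 1.0       # branch 1: confidence map
offsets = np.zeros((H, W, 4))                           # branch 2: (dx, dy, dw, dh)

def decode(offsets, centers):
    """RPN-style decoding of per-position offsets into (cx, cy, w, h) boxes."""
    cx = centers[..., 0] + offsets[..., 0] * anchor_w
    cy = centers[..., 1] + offsets[..., 1] * anchor_h
    w = anchor_w * np.exp(offsets[..., 2])
    h = anchor_h * np.exp(offsets[..., 3])
    return np.stack([cx, cy, w, h], axis=-1)

i, j = np.unravel_index(response.argmax(), response.shape)
best_box = decode(offsets, centers)[i, j]   # box at the most confident position
```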

3.1. Resizable Tracking Model

Trackers like the correlation filter [16] and MetaTracker [34] initialize a convolutional filter based on the size of the object in the first frame and keep its size fixed during subsequent frames. This setting is based on the assumption that the as-



[Figure 1: Overview of the framework. A CNN feature extractor processes historical frames F1...FN and a future frame F(t+δ); the tracking model θ(t−1) is updated via the update loss ℓ(t−1) with adaptive learning rates λ(t−1) produced by the neural optimizer, and the updated model's prediction on the future frame yields the meta loss L(t).]
sha1_base64="OmSWiXjHeNBLDFDcWHcEzdgnuAs=">AAACEnicbZDLSsNAFIYn9VbjLdWlm2AR6qYkIuiy4MaFiwr2Am0sk+mkHTqZhJkTtYS8hVu3+g7uxK0v4Cv4FE7aLLT1h4GP/5zDOfP7MWcKHOfLKK2srq1vlDfNre2d3T2rst9WUSIJbZGIR7LrY0U5E7QFDDjtxpLi0Oe0408u83rnnkrFInEL05h6IR4JFjCCQVsDq9IPMYwJ5ul1dpfW4CQbWFWn7sxkL4NbQBUVag6s7/4wIklIBRCOleq5TgxeiiUwwmlm9hNFY0wmeER7GgUOqfLS2emZfaydoR1EUj8B9sz9PZHiUKlp6OvO/FC1WMvNf2sjieMxI48L+yG48FIm4gSoIPP1QcJtiOw8HXvIJCXApxowkUz/wCZjLDEBnaFpmjocdzGKZWif1l3NN061cVbEVEaH6AjVkIvOUQNdoSZqIYIe0DN6Qa/Gk/FmvBsf89aSUcwcoD8yPn8AfhCdSQ==</latexit>

O<latexit sha1_base64="Y4wicZWudnH8+W1Zp2SabcuIaF4=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIp6LLgxp0V7APaUCbTSTt0MhNmboQS+hluXCji1q9x5984abPQ1gMDh3PuZc49YSK4Qc/7dkobm1vbO+Xdyt7+weFR9fikY1SqKWtTJZTuhcQwwSVrI0fBeolmJA4F64bT29zvPjFtuJKPOEtYEJOx5BGnBK3UH8QEJ5SI7H4+rNa8ureAu078gtSgQGtY/RqMFE1jJpEKYkzf9xIMMqKRU8HmlUFqWELolIxZ31JJYmaCbBF57l5YZeRGStsn0V2ovzcyEhszi0M7mUc0q14u/uf1U4xugozLJEUm6fKjKBUuKje/3x1xzSiKmSWEam6zunRCNKFoW6rYEvzVk9dJ56ruW/7QqDUbRR1lOINzuAQfrqEJd9CCNlBQ8Ayv8Oag8+K8Ox/L0ZJT7JzCHzifP4FgkVc=</latexit><latexit sha1_base64="Y4wicZWudnH8+W1Zp2SabcuIaF4=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIp6LLgxp0V7APaUCbTSTt0MhNmboQS+hluXCji1q9x5984abPQ1gMDh3PuZc49YSK4Qc/7dkobm1vbO+Xdyt7+weFR9fikY1SqKWtTJZTuhcQwwSVrI0fBeolmJA4F64bT29zvPjFtuJKPOEtYEJOx5BGnBK3UH8QEJ5SI7H4+rNa8ureAu078gtSgQGtY/RqMFE1jJpEKYkzf9xIMMqKRU8HmlUFqWELolIxZ31JJYmaCbBF57l5YZeRGStsn0V2ovzcyEhszi0M7mUc0q14u/uf1U4xugozLJEUm6fKjKBUuKje/3x1xzSiKmSWEam6zunRCNKFoW6rYEvzVk9dJ56ruW/7QqDUbRR1lOINzuAQfrqEJd9CCNlBQ8Ayv8Oag8+K8Ox/L0ZJT7JzCHzifP4FgkVc=</latexit><latexit sha1_base64="Y4wicZWudnH8+W1Zp2SabcuIaF4=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIp6LLgxp0V7APaUCbTSTt0MhNmboQS+hluXCji1q9x5984abPQ1gMDh3PuZc49YSK4Qc/7dkobm1vbO+Xdyt7+weFR9fikY1SqKWtTJZTuhcQwwSVrI0fBeolmJA4F64bT29zvPjFtuJKPOEtYEJOx5BGnBK3UH8QEJ5SI7H4+rNa8ureAu078gtSgQGtY/RqMFE1jJpEKYkzf9xIMMqKRU8HmlUFqWELolIxZ31JJYmaCbBF57l5YZeRGStsn0V2ovzcyEhszi0M7mUc0q14u/uf1U4xugozLJEUm6fKjKBUuKje/3x1xzSiKmSWEam6zunRCNKFoW6rYEvzVk9dJ56ruW/7QqDUbRR1lOINzuAQfrqEJd9CCNlBQ8Ayv8Oag8+K8Ox/L0ZJT7JzCHzifP4FgkVc=</latexit><latexit 
sha1_base64="Y4wicZWudnH8+W1Zp2SabcuIaF4=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIp6LLgxp0V7APaUCbTSTt0MhNmboQS+hluXCji1q9x5984abPQ1gMDh3PuZc49YSK4Qc/7dkobm1vbO+Xdyt7+weFR9fikY1SqKWtTJZTuhcQwwSVrI0fBeolmJA4F64bT29zvPjFtuJKPOEtYEJOx5BGnBK3UH8QEJ5SI7H4+rNa8ureAu078gtSgQGtY/RqMFE1jJpEKYkzf9xIMMqKRU8HmlUFqWELolIxZ31JJYmaCbBF57l5YZeRGStsn0V2ovzcyEhszi0M7mUc0q14u/uf1U4xugozLJEUm6fKjKBUuKje/3x1xzSiKmSWEam6zunRCNKFoW6rYEvzVk9dJ56ruW/7QqDUbRR1lOINzuAQfrqEJd9CCNlBQ8Ayv8Oag8+K8Ox/L0ZJT7JzCHzifP4FgkVc=</latexit> -

I(t−1)<latexit sha1_base64="L8gB/Gk9vOxvupRKf5MaugIo/pk=">AAAB/HicbVDLSsNAFL2pr1pf0S7dDBahLiyJFHRZcKO7CvYBbSyT6aQdOnkwMxFCiL/ixoUibv0Qd/6NkzYLbT0wcDjnXu6Z40acSWVZ30ZpbX1jc6u8XdnZ3ds/MA+PujKMBaEdEvJQ9F0sKWcB7SimOO1HgmLf5bTnzq5zv/dIhWRhcK+SiDo+ngTMYwQrLY3M6tDHakowT2+zh7Suzu2zbGTWrIY1B1oldkFqUKA9Mr+G45DEPg0U4VjKgW1FykmxUIxwmlWGsaQRJjM8oQNNA+xT6aTz8Bk61coYeaHQL1Borv7eSLEvZeK7ejKPKpe9XPzPG8TKu3JSFkSxogFZHPJijlSI8ibQmAlKFE80wUQwnRWRKRaYKN1XRZdgL395lXQvGrbmd81aq1nUUYZjOIE62HAJLbiBNnSAQALP8ApvxpPxYrwbH4vRklHsVOEPjM8fBP+USw==</latexit><latexit sha1_base64="L8gB/Gk9vOxvupRKf5MaugIo/pk=">AAAB/HicbVDLSsNAFL2pr1pf0S7dDBahLiyJFHRZcKO7CvYBbSyT6aQdOnkwMxFCiL/ixoUibv0Qd/6NkzYLbT0wcDjnXu6Z40acSWVZ30ZpbX1jc6u8XdnZ3ds/MA+PujKMBaEdEvJQ9F0sKWcB7SimOO1HgmLf5bTnzq5zv/dIhWRhcK+SiDo+ngTMYwQrLY3M6tDHakowT2+zh7Suzu2zbGTWrIY1B1oldkFqUKA9Mr+G45DEPg0U4VjKgW1FykmxUIxwmlWGsaQRJjM8oQNNA+xT6aTz8Bk61coYeaHQL1Borv7eSLEvZeK7ejKPKpe9XPzPG8TKu3JSFkSxogFZHPJijlSI8ibQmAlKFE80wUQwnRWRKRaYKN1XRZdgL395lXQvGrbmd81aq1nUUYZjOIE62HAJLbiBNnSAQALP8ApvxpPxYrwbH4vRklHsVOEPjM8fBP+USw==</latexit><latexit sha1_base64="L8gB/Gk9vOxvupRKf5MaugIo/pk=">AAAB/HicbVDLSsNAFL2pr1pf0S7dDBahLiyJFHRZcKO7CvYBbSyT6aQdOnkwMxFCiL/ixoUibv0Qd/6NkzYLbT0wcDjnXu6Z40acSWVZ30ZpbX1jc6u8XdnZ3ds/MA+PujKMBaEdEvJQ9F0sKWcB7SimOO1HgmLf5bTnzq5zv/dIhWRhcK+SiDo+ngTMYwQrLY3M6tDHakowT2+zh7Suzu2zbGTWrIY1B1oldkFqUKA9Mr+G45DEPg0U4VjKgW1FykmxUIxwmlWGsaQRJjM8oQNNA+xT6aTz8Bk61coYeaHQL1Borv7eSLEvZeK7ejKPKpe9XPzPG8TKu3JSFkSxogFZHPJijlSI8ibQmAlKFE80wUQwnRWRKRaYKN1XRZdgL395lXQvGrbmd81aq1nUUYZjOIE62HAJLbiBNnSAQALP8ApvxpPxYrwbH4vRklHsVOEPjM8fBP+USw==</latexit><latexit 
sha1_base64="L8gB/Gk9vOxvupRKf5MaugIo/pk=">AAAB/HicbVDLSsNAFL2pr1pf0S7dDBahLiyJFHRZcKO7CvYBbSyT6aQdOnkwMxFCiL/ixoUibv0Qd/6NkzYLbT0wcDjnXu6Z40acSWVZ30ZpbX1jc6u8XdnZ3ds/MA+PujKMBaEdEvJQ9F0sKWcB7SimOO1HgmLf5bTnzq5zv/dIhWRhcK+SiDo+ngTMYwQrLY3M6tDHakowT2+zh7Suzu2zbGTWrIY1B1oldkFqUKA9Mr+G45DEPg0U4VjKgW1FykmxUIxwmlWGsaQRJjM8oQNNA+xT6aTz8Bk61coYeaHQL1Borv7eSLEvZeK7ejKPKpe9XPzPG8TKu3JSFkSxogFZHPJijlSI8ibQmAlKFE80wUQwnRWRKRaYKN1XRZdgL395lXQvGrbmd81aq1nUUYZjOIE62HAJLbiBNnSAQALP8ApvxpPxYrwbH4vRklHsVOEPjM8fBP+USw==</latexit>

θ(t−1)

cf<latexit sha1_base64="iGk2FQX2DP+xkByTneTfX/Rg/WY=">AAACEXicbVDLSsNAFJ34rPVVdekmWIS6sCS1oMuCG5cV7AOaWiaTm3bo5MHMjVBCfsGNv+LGhSJu3bnzb5y0XWjrgWEO59zLvfe4seAKLevbWFldW9/YLGwVt3d29/ZLB4dtFSWSQYtFIpJdlyoQPIQWchTQjSXQwBXQccfXud95AKl4FN7hJIZ+QIch9zmjqKVBqeIgFx6kjhsJT00C/aUOjgBplt2nFTy3z7JByvxsUCpbVWsKc5nYc1ImczQHpS/Hi1gSQIhMUKV6thVjP6USOROQFZ1EQUzZmA6hp2lIA1D9dHpRZp5qxTP9SOoXojlVf3ekNFD5troyoDhSi14u/uf1EvSv+ikP4wQhZLNBfiJMjMw8HtPjEhiKiSaUSa53NdmISspQh1jUIdiLJy+Tdq1qX1Rrt/Vyoz6Po0COyQmpEJtckga5IU3SIow8kmfySt6MJ+PFeDc+ZqUrxrzniPyB8fkDx6WeLQ==</latexit>

θ(t−1)

reg<latexit sha1_base64="auZkbbStlDiT/uc/2LdiR3fNzfk=">AAACEnicbVC7SgNBFJ2Nrxhfq5Y2i0FICsNuDGgZsLGMYB6QjWF29iYZMvtg5q4Qlv0GG3/FxkIRWys7/8bJo9DEA8MczrmXe+/xYsEV2va3kVtb39jcym8Xdnb39g/Mw6OWihLJoMkiEcmORxUIHkITOQroxBJo4Aloe+Prqd9+AKl4FN7hJIZeQIchH3BGUUt9s+wiFz6krhcJX00C/aUujgBplt2nJTx3ylk/lTDM+mbRrtgzWKvEWZAiWaDRN79cP2JJACEyQZXqOnaMvZRK5ExAVnATBTFlYzqErqYhDUD10tlJmXWmFd8aRFK/EK2Z+rsjpYGarqsrA4ojtexNxf+8boKDq17KwzhBCNl80CARFkbWNB/L5xIYiokmlEmud7XYiErKUKdY0CE4yyevkla14lxUqre1Yr22iCNPTsgpKRGHXJI6uSEN0iSMPJJn8krejCfjxXg3PualOWPRc0z+wPj8Aa5qnqw=</latexit>

θ(t)

cf<latexit sha1_base64="yAmHVJ8MR8yI9uRz1m1aXRF90vc=">AAACD3icbVDLSsNAFJ34rPVVdekmWJS6KUkt6LLgxmUF+4Cmlsnkph06eTBzI5SQP3Djr7hxoYhbt+78GydtF9p6YJjDOfdy7z1uLLhCy/o2VlbX1jc2C1vF7Z3dvf3SwWFbRYlk0GKRiGTXpQoED6GFHAV0Ywk0cAV03PF17nceQCoehXc4iaEf0GHIfc4oamlQOnOQCw9Sx42EpyaB/lIHR4A0y+7TCp5ng5T52aBUtqrWFOYyseekTOZoDkpfjhexJIAQmaBK9Wwrxn5KJXImICs6iYKYsjEdQk/TkAag+un0nsw81Ypn+pHUL0Rzqv7uSGmg8l11ZUBxpBa9XPzP6yXoX/VTHsYJQshmg/xEmBiZeTimxyUwFBNNKJNc72qyEZWUoY6wqEOwF09eJu1a1b6o1m7r5UZ9HkeBHJMTUiE2uSQNckOapEUYeSTP5JW8GU/Gi/FufMxKV4x5zxH5A+PzB9ITnbs=</latexit>

θ(t)

reg<latexit sha1_base64="7PyvvU6/7dmZWCqBgep8/7HuBgI=">AAACEHicbVDLSsNAFJ34rPVVdekmWMS6KUkt6LLgxmUF+4Amlsnkth06eTBzI5SQT3Djr7hxoYhbl+78G6dtFtp6YJjDOfdy7z1eLLhCy/o2VlbX1jc2C1vF7Z3dvf3SwWFbRYlk0GKRiGTXowoED6GFHAV0Ywk08AR0vPH11O88gFQ8Cu9wEoMb0GHIB5xR1FK/dOYgFz6kjhcJX00C/aUOjgBplt2nFTzP+qmEYdYvla2qNYO5TOyclEmOZr/05fgRSwIIkQmqVM+2YnRTKpEzAVnRSRTElI3pEHqahjQA5aazgzLzVCu+OYikfiGaM/V3R0oDNV1WVwYUR2rRm4r/eb0EB1duysM4QQjZfNAgESZG5jQd0+cSGIqJJpRJrnc12YhKylBnWNQh2IsnL5N2rWpfVGu39XKjnsdRIMfkhFSITS5Jg9yQJmkRRh7JM3klb8aT8WK8Gx/z0hUj7zkif2B8/gC4dp46</latexit>

θ(t)

<latexit sha1_base64="nakvMEc7d7eXvAzbtTwA54NYxrg=">AAACAnicbVDLSsNAFJ34rPUVdSVuBotQNyWpBV0W3LisYB/QxjKZTNqhkwczN0IJwY2/4saFIm79Cnf+jZM2C209MMzhnHu59x43FlyBZX0bK6tr6xubpa3y9s7u3r55cNhRUSIpa9NIRLLnEsUED1kbOAjWiyUjgStY151c5373gUnFo/AOpjFzAjIKuc8pAS0NzeOBGwlPTQP9pQMYMyDZfVqF82xoVqyaNQNeJnZBKqhAa2h+DbyIJgELgQqiVN+2YnBSIoFTwbLyIFEsJnRCRqyvaUgCppx0dkKGz7TiYT+S+oWAZ+rvjpQEKt9SVwYExmrRy8X/vH4C/pWT8jBOgIV0PshPBIYI53lgj0tGQUw1IVRyvSumYyIJBZ1aWYdgL568TDr1mn1Rq982Ks1GEUcJnaBTVEU2ukRNdINaqI0oekTP6BW9GU/Gi/FufMxLV4yi5wj9gfH5A+d+l7M=</latexit>

B1...Bi...BN<latexit sha1_base64="hsoxTdlhufDp6P+GOGImM1f2m3c=">AAAB/HicbVDLSsNAFJ3UV62vaJduBovgKiS1oMtSN66kgn1AG8JkOmmHTiZhZiKEEH/FjQtF3Poh7vwbp2kW2nrgXg7n3MvcOX7MqFS2/W1UNja3tnequ7W9/YPDI/P4pC+jRGDSwxGLxNBHkjDKSU9RxcgwFgSFPiMDf36z8AePREga8QeVxsQN0ZTTgGKktOSZ9Y6XObllWR2PFj27yz2zYVt2AbhOnJI0QImuZ36NJxFOQsIVZkjKkWPHys2QUBQzktfGiSQxwnM0JSNNOQqJdLPi+Byea2UCg0jo4goW6u+NDIVSpqGvJ0OkZnLVW4j/eaNEBdduRnmcKMLx8qEgYVBFcJEEnFBBsGKpJggLqm+FeIYEwkrnVdMhOKtfXif9puVcWs37VqPdKuOoglNwBi6AA65AG9yCLugBDFLwDF7Bm/FkvBjvxsdytGKUO3XwB8bnDz/0ky4=</latexit>

B(t+δ)<latexit sha1_base64="r7SiNohxYD8aMpeFDZsmONt/qn0=">AAAB9XicbVBNS8NAEN3Ur1q/qh69LBahIpSkFvRY9OKxgv2ANi2bzaZdutmE3YlSQv+HFw+KePW/ePPfuG1z0NYHA4/3ZpiZ58WCa7Dtbyu3tr6xuZXfLuzs7u0fFA+PWjpKFGVNGolIdTyimeCSNYGDYJ1YMRJ6grW98e3Mbz8ypXkkH2ASMzckQ8kDTgkYqX/TT8tw0fOZAHI+HRRLdsWeA68SJyMllKExKH71/IgmIZNABdG669gxuClRwKlg00Iv0SwmdEyGrGuoJCHTbjq/eorPjOLjIFKmJOC5+nsiJaHWk9AznSGBkV72ZuJ/XjeB4NpNuYwTYJIuFgWJwBDhWQTY54pREBNDCFXc3IrpiChCwQRVMCE4yy+vkla14lxWqve1Ur2WxZFHJ+gUlZGDrlAd3aEGaiKKFHpGr+jNerJerHfrY9Gas7KZY/QH1ucPpbOR6g==</latexit>

Figure 1: Pipeline of ROAM++. Given a mini-batch of training patches, which are cropped based on the predicted object boxes, deep features are extracted

by Feature Extractor. The fixed-size Tracking Model θ(t−1) is warped to the current target size yielding the warped tracking model θ(t−1), as in (2, 3). The

response map and bounding boxes are then predicted for each sample using θ(t−1), from which the update loss ℓ(t−1) and its gradient ▽θ(t−1)ℓ

(t−1) are

computed using ground truth labels. Next, the element-wise stack I(t−1) consisting of previous learning rates, current parameters, current update loss and

its gradient are input into a coordinate-wise LSTM O to generate the adaptive learning rate λ(t−1) as in (11). The model is then updated using one gradient

descent step (denoted by ⊖) as in (9). Finally, we apply the updated model θ(t) on a randomly selected future frame to get a meta loss for minimization as

in (13).

pect ratio of the object does not change during tracking, which is however often violated. Therefore, dynamically adapting

the convolutional filter to the object shape variations is de-

sirable, which means the number of filter parameters may

vary between frames in the video and among different se-

quences. However, this complicates the design of neural

optimizer when using separate learning rates for each filter

parameter. To simplify the meta-learning framework and

to better allow for per-parameter learning rates, we define

fixed-shape convolutional filters, which are warped to the

desired target size using bilinear interpolation before con-

volving with the feature map. In subsequent frames, the

recurrent optimizer updates the fixed-shape tracking model.

Note that MetaTracker [34] also resizes filters to the object size for model initialization. However, instead of dynamically adapting the convolutional filters to the object size in subsequent frames, MetaTracker keeps the same shape as the initial filter during the following frames.

Specifically, the tracking model θ contains two parts, i.e., the correlation filter θcf and the bounding box regression filter θreg. Both are warped to adapt to the shape variation of the target,

θ = [θcf ,θreg], (1)

θcf =W(θcf , φ), (2)

θreg =W(θreg, φ), (3)

where W resizes the convolutional filter to size φ = (fr, fc) using bilinear interpolation. The filter size is computed

from the width and height (w, h) of the object in the pre-

vious image patch (and for symmetry the filter size is odd),

fr = ⌈ρh/c⌉ − ⌈ρh/c⌉ mod 2 + 1, (4)

fc = ⌈ρw/c⌉ − ⌈ρw/c⌉ mod 2 + 1, (5)

where ρ is the scale factor to enlarge the filter size to cover

some context information, and c is the stride of feature map,

and ⌈·⌉ means ceiling. Because of the resizable filters, there

is no need to enumerate different aspect ratios and scales

of anchor boxes when performing bounding box regres-

sion. We only use one sized anchor at each spatial location, whose size corresponds to the shape of the regression filter,

(aw, ah) = (fc, fr)/ρ. (6)

This saves regression filter parameters and achieves faster

speed. Note that we update the filter size and its correspond-

ing anchor box every τ frames, i.e., just before each model update, during both the offline training and testing/tracking

phases. Through this modification, we are able to initial-

ize the tracking model with θ(0) and recurrently optimize

it in subsequent frames without worrying about the shape

changes of the tracked object.
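The filter sizing in (4)-(5) and the bilinear warping in (2)-(3) can be sketched as follows. This is a minimal PyTorch sketch; the function names and the fixed filter-bank shape are our own, not from the paper.

```python
import math
import torch
import torch.nn.functional as F

def filter_size(w, h, rho=1.5, c=4):
    # Eqs. (4)-(5): enlarge the object size by rho, divide by the
    # feature stride c, and force the filter dimensions to be odd.
    fr = math.ceil(rho * h / c)
    fc = math.ceil(rho * w / c)
    return fr - fr % 2 + 1, fc - fc % 2 + 1

def warp_filter(theta, fr, fc):
    # Eqs. (2)-(3): resize a fixed-shape filter bank (out, in, H, W)
    # to (fr, fc) via bilinear interpolation.
    return F.interpolate(theta, size=(fr, fc), mode="bilinear",
                         align_corners=False)
```

With the paper's settings ρ = 1.5 and c = 4, a 60×40 target yields a 15×23 filter, and the single anchor size then follows from (6) as (aw, ah) = (fc, fr)/ρ.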



3.2. Recurrent Model Optimization

Traditional optimization methods have the problem of

slow convergence due to small learning rates and limited

training samples, while simply increasing the learning rate

has the risk of the training loss wandering wildly. Instead,

we design a recurrent neural optimizer, which is trained to

converge the model to a good solution in a few gradient

steps1, to update the tracking model. Our key idea is based

on the assumption that the best optimizer should be able to

update the model to generalize well on future frames. Dur-

ing the offline training stage, we perform a one step gradient

update on the tracking model using our recurrent neural op-

timizer and then minimize its loss on future frames. Once

the offline learning stage is finished, we use this learned

neural optimizer to recurrently update the tracking model

to adapt to appearance variations. The optimizer trained in

this way will be able to quickly converge the model update

to generalize well for future frame tracking.

We denote the response generation network as

G(F ;θcf , φ) and the bounding box regression net-

work as R(F ;θreg, φ), where F is the feature map input

and θ are the parameters. The tracking loss consists of two

parts: response loss and regression loss,

L(F, M, B; θ, φ) = ‖G(F; θcf, φ) − M‖2 + ‖R(F; θreg, φ) − B‖s (7)

where the first term is the L2 loss and the second term is

the smooth L1 loss [37], and B represents the ground truth

box. Note we adopt parameterization of bounding box coor-

dinates as in [37]. M is the corresponding label map which

is built using a 2D Gaussian function and the ground-truth

object location (x0, y0) and size (w, h),

M(x, y) = exp(−α((x − x0)²/σx² + (y − y0)²/σy²)) (8)

where (σx, σy) = (w/c, h/c), and α controls the shape

of the response map. Note that we use past predictions as

pseudo-labels when performing model updating during test-

ing. We only use the ground truth during offline training.
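The label map in (8) is straightforward to construct; a minimal sketch, where the grid size and function name are our own:

```python
import torch

def gaussian_label_map(size, x0, y0, w, h, alpha=20.0, c=4):
    # Eq. (8): 2D Gaussian centered at (x0, y0), with sigmas tied to
    # the object size via the feature stride c: (sx, sy) = (w/c, h/c).
    sx, sy = w / c, h / c
    ys, xs = torch.meshgrid(torch.arange(size, dtype=torch.float32),
                            torch.arange(size, dtype=torch.float32),
                            indexing="ij")
    return torch.exp(-alpha * ((xs - x0) ** 2 / sx ** 2
                               + (ys - y0) ** 2 / sy ** 2))
```

The peak value 1 sits at the ground-truth center, and α = 20 (Sec. 5) controls how sharply the response decays.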

A typical tracking process updates the tracking model

using historical training examples and then tests this up-

dated model on the following frames until the next update.

We simulate this scenario in a meta-learning paradigm by

recurrently optimizing the tracking model, and then testing

it on a future frame. Specifically, the tracking network is

updated by

θ(t) = θ(t−1) − λ(t−1) ⊙ ▽θ(t−1)ℓ(t−1), (9)

where λ(t−1) is a fully element-wise learning rate that

has the same dimension as the tracking model parameters

1We only use one gradient step in our experiment, while considering

multiple steps is straightforward.

θ(t−1), and ⊙ is element-wise multiplication. The learn-

ing rate is recurrently generated using an LSTM with input

consisting of the previous learning rate λ(t−2), the current

parameters θ(t−1), the current update loss ℓ(t−1) and its gra-

dient ▽θ(t−1)ℓ(t−1),

I(t−1) = [λ(t−2),▽θ(t−1)ℓ(t−1),θ(t−1), ℓ(t−1)], (10)

λ(t−1) = σ(O(I(t−1);ω)), (11)

where O(·;ω) is a coordinate-wise LSTM [1] parameter-

ized by ω, which shares the parameters across all dimen-

sions of input, and σ is the sigmoid function to bound the

predicted learning rate. The LSTM input I(t−1) is formed

by element-wise stacking the 4 sub-inputs along a new

axis2. The current update loss ℓ(t−1) is computed from a

mini-batch of n updating samples,

ℓ(t−1) = (1/n) ∑_{j=1}^{n} L(Fj, Mj, Bj; θ(t−1), φ(t−1)), (12)

where the updating samples (Fj ,Mj , Bj) are collected

from the previous τ frames, where τ is the frame interval

between model updates during online tracking. Finally, we

test the newly updated model θ(t) on a randomly selected

future frame3 and obtain the meta loss,

L(t) = L(F (t+δ),M (t+δ), B(t+δ);θ(t), φ(t−1)), (13)

where δ is randomly selected within [0, τ − 1]. During the offline training stage, we perform the aforementioned procedure on a mini-batch of videos and get an averaged meta loss for optimizing the neural optimizer,

L = (1/NT) ∑_{i=1}^{N} ∑_{t=1}^{T} L(t)_{Vi}, (14)

where N is the batch size and T is the number of model up-

dates, and Vi ∼ p(V) is a video clip sampled from the train-

ing set. It should be noted that the initial tracking model

parameter θ(0) and initial learning rate λ(0) are also train-

able variables, which are jointly learned with the neural op-

timizer O. By minimizing the averaged meta loss L, we

aim to train a neural optimizer that can update the track-

ing model to generalize well on subsequent frames, as well

as to learn a beneficial initialization of the tracking model,

which is broadly applicable to different tasks (i.e. videos).

The overall training process is detailed in Algorithm 1.
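The coordinate-wise optimizer of (10)-(11) can be sketched as follows, treating the |θ| parameter coordinates as a batch for a shared LSTM. Layer sizes follow Sec. 5; the class and variable names are our own.

```python
import torch
import torch.nn as nn

class NeuralOptimizer(nn.Module):
    # O(.; omega): two stacked LSTM layers shared across all parameter
    # coordinates; each coordinate is one element of the LSTM "batch".
    def __init__(self, hidden=20):
        super().__init__()
        self.lstm = nn.LSTM(input_size=4, hidden_size=hidden, num_layers=2)
        self.head = nn.Linear(hidden, 1)

    def forward(self, lr_prev, grad, theta, loss, state=None):
        # Eq. (10): element-wise stack of the 4 sub-inputs -> (|theta|, 4);
        # the scalar loss is broadcast over all coordinates.
        inp = torch.stack([lr_prev, grad, theta, loss.expand_as(theta)], dim=-1)
        out, state = self.lstm(inp.unsqueeze(0), state)
        # Eq. (11): the sigmoid bounds each per-parameter learning rate.
        lr = torch.sigmoid(self.head(out)).squeeze(0).squeeze(-1)
        return lr, state
```

Because the LSTM weights are shared across coordinates, the optimizer's size is independent of |θ|, which is what makes per-parameter learning rates tractable.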

2We therefore get a |θ|×4 matrix, where |θ| is the number of parameters in θ. Note that the current update loss ℓ(t−1) is broadcast to have a compatible shape with the other vectors. To better understand this process, we can treat the input of the LSTM as a mini-batch of vectors where |θ| is the batch size and 4 is the dimension of the input vector.

3We found that using more than one future frame does not improve performance but costs more time during the offline training phase.



Algorithm 1 Offline training of our framework

Input:

p(V): distribution over training videos.

Output:

θ(0),λ(0) : initial tracking model and learning rates.

ω: recurrent neural optimizer.

1: Initialize all network parameters.

2: while not done do

3: Draw a mini-batch of videos: Vi ∼ p(V)

4: for all Vi do

5: Compute θ(1) ← θ(0) − λ(0) ⊙ ▽θ(0)ℓ(0) in (9).

6: Compute meta loss L(1) using (13).

7: for t = {1 + τ, 1 + 2τ, · · · , 1 + (T − 1)τ} do

8: Compute adaptive learning rate λ(t−1) using

neural optimizer O as in (11).

9: Compute updated model θ(t) using (9).

10: Compute meta loss L(t) using (13).

11: end for

12: end for

13: Compute averaged meta loss L using (14).

14: Update θ(0),λ(0),ω by computing gradient of L.

15: end while
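Algorithm 1's inner loop over one clip might look like the following sketch. It assumes hypothetical helpers update_loss for (12) and meta_loss for (13), plus an optimizer opt mapping (lr, grad, theta, loss, state) to a new per-parameter learning rate; collapsing the τ-frame update schedule into unit steps is our simplification.

```python
import torch

def train_one_clip(theta0, lr0, opt, update_loss, meta_loss, T):
    # One video clip of Algorithm 1: T recurrent model updates, each a
    # single gradient step (eq. 9), accumulating the meta loss (eq. 13).
    theta, lr, state, total = theta0, lr0, None, 0.0
    for t in range(T):
        loss = update_loss(theta, t)                      # eq. (12)
        grad, = torch.autograd.grad(loss, theta, create_graph=True)
        if t > 0:  # lambda(0) is itself a learned variable
            lr, state = opt(lr, grad, theta, loss.detach(), state)
        theta = theta - lr * grad                         # eq. (9)
        total = total + meta_loss(theta, t)               # eq. (13)
    # Backpropagating through `total` updates theta0, lr0 and opt jointly.
    return total
```

The create_graph=True flag keeps the update step differentiable, so the meta loss can flow back into the initial model, initial learning rates and the optimizer's weights, as line 14 of Algorithm 1 requires.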

3.3. Random Filter Scaling

Neural optimizers have difficulty generalizing well to new tasks due to overfitting, as discussed in [1, 28, 46].

By analyzing the learned behavior of the neural optimizer, we found that our preliminarily trained optimizer predicts similar learning rates (see Fig. 2 left). We attribute

this to overfitting to network inputs with similar magnitude scales. The following simple example illustrates the overfitting problem. Suppose the objective function4 that the neural optimizer minimizes is g(θ) = ‖xθ − y‖². The optimal element-wise learning rate is 1/(2x²), since we can achieve the lowest loss of 0 in one gradient descent step: θ(t+1) = θ(t) − (1/(2x²)) ▽g(θ(t)) = y/x. Note that the optimal learning rate depends on the network input x, and thus

the learned neural optimizer is prone to overfitting if it does

not see enough magnitude variations of x. To address this

problem, we multiply the tracking model θ with a randomly

sampled vector ǫ, which has the same dimension as θ [28]

during each iteration of offline training,

ǫ ∼ exp(U(−κ, κ)), θǫ = ǫ ⊙ θ, (15)

where U(−κ, κ) is a uniform distribution in the interval

[−κ, κ], and κ is the range factor to control the scale extent.

The objective function is then modified as gǫ(θ) = g(θǫ).

In this way, the network input x is indirectly scaled without

modifying the training samples (x, y) in practice. Thus, the

learned neural optimizer is forced to predict adaptive learn-

ing rates for inputs with different magnitudes (see Fig. 2

4Our actual loss function in (7) includes an L2 loss and a smooth L1 loss; for simplicity, here we consider a simple linear model with an L2 loss.

Figure 2: Histograms of predicted learning rates over training steps during offline training, without RFS (left) and with RFS (right).

right), rather than to produce similar learning rates for sim-

ilar scaled inputs, which improves its generalization ability.
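Random filter scaling in (15) amounts to one line per training iteration; a sketch, where the function name is our own:

```python
import torch

def random_filter_scaling(theta, kappa):
    # Eq. (15): sample epsilon ~ exp(U(-kappa, kappa)) element-wise and
    # scale the tracking model, so the optimizer sees varied magnitudes.
    eps = torch.empty_like(theta).uniform_(-kappa, kappa).exp()
    return eps * theta
```

Since ǫ multiplies θ rather than the samples (x, y), the input magnitudes are perturbed without touching the training data itself.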

4. Online Tracking via Proposed Framework

The offline training produces a neural optimizer O, ini-

tial tracking model θ(0) and learning rate λ(0), which we

then use to perform online tracking. The overall process is

similar to offline training except that we do not compute the

meta loss or its gradient.

Model Initialization. Given the first frame, the ini-

tial image patch is cropped and centered on the provided

groundtruth bounding box. We then augment the initial im-

age patch to 9 samples by stretching the object to different

aspect ratios and scales. Specifically, the target is stretched

by [swr, sh/r], where (w, h) are the initial object width and

height, and (s, r) are scale and aspect ratio factors. Then we

use these examples, as well as the offline learned θ(0) and

λ(0), to perform one-step gradient update to build an initial

model θ(1) as in (9).
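The nine augmented target sizes can be enumerated as follows; the function name is our own, and the factor set {0.8, 1, 1.2} follows Sec. 5:

```python
import itertools

def augmented_sizes(w, h, factors=(0.8, 1.0, 1.2)):
    # Stretch the target by [s*w*r, s*h/r] over all 9 combinations of
    # scale factor s and aspect-ratio factor r.
    return [(s * w * r, s * h / r)
            for s, r in itertools.product(factors, repeat=2)]
```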

Bounding Box Estimation. We estimate the object bounding box by first finding the presence of the target through response generation, and then predicting an accurate box via bounding box regression. We employ the penalty strategy

used in [23] on the generated response to suppress estimated

boxes with large changes in scale and aspect ratio. In addi-

tion, we also multiply the response map by a Gaussian-like

motion map to suppress large movement. The bounding

box computed by the anchor that corresponds to the maximum score of the response map is the final prediction. To smooth the results, we linearly interpolate the estimated

object size with the previous one. We denote our neural-

optimized tracking model with bounding box regression as

ROAM++. We also design a baseline variant of our tracker,

which uses multi-scale search to estimate the object box instead of bounding box regression, and denote it as ROAM.
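The final size smoothing described above can be sketched as a simple linear interpolation; beta is a hypothetical smoothing weight, as the paper does not state its value:

```python
def smooth_size(prev_wh, est_wh, beta=0.3):
    # Linearly interpolate the newly estimated size with the previous one.
    return tuple((1 - beta) * p + beta * e
                 for p, e in zip(prev_wh, est_wh))
```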

Model Updating. We update the model every τ frames.

Although offline training uses the previous τ frames to per-

form a one-step gradient update of the model, in practice,

we find that using more than one step could further improve

the performance during tracking (see Sec. 6.2). Therefore, we adopt a two-step gradient update using the previous 2τ frames in our experiments.



5. Implementation Details

Patch Cropping. Given a bounding box of the object

(x0, y0, w, h), the ROI of the image patch has the same cen-

ter (x0, y0) and takes a larger size S = Sw = Sh = γ√wh,

where γ is the ROI scale factor. Then, the ROI is resized to

a fixed size S × S for batch processing.
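The ROI side length described above is simply (a sketch; name ours; γ = 5 per the settings below):

```python
import math

def roi_side(w, h, gamma=5.0):
    # S = Sw = Sh = gamma * sqrt(w * h): square ROI centered on the target.
    return gamma * math.sqrt(w * h)
```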

Network Structure. We use the first 12 convolution lay-

ers of the pretrained VGG-16 [40] as the feature extrac-

tor. The top max-pooling layers are removed to increase

the spatial resolution of the feature map. Both the response

generation network G and the bounding box regression network R consist of two convolutional layers with a dimension reduction layer of 512×64×1×1 (in-channels, out-channels,

height, width) as the first layer, and either a correlation layer

of 64×1×21×21 or a regression layer of 64×4×21×21 as

the second layer respectively. We use two stacked LSTM

layers with 20 hidden units for the neural optimizer O.

The ROI scale factor is γ = 5 and the search size is

S = 281. The scale factor of filter size is ρ = 1.5. The

response generation uses α = 20 and the feature stride of

the CNN feature extractor is c = 4. The scale and aspect

ratio factors s, r used for initial image patch augmentation

are selected from {0.8, 1, 1.2}, generating a combination of

9 pairs of (s, r). The range factors used in RFS are κcf =1.6, κreg = 1.3 5.

Training Details. We use ADAM [18] optimization with

a mini-batch of 16 video clips of length 31 on 4 GPUs (4

videos per GPU) to train our framework. We use the train-

ing splits of ImageNet VID [21], TrackingNet [30], LaSOT

[10], GOT10k [17], ImageNet DET [21] and COCO [25]

for training. During training, we randomly extract a contin-

uous sequence clip for video datasets, and repeat the same

still image to form a video clip for image datasets. Note we

randomly augment all frames in a training clip by slightly

stretching and scaling the images. We use a learning rate of

1e-6 for the initial regression parameters θ(0) and the ini-

tial learning rate λ(0). For the recurrent neural optimizer O,

we use a learning rate of 1e-3. Both learning rates are mul-

tiplied by 0.5 every 5 epochs. We implement our tracker in

Python with the PyTorch toolbox [35], and conduct the ex-

periments on a computer with an NVIDIA RTX2080 GPU

and Intel(R) Core(TM) i9 CPU @ 3.6 GHz. Our tracker

ROAM and ROAM++ run at 13 FPS and 20 FPS respec-

tively. (See supplementary for detailed speed comparison)

6. Experiments

We evaluate our trackers on six benchmarks: OTB-2015

[47], VOT-2016 [19], VOT-2017 [20], LaSOT [10], GOT-

10k [17] and TrackingNet [30].

5We use different range factors κ for θcf and θreg due to the different magnitudes of parameters in the two branches.

Figure 3: Precision and success plots of OPE on OTB-2015. Precision: SiamRPN+ [0.923], ECO [0.910], ROAM [0.908], DSLT [0.907], ROAM++ [0.904], CCOT [0.898], DaSiamRPN [0.881], MetaTracker [0.856], C-RPN [0.853], SiamRPN [0.851], CREST [0.838], MemTrack [0.820], Staple [0.784], SiamFC [0.771]. Success (AUC): ECO [0.691], ROAM [0.681], ROAM++ [0.680], CCOT [0.671], SiamRPN+ [0.665], DaSiamRPN [0.658], DSLT [0.658], C-RPN [0.639], MetaTracker [0.637], SiamRPN [0.637], MemTrack [0.626], CREST [0.623], SiamFC [0.582], Staple [0.581].

VOT-2016 VOT-2017

EAO(↑) A(↑) R(↓) EAO(↑) A(↑) R(↓)

ROAM++ 0.441 0.599 0.174 0.380 0.543 0.195

ROAM 0.384 0.556 0.183 0.331 0.505 0.226

MetaTracker 0.317 0.519 - - - -

DaSiamRPN 0.411 0.61 0.22 0.326 0.56 0.34

SiamRPN+ 0.37 0.58 0.24 0.30 0.52 0.41

C-RPN 0.363 0.594 - 0.289 - -

SiamRPN 0.344 0.56 0.26 0.244 0.49 0.46

ECO 0.375 0.55 0.20 0.280 0.48 0.27

DSLT 0.343 0.545 0.219 - - -

CCOT 0.331 0.54 0.24 0.267 0.49 0.32

Staple 0.295 0.54 0.38 0.169 0.52 0.69

CREST 0.283 0.51 0.25 - - -

MemTrack 0.272 0.531 0.373 0.248 0.524 0.357

SiamFC 0.235 0.532 0.461 0.188 0.502 0.585

Table 1: Results on VOT-2016/2017. The evaluation metrics are expected

average overlap (EAO), accuracy value (A), robustness value (R). The top

3 performing trackers are colored with red, green and blue respectively.

6.1. Comparison Results with State-of-the-art

We compare our ROAM and ROAM++ with recent re-

sponse generation based trackers including MetaTracker

[34], DSLT [26], MemTrack [49], CREST [41], SiamFC

[4], CCOT [9], ECO [8], Staple [3], as well as recent state-

of-the-art trackers including SiamRPN[23], DaSiamRPN

[53], SiamRPN+ [52] and C-RPN [11] on both OTB and

VOT datasets. For the methods using SGD updates, the

number of SGD steps followed their implementations.

OTB. Fig. 3 presents the experiment results on OTB-

2015 dataset, which contains 100 sequences with 11 an-

notated video attributes. Both our ROAM and ROAM++

achieve similar AUC compared with top performer ECO

and outperform all other trackers. Specifically, our ROAM

and ROAM++ surpass MetaTracker [34], which is the base-

line for meta learning trackers that uses traditional opti-

mization method for model updating, by 6.9% and 6.7%

on the success plot respectively, demonstrating the effec-

tiveness of the proposed recurrent model optimization al-

gorithm and resizable bounding box regression. In addi-

tion, both our ROAM and ROAM++ outperform the very

recent meta learning based tracker MLT [7] by a large mar-

gin (ROAM/ROAM++: 0.681/0.680 vs MLT: 0.611) under

the AUC metric on OTB-2015.

VOT. Table 1 shows the comparison performance on

VOT-2016 and VOT-2017 datasets. Our ROAM++ achieves

the best EAO on both VOT-2016 and VOT-2017. Specifically,



Figure 4: Precision and success plots of OPE on the LaSOT test dataset. Precision: ROAM++ [0.445], MDNet [0.373], ROAM [0.368], VITAL [0.360], SiamFC [0.339], StructSiam [0.333], DSiam [0.322], ECO [0.301], STRCF [0.298], SINT [0.295], ECO_HC [0.279], CFNet [0.259]. Success (AUC): ROAM++ [0.447], MDNet [0.397], ROAM [0.390], VITAL [0.390], SiamFC [0.336], StructSiam [0.335], DSiam [0.333], ECO [0.324], SINT [0.314], STRCF [0.308], ECO_HC [0.304], CFNet [0.275].

            MDNet[33]  CF2[29]  ECO[8]  CCOT[9]  GOTURN[15]  SiamFC[4]  SiamFCv2[44]  ROAM   ROAM++
AO(↑)       0.299      0.315    0.316   0.325    0.342       0.348      0.374         0.436  0.465
SR0.5(↑)    0.303      0.297    0.309   0.328    0.372       0.353      0.404         0.466  0.532
SR0.75(↑)   0.099      0.088    0.111   0.107    0.124       0.098      0.144         0.164  0.236

Table 2: Results on GOT-10k. The evaluation metrics include average overlap (AO), success rate at 0.5 overlap threshold (SR0.5), and success rate at 0.75 overlap threshold (SR0.75). The top three performing trackers are colored with red, green and blue respectively.

both our ROAM++ and ROAM show superior robustness compared with RPN-based trackers, which have no online model updating, demonstrating the effectiveness of our recurrent model optimization scheme. In addition, our ROAM++ and ROAM outperform the baseline MetaTracker [34] by 39.2% and 21.1% on the EAO of VOT-2016, respectively.

LaSOT. LaSOT [10] is a recently proposed large-scale tracking dataset. We evaluate our ROAM against the top-10 performing trackers of the benchmark [10], including MDNet [33], VITAL [42], SiamFC [4], StructSiam [51], DSiam [14], SINT [43], ECO [8], STRCF [24], ECO_HC [8] and CFNet [44], on the testing split, which consists of 280 videos. Fig. 4 presents the precision and success plots on the LaSOT test set. Our ROAM++ achieves the best result among state-of-the-art trackers on the benchmark, outperforming the second best, MDNet, with improvements of 19.3% and 12.6% on the precision plot and success plot, respectively.
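The two plot types summarize per-frame quantities: the success plot thresholds the IoU between predicted and ground-truth boxes and is summarized by its area under the curve, while the precision plot thresholds the center location error in pixels (conventionally read off at 20 px). A minimal sketch of how these OPE scores are computed, using made-up boxes rather than benchmark data:

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes given as (x, y, w, h).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def center_error(a, b):
    # Euclidean distance between box centers, in pixels.
    ca = (a[0] + a[2] / 2, a[1] + a[3] / 2)
    cb = (b[0] + b[2] / 2, b[1] + b[3] / 2)
    return ((ca[0] - cb[0]) ** 2 + (ca[1] - cb[1]) ** 2) ** 0.5

preds = [(10, 10, 40, 40), (12, 14, 40, 44)]  # made-up predictions
gts   = [(12, 12, 40, 40), (30, 30, 40, 40)]  # made-up ground truth

ious = [iou(p, g) for p, g in zip(preds, gts)]
errs = [center_error(p, g) for p, g in zip(preds, gts)]

# Success plot: fraction of frames with IoU above each threshold;
# the AUC score is the mean of this curve over thresholds in [0, 1].
thresholds = np.linspace(0, 1, 21)
success = [np.mean([i > t for i in ious]) for t in thresholds]
auc = np.mean(success)

# Precision plot: fraction of frames with center error within each
# pixel threshold; the headline number is usually taken at 20 px.
prec_at_20 = np.mean([e <= 20 for e in errs])
```

Averaging these per-sequence scores over all test videos gives the numbers shown in the plot legends.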

GOT-10k. GOT-10k [17] is another recently proposed large-scale, highly diverse dataset for object tracking. There is no overlap in object categories between the training set and test set, which follows the one-shot learning setting [12]. Therefore, using external training data is strictly forbidden when testing trackers on their online server. Following this protocol, we train our ROAM using only the training split of this dataset. Table 2 shows the detailed comparison results on the GOT-10k test set. Both our ROAM++ and ROAM surpass the other trackers by a large margin. In particular, our ROAM++ obtains an AO of 0.465, an SR0.5 of 0.532 and an SR0.75 of 0.236, outperforming SiamFCv2 with relative improvements of 24.3%, 31.7% and 63.9%, respectively.
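The GOT-10k metrics reduce to simple statistics over the per-frame overlaps between predicted and ground-truth boxes: AO is the mean IoU, and SR at a threshold is the fraction of frames whose IoU exceeds it. A minimal sketch with made-up per-frame overlaps:

```python
def average_overlap(ious):
    # AO: mean IoU between predicted and ground-truth boxes over all frames.
    return sum(ious) / len(ious)

def success_rate(ious, thr):
    # SR@thr: fraction of frames whose IoU exceeds the threshold.
    return sum(i > thr for i in ious) / len(ious)

# Made-up per-frame overlaps for one short sequence.
ious = [0.82, 0.55, 0.71, 0.40, 0.0]
print(round(average_overlap(ious), 3))  # 0.496
print(success_rate(ious, 0.5))          # 0.6
print(success_rate(ious, 0.75))         # 0.2
```

The benchmark then averages these sequence-level scores over the whole test set.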

TrackingNet. TrackingNet [30] provides more than 30K videos with around 14M dense bounding box annotations, built by filtering shorter video clips from YouTube-BB [5]. Table 3 presents the detailed comparison results on the TrackingNet test set. Our ROAM++ surpasses other state-of-

                Staple  CSRDCF  ECOhc   ECO     SiamFC  CFNet   MDNet   ROAM    ROAM++
                [3]     [27]    [8]     [8]     [4]     [44]    [33]
AUC (↑)         0.528   0.534   0.541   0.554   0.571   0.578   0.606   0.620   0.670
Prec. (↑)       0.470   0.480   0.478   0.492   0.533   0.533   0.565   0.547   0.623
Norm. Prec. (↑) 0.603   0.622   0.608   0.618   0.663   0.654   0.705   0.695   0.754

Table 3: Results on TrackingNet. The evaluation metrics include area under curve (AUC) of the success plot, precision, and normalized precision. The top three performing trackers are colored red, green and blue, respectively.

Figure 5: Ablation study on OTB-2015 with different variants of ROAM. Left, success AUC: ROAM-oracle 0.742, SGD-oracle 0.699, ROAM 0.675, SGD 0.636, ROAM-w/o RFS 0.636. Right, success AUC: ROAM 0.675, ROAM-GRU 0.668, ROAM-FC 0.647, ROAM-ConstLR 0.612.

the-art tracking algorithms on all three evaluation metrics. In particular, our ROAM++ obtains relative improvements of 10.6%, 10.3% and 6.9% on AUC, precision and normalized precision, respectively, compared with the top performing tracker on the benchmark, MDNet.
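The quoted percentage gains are relative improvements over the runner-up's score and can be checked directly against the table entries. A quick sanity check using the Table 3 numbers for ROAM++ vs. MDNet (not part of any benchmark toolkit):

```python
def rel_improvement(new, old):
    # Relative improvement in percent, as quoted in the text.
    return (new - old) / old * 100

# ROAM++ vs. MDNet on TrackingNet (Table 3).
print(round(rel_improvement(0.670, 0.606), 1))  # AUC -> 10.6
print(round(rel_improvement(0.623, 0.565), 1))  # precision -> 10.3
# Normalized precision gives ~6.95%, which the text reports as 6.9%.
print(round(rel_improvement(0.754, 0.705), 2))  # -> 6.95
```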

6.2. Ablation Study

For a deeper analysis, we study our trackers from various aspects. Note that all ablation models are trained only on the ImageNet VID dataset for simplicity.

Impact of Different Modules. To verify the effectiveness of different modules, we design four variants of our framework: 1) SGD: replacing the recurrent neural optimizer with traditional SGD for model updating (using the same number of gradient steps as ROAM); 2) ROAM-w/o RFS: training a recurrent neural optimizer without RFS; 3) SGD-Oracle: using the ground-truth bounding boxes to build updating samples for SGD during the testing phase (using the same number of gradient steps as ROAM); 4) ROAM-Oracle: using the ground-truth bounding boxes to build updating samples for ROAM during the testing phase. The results are presented in Fig. 5 (Left). ROAM gains about 6% on AUC compared with the baseline SGD, demonstrating the effectiveness of our recurrent model optimization method for model updating. Without RFS during offline training, the tracking performance drops substantially due to overfitting. ROAM-Oracle performs better than SGD-Oracle, showing that our offline learned neural optimizer is more effective than the traditional SGD method given the same updating samples. In addition, the two oracles (SGD-Oracle and ROAM-Oracle) achieve higher AUC scores than their normal versions, indicating that tracking accuracy could be further boosted by improving the quality of the updating samples.

Architecture of Neural Optimizer. To investigate other architectures for the neural optimizer, we present three variants of our method: 1) ROAM-GRU: using two stacked Gated Recurrent Units (GRU) [6] as the neural optimizer; 2) ROAM-FC: using two linear fully-connected layers fol-


Figure 6: AUC on OTB-2015 under different numbers of gradient steps, comparing ROAM, SGD(7e-6) and SGD(7e-7).

Figure 7: Update loss (top row) and meta loss (bottom row) vs. frame number, comparing ROAM and SGD on the Car24, Deer, Fish and Jumping sequences.

lowed by a tanh activation function as the neural optimizer; 3) ROAM-ConstLR: using a learned constant element-wise learning rate for model optimization instead of the adaptively generated one. Fig. 5 (Right) presents the results. ROAM-GRU decreases the AUC slightly, while ROAM-FC has a significantly lower AUC compared with ROAM, showing the importance of our recurrent structure. Moreover, the performance drop of ROAM-ConstLR verifies the necessity of using an adaptive learning rate for model updating.

More Steps in Updating. During offline training, we perform only one gradient descent step to optimize the updating loss. We investigate the effect of using more than one gradient step on tracking performance during the test phase, for both ROAM and SGD (see Fig. 6). Our method can be further improved with more than one step, but performance gradually decreases when too many steps are used, because our framework is not trained to perform that many steps during the offline stage. We also test two fixed learning rates for SGD, where the larger one is 7e-6 (the learning rate used by MetaTracker [34]) and the smaller one is 7e-7. With the larger learning rate, SGD reaches its best performance much faster than with the smaller one, while both have similar best AUC. Our ROAM consistently outperforms SGD(7e-6), showing the superiority of adaptive element-wise learning rates. Furthermore, ROAM with 1-2 gradient steps outperforms SGD(7e-7) using a large number of steps, which shows the improved generalization of ROAM.

Update Loss and Meta Loss Comparison. To show the effectiveness of our neural optimizer, we compare the update loss and meta loss over time between ROAM and SGD after two gradient steps, on a few videos of OTB-2015, in Fig. 7 (see supplementary for more examples). Under the same number of gradient updates, our neural optimizer obtains lower losses than traditional SGD, demonstrating its faster convergence and better generalization for model optimization.

Figure 8: Histogram of initial learning rates and updating learning rates on OTB-2015.

Why does ROAM work? As discussed in [34], directly using the learned initial learning rate λ(0) for model optimization in subsequent frames could lead to divergence. This is because the learning rates needed for model initialization are relatively larger than those needed for subsequent frames, which causes unstable model optimization. In particular, the initial model θ(0) is offline trained to be broadly applicable to different videos, and therefore needs relatively large gradient steps to adapt to a specific task, leading to a relatively large λ(0). For subsequent frames, the appearance variations may sometimes be small and sometimes large, and thus the model optimization process needs an adaptive learning rate to handle the different situations. Fig. 8 presents the histograms of initial learning rates and updating learning rates on OTB-2015. Most of the updating learning rates are relatively small because there are usually only minor appearance variations between updates. As shown in Fig. 3, our ROAM, which updates the model for subsequent frames with adaptive learning rates, obtains a substantial performance gain compared with MetaTracker [34], which uses traditional SGD with a constant learning rate for model updating.
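The argument for adaptive element-wise learning rates can be made concrete on a toy quadratic. The sketch below is a hand-set illustration of the effect (a single scalar step size cannot suit both a flat and a steep direction), not ROAM's learned LSTM optimizer:

```python
import numpy as np

# On a badly conditioned quadratic loss 0.5*(a*x^2 + b*y^2), a single
# scalar learning rate must stay below 2/a for stability, so the flat
# y-direction crawls; element-wise rates can match each coordinate's
# curvature instead.
a, b = 100.0, 1.0

def run(lr, steps=50):
    theta = np.array([1.0, 1.0])
    for _ in range(steps):
        grad = np.array([a * theta[0], b * theta[1]])
        theta = theta - lr * grad  # lr: scalar or element-wise array
    return 0.5 * (a * theta[0] ** 2 + b * theta[1] ** 2)

loss_scalar = run(0.009)                   # close to the 2/a stability limit
loss_elem = run(np.array([1 / a, 1 / b]))  # per-coordinate rates: exact in one step
print(loss_scalar, loss_elem)
```

With per-coordinate rates 1/a and 1/b the loss drops to zero in a single step, while the scalar rate leaves the flat direction far from converged; ROAM's LSTM plays the role of generating such element-wise rates adaptively at every update.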

7. Conclusion

In this paper, we propose a novel tracking model consisting of a resizable response generator and bounding box regressor, where the first part generates a score map indicating the presence of the target at different spatial locations and the second module regresses bounding box shifts relative to anchors densely mounted on sliding-window positions. To effectively update the tracking model for appearance adaptation, we propose a recurrent model optimization method, within a meta-learning setting, that converges the model in a few gradient steps. Instead of producing an increment for parameter updating, which is prone to overfitting due to different gradient scales, we recurrently generate adaptive learning rates for tracking model optimization using an LSTM. Extensive experiments on the OTB, VOT, LaSOT, GOT-10k and TrackingNet datasets demonstrate superior performance compared with state-of-the-art tracking algorithms.

Acknowledgement: This work was supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. [T32-101/15-R] and CityU 11212518).


References

[1] Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, and Nando de Freitas. Learning to Learn by Gradient Descent by Gradient Descent. In NeurIPS, 2016.
[2] Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei. On the Optimization of a Synaptic Learning Rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, 1992.
[3] Luca Bertinetto, Jack Valmadre, Stuart Golodetz, Ondrej Miksik, and Philip Torr. Staple: Complementary Learners for Real-Time Tracking. In CVPR, 2016.
[4] Luca Bertinetto, Jack Valmadre, Joao F. Henriques, Andrea Vedaldi, and Philip H. S. Torr. Fully-Convolutional Siamese Networks for Object Tracking. In ECCV Workshop on VOT, 2016.
[5] Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, and Vincent Vanhoucke. YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video. In CVPR, 2017.
[6] Kyunghyun Cho, Bart Van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078, 2014.
[7] Janghoon Choi, Junseok Kwon, and Kyoung Mu Lee. Deep Meta Learning for Real-Time Target-Aware Visual Tracking. In ICCV, 2019.
[8] Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. ECO: Efficient Convolution Operators for Tracking. In CVPR, 2017.
[9] Martin Danelljan, Andreas Robinson, Fahad Shahbaz Khan, and Michael Felsberg. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. In ECCV, 2016.
[10] Heng Fan, Liting Lin, Fan Yang, and Peng Chu. LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking. In CVPR, 2019.
[11] Heng Fan and Haibin Ling. Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking. In CVPR, 2019.
[12] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML, 2017.
[13] Hamed Kiani Galoogahi, Terence Sim, and Simon Lucey. Correlation Filters with Limited Boundaries. In CVPR, 2015.
[14] Qing Guo, Wei Feng, Ce Zhou, Rui Huang, Liang Wan, and Song Wang. Learning Dynamic Siamese Network for Visual Object Tracking. In ICCV, 2017.
[15] David Held, Sebastian Thrun, and Silvio Savarese. Learning to Track at 100 FPS with Deep Regression Networks. In ECCV, 2016.
[16] Joao F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. High-Speed Tracking with Kernelized Correlation Filters. TPAMI, 2015.
[17] Lianghua Huang, Xin Zhao, and Kaiqi Huang. GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. arXiv, 2019.
[18] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980, 2014.
[19] Matej Kristan, Ales Leonardis, Jiri Matas, and Michael Felsberg. The Visual Object Tracking VOT2016 Challenge Results. In ECCV Workshop on VOT, 2016.
[20] Matej Kristan, Ales Leonardis, Jiri Matas, Michael Felsberg, Roman Pflugfelder, Luka Cehovin Zajc, Tomas Vojir, Gustav Hager, Alan Lukezic, Abdelrahman Eldesokey, Gustavo Fernandez, and others. The Visual Object Tracking VOT2017 Challenge Results. In ICCV Workshop on VOT, 2017.
[21] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NeurIPS, 2012.
[22] Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, and Junjie Yan. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In CVPR, 2019.
[23] Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, and Xiaolin Hu. High Performance Visual Tracking with Siamese Region Proposal Network. In CVPR, 2018.
[24] Feng Li, Cheng Tian, Wangmeng Zuo, Lei Zhang, and Ming-Hsuan Yang. Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking. In CVPR, 2018.
[25] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
[26] Xiankai Lu, Chao Ma, Bingbing Ni, Xiaokang Yang, Ian Reid, and Ming-Hsuan Yang. Deep Regression Tracking with Shrinkage Loss. In ECCV, 2018.
[27] Alan Lukezic, Tomas Vojir, Luka Cehovin, Jiri Matas, and Matej Kristan. Discriminative Correlation Filter with Channel and Spatial Reliability. In CVPR, 2017.
[28] Kaifeng Lv, Shunhua Jiang, and Jian Li. Learning Gradient Descent: Better Generalization and Longer Horizons. In ICML, 2017.
[29] Chao Ma, Jia-Bin Huang, Xiaokang Yang, and Ming-Hsuan Yang. Hierarchical Convolutional Features for Visual Tracking. In ICCV, 2015.
[30] Matthias Muller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, and Bernard Ghanem. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. In ECCV, 2018.
[31] Tsendsuren Munkhdalai and Hong Yu. Meta Networks. In ICML, 2017.
[32] Devang K. Naik and R. J. Mammone. Meta-Neural Networks that Learn by Learning. In IJCNN, 1992.
[33] Hyeonseob Nam and Bohyung Han. Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. In CVPR, 2016.
[34] Eunbyung Park and Alexander C. Berg. Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers. In ECCV, 2018.
[35] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic Differentiation in PyTorch. In NeurIPS Workshop, 2017.
[36] Sachin Ravi and Hugo Larochelle. Optimization as a Model for Few-Shot Learning. In ICLR, 2017.
[37] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NeurIPS, 2015.
[38] Jurgen Schmidhuber. Evolutionary Principles in Self-Referential Learning: On Learning How to Learn: The Meta-Meta-Meta...-Hook. PhD thesis, 1987.
[39] John Schulman, Jonathan Ho, Xi Chen, and Pieter Abbeel. Meta Learning Shared Hierarchies. arXiv, 2018.
[40] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR, 2015.
[41] Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, and Ming-Hsuan Yang. CREST: Convolutional Residual Learning for Visual Tracking. In ICCV, 2017.
[42] Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, Wangmeng Zuo, Chunhua Shen, Rynson Lau, and Ming-Hsuan Yang. VITAL: VIsual Tracking via Adversarial Learning. In CVPR, 2018.
[43] Ran Tao, Efstratios Gavves, and Arnold W. M. Smeulders. Siamese Instance Search for Tracking. In CVPR, 2016.
[44] Jack Valmadre, Luca Bertinetto, Joao F. Henriques, Andrea Vedaldi, and Philip H. S. Torr. End-to-End Representation Learning for Correlation Filter Based Tracking. In CVPR, 2017.
[45] Ning Wang, Yibing Song, Wengang Zhou, Wei Liu, and Houqiang Li. Unsupervised Deep Tracking. In CVPR, 2019.
[46] Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, and Jascha Sohl-Dickstein. Learned Optimizers that Scale and Generalize. In ICML, 2017.
[47] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Object Tracking Benchmark. TPAMI, 2015.
[48] Tianyu Yang and Antoni B. Chan. Recurrent Filter Learning for Visual Tracking. In ICCV Workshop on VOT, 2017.
[49] Tianyu Yang and Antoni B. Chan. Learning Dynamic Memory Networks for Object Tracking. In ECCV, 2018.
[50] Tianyu Yang and Antoni B. Chan. Visual Tracking via Dynamic Memory Networks. TPAMI, 2019.
[51] Yunhua Zhang, Lijun Wang, Dong Wang, and Huchuan Lu. Structured Siamese Network for Real-Time Visual Tracking. In ECCV, 2018.
[52] Zhipeng Zhang, Houwen Peng, and Qiang Wang. Deeper and Wider Siamese Networks for Real-Time Visual Tracking. In CVPR, 2019.
[53] Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, and Weiming Hu. Distractor-aware Siamese Networks for Visual Object Tracking. In ECCV, 2018.
