Dependency Modeling for Information Fusion with Applications in Visual Recognition
Andy Jinhua MA
Advisor: Prof. Pong Chi YUEN


Slide 2. Outline
  • Motivation
  • Related Works
  • Supervised Spatio-Temporal Manifold Learning
  • Linear Dependency Modeling
  • Reduced Analytic Dependency Modeling
  • Conclusion
Notes: The outline of this presentation is shown in this slide.

Slide 3. Outline
Notes: To begin, I would like to introduce the motivation of this project.

Slide 7. Outline
Notes: Next, the related works.

Slide 4. Motivation
  • Multiple features provide complementary information, e.g. color information can distinguish Daffodil from Windflower
  • Images: Windflower, Daffodil (flower images from the Oxford Flowers dataset [CVPR06])
Notes: The motivation of this project is to combine multiple complementary features for better visual recognition. For example, these two images show that color information can distinguish Daffodil from Windflower.

Slide 5. Motivation
  • Multiple features provide complementary information, e.g. color information can distinguish Daffodil from Windflower, and shape characteristics can distinguish Daffodil from Buttercup
  • Images: Buttercup, Daffodil, Windflower (flower images from the Oxford Flowers dataset [CVPR06])
Notes: On the other hand, shape characteristics can distinguish Daffodil from Buttercup, as shown in this image. Therefore, to distinguish these three flowers, it is better to combine the color and shape features for recognition.

Slide 6. Motivation
Notes: Multiple features can be combined by estimating their joint distribution, but the density estimate may not be accurate when the dimension of the classifiers or features is high. To simplify the fusion process, an independence assumption can be employed, under which the joint distribution reduces to the product of the posteriors of each feature. However, this assumption may not hold in practice, since the features are extracted from the same sample, and the fusion performance then degrades. In this context, dependency modeling has been proposed, but existing methods are derived under a normality assumption and may not be robust to non-normal cases. To overcome these limitations, two dependency modeling methods that do not assume normality have been developed and are introduced later in this presentation.

Slide 54. Outline
Notes: Finally, I would like to summarize this presentation.

Slide 8. Related Works
  • Probabilistic approach
      • Independence assumption based [TPAMI98]: Product, Sum, Majority Votes
      • Normality assumption based [TPAMI2009]: Independent Normal (IN) combination, Dependent Normal (DN) combination
  • Non-probabilistic approach
      • Supervised weighting: LPBoost [ML2002], LP-B [ICCV09], Reduced multivariate polynomial (RM) [TCSVT04], Multiple kernel learning (MKL) [ICML04, JMLR06, JMLR08]
      • Unsupervised approach: Signal strength combination (SSC) [TNNLS12], Graph-regularized robust late fusion (GRLF) [CVPR12]
Notes: Existing fusion methods can be categorized into probabilistic and non-probabilistic approaches.
Slide 8 notes (continued): For the probabilistic approach, existing fusion methods are developed under different assumptions. For example, the commonly used combination rules, including Sum, Product and Majority Votes, are derived under the independence assumption, while the IN and DN combination rules are derived under the normality assumption. For non-probabilistic fusion, existing methods can be further categorized into supervised and unsupervised approaches. The supervised weighting methods learn the optimal weighting by minimizing the empirical classification error, while the unsupervised methods use general theories, for example the signal strength concept or robust low-rank decomposition, to learn the fusion models.

Slide 10. Spatio-Temporal Manifold Learning
  • Why manifold learning
      • Can discover non-linear structures in visual data
      • Successful applications in image analysis, e.g. Laplacianfaces
  • Limitation for video applications
      • Temporal information not fully considered
  • Proposed method can
      • Discover non-linear structures
      • Utilize global constraint of temporal labels
Notes: There are two reasons to use manifold learning for feature extraction. First, manifold learning can discover non-linear structures, which commonly appear in visual data. Second, manifold learning has been used successfully in many image analysis applications, for example Laplacianfaces. However, existing manifold learning methods do not take full advantage of temporal information, which is very important for video applications. To overcome this limitation, we propose a new method that can not only discover non-linear structures but also utilize the global constraint of temporal labels.

Slide 11. Manifold Learning Based Action Recognition Framework
  • Pipeline: Input (video) → Preprocessing by information saliency method [PR09] → Action unit → Image representation → Feature vectors → Spatio-temporal manifold projection → Embedded manifold → Classification → Output (label)
Notes: The manifold learning based action recognition framework is presented in this slide. Given a video as input, an existing method is used to extract the salient action unit. Then, each image in the salient unit is represented by an image descriptor, which gives a sequence of feature vectors. After that, the feature vectors are projected onto a low-dimensional manifold by the spatio-temporal manifold learning method. Finally, a set-based classifier classifies the input video and outputs an action label.

Slide 12. Supervised Spatial (SS) Topology
Notes: To learn the spatio-temporal manifold projection, we propose to use the concept of topology, which is important for the definition of a manifold. Using local and label information, the topological base B_SS, which is the minimum open set generating the topological space, is given by the neighborhood N(x_n) of each data point within the same action A_i or A_j, as defined by this equation and shown in this figure. Since poses deform continuously over time, the temporally adjacent neighbors are also contained in the supervised spatial topological base.

Slide 13. Temporal Pose Correspondence (TPC) Topology
Notes: On the other hand, the global constraint of temporal labels is used to construct the second topology. For example, as shown in this figure, sequences of the same action share similar poses over time. Therefore, we compare the global similarity between two sequences of the same action by dynamic time warping and construct the TPC set C(x_n) from the corresponding poses. Then, the TPC topological base B_TPC is defined by the intersection of the TPC set C(x_n) and the neighborhood set N(x_n).
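The alignment step described above can be illustrated with a minimal sketch of classic dynamic time warping. The slides do not show the exact warping variant, distance, or data layout, so the function names `dtw_path` and `tpc_candidates`, the Euclidean frame distance, and the `[T, D]` array layout are all assumptions made for illustration. The sketch only collects the frames that the warping path pairs up; intersecting them with N(x_n) to obtain B_TPC is not shown.

```python
import numpy as np


def dtw_path(seq_a, seq_b):
    """Align two pose sequences (arrays of shape [T, D]) with classic DTW.

    Returns the total alignment cost and the list of matched frame pairs.
    The Euclidean frame distance is an assumption, not taken from the slides.
    """
    seq_a = np.asarray(seq_a, dtype=float)
    seq_b = np.asarray(seq_b, dtype=float)
    n, m = len(seq_a), len(seq_b)
    # Pairwise distances between every frame of A and every frame of B.
    dist = np.linalg.norm(seq_a[:, None, :] - seq_b[None, :, :], axis=2)
    # acc[i, j] = cost of the best alignment of the first i and j frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path


def tpc_candidates(path, n_frames):
    """For each frame of sequence A, list the frames of B matched to it."""
    matches = {i: [] for i in range(n_frames)}
    for i, j in path:
        matches[i].append(j)
    return matches


# Toy example: B repeats the first pose of A once.
seq_a = [[0.0], [1.0], [2.0]]
seq_b = [[0.0], [0.0], [1.0], [2.0]]
cost, path = dtw_path(seq_a, seq_b)
print(cost, path, tpc_candidates(path, len(seq_a)))
```

In this toy case the first pose of A is matched to both copies of that pose in B, which is the kind of correspondence a purely local neighborhood search could miss when the two frames are far apart in time.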

Slide 14. Topology Combination
  • Combine SS and TPC topological bases
  • Supervised spatio-temporal neighborhood topology learning (SSTNTL) can
      • Preserve local structure
      • Separate sequences of different actions
Notes: Combining the SS and TPC topological bases, the proposed supervised spatio-temporal neighborhood topology learning (SSTNTL) can not only preserve local structure but also separate sequences of different actions.

Slide 15. Experiments
Notes: We compare the proposed method with four existing manifold learning techniques as well as state-of-the-art action recognition algorithms. For the manifold learning methods, the nearest neighbor framework with median Hausdorff distance is used for classification.

Slide 16. Experiments
  • Datasets for evaluation
      • Weizmann Human Action
      • KTH Human Action
      • UCF Sports
      • HOllywood Human Action (HOHA)
      • Cambridge Gesture
  • Image representation after preprocessing
      • Gray-scale for Weizmann and KTH
      • Gist [IJCV01] for KTH, UCF Sports, HOHA and Cambridge Gesture
      • Perform principal component analysis (PCA) [TPAMI97] to avoid the singular matrix problem
Notes: Five action datasets are used for evaluation: Weizmann, KTH, UCF Sports, HOHA and Cambridge Gesture. The images obtained by preprocessing the raw videos are represented by the gray-scale feature for Weizmann and KTH, and by the Gist feature for KTH, UCF Sports, HOHA and Cambridge Gesture. To avoid the singular matrix problem, PCA is performed before manifold learning.
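The nearest neighbor classification with median Hausdorff distance mentioned in the experimental setup can be sketched as follows. The slides do not give the exact definition used, so this sketch assumes the common form in which the directed distance is the median, over the frames of one set, of the distance to the closest frame in the other set, and the symmetric distance is the larger of the two directions. The function names and the toy data are hypothetical.

```python
import numpy as np


def directed_median_hausdorff(set_a, set_b):
    """Median over points of A of the distance to the nearest point of B."""
    set_a = np.asarray(set_a, dtype=float)
    set_b = np.asarray(set_b, dtype=float)
    dist = np.linalg.norm(set_a[:, None, :] - set_b[None, :, :], axis=2)
    return float(np.median(dist.min(axis=1)))


def median_hausdorff(set_a, set_b):
    """Symmetric version: the larger of the two directed distances (assumed)."""
    return max(
        directed_median_hausdorff(set_a, set_b),
        directed_median_hausdorff(set_b, set_a),
    )


def classify_sequence(query, train_sequences, train_labels):
    """Label a query sequence by its nearest training sequence."""
    distances = [median_hausdorff(query, seq) for seq in train_sequences]
    return train_labels[int(np.argmin(distances))]


# Toy embedded sequences (each row is one projected frame).
walk = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
run = [[5.0, 5.0], [5.0, 6.0], [5.0, 7.0]]
query = [[0.1, 0.0], [0.0, 1.1]]
print(classify_sequence(query, [walk, run], ["walk", "run"]))
```

Using the median rather than the maximum makes the set distance less sensitive to a few outlying frames, which is one plausible reason to prefer it for video sequences.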

Slide 17. Results
  • Accuracy (%) compared with other manifold embedding methods
  • Our method achieves highest accuracy
  • Image representation method affects the performance

| Method | Weizmann (Gray) | KTH (Gray) | KTH (Gist) | UCF (Gist) | HOHA (Gist) | Gesture (Gist) |
| Ours   | 100.0 | 79.6 | 94.4 | 91.3 | 44.5 | 88.5 |
| LPP    | 92.2  | 72.2 | 89.4 | 88.6 | 29.2 | 83.3 |
| SLPP   | 84.4  | 70.8 | 88.4 | 86.6 | 27.0 | 63.0 |
| LSDA   | 93.3  | 75.0 | 83.3 | 84.6 | 29.4 | 64.8 |
| LSTDE  | 93.3  | 75.9 | 80.6 | 82.6 | 29.2 | 70.8 |

Notes: The recognition accuracies of the manifold embedding methods are shown in this table. We can see that our method achieves the highest accuracy on the five action datasets. Another observation is that the image representation method affects the recognition performance, and the performance can be further improved with a more discriminative image feature, e.g. the Gist feature.

Slide 18. Results
  • Accuracy (%) compared with state-of-the-art methods under different scenarios in KTH: Outdoor (S1), Scale Change (S2), Clothes Change (S3), Indoor (S4)
  • Interest region based method outperforms others under fixed camera setting
  • Interest points based methods, e.g. Tracklet and AFMKL, are better for scale change

| Method | S1 | S2 | S3 | S4 | Mean |
| Ours | 98.0 | 88.7 | 96.0 | 100.0 | 95.7 |
| Tracklet [ECCV10] | 98.0 | 92.7 | 92.0 | 96.7 | 94.8 |
| Augmented Feature MKL [CVPR11] | 96.7 | 91.3 | 93.3 | 96.7 | 94.5 |
| Hierarchical ST Model [TCSVT09] | 95.6 | 87.4 | 90.7 | 94.7 | 92.1 |

Notes: Then, we compare our method with state-of-the-art algorithms under the four KTH scenarios: Outdoor, Scale Change, Clothes Change and Indoor. From this table, we can see that our interest region based method matches or outperforms the others under the fixed camera setting in scenarios 1, 3 and 4 (it ties with Tracklet on S1). On the other hand, the interest points based methods, Tracklet and AFMKL, are better in scenario 2 with scale change.

Slide 19. Results
  • Does global constraint of temporal labels help?
  • Compare the proposed method with and without TPC neighbors
  • Figure: neighbors not detected by local similarity

| Datasets | Weizmann | KTH | UCF | HOHA | Gesture |
| With TPC | 100.0 | 94.4 | 91.3 | 44.5 | 88.5 |
| Without TPC | 95.6 | 90.3 | 89.3 | 32.3 | 80.4 |

Notes: Does the global constraint of temporal labels really help? To answer this question, we show the TPC neighbors that cannot be detected by local spatial similarity. From this figure, we can see that the TPC neighbors detected by the global constraint of temporal labels are the poses corresponding to the references. Accordingly, the proposed method with TPC neighbors gives better results on all five datasets, so the global information of temporal labels really helps to improve the recognition performance.

Slide 21. Linear Dependency Modeling
  • Diagram: Input (windflower image) → Feature 1, Feature 2, ..., Feature M → Classifier 1, Classifier 2, ..., Classifier M → Independent fusion → Output: not windflower
  • Add dependency terms, s.t. the fused score is large

Slide 21 notes: The main idea of linear dependency modeling is as follows. Consider a situation in which M-1 classifiers give very high confidence, e.g. 1, that the input image is a windflower, while a single classifier gives very low confidence, 0, perhaps because of a noisy feature. If the scores go through an independent fusion system, the fused score is the product of the M scores, so the output is 0. The final decision is then dominated by one classifier and the classification result is incorrect. To overcome this limitation, we propose to add dependency terms to the product formulation, such that the fused score is large enough to make a correct decision.

Slide 22. Linear Classifier Dependency Modeling (LCDM)
  • Equation labels: dependency weight, small number, prior
Notes: The dependency terms are designed from three aspects. First, the dependency terms must not be so large that the original decisions have little effect. Second, we assign different dependency weights to different features to reflect their different levels of importance. Third, we use the prior probability to determine the dependency term. Following a previous method, we assume that posteriors do not deviate dramatically from priors. Under this assumption, delta_lm, which measures the difference between the posterior and the prior, is a small number. The dependency term is then defined as the product of three numbers: the dependency weight alpha_lm, the small number delta_lm and the prior probability Pr(omega_l).

Slide 23. Linear Classifier Dependency Modeling (LCDM)
  • Diagram: Input (windflower image) → Feature 1, Feature 2, ..., Feature M → Classifier 1, Classifier 2, ..., Classifier M → Dependency model (with dependency term) → Output: windflower

Slide 23 notes: Adding dependency terms to the product formulation, the joint distribution is proportional to the product, over the classifiers, of each posterior plus a dependency term. In the previous example, the dependency weights are learnt to ensure that the fused score is large enough to make a correct decision. To simplify the model,

Slide 24. Linear Classifier Dependency Modeling (LCDM)
Notes: we expand the product formulation and neglect the terms in delta of order two or higher, since the delta_lm are small numbers. The Linear Classifier Dependency Model, LCDM, is then given by the prior probability plus the sum, over the classifiers, of the dependency weight beta_lm multiplied by the difference between the posterior and the prior. Here, beta_lm is equal to the dependency weight alpha_lm plus one.

Slide 25. Linear Feature Dependency Modeling (LFDM)
Notes: According to the data processing inequality, the feature level contains more information than the classifier level for visual recognition. Therefore, we extend the linear dependency model to the feature level.

Slide 26. Linear Feature Dependency Modeling (LFDM)
Notes: Given M feature vectors, the posterior probability given the feature vectors can be rewritten in terms of the posteriors given the feature elements, as in this equation. Similar to the derivation of the LCDM, the Linear Feature Dependency Model, LFDM, is given by the prior probability plus the sum of the dependency weight gamma_lmn multiplied by the difference between the posterior of each feature entry and the prior. The posterior of each feature entry, Pr(omega_l | x_mn), can be calculated by a one-dimensional density estimation method.

Slide 27. Model Learning
  • Objective function in LCDM
  • Equation labels: maximizing margins, normalization constraint, dependency model constraint
Notes: With training data, the optimal LCDM is learnt by maximizing the margins between genuine and imposter fusion scores, subject to the normalization constraint and the dependency modeling constraint. In this optimization problem, rho is the margin, the xi denote the slack variables, and the beta are the dependency weights of the fusion model.
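The contrast between product fusion and the linear model described in the LCDM notes can be checked numerically with a small sketch. It uses only what the notes state: the product rule multiplies the M scores, and the LCDM score is the prior plus the sum of beta_lm times (posterior minus prior). The prior of 0.5 and the uniform weights of 0.25 are hypothetical values chosen for illustration; in the talk the weights are learnt by the margin-maximizing linear program, which is not reproduced here.

```python
import numpy as np


def product_rule(posteriors):
    """Independent fusion as described in the notes: product of the M scores."""
    return float(np.prod(np.asarray(posteriors, dtype=float)))


def lcdm_score(prior, posteriors, betas):
    """LCDM score: prior + sum_m beta_m * (posterior_m - prior)."""
    posteriors = np.asarray(posteriors, dtype=float)
    betas = np.asarray(betas, dtype=float)
    return float(prior + np.sum(betas * (posteriors - prior)))


# Windflower example from the talk: three classifiers say 1, one noisy classifier says 0.
prior = 0.5                             # hypothetical two-class prior
betas = [0.25, 0.25, 0.25, 0.25]        # hypothetical weights, not learnt
windflower = [1.0, 1.0, 1.0, 0.0]       # posteriors for "windflower"
not_windflower = [0.0, 0.0, 0.0, 1.0]   # posteriors for the other class

print(product_rule(windflower))                      # one zero score forces the product to 0
print(lcdm_score(prior, windflower, betas))          # stays above the prior
print(lcdm_score(prior, not_windflower, betas))      # falls below the prior
```

With these illustrative values the product rule gives both classes a score of zero, so a single noisy classifier blocks the decision, while the linear score still ranks the windflower class above the other class.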

Slide 28. Model Learning
  • Objective function in LFDM
  • Solve by off-the-shelf techniques
Notes: Similarly, the objective function in LFDM is defined by this constrained linear programming problem, which can be solved by off-the-shelf techniques.

Slide 29. Estimation Error Analysis
Notes: Comparing the estimation errors of the LCDM and LFDM methods, we conclude that the upper bound of the error factor in LFDM is smaller than that in LCDM. This means LFDM is better than LCDM in the worst case.

Slide 30. Experiments
  • Methods for comparison
      • Independence assumption: Sum rule [TPAMI98]
      • Normality assumption [TPAMI09]: Independent Normal (IN) and Dependent Normal (DN) combination rules
      • Boosting methods: LPBoost [ML02] and LP-B [ICCV09]
      • Multiple kernel learning (MKL) [JMLR08]
      • Support vector machines (SVM) as base classifier
  • Datasets for evaluation
      • Synthetic data
      • Oxford 17 Flower
      • Human Action
Notes: We compare the proposed method with both classifier level and feature level fusion methods. For the classifier level fusion methods, SVMs are used as the base classifier for each feature. Synthetic data as well as real datasets are used for evaluation.

Slide 31 (on-slide text)
  • Data setting: 4 kinds of distributions
      • Independent Normal (IndNormal)
      • Dependent Normal (DepNormal)
      • Independent Non-Normal (IndNonNor)
      • Dependent Non-Normal (DepNonNor)
  • Results: recognition rates

Slide 31. Experiments with Synthetic Data
  • IN and DN methods outperform others under normal distributions

| Test | Sum | IN | DN | LPBoost | LP-B | LCDM |
| IndNormal | 95.34 ± 0.71 | 97.67 ± 0.40 | 97.56 ± 0.42 | 96.81 ± 0.48 | 97.51 ± 0.48 | 97.66 ± 0.46 |
| DepNormal | 86.44 ± 1.16 | 92.52 ± 0.98 | 95.64 ± 0.89 | 95.29 ± 0.85 | 93.17 ± 1.30 | 93.88 ± 0.93 |

Notes: Since it is impossible to know the underlying distributions in real datasets, we generate four kinds of distributions for evaluation. The recognition rates are recorded in this table. As expected, the normality assumption based IN and DN combination rules outperform the others under normal distributions.

Slide 32. Experiments with Synthetic Data
  • IN and DN methods outperform others under normal assumption
  • LCDM achieves the best results when the distributions are non-normal

| Test | Sum | IN | DN | LPBoost | LP-B | LCDM |
| IndNormal | 95.34 ± 0.71 | 97.67 ± 0.40 | 97.56 ± 0.42 | 96.81 ± 0.48 | 97.51 ± 0.48 | 97.66 ± 0.46 |
| DepNormal | 86.44 ± 1.16 | 92.52 ± 0.98 | 95.64 ± 0.89 | 95.29 ± 0.85 | 93.17 ± 1.30 | 93.88 ± 0.93 |
| IndNonNor | 74.67 ± 1.50 | 84.80 ± 1.65 | 91.41 ± 1.38 | 90.10 ± 1.61 | 89.00 ± 0.08 | 93.00 ± 0.07 |
| DepNonNor | 66.33 ± 0.89 | 65.46 ± 0.92 | 69.84 ± 1.35 | 68.95 ± 1.30 | 69.37 ± 1.9 | 72.14 ± 1.52 |

Notes: On the other hand, since the proposed LCDM is derived without the normality assumption, it achieves the best results for non-normal distributions.

Slide 33. Experiments with Oxford 17 Flower Dataset
  • Data setting
      • 17 flowers with 80 images per category
      • 3 predefined splits with 17 × 40 for training, 17 × 20 for validation, and 17 × 20 for testing
      • 7 kinds of features [CVPR06]: shape, color, texture, HSV, HoG, SIFT internal, and SIFT boundary
  • Results: recognition accuracy
  • Figure: example images
  • Feature combination outperforms single feature
  • LCDM achieves highest accuracy

| Method | Accuracy (%) |
| Best Feature | 70.4 ± 1.4 |
| Sum | 85.4 ± 3.1 |
| LPBoost | 82.7 ± 0.8 |
| LP-B | 85.5 ± 2.4 |
| IN | 85.5 ± 1.7 |
| DN | 84.2 ± 1.9 |
| LCDM | 86.3 ± 2.4 |

Notes: The classifier level fusion methods are compared on the real Oxford 17 Flower dataset. Some of the flower images in this dataset are shown in this figure. With the predefined setting, 7 kinds of features are used for the experiments. The mean recognition accuracy and standard deviation over the 3 predefined splits are shown in this table. From this table, we can see that the fusion methods outperform the best single feature by a wide margin. The proposed LCDM achieves the highest accuracy, which suggests that the normality assumption does not hold in this real dataset.

Slide 34. Experiments with Human Action Datasets
  • Data setting
      • Weizmann: nine-fold cross-validation
      • KTH: training (8 persons), validation (8 persons), and testing (9 persons)
  • Space-time interest point (STIP) detection [VSPETS05]
  • Videos: STIP detection example in Weizmann, STIP detection example in KTH
Notes: For the Weizmann and KTH human action datasets, the space-time interest point detection method is employed to detect the STIPs, as shown in these two videos.

Slide 35. Experiments with Human Action Datasets
  • Data setting and STIP detection as on slide 34
  • 8 kinds of descriptors are computed on each STIP
      • Gray-scale intensity
      • Intensity difference
      • HoF and HoG without grid
      • HoF and HoG with 2D grid
      • HoF and HoG with 3D grid
  • 8 kinds of features are generated by Bag-of-Words
Notes: After interest point detection, eight kinds of descriptors are computed on each point and eight kinds of features are generated by the Bag-of-Words approach.

Slide 36 (on-slide text): Recognition accuracy (%)

Slide 36. Experiments with Human Action Datasets
  • LFDM outperforms others
  • Feature-level improvement by LFDM is significant

Classifier fusion
| Method | Weizmann | KTH |
| Sum | 84.44 | 84.72 |
| IN | 85.56 | 84.26 |
| DN | 84.44 | 83.80 |
| LPBoost | 83.33 | 83.33 |
| LP-B | 84.44 | 85.19 |
| LCDM | 85.56 | 85.19 |

Feature fusion
| Method | Weizmann | KTH |
| Sum-F | 57.78 | 78.70 |
| IN-F | 68.89 | 77.31 |
| DN-F | 77.78 | --- |
| LPBoost-F | 68.89 | 75.93 |
| LP-B-F | 70.00 | 76.56 |
| MKL | 81.11 | 82.42 |
| LFDM | 86.67 | 88.43 |

Notes: The classifier level and feature level fusion methods are compared in these two tables, respectively. We can see that the linear feature dependency modeling method, LFDM, outperforms the other methods, including both classifier level and feature level fusion algorithms. Among the feature level fusion methods, the improvement by LFDM is significant.

Slide 38. Problems in Linear Dependency Modeling (LDM)
  • Equation label: dependency term
Notes: First, the LDM is derived by adding terms to the product formulation, which is based on the independence assumption. Without the independence assumption, the product formulation may not hold, so it may not be the best way to model dependency. Second, the assumption that posteriors do not deviate dramatically from priors may not be valid. With a strong classifier, delta_lm, which measures the difference between posterior and prior, could be a large number. Therefore, a new dependency modeling method is proposed to remove these two assumptions.

Slide 39. Analytic Dependency Modeling
  • Observation: independent fusion [TPAMI98]
  • Equation label: constant w.r.t. label
Notes: The proposed method is based on the following observation. Under the independence assumption, the joint distribution is equal to a constant with respect to the label, multiplied by a function of the M posteriors.

Slide 40. Analytic Dependency Modeling
Notes: Denote the constant as P_0 and the posterior of class label omega_l given feature vector x_m as s_lm. The joint distribution becomes the product of P_0 and a function h_product of the scores s_l1 to s_lM. Similarly, the linear dependency model can be rewritten as the same constant P_0 multiplied by another function h_LCDM.

Slide 41. Analytic Dependency Modeling
Notes: Generally speaking, the score level fusion model can be defined as the constant P_0 multiplied by a general function h. To write out the function h explicitly, we propose to represent it as a convergent power series with weight vector alpha. Then, we rearrange the analytic function h with respect to each variable s_lm, i.e. the multivariate power series h is considered as a power series in the variable s_lm with coefficients g_lmr, each of which is an analytic function of the scores other than s_lm. Here alpha_lmr is the weight vector of the analytic function g_lmr.

Slide 42. Analytic Dependency Modeling
  • Trivial solution to equation system
Notes: By Bayes' rule and the marginal distribution property, a set of equations in alpha_lmr is derived. Since G_lmr (capital G) is a linear function of the weight vector alpha_lmr, setting alpha_lm0, alpha_lm2, alpha_lm3 equal to 0 is a trivial solution to the derived equation system.

Slide 43. Analytic Dependency Modeling
Notes: It can be proved that the independence condition is equivalent to the situation in which the solution to this equation system is trivial. Therefore, dependency can be modeled by requiring a non-trivial solution to the derived equation system.

Slide 44. Reduced Model
  • Analytic function contains infinite number of coefficients
Notes: Since the analytic function contains an infinite number of coefficients,

Slide 45 (on-slide text)
  • Analytic function contains infinite number of coefficients
  • Approximate by converged power series property

Slide 45. Reduced Model
  • Reduced Analytic Dependency Model (RADM)
Notes: we reduce the analytic model using the convergence property of power series and approximate the analytic function by the first K and R terms of the power series, as in these two equations. Combining these two equations, the Reduced Analytic Dependency Model is given by this polynomial with variable order R and model order K.

Slide 46. Model Learning
  • Objective function of empirical classification error
  • Objective function of dependency model constraint
  • Final optimization problem with regularization term
  • Solve by setting the first derivative to zero
Notes: The optimal RADM is learnt by minimizing the empirical classification error and approximating the dependency modeling constraint, i.e. the non-trivial solution constraint derived from the marginal distribution property. With a regularization term added to prevent overfitting, the optimization problem is defined as this unconstrained quadratic programming problem, which can be solved by setting the first derivative to zero.

Slide 47 (on-slide text)
  • Methods for comparison
      • Sum rule [TPAMI98]
      • Independent Normal (IN) combination rule [TPAMI09]
      • Dependent Normal (DN) combination rule [TPAMI09]
      • Multi-class LPBoost, namely LP-B [ICCV09]
      • Reduced multivariate polynomial (RM) [TCSVT04]
      • Signal strength combination (SSC) [TNNLS12]
      • Graph-regularized robust late fusion (GRLF) [CVPR12]
  • Datasets for evaluation
      • PASCAL VOC 2007
      • Columbia Consumer Video (CCV)
      • HOllywood Human Action (HOHA)
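Returning to the RADM learning step, the remark that an unconstrained, regularized quadratic problem can be solved by setting the first derivative to zero can be illustrated with a generic sketch. The actual RADM objective (classification error term plus dependency constraint term) is not shown in this transcript, so the sketch substitutes a plain regularized least-squares problem as a stand-in. The names `design`, `targets` and `reg` are hypothetical and do not correspond to symbols on the slides.

```python
import numpy as np


def solve_regularized_least_squares(design, targets, reg=0.01):
    """Minimize ||design @ w - targets||^2 + reg * ||w||^2 in closed form.

    This is a stand-in objective, not the RADM objective itself.
    """
    design = np.asarray(design, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Setting the gradient to zero yields the linear system
    # (design.T @ design + reg * I) w = design.T @ targets.
    gram = design.T @ design + reg * np.eye(design.shape[1])
    return np.linalg.solve(gram, design.T @ targets)


def objective_gradient(design, targets, weights, reg=0.01):
    """Gradient of the stand-in objective, used to check stationarity."""
    design = np.asarray(design, dtype=float)
    targets = np.asarray(targets, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return 2.0 * design.T @ (design @ weights - targets) + 2.0 * reg * weights


# Synthetic check: the gradient vanishes at the closed-form solution.
rng = np.random.default_rng(0)
design = rng.normal(size=(20, 4))
targets = rng.normal(size=20)
weights = solve_regularized_least_squares(design, targets)
print(np.allclose(objective_gradient(design, targets, weights), 0.0, atol=1e-9))
```

The regularization term also keeps the linear system well conditioned, so a single linear solve suffices and no iterative optimizer is needed for a problem of this form.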

Slide 47. Experiments
Notes: The proposed method is compared with 7 state-of-the-art score level fusion methods, and three challenging datasets are used for evaluation.

Slide 48. Experiments with VOC 2007 and CCV Datasets
Notes: The fusion methods are evaluated on the VOC and CCV datasets. Using the default split, eight features for VOC and three features for CCV are used for the experiments.

Slide 49. Experiments with VOC 2007 and CCV Datasets
  • Mean average precision (MAP)
  • RADM achieves highest MAP

| Method | VOC2007 | CCV |
| Best feature | 42.63 | 50.81 |
| Sum | 44.39 | 59.81 |
| IN | 45.23 | 58.92 |
| DN | 46.59 | 58.52 |
| LP-B | 49.35 | 59.87 |
| RM | 50.48 | 61.32 |
| SSC | 44.50 | 59.61 |
| GRLF | 46.00 | 60.61 |
| LCDM | 49.89 | 61.20 |
| RADM | 52.03 | 62.99 |

Notes: The mean average precisions of the fusion methods are shown in this table. We can see that the proposed RADM achieves the highest MAP on both datasets, which indicates that RADM can better model dependency by removing the assumptions in the LCDM method.

Slide 50. RADM Fusion with SSTNTL
  • Data setting
      • HOHA dataset is used
      • 8 actions: Answer Phone (AnP), Get out of Car (GoC), Hand Shake (HS), Hug Person (HP), Kiss (Ki), Sit Down (SiD), Sit Up (SiU), Stand Up (StU)
  • Features
      • Supervised spatio-temporal neighborhood topology learning (SSTNTL)
      • 8 kinds of space-time interest point (STIP) based features
  • Videos: STIP detection examples
Notes: In the last experiment, I would like to evaluate the RADM fusion performance by combining the proposed manifold based feature, SSTNTL, with the eight kinds of STIP based features mentioned before. Example STIP detection results on the HOHA dataset are shown in these two videos.

Slide 51. RADM Fusion with SSTNTL
  • Results: per-class average precision (AP) and mean average precision (MAP)
  • RADM improves both per-class AP and MAP

| Method | AnP | GoC | HS | HP | Ki | SiD | SiU | StU | MAP |
| Best STIP | 29.4 | 43.1 | 27.1 | 30.8 | 30.7 | 42.6 | 43.1 | 41.1 | 36.0 |
| STIP fusion | 38.5 | 48.5 | 45.6 | 35.8 | 41.8 | 45.4 | 49.5 | 54.1 | 44.9 |

Notes: The per-class average precision and mean average precision are shown in this table. We can see that RADM improves both the per-class AP and the MAP by combining the eight kinds of STIP based features.

Slide 52. RADM Fusion with SSTNTL
  • Results: per-class average precision (AP) and mean average precision (MAP)
  • RADM improves both per-class AP and MAP
  • SSTNTL outperforms best STIP and is close to STIP fusion

| Method | AnP | GoC | HS | HP | Ki | SiD | SiU | StU | MAP |
| Best STIP | 29.4 | 43.1 | 27.1 | 30.8 | 30.7 | 42.6 | 43.1 | 41.1 | 36.0 |
| STIP fusion | 38.5 | 48.5 | 45.6 | 35.8 | 41.8 | 45.4 | 49.5 | 54.1 | 44.9 |
| SSTNTL | 40.0 | 62.5 | 44.4 | 38.1 | 44.2 | 30.2 | 44.4 | 52.4 | 44.5 |

Notes: Turning to the proposed SSTNTL feature, it can be seen that SSTNTL is a discriminative feature: its MAP is higher than that of the best STIP based feature and close to that of the STIP fusion.

Slide 53. RADM Fusion with SSTNTL
  • Results: per-class average precision (AP) and mean average precision (MAP)
  • RADM improves both per-class AP and MAP
  • SSTNTL outperforms best STIP and is close to STIP fusion
  • RADM improves the performance by fusing with discriminative SSTNTL

| Method | AnP | GoC | HS | HP | Ki | SiD | SiU | StU | MAP |
| Best STIP | 29.4 | 43.1 | 27.1 | 30.8 | 30.7 | 42.6 | 43.1 | 41.1 | 36.0 |
| STIP fusion | 38.5 | 48.5 | 45.6 | 35.8 | 41.8 | 45.4 | 49.5 | 54.1 | 44.9 |
| SSTNTL | 40.0 | 62.5 | 44.4 | 38.1 | 44.2 | 30.2 | 44.4 | 52.4 | 44.5 |
| Fusion (all) | 42.3 | 62.4 | 54.8 | 40.6 | 53.2 | 45.7 | 50.0 | 55.1 | 50.5 |

Notes: Combining all nine features, RADM further improves the performance by fusing with the discriminative SSTNTL feature.

Slide 55. Conclusions
  • Supervised spatio-temporal neighborhood topology learning (SSTNTL)
      • Global constraint of temporal labels helps to recognize actions in videos
  • Linear dependency modeling (LCDM and LFDM)
      • Modeling dependency improves recognition performance
      • Modeling dependency in feature level is better
  • Reduced analytic dependency modeling (RADM)
      • Reducing fusion assumptions further improves recognition performance
Notes: In this presentation, I have talked about a feature extraction method for video applications and two dependency modeling frameworks for general visual applications. From the feature extraction method, SSTNTL, we can see that the global constraint of temporal labels helps to recognize actions in videos more accurately. From the linear dependency model, we conclude that explicitly modeling dependency can improve the recognition performance, and that dependency modeling at the feature level is better than at the classifier level. Finally, the RADM method shows that the performance can be further improved by removing the assumptions made in the linear dependency model.

Slide 56. Publications
Journal
  • Andy J Ma and Pong C Yuen, "Reduced Analytic Dependency Modeling: Robust Fusion for Visual Recognition," submitted to IJCV, 2013.
  • Andy J Ma, Pong C Yuen, and Jian-Huang Lai, "Linear Dependency Modeling for Classifier Fusion and Feature Combination," IEEE TPAMI, vol. 35, no. 5, pp. 1135-1148, 2013.
  • Andy J Ma, Pong C Yuen, Weiwen Zou, and Jian-Huang Lai, "Supervised Spatio-Temporal Neighborhood Topology Learning for Action Recognition," IEEE TCSVT, vol. 23, no. 8, pp. 1447-1460, 2013.
Conference
  • Andy J Ma and Pong C Yuen, "Reduced Analytical Dependency Modeling for Classifier Fusion," ECCV, 2012.
  • Andy J Ma and Pong C Yuen, "Linear Dependency Modeling for Feature Fusion," ICCV, 2011.
Notes: Here is a list of my publications related to this presentation.

Slide 57. Q & A
  • Thank you!
  • The End
Notes: Thanks for your attention.