20
Movie Title Generation Group 2 0556611 0557205 0656618 0316213

Movie Title Generation

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Movie Title Generation

Movie Title Generation

Group 20556611 ��� 0557205 ���0656618 �� 0316213 ��

Page 2: Movie Title Generation

Task 1: Movie Title GenerationOverview:2�)<HG��Jo,>q9M�q#�J��O;�>qSc�-;1"�C;��qJa�'2�R]�(�TN9��%J>q�I;-$jJo,p\Z model ��B�>q�/���i[�

!A2�k'�e��8�➊�q0➋#�0fX_^�r���FWVn 1000 #hsL3o,J&*@�5U�4�:T Ybo,Jml&�Unigram����`P�Bigram��:.�4�&*:KdJ+ho,��g�o,J>qD?=��E`0Q67J&`f��HG�7J>q�

Page 3: Movie Title Generation

Task 1: Movie Title GenerationProcedures:

1. Parse movie title and subtitle

2. Find keywords in movie subtitle

3. Movie similarity calculation

4. Find the top-10 best match titles with the highest similarity and generate templates

5. Filled keywords into chosen templates to create movie title

6. Evaluate the best title with evaluation method

Page 4: Movie Title Generation

~1000 Movie subtitle w/ title

Movie subtitle w/o title

Keywords Keyphrases

Most-similar movie title

unigram with highestTextRank

bigram with highest tf–idf

+

cosine similarity of the top 300 tf–idf words

pos tagging template

cloze

+ + → generated title

Page 5: Movie Title Generation

POS template generation● Keyword : Calculate the possible keyword of movie title from the movie

title database (with frequency)● POS :

○ Find the possible POS pattern of the movie title with keyword

○ Find the possible pattern of the movie title without keyword

� NN,�,NN NN,�� NN,��,NN ��,NN NN,VV,�

�� ���,�,�"� #$,�� ���,��,� ��,��� ��,!�,�

� NN,NN NN,VV JJ,NN NN,CD

�� ��%,�&� ��,� � ,� ��,47

Page 6: Movie Title Generation

Determine Important Terms Using Two Metrics● input: parsed movie subtitle● informativeness

○ keyword extraction○ TextRank with stop words○ give a score to each term

● phraseness○ find co-occurring terms (bigrams)

○ give a score to each bigram● sum up these two scores

● example of generated phrases

● example of generated keywords

Page 7: Movie Title Generation

Movie Similarity Calculation● Select 300 most frequent words of each movie subtitle● Calculate the tf–idf of the 300 words in each movie● Find top 10 most similar movies using cosine similarity

Page 8: Movie Title Generation

Apply Templates● Use similar movie titles as templates.

○ ex: ����� → �� (��) � (��) ���● Fill in the blanks with terms that match the POS of the blanks.

○ highest-scored term/bigram first● Calculate the score of the title weighted by

○ scores of informativeness and phraseness○ length penalty○ popularity of the template ○ word vector distance to known popular title

● Select the title that has the highest score.

Page 9: Movie Title Generation

Title Generated with Keyword & Keyphrases● We also use keywords and keyphrases with highest scores as movie titles.● So we will have many candidate titles generated from POS templates,

keywords and keyphrases.● We will use task 2 to evaluate those titles to choose the best one.

Page 10: Movie Title Generation

Task 2: Movie Title Evaluation����������������

1. check if the POS tagging match our POS title templates2. check if the length of the title is adequate3. check if the words in the title is included in our popular movie list4. if the generated titles receive same points, select the generated title with

higher word vector similarity to the known popular movie titles

������������…

Page 11: Movie Title Generation

Overview:C��6�92#��%�;:�<� feature �E1= @(,G

#� 1. �<+2*1����-.�����2>?

#� 2. �<+2>"*'B! ��0

#� 3. �<+2�7��42/EF�3�

#� 4. �<+2�7�D�&)�5

#� 5. &<A)�4/EF�2�7

#� 6. &<>$�8��0

Page 12: Movie Title Generation

� 1. �7&2%0����+-����28:● ! �0<>��?.AB��,;"����=5)8��'9(C0�28:��4.AB�082@1/�D��6�,�+-�28

● feature: joint_percentage9(C#8��&��*�$��4.AB�083�

Page 13: Movie Title Generation

�� 2. �2�*3���;����(

● �� )69��:'<=�&5�����8-#3�1�$�+3�" (e.g. NN,NN) �*,%�

● 3��;%7�/*�.��0&%7�*!>

● feature: pos_percentage4!>*3�" ��'<=��*%�

Page 14: Movie Title Generation

� 3. ����������������● �����������

�������● feature: exactly_same

○ if same: 1○ else: 0

Page 15: Movie Title Generation

� 4. �9*0�3�<�")�2● �1�&>��30��<�)5.19�● +%�. '��(��":;���$�94)/���= �7�-�5��#�!��6�0,�&8��&

Page 16: Movie Title Generation

�� 5. �����������● �imdb���������������● feature: Title_IMDB, Title_Top500

Page 17: Movie Title Generation

� 6. �$%��!���● ���%�(����'��#%���● �word2vec����&�'�������& �"�

● feature: mv_1, sim_1

Page 18: Movie Title Generation

��-feature��5��� 1. �.!' &����"#�����'/0

�� 2. �.!'/� �2���%

�� 3. �.!'�+ �)'$46�(

�� 4. �.!'�+�3����*

�� 5. �.1��)$46�'�+

�� 6. �./��,�%

joint_percentage

pos_percentage

exactly_same

sim_1

Title_IMDB Title_Top500

len_score

Page 19: Movie Title Generation

� ��03��4 feature %�!1 normalize to 0 ~ 1��$-)�*��Sum������� �2�# /.��'%3����+,&… etc.��"����%��(�5��

Page 20: Movie Title Generation

References

● Farnaz Ronaghi Khameneh, “Automatic Title Generation,” Computational

Approaches to Digital Stewardship, Stanford University, 2009. [Online].

Available: http://cads.stanford.edu/projects/presentations/summer-2009/

Farnaz%20-%20Automatic%20title%20generation.pdf. [Accessed: Nov. 30,

2017].