
Understanding WeboNaver


With the rise of Web 2.0, API-based software has appeared. This article examines the API-based search tool created for the Korean search engine Naver: WeboNaver (Webometrics Tool for Naver). The software collects large amounts of data automatically and can easily distinguish between different types of information on the web, which was not possible before. In particular, Internet researchers can use the tool to improve the efficiency of data analysis within a specified timeframe. This paper illustrates how to use WeboNaver and tests its usability and reliability through several case studies: the web presence of Korean National Assembly members and the web presence of the term H1N1.

Korean abstract (translated): With the arrival of Web 2.0 and of software built on Open APIs, users no longer have to search the web manually and inspect the results one by one. With a public API, large volumes of data can be collected and managed systematically in a few simple steps. This paper introduces WeboNaver (Webometrics Tool for Naver), a dedicated search program developed on an Open API. It uses Naver, one of the most influential search engines in Korea, to collect large volumes of data automatically by category and store them. Researchers can use it to bring accuracy and a high degree of efficiency to data management, processing, and analysis. The purpose of the paper is to help students, the general public, and researchers who want to use WeboNaver by presenting the analysis procedure concretely through real cases, and thereby to demonstrate the tool's usefulness. Using this program, the web visibility of 292 members of the 18th National Assembly was examined. The web visibility of terms related to the new flu (H1N1) was also analyzed.


Page 1: Understanding WeboNaver

WCU WEBOMETRICS INSTITUTE: INVESTIGATING INTERNET-BASED POLITICS WITH E-RESEARCH TOOLS

Understanding WeboNaver: API-based search tool for the Naver search engine

Han Woo PARK, Associate Professor, Dept of Media & Communication, YeungNam University, S. Korea; Director of WCU Webometrics Institute

http://english-webometrics.yu.ac.kr [email protected]

http://www.hanpark.net

Se Jung Park, Ph.D. Student, Dept of Media and Communication, YeungNam University, S. Korea

David Stuart, Honorary Research Fellow, Statistical Cybermetrics Research Group, University of Wolverhampton, Wulfruna Street, Wolverhampton, UK

Seung Wook Lee, MA Student, Dept of Information and Communication Engineering, YeungNam University, S. Korea

Han Woo Park, Se Jung Park, David Stuart, Seung Wook Lee (2010). Understanding and applying WeboNaver, an API-based search program: an analysis of the web visibility of members of the 18th National Assembly. Journal of the Korean Data Analysis Society, Vol. 11, No. 6 (B).

Page 2: Understanding WeboNaver

Webonaver (Webometrics Tool for Naver)

The purpose of this paper is to introduce the API-based webometrics tool created for the Korean search engine Naver.

This non-commercial software is designed to collect large amounts of data automatically and to distinguish easily between different types of information on the web, which was not possible before.

(Image Source: Newsweek, 5 Nov 2007)

Page 3: Understanding WeboNaver

Rationale for Naver

• “Republic of Naver” (Kim & Sohn, 2007)

• “Korea is a great laboratory of the digital age.” (Eric Schmidt, CEO of Google, 30 May 2007)

• “Korea’s Naver is now the world’s 5th search service provider, behind Google, Yahoo, Baidu and Microsoft.” (The AP, 9 Oct 2007)

(Image Source: Newsweek, 5 Nov 2007)

Page 4: Understanding WeboNaver

Rationale for Naver

• “Google left behind as Koreans Naver-gate the internet” (Financial Times, 2 Jan 2008)

• “In South Korea, people who want to look something up on the internet don’t ‘Google it’. Instead they ‘ask Naver’.” (Economist, 30 Feb 2009)

• Lee, Yeon-Ok and Park, H. W. (2008). "The Importance of Search Engines in Digital News Consumption: A Comparative Study Between South Korea and the UK". Refereed paper presented at the workshop “Gatekeepers in a Digital Asian-European Media Landscape: The rising structural power of Internet search engines” (2008).

Page 5: Understanding WeboNaver

Components of Naver

[Screenshot of the Naver main page. Labelled components: log-in; article titles (changing automatically); the press linked; today’s issues; quick menu; browser window]

Page 6: Understanding WeboNaver

Terms of Use

Page 7: Understanding WeboNaver

Interface

The interface is fairly self-explanatory:

- Tick or untick the box to collect either only the hit number or the title, URL, and description of the results

- Select which of the search options you want to include

- Click on the '...' button to select the text file that contains the queries you wish to run

- Click 'Run Queries'

Page 8: Understanding WeboNaver

Getting an API key

URL: http://dev.naver.com/openapi/register

* This page requires login.

Page 9: Understanding WeboNaver

Search query limit

API Key

Search query limit per day: 25,000
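A daily cap of 25,000 queries means that a long query list may have to be spread over several days. The following is a minimal sketch of that bookkeeping only. The helper name `split_into_daily_batches` is hypothetical, and the cost model (one API call per query per selected search option) is an assumption, not documented WeboNaver behavior.

```python
DAILY_QUERY_LIMIT = 25_000  # per-day limit stated on the slide


def split_into_daily_batches(queries, options_selected=1, limit=DAILY_QUERY_LIMIT):
    """Group queries so that each batch stays within the daily API limit.

    Assumption: every query costs one API call per selected search option.
    """
    if options_selected < 1:
        raise ValueError("options_selected must be at least 1")
    per_day = limit // options_selected  # queries that fit into one day
    if per_day == 0:
        raise ValueError("too many search options for the daily limit")
    return [queries[i:i + per_day] for i in range(0, len(queries), per_day)]


# Illustrative workload: 60,000 queries with 4 search options ticked.
batches = split_into_daily_batches(
    [f"query {n}" for n in range(60_000)], options_selected=4
)
print(len(batches), "days needed; last batch holds", len(batches[-1]), "queries")
```

Each batch can then be saved as its own input file and run on a separate day.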

Page 10: Understanding WeboNaver

Input file

The *.txt file must be encoded in UTF-8.
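Because the input file has to be UTF-8, it can help to write and check it programmatically rather than rely on a text editor's default encoding. This is a rough sketch, assuming one query per line and a file named `queries.txt` (both assumptions; the slides only say that the text file contains the queries). Whether WeboNaver accepts a byte-order mark is not stated, so none is written here.

```python
import tempfile
from pathlib import Path


def write_query_file(path, queries):
    """Write one query per line as UTF-8, skipping blank entries."""
    lines = [q.strip() for q in queries if q.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical location; the three queries are the H1N1 terms used in the slides.
query_file = Path(tempfile.gettempdir()) / "queries.txt"
write_query_file(query_file, ["신종플루", "신종 인플루엔자", "신종인플루엔자"])
print(is_valid_utf8(query_file))
```

A file saved in a legacy Korean encoding such as CP949 would fail this check and should be re-saved as UTF-8 before it is loaded.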

Page 11: Understanding WeboNaver

Output file

When the box is ticked to collect only the hit number, the output shows the number of web pages containing each search query.

Page 12: Understanding WeboNaver

Pilot comparison between API and manual web page search

The comparison was done using randomly selected Korean politicians, as of 12 Aug 2009.

Page 13: Understanding WeboNaver

Pilot comparison between API and manual search

• There are always differences between the API’s results and the normal search results, but these differences are minuscule compared with those found for the APIs of Google, Yahoo and Bing.
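One simple way to quantify such a comparison is the relative difference between the API hit count and the manually observed hit count for the same query. The sketch below is only an illustration: the function name is hypothetical and the counts are placeholders, not the figures from the pilot study.

```python
def relative_difference(api_hits, manual_hits):
    """Return |api - manual| as a fraction of the manual hit count."""
    if manual_hits == 0:
        return 0.0 if api_hits == 0 else float("inf")
    return abs(api_hits - manual_hits) / manual_hits


# Placeholder counts for illustration only; these are not the study's data.
samples = {"politician A": (1040, 1000), "politician B": (495, 500)}
for name, (api_hits, manual_hits) in samples.items():
    print(f"{name}: {relative_difference(api_hits, manual_hits):.1%}")
```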

Page 14: Understanding WeboNaver

Output file

When the box is ticked to parse the results, the output contains the title, URL, and description of each result (except for image results, which do not have them), together with the number of web pages containing the search query.

Page 15: Understanding WeboNaver

Issues in multiple hits

The multiple hit counts reflect the numbers displayed on the result pages of Naver. This is not a program error: Webonaver shows the different numbers of results that Naver actually reports. The last hit count shows the results that Webonaver can actually extract. This is not unique to Naver. Other search engines, such as Yahoo, also display different results on each web page.
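When several hit counts are recorded for one query, the point above suggests treating the last one as the usable figure and noting whether the counts changed along the way. A minimal sketch, assuming the counts have already been read into a list in the order they were reported (the function name and the numbers are hypothetical):

```python
def summarize_hit_counts(counts_by_page):
    """Return (final_count, varied) for hit counts recorded page by page."""
    if not counts_by_page:
        raise ValueError("no hit counts recorded")
    final_count = counts_by_page[-1]           # last count: what can actually be extracted
    varied = len(set(counts_by_page)) > 1      # True if the counts changed between pages
    return final_count, varied


# Placeholder values for illustration only.
final_count, varied = summarize_hit_counts([1520, 1498, 1498, 1473])
print(final_count, varied)
```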

Page 16: Understanding WeboNaver

Issues in multiple hits: example

Page 17: Understanding WeboNaver

Issues in multiple hits: example

Page 18: Understanding WeboNaver

Issues in multiple hits: example

Page 19: Understanding WeboNaver

Output file

The Naver API does not provide the real URL. Instead, it supplies a Naver API URL. Users can still visit the actual site by entering the API-based URL in the browser window of Naver.

Page 20: Understanding WeboNaver

Ethical issues

There are programming and time issues, but there are also issues regarding how the service provider responds.

Naver is more protective about handing out API keys than Yahoo or Bing, which provide Open APIs.

It is important that services are not abused.

Page 21: Understanding WeboNaver

Korean National Assembly Members’ web presence

• The web presence of members of the 18th Korean National Assembly was collected automatically with WeboNaver at one-week intervals between 27 August and 24 September.

• The web presence of 20 members, covering blog, scholar, news and web document results, is visualized.
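The weekly collection points for this case study can be listed with a few lines of code. This is a sketch only: the year 2009 is an assumption (the slides give the day and month but not the year), and `weekly_dates` is a hypothetical helper.

```python
from datetime import date, timedelta


def weekly_dates(start, end):
    """Return every date from start to end (inclusive) in one-week steps."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(weeks=1)
    return dates


# Assumed year: 2009.
schedule = weekly_dates(date(2009, 8, 27), date(2009, 9, 24))
for day in schedule:
    print(day.isoformat())
```

Under that assumption the schedule contains five collection dates, with the last one falling exactly on 24 September.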

Page 22: Understanding WeboNaver


Page 23: Understanding WeboNaver

Web presence of the term H1N1

• The web presence of the term H1N1 was examined using Webonaver. We tested the usability and reliability of this tool.

Queries:

- 신종플루 (A virus subtype H1N1)

- 신종 인플루엔자 (Influenza A virus subtype H1N1)

- 신종인플루엔자 (Influenza A virus subtype H1N1)

• With WeboNaver, a term written with a space and the same term written without a space return the same results.

• However, the tool does not treat similar words as the same term. Users should decide which specific data they want to extract before using this tool.
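Since spaced and unspaced spellings of a term give the same results while merely similar words do not, it can be useful to group a query list by its whitespace-free form before running it. The sketch below does only that grouping; `group_by_spacing` is a hypothetical helper and does not call Naver.

```python
from collections import defaultdict


def group_by_spacing(queries):
    """Group queries that differ only in whitespace under one key."""
    groups = defaultdict(list)
    for query in queries:
        key = "".join(query.split())  # drop all whitespace
        groups[key].append(query)
    return dict(groups)


groups = group_by_spacing(["신종플루", "신종 인플루엔자", "신종인플루엔자"])
for key, variants in groups.items():
    print(key, variants)
```

Here the two spellings of 신종인플루엔자 fall into one group, while 신종플루 stays separate and would need its own query.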

Page 24: Understanding WeboNaver


Page 25: Understanding WeboNaver

Conclusion

• API-based search tools such as WeboNaver enable researchers to capture massive web data automatically and systematically.

• This software can help researchers improve the efficiency of data analysis within a specified timeframe.

• Web data gathered using this tool can also be visualized to describe online trends.

Page 26: Understanding WeboNaver

Reference (Web APIs)

• Mayr, Philipp; Tosques, Fabio (2005). Google Web APIs - An instrument for webometric analyses? pp. 677-678. In: Ingwersen, Peter; Larsen, Birger (eds.): 10th International Conference of the International Society for Scientometrics and Informetrics. Stockholm (Sweden). http://www.ib.hu-berlin.de/~mayr/arbeiten/ISSI2005_Mayr_Toques.pdf

• The project site with the API demos is currently offline, but the demos are expected to be restored soon. http://bsd119.ib.hu-berlin.de/%7Eft/

Page 27: Understanding WeboNaver

This tool is publicly available. Please check the WWI official website: http://www.hanpark.net → software