Relevance and Quality of Health Information on the Web
Tim Tang
DCS Seminar
October 2005
Outline
• Motivation and aims
• Experiments and results
  – Domain-specific vs. general search
  – A quality-focused crawler
• Conclusion and future work
Why health information on the Web?
• The Internet is a free medium
• There is high user demand for health information
• Health information on the Web varies widely in quality
• Incorrect health advice is dangerous
Problems
• The usual definition of relevance: topical relevance
• The usual way to search: word matching

Q: Are these sufficient for health information?
A: Not entirely; we also need quality, i.e., the usefulness of the information
Problem: Quality of health info
Search results contain health information of widely varying quality.
Wrong advice
Dangerous information
Dangerous information
Dangerous information
Problem: Commercial sites
Health information for commercial purposes
Commercial promotion
Problem: Types of search engine
The difference between domain-specific search and general-purpose search.
Querying BPS
Querying Google: Irrelevant information
Problem of domain-specific portals
Domain-specific portals may be good, but they often require intensive effort to build and maintain (discussed further in Experiment 2).
Aims
• To analyse the relative performance of domain-specific and general-purpose search engines
• To discover how to provide effective domain-specific search, particularly in the health domain
• To automate the quality assessment of medical web sites
Two experiments
• First: compare search results for health information between general-purpose and domain-specific engines
• Second: build and evaluate a quality-focused crawler for a health topic
The First Experiment
A comparison of the relative performance of general-purpose search engines and domain-specific search engines
In Journal of Information Retrieval ’05 – Special Issue
With Nick Craswell, Dave Hawking, Kathy Griffiths and Helen Christensen
Domain-specific vs. general engines
• General search engines: Google, Yahoo, MSN Search, …
• Domain-specific engines: a search service for scientific papers, a search service for health, or one for a single topic within the health domain
• A depression portal: BluePages (http://bluepages.anu.edu.au)
BluePages Search (BPS)
BPS result list
Engines
– Google
– GoogleD (Google with “depression” added to the query)
– BPS
– 4sites (4 high-quality depression sites)
– HealthFinder (HF): the search service of the HealthFinder health portal
– HealthFinderD (HFD): HF with “depression” added to the query
Queries
• 101 queries about depression:
  – 50 treatment queries suggested by domain experts
  – 51 non-treatment queries collected from two query logs: a domain-specific log and a general log
• Examples:
  – Treatment queries: acupuncture, antidepressant, chocolate
  – Non-treatment queries: depression symptoms, clinical depression
Experiment details
• Run the 101 queries on the 6 engines
• For each query, collect the top 10 results from each engine
• Research assistants judged all results for degree of relevance and whether the advice would be recommended
• Relevance and quality were then compared across all engines
Results
Engine    Relevance   Quality
GoogleD   0.407       78
BPS       0.319       127
4sites    0.225       143
Google    0.195       28
HFS       0.0756      0
Findings
• Google performed well on neither relevance nor quality
• GoogleD retrieved more relevant pages, but fewer high-quality pages
• 4sites and BPS provided good quality but poor coverage

It is important to have a domain-specific portal that provides both high quality and high coverage. How can coverage be improved?
Experiment 2
Building a high-quality domain-specific portal using focused crawling techniques
In CIKM ’05
With Dave Hawking, Nick Craswell and Kathy Griffiths
A Quality Focused Crawler
• Why?
  – The first experiment showed that quality can be achieved using domain-specific portals
  – The current method for building such a portal is expensive
  – Focused crawling may be a good way to build a health portal with high coverage while reducing human effort
The problems of BPS
• Domain experts spent two weeks manually judging health sites to decide what to include
• Only 207 web sites were included, i.e., many useful web pages were left out
• Maintenance is tedious: web pages change, cease to exist, new pages appear, etc.
• As the first experiment showed: high quality but quite low coverage
Focused Crawling (FC)
• Designed to selectively fetch content relevant to a specified topic of interest using the Web’s hyperlink structure.
• Examples of topics: sport, health, cancer, or scientific papers, etc.
FC Process
The crawler repeatedly dequeues a URL from the URL frontier, downloads the page, and extracts its links. For each extracted link, {URL, link info} is passed to a classifier, which enqueues {URL, score} back onto the frontier.
Link info = anchor text, URL, source page’s content, and so on.
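A minimal sketch of this loop in Python, assuming hypothetical download(), extract_links() and classify() helpers that stand in for the real components:

import heapq

def focused_crawl(seeds, classify, download, extract_links, max_pages=10000):
    # URL frontier as a max-heap via negated scores; seeds get top priority
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen, crawled = set(seeds), []
    while frontier and len(crawled) < max_pages:
        _, url = heapq.heappop(frontier)        # dequeue the best-scored URL
        page = download(url)                    # hypothetical fetcher
        if page is None:
            continue
        crawled.append(url)
        for link in extract_links(page):        # link carries URL + link info
            if link.url not in seen:
                seen.add(link.url)
                # the classifier scores the link from its link info; enqueue
                heapq.heappush(frontier, (-classify(link), link.url))
    return crawled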
FC: simple example
• Crawling pages about psychotherapy
Relevance prediction
• Anchor text: the text appearing in the hyperlink
• Text around the link: 50 bytes before and 50 bytes after the link
• URL words: words obtained by parsing the URL address
Relevance Indicators
• URL: http://www.depression.com/psychotherapy.html
  => URL words: depression, com, psychotherapy
• Anchor text: psychotherapy
• Text around the link:
  – 50 bytes before: section, learn
  – 50 bytes after: talk, therapy, standard, treatment
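A minimal sketch of extracting these three indicators for a single link; the exact tokenisation rules here are assumptions:

import re

def link_features(source_text, link_start, link_end, anchor_text, url):
    # URL words: split the URL on non-alphanumeric characters (this also
    # yields tokens like 'http' and 'html', which the real system may drop)
    url_words = [w.lower() for w in re.split(r"[^A-Za-z0-9]+", url) if w]
    # Text around the link: 50 bytes before and 50 bytes after the hyperlink
    before = source_text[max(0, link_start - 50):link_start]
    after = source_text[link_end:link_end + 50]
    return {
        "anchor": anchor_text.lower().split(),
        "url_words": url_words,
        "before": re.findall(r"[a-z]+", before.lower()),
        "after": re.findall(r"[a-z]+", after.lower()),
    }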
Methods
• Machine learning approach: train and test classifiers on relevant and irrelevant URLs using the features above
• Evaluated several learning algorithms: k-nearest neighbor, Naïve Bayes, C4.5, Perceptron
• Result: the C4.5 decision tree was the best at predicting relevance
• The same method was applied to predict quality, but it was not successful
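A rough sketch of the learning setup. scikit-learn has no C4.5, so its CART-style DecisionTreeClassifier stands in; the bag-of-words encoding of the link features is also an assumption:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def train_relevance_classifier(feature_dicts, labels):
    # feature_dicts: output of link_features(); labels: 1 = relevant URL
    docs = [" ".join(f["anchor"] + f["url_words"] + f["before"] + f["after"])
            for f in feature_dicts]
    vec = CountVectorizer(binary=True)
    X = vec.fit_transform(docs)
    clf = DecisionTreeClassifier()
    print("5-fold CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
    return vec, clf.fit(X, labels)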
Quality prediction
• Using evidence-based medicine, and
• Using the Relevance Feedback (RF) technique
Evidence-based Medicine
• Interventions whose effectiveness is supported by a systematic review of the evidence
• Examples of effective treatments for depression:
  – Antidepressants
  – ECT (electroconvulsive therapy)
  – Exercise
  – Cognitive behavioral therapy
• These treatments were divided into single-word and two-word terms
Relevance Feedback
• A well-known IR approach: query by example
• Basic idea: run an initial query, obtain feedback from users about which documents are relevant, then add words from those relevant documents to the query
• Goal: add terms to the query in order to retrieve more relevant results
RF Algorithm
• Identify the N top-ranked documents
• Identify all terms from these documents
• Select the terms with the highest weights
• Merge these terms with the original query
• Identify the new top-ranked documents for the new query
(Usually, 20 terms are added in total)
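A minimal sketch of the expansion step, assuming document frequency within the top-N results as the term weight (real RF systems use variants such as Rocchio or Robertson term-selection weights):

from collections import Counter

def expand_query(query_terms, top_docs, n_new_terms=20):
    # top_docs: token lists for the N top-ranked documents
    weights = Counter()
    for doc in top_docs:
        for term in set(doc):              # count each term once per document
            if term not in query_terms:
                weights[term] += 1
    return list(query_terms) + [t for t, _ in weights.most_common(n_new_terms)]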
Our Modified RF approach
• For quality, not relevance
• Not only single terms, but also phrases
• Generate a list of single terms and two-word phrases with their associated weights
• Select the top-weighted terms and phrases
• Cut the ranked list off at the lowest-ranked term that appears in the evidence-based treatment list
• The resulting 20 phrases and 29 single words form a ‘quality query’ (sketched below)
Terms representing the topic “depression”

Term               Weight
depression         13.3
health             6.9
treatment          5.7
mental             5.4
patient            3.3
medication         3.0
ECT                2.4
antidepressants    1.9
mental health      1.2
cognitive therapy  0.84
Predicting Quality
• For downloaded pages, a quality score (QScore) is computed using a modification of the BM25 formula that takes the term weights into account
• The quality of a page is then predicted from the quality of all downloaded pages linking to it
(Assumption: good pages are usually inter-connected)
• Predicted quality score of a page with n downloaded source pages:

PScore = (Σ QScore of the n source pages) / n
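A rough sketch of the two scores. The exact BM25 modification is not spelled out here, so this version simply substitutes the quality-query term weight for BM25’s IDF; phrase matching is omitted for brevity:

def qscore(page_tokens, quality_query, avg_len, k1=1.2, b=0.75):
    # quality_query: {term: weight}; page_tokens: tokenised page text
    score, dl = 0.0, len(page_tokens)
    for term, weight in quality_query.items():
        tf = page_tokens.count(term)
        if tf:
            norm = k1 * (1 - b + b * dl / avg_len)  # BM25 length normalisation
            score += weight * tf * (k1 + 1) / (tf + norm)
    return score

def pscore(source_qscores):
    # mean QScore of the downloaded pages that link to the target page
    return sum(source_qscores) / len(source_qscores) if source_qscores else 0.0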
Combining relevance and quality
• We need a way of balancing relevance and quality
• Combining quality and relevance scores is new
• Our method uses the product of the two scores
• Other ways of combining the scores will be explored in future work
• The quality-focused crawler relies on this combined score to order the crawl queue (sketched below)
The Three Crawlers
• A Web crawler (spider):
  – A program that browses the WWW in a methodical, automated manner
  – Usually used by a search engine to index web pages in order to provide fast searches
• We built three crawlers (their queue-ordering policies are sketched below):
  – The Breadth-first crawler: traverses the link graph in FIFO order (serves as the baseline for comparison)
  – The Relevance crawler: orders the crawl queue using the C4.5 decision tree
  – The Quality crawler: targets both relevance and quality, ordering the crawl queue using the combination of the C4.5 decision tree and the RF technique
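A sketch of the three queue-ordering policies, reusing pscore() from the earlier sketch; clf is the trained classifier and vectorize is assumed to turn a link’s features into the same bag-of-words row used at training time:

import itertools

_tick = itertools.count()                   # monotonically increasing counter

def breadth_first_priority(link):
    return -next(_tick)                     # FIFO: earlier-seen links first

def relevance_priority(link, clf, vectorize):
    # probability that the link target is relevant (class 1)
    return clf.predict_proba(vectorize(link))[0, 1]

def quality_priority(link, clf, vectorize, source_qscores):
    # product of predicted relevance and predicted quality (PScore)
    return relevance_priority(link, clf, vectorize) * pscore(source_qscores)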
Results
Relevance
Relevance Results
• The relevance and quality crawls each stabilised after 3,000 pages, at 80% and 88% relevance respectively
• The BF crawl continued to degrade over time, falling to 40% relevance at 10,000 pages
• The quality crawler outperformed the relevance crawler thanks to the incorporation of the RF quality scores
Quality
High quality pages
AAQ = Above Average Quality: top 25%
Low quality pages
BAQ = Below Average Quality: bottom 25%
Quality Results
• The quality crawler performed significantly better than the relevance crawler (50% better towards the end of the crawl)
• All the crawls did well at fetching high-quality pages; the quality crawler performed best, with more than 50% of its pages being of high quality
• Only about 5% of the quality crawl’s pages came from low-quality sites; the BF crawl fetched about three times as many
Findings
• Topical relevance could be predicted well using link anchor context
• Link anchor context could not be used to predict quality
• The relevance feedback technique proved useful for quality prediction
Overall Conclusions
• Domain-specific search engines can offer better-quality results than general search engines
• The current way of building a domain-specific portal is expensive. We have successfully used focused crawling techniques, a relevance decision tree and relevance feedback to build high-quality portals cheaply
Future work
• So far we have experimented with only one health topic. Our plan is to repeat the experiments with another topic, and to generalise the technique to another domain
• Other ways of combining relevance and quality should be explored
• Experiments comparing our quality crawl with other health portals are necessary
• Removing spam from the crawl is another important step