Relevance and Quality of Health Information on the Web
Tim Tang
DCS Seminar
October 2005
Outline
• Motivation and aims
• Experiments and results
  – Domain-specific vs. general search
  – A quality-focused crawler
• Conclusion and future work
Why health information on the Web?
• The Internet is a free medium
• There is high user demand for health information
• Health information on the Web varies widely in quality
• Incorrect health advice is dangerous
Problems
• The usual definition of relevance: topical relevance
• The usual way to search: word matching

Q: Are these sufficient for health information?
A: Not entirely; we also need quality, i.e., the usefulness of the information
Problem: Quality of health info
Search results contain health information of widely varying quality.
Wrong advice
Dangerous information
Dangerous information
Dangerous information
Problem: Commercial sites
Health information for commercial purposes
Commercial promotion
Problem: Types of search engine
The difference between domain-specific search and general-purpose search.
Querying BPS
Querying Google: Irrelevant information
Problem of domain-specific portals
Domain-specific portals may be good, but they often require intensive effort to build and maintain (discussed further in Experiment 2).
Aims
• To analyse the relative performance of domain-specific and general-purpose search engines
• To discover how to provide effective domain-specific search, particularly in the health domain
• To automate the quality assessment of medical web sites
Two experiments
• First: compare search results for health information between general-purpose and domain-specific engines
• Second: build and evaluate a quality-focused crawler for a health topic
The First Experiment
A comparison of the relative performance of general-purpose search engines and domain-specific search engines
In Journal of Information Retrieval ’05 – Special Issue
With Nick Craswell, Dave Hawking, Kathy Griffiths and Helen Christensen
Domain-specific vs. general engines
• General search engines: Google, Yahoo, MSN Search, …
• Domain-specific engines: a search service for scientific papers, a search service for health, or one for a single topic within the health domain
• A depression portal: BluePages (http://bluepages.anu.edu.au)
BluePages Search (BPS)
BPS result list
Engines
– Google
– GoogleD (Google with “depression” added to the query)
– BPS
– 4sites (4 high-quality depression sites)
– HealthFinder (HF): the search service of the HealthFinder health portal
– HealthFinderD (HFD): HF with “depression” added to the query
Queries
• 101 queries about depression:
  – 50 treatment queries suggested by domain experts
  – 51 non-treatment queries collected from two query logs: a domain-specific log and a general log
• Examples:
  – Treatment queries: acupuncture, antidepressant, chocolate
  – Non-treatment queries: depression symptoms, clinical depression
Experiment details
• Run the 101 queries on the 6 engines
• For each query, collect the top 10 results from each engine
• Research assistants judged all results for degree of relevance and whether the advice would be recommended
• Relevance and quality were then compared across all engines
Results
Engine    Relevance   Quality
GoogleD   0.407       78
BPS       0.319       127
4sites    0.225       143
Google    0.195       28
HFS       0.0756      0
Findings
• Google performed well on neither relevance nor quality
• GoogleD retrieved more relevant pages, but fewer high-quality pages
• 4sites and BPS provided good quality but poor coverage

It is important to have a domain-specific portal that provides both high quality and high coverage. How can coverage be improved?
Experiment 2
Building a high-quality domain-specific portal using focused crawling techniques
In CIKM ’05
With Dave Hawking, Nick Craswell and Kathy Griffiths
A Quality Focused Crawler
• Why?
  – The first experiment showed that quality can be achieved using domain-specific portals
  – The current method for building such a portal is expensive
  – Focused crawling may be a good way to build a health portal with high coverage while reducing human effort
The problems of BPS
• Domain experts spent two weeks manually judging health sites to decide what to include
• Only 207 web sites were included, i.e., many useful web pages were left out
• Maintenance is tedious: web pages change, cease to exist, new pages appear, etc.
• As the first experiment showed: high quality but quite low coverage
Focused Crawling (FC)
• Designed to selectively fetch content relevant to a specified topic of interest using the Web’s hyperlink structure.
• Examples of topics: sport, health, cancer, or scientific papers, etc.
FC Process
The crawler repeatedly dequeues a URL from the URL frontier, downloads the page, and extracts its links. For each extracted link, {URL, link info} is passed to a classifier, which enqueues {URL, score} back onto the frontier.
Link info = anchor text, URL, source page’s content, and so on.
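A minimal sketch of this loop in Python, assuming hypothetical download(), extract_links() and classify() helpers that stand in for the real components:

import heapq

def focused_crawl(seeds, classify, download, extract_links, max_pages=10000):
    # URL frontier as a max-heap via negated scores; seeds get top priority
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen, crawled = set(seeds), []
    while frontier and len(crawled) < max_pages:
        _, url = heapq.heappop(frontier)        # dequeue the best-scored URL
        page = download(url)                    # hypothetical fetcher
        if page is None:
            continue
        crawled.append(url)
        for link in extract_links(page):        # link carries URL + link info
            if link.url not in seen:
                seen.add(link.url)
                # the classifier scores the link from its link info; enqueue
                heapq.heappush(frontier, (-classify(link), link.url))
    return crawled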
FC: simple example
• Crawling pages about psychotherapy
Relevance prediction
• Anchor text: the text appearing in the hyperlink
• Text around the link: 50 bytes before and 50 bytes after the link
• URL words: words obtained by parsing the URL address
Relevance Indicators
• URL: http://www.depression.com/psychotherapy.html
  => URL words: depression, com, psychotherapy
• Anchor text: psychotherapy
• Text around the link:
  – 50 bytes before: section, learn
  – 50 bytes after: talk, therapy, standard, treatment
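A minimal sketch of extracting these three indicators for a single link; the exact tokenisation rules here are assumptions:

import re

def link_features(source_text, link_start, link_end, anchor_text, url):
    # URL words: split the URL on non-alphanumeric characters (this also
    # yields tokens like 'http' and 'html', which the real system may drop)
    url_words = [w.lower() for w in re.split(r"[^A-Za-z0-9]+", url) if w]
    # Text around the link: 50 bytes before and 50 bytes after the hyperlink
    before = source_text[max(0, link_start - 50):link_start]
    after = source_text[link_end:link_end + 50]
    return {
        "anchor": anchor_text.lower().split(),
        "url_words": url_words,
        "before": re.findall(r"[a-z]+", before.lower()),
        "after": re.findall(r"[a-z]+", after.lower()),
    }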
Methods
• Machine learning approach: train and test classifiers on relevant and irrelevant URLs using the features above
• Evaluated several learning algorithms: k-nearest neighbor, Naïve Bayes, C4.5, Perceptron
• Result: the C4.5 decision tree was the best at predicting relevance
• The same method was applied to predict quality, but it was not successful
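A rough sketch of the learning setup. scikit-learn has no C4.5, so its CART-style DecisionTreeClassifier stands in; the bag-of-words encoding of the link features is also an assumption:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def train_relevance_classifier(feature_dicts, labels):
    # feature_dicts: output of link_features(); labels: 1 = relevant URL
    docs = [" ".join(f["anchor"] + f["url_words"] + f["before"] + f["after"])
            for f in feature_dicts]
    vec = CountVectorizer(binary=True)
    X = vec.fit_transform(docs)
    clf = DecisionTreeClassifier()
    print("5-fold CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
    return vec, clf.fit(X, labels)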
Quality prediction
• Using evidence-based medicine, and
• Using the Relevance Feedback (RF) technique
Evidence-based Medicine
• Interventions whose effectiveness is supported by a systematic review of the evidence
• Examples of effective treatments for depression:
  – Antidepressants
  – ECT (electroconvulsive therapy)
  – Exercise
  – Cognitive behavioral therapy
• These treatments were divided into single-word and two-word terms
Relevance Feedback
• A well-known IR approach: query by example
• Basic idea: run an initial query, obtain feedback from users about which documents are relevant, then add words from those relevant documents to the query
• Goal: add terms to the query in order to retrieve more relevant results
RF Algorithm
• Identify the N top-ranked documents
• Identify all terms from these documents
• Select the terms with the highest weights
• Merge these terms with the original query
• Identify the new top-ranked documents for the new query
(Usually, 20 terms are added in total)
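A minimal sketch of the expansion step, assuming document frequency within the top-N results as the term weight (real RF systems use variants such as Rocchio or Robertson term-selection weights):

from collections import Counter

def expand_query(query_terms, top_docs, n_new_terms=20):
    # top_docs: token lists for the N top-ranked documents
    weights = Counter()
    for doc in top_docs:
        for term in set(doc):              # count each term once per document
            if term not in query_terms:
                weights[term] += 1
    return list(query_terms) + [t for t, _ in weights.most_common(n_new_terms)]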
Our Modified RF approach
• For quality, not relevance
• Not only single terms, but also phrases
• Generate a list of single terms and two-word phrases with their associated weights
• Select the top-weighted terms and phrases
• Cut the ranked list off at the lowest-ranked term that appears in the evidence-based treatment list
• The resulting 20 phrases and 29 single words form a ‘quality query’ (sketched below)
Terms representing the topic “depression”

Term               Weight
depression         13.3
health             6.9
treatment          5.7
mental             5.4
patient            3.3
medication         3.0
ECT                2.4
antidepressants    1.9
mental health      1.2
cognitive therapy  0.84
Predicting Quality
• For downloaded pages, a quality score (QScore) is computed using a modification of the BM25 formula that takes the term weights into account
• The quality of a page is then predicted from the quality of all downloaded pages linking to it
(Assumption: good pages are usually inter-connected)
• Predicted quality score of a page with n downloaded source pages:

PScore = (Σ QScore of the n source pages) / n
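A rough sketch of the two scores. The exact BM25 modification is not spelled out here, so this version simply substitutes the quality-query term weight for BM25’s IDF; phrase matching is omitted for brevity:

def qscore(page_tokens, quality_query, avg_len, k1=1.2, b=0.75):
    # quality_query: {term: weight}; page_tokens: tokenised page text
    score, dl = 0.0, len(page_tokens)
    for term, weight in quality_query.items():
        tf = page_tokens.count(term)
        if tf:
            norm = k1 * (1 - b + b * dl / avg_len)  # BM25 length normalisation
            score += weight * tf * (k1 + 1) / (tf + norm)
    return score

def pscore(source_qscores):
    # mean QScore of the downloaded pages that link to the target page
    return sum(source_qscores) / len(source_qscores) if source_qscores else 0.0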
Combining relevance and quality
• We need a way of balancing relevance and quality
• Combining quality and relevance scores is new
• Our method uses the product of the two scores
• Other ways of combining the scores will be explored in future work
• The quality-focused crawler relies on this combined score to order the crawl queue (sketched below)
The Three Crawlers
• A Web crawler (spider):
  – A program that browses the WWW in a methodical, automated manner
  – Usually used by a search engine to index web pages in order to provide fast searches
• We built three crawlers (their queue-ordering policies are sketched below):
  – The Breadth-first crawler: traverses the link graph in FIFO order (serves as the baseline for comparison)
  – The Relevance crawler: orders the crawl queue using the C4.5 decision tree
  – The Quality crawler: targets both relevance and quality, ordering the crawl queue using the combination of the C4.5 decision tree and the RF technique
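A sketch of the three queue-ordering policies, reusing pscore() from the earlier sketch; clf is the trained classifier and vectorize is assumed to turn a link’s features into the same bag-of-words row used at training time:

import itertools

_tick = itertools.count()                   # monotonically increasing counter

def breadth_first_priority(link):
    return -next(_tick)                     # FIFO: earlier-seen links first

def relevance_priority(link, clf, vectorize):
    # probability that the link target is relevant (class 1)
    return clf.predict_proba(vectorize(link))[0, 1]

def quality_priority(link, clf, vectorize, source_qscores):
    # product of predicted relevance and predicted quality (PScore)
    return relevance_priority(link, clf, vectorize) * pscore(source_qscores)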
Results
Relevance
Relevance Results
• The relevance and quality crawls each stabilised after 3,000 pages, at 80% and 88% relevance respectively
• The BF crawl continued to degrade over time, falling to 40% relevance at 10,000 pages
• The quality crawler outperformed the relevance crawler thanks to the incorporation of the RF quality scores
Quality
High quality pages
AAQ = Above Average Quality: top 25%
Low quality pages
BAQ = Below Average Quality: bottom 25%
Quality Results
• The quality crawler performed significantly better than the relevance crawler (50% better towards the end of the crawl)
• All the crawls did well at fetching high-quality pages; the quality crawler performed best, with more than 50% of its pages being of high quality
• Only about 5% of the quality crawl’s pages came from low-quality sites; the BF crawl fetched about three times as many
Findings
• Topical relevance could be predicted well using link anchor context
• Link anchor context could not be used to predict quality
• The relevance feedback technique proved useful for quality prediction
Overall Conclusions
• Domain-specific search engines can offer better-quality results than general search engines
• The current way of building a domain-specific portal is expensive. We have successfully used focused crawling techniques, a relevance decision tree and relevance feedback to build high-quality portals cheaply
Future work
• So far we have experimented with only one health topic. Our plan is to repeat the experiments with another topic, and to generalise the technique to another domain
• Other ways of combining relevance and quality should be explored
• Experiments comparing our quality crawl with other health portals are necessary
• Removing spam from the crawl is another important step