Presented By: Somnath Mazumdar [email protected] https://www.csi.ucd.ie/users/somnath-mazumdar

Weblog analysis

DESCRIPTION

This presentation gives an overview of web log analysis tools.

Page 2: Weblog analysis

• Introduction
• Pros & Cons of Methods
• AWStats
• Google Analytics
• AWStats vs Google Analytics
• Packet Sniffing
• Approach
• Conclusion

Page 3: Weblog analysis

• Weblogs: activity/transaction information recorded by web servers.
• Earlier, weblogs were used mainly to count visitors.
• Web analysis: off-site and on-site.
• On-site information retrieval:
  1. Page tagging
  2. Historical web data analysis
• Usages:
  1. Performance
  2. Security
  3. Prediction (regression/CART)
  4. Reporting and profiling:
     4.1. Web statistics
     4.2. Business analytics (K-means, MC)
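To make "activity/transaction information of web servers" concrete, here is a minimal sketch that parses one access-log line. It assumes the server writes the common "combined" log format; the sample line, `LOG_PATTERN` and `parse_line` are illustrative names and data, not taken from the slides.

```python
import re
from datetime import datetime

# Assumed input: one line in the "combined" access-log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" size means no body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


# Made-up sample line for illustration only.
SAMPLE = (
    '203.0.113.7 - alice [10/Oct/2012:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/start" "Mozilla/5.0"'
)

record = parse_line(SAMPLE)
print(record["host"], record["path"], record["status"], record["size"])
```

Each parsed record carries the visitor address, timestamp, requested path, status code, bytes sent, referrer and user agent, which is the raw material for the usages listed on this slide.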

Page 4: Weblog analysis

• Pros (page tagging):
  1. Accuracy: data comes from the end user.
  2. Speed of data reporting
  3. Data collection flexibility
  4. No need for your own web server
• Cons:
  1. Users or firewalls can block the tag.
  2. Every page must be tagged.
  3. Cannot report on non-page hits.
  4. Unable to track bandwidth, server response time or completed downloads.
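As a rough illustration of what a page tag reports, the sketch below decodes a tracking "beacon" URL such as a tag might request from a collection server. The host `stats.example.com`, the `/collect` path and the parameter names `page`, `ref` and `res` are all hypothetical; real tagging services define their own.

```python
from urllib.parse import parse_qs, urlsplit


def parse_beacon(url):
    """Decode the query string of a beacon URL into a flat dict."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in query.items()}


# Hypothetical beacon sent by a tagged page.
beacon = (
    "https://stats.example.com/collect"
    "?page=%2Fpricing&ref=https%3A%2F%2Fexample.com%2F&res=1280x800"
)

hit = parse_beacon(beacon)
print(hit)
```

The data comes from the visitor's browser, which is why accuracy is listed as a pro, and also why a blocked tag or an untagged page produces no record at all.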

Page 5: Weblog analysis

• Pros (log file analysis):
  1. Non-invasive data collection
  2. Can track bandwidth and completed downloads
  3. Helps to optimize for search engines
  4. Securely captures HTTP user names
  5. Can track "spiders" or robots

Page 6: Weblog analysis

  6. Exact content delivery information
  7. Website content time-to-serve
  8. Information on missing or broken pages
• Cons:
  1. Proxy/caching inaccuracies
  2. No event (JavaScript, Flash or AJAX) tracking
  3. Log management: log generation, log storage and log file transfer
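Several of the pros above (bandwidth, broken pages, robots) reduce to simple aggregation over parsed log records. A minimal sketch, assuming each record is a dict with `path`, `status`, `size` and `agent` keys; the `BOT_MARKERS` list and the sample records are illustrative, and real robot detection needs a much longer list.

```python
from collections import Counter

# Substrings treated as crawler markers; illustrative, not exhaustive.
BOT_MARKERS = ("bot", "spider", "crawler")


def summarize(records):
    """Total bytes served, 404 counts per path, and hits from likely robots."""
    total_bytes = 0
    broken = Counter()
    robot_hits = 0
    for rec in records:
        total_bytes += rec["size"]
        if rec["status"] == 404:
            broken[rec["path"]] += 1
        agent = rec["agent"].lower()
        if any(marker in agent for marker in BOT_MARKERS):
            robot_hits += 1
    return {"bytes": total_bytes, "broken": dict(broken), "robot_hits": robot_hits}


# Made-up records standing in for parsed log lines.
records = [
    {"path": "/index.html", "status": 200, "size": 2326, "agent": "Mozilla/5.0"},
    {"path": "/old.html", "status": 404, "size": 512, "agent": "Mozilla/5.0"},
    {"path": "/robots.txt", "status": 200, "size": 68, "agent": "Googlebot/2.1"},
    {"path": "/old.html", "status": 404, "size": 512, "agent": "ExampleSpider/1.0"},
]

report = summarize(records)
print(report)
```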

Page 7: Weblog analysis

• Goal: system based or product based
• Cost: freeware or commercial
• Storage: log storage (3rd party)
• Report/tips: generate static or real-time reports with tips.

AWStats is a powerful log analyzer that creates advanced web, FTP, mail and streaming server statistics reports.

Google Analytics provides in-depth product marketing information and tips (Google AdWords/AdSense).

Page 8: Weblog analysis

• Freeware
• Graphically presented reports
• Customizable reports
• Reports based on users, OS, browser, location, data transfer, bookmarks, total visits and so on
• Standard and custom log formats supported
• Works from the CLI as well as a CGI (flexibility)
• Written in Perl
• Many desired features
• But less visual/interactive than Google Analytics (GA)

Page 9: Weblog analysis

• Issues:
  1. DNS lookup and full-year view (time).
  2. Database format: the "xml" format is 3 times larger than the default.
  3. The feature that excludes records from SPAM referrers is 5 times slower.
  4. Differentiating URLs of dynamic pages (memory).
  5. Accuracy hampers speed: Keywords (1%), Search Engines (9%), Worms Detection (15%), OS (2%).
  6. Each extra section reduces AWStats speed by 8%. A wrong setup may consume all memory.
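The percentages above can be combined into a rough estimate of how much a given setup slows AWStats down. The sketch below assumes the penalties compound multiplicatively, which the slide does not state; the names `PENALTIES` and `relative_speed` are illustrative.

```python
# Slowdowns quoted on the slide, as the fraction of speed lost per feature.
PENALTIES = {
    "keywords": 0.01,
    "search_engines": 0.09,
    "worms": 0.15,
    "os": 0.02,
}
EXTRA_SECTION_PENALTY = 0.08


def relative_speed(features, extra_sections=0):
    """Estimated speed relative to a baseline of 1.0.

    Assumption: each penalty multiplies the remaining speed.
    """
    speed = 1.0
    for name in features:
        speed *= 1.0 - PENALTIES[name]
    speed *= (1.0 - EXTRA_SECTION_PENALTY) ** extra_sections
    return speed


# Worms detection plus two extra sections: 0.85 * 0.92 * 0.92, about 0.72.
print(round(relative_speed(["worms"], extra_sections=2), 2))
```

Under this assumption, enabling worms detection and two extra sections leaves roughly 72% of the baseline speed.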

Page 10: Weblog analysis

• Session "unknown"
• AWStats counts everything as pages
• Reports cannot be generated based on the current/custom date
• Reports cannot be generated based on a custom date range or on a weekly basis.
• On a few Intel Pentium4 / Xeon4 based host systems, log file time cannot be computed correctly.

Page 12: Weblog analysis

• "Google Analytics shows you how people found your site, how they explored it, and how you can enhance their visitor experience." (Google)
• Free
• Helps visitors by providing better keyword search
• Provides information related to website design
• Tagging: automatic for content management systems or blogging platforms, but manual for customized websites
• Confidentiality: third-party data processing

Page 14: Weblog analysis

Name            | AWStats                    | Google Analytics
Based on logs   | Yes                        | Site Search data
Page Tagging    | No                         | Yes
Hits count      | Count everything as page   | IP address and cookies
Confidentiality | Not an issue               | Issue (if not owner)
Meant for       | Website traffic analysis   | Website traffic and marketing effectiveness
Market Share    | NA                         | Around 49.95% of top 1,000,000 hosts

Page 15: Weblog analysis

• The power of analysis is limited by the information in the logs.
• Extensive logging consumes resources.

...the more we measure, the less accurately we understand...

Results from AWStats, Webalizer and Google Analytics always differ because they use different techniques.

Use AWStats together with Google Analytics for better prediction.

Page 17: Weblog analysis

• A packet sniffer can capture and decode data streams passing over a digital network.
• Non-intrusive technology: no log, no page tag.
• Deploy the sniffer into the local network of the servers to be tracked.
• Completely transparent for the tracked website(s)
• Supports multiple servers without affecting server response time.

Figure: Block Diagram of Packet Sniffing
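The "decode" step can be sketched as follows: given the raw bytes of one captured, unencrypted HTTP request, recover the request line and headers. This is an illustration only; the function name and sample payload are made up, and a real sniffer would also have to capture packets and reassemble TCP streams, which is not shown here.

```python
def decode_http_request(payload: bytes):
    """Parse the head of a plain-text HTTP request; return None if it is not one."""
    head, _, _ = payload.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    method, path, version = parts
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            # Header names are case-insensitive; normalise to lower case.
            headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "version": version, "headers": headers}


# Made-up payload standing in for bytes captured off the wire.
captured = (
    b"GET /logo.png HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"User-Agent: Mozilla/5.0\r\n"
    b"\r\n"
)

request = decode_http_request(captured)
print(request["method"], request["path"], request["headers"]["host"])
```

Because the request for `/logo.png` is visible on the wire, non-page hits are captured without any tag or server log.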

Page 19: Weblog analysis

• Information on client communication disconnects
• Server-side timing information
• Website content delivery information
• Full spectrum of hits, including non-pages
• Copes with proxy or browser caching
• Data on robots and automated agents available
• Website content time-to-serve
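Time-to-serve can be derived from packet timestamps by pairing each request with the first response seen on the same connection. A minimal sketch, assuming the sniffer has already produced `(timestamp, connection, kind, path)` tuples; that event layout and the sample values are hypothetical.

```python
def time_to_serve(events):
    """Pair each request with the next response on the same connection.

    Returns a list of (path, seconds) tuples in the order responses arrive.
    """
    pending = {}
    durations = []
    for timestamp, connection, kind, path in events:
        if kind == "request":
            pending[connection] = (timestamp, path)
        elif kind == "response" and connection in pending:
            started, requested_path = pending.pop(connection)
            durations.append((requested_path, round(timestamp - started, 6)))
    return durations


# Made-up events: timestamps in seconds, "c1"/"c2" identify connections.
events = [
    (0.000, "c1", "request", "/index.html"),
    (0.010, "c2", "request", "/logo.png"),
    (0.045, "c1", "response", None),
    (0.130, "c2", "response", None),
]

durations = time_to_serve(events)
print(durations)
```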
