Web mining

From Wikipedia, the free encyclopedia

Web mining is the application of data mining techniques to discover patterns from the Web. Depending on what is analysed, web mining is divided into three types: Web usage mining, Web content mining and Web structure mining.

1 Web usage mining
- 1.1 Web Usage Mining Process
2 Web content mining
3 Web structure mining
4 Application Areas of Web Mining
5 Resources

Web usage mining

Web usage mining is the application of data mining to analyse and discover interesting patterns in users' usage data on the web. The usage data records a user's behaviour when the user browses or makes transactions on a web site. The goal is to better understand and serve the needs of users or Web-based applications. It involves the automatic discovery of patterns from one or more Web servers. Organizations often generate and collect large volumes of data; most of this information is generated automatically by Web servers and collected in server logs. Analyzing such data can help these organizations determine the value of particular customers, cross-marketing strategies across products, and the effectiveness of promotional campaigns.

The first web analysis tools simply provided mechanisms to report user activity as recorded in the servers. Using such tools, it was possible to determine information such as the number of accesses to the server, the times or time intervals of visits, and the domain names and URLs of users of the Web server. However, these tools generally provide little or no analysis of data relationships among the accessed files and directories within the Web space. More sophisticated techniques for the discovery and analysis of patterns are now emerging. These tools fall into two main categories: pattern discovery tools and pattern analysis tools.

Web Usage Mining Process

Problem identification
Data collection
Data pre-processing
Pattern discovery and analysis
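
The data collection, pre-processing and pattern discovery steps above can be sketched as follows. This is a minimal illustration rather than a method taken from the literature listed under Resources. It assumes the server writes Common Log Format lines, identifies a user by host address only, and starts a new session after 30 minutes of inactivity. The names `parse_line`, `sessionize`, `page_visit_counts`, `SESSION_TIMEOUT`, `IGNORED_SUFFIXES` and the sample log lines are all hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Common Log Format: host ident user [time] "METHOD path PROTO" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed heuristic, not from the article
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed non-page files


def parse_line(line):
    """Return (host, timestamp, path) for a successful page request, else None."""
    match = LOG_PATTERN.match(line)
    if match is None or not match.group("status").startswith("2"):
        return None
    path = match.group("path")
    if path.lower().endswith(IGNORED_SUFFIXES):
        return None
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("host"), timestamp, path


def sessionize(lines):
    """Pre-processing: group requests per host, split by inactivity gaps."""
    by_host = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            host, timestamp, path = record
            by_host[host].append((timestamp, path))

    sessions = []
    for host, hits in by_host.items():
        hits.sort()
        current = [hits[0][1]]
        for (prev_time, _), (time, path) in zip(hits, hits[1:]):
            if time - prev_time > SESSION_TIMEOUT:
                sessions.append((host, current))
                current = []
            current.append(path)
        sessions.append((host, current))
    return sessions


def page_visit_counts(sessions):
    """Pattern discovery (very simple): how often each page is visited."""
    return Counter(page for _, pages in sessions for page in pages)


SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2006:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2006:13:55:37 +0000] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2006:13:57:02 +0000] "GET /products.html HTTP/1.0" 200 4120',
    '10.0.0.2 - - [10/Oct/2006:14:01:10 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2006:15:30:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
]

if __name__ == "__main__":
    sessions = sessionize(SAMPLE_LOG)
    for host, pages in sessions:
        print(host, pages)
    print(page_visit_counts(sessions))
```

Identifying users by host address is a simplification: proxies and shared addresses mean one host can stand for several users, so real pre-processing usually needs additional signals such as cookies or login data.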

Web content mining

Web content mining is the process of discovering useful information from the content of a web page. Web content may consist of text, image, audio or video data. Web content mining is sometimes called web text mining, because text content is the most widely researched area. The technologies normally used in web content mining are NLP (natural language processing) and IR (information retrieval).
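
As a rough illustration of the text-oriented side of web content mining, the sketch below extracts the visible text of a page and counts term frequencies, a basic building block in information retrieval. It uses only Python's standard library. The class `TextExtractor`, the function `term_frequencies`, the tiny `STOP_WORDS` set and the sample page are assumptions made for this example, and the tokenizer handles only unaccented Latin letters.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


STOP_WORDS = {"the", "a", "of", "and", "to", "is", "in"}  # tiny illustrative list


def term_frequencies(html):
    """Return a Counter of lower-cased words in the page, minus stop words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    tokens = re.findall(r"[a-z]+", text)
    return Counter(token for token in tokens if token not in STOP_WORDS)


SAMPLE_PAGE = (
    "<html><head><title>Web mining</title>"
    "<style>p {color: red}</style></head>"
    "<body><p>Web mining is the mining of the Web.</p></body></html>"
)

if __name__ == "__main__":
    print(term_frequencies(SAMPLE_PAGE).most_common(5))
```

Term counts like these are only a starting point; weighting schemes and linguistic processing from IR and NLP would normally be applied on top.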

Web structure mining

Web structure mining is the process of using graph theory to analyse the node and connection structure of a web site. According to the type of web structural data, web structure mining can be divided into two kinds.

The first kind of web structure mining extracts patterns from hyperlinks in the web. A hyperlink is a structural component that connects a web page to a different location. The other kind of web structure mining is mining the document structure. It uses a tree-like structure to analyse and describe the HTML (HyperText Markup Language) or XML (eXtensible Markup Language) tags within a web page.
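
Both kinds of structure can be illustrated with a short sketch, assuming a handful of well-formed pages held in memory. For hyperlink structure it builds a link graph and counts incoming links per page, which is one simple graph measure among many. For document structure it records an indented outline of the tag tree. The names `StructureParser`, `link_graph`, `in_degree`, `VOID_TAGS` and the `PAGES` sample are hypothetical, and the depth tracking would be unreliable on malformed HTML.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # tags with no closing tag


class StructureParser(HTMLParser):
    """Record outgoing links and an indented outline of the tag tree."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.outline = []
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        self.outline.append("  " * self._depth + tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in VOID_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._depth > 0:
            self._depth -= 1


def link_graph(pages):
    """Map each page name to the list of link targets found in its HTML."""
    graph = {}
    for name, html in pages.items():
        parser = StructureParser()
        parser.feed(html)
        graph[name] = parser.links
    return graph


def in_degree(graph):
    """Count incoming links per page, including pages with none."""
    counts = Counter({page: 0 for page in graph})
    for targets in graph.values():
        counts.update(targets)
    return counts


PAGES = {
    "index.html": '<html><body><a href="a.html">A</a><a href="b.html">B</a></body></html>',
    "a.html": '<html><body><p><a href="b.html">B</a></p></body></html>',
    "b.html": '<html><body><a href="index.html">Home</a></body></html>',
}

if __name__ == "__main__":
    print(dict(in_degree(link_graph(PAGES))))
    parser = StructureParser()
    parser.feed(PAGES["a.html"])
    print("\n".join(parser.outline))
```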

Application Areas of Web Mining

E-commerce
Search engines
Personalisation
Website Design

Resources

Books

Jesus Mena, "Data Mining Your Website", Digital Press, 1999
Soumen Chakrabarti, "Mining the Web: Analysis of Hypertext and Semi Structured Data", Morgan Kaufmann, 2002

Bibliographic references

Cooley, R., Mobasher, B. and Srivastava, J. (1997) “Web Mining: Information and Pattern Discovery on the World Wide Web” In Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence
Cooley, R., Mobasher, B. and Srivastava, J. (1999) “Data Preparation for Mining World Wide Web Browsing Patterns”, Knowledge and Information Systems, Vol. 1, Issue 1, pp. 5-32
Kohavi, R., Mason, L. and Zheng, Z. (2004) “Lessons and Challenges from Mining Retail E-commerce Data” Machine Learning, Vol 57, pp. 83-113
Mobasher, B., Cooley, R. and Srivastava, J. (2000) “Automatic Personalisation based on web usage Mining” Communications of the ACM, Vol. 43, No.8, pp. 142-151
Mobasher, B., Dai, H., Kuo, T. and Nakagawa, M. (2001) “Effective Personalization Based on Association Rule Discovery from Web Usage Data” In Proceedings of WIDM 2001, Atlanta, GA, USA, pp. 9-15
Pierrakos, D., Paliouras, G., Papatheodorou, C., Spyropoulos C. D. (2003) “Web usage mining as a tool for personalization: a survey”, User modelling and user adapted interaction journal, Vol.13, Issue 4, pp. 311-372
Combining ethnographic and clickstream data to identify user Web browsing strategies Paper by Lillian Clark, I-Hsien Ting, Chris Kimble, Peter Wright, Daniel Kudenko in Information Research, Vol. 11 No. 2, January 2006
UBB Mining: Finding Unexpected Browsing Behaviour in Clickstream Data to Improve a Web Site’s Design Paper by I-Hsien Ting, Chris Kimble, Daniel Kudenko.

External links

KDnuggets News and discussion forum for Data Mining, Web Mining and Knowledge Discovery
Data Mining Tutorials, Resources Eruditionhome
Web Mining by Patricio Galeas
Web Mining Example of a Visual Web Mining Tool
Data Mining and Web Mining Books, conferences, white papers, people, training, jobs

Software

YALE (Yet Another Learning Environment) (http://yale.sf.net/): a freely available, integrated open-source software environment for knowledge discovery, data mining, machine learning and visualization, including web mining and text mining. Together with its open-source plugin WordVectorTool, YALE offers a free software environment for many web mining and text mining tasks.
WUM (Web Utilization Miner): an integrated, Java-based Web mining environment for log file preparation, basic reporting, discovery of sequential patterns and visualization.

Related Conference

WebKDD 2006: SIGKDD Workshop on Web Mining and Web Usage Analysis
WebMine 2006: Workshop on Web Mining 2006
