Measurement techniques and case studies for the characterization of Internet applications

Date

2006-07-13T10:33:41Z

Abstract

This thesis characterizes the two current killer applications of the Internet: the World Wide Web (WWW) and Peer-to-Peer (P2P) file sharing. With advances in network technology and a radical reduction in the cost of Internet connectivity, the Internet is growing rapidly in number of users, available content, and network traffic. Given the huge amount of available data, developing algorithms that efficiently locate desired information is a difficult research task. Characterizing these two applications enables the design and evaluation of novel search algorithms, and the two characterizations constitute the key contributions of this work.

As its first contribution, this thesis provides a synthetic workload model for the query behavior of peers in P2P file sharing systems, which can be used for evaluating new P2P system designs. Whereas previous work has focused solely on aggregate workload statistics, this thesis presents a characterization of individual peer behavior in a form that can be used for constructing representative synthetic workloads. The characterization is based on a comprehensive 40-day measurement study of the Gnutella P2P file sharing system comprising more than 10 GBytes of trace data. As a key feature, the characterization distinguishes between user behavior and queries that are generated automatically by the client software. The analysis of the measured data exposes heterogeneous behavior across different days, geographical regions, and periods of the day. Moreover, taking additional correlations among the workload measures into account allows the generation of realistic workloads.

As its second contribution, this thesis characterizes and models the structural properties of German Web sites to enable their automated classification. These structural properties encompass the size, the organization, the composition of URLs, and the link structure of Web sites; the approach is therefore independent of the content of Web pages. In contrast to previous work, this thesis characterizes structural properties of entire Web sites instead of individual Web pages. The measurement study is based on more than 2,300 Web sites comprising 11 million crawled pages, categorized into five major classes: Brochure, Listing, Blog, Institution, and Personal. As a key insight that can be exploited to improve Internet search engines and Web directories, this thesis reveals significant correlations between the structural properties and the class of a Web site.

Keywords

Internet measurement, P2P workload characterization, Website characterization
