Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions

Jie Peng, H. Muller

Published 2008 in The Annals of Applied Statistics

ABSTRACT

We propose a distance between two realizations of a random process where for each realization only sparse and irregularly spaced measurements with additional measurement errors are available. Such data occur commonly in longitudinal studies and online trading data. A distance measure then makes it possible to apply distance-based analysis such as classification, clustering and multidimensional scaling for irregularly sampled longitudinal data. Once a suitable distance measure for sparsely sampled longitudinal trajectories has been found, we apply distance-based clustering methods to eBay online auction data. We identify six distinct clusters of bidding patterns. Each of these bidding patterns is found to be associated with a specific chance to obtain the auctioned item at a reasonable price.

1. Introduction.

The goal of cluster analysis is to group a collection of subjects into clusters, such that those falling into the same cluster are more similar to each other than those in different clusters. Therefore, a measure of similarity or dissimilarity between subjects is a necessary ingredient for clustering. A metric defined on the subject space is one way to obtain dissimilarities, simply using the distance between two subjects as a measure of dissimilarity. While one can readily choose from a variety of well-known metrics for the case of classical multivariate data, or for functional data that are in the form of continuously observed trajectories, finding a suitable distance measure for irregularly observed data can be a challenge. One such situation which we study here occurs in the commonly encountered case of irregularly and sparsely observed longitudinal data, with online auction data a prominent example [Shmueli and Jank (2005), Jank and Shmueli (2006), Shmueli, Russo and Jank (2007), Liu and Muller (2008)]. As an example, a snapshot of an eBay auction history for a Palm Personal Digital Assistant is shown in Figure 1.
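To illustrate why a distance alone is enough for clustering, the following is a minimal sketch of a distance-based method that only ever reads a pairwise distance matrix. It uses a simple k-medoids iteration, which is a choice made here for illustration and is not claimed to be the clustering method used in the paper. The toy matrix `dist` is built from absolute differences between made-up scalar values. It is only a stand-in for a distance between sparsely observed trajectories, not the distance the paper proposes. The function name `k_medoids` and all data are hypothetical.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster n subjects using only an n x n symmetric distance matrix.

    Illustrative k-medoids iteration (an assumption of this sketch,
    not the paper's procedure). Returns (labels, medoids).
    """
    n = len(dist)
    medoids = list(range(k))  # deterministic start: the first k subjects
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each subject joins the cluster of its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: in each cluster, the new medoid is the member with
        # the smallest total distance to the other members.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # medoids are stable, so the assignment is final
        medoids = new_medoids
    return labels, medoids


# Hypothetical toy data: six subjects summarized by one made-up number each.
values = [0, 1, 2, 10, 11, 12]
# Stand-in dissimilarity: absolute difference between the summaries.
dist = [[abs(a - b) for b in values] for a in values]

labels, medoids = k_medoids(dist, k=2)
print(labels)   # cluster index of each subject
print(medoids)  # index of the representative subject of each cluster
```

The algorithm never touches `values` directly, so any dissimilarity that can be evaluated for every pair of subjects could be substituted for `dist`.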
In this paper the focus is on a traditional clustering framework, where it is assumed that each subject belongs to exactly one cluster. There are alternative clustering ideas such as soft clustering [Erosheva and Fienberg (2005)] or mixed membership clustering [Erosheva, Fienberg and Lafferty (2004)]. For example, in Erosheva, Fienberg and Joutard (2007), functional disability data are

