We propose a distance between two realizations of a random process where for each realization only sparse and irregularly spaced measurements with additional measurement errors are available. Such data occur commonly in longitudinal studies and online trading data. A distance measure then makes it possible to apply distance-based analysis such as classification, clustering and multidimensional scaling for irregularly sampled longitudinal data. Once a suitable distance measure for sparsely sampled longitudinal trajectories has been found, we apply distance-based clustering methods to eBay online auction data. We identify six distinct clusters of bidding patterns. Each of these bidding patterns is found to be associated with a specific chance to obtain the auctioned item at a reasonable price.
1. Introduction. The goal of cluster analysis is to group a collection of subjects into clusters, such that those falling into the same cluster are more similar to each other than those in different clusters. Therefore, a measure of similarity or dissimilarity between subjects is a necessary ingredient for clustering. A metric defined on the subject space is one way to obtain dissimilarities, simply using the distance between two subjects as a measure of dissimilarity.
While one can readily choose from a variety of well-known metrics for the case of classical multivariate data, or for functional data that are in the form of continuously observed trajectories, finding a suitable distance measure for irregularly observed data can be a challenge. One such situation, which we study here, occurs in the commonly encountered case of irregularly and sparsely observed longitudinal data, with online auction data a prominent example [Shmueli and Jank (2005), Jank and Shmueli (2006), Shmueli, Russo and Jank (2007), Liu and Müller (2008)]. As an example, a snapshot of an eBay auction history for a Palm Personal Digital Assistant is shown in Figure 1.
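To illustrate how a clustering method can operate on pairwise dissimilarities alone, the following is a minimal sketch, not the method of this paper. It assumes the distance matrix has already been computed by some means. The values in `toy_dist` are made up for illustration, and `k_medoids` is a generic, hypothetical helper standing in for whichever distance-based clustering procedure one prefers.

```python
from typing import List, Sequence


def k_medoids(dist: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Cluster n subjects using only an n x n matrix of pairwise distances.

    Returns one cluster label (0..k-1) per subject.
    """
    n = len(dist)

    # Deterministic start: farthest-first choice of k initial medoids.
    medoids = [0]
    while len(medoids) < k:
        nxt = max(range(n), key=lambda i: min(dist[i][m] for m in medoids))
        medoids.append(nxt)

    def assign(meds: List[int]) -> List[int]:
        # Each subject joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster happens to be empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to all other members of its cluster.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids

    return assign(medoids)


# Made-up symmetric distances among five subjects (illustration only):
# subjects 0-2 are mutually close, as are subjects 3-4.
toy_dist = [
    [0, 1, 1, 8, 9],
    [1, 0, 2, 9, 8],
    [1, 2, 0, 8, 8],
    [8, 9, 8, 0, 1],
    [9, 8, 8, 1, 0],
]
labels = k_medoids(toy_dist, k=2)
print(labels)
```

The sketch never touches the raw observations, only the distances between subjects. This is why a well-defined distance for sparsely sampled trajectories is enough to make clustering, classification and multidimensional scaling applicable.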
In this paper the focus is on a traditional clustering framework, where it is assumed that each subject belongs to exactly one cluster. There are alternative clustering ideas such as soft clustering [Erosheva and Fienberg (2005)] or mixed membership clustering [Erosheva, Fienberg and Lafferty (2004)]. For example, in Erosheva, Fienberg and Joutard (2007), functional disability data are
PUBLICATION RECORD
- Publication year
2008
- Venue
The Annals of Applied Statistics
- Publication date
2008-05-05
- Fields of study
Mathematics, Business, Economics, Computer Science
- Source metadata
Semantic Scholar