Communications in Information and Systems

Volume 4 (2004)

Number 4

Estimating timestamp from incomplete news corpus

Pages: 273 – 288

DOI: https://dx.doi.org/10.4310/CIS.2004.v4.n4.a1

Authors

Takao Miura (Dept. of Elect. & Elect. Engr., Hosei University, Koganei, Tokyo, Japan)

Isamu Shioya (Dept. of Management and Informatics, SANNO University, Isehara, Kanagawa, Japan)

Hiroshi Uejima (CASIO Computer Co. Ltd., Japan)

Abstract

Much recent research has addressed summarizing news streams and detecting the edges of new events within them. These tasks, however, assume that every article carries a timestamp (temporal information), so news articles without timestamps cannot contribute to them. In this investigation, we propose a new technique for estimating the timestamp of any news article using a small, incomplete news corpus. We learn temporal information and topic information by means of both the EM algorithm and incremental clustering, and then estimate the timestamp of a news article based on the events discussed in the news corpus. We examine the TDT2 corpus and show through experiments how well our approach works.
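
As a rough illustration of the general idea only, and not the authors' method, the following minimal Python sketch groups timestamped articles into events with single-pass incremental clustering and then assigns an undated article the mean day of its most similar event. Everything in it is an assumption made for illustration: the bag-of-words representation, the cosine similarity, the 0.3 threshold, the integer "day" timestamps, and the names `build_clusters` and `estimate_day`. The EM step mentioned in the abstract is omitted entirely.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Cluster:
    """One hypothetical 'event': summed term counts plus member timestamps."""

    def __init__(self, vec, day):
        self.centroid = Counter(vec)
        self.days = [day]

    def add(self, vec, day):
        self.centroid.update(vec)  # summed counts; cosine ignores the scale
        self.days.append(day)

    def mean_day(self):
        return sum(self.days) / len(self.days)


def build_clusters(dated_articles, threshold=0.3):
    """Single-pass incremental clustering over (text, day) pairs.

    Each article joins its most similar cluster if the similarity reaches
    the (assumed) threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for text, day in dated_articles:
        vec = vectorize(text)
        best = max(clusters, key=lambda c: cosine(vec, c.centroid), default=None)
        if best is not None and cosine(vec, best.centroid) >= threshold:
            best.add(vec, day)
        else:
            clusters.append(Cluster(vec, day))
    return clusters


def estimate_day(text, clusters):
    """Estimate a day for an undated article from its closest cluster.

    Returns None when there are no clusters or no cluster shares any term.
    """
    vec = vectorize(text)
    best = max(clusters, key=lambda c: cosine(vec, c.centroid), default=None)
    if best is None or cosine(vec, best.centroid) == 0.0:
        return None
    return best.mean_day()


# Toy corpus with made-up day numbers, for illustration only.
corpus = [
    ("earthquake hits coastal city rescue teams", 10),
    ("rescue teams search earthquake rubble city", 12),
    ("election results announced candidate wins vote", 40),
    ("candidate wins election after vote recount", 42),
]
clusters = build_clusters(corpus)
print(estimate_day("earthquake rescue continues in city", clusters))
```

Averaging the member days is the simplest possible estimator; a model along the lines the abstract describes would presumably learn a temporal distribution per event rather than a single mean.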

Keywords

timestamp estimation, TDT, stream data, EM algorithm, document clustering

Published 1 January 2004