Communications in Information and Systems
Volume 4 (2004)
Estimating timestamp from incomplete news corpus
Pages: 273 – 288
Much recent research has addressed summarizing news streams and detecting the edges of new events in them. These tasks, however, assume that all data carry a timestamp (temporal information), so news articles without a timestamp cannot contribute to them. In this investigation, we propose a new technique for estimating the timestamp of any news article using a small, incomplete news corpus. We learn temporal information and topic information by means of both the EM algorithm and incremental clustering, and then estimate the timestamp of a news article based on the events discussed in the news corpus. We examine the TDT2 corpus and show through experiments how well our approach works.
timestamp estimation, TDT, stream data, EM algorithm, document clustering