Variable length Markov chains are useful stochastic models for data compression that avoid the curse of dimensionality faced by full Markov chains. In this work, we introduce a variable length Markov chain whose transition probabilities depend not only on the state history but also on exogenous covariates through a generalized linear model. The goal of the proposed procedure is to estimate both the context of the process, that is, the portion of its history that is relevant for predicting the next state, and the coefficients corresponding to the significant exogenous variables. The proposed method is consistent in the sense that the probability that the estimated context and coefficients coincide with those of the true data-generating mechanism tends to 1 as the sample size increases. The proposed methodology is used to estimate the influence of climate and other covariates in predicting dengue outcomes in several municipalities in Brazil.
Joint work with A. Z. Zambom, S. Kim and M. Rocha.
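As a rough illustration of the kind of model described above, the following minimal sketch evaluates a transition probability for a binary chain in which the relevant context is looked up from the recent past and the covariates enter through a logistic link. The context tree, the coefficient values, the choice of a logistic link, and the names CONTEXTS, find_context and prob_next_is_one are all assumptions made for this example; they are not taken from the work itself, and the sketch does not implement the estimation procedure.

```python
import math

# Hypothetical context tree for a binary chain. Each key is a context written
# most-recent-symbol-first, so "01" means X[t-1] = 0 and X[t-2] = 1.
# The intercepts and covariate coefficients are illustrative values only.
CONTEXTS = {
    "1": {"intercept": -0.5, "beta": [0.8, 0.0]},
    "01": {"intercept": 0.2, "beta": [0.0, -1.1]},
    "00": {"intercept": 1.0, "beta": [0.3, 0.4]},
}


def find_context(history):
    """Return the shortest recent-first string of past symbols that is a context.

    `history` is a sequence of 0/1 symbols ordered from oldest to newest.
    """
    recent_first = "".join(str(s) for s in reversed(history))
    for length in range(1, len(recent_first) + 1):
        candidate = recent_first[:length]
        if candidate in CONTEXTS:
            return candidate
    raise ValueError("history too short to identify a context")


def prob_next_is_one(history, covariates):
    """P(next symbol = 1 | context, covariates) under an assumed logistic link."""
    params = CONTEXTS[find_context(history)]
    # Linear predictor: context-specific intercept plus covariate effects.
    eta = params["intercept"] + sum(
        b * z for b, z in zip(params["beta"], covariates)
    )
    return 1.0 / (1.0 + math.exp(-eta))
```

In this toy setup, a zero coefficient (such as the second entry of "beta" for context "1") plays the role of a covariate that is not significant for that context, which is the kind of structure the estimation procedure aims to recover alongside the context itself.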