Here we investigate how well two models predict human gaze behavior in cross-situational word learning. Previous work has developed two competing accounts of the mechanisms that might support this learning ability. The first, associative learning, relies on the integration of contextual statistics across time. The second, hypothesis testing of the “propose-but-verify” sort, holds that learners do not track co-occurrence statistics and instead track only a single label-object mapping at a time. To adjudicate between these two mechanisms, we examine real-time selective attention as a window into learning processes. We demonstrate that gaze allocation is systematically biased as a function of the associative evidence accumulated for a label-object pairing over time, favoring the associative learning account. Moreover, we predict learning outcomes from model parameters controlling sensitivity and noise in memory encoding. These results provide novel evidence for associative learning and highlight the unique role of memory in cross-situational learning.