Animal behavior often forms sequences, built from simple stereotyped actions and shaped by environmental cues. A comprehensive characterization of the interplay between an animal’s movements and its environment is necessary to understand the sensorimotor transformations performed by the brain. Here, we use unsupervised methods to study behavioral sequences in zebrafish larvae. We generate a map of swim bouts, revealing that fish modulate their tail movements along a continuum. During prey capture, larvae produce stereotyped sequences using a subset of bouts from a broader behavioral repertoire. These sequences exhibit low-order transition dynamics and immediately respond to changes in visual cues. Chaining of prey capture bouts is disrupted in visually impaired (lakritz and blumenkohl) mutants, and removing the prey stimulus during ongoing behavior in closed-loop virtual reality causes larvae to immediately abort the hunting sequence. These results suggest that the continuous integration of sensory information is necessary to structure the behavior. This stimulus-response loop serves to bring prey into the anterior dorsal visual field of the larvae. Fish then release a capture strike maneuver comprising a stereotyped jaw movement and tail movements fine-tuned to the distance of the prey. Fish with only one intact eye fail to correctly position the prey in the strike zone, but are able to produce the strike itself. Our analysis shows that short-term integration of binocular visual cues shapes the behavioral dynamics of hunting, thus uncovering the temporal organization of a goal-directed behavior in a vertebrate.