My research projects aim to characterize the brain dynamics underlying visually guided actions in humans. Specific contributions of my research to the field highlight interactions between object-based and ensemble-based representations, between selective and non-selective pathways, and between perceptual and action control processes. My findings suggest that goal-oriented, visually guided behaviors are effectively supported by multiple streams of such parallel processes. This stands in contrast to traditional theories of serial, non-overlapping processing stages across perception, cognition, and action: Instead of assuming that action is merely the aftermath of cognition or perception, a new model needs to include the effects of continuous interactions across perception, cognition, and action control processes, which can be observed by measuring dynamic connections between brain and behavior.

[ Research themes ]
1. Perceptual flexibility empowers cognitive capabilities and action outcomes.
2. Action goals guide perception.
3. Attentional distraction can facilitate perception and action.
4. Individual differences in perception, cognition, and action.


1. Perceptual flexibility empowers cognitive capabilities and action outcomes.

The visual system reorganizes complex visual scenes using strategies for forming coherent and concise representations, rather than passively receiving all (millions of) bits of information hitting our retinas at any given moment. One powerful heuristic is to represent sets of similar objects as an ensemble using summary statistics such as mean, numerosity, and variance.
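To make "summary statistics" concrete, here is a minimal sketch of the kind of compact description an ensemble representation provides (illustrative only; the feature values are invented):

```python
import numpy as np

# Hypothetical ensemble of item features (e.g., the sizes of six
# objects, in arbitrary units); the values are made up for illustration.
item_sizes = np.array([1.2, 0.9, 1.1, 1.4, 1.0, 1.3])

# A handful of summary statistics can stand in for the whole set.
ensemble_summary = {
    "mean": item_sizes.mean(),      # average feature value of the set
    "numerosity": item_sizes.size,  # how many items the set contains
    "variance": item_sizes.var(),   # how heterogeneous the set is
}
print(ensemble_summary)
```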


Ensemble perception makes great intuitive sense. If we look around, we find redundancy and regularity in real-world scenes: Buildings in a city, trees in a forest, and fruit on a bush, for example, are often seen as groups of similar but not identical objects. For most everyday needs, we may not need to store individuating information from these scenes. We can instead represent only the summary statistics of a scene regarding its overall layout, pattern, and gist in a succinct and compact manner. Such perceptual ability to extract ensemble representations allows our brain to “do more with less” and better understand and interact with the complex, dynamic visual world. One line of my research program investigates how our brain achieves this feat to empower our perceptual, cognitive, and action control abilities in many different ways.

* Ensemble perception as a strategy for coping with limits on cognitive abilities
Representing and storing an ensemble (e.g., the average) of multiple objects helps the visual system maintain and recall an image better. At the object level, only a few items (up to three or four) can be remembered at a time; the rest may be missed completely due to limited memory capacity. When attempting to recall missed objects, one would have to make random guesses, increasing the overall expected error. However, average information about the image can guide recall of the missed objects to some extent, by retrieving values biased toward the average of the set (Im & Chong, 2014). Say you try to remember the colors of six disks in a visual image and then report them. If you remember that the disks were ‘cool’ colors on average (even if you cannot remember the exact color of each disk), you can reduce the overall error by choosing six colors from only the continuum of ‘cool’ colors and avoiding ‘warm’ colors (Panel A). However, if you remember the colors of only three disks and completely miss the others (without remembering the average), you cannot avoid making extreme errors when you recall the color of one of the three forgotten disks (e.g., by randomly choosing “red” for a blue-ish disk, Panel B).
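The error-reduction logic can be illustrated with a toy simulation (my own sketch, not an analysis from Im & Chong, 2014; the ‘cool’ hue range and the guessing noise are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def circ_error(a, b):
    """Smallest angular difference (deg) on a 360-degree color wheel."""
    d = np.abs(a - b) % 360
    return np.minimum(d, 360 - d)

# Hypothetical stimuli: six disk hues per trial, all drawn from a
# "cool" region of the color wheel (150-270 deg, an arbitrary choice).
n_trials = 10_000
true_hues = rng.uniform(150, 270, size=(n_trials, 6))

# Strategy 1: no ensemble info -- guess forgotten hues anywhere.
random_guess = rng.uniform(0, 360, size=(n_trials, 6))

# Strategy 2: remember only the set's average hue and guess near it.
mean_hue = true_hues.mean(axis=1, keepdims=True)
biased_guess = mean_hue + rng.normal(0, 30, size=(n_trials, 6))

print("random guessing :", circ_error(true_hues, random_guess).mean())
print("mean-biased     :", circ_error(true_hues, biased_guess).mean())
```

A random guess on the wheel lands, on average, 90 degrees from the target, whereas a guess anchored to the set’s mean stays inside the ‘cool’ region, so the simulated mean-biased observer shows a much smaller average recall error.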


Moreover, I have discovered that a visual image in which groups of individual items are spatially clustered together allows the visual system to select, extract, and remember more sets than an image in which individual items are spatially intermixed (Im & Chong, 2014; Im, Park, & Chong, 2015; see Demos here). This shows that reorganizing visual objects in a coherent manner (e.g., into sets or ensembles) is an efficient strategy for increasing our memory capacity.


This is somewhat equivalent to using a “chunking” or “binding” strategy that reduces the amount of verbal information we have to deal with at once (e.g., recoding the nine separate letters F-B-I-C-I-A-N-S-A into three familiar acronyms: FBI, CIA, and NSA). My findings in this line of research suggest that ensemble perception allows us to compress individual items in a visual image into meaningful “chunks”, saving cognitive resources that can be used to process and remember more information about the scene.
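As a loose analogy, the recoding step can be sketched in a few lines (the acronym lookup is a stand-in for long-term knowledge; everything here is illustrative):

```python
# Toy chunking: nine letters exceed a three-or-four-item span,
# but three familiar acronyms fit comfortably within it.
letters = list("FBICIANSA")           # 9 items to hold at once
known_chunks = {"FBI", "CIA", "NSA"}  # stored long-term knowledge

chunks, i = [], 0
while i < len(letters):
    candidate = "".join(letters[i:i + 3])
    if candidate in known_chunks:     # recode three letters as one chunk
        chunks.append(candidate)
        i += 3
    else:                             # otherwise keep the single letter
        chunks.append(letters[i])
        i += 1

print(len(letters), "letters ->", len(chunks), "chunks:", chunks)
```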

* Hierarchical coding of a visual scene: Objects and Ensembles
My findings in this research program also specify a critical new aspect of the structure of visual perception by demonstrating that multiple ensembles of items, up to three or four sets, can be extracted in parallel as higher-order units for visual perception and memory. The limit on the number of ensembles that can be extracted at any given time also converges with the well-documented three-or-four-object limits of visual attention (e.g., Pylyshyn & Storm, 1988) and visual working memory (e.g., Alvarez & Cavanagh, 2004; Luck & Vogel, 1997; Zhang & Luck, 2008).

Such convergence illustrates how items in a visual scene can be represented hierarchically. At least two units of visual processing - the individual object and the ensemble - can be extracted from a visual scene and be available at the same time, providing complementary information about different aspects of the scene. For example, a single display of 5 red, 5 blue, 5 yellow, and 5 green dots can be represented as four color sets or as 20 separate dots. At one level, “the set of blue dots” may be selected and only its average information (e.g., mean size, center location) stored as a single individual in visual working memory. At another level, “the set of blue dots” may be treated as 5 distinct items, available for individual size comparison, for example. This distinction highlights a hierarchical coding of “ensemble” and “individual” that is important for making sense of a visual scene.
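One rough way to picture this two-level code is as the same data held at two granularities; the sketch below (invented values, purely illustrative) represents “the set of blue dots” both as five individuals and as one ensemble summary:

```python
import numpy as np

# Hypothetical display: 20 dots in four color sets (values invented).
rng = np.random.default_rng(1)
dots = [{"color": c, "size": rng.uniform(0.5, 1.5), "xy": rng.uniform(0, 10, 2)}
        for c in ("red", "blue", "yellow", "green") for _ in range(5)]

# Individual level: "the set of blue dots" as five distinct items.
blue = [d for d in dots if d["color"] == "blue"]

# Ensemble level: the same set compressed to summary statistics,
# storable as a single unit in visual working memory.
blue_ensemble = {
    "mean_size": np.mean([d["size"] for d in blue]),
    "center": np.mean([d["xy"] for d in blue], axis=0),
    "n": len(blue),
}
print(len(blue), "individuals vs. one ensemble:", blue_ensemble)
```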

To investigate the nature of hierarchical coding, one of my research projects (Im, Zhong, & Halberda, 2016; see Demo here) introduced a computer vision algorithm and characterized how human observers perceived “individual dots” and “sets of dots” in the same images. First of all, human observers’ grouping patterns were surprisingly consistent with one another, even though they were instructed to group the dots in whatever way felt most comfortable and natural. This suggests that most of us share a basic visual concept of “a group” or “a set” and likely perceive “groups of dots” in a visual scene in a similar manner. My computer vision algorithm could precisely predict how human observers grouped individual dots into sets based on spatial proximity.
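The algorithm itself is described in Im, Zhong, & Halberda (2016); as a stand-in, a minimal proximity-grouping sketch in the same spirit can be built from off-the-shelf single-linkage clustering (the threshold and dot positions below are arbitrary assumptions, not the paper’s parameters):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def group_by_proximity(xy, threshold):
    """Single-linkage clustering: dots closer than `threshold` merge
    into the same group. A stand-in for the published algorithm."""
    z = linkage(xy, method="single")
    return fcluster(z, t=threshold, criterion="distance")

# Hypothetical dot positions: three loose clumps of ten dots each.
rng = np.random.default_rng(2)
xy = np.vstack([rng.normal(c, 0.5, size=(10, 2)) for c in (0, 5, 10)])

labels = group_by_proximity(xy, threshold=1.5)
print("perceived groups:", len(set(labels)))  # -> 3
```

Under a rule like this, dots merge into the same group whenever they fall within the threshold distance of each other, so more tightly clustered displays yield fewer perceived groups.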

More importantly, I have found that the way observers grouped the dots into sets systematically modulated the way they estimated the number of individual dots. Although Images A and B both contained 29 dots, observers’ estimates of the number of individual dots differed dramatically, with Image B (more clustered) perceived as having far fewer dots than Image A (less clustered). My clustering algorithm could also accurately predict the degree to which observers underestimated the number of individual dots, depending on the degree to which the dots were spatially clustered into groups. This finding provides an example of how representations of “sets” and “individual objects” are extracted from the same image and interact with each other: The way dots are grouped and clustered modulates visual impressions of the number of individuals, just as the way individual objects are positioned modulates the way they are grouped into an ensemble.

Together, representing different perceptual units from a scene allows the visual system to utilize more information with fewer mental resources. By combining these different levels of representation, objects and ensembles, the visual system can perceive, and we can remember, a visual scene in greater detail, which can then effectively guide better action outcomes.


2. Action goals guide perception.

For decades, prevailing models of human behavior have assumed a functional structure of serial processing stages across perception, cognition, and action control modules. The relationship between perception and the motor system has thus been viewed as unidirectional, with the motor control system considered merely the final output of commands from perceptual and cognitive systems. However, I have discovered behavioral and neural evidence for continuous, bidirectional interactions between perception and action.
For example, I have found that different social motivations and action goals (e.g., approach and avoidance) modulate the way we perceive the emotional states and behavioral intent of crowds of people (Im et al., 2017a; Im et al., 2017b; see Demo here). When observers’ goal was to avoid a potential threat in a social environment, they recognized a crowd of angry faces much more accurately and rapidly than a crowd of happy faces. Conversely, when the goal was to approach a friendly group, a happy crowd was perceived more accurately than an angry crowd. This suggests that our goals for ongoing interactions with the external world influence the way we perceive the world.

* Functionally and neurally distinct processing routes for perceiving social visual cues
Recently, I have discovered that the dorsal and ventral pathways contribute differentially to different aspects of social visual perception and action. My fMRI and MEG data (Im et al., 2017a; Im et al., under review) show that there are (at least) two different processing routes for perceiving and responding to social visual cues: a fast route along the brain’s dorsal pathway for global perception of, and rapid reaction to, social crowds, and a slower route along the ventral pathway for local, detailed perception of individual faces.

When observers perceived individual faces, I found greater involvement of brain areas along the ventral pathway (shown as blue blobs and arrows in Images A and B below). On the other hand, perceiving and responding to emotional crowds of faces showed greater involvement of brain areas along the dorsal pathway (shown as red blobs and arrows in Images A and B below). For example, the fusiform gyrus in the ventral stream was more active (measured by fMRI), at a lower temporal frequency (alpha-band oscillations measured by MEG), during perception of individual emotional faces, whereas the intraparietal sulcus in the dorsal stream was more active (measured by fMRI), at a higher temporal frequency (beta-band oscillations measured by MEG), during perception of crowds of emotional faces.
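For readers unfamiliar with band-limited power, the sketch below shows one conventional way alpha- versus beta-band activity is quantified (a generic Welch-method example on synthetic data, not my actual MEG pipeline; the band boundaries follow common conventions):

```python
import numpy as np
from scipy.signal import welch

def band_power(x, fs, lo, hi):
    """Average spectral power of x within [lo, hi] Hz (Welch's method)."""
    freqs, psd = welch(x, fs=fs, nperseg=fs * 2)
    band = (freqs >= lo) & (freqs <= hi)
    return psd[band].mean()

# Synthetic trace standing in for an MEG sensor: a 10 Hz (alpha-range)
# and a 20 Hz (beta-range) component; amplitudes are arbitrary.
fs = 1000
t = np.arange(0, 10, 1 / fs)
trace = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

alpha = band_power(trace, fs, 8, 12)   # ~8-12 Hz
beta = band_power(trace, fs, 13, 30)   # ~13-30 Hz
print(f"alpha power: {alpha:.3f}, beta power: {beta:.3f}")
```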


* Action goals and motivations modulate how our eyes see and navigate a visual scene
I have examined how human observers’ eye movement patterns reflect their current action goals and social motivations (e.g., approach or avoidance) when they freely view an image of two crowds of faces with different average emotions. When observers are to decide which crowd of faces they would approach, their first saccadic eye movement is directed toward the crowd whose average emotion is relatively more positive of the two. In another study, I have also found that different task goals modulate which visual field observers prefer to look at first: When they are to choose which group they would rather approach, observers tend to look at the right visual field first, whereas when they are to choose which group they would rather avoid, they tend to look at the left visual field first (Im et al., in preparation). This finding reveals an interesting laterality pattern in which observers’ eye movement trajectories for the same images are modulated by the current task goal, suggesting that the left and right hemispheres may show differential processing biases for task goals and social motivations.
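The laterality measure itself is straightforward to compute; a hypothetical sketch (invented coordinates and screen geometry, not the actual analysis code) might look like this:

```python
import numpy as np

def first_saccade_field(first_sacc_x, screen_center_x):
    """Label each trial's first saccade landing as left or right of
    the screen center (a proxy for the visual field it targets)."""
    return np.where(first_sacc_x < screen_center_x, "left", "right")

# Made-up x-coordinates (pixels) of first-saccade landings, one per trial.
x = np.array([512, 1400, 300, 1500, 1600, 200])
fields = first_saccade_field(x, screen_center_x=960)

print("proportion of right-field first saccades:", np.mean(fields == "right"))
```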


My work in this area received an ECOR Tosteson Postdoctoral Fellowship Award for Medical Discovery and was funded by Massachusetts General Hospital.


3. Attentional distraction can facilitate perception and action.

Although it is generally accepted, and usually valid, that distracted attention impairs perceptual and behavioral performance, I have found advantages of distributed (or distracted) attention over focused attention in some perceptual tasks and in motor learning. For example, distributed attention facilitates global processing of a visual image and enhances memory retrieval of visually guided actions that were learned under similar distraction (Im, Bédard, & Song, 2015; 2016; Im et al., under review). These results suggest that, as another processing route, distributed attention can enhance our perception and action under circumstances in which focused attention may actually hinder performance. The broader attentional focus afforded by reduced control under distributed attention can allow us to learn more about the background and context of the environment than the focused-attention mode does.

This line of research has led me to pursue one of my future research aims: characterizing how these non-selective and selective routes guide each other to support different aspects of complex behavior, with the goal of understanding how cognition modulates perception and action across a range of tasks. New findings from this research will contribute to better rehabilitation and training programs for patients with degenerative motor diseases and for those who need to refine and adjust their perception-action coordination after injuries or surgeries.

My work in this area was recognized and supported by a Center for Vision Research Award at Brown University.


4. Individual differences in perception, cognition, and action.

In other research projects, I have been exploring how personal characteristics such as race (Im et al., 2017b; Son et al., in preparation), trait anxiety (Im et al., 2017c), sex (Im et al., 2018), and age (Im et al., in preparation) influence the way people perceive and act upon social emotional stimuli, in order to obtain a bigger picture of what makes individual minds and behaviors so unique.