Optica Publishing Group

Role of vergence eye movements in the visual recognition of long time duration

Open Access

Abstract

When dichoptic stimuli are viewed over a long duration, the visual percept alternates between the left-eye and right-eye inputs rather than combining them; this is known as binocular rivalry. An efficient coding theory holds that binocular visual inputs can be combined into binocular summation (S+) and difference (S−) channels in brain area V1. In this study, we used the specially designed stimuli of a previous study, in which each monocular input produced an ambiguous percept while the S+ and S− channels produced unambiguous percepts. We aimed to investigate whether visual percepts alternate between the S+ and S− channels over a long duration and whether vergence eye movements are involved in the process. To this end, the stimuli were presented for 300 s per trial, and a binocular eye tracker recorded eye data. Participants' real-time behavioral reports of their percepts and their binocular eye data were recorded simultaneously. The results show perceptual flips between the S+ and S− channels in both central and peripheral viewing conditions over the long duration. More importantly, in central vision vergence eye movements occur before perceptual flips, suggesting the involvement of high-level visual attention; the time of a perceptual flip from S+ is shorter than that of a flip from S−, which might be due to different involvements of visual attention, indicating a bias of feedback connections from higher brain areas for visual attention toward the S+ channel. Since S+- and S−-dominated signals can be carried by different types of binocular neurons, our results provide new insights into high-level visual attention and binocular neurons in brain area V1, using specially designed dichoptic stimuli and eye vergence as measuring tools.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

When the left- and right-eye inputs are dichoptic (the two eyes' images differ), normally one monocular input dominates the percept at any moment. When dichoptic stimuli are observed over time, the visual percept switches between the left and right eyes rather than forming a superimposed image; this is known as binocular rivalry. For example, when a red circular grating and a green radial grating are viewed by the left and right eyes, respectively, the percept alternates between the two gratings over time instead of appearing as a combined image, a plaid. These perceptual flips have been shown to be a local, retinotopic-level process [1], and reduced probabilities of blinks and (micro)saccades have also been found at the moments of perceptual flips. However, vergence eye movements are absent during this process [13].

An efficient coding theory holds that binocular visual inputs can be combined into a binocular summation channel (S+) and a binocular difference channel (S−) in brain area V1 [4]. Zhaoping used specially designed dichoptic stimuli to investigate the difference between central and peripheral vision by testing three separate features: motion, tilt, and color [5]. For the monocular inputs, the percepts were ambiguous; for the S+ and S− channels, the percepts were unambiguous. Participants' task was to judge whether the motion direction was upward or downward, the tilt orientation leftward or rightward, or the color red-green or yellow-blue. Zhaoping then analyzed whether the perceived features were dominated by the S+ or the S− channel based on participants' behavioral data. She found a bias toward the S+ percept in central vision, but not in peripheral vision [5], and proposed that top-down feedback for visual recognition is stronger in central vision than in the periphery. This may reflect the functional difference between central and peripheral vision: the former mainly serves visual decoding, while the latter specializes in visual selection.

However, Zhaoping presented the dichoptic stimuli only for short durations (0.2 s at most). Whether the bias toward the S+ percept in central vision persists over a long duration has not yet been investigated. Likewise, because dichoptic stimuli were used, whether visual percepts alternate between the S+ and S− channels over a long duration, and whether vergence eye movements are involved in the process, remain unclear. To clarify these issues, in contrast to the 0.2-s stimulus presentation of the previous study [5], we presented dichoptic stimuli for 300 s while simultaneously recording real-time behavioral responses (S+, S−, and neither percept, SN) and binocular eye data. Following previous studies that presented binocular rivalry stimuli for 4 minutes or 60 s per trial [6,7], we chose a 300-s duration, which should be long enough to elicit perceptual flips (if any) but not so long as to cause visual or physical fatigue. Our predictions were that the bias toward the S+ percept in central vision persists over a long perceptual duration, and that vergence eye movements are also involved in central visual processing. The experimental procedure and results are presented and analyzed below.

2. Methods

2.1 Participants

We recruited 15 individuals (10 men, 5 women; aged 18 to 34 years, mean age 23.6 years) from the Kochi University of Technology as participants. All had normal or corrected-to-normal vision, and their stereo and motion acuities were verified with our customized programs. All participants were naïve to the experiment's aim and were compensated for their time. The authors did not serve as participants. The Research Ethics Committee of Kochi University of Technology approved all experiments and procedures, which conformed to the tenets of the Declaration of Helsinki. Prior to the experiments, all participants provided written informed consent.

2.2 Apparatus

We created a program in Matlab R2012a (Mathworks, USA) with PsychToolbox Version 3 [8,9] to present the experimental stimuli on a 22-inch CRT color display (RDF223H; Mitsubishi, Japan; 1024 × 768 pixels, 85 Hz refresh rate). We measured the display luminance with a CS-100A colorimeter (Minolta, Japan) and linearized it using a look-up table. During the experiment, participants sat in a dark room, fronto-parallel to the display surface, and observed the stimuli through a mirror stereoscope (Edmund Optics, USA). A chin rest prevented head movement. The distance from the CRT display to the chin rest was 50.0 cm. A head-mounted binocular eye tracker (Eyelink II, SR Research, Canada) recorded eye movements at a 250 Hz sampling rate. We implemented a nine-point calibration for the left and right eyes separately, with customized targets presented on the left and right halves of the CRT display, respectively.

2.3 Stimuli

For the stimuli, Zhaoping used three kinds of gratings with motion, tilt, or color features [5]. In this study, for simplicity, we used dichoptic stimuli with a motion feature, as in Zhaoping's publication [5]. We presented the stimuli in central and peripheral viewing conditions, as shown in Fig. 1(A) and (B), respectively. For each viewing condition, the left- and right-eye inputs were presented on the left and right sides of the CRT display, respectively. For each monocular input in each viewing condition, an outer frame was drawn with two spikes on each vertical side to facilitate vergence, and an inner frame enclosed the stimulus gratings.


Fig. 1. Schematics of stimuli in central and peripheral viewing conditions. (A) Central viewing condition; (B) Peripheral viewing condition.


Each monocular input was a combination of two gratings and therefore produced an ambiguous visual percept. SL and SR in Eqs. (1) and (2) represent the stimuli for the left and right eyes, respectively [5,10,11].

$$S_L = (C_+ S_q + C_- S_{q'})/2$$
$$S_R = (C_+ S_q - C_- S_{q'})/2$$
$C_+$ and $C_-$ are the contrasts of the two gratings $S_q$ and $S_{q'}$, respectively; $C_+ = C_- = 0.3$.
$$S_q = \cos\left[k\left(y \mp 2\pi\frac{w}{k}t\right) + \phi_q\right]$$
$$S_{q'} = \cos\left[k\left(y \pm 2\pi\frac{w}{k}t\right) + \phi_{q'}\right]$$
$$k = \frac{4\pi}{L}$$
$k$ is the spatial frequency and $L$ is the side length of the inner frame (since $k = 4\pi/L = 2\pi\cdot 2/L$, the inner frame always contains exactly two grating cycles); $y$ is the stimulus's vertical coordinate; $w$ is the temporal frequency ($w = 5$ Hz); $\phi_q$ and $\phi_{q'}$ are two independent phases for gratings $S_q$ and $S_{q'}$, respectively, each uniformly distributed within $[0, 2\pi]$ and generated in random order in each session.
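As a concrete illustration, the contrast signals of Eqs. (1)–(5) can be sketched as follows (a minimal Python/NumPy sketch, not the authors' Matlab code; the function and parameter names are ours):

```python
import numpy as np

def dichoptic_contrast(y, t, L=1.13, C_plus=0.3, C_minus=0.3, w=5.0,
                       phi_q=0.0, phi_qp=0.0):
    """Contrast signals S_L and S_R of Eqs. (1)-(5).

    y: vertical coordinate (deg); t: time (s); L: inner-frame side length (deg).
    S_q and S_q' drift in opposite directions (opposite signs of the
    2*pi*(w/k)*t term), so the S+ and S- percepts drift oppositely as well.
    """
    k = 4 * np.pi / L                                          # Eq. (5): two cycles per frame
    S_q  = np.cos(k * (y - 2 * np.pi * (w / k) * t) + phi_q)   # Eq. (3), upper sign
    S_qp = np.cos(k * (y + 2 * np.pi * (w / k) * t) + phi_qp)  # Eq. (4), upper sign
    S_L = (C_plus * S_q + C_minus * S_qp) / 2                  # Eq. (1)
    S_R = (C_plus * S_q - C_minus * S_qp) / 2                  # Eq. (2)
    return S_L, S_R
```

At $y = 0$, $t = 0$ with zero phases, both gratings equal 1, so $S_L = 0.3$ and $S_R = 0$.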

In the central viewing condition, the side length was L = 1.13° (two cycles per frame, i.e., 1.77 cpd). In the peripheral condition, the side length was:

$$L = 1.13^\circ \cdot \left(1 + \frac{e}{e_2}\right)$$

where $e_2$ = 3.3° and e = 7.2° (the leftward eccentricity; equal to 0.56 cpd). We used a larger size to compensate for the lower spatial resolution of peripheral vision [5]. Each inner frame's line thickness was L/25. At the inner frame's center, a black disk of radius L/20 was drawn. In the central viewing condition, the fixation point was the inner frame's central disk; in the peripheral viewing condition, the fixation point was 7.2° to the right of the inner frame's central disk, so both the fixation point and the inner frame's central disk were shifted 3.6° from the outer frame's center [5].
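As a quick check of Eq. (6) and the quoted spatial frequencies (an illustrative helper, assuming two grating cycles per inner frame; names are ours):

```python
def peripheral_side_length(e, e2=3.3, L_central=1.13):
    """Eq. (6): scale the central side length by the eccentricity factor (1 + e/e2)."""
    return L_central * (1 + e / e2)

# Two cycles per frame, so spatial frequency = 2 / L (cpd).
L_per = peripheral_side_length(7.2)        # about 3.60 deg at e = 7.2 deg
f_central, f_per = 2 / 1.13, 2 / L_per     # about 1.77 and 0.56 cpd
```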

Unlike the previous study [5], we used a smaller outer frame to fit the size of our mirror stereoscope. Viewed through the mirror stereoscope, the outer frame's width and height were 19° and 15.86°, respectively.

Physically superimposing the monocular inputs SL and SR produces a binocular summation grating (S+), while subtracting SR from SL produces a binocular difference grating (S−). S+ and S− can be expressed as Eqs. (7) and (8).

$$S_+ = S_L + S_R$$
$$S_- = S_L - S_R$$
Substituting Eqs. (1) and (2) into Eqs. (7) and (8) yields Eqs. (9) and (10):
$$S_+ = C_+ S_q$$
$$S_- = C_- S_{q'}$$
Thus, the drift directions of the S+ and S− percepts are determined by the directions of $S_q$ and $S_{q'}$, respectively. Since the drift directions of $S_q$ and $S_{q'}$ are determined by the signs of the $2\pi \frac{w}{k}t$ term, which are opposite, as shown in Eqs. (3) and (4), a switch in the perceived direction from one to the other reflects a switch in perceptual dominance between the S+ and S− channels [5].
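The identities in Eqs. (9) and (10) can be verified numerically on a sampled grid (an illustrative check with arbitrary phases, not part of the experiment code):

```python
import numpy as np

L, w = 1.13, 5.0
k = 4 * np.pi / L
y = np.linspace(0.0, L, 64)[:, None]       # vertical coordinate (deg)
t = np.linspace(0.0, 1.0, 50)[None, :]     # time (s)

S_q  = np.cos(k * (y - 2 * np.pi * (w / k) * t) + 0.7)   # arbitrary phase 0.7
S_qp = np.cos(k * (y + 2 * np.pi * (w / k) * t) + 1.9)   # arbitrary phase 1.9
S_L = (0.3 * S_q + 0.3 * S_qp) / 2                       # Eq. (1)
S_R = (0.3 * S_q - 0.3 * S_qp) / 2                       # Eq. (2)

# Eqs. (7)-(8) reduce to Eqs. (9)-(10): the sum isolates S_q, the difference S_q'.
assert np.allclose(S_L + S_R, 0.3 * S_q)
assert np.allclose(S_L - S_R, 0.3 * S_qp)
```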

To convert the contrast signals SL and SR into the luminance signals shown on the CRT display, Eqs. (1) and (2) can be rewritten as $S{^{\prime}_L}$ and $S{^{\prime}_R}$:

$$S'_L = \bar{S}\left[1 + (C_+ S_q + C_- S_{q'})/2\right]$$
$$S'_R = \bar{S}\left[1 + (C_+ S_q - C_- S_{q'})/2\right]$$
where $\bar{S}$ is the background luminance ($\bar{S}$ = 20.8 cd/m² when viewed through the mirror stereoscope), and $C_+$, $C_-$, $S_q$, and $S_{q'}$ are the same as in Eqs. (1) and (2).
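A minimal sketch of this contrast-to-luminance mapping (the helper name is ours; $\bar{S}$ = 20.8 cd/m² from the text):

```python
def to_luminance(contrast_signal, S_bar=20.8):
    """Map a contrast signal in [-1, 1] to display luminance in cd/m^2,
    following the S'_L and S'_R equations: S_bar * (1 + signal)."""
    return S_bar * (1 + contrast_signal)
```

A zero contrast signal maps to the 20.8 cd/m² background; the maximum grating contrast of 0.3 maps to 27.04 cd/m².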

For easier understanding of the stimuli, we recorded the stimuli in the central viewing condition (Movie 1). The left and right panels present the stimuli for the left and right eyes, respectively. The percepts of the monocular stimuli were "dithering" without a clear motion direction, while the S+ and S− channels had clear motion directions. Note that we used a camera to record the images on the CRT display, so the video contains flicker caused by the camera's shutter speed; in the experiment, participants could not perceive flicker from the CRT display. The stimuli in peripheral vision were similar and are omitted here.

2.4 Procedure

Figure 2 shows the experimental procedure in the central viewing condition; the procedure in the peripheral viewing condition was the same and is omitted here. At the beginning of each trial, participants performed a calibration for the Eyelink II to map eye positions to the CRT display's (x, y) coordinates. To do so, we implemented a 9-point (3 × 3) calibration for the left and right eyes separately, with the positions of the nine target points customized within each outer frame. First, participants wore the eye tracker and observed the CRT display via the mirror stereoscope; they were asked to keep both eyes open but to block the right eye with a cardboard. Once ready, a target point for the left eye was presented in the left outer frame at a customized position. Participants fixated the target until the eye tracker recorded their eye position, then proceeded to the next target point. For each target point, the camera captured a corresponding set of eye data. After the left-eye calibration was finished, participants blocked the left eye and calibrated the right eye in the same way (steps 1 and 2). For each eye, the nine target points covered the entire outer frame on its side. During the experiment, the Eyelink software calculated the corresponding (x, y) display coordinates from the current eye data using the manufacturer's interpolation algorithm. After calibration, the binocular start image for fixation and a text message (located under the fixation point), "Press any button for the next trial," were presented to instruct participants to trigger the next trial (step 3). Then the binocular start image for fixation appeared, and participants were asked to gaze at the fixation point for at least 700 ms (step 4); meanwhile, the Eyelink II started to record binocular eye positions and pupil sizes.

During the experiment, participants were instructed to press a button to trigger the stimuli only after confirming that they were well focused on the fixation point. Subsequently, the dichoptic stimuli were presented for 300 s (step 5), and participants judged the motion direction as upward, downward, or neither, responding by real-time continuous key presses: they held the up-arrow key for upward percepts, held the down-arrow key for downward percepts, and pressed no key if they perceived neither direction or both directions. Participants were asked to change the key press as quickly as possible whenever the perceived motion direction changed.


Fig. 2. Experimental procedure.


During each trial, participants were asked to keep their heads as steady as possible and their eyes on the fixation point. After each trial, participants took their heads off the chin rest, removed the Eyelink II, and had a short break to avoid fatigue of the eyes, neck, or head. After each break, participants redid the calibration before the next trial.

The experiment was divided into two sessions, run on separate days for each participant. Each session comprised eight trials, four per viewing condition. Both the viewing conditions (central and peripheral) and the motion directions of $S_q$ and $S_{q'}$ (upward and downward) were generated in counterbalanced random order, so participants could not predict the viewing condition or motion direction. For each participant, we collected data from 16 trials in total.

3. Results

3.1 Behavioral data

During the experiment, perceptual flips among S+, S−, and SN occurred for every participant in both central and peripheral viewing conditions. First, we calculated the total fractions (F) of seeing each of the three percepts' drift directions (S+, S−, and SN) separately in the central and peripheral viewing conditions using the 15 participants' data. Figure 3(A), (B), and (C) show the results for F+, F−, and FN, respectively. Red and blue bars represent the central and peripheral viewing conditions, respectively; error bars are standard errors of the means. We used paired-sample t-tests to assess the significance of differences between the two viewing conditions. Comparing the central and peripheral viewing conditions, the fractions of seeing the S+, S−, and SN motion directions were 78% vs. 63%, 15% vs. 20%, and 7% vs. 17%, respectively. Differences between the central and peripheral viewing conditions were significant for the S+ percept (p = 0.001; t(14) = 4.06) and the SN percept (p = 0.007; t(14) = 3.18), but not for the S− percept (p = 0.288; t(14) = 1.12).
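For reference, the paired-sample t statistic used here (with df = n − 1) can be computed as follows (a generic sketch on hypothetical per-participant fractions, not the study's raw data):

```python
import numpy as np

def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom (df = n - 1)."""
    d = np.asarray(a, float) - np.asarray(b, float)   # per-participant differences
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))       # mean difference / its SE
    return t, n - 1

# Hypothetical fractions of seeing S+ (central vs. peripheral) for 3 participants.
t, df = paired_t([0.80, 0.75, 0.79], [0.65, 0.60, 0.66])
```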


Fig. 3. Total fractions (F) of seeing the three percepts in central and peripheral viewing conditions. (A) Total fraction F+ of seeing the S+ drift direction; (B) Total fraction F− of seeing the S− drift direction; (C) Total fraction FN of seeing neither drift direction.


Next, we divided the 5-minute duration evenly into five sections, marked td1–td5. We then calculated F+, F−, and FN in each section for both central and peripheral viewing conditions, as shown in Fig. 4(A), (B), and (C). Horizontal axes represent the time sections (unit: minute); vertical axes represent the fraction of each percept; the solid red line and dashed blue line represent the central and peripheral viewing conditions, respectively. Error bars are standard errors of the means.


Fig. 4. Fractions (F) of seeing the three percepts in each time section in central and peripheral viewing conditions. (A) Fractions F+ of seeing the S+ drift direction; (B) Fractions F− of seeing the S− drift direction; (C) Fractions FN of seeing neither drift direction.


For the S+ percept (Fig. 4(A)), a 2 (viewing condition: central, peripheral) × 5 (time section: td1–td5) ANOVA revealed significant main effects of viewing condition (F(1, 14) = 18.04, p < 0.001) and time section (F(4, 56) = 3.84, p = 0.008), but no significant interaction between viewing condition and time section (F(4, 56) = 0.50, p = 0.738). For the S− percept (Fig. 4(B)), the same ANOVA revealed a significant main effect of time section (F(4, 56) = 2.62, p = 0.044), but neither a significant main effect of viewing condition (F(1, 14) = 1.20, p = 0.291) nor a significant interaction (F(4, 56) = 1.05, p = 0.387). For the neither percept (Fig. 4(C)), the ANOVA revealed a significant main effect of viewing condition (F(1, 14) = 10.22, p = 0.007), but neither a significant main effect of time section (F(4, 56) = 1.46, p = 0.226) nor a significant interaction (F(4, 56) = 1.69, p = 0.165).

3.2 Vergence eye movements, pupil size, blinks, and (micro)saccades

3.2.1 Data analysis

We used Matlab R2012a and DataViewer software (SR Research, Canada) to analyze data on eye positions and pupil sizes of the 15 participants.

For the eye-tracking data, we calculated the distribution of relative change in horizontal vergence in the following steps: (1) temporally align the eye-tracking data with the behavioral key-press data based on the time stamps recorded by the Eyelink II system; (2) define t = 0 as the moment of a perceptual flip based on participants' key-release responses, take t = [−4000, 4000] ms as the time window, and extract the horizontal positions of the left and right eyes within this window (denoted x(l, t) and x(r, t), respectively); (3) subtract the right-eye horizontal position from the left-eye horizontal position to obtain the horizontal vergence:

$$x(\mathrm{hor}, t) = x(l, t) - x(r, t),\quad t \in [-4000, 4000]\,\mathrm{ms}$$
(4) select a range at the beginning of the time window, t = [−4000, −3000] ms, and calculate the average value x(hor_avg) across all trials and all participants in each viewing condition separately. This averaged value was used as the baseline for normalization because the data in this range were quite stable and temporally far enough from the moments of perceptual flips; we averaged the vergence over the whole range, rather than using a single time point as the baseline, to avoid data-processing artifacts; (5) normalize the curves to this baseline:
$$x(\mathrm{hor\_norm}, t) = x(\mathrm{hor}, t) - x(\mathrm{hor\_avg})$$
and convert the unit from pixels to degrees (based on the display's pixels per degree). Because horizontal vergence is calculated by subtracting the right-eye position from the left-eye position, a negative value indicates divergence; (6) calculate the mean of the normalized horizontal vergence across all trials and all participants:
$$x(\mathrm{hor\_mean}, t) = \frac{1}{n}\sum_{i=1}^{n} x_i(\mathrm{hor\_norm}, t)$$
We excluded data containing another flip within t = [−4000, 0] ms to ensure that the distribution curves before perceptual flips were pure.
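The steps above can be sketched as follows (a minimal NumPy sketch under the stated 250 Hz sampling; the pixels-per-degree value and all names are our own assumptions, not the authors' code):

```python
import numpy as np

def vergence_relative_change(x_left, x_right, fs=250, px_per_deg=30.0):
    """Steps (3)-(5): baseline-normalized horizontal vergence for one trial.

    x_left, x_right: horizontal eye positions (pixels) over t = [-4000, 4000] ms,
    already flip-aligned (steps 1-2); trials with another flip in [-4000, 0] ms
    are assumed to be excluded beforehand.
    """
    x_hor = np.asarray(x_left, float) - np.asarray(x_right, float)  # step (3)
    baseline = x_hor[:fs].mean()       # step (4): average over t = [-4000, -3000] ms
    return (x_hor - baseline) / px_per_deg   # step (5), pixels -> degrees

def mean_across_trials(traces):
    """Step (6): average the normalized traces over trials and participants."""
    return np.mean(traces, axis=0)
```

A constant offset between the eyes cancels against the baseline, so only changes relative to the pre-flip window remain.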

Similarly, we calculated the distribution of relative change in vertical vergence using the same six steps, except that in step 3 we subtracted the vertical position of the right eye from that of the left eye:

$$y(\mathrm{ver}, t) = y(l, t) - y(r, t),\quad t \in [-4000, 4000]\,\mathrm{ms}$$
Thus a negative vertical vergence means right sursumvergence (the right eye's gaze position is higher than the left eye's in the vertical direction).

Moreover, we calculated the mean pupil size by: (1) averaging the pupil sizes of the left (Pu(l, t)) and right (Pu(r, t)) eyes within the same time window t = [−4000, 4000] ms:

$$\mathrm{Pu}(t) = \big(\mathrm{Pu}(l, t) + \mathrm{Pu}(r, t)\big)/2,\quad t \in [-4000, 4000]\,\mathrm{ms}$$
(2) normalizing it as:
$$\mathrm{Pu_{norm}}(t) = \mathrm{Pu}(t) - \mathrm{Pu_{avg}},\quad t \in [-4000, 4000]\,\mathrm{ms}$$
and (3) calculating the percentage of relative pupil-size change across all trials and all participants.
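These steps can be sketched as follows (the baseline window is assumed to match the vergence analysis, t = [−4000, −3000] ms; the percent form of step (3) and all names are our assumptions):

```python
import numpy as np

def pupil_relative_change(pu_left, pu_right, fs=250):
    """Mean left/right pupil size, baseline-normalized, as a percentage change."""
    pu = (np.asarray(pu_left, float) + np.asarray(pu_right, float)) / 2  # step (1)
    baseline = pu[:fs].mean()          # assumed baseline: t = [-4000, -3000] ms
    return 100.0 * (pu - baseline) / baseline   # steps (2)-(3), in percent
```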

3.2.2 Experimental result

Figure 5 shows the distributions of relative changes in horizontal vergence, vertical vergence, and pupil size during percept changes from S+ (left panels) and from S− (right panels). In Fig. 5(A)–(C), horizontal axes represent the time window t = [−4000, 4000] ms, where t = 0 is the moment of key release signaling a change from the previous percept; vertical axes represent the distributions of relative changes in horizontal vergence (unit: degree), vertical vergence (unit: degree), and pupil size (unit: %). Left and right panels show results for percept changes from S+ and from S−, respectively. For simplicity, we focus mainly on the moments before the percept flips from a pure S+ or S− percept (t = [−4000, 0] ms) and do not distinguish the percepts after the flips; thus the percept during t = [0, 4000] ms may not be pure: in the left panel it may be an S− percept, an SN percept, or each for a short duration, and in the right panel an S+ percept, an SN percept, or each for a short duration. Red and blue lines represent the central and peripheral viewing conditions, respectively. The horizontal green bar indicates a significant difference between the central and peripheral viewing conditions by paired t-test (p < 0.01, n = 15).


Fig. 5. Distributions of relative changes in horizontal vergence, vertical vergence, and pupil size during percept changes from S+ and from S−. (A) Distribution of relative changes in horizontal vergence; (B) Distribution of relative changes in vertical vergence; (C) Distribution of relative changes in pupil size. (p < 0.01).


In both the left and right panels of Fig. 5(A), in the central viewing condition, horizontal vergence decreases (divergence; t = [−1500, −1000] ms in the left panel, t = [−2800, −2300] ms in the right panel) and then clearly increases (less divergence, i.e., convergence; t = [−1000, 0] ms in the left panel, t = [−2300, 0] ms in the right panel) before the key release for a perceptual flip. In the peripheral viewing condition, however, there is no obvious vergence change before key release. Horizontal green bars indicate significant differences in the relative change in horizontal vergence between the central and peripheral viewing conditions (p < 0.01). Both panels of Fig. 5(B) show no significant difference in vertical vergence between the two viewing conditions. In both panels of Fig. 5(C), the relative change in pupil size remains stable before key release but clearly increases in both viewing conditions after key release.

Moreover, we used a similar method to extract the blink and (micro)saccade data and calculated their probability distributions within the same time window t = [−4000, 4000] ms. Figure 6 shows the probability distributions of blinks and (micro)saccades during percept flips from S+ and from S−. Horizontal axes in Fig. 6(A) and (B) represent the time window t = [−4000, 4000] ms, where t = 0 is the moment of the perceptual flip based on key release; vertical axes represent the probability distributions of blinks and (micro)saccades, respectively. The probabilities were obtained by calculating the chance of a blink or (micro)saccade occurring in each time bin across all trials and all participants, separately for percept changes from S+ and from S−. Left and right panels in both (A) and (B) show results for percept changes from S+ and from S−, respectively. Red and blue lines represent the central and peripheral viewing conditions from the 15 participants' data, respectively. The sampling rate of the Eyelink II is 250 Hz, so the time bin for the blink and (micro)saccade data is 4 ms (1000 ms / 250 = 4 ms).
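The per-bin probabilities can be computed as follows (a sketch; the boolean event masks are assumed to be precomputed from the eye tracker's blink/saccade output, and names are ours):

```python
import numpy as np

def event_probability(event_masks):
    """Chance of an event (blink or (micro)saccade) in each 4-ms time bin.

    event_masks: boolean array of shape (n_trials, n_bins), True where an
    event occurred in that bin of the flip-aligned window t = [-4000, 4000] ms.
    Averaging over the trial axis gives the per-bin probability.
    """
    return np.asarray(event_masks, float).mean(axis=0)
```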


Fig. 6. Probability distributions of blinks and (micro)saccades during percept changes from S+ and from S− separately. (A) Probability distribution of blinks; (B) Probability distribution of (micro)saccades.


In Figs. 5 and 6, the time point t = 0 represents the moment of key release after a participant perceives a change. Since this motor action may take several hundred milliseconds [12], t = −500 ms might be the actual moment of the perceptual flip. In Fig. 6(A) and (B), the probabilities of blinks and (micro)saccades in both central and peripheral viewing conditions decreased before the actual perceptual flip and increased afterward, meaning the eyes stayed fixated longer around the moment of the perceptual flip.

4. Discussion

4.1 Significant differences in both the total F+ over the long duration and F+ in each time section between central and peripheral vision suggest that top-down feedback for visual recognition is stronger in central vision

Zhaoping presented the same dichoptic stimuli for short durations and found a bias toward the S+ percept in central vision [5]. She then ran control experiments in which the central stimuli were enlarged to the same size and spatial frequency as the peripheral stimuli, and found that the bias toward the S+ percept was little affected by stimulus size and/or spatial frequency. She also compared contrast sensitivities between the central and peripheral viewing conditions and excluded contrast sensitivity as a possible cause of the bias. She concluded that the bias toward S+ in central vision indicates that top-down feedback for visual recognition is stronger in central than in peripheral vision. Because we used the same stimuli as her study, the bias toward the S+ percept over the total duration and within each time section likewise suggests that top-down feedback for visual recognition is stronger in central vision than in peripheral vision.

4.2 Significant differences in both the total FN over the long duration and FN in each time section between central and peripheral vision suggest functional differences between central and peripheral vision

In Fig. 3(C) and Fig. 4(C), both the total FN over the long duration and FN in each time section are significantly larger in peripheral than in central vision, suggesting functional differences between central and peripheral vision: the former mainly serves visual recognition and thus engages higher-level feedback to decode a clear percept in most conditions, whereas the latter specializes in visual selection and does not require a clear percept of the visual inputs [5,13].

4.3 Perceptual flips in both central and peripheral viewing conditions suggest rivalry between the S+ and S− channels over a long perceptual duration

As noted in the Introduction, in conventional binocular rivalry with dichoptic stimuli, the visual percept switches between the left- and right-eye inputs. This is a local, retinotopic-level process that involves blinks and (micro)saccades but not vergence eye movements. In our study, there were perceptual flips between the S+ and S− percepts in both central and peripheral viewing conditions during the 300-s trials, suggesting rivalry between these two channels. Based on the efficient coding theory that the left- and right-eye inputs are combined into S+ and S− channels in brain area V1 [4], this rivalry is likewise a retinotopic-level process in primary visual cortex.

In both panels of Fig. 6(A) and (B), the probabilities of blinks and (micro)saccades decrease before t = −500 ms and gradually increase thereafter in both central and peripheral viewing conditions. These tendencies suggest a link with the perceptual flip and indicate the onset of the coming flip [2]. Since blinks and (micro)saccades occur at early processing stages [2], they are involved in both viewing conditions. These results match those reported for binocular rivalry, as mentioned at the beginning of Section 1.

The relative change in pupil size, shown in the left and right panels of Fig. 5(C), increases in both central and peripheral viewing conditions at the moment of the perceptual flip marked by key release (t = 0). Comparing Fig. 5(A) with Fig. 5(C), the changes in horizontal vergence occur several hundred milliseconds earlier than the changes in pupil size. If the former induced the latter, the convergence in the central viewing condition should cause pupil constriction [14]; however, we found that pupil size dilated, suggesting no association between the vergence and pupil-size changes [15]. The results might instead be explained by a previous study [12]: Einhäuser et al. used four kinds of dichoptic stimuli to study pupil-size changes in perceptual rivalry and found that pupil dilation was linked with perceptual flips. According to their results, during perceptual flips the locus coeruleus (LC) releases norepinephrine (NE) and causes pupil dilation (with a latency of several hundred milliseconds). Because the association between pupil size and perceptual flips in our study is similar to theirs, the LC-NE system may also explain the pupil-size change in Fig. 5(C).

In summary, we found perceptual flips between the S+ and S− channels over long perceptual durations in both viewing conditions, linked with the probability distributions of blinks and (micro)saccades and the relative changes in pupil size. However, in addition to blinks and (micro)saccades, there are vergence eye movements around the moments of perceptual flips in the central viewing condition, but not in the peripheral viewing condition. This might indicate different levels of visual processing and different underlying mechanisms, which we discuss in more detail below.

4.4 The existence of vergence eye movements in central vision suggests the involvement of high level visual attention

The left and right panels of Fig. 5(A) show that, in the central viewing condition, there are horizontal vergence eye movements (t = [−1500, 0] ms for the left panel; t = [−2800, 0] ms for the right panel) before the moments of key release for a perceptual change. However, in the peripheral viewing condition, there is no obvious vergence change before key release. Previous studies reported that vergence eye movements are closely linked with visual attention, which may involve the frontal cortex, superior colliculus, LIP and other related higher brain areas, and thus reflect a cognitive mechanism [15–20]. For example, Solé Puig et al. (2013a) investigated vergence changes using different paradigms (cue/no-cue, bottom-up/top-down) and found that vergence eye movements were linked with both bottom-up and top-down attention [15]. Taking into consideration that retinal disparity changes (vergence changes) can activate binocular neurons in primary visual cortex, and based on previous reports that binocular neurons in the V1 brain area play a role in guiding vergence eye movements without depth perception, they proposed a link between visual attention and binocular neurons [15,21,22].

In our study, when perceiving dichoptic stimuli in central vision over time, participants may pay less attention, inducing divergent eye movements and non-fusion of the left and right images. Since the left and right visual inputs are individually ambiguous, any non-fusion of the binocular inputs may affect perception of the motion direction; as a result, high level visual attention is engaged to mediate vergence eye movements that re-fuse the left and right images for visual recognition. In other words, the vergence changes (divergence and convergence) in central vision might reflect the loss and re-engagement of visual attention. Peripheral vision, by contrast, is not sensitive to the non-fusion problem (as explained in section 4.2), so it does not involve vergence eye movements. Physiologically speaking, based on the efficient coding theory, signals from the S+ and S− channels can be multiplexed so that each binocular neuron in the V1 brain area carries a weighted sum of signals from these two channels [4,10,11,13]. For example, tuned excitatory and tuned inhibitory neurons can carry dominant inputs from the S+ and S− channels, respectively [4,10,11,13]. Hence, our study may also support the link between visual attention and binocular neurons (e.g., tuned excitatory neurons, tuned inhibitory neurons and so on) proposed by Solé Puig's team [15].
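The multiplexing idea can be illustrated numerically: each binocular neuron's drive is a weighted sum of the summation- and difference-channel signals, with tuned excitatory neurons weighted toward S+ and tuned inhibitory neurons toward S−. A toy sketch follows; the weight values and inputs are purely illustrative, not measured quantities:

```python
def neuron_drive(s_left, s_right, w_plus, w_minus):
    """Weighted sum of summation (S+) and difference (S-) channel signals,
    per the efficient-coding account of binocular combination."""
    s_plus = s_left + s_right     # S+ = SL + SR
    s_minus = s_left - s_right    # S- = SL - SR
    return w_plus * s_plus + w_minus * s_minus

sl, sr = 0.8, 0.2  # unequal monocular inputs (illustrative)
tuned_excitatory = neuron_drive(sl, sr, w_plus=0.9, w_minus=0.1)  # S+-dominated
tuned_inhibitory = neuron_drive(sl, sr, w_plus=0.1, w_minus=0.9)  # S--dominated
```

With correlated inputs (large S+, small S−), the S+-weighted neuron responds more strongly, consistent with the dominance of the summation channel for natural binocular input.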

4.5 A role of vergence eye movements in central vision in preparing for the coming perceptual flip

In the central viewing condition of Fig. 5(A), there are decreases (divergence) of horizontal vergence (t = [−1500, −1000] ms for the left panel; t = [−2800, −2300] ms for the right panel) and obvious increases (less divergence, then convergence) of horizontal vergence before key release for the perceptual flip (t = [−1000, 0] ms for the left panel; t = [−2300, 0] ms for the right panel). Accordingly, in both panels of Figs. 6(A) and 6(B), blinks and (micro)saccades gradually decrease during t = [−1000, −500] ms and t = [−2300, −500] ms, respectively. It can be speculated that from t = −1000 ms in the left panel and t = −2300 ms in the right panel, visual attention is engaged; along with this engagement come increases of horizontal vergence and decreases of blinks and (micro)saccades. At around t = −500 ms, the probabilities of blinks and (micro)saccades reach a trough; after that, blinks and (micro)saccades gradually increase in both panels of Figs. 6(A) and 6(B). Since previous studies of binocular rivalry have reported an increase of blinks just before or at the moment of perceptual flips and an increase of (micro)saccades after the flip [23], we speculate that the actual perceptual flips happen around or slightly later than t = −500 ms in both panels, which is several hundred milliseconds before the motor action of key release [12]. Previous studies reported that vergence changes may switch the cortical state to prepare for new incoming sensory signals [19,23], and that there are links between perceptual flips and visual attention [24]. Since vergence eye movements involve high level visual attention, as discussed in section 4.4, we propose that the vergence change (the period of attentional engagement) in our study may serve to prepare for the coming perceptual flip.

4.6 Different time durations of flips from S+ and from S− suggest different feedback connections from higher brain areas to these two channels

In central vision, in both panels of Fig. 5(A), the peak amplitudes of the divergence and convergence eye movements are around 0.05 and 0.15 degrees, respectively. In the S+ dominated percept before a perceptual flip, shown in the left panel of Fig. 5(A), once the left and right eye inputs are re-aligned (t = −500 ms, as the vergence recovers from divergence to zero), a perceptual flip happens. However, in the S− dominated percept shown in the right panel, the perceptual flip (t = −500 ms) happens 1500 ms after the realignment of the binocular inputs (t = −2000 ms, as the vergence recovers from divergence to zero). A flip from S− thus takes longer to occur than a flip from S+.
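The latency argument above is simple arithmetic: take binocular realignment as the vergence trace returning to zero, and measure the interval from that moment to the inferred flip. A sketch with the timings read from Fig. 5(A) (treating the zero-crossing as the realignment marker is our operationalization, not a quantity the authors computed this way):

```python
def flip_latency(realignment_t, flip_t):
    """Time (ms) from binocular realignment (vergence back to zero)
    to the inferred perceptual flip."""
    return flip_t - realignment_t

# timings read from Fig. 5(A); the flip is inferred at t = -500 ms in both panels
latency_from_s_plus = flip_latency(realignment_t=-500, flip_t=-500)    # flip at realignment
latency_from_s_minus = flip_latency(realignment_t=-2000, flip_t=-500)  # flip 1500 ms later
```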

A previous study reported that withdrawing attention slows perceptual flips, suggesting a link between visual attention and perceptual flips [24]. In our study, the longer duration of a flip from S− might reflect less involvement of visual attention. Visual attention can affect neural processing through a bottom-up process or a top-down feedback mechanism; the former can be driven by a salient stimulus feature, whereas the latter is known as selective attention [25]. In binocular rivalry, two main, still controversial, accounts of the visual processing exist. One holds that perceptual flips are caused by top-down feedback from frontal and parietal areas to early visual areas; the other holds that perceptual flips happen in early visual areas, which then provide feedforward signals to higher brain areas (such as the fronto-parietal areas) [25]. In our study, based on Eqs. (9) and (10), the visual inputs to the S+ and S− channels are equal, so they should provide similar amounts of feedforward signal. However, the different durations of perceptual flips from S+ and from S− might suggest different feedback connections from higher brain areas to each channel [19,23,25–27]. In other words, higher brain areas for visual attention might provide top-down signals biased toward the S+ channel [25].

Ooi and He (1999) implemented a series of experiments to investigate the role of attention in binocular rivalry. They proposed that binocular rivalry involves multiple levels of visual areas, in which visual attention can extend to early visual processing (e.g., primary visual cortex) to access the eye of origin through feedback mechanisms or a recurrent excitatory network [26]. Recent studies have reported that the frontal cortex is the origin of both bottom-up and top-down attention and can control eye vergence [12], and that fronto-parietal areas are important in providing top-down control for visual perception and perceptual flips [25–27]. Kastner and Ungerleider (2000) reported that attentional top-down signals can be generated in higher-order areas of frontal and parietal cortex [25]. Therefore, in our study, the existence of vergence eye movements and perceptual flips in both panels suggests the involvement of frontal and parietal areas. Hence, we speculate that the areas underlying higher level visual attention (such as frontal and parietal cortex) have a feedback-connection bias toward the S+ channel. Similarly, in Zhaoping's FFVW (feedforward, feedback, verify and weight) model [5], feedback from higher brain areas is biased toward the S+ channel for verification, another example of a biased feedback connection. In both accounts, the prior knowledge that binocular inputs are normally correlated causes a top-down feedback bias toward the S+ channel. Unlike Zhaoping's FFVW model, which covers higher brain areas for visual recognition (such as V4 and inferotemporal cortex) [5], our study finds obvious vergence eye movements and suggests the involvement of higher brain areas for visual attention (such as frontal and parietal areas).

Moreover, since binocular visual inputs in the natural world are normally correlated, there should be a top-down bias toward the S+ channel, rather than the S− channel, to mediate vergence eye movements. Hence, the circuit spanning the areas underlying visual attention, including the frontal cortex, the V1 neurons that carry S+ dominated signals (e.g., tuned excitatory neurons) and other related areas, has a higher signal-processing efficiency for flipping a percept.

4.7 New findings and limitations of this study

In our study, there are perceptual alternations between the S+ and S− channels over long durations; in particular, in central vision there are vergence eye movements before perceptual flips, suggesting the involvement of high level visual attention. Previous studies implemented cue/no-cue paradigms or high level memory tasks to investigate vergence eye movements, and suggested a link between visual attention and binocular neurons [12,13]. In our study, we used dichoptic stimuli that can be combined into opposite motion directions for the S+ and S− percepts in the V1 brain area, and found different durations between a flip from S+ and a flip from S−, suggesting different feedback connections from higher brain areas to these two channels. Since S+ and S− dominated signals can be carried by different types of binocular neurons (tuned excitatory and tuned inhibitory neurons, for example), our study reveals different feedback connections of high level visual attention to these neurons. Therefore, our study provides new insights into the relation between high level visual attention and different types of binocular neurons in primary visual cortex, using dichoptic stimuli and eye vergence as measuring tools.

In the natural three-dimensional (3D) world, the human visual system adapts to binocular visual inputs that are normally correlated and thus activates the neurons of each type according to the weighted sum of the S+ and S− channel signals. However, in an unnatural 3D environment, such as when watching a 3D movie or playing virtual reality (VR) games, any mismatch (or decorrelation) of the binocular visual inputs may cause less involvement of high level visual attention and abnormal vergence eye movements, which might be one of the causes of visual fatigue.

In the experiment, participants pressed one button for the S+ percept and another button for the S− percept, and released both buttons for the neutral percept. There is, however, a gap in time between releasing one button and pressing the other, which might be registered as a neutral percept. We summed all durations of key-release periods (gaps between key changes, i.e., nominal neutral percepts) and found that durations shorter than 500 ms (mainly motor-action time) occupy very small ratios of the total duration (0.8% in the central viewing condition and 0.5% in the peripheral viewing condition) and similarly small ratios in each time period. Because participants were asked to change the key press as fast as possible, for the calculations of fractions of total duration and fractions in each time period in Figs. 3 and 4, relative to each S+, S− and SN percept, the gap between key changes is very small and negligible. Hence, for simplicity, the gap was counted as neutral percept, which might make that fraction slightly larger than its actual value (but still negligibly so). Note that this gap has no influence on the analysis of the eye data (Figs. 5 and 6), because we focus on the moments before perceptual flips (a pure S+ or S− percept before t = 0) and use t = 0 as the moment of key release for a perceptual change, without differentiating the percepts after the flips.
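The gap accounting can be sketched as follows: collect the inter-press gap durations, treat gaps shorter than 500 ms as motor-action time, and report their share of the total trial time. The gap values below are toy numbers for illustration, not the data behind the reported 0.8% and 0.5%:

```python
def short_gap_fraction(gap_durations_ms, trial_ms=300_000, cutoff_ms=500):
    """Fraction of the total trial time occupied by key-change gaps
    shorter than the motor-action cutoff (here, 500 ms)."""
    short = [g for g in gap_durations_ms if g < cutoff_ms]
    return sum(short) / trial_ms

# toy gaps within one 300-s trial; only the three sub-500-ms gaps count
frac = short_gap_fraction([300, 450, 900, 250])
```

With these toy values the short gaps total 1000 ms of a 300-s trial, i.e., about 0.33%, the same order as the fractions reported above.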

5. Conclusion

In this study, to investigate whether visual percepts alternate between the S+ and S− channels over long durations and whether vergence eye movements are involved in the process, we used specially designed dichoptic stimuli, as in the previous study, to produce unambiguous percepts for the S+ and S− channels, and adopted an eye tracker to record eye information simultaneously. The results show perceptual flips between the S+ and S− channels in both central and peripheral viewing conditions. More importantly, we found significant vergence eye movements in central vision before perceptual flips, suggesting the involvement of high level visual attention; the duration of a perceptual flip from S+ is shorter than that of a flip from S−, which might indicate different feedback connections from higher brain areas to the S+ and S− channels. Since S+ and S− dominated signals can be carried by different types of binocular neurons (tuned excitatory and tuned inhibitory neurons, for example), our study reveals different feedback connections of high level visual attention to these neurons, and provides new insights into the relation between high level visual attention and binocular neurons in primary visual cortex, using dichoptic stimuli and eye vergence as measuring tools.

Acknowledgments

We are grateful to Professor Zhaoping Li for suggesting the idea for this study. We are grateful to the anonymous reviewers for their helpful suggestions and comments. We are grateful to Dr. Qing He for his helpful discussion on the statistical methods.

Disclosures

The authors declare no conflicts of interest.

References

1. X. Chen and S. He, “Temporal characteristics of binocular rivalry: visual field asymmetries,” Vision Res. 43(21), 2207–2212 (2003). [CrossRef]  

2. L. van Dam and R. van Ee, “The role of saccades in exerting voluntary control in perceptual and binocular rivalry,” Vision Res. 46(6-7), 787–799 (2006). [CrossRef]  

3. L. van Dam and R. van Ee, “Retinal image shifts, but not eye movements per se, cause alternations in awareness during binocular rivalry,” J. Vision 6(11), 3 (2006). [CrossRef]  

4. Z. Li and J. J. Atick, “Efficient stereo coding in the multiscale representation,” Network Comp. Neural. 5(2), 157–174 (1994). [CrossRef]  

5. L. Zhaoping, “Feedback from higher to lower visual areas for visual recognition may be weaker in the periphery: Glimpses from the perception of brief dichoptic stimuli,” Vision Res. 136, 32–49 (2017). [CrossRef]  

6. T. J. Mueller and R. Blake, “A fresh look at the temporal dynamics of binocular rivalry,” Biol. Cybern. 61(3), 223–232 (1989). [CrossRef]  

7. S. C. Chong, D. Tadin, and R. Blake, “Endogenous attention prolongs dominance durations in binocular rivalry,” J. Vision 5(11), 6 (2005). [CrossRef]  

8. D. H. Brainard, “The psychophysics toolbox,” Spatial Vis. 10(4), 433–436 (1997). [CrossRef]  

9. D. G. Pelli, “The videotoolbox software for visual psychophysics: transforming numbers into movies,” Spatial Vis. 10(4), 437–442 (1997). [CrossRef]  

10. K. A. May, L. Zhaoping, and P. Hibbard, “Perceived direction of motion determined by adaptation to static binocular images,” Curr. Biol. 22(1), 28–32 (2012). [CrossRef]  

11. K. A. May and L. Zhaoping, “Efficient coding theory predicts a tilt aftereffect from viewing untilted patterns,” Curr. Biol. 26(12), 1571–1576 (2016). [CrossRef]  

12. W. Einhäuser, J. Stout, C. Koch, and O. Carter, “Pupil dilation reflects perceptual selection and predicts subsequent stability in perceptual rivalry,” Proc. Natl. Acad. Sci. 105(5), 1704–1709 (2008). [CrossRef]  

13. L. Zhaoping, Understanding Vision: Theory, Models, and Data (Oxford University Press, 2014).

14. M. Feil, B. Moser, and M. Abegg, “The interaction of pupil response with the vergence system,” Graefe’s Arch. Clin. Exp. Ophthalmol. 255(11), 2247–2253 (2017). [CrossRef]  

15. M. Solé Puig, L. Pérez Zapata, J. Aznar-Casanova, and H. Supèr, “A Role of Eye Vergence in Covert Attention,” PLoS One 8(1), e52955 (2013). [CrossRef]  

16. M. Solé Puig, A. Romeo, J. Cañete Crespillo, and H. Supèr, “Eye vergence responses during a visual memory task,” NeuroReport 28(3), 123–127 (2017). [CrossRef]  

17. J. W. Bisley and M. E. Goldberg, “Attention, intention, and priority in the parietal lobe,” Annu. Rev. Neurosci. 33(1), 1–21 (2010). [CrossRef]

18. P. Gamlin and K. Yoon, “An area for vergence eye movement in primate frontal cortex,” Nature 407(6807), 1003–1007 (2000). [CrossRef]  

19. H. Supèr, C. Togt, H. Spekreijse, and V. Lamme, “Internal state of monkey primary visual cortex (V1) predicts figure-ground perception,” J. Neurosci. 23(8), 3407–3414 (2003). [CrossRef]  

20. J. W. Bisley, “The neural basis of visual attention,” J. Physiol. 589(1), 49–57 (2011). [CrossRef]  

21. G. Masson, C. Busettini, and F. Miles, “Vergence eye movements in response to binocular disparity without depth perception,” Nature 389(6648), 283–286 (1997). [CrossRef]  

22. B. G. Cumming and A. J. Parker, “Responses of primary visual cortical neurons to binocular disparity without depth perception,” Nature 389(6648), 280–283 (1997). [CrossRef]  

23. M. Solé Puig, L. Puigcerver, J. Aznar-Casanova, and H. Supèr, “Difference in visual processing assessed by eye vergence movements,” PLoS One 8(9), e72041 (2013). [CrossRef]  

24. C. Paffen and D. Alais, “Attentional modulation of binocular rivalry,” Front. Hum. Neurosci. 5, 105 (2011). [CrossRef]  

25. S. Kastner and L. G. Ungerleider, “Mechanisms of visual attention in the human cortex,” Annu. Rev. Neurosci. 23(1), 315–341 (2000). [CrossRef]  

26. T. L. Ooi and Z. J. He, “Binocular rivalry and visual awareness: the role of attention,” Perception 28(5), 551–574 (1999). [CrossRef]  

27. M. Williams, C. Baker, H. Beeck, W. Shim, S. Dang, C. Triantafyllou, and N. Kanwisher, “Feedback of visual object information to foveal retinotopic cortex,” Nat. Neurosci. 11(12), 1439–1445 (2008). [CrossRef]  


Figures (6)

Fig. 1. Schematics of stimuli in central and peripheral viewing conditions. (A) Central viewing condition; (B) Peripheral viewing condition.
Fig. 2. Experimental procedure.
Fig. 3. Total fractions (F) of seeing the three percepts in central and peripheral viewing conditions. (A) Total fraction F+ of seeing the S+ drift direction; (B) Total fraction F− of seeing the S− drift direction; (C) Total fraction FN of seeing neither drift direction.
Fig. 4. Fractions (F) of seeing the three percepts in each section of the time period in central and peripheral viewing conditions. (A) Fractions F+ of seeing the S+ drift direction; (B) Fractions F− of seeing the S− drift direction; (C) Fractions FN of seeing neither drift direction.
Fig. 5. Distributions of relative changes in horizontal vergence, vertical vergence and pupil size during percept changes from S+ and from S−. (A) Distribution of relative changes in horizontal vergence; (B) Distribution of relative changes in vertical vergence; (C) Distribution of relative changes in pupil size. (p < 0.01).
Fig. 6. Probability distributions of blinks and (micro)saccades during percept changes from S+ and from S− separately. (A) Probability distribution of blinks; (B) Probability distribution of (micro)saccades.

Equations (18)


$$S_L = \left(C_+ S_q + C_- S_{q'}\right)/2 \tag{1}$$
$$S_R = \left(C_+ S_q - C_- S_{q'}\right)/2 \tag{2}$$
$$S_q = \cos\!\left[k\left(y - \tfrac{2\pi w}{k}\,t\right) + q\right] \tag{3}$$
$$S_{q'} = \cos\!\left[k\left(y \pm \tfrac{2\pi w}{k}\,t\right) + q'\right] \tag{4}$$
$$k = \frac{4\pi}{L} \tag{5}$$
$$L = 1.13\left(1 + \frac{e}{e_2}\right) \tag{6}$$
$$S_+ = S_L + S_R \tag{7}$$
$$S_- = S_L - S_R \tag{8}$$
$$S_+ = C_+ S_q \tag{9}$$
$$S_- = C_- S_{q'} \tag{10}$$
$$S_L = \bar{S}\left[1 + \left(C_+ S_q + C_- S_{q'}\right)/2\right] \tag{11}$$
$$S_R = \bar{S}\left[1 + \left(C_+ S_q - C_- S_{q'}\right)/2\right] \tag{12}$$
$$x(\mathrm{hor}, t) = x(l, t) - x(r, t), \quad t = [-4000, 4000] \tag{13}$$
$$x(\mathrm{hor\_norm}, t) = x(\mathrm{hor}, t) - x(\mathrm{hor\_avg}) \tag{14}$$
$$x(\mathrm{hor\_mean}, t) = \frac{1}{n}\sum_{i=1}^{n} x_i(\mathrm{hor\_norm}, t) \tag{15}$$
$$y(\mathrm{ver}, t) = y(l, t) - y(r, t), \quad t = [-4000, 4000] \tag{16}$$
$$\mathrm{Pu}(t) = \big(\mathrm{Pu}(l, t) + \mathrm{Pu}(r, t)\big)/2, \quad t = [-4000, 4000] \tag{17}$$
$$\mathrm{Pu}(t\_\mathrm{norm}, t) = \mathrm{Pu}(t) - \mathrm{Pu}(t\_\mathrm{avg}) \tag{18}$$
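The analysis definitions above amount to: horizontal vergence is the left-minus-right horizontal gaze difference, baseline-subtracted per trial; pupil size is the mean of the two eyes, also baseline-subtracted. A minimal numerical sketch, where the array layout (one 1-D trace per eye) is our assumption:

```python
import numpy as np

def vergence_and_pupil(x_left, x_right, pu_left, pu_right):
    """Per-trial horizontal vergence and pupil traces, baseline-normalized
    as in the analysis equations: x(hor,t) = x(l,t) - x(r,t), minus its
    trial average; Pu(t) = mean of the two eyes, minus its trial average."""
    x = np.asarray(x_left) - np.asarray(x_right)   # horizontal vergence trace
    x_norm = x - x.mean()                          # subtract trial-average baseline
    pu = (np.asarray(pu_left) + np.asarray(pu_right)) / 2
    pu_norm = pu - pu.mean()
    return x_norm, pu_norm

# toy two-sample traces per eye (gaze positions and pupil sizes)
xv, pv = vergence_and_pupil([1.0, 2.0], [0.5, 0.5], [3.0, 5.0], [3.0, 3.0])
```

Averaging `x_norm` across the n trials aligned at t = 0 then gives the grand-mean vergence trace plotted in Fig. 5.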