Abstract
This paper identifies a specific pattern of luminance in pictures that creates a low-level, non-subjective neuro-aesthetic effect and provides a theoretical explanation for how it occurs. Pictures evoke both a top-down and a bottom-up visual percept of balance. Through its effect on eye movements, balance is a bottom-up conveyor of the aesthetic feelings of unity and harmony. These movements are also strongly influenced by saliency, so that it is difficult to separate out the much smaller effect of balance. Given that balance is associated with a unified, harmonious picture, and that a pictorial effect known to painters and historically documented evokes exactly these feelings, it was thought that pictures exhibiting this effect are perfectly balanced. Computer models of these pictures were found to have bilateral quadrant luminance symmetry with a lower half lighter than the upper half by a factor of ∼1.07 ± ∼0.03. A top-weighted center of quadrant luminance calculation is proposed to measure imbalance. A study compared identical pictures in two different frames, asking whether they appeared different given that the sole difference was balance. With observers who were mostly painters, there was a significant correlation between average pair imbalance and reports that two identical pictures appeared different, indicating at a minimum that the equation for calculating balance was correct.
For those who can disregard saliency, the effect results from the absence of the forces on eye movements that imbalance creates. This unaccustomed force of imbalance causes fatigue when viewing pictures carefully. A theoretical model of how the visual system could calculate the balance of any object is presented. Using this model, a center of balance was determined for non-rectangular pictures that corresponded to empirically determined balanced pictures.
Introduction
A picture, a bounded flat image, corresponds to nothing found in nature or mental imagery. This includes the borderless images in prehistoric caves [1]. It differs from an image without borders in that it evokes the percept of balance: the parts relate to both the center and the borders [2]. Balance is both a bottom-up percept and a top-down aesthetic concept taught in schools. But balance of what and how? Color, form or tonal areas have been proposed. These somewhat immeasurable qualities have been said to be balanced around a geometric center or the principal pictorial axes and defined by a center-of-gravity type of equation. It is said to explain both the perception of balance and the aesthetic qualities of unity and harmony in pictures [2, 3, 4]. However, it is either impossible or would require too much calculation for the visual system to do this. Any attempt to show that low level balance is determined by some sort of equation might be associated with this erroneous concept. However, in the same way that elements of salience such as color, contrast, and orientation are used to calculate eye movements within a picture, one might be able to calculate balance as a function of pictorial properties [5].
Eye movements are predominantly influenced by saliency and top-down priorities [5, 6, 7]. These determinants of eye movements are large, so that it is difficult to distinguish the much smaller effect of balance. Locher and Nodine tried to show the effect of balance by comparing eye movements in pictures that were thought to be balanced with variations of these pictures. Using trained and untrained observers, they found that trained observers had more diverse and fewer specific exploratory movements in the more balanced compositions [8, 9, 10]. These effects are characteristic of better balance in that they indicate that the trained eye is less constrained to specific salient areas, but they do not quantify the effect of imbalance.
Given that how the visual system calculates pictorial balance could seemingly be determined only empirically, I thought that one approach to the problem might be to create a perfectly balanced picture. Such a picture, in which balance has no effect on eye movements, could be used as a base from which other measurements might be made. As mentioned, perfect balance has also been used to explain the feelings of unity and harmony evoked by some pictures, and this corresponds to an obscure effect observed by painters that evokes exactly these feelings. There is no name or metaphorical description for the percept evoked by these pictures other than to say that they exhibit the aforementioned effect. It is only discussed in art schools when standing before such a picture. The unexpected change in a painting from being remarkably unified and harmonious to something less so, or the reverse, is quite striking. It creates the feeling that the whole picture can be seen at one time without need to focus on any pictorial element. I call this percept pictorial coherence.
This was first noted by Roger de Piles, an 18th century French critic and painter, who describes an arrangement of “lights and darks” in paintings creating an effect that is spontaneous and intense, giving the impression that the viewer could gaze at the entire painting without focusing on any particular form [11]. Cézanne apparently noted this. Indeed, his inability to describe this effect gives some of his conversations a cryptic quality [12, p.88-89, 110-111]. Both Delacroix and Kandinsky give vivid descriptions of seeing paintings that exhibit it [13, 14, 15] (Addendum A). To study this effect I created computer models of pictures that evoke it, drawing on experience acquired from many years of painting. Image analysis indicated that the percept was obtained when an LED picture, centered at eye level and parallel to the plane of vision, has bilateral quadrant luminance symmetry with the lower half being slightly more luminous than the upper half by a factor of ∼1.07 ± ∼0.03 (where luminance is measured from 0 to 255, with black being zero, as measured by Photoshop© CS6).
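The image analysis described above can be illustrated with a minimal sketch. It assumes the picture is available as a grid of 0 to 255 luminance values with even dimensions, and it tests for bilateral quadrant symmetry and a lower/upper ratio near 1.07. The function names and the symmetry tolerance are hypothetical choices made for illustration; they are not the procedure used in Photoshop.

```python
def quadrant_means(pixels):
    """Mean luminance of the four quadrants of a picture.

    pixels: list of rows (row 0 is the top), each a list of 0-255 values.
    Assumes even dimensions so that the four quadrants have equal area.
    """
    rows, cols = len(pixels), len(pixels[0])
    if rows < 2 or cols < 2:
        raise ValueError("need at least a 2x2 image")
    mid_r, mid_c = rows // 2, cols // 2

    def mean(r0, r1, c0, c1):
        vals = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        return sum(vals) / len(vals)

    return {
        "UL": mean(0, mid_r, 0, mid_c),
        "UR": mean(0, mid_r, mid_c, cols),
        "LL": mean(mid_r, rows, 0, mid_c),
        "LR": mean(mid_r, rows, mid_c, cols),
    }


def matches_reported_pattern(pixels, ratio=1.07, ratio_tol=0.03,
                             symmetry_tol=0.01):
    """True if left/right quadrants match and lower/upper is ratio +/- tol.

    symmetry_tol (relative left/right difference allowed) is an assumed
    value; the text does not state how exact the symmetry must be.
    """
    q = quadrant_means(pixels)
    upper = (q["UL"] + q["UR"]) / 2
    lower = (q["LL"] + q["LR"]) / 2
    if upper == 0:
        return False  # ratio undefined for a black upper half

    def close(a, b):
        return abs(a - b) <= symmetry_tol * max(a, b)

    symmetric = close(q["UL"], q["UR"]) and close(q["LL"], q["LR"])
    return symmetric and abs(lower / upper - ratio) <= ratio_tol
```

Such a check addresses only the luminance pattern; whether a picture is actually seen as coherent also depends on the viewing conditions discussed later.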
A theoretical formula for balance can be derived from this which defines the observed state of perfect balance and explains observations implying a center-of-mass-like effect. If a rectangular picture with bilateral quadrant luminance symmetry and a darker upper half can be said to be perfectly balanced, i.e. its center of luminance is located at the geometric center of the picture, then the standard formula for the center of mass of four connected objects can be used. In this formula the quadrant luminance LABQ replaces the mass of each object, the picture is centered at (0,0), the geometric center of each quadrant is taken as its center of luminance, and LTOTAL is taken as the average total luminance. The Y values of the upper quadrants are modified by 1.07 to account for the increased visual weight of the upper half. XABQ and YABQ are the coordinates of the center of the respective quadrant, and LABQ is that quadrant’s average luminance. The equation can be expressed as an addition of four vectors, so that balance is their vector sum.
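As an illustration of the calculation described in words above, the following sketch computes a top-weighted center of quadrant luminance. It rests on two assumptions that the text leaves open: that "modified by 1.07" means the Y coordinates of the upper quadrants are multiplied by 1.07, and that the normalizing term is the sum of the four quadrant luminances, as in the standard center-of-mass formula (dividing by their mean instead would only rescale the vector). The function and parameter names are hypothetical.

```python
import math


def balance_vector(l_ul, l_ur, l_ll, l_lr, width=2.0, height=2.0,
                   top_weight=1.07):
    """Top-weighted center of quadrant luminance, picture centered at (0, 0).

    l_ul, l_ur, l_ll, l_lr: average luminance of the upper-left, upper-right,
    lower-left and lower-right quadrants.
    Each quadrant's center of luminance is taken to be its geometric center,
    i.e. (+/- width/4, +/- height/4).
    top_weight scales the Y coordinate of the upper quadrants (assumed
    reading of "modified by 1.07").
    """
    total = l_ul + l_ur + l_ll + l_lr  # assumed normalizing term
    if total == 0:
        raise ValueError("picture is entirely black; balance is undefined")
    qx, qy = width / 4.0, height / 4.0
    x = (qx * (l_ur + l_lr) - qx * (l_ul + l_ll)) / total
    y = (top_weight * qy * (l_ul + l_ur) - qy * (l_ll + l_lr)) / total
    return x, y


def imbalance(l_ul, l_ur, l_ll, l_lr, **kwargs):
    """Length of the balance vector; zero for a perfectly balanced picture."""
    x, y = balance_vector(l_ul, l_ur, l_ll, l_lr, **kwargs)
    return math.hypot(x, y)
```

Under these assumptions a picture with left/right symmetry and a lower half 1.07 times as luminous as the upper half gives a vector of zero length, while a uniformly lit picture gives a vector pointing upward, in keeping with the statement that the upper half carries more visual weight.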
While the aesthetic effect can be intense with a picture seen by reflected light, the percept is much less so in LED pictures. There is a diminished feeling of harmony and unity. However, the effect on eye movements is maintained. As a result, a study of subject preference could not be done. Bertamini et al. noted that observers preferred a print reproduction to seeing the same picture in a mirror or on a monitor, and that they preferred the mirror view of the actual image to the image on an LED screen [16]. Locher, Smith, et al. [17] made similar observations. These observations can be explained by the fact that the light emitted from an LED screen is polarized, and humans are sensitive to polarized light. Misson and Anderson [18] showed that polarized light alters luminance contrast perception, which in some unknown manner interferes with the aesthetic effect. LED images nevertheless had to be used to permit precise luminance determination and reproducibility.
With respect to the effect, an image framed with a white border against a black ground is seen as one visual object: the image and the frame. This is not true for a picture framed in black against a white ground. The former permits the comparison of a balanced framed picture with the identical picture in a slightly different frame that renders the combination unbalanced. This avoids the confounding variable of salience that occurs when an image is compared with even a small variant. If an observer is able to compare the two images and to disregard the frame, any perceived difference would be ascribed to the effect. Most people who observed the two identical pictures in succession saw no difference. Only a group of painters and a few people very interested in looking at pictures were found likely to see it. Two studies were done to show that those likely to see the effect in paintings could notice a difference between “identical pairs.” A painting exhibiting coherence is said to be perfectly balanced; all others will be called unbalanced.
Materials and Methods
Each study consisted of ten pairs of images: five in which one of the images was coherent and the other unbalanced (balanced or coherent pairs) and five in which both were unbalanced (unbalanced pairs). Images were prepared using Photoshop©. Quadrant image luminance characteristics can be changed in an unobtrusive and precise manner in Photoshop through the use of the Levels command on a small irregular form in which the luminance is usually changed by no more than ±5 percent. The form is then moved around until the desired quadrant value is reached. The two studies differed only in using different sets of white borders. The three possible image and border combinations are in Fig. 1. Study 1 compared figure 1a with either 1b or 1c.
Study 2 compared figure 1b with 1c. Observers viewed sequentially a picture in two different frames whose widths were slightly different, on a color-calibrated iPad using the ColorTrue™ app and a Colorite™ calibrating device. The observers viewed the iPad centered and parallel to the plane of vision at arm’s length. The images on the iPad were approximately 5×6.75 inches and subtended visual angles of roughly 19° and 23° respectively under the conditions of the study. The pictures used for the studies are in Fig. 2. Images 1, 3, 5, 7, and 8 were the balanced pairs (all images are in the supplemental files).
The observers were predominantly artists of any age who gave their consent to participate in an anonymous study about how people look at pictures according to the provisions of the Declaration of Helsinki. They were told that the study was about how people look at pictures and that it consisted of looking carefully at ten pairs of pictures and being asked whether the central images appeared to be the same or different. They were told that it was thought that some pairs might be seen as the same and others as different. No identifying information or medical history was obtained. There was no attempt to eliminate observers who were color deficient, had cataracts or had impaired binocular vision. 45 observers were included in the first study and 39 observers in the second study. Other observers were excluded if they could not follow the directions, either by rapidly saying without careful examination that they could see no difference or by insisting that the frame change was the only difference. There were only four of the latter. For pairs in which both pictures are unbalanced, it was presumed that observers would find the pictures to be identical, so that the response of “same” was considered correct. With the balanced pairs a response of “different” would be labeled correct, indicating they had seen the effect.
Observers viewed the pictures sequentially on an iPad at arm’s length while I sat slightly behind them. They were permitted, indeed encouraged, to hold the iPad themselves as long as it was maintained correctly centered and parallel to the plane of vision. There were no restrictions on the length of observation, and observers could return to a previous image as much as they wanted. However, subjects were told that the difference if any was more in the way of a feeling and were discouraged from making point by point comparisons. The difference was described as analogous to that between monophonic and stereophonic music: same music but seems different.
Results and discussion
Four observers could identify all the balanced pairs correctly, and 5 observers made one error (Table 1). Although this cannot be statistically proven, it strongly suggests that the effect is being seen. Some subjects thought they saw differences in color. One thought the depth of field was different, and many saw differences but could not describe them.
Among observers who perceived differences in any pair, many saw differences in both balanced and unbalanced picture pairs. Initially this was thought to be due to guessing. However, there is a correlation between the percentage of observers who saw a pair as different and the pair average distance (average imbalance). Pairs can be divided into two groups: one in which the average state of imbalance within the pair is small (the balanced pairs), and the other where the average imbalance is large. The balanced pairs were identified correctly by 40% of observers, and the unbalanced pairs were identified correctly by 75% of the observers. The correlation between average pair distance and the percentage identified as different was r(18) = 0.799, p < 0.001 (Table 2).
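For readers who wish to check how a correlation of this kind relates to its significance level, the following sketch computes a Pearson coefficient from paired values and converts a coefficient to the corresponding t statistic with n − 2 degrees of freedom. The function names are hypothetical, and the per-pair data themselves are in Table 2 and are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences; returns (r, df)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy), n - 2


def t_statistic(r, df):
    """t value equivalent to correlation r with df degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))


# The reported r(18) = 0.799 corresponds to t of about 5.64.
t_reported = t_statistic(0.799, 18)
```

With 18 degrees of freedom the two-tailed critical value of t at the 0.001 level is approximately 3.92, so a t of about 5.64 is in line with the reported p < 0.001.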
This might verify the hypothesis that observers see pairs as different within balanced pairs. However, it might just indicate that observers see pairs that have low average pair imbalance as different from those having high average pair imbalance. One could verify this by constructing unbalanced pairs with an average imbalance no different from that found in coherent pairs.
If one were to select unbalanced pairs that contain particular types of salient features, then the results would be similar to image pairs 2 and 10 which most subjects saw as the same. These pictures by Matisse and Gauguin are not only unbalanced but contain salient images such as the woman in white that strongly draw one’s attention. However, I am sure they would not be normally thought of as being unbalanced. Other salient types of forms or groups of forms will be discussed below. While table 1 might appear to show a Gaussian distribution, the answers are not simply due to chance. It is like the aforementioned analogy with monophonic and stereo music; one can place two speakers so far apart that anyone can identify a stereo effect or move them so close that no one can. One can also liken the image pairs to pairs of flags extending in the distance until they appear to everyone to be one flag. Using this analogy one could say that this particular study does not distinguish between the people who see two flags and the few who become fascinated by what is painted on them. The results indicate at a minimum that the visual system sees pictures in terms of the formula of quadrant luminance balance or a variation of it that corresponds to distance in the analogy. Additional image pairs with lower/upper ratios closer to 1.03 or 1.10 can be found in the supplemental files.
The use of an iPad in different lighting environments was necessary to bring the study to the observers as no laboratory was available. Reflections from the iPad screen increased the number of false negatives, i.e. coherent pictures being seen as unbalanced, because reflections increase the luminance. The possibility of examiner influence cannot be excluded, although an attempt was made to diminish this by being seated behind the subjects. That observers saw unbalanced pairs as being different was contrary to expectations, as seen in table 2. After all, they are identical pictures. A McNemar test showed data validity, p < 0.001.
Pictorial balance is extremely sensitive to small differences of quadrant luminance which explains its fugacity without unvarying viewing conditions (table 3). Changing the spectrum of the lighting or the height at which the painting is hung with respect to the viewer destroys the effect. With pictures in a studio even the drying of paint in the following twelve hours creates a tonal change that is sufficient to make a coherent state disappear. On the other hand, the effect can appear at times in a picture when viewed at some particular angle or under a particular illumination. These pictures will not appear balanced when viewed in any other way. Given that not everyone can see the effect, that it is normally arrived at through chance, that it is usually fleeting and that it has no name would explain why it is not discussed in the 20th century literature. The observations of de Piles had been forgotten by painters and critics.
Conclusions
When an innately sensitive observer views a picture ignoring saliency, balance remains to direct the eye. With perfect pictorial balance there is no agent controlling eye movements. One becomes attentive to the whole picture and is able to move smoothly through it instead of being attracted to one form after another. This is what Roger de Piles describes as seeing the whole picture at one time, “le tout ensemble” [14] (p.121). It was found empirically that artists, particularly painters, had to be used as observers. Painters have been shown to view pictures differently than untrained individuals [10, 19, 20, 21, 22]. Antes showed that fixation patterns distinguish painters from non-painters, and Koide et al. [23], using a prediction model of a saliency map, showed that artists are less guided by local saliency than non-artists, so that the effects of balance would be more manifest.
As noted, polarization destroys the aesthetic effect much more than the eye movement effect. It would seem that polarized light somehow inhibits the aesthetic feelings elicited by eye movements. To those who equate an aesthetic effect with preference, the aesthetic aspect of this type of balance has not been proven. However, the judgment of preference is a high level cognitive response based on culture and fashion.
Geometric forms, strongly contrasting forms or many different equally prominent forms that cannot be combined conceptually force the eye to focus on each particular form. For example, if there is a picture with a distinct mouth in one corner, eyes in another, a chin in a third and so forth, they are seen as distinct salient objects. However, they are not likely to be seen that way if combined to form a face. Many different forms force the eye to look at each one and not the picture as a whole. Geometric forms do likewise and might be interpreted as pictures within the picture. Color is never a salient feature.
Normal viewing seems effortless because we are accustomed to the work performed by the eye muscles. However, balance continually exerts a force that differs in each quadrant such that no matter what is being seen, the eye cannot become accustomed to it. No one complains of visual fatigue when intently watching a moving image on an LED screen where balance plays no role, but they do when looking carefully even briefly at pictures. This fatigue seems to limit any study to perhaps 16 pairs or less at a time. The key element is viewing carefully. Casually leafing through a picture book is different.
A flat surface with no marks, i.e. one coated with an opaque layer of paint but gradually modified so that the upper half is slightly darker, also cannot be seen as coherent. There has to be some surface quality for this part of the visual system to recognize an object. It has been shown that forms in nature have a fractal quality, that fractal images have an aesthetic quality and that the visual system has evolved to respond to natural conditions [24, 25, 26]. Therefore, it can be inferred that the part of the visual system that calculates balance is also most sensitive to fractal forms. The visual system is designed to work best in this fractal environment and not the world of manmade objects that are linear and not fractal causing the eye to jump from one form to another (see comments on Cézanne in addendum A). The author has observed that binocular vision is necessary to observe the percept.
Pictures are a very recent invention requiring technology to create flat surfaces (perhaps dating from the 4th or 5th century BC). They are not part of human evolutionary development. Borders change an image into a picture, a visual object evoking the percept of balance. Since balance is a low-level percept resulting from object luminance, it must have evolved for some other reason. It is doubtful that such a calculation is an accident without evolutionary significance, such as has been suggested for the perception of polarized light [18]. Balance could be an early evolutionary method to follow a moving object: an object defined as a collection of luminous points that move together with respect to a ground. If this object is divided into two equally luminous parts by a vertical line, which is itself bisected perpendicularly by another line, the object is divided into four parts. Lines parallel to the preceding ones are then drawn enclosing the object, creating a “picture.” The luminance of the upper half would be arbitrarily diminished relative to the lower half by a factor so that a uniform light source would not be seen as balanced. The amount of light power from each quadrant of the object is spread over the area of its respective quadrant of the rectangle. The center of balance is calculated as the center of light power.
The visual system identifies the object by this point and the virtual rectangle which represent all the points of light that make up the object. This is constantly changing but once done, the visual system has “locked on” to it and can therefore distinguish it from all the other points of light.
A complex object of whatever size could be reduced to a moving vector representing the weighted center and an illusory rectangle. Even if it stops moving, the vector and rectangular halo would remain visible where it might otherwise blend in with the ground. The rectangle guarantees that at any moment either the prey or predator is located within its boundaries. This theory was used to solve the problem of balancing non-rectangular pictures such as figure 3, for which I could previously find balance only empirically. A vertical line was drawn, followed by a bisecting horizontal line, and the picture was then enclosed within a rectangle. Quadrant calculations that included the black areas enabled balanced images to be obtained that were within the limits of empirically determined values. Essentially, the light is spread out into the black spaces.
This is demonstrated with fig. 3, whose lower half is identical to that of the balanced fig. 4, with a luminance of 124.8. The upper half of fig. 4 has a luminance of 115.5. If visual balance were based simply on light per unit area, or luminance, of the picture itself, then the upper half of fig. 3 would have to have a luminance of 115.5. However, under the proposed theory the light energy or luminous flux is spread out over the total area that includes the black rectangles, so the luminance of the total upper half including the black areas will be less. This is calculated as follows: the luminance of 115.5 multiplied by the number of pixels of just the picture is equal to the luminance L of the upper half including the black rectangles multiplied by the pixel area of the upper half.
(189,360 − 54,000) × 115.5 = 189,360 × L, so that L = 82.6. Changing the upper right and left quadrants of figure 3 so that they have a luminance of 82.6 (including the black rectangles) creates a balanced picture.
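The arithmetic of spreading the light over the enclosing rectangle can be written as a short sketch. The function name is hypothetical; the numbers are those given above for the upper half of figure 3.

```python
def spread_luminance(picture_luminance, total_pixels, black_pixels):
    """Average luminance once the picture's light is spread over the whole
    enclosing area, counting the black (zero-luminance) regions.

    picture_luminance: mean luminance of the picture area alone
    total_pixels: pixel area of the enclosing half (picture plus black)
    black_pixels: pixel area of the black regions within that half
    """
    if not 0 <= black_pixels < total_pixels:
        raise ValueError("black_pixels must be in [0, total_pixels)")
    return picture_luminance * (total_pixels - black_pixels) / total_pixels


# Upper half of figure 3: 189,360 pixels in total, 54,000 of them black.
target = spread_luminance(115.5, 189_360, 54_000)
print(round(target, 1))  # 82.6
```

The same function applies to any quadrant or half of a non-rectangular picture, provided the black areas are counted as contributing no light.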
One can object to the results of this paper as reducing the art of painting to just painting by numbers. This might have been the response of a teacher of painting who was astonished by what he saw, but who absolutely refused to cooperate once I explained the research. Of course, were one to do so, the painting would have to be exhibited under precise conditions and might not look so good otherwise. Low level balance takes a composition that may or may not be particularly poetic (substitute whatever aesthetic term you prefer) and at least makes it easier to look at and at best under the right conditions makes it stunning. When I saw one of the masterpieces of Western art Vermeer’s The Art of Painting in Vienna, it was exhibited under conditions at that time that made it completely unbalanced creating a disconcerting feeling. I did not have this response when I saw it at Washington’s National Gallery. As music’s effect can depend on the performance, so a painting’s effect depends on how it is displayed to those who are sensitive to this. Perhaps the most marked characteristic of its aesthetic effect is that the attention is riveted in an indescribable involuntary way. It is, as Delacroix describes it so eloquently, different from the peak effects of other art forms - no tearing, shivering or trembling [27,28,29] so that how or where a low level response is interpreted on a higher level must be sought elsewhere.
Addendum A
Eugene Delacroix: “There is a kind of emotion which is entirely particular to painting: nothing [in a work of literature] gives an idea of it. There is an impression which results from the arrangement of colours, of lights, of shadows, etc. This is what one might call the music of the picture…you find yourself at too great a distance from it to know what it represents; and you are often caught by this magic accord. In this lies the true superiority of painting over the other arts for this emotion addresses itself to the innermost part of the soul… [and] like a powerful magician, takes you on its wings and carries you away.” [13, 14]
Wassily Kandinsky describes the effect, but being unaware of de Piles’ observations which would have permitted him to understand his sensations, drew the conclusion that it resulted from the absence of recognizable objects:
“It was the hour of approaching dusk. I returned home … when suddenly I saw an indescribably beautiful picture, imbibed by an inner glow. First I hesitated, then I quickly approached this mysterious picture, on which I saw nothing but shapes and colors, and the contents of which I could not understand. I immediately found the key to the puzzle: it was a picture painted by me, leaning against the wall, standing on its side. The next day, when there was daylight, I tried to get yesterday’s impression of the painting. However, I only succeeded half-ways: on its side too, I constantly recognized the objects and the fine finish of dusk was lacking. I now knew full well, that the object [objective form] harms my paintings.” [15] (p.68)
Cézanne’s comments about painting are most frequently recorded in conversations in which he repeatedly insisted that a painter must paint from nature. However, his paintings are noted for their distortions especially the series of bathers in which both people and nature are abstract. Forms in nature as opposed to man-made forms have a fractal nature and do not have particularly salient properties. I believe Cézanne was attempting to compose pictures that enabled the eye to move through the picture as it did for him (and us) when he looked at natural scenes where manmade forms were excluded [12]. It would seem that he had no vocabulary to express this other than to say one must paint from nature.