In June 2017, the Art and Artificial Intelligence Laboratory led by Ahmed Elgammal at Rutgers University published new research describing machine learning algorithms capable of creating “Art” with a capital “A.” Artists and art critics alike discounted the article as hogwash: rhetoric from the scientific community reducing art to mathematical formulas; a middle finger to fine arts and to human creativity.
I, too, was skeptical at first; frustrated, really. The media (this article especially) exaggerates the extent of the success and touts far-reaching implications for the research. In truth, Elgammal successfully tweaked an existing network architecture (the Generative Adversarial Network, or GAN) to yield algorithms that differentiate artistic styles more effectively, and can thereby create new images in a known style. His claim to fame, however, that of creating a machine to produce “Art” in new, unanticipated styles, is false.
General Observations
At low frequencies (from far away), the images produced by Elgammal’s algorithm above appear familiar. The compositions and forms bear a likeness to artwork already in our collective consciousness, like Dali (row 4, column 2) or Rothko (row 2, column 1). However, at high frequencies, these images manifest artifacts not usually found in paintings, photographs, or any other common 2D media. These artifacts include precise repetitions (3, 3), abnormal convexities (4, 2), and distorted grain-like undulations (4, 1), among others. Just as a human-made painting usually manifests the mark of man or woman, these images manifest the mark of machine. These images bear an aura all their own.
Psychological Grounding
Elgammal’s inspiration for designing this algorithm comes primarily from the realms of cognitive and perceptual psychology, and from three psychologists in particular. Those psychologists and their theories are:
1. Daniel Berlyne, who describes the “arousal potential” of aesthetics — that is, the potential to be aroused by art or experience — to be influenced by five factors: Novelty, Surprisingness, Complexity, Ambiguity, and Puzzlingness.
2. Wilhelm Wundt, who devised a curve to describe the relationship between hedonism (emotional pleasure) and arousal (perceptual pleasure). There exists a sweet spot in the middle that maximizes hedonistic pleasure: too little arousal leaves us wanting more, but too much is distracting and overwhelming.
3. Colin Martindale, who wrote extensively on the inherent patterns in aesthetic trajectories. In his book, The Clockwork Muse: The predictability of artistic change, Martindale gives scientific accounts of how artistic trends can be modeled and predicted. However, Martindale has no background in art history, and so bases his theories on the notion that art is, and always has been, valued primarily for its potential to arouse. This sidesteps long-lasting traditions of art as inquiry, art as social justice, and art as expression, and also turns a blind eye to the intensely market-driven nature of contemporary fine arts. With no attempt to distinguish between “First Word Art” and “Last Word Art,” Martindale claims that “If artists keep producing similar works of arts, this directly reduces the arousal potential and hence the desirability of that art,” suggesting that art is visual first and that its concept and process are tangential to its value.
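Wundt’s inverted-U curve can be sketched numerically. The functional form below is hypothetical, chosen by me only to reproduce the shape of the relationship described above (pleasure rises with arousal up to a sweet spot, then falls); it is not a formula from Wundt or from Elgammal’s paper:

```python
import math

def hedonic_value(arousal, peak=1.0):
    """Illustrative inverted-U relating arousal to hedonic pleasure.

    The form (a / peak) * exp(1 - a / peak) is hypothetical: it rises
    from zero, peaks at arousal == peak (the "sweet spot"), and decays
    as arousal becomes overwhelming.
    """
    a = arousal / peak
    return a * math.exp(1 - a)

# Too little arousal leaves us wanting more; too much is distracting.
samples = {a: hedonic_value(a) for a in (0.2, 1.0, 3.0)}
```

Under this toy model, `hedonic_value(1.0)` is the maximum, and both the under-aroused (`0.2`) and over-aroused (`3.0`) cases score lower, matching Wundt’s sweet-spot claim.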
These foundations reveal the shaky ground on which this algorithm is built. Though narrow-minded and incomplete, these psychologies account for some of the algorithm’s qualitative success; but, as we will see, they also doom it to fall short of true “arthood.”
Creative Adversarial Networks (CAN)
Adversarial Networks are machine learning algorithms consisting of two parts: a discriminator and a generator. Each part works separately and “grows” independently of the other. The only information they exchange is a “learning signal” that describes how well the discriminator thinks the generator did when it generated an image.
The novelty of Elgammal’s algorithm (called Creative Adversarial Network or CAN) is the provision of two learning signals: one which describes whether an image is “art” and another which describes how well it fits into an existing style. Thus, the pair of components is able to effectively learn how to create images that are not only art, but that fit into existing styles. The converse is also true: since the algorithm knows when an image fits into a style, it also knows how well it doesn’t fit into a style. Elgammal takes advantage of this fact when he generates the images shown earlier: he asks the algorithm to “explore parts of the creative space that lay close to the distribution of art … and at the same time maximizes the ambiguity of the generated art with respect to how it fits in the realm of standard art styles.”
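The two learning signals above can be sketched as a single generator objective. The function below is my own hedged approximation of the idea, not Elgammal’s code: the “art” term rewards images the discriminator judges to be art, and the style-ambiguity term (cross-entropy against a uniform distribution over known styles) rewards images that resist classification into any one style:

```python
import math

def can_generator_loss(p_art, style_probs):
    """Sketch of a CAN-style generator objective (an assumption, not
    the paper's implementation).

    p_art: discriminator's probability that the generated image is art.
    style_probs: discriminator's distribution over K known styles.

    Lower loss = image looks like art AND is maximally ambiguous
    about which existing style it belongs to.
    """
    k = len(style_probs)
    eps = 1e-12  # guard against log(0)

    # "Is it art?" term: penalize images the discriminator rejects.
    art_loss = -math.log(max(p_art, eps))

    # Style-ambiguity term: cross-entropy between the uniform
    # distribution (1/k each) and the predicted style distribution;
    # minimized when the discriminator cannot pick a style.
    ambiguity_loss = -sum((1.0 / k) * math.log(max(p, eps))
                          for p in style_probs)

    return art_loss + ambiguity_loss
```

Note how the loss falls when the style prediction flattens out: an image confidently classified as, say, Cubist is penalized relative to one the discriminator cannot place, which is precisely the “deviate from known styles while staying inside art” behavior described above.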
Here’s the problem: “First Word Art,” to which Elgammal likely aspires (and to which I refer when I use the word “art”), doesn’t exist between existing styles. Art isn’t born between the cracks of history; it emerges at the frontier of the present. This algorithm does nothing more than search for unrepresented imagery in the space of what we already call “art.” This yet-unseen imagery is no more art than a chocolate bar in the produce section is good for you. To illustrate, I present the following diagram:
Here, artistic styles are represented by solid black circles, including the Renaissance, Modern, Cubist, etc. Together, these styles are recognized as residing in a space defined as art. External to this space are those extant things not yet considered art (like a pack of water bottles from the supermarket) and those unimagined things which might one day be art (some of which may one day indeed be produced by machine learning).
Elgammal’s algorithm is trained to stay within the thick black dashed line, but seek out those blue spheres existing between and amongst existing styles. Is this “interpolation” between existing structures considered “art”? I say no: the next wave of art exists in a space external to that which we already consider art. Those things yet to be imagined or contextualized as art have the possibility of attaining arthood, but only a human is capable of creating or seeking out these thin dotted spheres. The machine will never be capable of creating non-art because its definition of art was written by humans. Until the machine is capable of defining and defending its own vision of art, that which it endeavors to make will never be considered true art.
Out of this diagram rises an interesting implication: presumably, just as there are limits to the spaces we can envision between styles, there are mutually exclusive limits to the spaces algorithms can discover between human-made and machine-made styles. What sort of thing might be capable of thinking between us and the machines? What might one day be capable of seeing where neither we nor our “children” could ever see?
Qualitative Evaluation
Elgammal attempts to evaluate the images by their ability to fool humans into thinking they were made by humans. Studies on the perception of generative work have been conducted since the field’s inception, and they have shown time and again that humans regularly mix up generative and man-made work. Elgammal nonetheless incorrectly claims novelty for his study in this regard. Furthermore, his experiments use very few subjects, all recruited from Mechanical Turk. The questionable reliability of these subjects and the low sample count make any attempt to identify trends useless. And even if his data held up (that is, even if humans believe some of the machine-made images are more “human” than human-made work), the finding wouldn’t be new.
Conclusion
Despite its unreliable foundations and evaluation, the CAN algorithm produces intriguing images: remixes of existing styles. It does not, however, produce “art” in its truest form; rather, it seeks out undiscovered niches in an existing visual terrain.
If, for a moment, we entertain the idea that all art can be placed on a plot with two axes, one representing perceptual excitement and the other representing emotional excitement, we realize that the latter has been entirely forgotten by this research. This is not a mistake: the perceptual components (described here as “visual style” and “visual arousal”) are not only much better understood, but can feasibly be represented in a formal logic. Emotional or conceptual content, which constitutes a heavy share of the value we place in art, has yet to be distilled. Thus, algorithms today are simply incapable of thinking beyond the “Perceptual Horizon.” Without the mental faculties to comprehend concept, the emotional axis appears nonexistent to them. Like a human stripped of all intellectual and romantic associations, the machine does not “feel” less for knowing only the aesthetics of light; rather, it “feels” differently. Endowed only with the faculties to read pixels and fill pixels, it’s no wonder that its experience is vastly different from our own; it’s no wonder we don’t (yet) understand each other.