
DALL-E 2 Presents the Future of Art Through AI

With the AI system DALL-E 2, Silicon Valley stole our attention and made us collectively lame.

By Alexis Schwartz

Art history’s downfall may have begun with cleavage—specifically, the cleavage framed by Jennifer Lopez’s green Versace silk chiffon dress at the 42nd Grammy Awards ceremony on February 23, 2000. Desperate to take a peek, droves of fans typed “J.Lo Grammy Dress” into Google but were disappointed, as image search had yet to be invented. In fact, former Google CEO Eric Schmidt later revealed that the abundance of searches for the dress inspired Google to create Google Images, and by July 2001, they’d succeeded. Twenty-one years later, the pioneering concept of publicly available, searchable photos has amassed over 1.12 trillion images on the Internet while shaping visual comprehension for humans and machines alike. Through the breadth of data we’ve uploaded, shared, created, and forgotten, we’ve built a new formula for image accumulation and, unknowingly, a vast training dataset for AI researchers and their algorithms.

It’s taken years to arrive at the current processing power of AI text-to-image generators. For the most part, the labeling of images has been the most significant contribution, demanding tremendous man-hours for incremental change. In 2007, computer scientists at Stanford and Princeton began classifying images whose searchability had been derailed by the cursory labels general users attached to their photos, like “cat.jpeg.” They realized that human input was needed to create sophisticated image captions, so that classification programs could streamline search. (Therein lies the difference between “cat” and “fat, fluffy, white cat in a basket.”) The project employed more than 20,000 people to label pictures; by 2012, the team had created a database of 14 million captioned photos. Satisfied with the developments, the scientists pinned up a poster advertising the dataset in a Miami Beach conference center; the dataset “quickly evolved into an annual competition to see which algorithms could identify objects in the dataset’s images with the lowest error rate.” Thus began the humble genesis of the generative adversarial network (GAN).

“Lucious mermaid birth party during the Flemish Renaissance with the pre-Raphaelite hair, shot by Tim Walker.” Sarah Hoover.

GANs started as geek wars. As technology writer Loz Blain explained, GANs pitted “two AIs in competition with one another, both ‘trained’ by being shown a huge number of real images, labeled to help algorithms learn what they’re looking at. A ‘generator’ AI then starts to create images, and a ‘discriminator’ AI tries to guess if they’re real images or AI creations.” Billions of iterations later, the AI became more sophisticated at creating and deciphering, but remained rudimentary in its imaginative conceptual reasoning. Sometimes the AI was convinced of the reality of photos that children would know are unrealistic. For instance, in 2015, Google’s AI image generator, DeepDream, spookily inserted dog faces into many of its creations as it reacted to the popularity of puppy images within the dataset. Still, engineers persisted, and the popularity of GANs caught on.
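For the curious, the adversarial loop Blain describes can be sketched in a few lines of Python. The toy example below, written against the PyTorch library, stands in two-dimensional points for “images”; every architecture choice and number in it is illustrative, not drawn from any production system.

```python
# Minimal GAN training loop (illustrative sketch only, toy data in place of images).
import torch
import torch.nn as nn

latent_dim = 8

# Generator: maps random noise to a fake "image" (here, a 2-D point).
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 2))
# Discriminator: guesses whether a sample is real (1) or generated (0).
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

loss_fn = nn.BCEWithLogitsLoss()
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(1000):
    real = torch.randn(64, 2) * 0.5 + 2.0   # stand-in for "real images"
    noise = torch.randn(64, latent_dim)
    fake = G(noise)

    # Discriminator step: reward it for telling real from fake.
    d_loss = loss_fn(D(real), torch.ones(64, 1)) + \
             loss_fn(D(fake.detach()), torch.zeros(64, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step: reward it for fooling the discriminator.
    g_loss = loss_fn(D(G(noise)), torch.ones(64, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```

The tug-of-war lives in the two loss terms: the discriminator is scored on catching fakes, while the generator is scored on producing fakes the discriminator mistakes for real.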

By 2016, Andrew Conru, a businessman with a Ph.D. in mechanical engineering from Stanford University who helped pioneer online shopping carts and online matchmaking, took GANs one step further. To Conru, the “logical next step in the journey of artistic evolution” was to outsource fabrication to engineers who, with a vast understanding of hardware and software, could render painters obsolete. From there, he founded RobotArt. The competition, covered by Smithsonian and Popular Mechanics, was a dystopian nerdgasm of robot enthusiasts superficially recreating paintings without the human hand. Some robots created “new work,” some replicated pop art, while others were coded to stylistically produce “new” versions of Van Gogh’s impasto through AI generators. Ultimately, the winners split a $100,000 prize, packed up their robots, and headed home, unperturbed by the metaphorical slap. (Duchamp, who claimed art should merely be declared “art,” likely never anticipated Silicon Valley cash prizes.) Nevertheless, the competition was a harmless threat: winner Pindar Van Arman was eventually pooh-poohed by New York magazine senior art critic Jerry Saltz, and the art world shrugged off the Battle of the Art Bots, GANs, and Silicon Valley. We scoffed, deservedly so, and moved on.

“The courage to be real.” Stuart Vevers.

But then came GANism. As is natural in art history, artists emerged within the space to liberate the medium from its commercial shackles. Artists such as Trevor Paglen, alongside researcher Kate Crawford, rewrote image labels stripped of inherent bias and fed them through proprietary systems to create new data. Paris-based collective Obvious built a GAN system from portraits painted between the 14th and 20th centuries, eventually generating an AI portrait and selling it at Christie’s for $432,500. At the same time, Memo Akten’s 2018 film Deep Meditations was trained on images tagged with abstractionist concepts conveying the meaning of existence. The medium developed a foothold, buoyed by critical methodologies from Casey Reas, Lev Manovich, Arthur I. Miller, and Nora N. Khan, interested buyers, and innovative approaches to deconstructing the GAN. GAN art superstars like Refik Anadol and Mario Klingemann, alums of the Google Arts & Culture residency, started to emerge, while institutions including MoMA and the Centre Pompidou opened their doors. The art world had finally found a lens through which GANs could have meaning, primarily as subversion of their Valley origins. The artist-created image captions were a nuanced approach to examining our living systems and perception.

On July 20, 2022, another iteration of generative image AI was released: DALL-E 2. Its name a clever mix of Pixar’s WALL-E and Salvador Dalí, it may be one of the most extraordinary and potentially apocalyptic pieces of technology ever created. The brainchild of tech entrepreneurs Elon Musk, Greg Brockman, Ilya Sutskever, Wojciech Zaremba, and John Schulman, OpenAI, DALL-E’s corporate parent, has created unprecedented generative capabilities tailor-made for the art world. Interested in an “Alien in Kodak 400 film in the style of Annie Leibovitz”? DALL-E 2 has got you. What about “Entering the fourth dimension in the style of Van Gogh”? You’re all set with DALL-E 2. “The Supreme Court getting down on the dance floor in the style of Norman Rockwell”? Yup, DALL-E 2 can do it. Never before has a text-to-image generator been made for the art world, and never before has one been adopted so quickly. This was intentional.

“Post-apocalyptic garbage high priestess divine.” Julia Fox.

"Never before has a text-to-image generator been made for the art world, and never before has one been adopted so quickly. This was intentional."

By the end of summer 2022, OpenAI had sent out one million DALL-E 2 beta invitations to insiders, including artists, journalists, researchers, entrepreneurs, and tastemakers. Within a week, DALL-E 2’s presence was ubiquitous across Instagram, group chats, and Twitter. Painters and tech entrepreneurs alike adopted the generator: cover star Emma Stern posted to the grid, and billionaire LinkedIn co-founder Reid Hoffman sold his DALL-E 2 creations for $24,000 on the NFT marketplace Magic Eden. Using the Fyre Festival game plan (the famous orange square), DALL-E 2 became cool. But, as with the ill-fated festival, it’s important to note that the users are not the beneficiaries, and management is in crisis.

OpenAI has metamorphosed from its original charter, which claimed the company’s “primary fiduciary duty is to humanity” as it developed AGI: a machine with the human mind’s powers of learning and reasoning, crossing the line of singularity. AGI technology, feared by those who understand the existential threat, would be less palatable if fathered by wealthy corporate greed, so, to begin its mission, OpenAI registered as a non-profit. As Karen Hao wrote in MIT Technology Review, “Though it never made the criticism explicit, the implication was clear: other labs, like DeepMind, could not serve humanity because they were constrained by commercial interests. While they were closed, OpenAI would be open.” Or so we thought.

“Western novel cover art of a cowboy on horseback jumping over a can of corn cooking above a campfire.” Canyon Castator.

Fast-forward a few years: OpenAI revoked its non-profit charter and registered as a for-profit company in 2019, a contentious decision and a crucial turning point. Elon Musk, who believes AI is “potentially more dangerous than nukes,” had already departed, tweeting, “I didn’t agree with some of what OpenAI team wanted to do.” Cue the shedding of employees who felt duped by the idealistic turnaround, a billion-dollar Microsoft investment, and the loss of OpenAI’s It-girl mojo. No longer the research institute that promised to be “open,” the company now had a legal responsibility to grow profits. Its fans in Silicon Valley, who had championed AGI oversight in the non-profit context, lost faith, while investors’ profit-Pavlovian response induced financial drool.

So OpenAI needed an image overhaul, and DALL-E became its fun re-entry into good graces. DALL-E was silly, hip, mind-boggling, incredibly viral—and only available to people who respected its processing powers. At first, users weren’t allowed to sell generated artwork, and the underlying model was no longer learning (that is, collecting the images we created as data). But as a for-profit, this business model could not generate the necessary capital returns and was subject to change—an evolution already underway. Since July, DALL-E 2 has transitioned from token-based access to free, and has lifted its ban on selling generative artwork. Updated information sessions have been released to users, and the code of conduct has loosened, leading up to a public release of the generator that will add millions more users and generations. Though its “free” version may seem counterintuitive to profitability, as Google’s former design ethicist and co-founder of the Center for Humane Technology Tristan Harris states, “if you’re not paying for the product, then you are the product.” That precedent has never benefitted users.

The recent changes to DALL-E 2 present a few ethical problems. First, users become the product: that the system collects data on how the artist’s brain deconstructs human experience, as part of the organization’s ambitions to develop AGI, is never forthrightly explained. Second, to generate work “in the style of” an artist and then sell that work is ethically dubious and potentially fatal for artists and artists’ estates. The “in the style of” search capacity has dramatic implications for ownership and intellectual property, and the roughly 200 employees at OpenAI lack the human power to address the concern. If an artist owns a creation, do they own the creative derivative? How can artists protect their intellectual property if there is no correspondence between OpenAI and the artists whose work it has used? What kind of checks-and-balances system exists? Worse, what kind of behavior will emerge if partnerships begin?

“Giraffe Mustache.” Bill Powers.

"These technological advances allow us to bypass imagination by way of searching for the perfect image delivered right to one's desktop at the behest of an algorithm."

If you generate “A cat riding a monster in the style of Basquiat,” OpenAI does not cut a check to the Basquiat estate. But what if it did? A commercial rights-management layer could provide control and compensation for “in the style of” (ITSO) generations, but this too may have unintended consequences. What does it take to establish an ITSO claim? How narrow are the boundaries of a style? How large is the space of artistic styles? How many distinct styles could exist? Suppose individual styles are bounded broadly, making the space of possible styles finite and limited. In that case, a mad rush could follow to establish precedent and stake a claim to a particular part of the style space. Artists might then have two distinct phases in their careers: first, generating enough novel work in a singular style to have it ingested into DALL-E 2’s training set; second, monetizing that style, either by using the system to generate new material from their style vector or by letting others use it to create novel art while the artist is compensated passively.
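To make the stakes concrete, here is one hypothetical sketch, in Python, of what an ITSO rights-management layer might look like. Nothing of the sort exists in OpenAI’s actual product; every registry entry, fee, and function below is invented for illustration.

```python
# Hypothetical ITSO rights-management sketch. No such API exists;
# all names, fees, and matching rules here are invented for illustration.
from dataclasses import dataclass

@dataclass
class StyleClaim:
    artist: str              # rights holder (artist or estate)
    usage_fee: float         # flat fee per generation, in dollars
    requires_approval: bool  # must the artist sign off before release?

# A registry mapping claimed style names to their (invented) terms.
ITSO_REGISTRY = {
    "basquiat": StyleClaim("Basquiat estate", 50.0, True),
    "rockwell": StyleClaim("Rockwell estate", 25.0, False),
}

def check_prompt(prompt: str) -> list[StyleClaim]:
    """Return any style claims a prompt's ITSO phrasing would trigger."""
    lowered = prompt.lower()
    return [claim for name, claim in ITSO_REGISTRY.items() if name in lowered]

for claim in check_prompt("A cat riding a monster in the style of Basquiat"):
    print(f"Owed to {claim.artist}: ${claim.usage_fee:.2f}; "
          f"approval required: {claim.requires_approval}")
```

Even this toy version exposes the open questions above: who decides which names belong in the registry, how broadly a claim should match, and who sets the fee.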

If ITSO can be bounded ethically and DALL-E 2 implements a rights-management system, compensation will be necessary. Any financial approach will need artist consultation, of which there seems to be little, if any. Even in its current state, ITSO is being abused by users, leaving artists in the lurch. Artists should have myriad creative and commercial options if the ITSO feature persists, including setting usage fees or auctioning off a finite number of ITSO uses. Some artists may require an approval workflow, in which the artist approves the artwork and the user pays a significant fee. Finally, an artist may restrict ITSO use to themselves and use it to scale their own output. This too presents its own set of problems.

“A baby but a man eating a ham.” Austyn Weiner.

Artists using the technology to increase output isn’t an unfounded worry, as gallery schedules and art fairs demand that artists constantly produce work. It’s an ugly and often undiscussed truth, and it has led to a particularly bleak moment in art. Such pressure may push artists to use DALL-E 2 to meet deadlines, despite the generator having no subjectivity and no lived experience, and despite its reliance on underlying captions that are not pertinent to the work and therefore radically disconnected from it. Would there be any pushback? Any need for transparency? In a hot market, if we continue to put out what sells, does anyone ask the necessary questions? Although it may be a more distinctive approach to “searching,” the age-old practice of “finding” and “ruminating” will become increasingly rare. Indeed, the speed at which we accumulate a visual vocabulary is not a problem to be fixed but an issue needing deceleration to ensure art’s legacy.

Whereas the Master imagined a clock dripping off the table, we may now only need to type it into Google, Pinterest, DALL-E 2, or, worse, its free, open-source offshoot, Stable Diffusion. These technological advances allow us to bypass imagination by searching for the perfect image, delivered right to one’s desktop at the behest of an algorithm. This is despite an AI having no understanding of human experience, whatever the programmer’s best attempts to replicate creative singularity (assuming that effort is prioritized in Silicon Valley’s ethos). It’s not contentious to assert that text-to-image generators aim to swap creation for a data-collection technique, meaning they do not adhere to the ethical and ethereal boundaries art and humanity have institutionalized. With no ITSO protection, blindly contributing to AGI seems misleading and dangerous. In that sense, DALL-E 2 only amplifies an issue contributing to art’s current commercialized flop era, restricted to meeting gallery deadlines and lacking baseline respect for the creative process. The innovation is profoundly problematic and, therefore, should not be taken lightly, despite being incredibly fun. This last point is a notable crux of the argument because, as nihilistic as DALL-E 2 can make us, it’s an absolute hoot to play with.
