Using Midjourney's Describe Feature
Welcome to our creative studio’s journal, our response to the meteoric collision of AI tools with our commercial creative workflow.
This week we’re in the sandbox with the text-to-image generation tool Midjourney. We’ll keep sharing experiments and lessons that help creatives work out how AI can be used as a creative tool, and perhaps begin to drift away from the doom and gloom of AI potentially replacing all of our jobs in the near future.

A photo of Sasha the dog, being described by Midjourney v4
This week was dedicated to exploring the “/describe” feature in Midjourney, and I’m realizing that it is perhaps the tool’s most underrated feature when it comes to understanding its output and directing it more precisely.
Let’s look at how Midjourney describes one of my favorite design photos from the 1990s. I uploaded a studio photograph of a 1996 VW Harlequin Golf. It’s a real car that was made for silly reasons. Here’s the description it produced:

Midjourney Bot describing a studio portrait of a Harlequin model of VW’s 1996 Golf
Midjourney evaluates this image differently than you or I might; the model believes that this image of a multicolored car pulls from the same visual realm as Venetian 1500s painter Giorgione, 1600s Dutch painter Hendrick Cornelisz Vroom, and simultaneously (and slightly more understandably) contemporary painter Frank Stella.

Giorgione, "Laura" (1506), oil on wood (Kunsthistorisches Museum, Vienna/Bridgeman Images)

Dutch Ships Ramming Spanish Galleys off the Flemish Coast in October 1602, 1617 (Wikimedia Commons)

Chocorua by Frank Stella (Whitney Museum of American Art)
Midjourney describes the image of the car as having the same lighting qualities as Venetian paintings while depicting what it thinks is a Neo-Dadaist artifact—a color-blocked VW Golf.
Midjourney’s functionality also makes it easy to “/imagine” the descriptions of the images into newly generated images. Below, you can see what is generated when the prompts are fed back into Midjourney.
Images generated by Midjourney v5 from the prompts it produced when describing the original image of the VW Harlequin Golf.

prompt: “a multicolored vw golf that is displayed, in the style of neo-geo minimalism, trompe-l'oeil technique, bold block prints, cinquecento --ar 2:1 --v 5 --s 250”

prompt: “a multicolored vw golf that is displayed, in the style of neo-geo minimalism, trompe-l'oeil technique, bold block prints, cinquecento --ar 2:1 --v 5 --s 250”

prompt: “volkswagen golf the colorful car, in the style of 1990s, color-blocked shapes, frank stella, furaffinity, giorgione, staining, neo-dada --ar 2:1 --v 5 --s 250”
While there is a drastic style difference from the original photo, the images retain a comparable lighting style, and it’s undeniably a 1990s-era VW Golf.
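To compare described prompts side by side, it helps to separate the descriptive text from the trailing parameters (`--ar`, `--v`, `--s`) that appear in the prompts above. What follows is only a minimal sketch in Python: the function name `split_prompt` is my own invention, and it assumes every parameter is written as `--name value` after the description, which holds for the prompts in this post but may not hold for every Midjourney prompt.

```python
import re


def split_prompt(prompt: str) -> tuple[str, dict[str, str]]:
    """Split a prompt into its descriptive text and its trailing --name value pairs.

    Hypothetical helper for illustration; not part of Midjourney itself.
    """
    # Treat everything before the first " --" as the description.
    text, _, tail = prompt.partition(" --")
    params: dict[str, str] = {}
    if tail:
        # partition() consumed the first "--", so put it back before matching.
        # Flags that have no value after them are skipped by this pattern.
        for name, value in re.findall(r"--(\w+)\s+([^\s-]\S*)", "--" + tail):
            params[name] = value
    return text.strip(), params


text, params = split_prompt(
    "a multicolored vw golf that is displayed, in the style of neo-geo "
    "minimalism, trompe-l'oeil technique, bold block prints, cinquecento "
    "--ar 2:1 --v 5 --s 250"
)
print(text)
print(params)
```

With the parameters pulled out, it is easier to see that all three images share the same aspect ratio, version, and stylize settings, and that only the descriptive text changes between them.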
As I delved deeper into this AI experiment, I began to appreciate the vast potential lying within the "/describe" feature. It was not only a tool for superficially labeling or categorizing images; rather, it had the capacity to explore the intricate web of visual influences that shape our perception.
This ability is media communication gold for creative directors who pride themselves on knowing their references. It’s a special skill to be able to quickly shape and communicate ideas for themselves and for clients. Knowing your creative history provides a roadmap for navigating the labyrinth of cultural symbols and visual cues that bombard us daily. I welcome this set of well-trained and ambivalent eyes that are drawing my visual research to new places.
Further, using the /describe feature this way gives us, over time, a better understanding of the syntax Midjourney prefers, and the keywords that repeat across the multiple descriptions it outputs hint at the visual categories it has been trained on.
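If you save the descriptions Midjourney gives you, spotting those repeated keywords can be done with a simple tally. Here is a rough Python sketch, under my own assumptions: descriptors are separated by commas, parameters start at the first ` --`, and the recurring "in the style of" lead-in is stripped so the style itself is what gets counted. The function name `descriptor_counts` is hypothetical.

```python
from collections import Counter


def descriptor_counts(prompts: list[str]) -> Counter:
    """Count in how many prompts each comma-separated descriptor appears."""
    counts: Counter = Counter()
    for prompt in prompts:
        # Drop trailing parameters such as "--ar 2:1 --v 5".
        text = prompt.split(" --", 1)[0]
        descriptors = set()
        for part in text.split(","):
            part = part.strip().lower()
            # Assumption: "in the style of" is boilerplate, not a keyword.
            part = part.removeprefix("in the style of ").strip()
            if part:
                descriptors.add(part)
        # A set is used so a descriptor counts once per prompt.
        counts.update(descriptors)
    return counts


saved_descriptions = [
    "a multicolored vw golf that is displayed, in the style of neo-geo "
    "minimalism, trompe-l'oeil technique, bold block prints, cinquecento "
    "--ar 2:1 --v 5 --s 250",
    "volkswagen golf the colorful car, in the style of 1990s, color-blocked "
    "shapes, frank stella, furaffinity, giorgione, staining, neo-dada "
    "--ar 2:1 --v 5 --s 250",
]
for descriptor, count in descriptor_counts(saved_descriptions).most_common():
    print(count, descriptor)
```

Two prompts are far too few to show a pattern; the tally only becomes interesting once you have run /describe on a few dozen images.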

Running a recursion test, feeding an image generated by Midjourney back into Midjourney for it to describe.
FWIW, and a small grain of salt
This process is still incredibly imprecise. It’s a gamble, even with the most descriptive keywords, when you’re working with text-to-image generation.
Even with AI's incredible strides, it's human input that continues to fuel these advancements. We painstakingly provide AI models with a wealth of labeled images and data. Whether it's Midjourney, Runway, or Stable Diffusion, all these platforms rely heavily on the troves of images and detailed descriptions we supply for their training.
Recursive prompt: “polo vw toy by frank bode idontcare, in the style of illusory tessellations, 1970–present, bold palette, 32k uhd, iridescent, punctured canvases, dramatic diagonals --ar 2:1 --v 5.1”
These AI models are quite adept at picking up patterns, colors, textures, and shapes, thanks to the labels we've associated with these elements. This acquired proficiency is what powers their impressive ability to generate new media assets. But let's not be mistaken; this process isn't entirely automated. It involves continuous fine-tuning and adjustments by human researchers to ensure each platform is at its optimum performance level.
It's tempting to think that as AI sophistication grows and platforms like Midjourney become more adept, the demand for direct human intervention will diminish. However, this assumption might be oversimplifying things. It's likely that the feedback loop phase – that intricate dance between AI and humans – will persist longer than we currently anticipate. The technology hasn't yet reached a user base that makes it universally accessible or appealing. So, the human element in AI generation is, for the foreseeable future, here to stay.
If you're looking to get started with Midjourney for the first time, we've put together a little primer for folks who are completely new to the platform.
 
            
          
        