Is the AI Tool the Problem, or Is It Your Prompt? 8 Questions to Ask

This summer, Daniel Hook discovered an interesting issue with Midjourney V5.2. When he tried to generate an image of a single banana, which he intended to send to a friend as a joke, Midjourney refused to generate anything other than bunches of bananas.

Daniel’s blog post was picked up by a team of researchers from the University of Sydney Business School, whose paper, "What the Lone Banana Problem reveals about the Nature of Generative AI," was presented at the Australasian Conference on Information Systems in New Zealand. Quite the fancy place for a conversation about a banana intended for a joke.

Their paper has a great way of thinking about how these AI models encode the world: we should think of these tools as categorizing the whole world within styles, including all objects, properties, and appearances.

But that's not what caught my attention. In the paper, they tried improving on Daniel’s original prompt with "a banana," "photo of a single banana," and this doozy: "photo of a single banana, just one banana, not many, ONE."

My first thought: Maybe the problem isn't the AI, maybe the problem is your prompt.

The Lone Banana Prompt Problem

This is what I call "The Lone Banana Prompt Problem." There are so many different AI models out there, with updates coming out almost constantly. When you're having problems with a tool, how do you know whether the issue is with the model or with your prompt?

To illustrate: After the release of Midjourney's Alpha of their V6 model just six months later, Daniel’s original prompt of "A single banana casting a shadow on a grey background" pretty consistently generates an image of a single banana, while "photo of a single banana, just one banana, not many, ONE" still struggles with the ask.

Here’s what generates from "A single banana casting a shadow on a grey background."

[Image: "A single banana casting a shadow on a grey background" in Midjourney V6.]

And here’s "photo of a single banana, just one banana, not many, ONE." It still doesn't work well in the new and improved version of Midjourney.

[Image: "Photo of a single banana, just one banana, not many, ONE" in Midjourney V6.]

Why does the Lone Banana Problem happen?

If you're interested in how AI works, it's worth taking a little detour to learn about the Lone Banana Problem.

The paper's authors shared Daniel’s surprise at Midjourney's struggle to generate the lone banana:

"... the lone banana problem appears as a surprising problem. A single banana should be the simple case, from which more complex cases are combined, e.g. by adding more bananas to make bunches."

But, they say, maybe we misunderstand how AI models see our world.

"What if these networks did not encode any understanding of objects at all? What if they had a very alien way of appropriating our world, a way that does not have a concept of objects, or a distinction between objects and their properties? And what if, similarly, LLMs did not encode knowledge in ways we as humans conceptually make sense of the world? What if these networks did something not comparable at all to how we understand the world?"

(If you’re surprised that an academic paper is asking questions about how LLMs might work rather than explaining how they work, well, that’s AI for you. It’s a black box; even the people who design these systems don’t know precisely how they arrive at their answers.)

If you think about AI in that way, the Lone Banana Problem makes sense:

"Neural networks do not possess an object ontology; this is not how ‘their world’ is encoded, they do not possess the capacity to grasp the concept of an object or a single banana as such; it encodes banana-ness. But if banana-ness is taken to be a style, instead of an object, Midjourney’s output makes more sense. Average banana-ness would then be bunches of banana, as this is how ‘banana’ mostly occurs."

I love this conclusion. It's something I talked about during my early experiments in computer vision, and worth really thinking about.

How do I solve the Lone Banana Prompt Problem? 8 questions to ask

But on to the Lone Banana Prompt Problem: Is it me or is it the AI that's the issue?

Here's how to make that call.

1. Are you following best practices?

It's almost always better to follow prompt best practices when you're working with any AI tool. Most of the time, the tool will have an FAQ that tells you how to structure your prompt so that you get the results you want. Most tools also have communities that discuss best practices and tips.

There are also some general rules to follow, regardless of the tool:

Be clear in your request
Be as succinct as possible
Be explicit in any constraints
Add any context that would matter in the result
Try to avoid negatives (like asking for “no [something]”), especially in image generation (large language models seem to handle negatives better)
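
As a rough illustration of the last rule, here is a minimal Python sketch that flags negative wording in a prompt. The function name `find_negatives` and the word list are assumptions made up for this example, not part of any tool's API, and a match is only a hint that the prompt may be worth rephrasing.

```python
import re

# Hypothetical word list: terms that often signal a negative instruction.
NEGATIVE_PATTERN = re.compile(r"\b(no|not|without|don't|never)\b", re.IGNORECASE)


def find_negatives(prompt: str) -> list[str]:
    """Return the negative words found in a prompt, lowercased, in order."""
    return [match.group(0).lower() for match in NEGATIVE_PATTERN.finditer(prompt)]


if __name__ == "__main__":
    for prompt in (
        "A single banana casting a shadow on a grey background",
        "photo of a single banana, just one banana, not many, ONE",
    ):
        print(prompt, "->", find_negatives(prompt))
```

Run on the two banana prompts from earlier, the check finds nothing in Daniel's original prompt and flags "not" in the researchers' version.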

2. Are you using the right tool?

I recently wrote about what ChatGPT's new image analysis features can and can't do, and my experiments showed that sometimes, other AI analysis tools perform better at tasks like image recognition.

Every AI tool has its own quirks. Take the time to read up on its limitations and functionality, and have a look to see what data it's trained on. If you’re struggling with getting what you want, try a different tool. And keep in mind that sometimes AI just isn’t right for a particular task.

3. Are you iterating with the tool?

Don't give up! If you're not getting the results you want right away, keep requesting revisions and try different ways of phrasing your request.

Tip: For language generation tools, you can also ask the tool itself to generate questions to help you refine your prompt. But be aware that sometimes the tool's suggestions don't align with best practices.

4. Do you have the right vocabulary?

Especially with image generation tools, the specific terms you use matter a lot. It pays to learn artistic terminology. This is even true for language tools, where being specific about the tone or style you want it to write in will help you get something you can more easily work with.

5. Are you asking for something obscure?

I'm a math nerd so I wanted to see what would happen when I asked the AI image generator tools to generate an image of a zenzizenzizenzic—an archaic word that means "to the eighth power."

Midjourney gave me something delightfully abstract—it had nothing to do with math, but looked cool.

Meanwhile, ChatGPT and DALL-E did capture the concept. Why? Because ChatGPT used its language magic to come up with a descriptive prompt:

"A visually abstract representation of the mathematical concept of zenzizenzizenzic, which is the eighth power of a number. The image should depict a number 2 transforming into a complex, multi-layered geometric structure, symbolizing its progression to 2 to the eighth power (256). The structure should be intricate, with cubes and other geometric shapes, and have a digital, mathematical aesthetic. Colors should be vibrant, with shades of blue, green, and purple, adding to the abstract and surreal visual interpretation of this mathematical concept."

[Image: Midjourney had no idea what a zenzizenzizenzic was, so it gave me something random.]

[Image: ChatGPT + DALL-E's conception of a zenzizenzizenzic. I'm making a t-shirt out of that.]

The point is that the farther you get outside the realm of the normal, the scarcer the relevant data will be in the training set, and the more randomness you will introduce in the output. So if you're asking for something way outside the norm, be prepared to add context or examples, as ChatGPT did for the prompt that generated this image.

6. Is your prompt too complicated?

Whenever I see a prompt that's more than 4 lines long, I worry. If you make your prompt too complicated, like asking several questions at once or giving the model too much information to reliably parse through, you're going to get into trouble. Clear and concise is almost always the best way to go. See if you can simplify your prompt without losing its essence.
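
To make that rule of thumb concrete, here is a small Python sketch of a complexity check. The four-line threshold comes from the paragraph above. Counting literal line breaks and question marks is a simplifying assumption, and `complexity_warnings` is a hypothetical helper, not a feature of any AI tool.

```python
def complexity_warnings(prompt: str, max_lines: int = 4) -> list[str]:
    """Return human-readable warnings for a prompt that may be too complicated."""
    warnings = []

    # Count only lines that contain text; blank lines are ignored.
    lines = [line for line in prompt.splitlines() if line.strip()]
    if len(lines) > max_lines:
        warnings.append(f"{len(lines)} non-empty lines (more than {max_lines})")

    # Several question marks suggest several questions asked at once.
    questions = prompt.count("?")
    if questions > 1:
        warnings.append(f"{questions} questions in one prompt")

    return warnings


if __name__ == "__main__":
    print(complexity_warnings("What is a zenzizenzizenzic? Who coined the word?"))
```

An empty list means neither heuristic fired. It does not mean the prompt is good, only that it is not obviously overloaded.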

7. Is your prompt too simple?

You do need enough information for the tool to work, so make sure you add any necessary contextual information or constraints. These tools are built on probability, so if your prompt is too simple, you'll get whatever their average is.

For example, Midjourney loves mountains and rolling hills, so if you just ask for a landscape it will give you that.

[Image: What Midjourney generates from the prompt "landscape."]
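
The "default to the average" behavior can be pictured with a toy Python sketch. The weights below are invented purely for illustration and are not measurements from Midjourney or any other model. The sketch only shows that when one option dominates a probability distribution, a bare request mostly returns that option.

```python
import random
from collections import Counter

# Invented weights for illustration only; not taken from any real model.
LANDSCAPE_WEIGHTS = {
    "mountains and rolling hills": 70,
    "beach": 15,
    "desert": 10,
    "city skyline": 5,
}


def sample_scene(weights: dict[str, int], rng: random.Random) -> str:
    """Draw one scene at random, in proportion to its weight."""
    scenes = list(weights)
    return rng.choices(scenes, weights=[weights[s] for s in scenes], k=1)[0]


def most_common_scene(weights: dict[str, int], draws: int = 1000, seed: int = 0) -> str:
    """Sample repeatedly and return the scene that came up most often."""
    rng = random.Random(seed)
    counts = Counter(sample_scene(weights, rng) for _ in range(draws))
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(most_common_scene(LANDSCAPE_WEIGHTS))
```

Adding context to a prompt is, loosely speaking, a way of shifting those weights toward what you actually want.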

8. Is there a reason the AI might be bad at this?

While it's easy to feel like AI can do anything, it’s good to remember that competence is sometimes an illusion.

For instance, AI language tools still struggle with math and logic (but they’re getting better fast). Consider whether there's a reason the tool might not be able to do what you’re asking, like a biased training set (this is why Midjourney is the king of photorealism and DALL-E is best for abstract patterns) or limitations of the tool (like ChatGPT's smaller context window).

Sometimes, it really is the AI tool

A couple of years ago, Midjourney founder David Holz caused a minor sensation in the AI image generation community when he was asked whether they would fix the "finger issue" in Midjourney: the fact that the tool constantly struggled to put the right number of fingers on a human hand. In essence, he told the audience, "No, that issue will fix itself with more training data."

Sometimes, you’ve just got to wait for something better to come out. Either another tool will become available, or there will be improvements on the tool's current offerings.

(We're still waiting.)
