Building an AI drawing app, finding valuable problems to work on, hosting a creative tech show & tell
Recent projects in the generative AI space
Here’s a recap of thoughts and things I’ve made over the last month. I’m interested in sharing what I’ve learned, and also excited to hear about ideas, conversations, and opportunities related to what I’m working on. If you know folks who would be into this, I’d appreciate it if you share this with them!
TL;DR
What are the valuable problems to work on leveraging generative AI?
Doodle & diffuse is a new AI collaborative drawing prototype I made. I’d love for you to see it. Watch the 30s demo, or the more detailed one.
I’m currently in SF and planning a generative AI show & tell in the next couple weeks.
Kicking off with an idea and an ask: something that's been on my mind recently, followed by a question where I'd love to hear your thoughts.
An idea: Get into problem discovery.
When exploring new technologies it’s exciting to start with the tools and see what’s possible. It’s fun to wrap your head around something and push it in different directions. That’s happening in the genAI space, especially with GPT-3 and the new ChatGPT API. And with Whisper, you can now add features that turn audio into text. Possibilities open up, like taking a YouTube video, transcribing it (using the Whisper API), finding the sentiment, and summarizing it (using ChatGPT).
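To make that concrete, here’s a minimal sketch of that pipeline using the openai Python library (the v0.27-era interface). The file name and prompts are placeholders, and it assumes the audio has already been pulled from YouTube.

```python
# Minimal sketch: transcribe audio with Whisper, then summarize it and gauge
# sentiment with ChatGPT. Assumes the audio was already downloaded from YouTube
# (e.g. with yt-dlp) and that OPENAI_API_KEY is set in the environment.
import openai

with open("talk.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Summarize the transcript and describe its overall sentiment."},
        {"role": "user", "content": transcript["text"]},
    ],
)
print(response["choices"][0]["message"]["content"])
```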
When a ton of people are taking this approach with the same tools, many of the ideas begin to feel similar. At that point the differentiator becomes making the most compelling experience, and either way, the space is pretty saturated.
What’s overlooked is the connection to other industries. The biggest value will come from collaborating with folks outside the software space who are AI-curious. There are problems out there that these tools are well suited for, but the connections haven’t been made yet.
So there’s a balance. On one hand, start with the tool: explore it, understand it, and see what’s possible. On the other hand, start with people, industries, and research: see what problems are out there, have more conversations, and see where there are unique connections that few are exploring. Those intersections will be far less explored, less saturated, and you can make something uniquely valuable.
An ask: How do you find valuable problems?
You talk to people and see what frustrates them. But the who here is important. How do you get outside your bubble and grow an awareness of the wide variety of industries and problems out there? You can do market research and read a ton of articles, looking for common threads. I’m curious how you’ve approached this! If you’ve started a company, what was the path that led you to the problem you focused on? Reply with ideas, stories, musings, all of the above.
For some background, I started focusing on problem discovery after talking about creating value with the Creative Coach AI. And since then I’ve had a bunch of fascinating conversations with friends on how they stumbled on valuable problems.
Building things
Doodle & diffuse
Of all the projects this month, I’m most excited about this one coming together. You can collaboratively draw with AI by doodling, adding successive prompts, and sending each round off to get generated images back. Check out the quick 30s demo here, or watch the detailed demo here.
It’s the most beautiful feeling to go from something that’s just an idea to making it real. There’s a rush in it. I just want to keep wondering and making. There are little surprises too: I always imagined the process would be that I submit a doodle and get back an image. But the way this app works, I can then doodle on the returned image and resubmit it. That creates a pretty cool back and forth with the AI. I found myself using the following workflow:
Generate a background I’d like with a prompt and no input image.
Draw on the background, and then describe what I drew as the prompt.
Generate an image with the input image being my doodle on the background, alongside the prompt.
The key here is that I didn’t have to start with a blank page. I could generate a background and begin with some context!
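Roughly, that workflow might look like the sketch below. It assumes the diffusers library and a CUDA GPU; the model id, the prompts, and the stand-in ellipse doodle are illustrative, not what the app actually uses.

```python
import torch
from PIL import ImageDraw
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model_id = "runwayml/stable-diffusion-v1-5"  # illustrative checkpoint

# 1. Generate a background from a prompt alone (text-to-image).
txt2img = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
background = txt2img("a calm watercolor meadow at dusk").images[0]

# 2. Doodle on the background. In the app this happens on a P5 canvas;
#    here a rough ellipse stands in for the drawing.
doodled = background.copy()
ImageDraw.Draw(doodled).ellipse((180, 120, 340, 280), outline="red", width=8)

# 3. Run img2img with the doodled image as input, describing the doodle in the prompt.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
result = img2img(prompt="a red hot air balloon over a meadow", image=doodled, strength=0.6).images[0]
result.save("round_one.png")
```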
The process to make it looked like this:
Idea. Can I doodle something and then get an image of the thing? I want to draw an apple and then get a photorealistic one. I love drawing and this would be awesome.
Research. I read a bunch of posts and tutorials to see what tools are out there and understand some of the technology behind them.
Notebook. When I found Stable Diffusion img2img, I set out to get it working in a Colab notebook (that’s currently how I access a GPU).
Server. Then I got Stable Diffusion working on a server, where I could send images and receive processed ones (a rough sketch of that step follows this list).
App. Finally, I built a frontend with React and P5 to draw on the input image for the image transform.
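To give a feel for the server step, here’s a sketch of what an img2img endpoint could look like, assuming FastAPI and the diffusers library. The route name, model id, and defaults are my own placeholders, not the app’s actual setup.

```python
import io

import torch
from PIL import Image
from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import StreamingResponse
from diffusers import StableDiffusionImg2ImgPipeline

app = FastAPI()
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

@app.post("/generate")
async def generate(
    prompt: str = Form(...),
    strength: float = Form(0.6),
    image: UploadFile = File(...),
):
    # Decode the uploaded doodle, run img2img, and stream the result back as a PNG.
    init = Image.open(io.BytesIO(await image.read())).convert("RGB").resize((512, 512))
    out = pipe(prompt=prompt, image=init, strength=strength).images[0]
    buf = io.BytesIO()
    out.save(buf, format="PNG")
    buf.seek(0)
    return StreamingResponse(buf, media_type="image/png")
```

A frontend like the React and P5 one can then POST the canvas contents and a prompt to that route and render whatever image comes back.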
There were a bunch of smaller projects on the way to building this.
I generated a bunch of sea creatures with my nephew, using one of his drawings as the input. Read more about it in “Using doodles and diffusers to expand your imagination.” Here’s a video varying the strength of the image generation (it’s kind of spooky), and another creating a bunch of monsters at constant strength.
Exploring img2img more, I was curious what would happen with the same image and different prompts in “Image input and prompt explorations with stable diffusion img2img.”
While building the app, I found I could rerun Stable Diffusion on the output image by using it as the input image in the next round. I played a drawing game with this, resulting in a video warping across different subjects. You can see all the prompts in this post.
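Here’s a sketch of that feedback loop, again assuming the diffusers img2img pipeline; the prompts, strength, and starting image are placeholders.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Each output becomes the next round's input image, so the picture slowly
# warps from one subject to the next as the prompts change.
prompts = ["a fox in a forest", "a fox made of autumn leaves", "a swirl of autumn leaves"]
current = Image.open("start.png").convert("RGB").resize((512, 512))
for i, prompt in enumerate(prompts):
    current = pipe(prompt=prompt, image=current, strength=0.55).images[0]
    current.save(f"frame_{i:02d}.png")  # frames can later be stitched into a video
```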
Everything chatbots
Interacting with the Creative Coach AI I made last month helped me get started on projects, and generally got me unblocked when I needed it. For the most part, the advice from Creative Coach AI was intuitive, but it was shared at the right moment when I needed it. It was a reminder of information I already knew, but I just wasn’t thinking about — a gentle reframing. I started to note other patterns of thinking I was interested in reframing, because it was clear that talking to the chatbots could help support certain behavior changes. I made a few more chatbots based on behaviors I want to internalize:
Gratitude bot helps me be more aware of things to be thankful for.
Enthusiasm bot helps me get excited about things I want to do but feel hesitant about.
Meditation bot helps me settle my thoughts through chat and become more mindful.
I updated the bots to use streaming, which made them feel much faster, and displayed a video of several discussions with them at an art gallery in DC.
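For the curious, streaming with the ChatGPT API looks roughly like the sketch below (the v0.27-era openai Python interface; the system prompt is a stand-in, not one of the actual bots).

```python
import openai

# Request a streamed response so tokens can be shown as they arrive,
# which is what makes the bots feel so much faster.
stream = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a gentle gratitude coach."},
        {"role": "user", "content": "Help me notice something to be thankful for today."},
    ],
    stream=True,
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
```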
What’s next
Show & tell. I’m currently in San Francisco and planning a generative AI show & tell in the next couple of weeks. Think happy hour / science fair / interactive art show. The energy around AI here is incredible. This past weekend I attended a ChatGPT hackathon as an invited judge, seeing a range of prototypes and pitches in the space.
Chatbot updates. The release of the new ChatGPT and Whisper APIs means updating my chatbots! I want them all to stream responses from the ChatGPT API, and I’m curious to test out Whisper and see how that feels.
Drawing updates. ControlNet is a tool that helps control the output of diffusion models like Stable Diffusion. It has a scribble2image model I’d love to dive into (a rough sketch of what that could look like follows this list).
User Interviews. I’d like to have more conversations to discover valuable problems to jump into for my next prototypes.
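If you’re curious what the ControlNet scribble model looks like in practice, here’s a rough sketch of the kind of thing I’d try using the diffusers library; the model ids, file names, and prompt are illustrative, and I haven’t wired this into the app yet.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The scribble acts as a conditioning image: the output follows its lines,
# rather than just being nudged by it the way img2img is.
scribble = Image.open("apple_doodle.png").convert("RGB").resize((512, 512))
image = pipe("a photorealistic apple on a wooden table", image=scribble).images[0]
image.save("controlnet_apple.png")
```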
Thanks for reading! You can follow what I’m up to by subscribing here. If you know anyone who would find it interesting, I’d really appreciate it if you forwarded it to them! And if you’d like to jam more on any of this, I’d love to chat here or on Twitter.