How I used GPT-4 to code an idea into a working prototype
Natural language interfaces for programming will make app building more welcoming for more people
I’m trying out something new with this post. This is about a specific project I worked on over the last week. I’m testing out the idea of publishing the newsletter more frequently, while keeping up the monthly recaps at the start of each month. Mostly I’m curious how it’ll feel moving my frequent posts from Medium to here.
TL;DR
I used GPT-4 to code a command line tool that summarizes any web page. It felt wonderful to collaborate with AI like this.
Building apps by describing what you need, rather than learning how to code, could allow more people from more fields to build tools they need. This is inspiring! I’d love to explore this tool building process as a workshop with participants across industries to see what ideas come up.
Since GPT-4 launched I've seen a ton of Twitter threads about writing apps and tools with GPT-4. If you haven’t seen them yet, you can find a few examples at the end of this post. Naturally, I wanted to try this out too!
So… what should I use GPT-4 to make? Following my usual approach to making things: just build something, and more ideas will come. The first thing I want to make is a command line tool that returns a summary of a web page given a URL (h/t blake for getting me into this). Summarization is one of the most popular use cases for GPT, so this isn’t a particularly novel idea, but I love the thought of doing internet things with GPT, and it’s simple enough to start with.
My prompt looks like this:
Give me the code for a command line tool that can do the following: given the input of a URL, output a short summary of the given web page. use the openAI chat API to do the summarization.
I add some example code for the chat completion API, knowing that the models don’t have this post-2021 information.
Using AI to write the code
First I try GPT-3.5. Here's what I get. It’s using requests to download the web page, then Beautiful Soup to extract the text. Makes sense! But when I run the program, it throws an error. Interesting note: when I respond that it should use gpt-3.5-turbo, it insists that the model doesn’t exist and keeps “davinci” even after I share info about it. I replace the model myself before running the code.
Then I try GPT-4. Here’s an excerpt of what came out. A few things stand out about the code: the quality is better, and the program is broken into smaller functions run through a main function, which is a clearer way to structure it. It also works on the first run! For programming, GPT-4 works far better than GPT-3.5.
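The generated code isn’t reproduced in full here, but a minimal sketch of the structure GPT-4 produced (small functions orchestrated by a main function) might look like the following. The function names, the model choice, and the prompt wording are my assumptions, and it uses the pre-1.0 `openai` Python library that was current at the time:

```python
import sys


def fetch_page(url):
    """Download the raw HTML for a URL (requires the `requests` package)."""
    import requests
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def extract_text(html):
    """Pull the visible text out of the page (requires `beautifulsoup4`)."""
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator=" ", strip=True)


def build_messages(text):
    """Assemble the chat messages asking the model for a summary."""
    return [
        {"role": "system", "content": "You summarize web pages concisely."},
        {"role": "user", "content": "Summarize the following text:\n\n" + text},
    ]


def summarize(text):
    """Call the OpenAI chat completion API (pre-1.0 `openai` library)."""
    import openai
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=build_messages(text),
    )
    return response["choices"][0]["message"]["content"]


def main():
    print(summarize(extract_text(fetch_page(sys.argv[1]))))


if __name__ == "__main__" and len(sys.argv) == 2:
    main()
```

The nice part of this shape is that each step (fetch, extract, summarize) can be swapped out or tested on its own.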
How does each model compare on summarization? Here’s the output from running gpt-3.5-turbo and gpt-4 on this article.
GPT-3.5
The article discusses how quickly the field of generative AI is advancing, allowing people to create deepfakes of themselves with just a photograph and 60 seconds of audio. The author provides instructions on how to create a virtual clone of yourself using various AI tools, including generating a script, generating a voice, and creating a video. The author acknowledges the potential for negative uses of the technology and the need for caution in trusting video or audio recordings. However, the author also notes the potential positive uses of creating realistic AI-run avatars for customer support or tutoring.
GPT-4
This text discusses the rapid advancement of generative AI in creating deepfakes, which are now possible to make with just a photograph and 60 seconds of audio. The author demonstrates this by creating a fake version of himself giving an AI-generated talk about entrepreneurship. Concerns are raised about the implications of this technology, particularly in terms of trusting audio and video recordings. However, there are potential positive uses, such as AI-based customer support agents and personal tutors.
GPT-4 is a higher quality model but GPT-3.5 is cheaper. Here the differences feel negligible. For summarization, GPT-3.5 is good enough.
Iterating with GPT-4
I try the summarization tool on the plugin announcement from OpenAI, which fails to run because the article goes over the ~4K token limit. I put this question to GPT-4.
My updated prompt includes the following, followed by the current code for my tool:
The following is a command line tool that can output a short summary of the given web page given a URL. It uses the openAI chat API to do the summarization.
Please update the code to include a way to manage input with a large number of tokens. Do this by being efficient by what text is extracted from a webpage from beautiful soup. ignore irrelevant items on the webpage like the header, footer, menu items, and sidebar. focus on the main text of a website. Ensure that the number of tokens of submitted text is less than 3500. Find a way to manage using few tokens so API costs are lower, while using enough to have a useful and strong summary.
GPT-4 shares back the updated code, and shares a summary of the changes it made:
I've made the following changes:
1. In the `extract_text()` function, I'm now removing irrelevant elements such as headers, footers, navigations, sidebars, and scripts/styles from the HTML.
2. The program then tries to find the main content within `<article>` or `<main>` tags.
3. If there's no `<article>` or `<main>` tag, it uses the entire webpage content.
4. Truncate the text before summarization if it's greater than 3500 tokens by slicing it up to the token limit. This ensures that the submitted text doesn't exceed the model's limit while still providing a summary of the main content.
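The updated code isn’t shown in full above, but the pruning-and-truncating logic those changes describe might look roughly like this. The helper names are my assumptions, and the word-count approximation of tokens is a stand-in; the actual generated code may count tokens differently (e.g. with `tiktoken`):

```python
MAX_TOKENS = 3500
IRRELEVANT_TAGS = ["header", "footer", "nav", "aside", "script", "style"]


def extract_main_text(html):
    """Strip boilerplate elements, then prefer <article>/<main> content.
    Requires the `beautifulsoup4` package."""
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    # 1. Remove headers, footers, nav, sidebars, scripts, and styles.
    for tag in soup(IRRELEVANT_TAGS):
        tag.decompose()
    # 2./3. Look for <article> or <main>; fall back to the whole page.
    main_content = soup.find("article") or soup.find("main") or soup
    return main_content.get_text(separator=" ", strip=True)


def truncate_text(text, max_tokens=MAX_TOKENS):
    """4. Crude truncation: treat each whitespace-separated word as one
    token and keep only the first max_tokens of them."""
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])
```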
The last line of the prompt is really the most important: “Find a way to manage using few tokens so API costs are lower, while using enough to have a useful and strong summary.” But it’s not really captured in the output. Because the text is simply truncated, the result misses important information that might show up after the first 3500 tokens.
Let’s try another approach. I ask it to split the text into smaller chunks, run the summary on each chunk, and then run a summary across all of the chunk summaries. It gives me all of that. Very cool! With just a description, I get the code.
The following is a command line tool that can output a short summary of the given web page given a URL. It uses the openAI chat API to do the summarization.
Please update the code to include a way to manage input with a large number of tokens. Do this by being efficient by what text is extracted from a webpage from beautiful soup. ignore irrelevant items on the webpage like the header, footer, menu items, and sidebar. focus on the main text of a website. Split this text into sections of 3000 tokens and find a summary of each using the openai chat api. Finally take all of the summary parts and make a summary of those.
This works on the first try! And the result looks great. Here’s the summary of the plugin announcement:
The article discusses ChatGPT's plans to gradually roll out plugins that provide language models with up-to-date information and safe, constrained actions. Plugins are designed to increase the usefulness of language models and tackle challenges associated with them. The text discusses the potential risks associated with plugins, as well as the safeguards that ChatGPT has put in place to mitigate these risks. The article also mentions various updates and features of the ChatGPT model, including a web browsing plugin, a code interpreter alpha, and the use of AI-generated code in a secured environment. Developers can create their own plugins and register them with ChatGPT, and OpenAI is working to develop more plugins for a broader audience.
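The chunk-then-recombine approach can be sketched roughly like this. The word-based chunking here is an approximation of the 3000-token sections from the prompt, and the `summarize` callable stands in for the chat API call; both are my assumptions, not the exact code GPT-4 produced:

```python
CHUNK_SIZE = 3000  # approximate tokens per section, counted as words here


def split_into_chunks(text, chunk_size=CHUNK_SIZE):
    """Split text into word-based chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def summarize_long_text(text, summarize):
    """Summarize each chunk, then summarize the combined chunk summaries.
    `summarize` is any callable wrapping the OpenAI chat API."""
    chunks = split_into_chunks(text)
    if len(chunks) == 1:
        return summarize(chunks[0])
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n\n".join(partial_summaries))
```

Passing the summarizer in as a callable also makes the expensive API step easy to stub out when testing the chunking logic.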
From here I can tweak things in the script. In general, I love this approach to making things and learning: get something working, make changes, see what happens, develop an understanding. GPT-4 makes this method of learning much easier by giving me a working starting point.
Now that I have this tool, all sorts of ideas come to mind. This is the part I love! Build something, and more ideas come :D
Is there a better way to handle summaries for larger texts? Perhaps I can provide additional information to GPT-4 that would help it come up with a few options to address this token max problem.
I find myself using the tool now whenever I see an article that will take more than a couple of minutes to read. The tool makes me think of new ways of reading. Reading now feels like looking at a map, zooming in and out. Right now I have the bird’s-eye view and then the ground itself. What if I had summaries at different resolutions I could easily bounce between? I could zoom into specific parts of interest, instead of reading the whole text in fine detail.
The summarization takes some time to run. Maybe I could connect this to a Twitter bot: people could include the bot in a conversation, and it would reply with a threaded summary. Or I could have the tool email me the summary when it’s ready. Or hook it into a bookmarks manager to run on anything I bookmark.
Making building feel more welcoming
If you can prompt and quickly get usable code, you can imagine prompting to create apps. You can easily make apps customized for yourself. You can make them lasting, or just for the moment. What frustrations do you have? What do you want? Spin up an app that does it for you. When I made a bunch of chatbots, I started working on a tool to quickly deploy a chatbot from a prompt. Now you can imagine asking GPT-4 to write you a chat AI following a specific prompt, and then using a ChatGPT plugin to deploy it.
GPT-4 makes it much faster to go from an idea to a prototype to iterations. What I’m most excited about is making app building more welcoming for more people. Programming and deploying apps boils down to describing what you want. Instead of searching Stack Overflow and then blog posts, I have a custom guide with code written for me by GPT-4. Now imagine all of the niche frustrations that can finally be addressed! Imagine all of the otherwise overlooked problems that can now have a solution. I get amped thinking about running workshops on building and iterating on a working prototype, bringing in folks from all different fields who have ideas for tools they’d like to exist, and seeing these ideas become real, functioning tools.
Additional References
The summarization tool is called tldr-cl; you can find the code here.
Examples using GPT-4 to build things! Re-creating pong and snake, building an iOS game, an iOS movie app, iOS graphics + animation, a voting web app, writing microservices.
I’m interested in sharing what I’ve built and learned, and also excited to hear about ideas, conversations, and opportunities related to what I’m working on. If you know folks who would be into this, I’d appreciate it if you share this with them!