Building with open source large language models
Also: Visiting the AI Scene in New York, Exploring diverse industries, Consulting
Over the last month I spent some time in NY and dove into building with open LLMs. TL;DR:
I implemented a couple of my previous projects using open LLMs running locally, inspired by the recent growing discourse around these models.
There’s a thriving AI scene in NY, my recommendations are the AI x UX and MLOps meetups.
I’m visiting Boston at the end of June, throwing a creative tech show & tell.
Since my last update, I’ve had dozens of conversations across a range of sectors to open up my thinking to new problems. Those conversations also led me toward consulting.
LFBuild 🚀🚀🚀
So far my text and voice based AI projects have been built using OpenAI’s Whisper, completion, and chat APIs. Over the last few months there’s been a burst of activity in the space of open source LLMs, starting with a bunch of projects built on Meta’s LLaMA. Check out last month’s “Google: ‘We Have No Moat, And Neither Does OpenAI’” to get a sense of the hype around open source and generative AI. Threads like this get me excited about the future of open research and products in this space.
One thing I’ve been curious about is what it would be like to replace the models I’m using with open ones running locally.
The advantages include not having to send personally identifiable information (PII) to a third party, and not having to pay for API usage. For business applications, keeping PII in house is often a requirement.
The disadvantages are numerous. The models are slower, especially if I’m running them on my laptop. Otherwise, I have to set up a GPU server and run the models myself, which comes with its own challenges. And while open models are improving, GPT-4 is still the state of the art.
I updated a couple of my past projects to use open LLMs. Check them out:
Transcribing YouTube videos with local Whisper
OpenAI already open-sourced Whisper, its speech-to-text and translation model; my YouTube transcript tool originally used the hosted API. Recent open-source projects have re-implemented Whisper with faster inference, including whisper.cpp, and these can run the models on the CPU instead of needing a GPU. What this means is I can run the models locally on my laptop, instead of getting a GPU server. I used faster-whisper in Python, and rewrote the YouTube transcript tool to run Whisper locally. Check out the project on github here. It runs slower than using the API, but it works! For me, it’s a stepping stone to try out more efficient models, and to think of other use cases where speed isn’t as crucial.
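If you’re curious what that looks like, here’s a minimal sketch of CPU transcription with faster-whisper. The model size and audio filename are just placeholders; my actual tool also handles pulling the audio from YouTube:

```python
from faster_whisper import WhisperModel

# Load a small Whisper model quantized to int8 so it runs on the CPU.
model = WhisperModel("base", device="cpu", compute_type="int8")

# "audio.mp3" stands in for whatever audio you pulled from the video.
segments, info = model.transcribe("audio.mp3")

print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```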
Building an AI chat with open LLMs
The space of open LLMs is developing rapidly, with so many new models and variations of them coming out that it’s dizzying. Check out this list here. llama.cpp has been an incredible recent innovation in this space: it’s a port of Meta’s LLaMA in C/C++ with faster inference.
I’m using llama.cpp with Python via llama-cpp-python. When you use llama.cpp you must download a pre-trained model, and the question comes up: which one should you use?
Here’s where things get interesting. Every now and then you see beautiful things on the internet, and this is one of them. If you go on TheBloke’s huggingface you’ll see a ton of models you can download, and these are actively being updated. I wasn’t sure what to go with, so I sorted by “likes” to get a feel for which ones were used more. From there, I picked a model with fewer parameters, the 7B one, to start; I’m just running on my laptop after all. Then I picked a GGML version, a quantized format that means smaller files, faster inference, and in some cases being able to run the model on the CPU. I settled on one that’s 7B and GGML: wizardLM-7B-GGML.
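From there, loading and prompting the model takes only a few lines of llama-cpp-python. A minimal sketch, where the .bin filename is a placeholder for whichever quantization you actually download from TheBloke’s repo:

```python
from llama_cpp import Llama

# Point at the quantized GGML weights you downloaded (placeholder filename).
llm = Llama(model_path="./wizardLM-7B.ggmlv3.q4_0.bin")

output = llm(
    "Q: Name three benefits of running an LLM locally. A:",
    max_tokens=128,
    stop=["Q:"],  # stop before the model invents a follow-up question
)
print(output["choices"][0]["text"])
```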
Now if you want to get an idea of what the deal is with a model, you can google it, and for most of these there will be a reddit thread. Here’s one for a similar model to the one I used, and here’s the most popular model from TheBloke and the associated thread on reddit. What becomes apparent is that there’s an amazing collaboration between ehartford and TheBloke: Eric trains the models and makes them available, and then TheBloke makes them more efficient by quantizing them. This collaboration helps the open LLM community by supporting devs who are curious to run these models, folks like me! It’s fun to get a peek into a collaboration and community with all this enthusiasm and generosity.
Using llama-cpp-python and wizardLM-7B-GGML I wrote a command-line version of Gratitude GPT that runs on llama. I’ve been wanting to use langchain for projects, and it makes it particularly easy to switch out models. Check out the repo to see examples of llama-cpp-python, langchain, and the gratitude bot.
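Here’s a rough sketch of the langchain wiring, with a stand-in prompt rather than the actual Gratitude GPT one. The nice part is that swapping the LlamaCpp line for another LLM class is all it takes to change backends:

```python
from langchain import LLMChain, PromptTemplate
from langchain.llms import LlamaCpp

# Same placeholder weights as above; swap this line to change models.
llm = LlamaCpp(model_path="./wizardLM-7B.ggmlv3.q4_0.bin",
               temperature=0.7, max_tokens=256)

prompt = PromptTemplate(
    input_variables=["entry"],
    template="You are a warm gratitude journaling companion. "
             "Respond encouragingly to this entry: {entry}",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run("I'm grateful for the open LLM community."))
```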
Visiting AI NY
I spent a few weeks in New York getting to know the AI scene in the city. It’s not just SF; it’s happening here too. Any given week you can find a handful of events in the space. I’d boil the events down to two types of meetups: networking/happy hours, and project sharing. The project sharing ones are more valuable and interesting because I’d meet other builders and could ask how they solved specific problems I was running into, or just hear more about how they built what they were sharing. Highlights:
AI x UX is a wonderful monthly meetup with a dozen or so quick presentations of projects, and afterwards you can talk to the speakers and see demos. You’ll see more experimental and creative interfaces. One thing I loved was walking around and seeing demos of projects that I’d seen viral tweets of, like this one. You could meet the makers right there and talk about their projects!
MLOps NYC hosted “Building with LLMs — Open House with AI Builders” at South Park Commons. The format was a science fair of genAI projects. I’ve often used this format for events, and I find it energizing. This is another monthly series I’d keep an eye on. A few areas I was curious about: genAI in medtech, and how to think about fine-tuning a model versus creating embeddings for ‘storing’ information. In the context of personalizing a chatbot, fine-tuning helps cultivate a particular style of chatting, while embeddings help with retaining information (see the sketch after this list). One builder also mentioned that SF is all b2b and NY is the place for consumer.
WordHack is a monthly meetup of creative tech projects at Wonderville. It isn’t specifically AI focused, but the first talk, by Angie Waller, was, and it was fantastic. Think playful, personal, expressive, and critical projects built on genAI. I haven’t seen many projects like these, and I hope to see more. Of all the events I went to, this is the one I’d come back to every month. It was also my first time at Wonderville, which is now one of my favorite bars.
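To make that fine-tuning vs. embeddings distinction concrete, here’s a toy sketch of the embedding side: storing facts as vectors and retrieving the relevant ones into the prompt at chat time. The model name and the “memories” are purely illustrative:

```python
from sentence_transformers import SentenceTransformer, util

# Toy "memory": facts the chatbot should retain without any fine-tuning.
memories = [
    "The user's name is Sam.",
    "Sam is allergic to peanuts.",
    "Sam is training for a marathon in October.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
memory_embeddings = model.encode(memories, convert_to_tensor=True)

query = "What should I eat before my long run?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Pull the most relevant memories and prepend them to the chat prompt.
hits = util.semantic_search(query_embedding, memory_embeddings, top_k=2)[0]
context = "\n".join(memories[hit["corpus_id"]] for hit in hits)
print(f"Context about the user:\n{context}\n\nUser: {query}\nAssistant:")
```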
Conversations on Problem Discovery
One priority over the last month was to have more conversations about problems in different sectors. Most of my experience is in media and tech, and I wanted to open up my mind to other industries. This led to dozens of conversations in medical admin, pharma, real estate, climate tech, sales, NGO initiatives, and community development. In some cases that also included jamming on genAI possibilities. Some broad insights:
Most businesses in the current economy are focused on cost reduction. If you’re looking to build something for businesses, you’ll likely have more success focusing on cost reduction rather than value add. On the other hand, since most businesses are cutting costs, if you have the runway it can be a great time to innovate with less competition.
Enterprise can seem attractive when looking at b2b, but generating leads and building relationships there is challenging. An advantage of b2c is that it’s much easier to reach your customers and start talking to them.
Some problems that came up feel like looming waves: underserved areas that are growing in severity. Here’s one from the health sector. The baby boomer generation is reaching an age where they need an increasing amount of care, while the number of people who can offer care is going down. Burnout among care workers in this industry is high, and many caregivers are unpaid. How might we provide better care for elder populations? How might we better serve caregivers?
Consulting came up in a few conversations as a way to put what I’m learning into practice and to go deep on a range of problems. I dipped my toes into it this past month, and I’m interested in balancing that work with building my own projects, in a way where the two support each other.
What’s Next
I’m visiting Boston at the end of June, throwing a creative tech show and tell and likely a generative AI meetup/hackday. More info to come, but get in touch if either of these sounds interesting to you.
My plan this year was to spend the first half building projects and the second half picking one to go deeper on. I don’t feel conviction about a particular area yet, so I’ll continue to have conversations and build until I find a project that compels me. Meanwhile, I’m beginning to consult to see how that work sits alongside these side projects.
CMU is hosting a series of genAI talks called “Generative AI Innovation Incubator” through June, July, and August. These include tutorials and hackathons. What I love in particular is the range of industries explored in the context of genAI — including education, future of work, finance, econ, public health, and speculative fiction. I’m planning on checking out a few of these. Let me know if any jump out to you too!
Thanks for reading! You can follow what I’m up to by subscribing here. If you know anyone that would find this post interesting, I’d really appreciate it if you forward it to them! And if you’d like to jam more on any of this, I’d love to chat here or on twitter.