ModelMetrics.ai demo video: It's 10PM, do you know what your AI app is doing?
The monitoring + optimization platform every AI-powered app will need
[Disclaimer: This post is best enjoyed by founders, product managers, and engineers at software companies that are integrating AI-powered actions into their apps. I’ll get back to posting general content soon.]
Why we built this
We’ve built 6 AI-powered apps since starting our Magnetic Ventures AI venture studio, and we’ve definitely taken shortcuts:
We hardcoded the OpenAI prompts into our source code or dumped them into JSON files in the git repo (see the sketch after this list).
We wrote the prompts in OpenAI’s Playground… tested them against one or two sample user inputs… and then just deployed them, hoping they would generalize to real user data.
We deployed the prompt to production, waited for users to email us with issues, and then reviewed OpenAI’s GPT completions looking for problem patterns.
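For anyone who hasn’t lived this, here’s a minimal sketch of what that first shortcut looks like in practice. It’s an illustrative anti-pattern, not our actual code: the prompt, model name, and function are made up, and it assumes the official openai Python package with an API key in the environment.

```python
# Illustrative anti-pattern only -- not real Magnetic Ventures code.
# The prompt text lives directly in application source, so every tweak
# means a code change and a redeploy, and nothing tracks how it performs.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SUMMARIZE_PROMPT = (
    "Summarize the following customer feedback in two sentences:\n\n{feedback}"
)

def summarize_feedback(feedback: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": SUMMARIZE_PROMPT.format(feedback=feedback)}],
    )
    return response.choices[0].message.content
```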
We’re not alone. Many coders, founders, and product teams rushed to implement AI projects.
AI was magic dust. We could sprinkle it around to the awe and amazement of users and investors alike. The AI features didn’t need to perform perfectly… they just needed a “wow” factor to get press coverage and organic shares.
AI has matured from magic fairy dust to stable + robust actions
Today, users rightfully expect AI features to work consistently. Product managers want visibility into the performance and stability of their prompts. CFOs want to understand why their monthly OpenAI bill has ballooned to $35,000 (and what they can do to reduce the bleed).
The entire team wants to optimize generative AI features based on real-user feedback.
The vision for ModelMetrics.ai
Above is the video walkthrough of ModelMetrics: the central platform to design, manage, monitor, and optimize Large Language Model (LLM) actions.
We have opened the Prompt Designer feature free of charge for everyone. You write a prompt with placeholders for user input, run it against any number of test cases, accept or reject the outputs, update the prompt and model settings, and try again. It’s a handy way to iterate toward the perfect prompt for your LLM action. Go to modelmetrics.ai to try it (you’ll need to add your own OpenAI API key to run the tests on your prompts).
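If you’re curious what that loop looks like under the hood, here’s a minimal sketch in Python of running a templated prompt against a handful of test cases with your own OpenAI API key. The template, test cases, and helper name are illustrative assumptions, not the ModelMetrics.ai interface; the Prompt Designer gives you the same iterate-review-adjust cycle without writing any of this.

```python
# Illustrative sketch of the loop the Prompt Designer automates.
# The template, test cases, and helper names are assumptions for this example.
from openai import OpenAI

client = OpenAI()  # uses your own OPENAI_API_KEY, just like the Prompt Designer

prompt_template = (
    "Rewrite this support ticket as a polite, one-paragraph reply:\n\n{user_input}"
)

test_cases = [
    {"user_input": "My export has been stuck at 99% for two days."},
    {"user_input": "Why was I charged twice this month?!"},
]

def run_test_cases(template: str, cases: list[dict],
                   model: str = "gpt-3.5-turbo", temperature: float = 0.7):
    """Fill the placeholders with each test case and collect the completions for review."""
    results = []
    for case in cases:
        completion = client.chat.completions.create(
            model=model,
            temperature=temperature,
            messages=[{"role": "user", "content": template.format(**case)}],
        )
        results.append({"input": case, "output": completion.choices[0].message.content})
    return results

# Review the outputs, accept or reject each one, tweak the template or the
# model settings (model, temperature, etc.), and run the cases again.
for result in run_test_cases(prompt_template, test_cases):
    print(result["input"], "->", result["output"])
```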
The video also shows our vision for analytics and optimization. Reach out if you’d like to use the full platform to manage AI actions for your team.
p.s. Huge shout-out to Ian, the lead engineer on this project… who helped tame our wild wireframes and turn them into a beautiful app.