Leah’s ProducTea


The Power of Training Language Models on Internal Data

www.leahtharin.com


We are on the brink of solving one of the biggest problems in the digital age.

Leah Tharin
Mar 14

Getting rid of the black hole of lost information

“I cannot find my s***!” I heard this over and over when investigating customers’ problems. It turned out to be one of the biggest opportunities, and one we had absolutely no solution for. Maybe we do now?

Let’s train a language model on internal data and let the magic begin


After training a model on your data, you need to train your brain. The best way to do that is to subscribe to my weekly newsletter.

Put your data to work

Software and data sprawl have become a significant challenge for many organizations: the rapid growth and adoption of numerous tools and applications creates a complex web of technology. This sprawling landscape often leads to inefficiencies, duplicated efforts, and miscommunication.

We seem to be good at documenting our stuff and even better at losing it afterwards.

Industry spending on data-related costs is expected to increase, on average, by nearly 50 percent over 2019–21, versus 2016–18.
We are, and always have been, inefficient at managing data. Source

But what if we could combat this issue by training language models on internal data?

The benefits of training language models on internal data

  • Personalized support and assistance: With a thorough understanding of your data, language models can provide personalized support and guidance to employees, helping them navigate complex information and complete tasks more efficiently. This is especially true for new joiners, who often have a completely different context from everyone else.

  • Streamlined knowledge management: By consolidating your knowledge into a single AI language model, you can eliminate the need for multiple tools and platforms, simplifying knowledge management and making it more accessible for everyone. Or, put differently: it really doesn’t matter where you store your data, as long as the model can scrape it.

    • A/B experiment databases can especially benefit from training language models on internal data. By incorporating experiment details, hypotheses, and results into the model, it can provide recommendations on experiment design and help identify potential pitfalls, saving time and effort for product and growth teams. The model can analyze past A/B tests and use their context to identify whether similar hypotheses have already been tested. Experiment interpretation is another exciting possibility: by comparing new results against prior experiments, the model can provide data-driven insights that you might otherwise have missed.

  • Improved understanding of company-specific jargon: Models trained on your internal data can better understand the unique terminology and acronyms used within your organization. This deeper understanding allows them to provide more accurate and relevant responses.

    • I recently started setting up an internal wiki because our area (end-to-end ML weather prediction) is so inherently complex. I’d have been more than happy if a model could have taken care of that.

  • Identifying conflicting information through context

    • It happens all the time: information is duplicated and then not maintained. A trained model should be able to extract information from context and work out which versions are outdated.
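Two of the ideas above, checking whether a hypothesis has already been tested and spotting duplicated pages where one copy is probably stale, can be approximated without training anything, using similarity search over existing records. The sketch below is a swapped-in technique, not the language-model approach the post describes: it uses plain bag-of-words cosine similarity and assumes each record is a short text with a last-updated date. The names `Record`, `cosine`, `find_similar`, and `flag_possibly_outdated`, the 0.5 threshold, and the sample experiments are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    """Hypothetical shape of an internal record (experiment, wiki page, ...)."""
    title: str
    text: str
    updated: date


def _vector(text: str) -> Counter:
    # Lowercased bag-of-words counts; a real system would likely use embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: str, b: str) -> float:
    """Cosine similarity of two texts' word-count vectors (0.0 if either is empty)."""
    va, vb = _vector(a), _vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def find_similar(query: str, records: list[Record], threshold: float = 0.5) -> list[tuple[float, Record]]:
    """Return (score, record) pairs at or above the threshold, most similar first."""
    scored = [(cosine(query, r.text), r) for r in records]
    return sorted((s for s in scored if s[0] >= threshold), key=lambda s: s[0], reverse=True)


def flag_possibly_outdated(records: list[Record], threshold: float = 0.5) -> list[tuple[Record, Record]]:
    """Pair near-duplicate records as (older, newer); the older one may be stale."""
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if cosine(a.text, b.text) >= threshold:
                older, newer = sorted((a, b), key=lambda r: r.updated)
                pairs.append((older, newer))
    return pairs


# Illustrative, made-up experiment log.
past = [
    Record("EXP-1", "shorter signup form increases signup conversion", date(2022, 3, 1)),
    Record("EXP-2", "annual pricing toggle increases upgrade rate", date(2022, 9, 1)),
]
print([r.title for _, r in find_similar("a shorter signup form will increase signup conversion", past)])
# → ['EXP-1']
```

Word overlap is a crude proxy: it misses paraphrases ("increase" and "increases" count as different words here), which is exactly the gap a model trained on your internal context would be expected to close.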

Information types to train on: give me thy lost souls

There is enough data around, it’s just spread everywhere:

  1. Internal communication: Emails (that one could be icky for privacy reasons), instant messages, and meeting transcripts can provide valuable insights into company culture, language, and collaboration patterns.

  2. Technical documentation: Training the model on internal technical documents, such as user manuals and API documentation, and then exposing it to customers could make onboarding to APIs easier than it has ever been.

  3. Project / Product management data: Including information from project management tools, such as task descriptions, timelines, and progress updates.

  4. Knowledge base articles: Incorporating content from internal wikis and knowledge bases.

  5. Customer support interactions: Training the model on customer support tickets, chat logs, and resolution notes could significantly reduce the time it takes to serve customers and improve the quality of each interaction, assuming that 90% of customers probably have the same problems.
    Add automation into the mix once an issue has been identified and you have a winner. It’s chatbots on steroids, but I believe the sweet spot will be AI-assisted support agents.
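Because these sources live in different tools, a practical first step is to normalize them into one record shape and decide, per source, what may be ingested at all. The sketch below assumes hypothetical source labels (`ALLOWED_SOURCES`), a hypothetical `SourceItem` record, and an opt-in allowlist that leaves email out by default, given the privacy concern above. Email addresses are masked with a deliberately simple regex; that is an illustration, not a complete PII or GDPR solution.

```python
import re
from dataclasses import dataclass

# Hypothetical source labels; "email" is deliberately absent (opt-in only).
ALLOWED_SOURCES = {"chat", "meeting_transcript", "api_docs", "project_tool", "wiki", "support_ticket"}

# Simple email-address pattern; real PII redaction needs much more than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class SourceItem:
    source: str   # which tool the item came from, e.g. "wiki"
    item_id: str  # identifier in the originating tool
    text: str


def redact(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[email]", text)


def ingest(items: list[SourceItem], allowed: set[str] = ALLOWED_SOURCES) -> list[SourceItem]:
    """Keep only items from allowed sources, with email addresses masked."""
    return [
        SourceItem(i.source, i.item_id, redact(i.text))
        for i in items
        if i.source in allowed
    ]
```

For example, `ingest([SourceItem("email", "m1", "hi"), SourceItem("wiki", "w1", "Ask jane.doe@example.com for VPN access")])` keeps only the wiki item, with its text rewritten to `"Ask [email] for VPN access"`. Keeping the allowlist explicit makes "what did we feed the model?" an answerable question, which matters once erasure requests come in.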

Conclusion

The often-invisible opportunity cost is ever-present, and I could see the same thing happening here as it did with Slack-like tools. I’ve had candidates ask me in interviews whether we use Slack or communicate by email.

The motivation behind the question is that they value the ease of communicating in a Slack-like tool over the slog that email represents. The same could happen with finding and interacting with information in the future. It might become a reason for someone to join you: you just don’t deal with these ancient problems anymore.


Elena Verna’s Scoop

“Imagine a chatbot that can answer any question about your company's history, summarize the state of current projects, recap discussions, or surface common questions across the entire organization. This will be the next collaboration unlock and table stakes for every business”
More from Elena:

Elena's Growth Scoop


3 Comments
Dan
Mar 15

Indeed, we've had multiple conversations around this. But it's not entirely clear whether, once we put our data in, it'll get spat out to a competitor. Also, as we're UK-based, what about GDPR and the right to erasure?

So... since we're a Microsoft company anyway, we're going to wait and see how Microsoft responds with SharePoint integration.

© 2023 Leah Tharin