
PART 1: Enhancing Your Fine-Tuning Journey with GPT-4 and BARThez for Summarization

Pydathon

Photo by Sergei A on Unsplash

In the bustling world of the 21st century, the information highway has led us to a crossroads where time has become the most valuable commodity. Emails, one of the chief modes of communication in this digital age, often take a toll on this precious resource. For professionals dealing with hundreds of emails every day, the task of parsing through them can be daunting, time-consuming, and frequently counterproductive. What if there was a way to streamline this process, making it more efficient and less time-intensive?

GPT-4 is well suited to summarization. However, the model is not open-source, and calling it daily can lead to high bills. One solution is to use an open-source model instead, with GPT-4 serving as a training-set generator.
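As a rough sketch of that idea, the GPT-4-generated summaries can be paired with the source emails and written out as JSONL, a common format for fine-tuning datasets. The `fake_summarize` stub below is hypothetical and stands in for a real GPT-4 API call, which is not shown here:

```python
import json

def build_training_set(emails, summarize, path="train.jsonl"):
    """Write (email, summary) pairs as JSONL, one record per line."""
    records = []
    for body in emails:
        summary = summarize(body)  # in production, a GPT-4 API call
        records.append({"text": body, "summary": summary})
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return records

# Hypothetical stub standing in for a real GPT-4 call
def fake_summarize(text):
    return text.split(".")[0] + "."

pairs = build_training_set(
    ["Bonjour, la réunion est reportée à jeudi. Merci de confirmer."],
    fake_summarize,
)
```

One record per line keeps the file streamable, so the dataset can later be loaded lazily during fine-tuning without reading everything into memory.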

In this article, we’ll try to answer these questions:

  • Is GPT-4 suitable to create a fine-tuning dataset?
  • Is it worth it to fine-tune an already fine-tuned model?

Goals

  1. Use the Gmail API to retrieve emails
  2. Generate a training set using the GPT-4 API
  3. Summarize French emails by fine-tuning BARThez
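For step 1, the Gmail API's `users.messages.get` endpoint returns the email body as URL-safe base64 inside the message payload. As a minimal sketch (the `msg` dict below is a trimmed, hypothetical example of that response shape, not a real API response), extracting the plain-text part could look like:

```python
import base64

def extract_plain_text(message):
    """Pull the text/plain body out of a Gmail API message resource."""
    payload = message["payload"]
    # Single-part messages carry the body directly on the payload
    parts = payload.get("parts", [payload])
    for part in parts:
        if part.get("mimeType") == "text/plain":
            data = part["body"]["data"]  # URL-safe base64 per the Gmail API
            return base64.urlsafe_b64decode(data).decode("utf-8")
    return ""

# Hypothetical, trimmed message resource for illustration
msg = {
    "payload": {
        "parts": [
            {
                "mimeType": "text/plain",
                "body": {
                    "data": base64.urlsafe_b64encode(
                        "Bonjour, voici le rapport.".encode("utf-8")
                    ).decode("ascii")
                },
            }
        ]
    }
}
```

Multipart messages often also carry a `text/html` part; filtering on `mimeType` keeps only the plain-text version, which is what we want to feed the summarizer.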

Requirements



Written by Pydathon

Data Scientist. I like to explore different subjects, and I would like to become an ML engineer. Hope you will like my writing! :)
