This article is in the public domain (CC0). Feel free to use it however you like. CC0 1.0 Universal

Machine Learning Through Natural Language

Traditional machine learning meant that computers, which excel at numerical computation, learned from numerical data and stored the results as numerical parameters.

But we humans can learn not only with numbers but also with language. We organize our experiences into words and write them down, then recall or read those words to make use of them.

Large language models can likewise describe knowledge in words, and can make use of words by reading them.

With large language models, which are natural language processors, machine learning based on natural language becomes possible, not just machine learning based on numbers.

The arrival of large language models has therefore opened up a new field: natural language machine learning.

The pre-training of a large language model is itself conventional numerical machine learning. The natural language machine learning discussed here is a new kind of machine learning that uses an already trained large language model.

A Basic Model of Natural Language Machine Learning

Natural language machine learning has aspects that resemble traditional numerical machine learning and aspects that do not resemble it at all.

First, to convey what natural language machine learning is like, we explain the parts that resemble traditional numerical machine learning, as a basic model.

From here on, we will call a pre-trained large language model an "LLM". Note that the LLM's parameters do not change at all during the learning process described here.

The basic model is supervised learning, targeting a classification problem.

As training data, we prepare many pairs of input sentences and their correct classifications.

For example, suppose a company has a General Affairs Department and an Administrative Affairs Department.

The two departments divide the work between them. For input sentences such as "The office light bulb is out," "I forgot my access card," or "I want to reserve the main hall at headquarters," the classification indicates whether General Affairs or Administrative Affairs is responsible.

From this training data, only the input sentences are extracted and fed to the LLM.

Here, the system prompt deliberately constrains the answer: "Answer which department, General Affairs or Administrative Affairs, is responsible for this inquiry. Do not include any characters other than 'General Affairs' or 'Administrative Affairs' in your answer."
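
To make this concrete, here is a minimal sketch of such a constrained classification call. It is illustrative only: `complete` is a stand-in for whatever LLM chat API is actually available, passed in as a plain function, and the prompt wording simply follows the one above.

```python
# Minimal sketch of the constrained classification call.
# `complete` is a stand-in for any LLM API:
#   complete(system_prompt, user_text) -> answer text
from typing import Callable

SYSTEM_PROMPT = (
    "Answer which department, General Affairs or Administrative Affairs, "
    "is responsible for this inquiry. Do not include any characters other "
    "than 'General Affairs' or 'Administrative Affairs' in your answer."
)

def classify(
    inquiry: str,
    knowledge: list[str],
    complete: Callable[[str, str], str],
) -> str:
    """Route one inquiry, given the knowledge base accumulated so far."""
    system = SYSTEM_PROMPT
    if knowledge:
        # Everything learned so far is supplied as plain sentences.
        system += "\n\nWhat is known so far:\n" + "\n".join(knowledge)
    return complete(system, inquiry)
```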

Initially, the LLM generates answers knowing nothing about this company. Naturally, some answers will be wrong, while others may happen to be correct by chance.

For each answer, a teacher system judges whether it is correct or incorrect. The combination of the input sentence, the LLM's answer, and the judgment result is then saved to a knowledge base.

This process is repeated for about half of the training data.

For the remaining half, the same process runs, except that all the information recorded in the knowledge base is now added to the LLM's system prompt.

At this point the knowledge base contains information about how the General Affairs and Administrative Affairs departments divide their work, so the rate of correct answers should be higher than on the first half of the data.

In this way, a system combining the LLM and the knowledge base can learn how the General Affairs and Administrative Affairs departments divide their work at this company.
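
Put together, the whole loop might look like the following sketch, reusing the hypothetical `classify` function from above (with the LLM call already bound in, e.g. via `functools.partial`). The exact wording of the recorded sentences is an assumption; the description above only specifies that input, answer, and judgment are saved.

```python
# Sketch of the basic model's learning loop. training_data is assumed to
# be a list of (inquiry, correct_department) pairs; classify(inquiry,
# knowledge) is the function sketched earlier with the LLM call bound in.

def train(training_data, classify):
    knowledge = []
    half = len(training_data) // 2

    def step(inquiry, correct, context):
        answer = classify(inquiry, context)
        # The teacher system records only correct/incorrect; with two
        # departments, "incorrect" already implies the other department.
        judgment = "correct" if answer == correct else "incorrect"
        knowledge.append(
            f'Inquiry: "{inquiry}" / Answer: {answer} / Judgment: {judgment}'
        )

    # First half: the LLM answers with an empty knowledge base.
    for inquiry, correct in training_data[:half]:
        step(inquiry, correct, context=[])

    # Second half: everything recorded so far is added to the system
    # prompt (the list keeps growing as the second half proceeds).
    for inquiry, correct in training_data[half:]:
        step(inquiry, correct, context=knowledge)

    return knowledge
```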

The learning mechanism itself resembles traditional numerical machine learning. The difference is that the results of learning appear in the knowledge base, not in the parameters of the neural network inside the LLM. And what is recorded in the knowledge base is natural language, not numbers.

This is the basic model of natural language machine learning.

How Realistic Is the Basic Model?

Anyone who has used an LLM will quickly notice that this basic model is not very realistic.

That is because, rather than going to the trouble of preparing a teacher system to judge correct and incorrect answers, you could simply put the training data itself into the system prompt from the start.

But with a small change to the setting, the basic model becomes plausible.

For example, suppose the General Affairs and Administrative Affairs departments jointly open an inquiry desk, and humans route each incoming inquiry to the appropriate department.

A simple system can be built to record these inquiries and their routing results in a knowledge base.

With this knowledge base, the LLM can then take over from the humans and route new inquiries to the departments.

In this case, if the LLM mistakenly sends an inquiry that should go to Administrative Affairs to General Affairs instead, the staff in General Affairs will forward it on to Administrative Affairs. That correction is also recorded in the knowledge base.

This simple recording of routing logs, combined with the LLM-plus-knowledge-base system, yields a realistic supervised model of natural language machine learning.
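
The machinery this requires is small. A sketch follows, where the function names and the wording of the logged sentences are assumptions:

```python
# Sketch of the routing-log variant. The knowledge base is just an
# append-only list of natural language sentences.

knowledge_base: list[str] = []

def record_routing(inquiry: str, department: str) -> None:
    # Called whenever a human (or, later, the LLM) routes an inquiry.
    knowledge_base.append(f'"{inquiry}" was routed to {department}.')

def record_correction(inquiry: str, wrong: str, right: str) -> None:
    # Called when a misrouted inquiry is forwarded to the right department.
    knowledge_base.append(
        f'"{inquiry}" was misrouted to {wrong}; {right} is responsible.'
    )
```

Once enough of these sentences have accumulated, they can be passed to a classification call like the `classify` sketch above, and the LLM can take over the routing.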

The key point, again, is that the neural network parameters inside the LLM do not change at all. And what the learning produces is not numbers but a large number of natural language sentences.

Moreover, this system is unmistakably machine learning, not human learning.

So this is a new form of machine learning: machine learning through natural language.

The Strengths of Natural Language Machine Learning

Natural language machine learning has several advantages over numerical machine learning.

In a word, its defining characteristic is extremely high learning efficiency.

Numerical machine learning normally requires a large amount of training data and repeated passes over it. The training data also needs preprocessing.

Large amounts of training data are needed because the features to be learned are not contained in any single data point but are spread across many data points.

As a result, the amount of training data required is roughly the square of the dimensionality of the underlying features.

Repeated learning is needed because the parameter updates in each feedback step must be kept small, so that the neural network's parameters converge properly without being driven to bad values.

Preprocessing of the training data, such as normalization and edge extraction, is important for bringing out the desired features, and this preprocessing also takes considerable effort.

For example, to learn how the administrative and general affairs departments divide their work using a traditional neural network, if the features are 50-dimensional, at least 1,000 or more training data points would be needed. On top of that, those 1,000+ data points might have to be presented around 100 times to reach good accuracy.

Moreover, if those 1,000 data points contain superfluous words, spelling variations, or differences in word order and grammar, learning efficiency drops, or unrelated features end up being learned.

So preprocessing to remove superfluous words, normalize the vocabulary to eliminate variations, and standardize word order and grammar becomes essential.

Natural language machine learning, by contrast, needs less training data, needs no repetition over the same training data, and in many cases needs no preprocessing.

If the features of the division of duties between the administrative and general affairs departments are 50-dimensional, 50 pieces of information, one per dimension, are often enough.

Moreover, this does not mean that 50 separate sentences are needed.

A single sentence like "Duties related to A, B, C, and D are handled by the administrative department" carries information for four dimensions at once.

Information from different dimensions can also be combined by generalizing the language. A single sentence like "The administrative department is responsible for building consumables and equipment maintenance" covers a wide range of dimensions, from replacing light bulbs to fixing a broken automatic door.

This generalization draws on the LLM's prior knowledge and reasoning ability, and it reduces the amount of training data required.
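
As an illustration of this compression (the sentences are the article's own examples; the list structure is hypothetical):

```python
# Illustrative only: two knowledge bases with roughly the same coverage.

# One sentence per dimension: many entries are needed.
per_case_kb = [
    "Replacing office light bulbs is handled by the administrative department.",
    "Repairing the automatic doors is handled by the administrative department.",
    # ...one entry for every kind of consumable and piece of equipment...
]

# One generalizing sentence covers the same ground, relying on the LLM's
# prior knowledge to recognize light bulbs and automatic doors as
# building consumables and equipment.
generalized_kb = [
    "The administrative department is responsible for building consumables "
    "and equipment maintenance.",
]
```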

And fundamentally, natural language learning needs no repetition. Once a sentence like those above has entered the knowledge base, learning is complete.

Nor does the knowledge need preprocessing. Even if descriptions of the administrative or general affairs departments are mixed in among unrelated sentences, they can still be used as knowledge.

Likewise, raw data, such as the logs of inquiries and assignments in the earlier example, can be used directly as training data without preprocessing.

In this way, natural language machine learning can learn far more efficiently than numerical machine learning.

Closing Thoughts

Compared with computers performing fast numerical computation, large language models processing natural language are slow.

Yet natural language machine learning can learn better and faster than numerical machine learning.

This learning efficiency more than makes up for the speed difference between fast numerical computation and slow natural language processing.

Furthermore, large language models, which have advanced through numerical learning, appear to be approaching the limits of what scaling alone can improve, as scaling laws suggest.

If so, the focus may well shift to improving how they operate, through natural language machine learning.