This article is in the public domain (CC0 1.0 Universal). Feel free to use it freely.

Natural Language Machine Learning

Traditional machine learning operates within a paradigm where computers, adept at numerical computation, learn using numerical data and acquire quantified parameters.

However, humans are capable of learning not only through numerical mechanisms but also through language. We organize and record experiences in words, and then recall, read, and utilize those words.

Large Language Models (LLMs) can similarly describe knowledge in words and utilize knowledge by reading words.

By leveraging LLMs as natural language processors, machine learning can be carried out in natural language rather than solely in numbers.

For this reason, the advent of LLMs has opened up a new field: natural language machine learning.

The pre-training of LLMs is a form of traditional numerical machine learning. The natural language machine learning discussed here refers to a new type of machine learning that utilizes pre-trained LLMs.

Basic Model of Natural Language Machine Learning

Natural language machine learning possesses aspects that are similar to conventional numerical machine learning, as well as aspects that are entirely different.

To first grasp the concept of natural language machine learning, let's describe a basic model focusing on the parts that resemble traditional numerical machine learning.

From here on, a pre-trained Large Language Model will simply be referred to as an LLM. Note that the LLM's parameters do not change at all during this learning process.

The basic model is a supervised learning model, targeting classification problems.

For the training data, multiple pairs of input sentences and their correct classifications are prepared.

For example, let's say a company has a General Affairs Department and an Administrative Affairs Department.

These two departments have distinct roles. For input sentences such as "The office light bulb is out," "I forgot my access card," or "I want to book the main hall at headquarters," the classification indicates which department, General Affairs or Administrative Affairs, is responsible.

From this training data, only the input sentences are extracted and fed into the LLM.

Here, we intentionally restrict the response via a system prompt such as, "Please state whether the responsible department for this inquiry is General Affairs or Administrative Affairs. Do not include any characters other than 'General Affairs' or 'Administrative Affairs' in your answer."

Initially, the LLM generates a response without knowledge of this company. Naturally, it might be incorrect, or occasionally correct by chance.

For each response, a teaching system determines whether it's correct or incorrect. Then, the combination of the input sentence, the LLM's response, and the judgment result is saved to a knowledge base.

This process is repeated for about half of the training data.

For the remaining half of the training data, all the information recorded in the knowledge base is added to the system prompt for the LLM, and the same process is performed.

At this point, the knowledge base contains information about the division of duties between the General Affairs and Administrative Affairs departments of this company, so the likelihood of a correct answer should be higher than with the first half of the data.
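To make this two-phase procedure concrete, here is a minimal sketch in Python. The llm() function, the knowledge-base format, and the department assignments in the sample data are all assumptions for illustration; any chat-style LLM API could stand in for llm().

```python
# Minimal sketch of the basic model. All names and data here are
# illustrative assumptions, not a definitive implementation.

BASE_PROMPT = (
    "Please state whether the responsible department for this inquiry is "
    "General Affairs or Administrative Affairs. Do not include any characters "
    "other than 'General Affairs' or 'Administrative Affairs' in your answer."
)

def llm(system_prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for a call to a pre-trained LLM, whose
    parameters never change. Replace with a real chat API; here it
    just guesses one class so the sketch runs."""
    return "General Affairs"

# (input sentence, correct department) pairs; assignments are invented.
training_data = [
    ("The office light bulb is out.", "Administrative Affairs"),
    ("I forgot my access card.", "General Affairs"),
    ("I want to book the main hall at headquarters.", "General Affairs"),
    ("The automatic door is malfunctioning.", "Administrative Affairs"),
]

knowledge_base: list[str] = []  # learning results live here, as sentences
half = len(training_data) // 2

# Phase 1: answer without company knowledge; record each judged result.
for sentence, correct in training_data[:half]:
    answer = llm(BASE_PROMPT, sentence)
    judgment = "correct" if answer == correct else f"incorrect; actually {correct}"
    knowledge_base.append(f"Inquiry: {sentence} -> Answered: {answer} ({judgment})")

# Phase 2: prepend everything recorded so far to the system prompt;
# accuracy should now be higher than in phase 1.
for sentence, correct in training_data[half:]:
    prompt = BASE_PROMPT + "\n\nPast judged results:\n" + "\n".join(knowledge_base)
    answer = llm(prompt, sentence)
```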

In this way, a system combining an LLM and a knowledge base can learn the division of duties for a company's General Affairs and Administrative Affairs departments.

The learning mechanism itself is similar to traditional numerical machine learning. The difference is that the learning results are reflected in the knowledge base, not in the parameters of the neural network within the LLM. Furthermore, the knowledge base records natural language, not numerical values.

This is the basic model of natural language machine learning.

Reality of the Basic Model

As those who utilize LLMs will quickly realize, this basic model lacks realism.

This is because there's no need to go through the trouble of having a teaching system determine correct/incorrect judgments; one could simply input the training data itself into the system prompt from the beginning.

However, by applying the basic model and slightly altering the scenario, it gains realism.

For instance, imagine that the General Affairs Department and Administrative Affairs Department jointly create an inquiry desk, and a human manually assigns each incoming inquiry to the appropriate department.

A simple system is built to add these inquiries and their assignment results to a knowledge base.

Then, using this knowledge base, the LLM can take over from humans and assign new inquiries to the departments.

In this case, if the LLM incorrectly assigns an inquiry meant for Administrative Affairs to General Affairs, the General Affairs staff will re-assign the inquiry back to Administrative Affairs. This re-assignment information is also recorded in the knowledge base.

This simple mechanism for recording assignment logs, combined with an LLM and a knowledge base, would constitute a realistic supervised natural language machine learning model.
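A sketch of that assignment-log mechanism might look like the following; llm() is the same hypothetical stand-in as in the earlier sketch, and the record wording is again an assumption.

```python
# Sketch of the realistic scenario: the LLM routes inquiries using the
# accumulated assignment log, and human re-assignments feed back into it.

def llm(system_prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for a chat-style LLM API call."""
    return "General Affairs"

def route_inquiry(inquiry: str, kb: list[str]) -> str:
    """Ask the LLM which department should handle a new inquiry."""
    prompt = (
        "Assign this inquiry to General Affairs or Administrative Affairs, "
        "based on the past assignment records below.\n" + "\n".join(kb)
    )
    return llm(prompt, inquiry)

def record_assignment(inquiry: str, department: str, kb: list[str]) -> None:
    """Log one assignment result in plain language."""
    kb.append(f"The inquiry '{inquiry}' was handled by {department}.")

def record_reassignment(inquiry: str, wrong: str, right: str, kb: list[str]) -> None:
    """A human moved a misrouted inquiry; the correction is also feedback."""
    kb.append(f"The inquiry '{inquiry}' was re-assigned from {wrong} to {right}.")
```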

The key point here, to reiterate, is that the parameters of the neural network within the LLM do not change at all. Moreover, the result of this feedback learning is a collection of natural language sentences, not numerical values.

And, without a doubt, this system involves machine learning, not human learning.

Therefore, this is a new form of machine learning: natural language machine learning.

Strengths of Natural Language Machine Learning

Natural language machine learning offers many advantages over numerical machine learning.

In short, its defining characteristic is overwhelmingly high learning efficiency.

Numerical machine learning generally requires a large amount of training data and iterative learning. Furthermore, pre-processing of the training data is also necessary.

A large amount of training data is needed because the features to be learned are not contained within a single piece of data but are distributed among a vast quantity of data.

For this reason, training data on the order of the square of the dimensionality of the truly desired features is required.

Iterative learning is necessary so that the neural network's parameters are learned appropriately without falling into local minima, which in turn requires keeping each parameter update small with every round of feedback.
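For reference, the small-step feedback being described is the standard gradient-descent update (the article does not name a specific optimizer, so this is the generic form):

$$\theta_{t+1} = \theta_t - \eta \,\nabla_{\theta} L(\theta_t), \qquad \eta \ll 1$$

where the learning rate η is deliberately kept small so that each feedback changes the parameters only slightly, which is why many passes over the data are needed.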

Pre-processing of training data, such as normalization and edge extraction, is needed to highlight the truly desired features. This pre-processing also demands significant effort.

For example, if the division of duties between the Administrative Affairs Department and the General Affairs Department were to be learned by a traditional neural network, and its features were 50-dimensional, roughly 1,000 or more training instances would be required. In addition, these 1,000-plus instances might need to be iterated over about 100 times to achieve appropriate learning accuracy.

Furthermore, if this set of 1,000 training data instances contains extraneous words, variations in spelling, or a variety of word orders and sentence structures, learning efficiency decreases, and unrelated features may be learned.

Therefore, pre-processing to remove extraneous words, standardize terminology to eliminate variations, and unify word order and syntax is indispensable.
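As a rough illustration, such pre-processing might look like the following sketch; the normalization rules and synonym table are invented for illustration.

```python
import re

# Invented synonym table: collapse spelling variations to one canonical term.
SYNONYMS = {"lightbulb": "light bulb", "entry card": "access card"}

def preprocess(sentence: str) -> str:
    """Toy normalization pass: lowercase, strip punctuation, unify terms."""
    text = sentence.lower()
    text = re.sub(r"[^\w\s]", "", text)          # drop extraneous punctuation
    for variant, canonical in SYNONYMS.items():  # eliminate spelling variations
        text = text.replace(variant, canonical)
    return " ".join(text.split())                # unify whitespace
```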

In contrast, natural language machine learning requires less training data, no iteration with the same training data, and often no pre-processing.

If the features for the division of duties between the Administrative Affairs Department and the General Affairs Department are 50-dimensional, then 50 pieces of information, one per dimension, are sufficient.

Moreover, this does not mean that 50 separate sentences are required.

A single sentence like "Duties related to A, B, C, and D are handled by the Administrative Affairs Department" can encompass four dimensions of information.

Furthermore, by abstracting language, information from multiple dimensions can be aggregated. A sentence such as "Maintenance of building consumables and facilities is the responsibility of the Administrative Affairs Department" aggregates a wide range of dimensional information, including light bulb replacement and automatic door malfunctions.

This abstraction can be said to reduce the training data by leveraging the LLM's pre-trained knowledge and reasoning capabilities.
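To see how far this compresses the training data, the entire learned "model" for this example could be a handful of sentences assembled into the system prompt; the wording below is invented for illustration.

```python
# The entire learned "model" is a handful of natural language sentences.
knowledge_base = [
    "Duties related to A, B, C, and D are handled by the "
    "Administrative Affairs Department.",
    "Maintenance of building consumables and facilities is the "
    "responsibility of the Administrative Affairs Department.",
]

# One added sentence is one completed unit of learning; the LLM's own
# pre-trained knowledge generalizes "maintenance of facilities" to light
# bulbs, automatic doors, and so on.
system_prompt = (
    "Answer with the responsible department.\n"
    "Known division of duties:\n" + "\n".join(knowledge_base)
)
```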

And, fundamentally, natural language learning does not require iterative learning. Once the aforementioned sentence is added to the knowledge base, learning is complete.

Additionally, pre-processing of knowledge is unnecessary. Even if explanations of the Administrative Affairs Department or General Affairs Department are mixed within various texts, they can still be utilized as knowledge.

Or, as in the previous example, raw data such as inquiry and assignment records can be immediately used as training data without pre-processing.

Thus, natural language machine learning can learn far more efficiently than numerical machine learning.

Conclusion

Compared to the high-speed numerical computation capabilities of computers, the natural language processing ability of large language models is quite slow.

However, natural language machine learning is so efficient that its gains far outweigh the gap between high-speed numerical computation and slow natural language processing.

Furthermore, large language models, which have made astonishing progress through numerical learning, appear to be approaching the limits of the performance gains that scaling laws promise from simply scaling up.

In such a scenario, it is highly plausible that the focus will shift towards enhancing capabilities through natural language machine learning.