Traditional machine learning operates within a paradigm where computers, adept at numerical computation, learn from numerical data and acquire numerical parameters.
Humans, on the other hand, can learn not only through numerical mechanisms but also through language. We organize and record experiences as words, then recall or read those words to put them to use.
Large language models can likewise describe knowledge in words and use that knowledge by reading the words back.
Because large language models are natural language processors, they make machine learning based on natural language possible, not merely machine learning based on numbers.
Consequently, the advent of large language models has opened up a new field: natural language machine learning.
The pre-training of large language models is traditional numerical machine learning. The natural language machine learning described here refers to a new form of machine learning that utilizes pre-trained large language models.
Basic Model of Natural Language Machine Learning
Natural language machine learning has aspects that resemble traditional numerical machine learning and aspects that are entirely different from it.
To build an initial picture of natural language machine learning, we first explain the similar parts as a basic model.
From this point on, we will refer to a pre-trained large language model simply as an LLM. Note that the LLM's parameters do not change at all during this learning process.
The basic model is supervised learning, targeting a classification problem.
Multiple pairs of input sentences and their correct classifications are prepared as training data.
For example, let's say a company has a General Affairs Department and an Administrative Affairs Department.
These two departments have a division of duties. For input sentences such as "The office light bulb is out," "I forgot my access card," or "I want to reserve the main hall at headquarters," the classification indicates whether the General Affairs Department or the Administrative Affairs Department is responsible.
From this training data, only the input sentences are extracted and fed into the LLM.
Here, as a system prompt, we intentionally restrict the answer by stating, "Please answer which department, General Affairs or Administrative Affairs, is responsible for this inquiry. Do not include any characters other than 'General Affairs' or 'Administrative Affairs' in your answer."
Initially, the LLM will generate answers without any knowledge of this company. Naturally, some answers will be incorrect, while others might be correct by chance.
For each answer, the teacher system determines whether it is correct or incorrect. Then, the combination of the input sentence, the LLM's answer, and the judgment result is saved in a knowledge base.
This process is repeated for about half of the training data.
For the remaining half of the training data, the same process is performed, but this time all the information recorded in the knowledge base is added to the system prompt for the LLM.
At this point, the knowledge base contains information about the division of duties between the General Affairs and Administrative Affairs departments of this company, so the probability of getting correct answers should be higher than with the first half of the data.
In this way, a system combining the LLM and the knowledge base can learn the division of duties between the General Affairs and Administrative Affairs departments of this company.
The learning mechanism itself is similar to traditional numerical machine learning. The difference is that the learning results are reflected in the knowledge base, not in the parameters of the neural network within the LLM. And, natural language, not numbers, is recorded in the knowledge base.
This is the basic model of natural language machine learning.
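To make the loop concrete, here is a minimal Python sketch of the basic model. `call_llm` is a hypothetical stand-in for any chat-completion API and is stubbed so the sketch runs without a model; the department labels in the training data are likewise illustrative.

```python
def call_llm(system_prompt: str, user_input: str) -> str:
    # Hypothetical stand-in for a real LLM API call; stubbed to always
    # answer "General Affairs" so the sketch runs end to end.
    return "General Affairs"

SYSTEM_PROMPT = (
    "Please answer which department, General Affairs or Administrative "
    "Affairs, is responsible for this inquiry. Do not include any "
    "characters other than 'General Affairs' or 'Administrative Affairs'."
)

# Illustrative training data: (inquiry, correct department).
training_data = [
    ("The office light bulb is out.", "Administrative Affairs"),
    ("I forgot my access card.", "General Affairs"),
    ("I want to reserve the main hall at headquarters.", "General Affairs"),
    ("The copier is broken.", "Administrative Affairs"),
]

knowledge_base = []  # holds natural-language records, not numbers
half = len(training_data) // 2

# Phase 1: the LLM answers with no company knowledge; a teacher
# (here, the known label) judges each answer, and the combination of
# inquiry, answer, and judgment is saved to the knowledge base.
for inquiry, correct_dept in training_data[:half]:
    answer = call_llm(SYSTEM_PROMPT, inquiry)
    verdict = ("correct" if answer == correct_dept
               else f"incorrect; {correct_dept} is responsible")
    knowledge_base.append(
        f"Inquiry: {inquiry} Answer: {answer} Judgment: {verdict}")

# Phase 2: everything recorded so far is added to the system prompt,
# so the LLM now answers with the company's accumulated knowledge.
for inquiry, correct_dept in training_data[half:]:
    prompt = SYSTEM_PROMPT + "\nKnown cases:\n" + "\n".join(knowledge_base)
    answer = call_llm(prompt, inquiry)
    verdict = ("correct" if answer == correct_dept
               else f"incorrect; {correct_dept} is responsible")
    knowledge_base.append(
        f"Inquiry: {inquiry} Answer: {answer} Judgment: {verdict}")

print(len(knowledge_base))  # → 4
```

Note that the learned result lives entirely in `knowledge_base` as sentences; nothing inside `call_llm` changes between the two phases.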
Reality of the Basic Model
As anyone leveraging LLMs will quickly realize, this basic model lacks realism.
This is because, instead of going through the trouble of having a teacher system determine correct and incorrect answers, one could simply input the training data itself into the system prompt from the beginning.
However, by applying the basic model and slightly altering the scenario, it gains realism.
For example, suppose the General Affairs and Administrative Affairs departments collaboratively establish an inquiry desk, and humans individually triage incoming inquiries to the appropriate department.
A simple system can be created to add these inquiries and their routing results to a knowledge base.
Then, using this knowledge base, the LLM can take over from humans in routing new inquiries to the departments.
In this case, if the LLM incorrectly routes an inquiry meant for Administrative Affairs to General Affairs, the person in charge at General Affairs will re-route the inquiry back to Administrative Affairs. This re-routing information is also recorded in the knowledge base.
This simple mechanism for recording routing logs, combined with the LLM and knowledge base system, would become a realistic supervised model for natural language machine learning.
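The routing log described above can be sketched in a few lines. The function names and department assignments are illustrative; the knowledge base is again just a growing list of natural language sentences.

```python
knowledge_base: list[str] = []

def record_routing(inquiry: str, department: str) -> None:
    """Log a routing decision as a natural-language sentence."""
    knowledge_base.append(f"Inquiry '{inquiry}' was routed to {department}.")

def record_reroute(inquiry: str, wrong: str, right: str) -> None:
    """Log a human correction when an inquiry was routed wrongly."""
    knowledge_base.append(
        f"Inquiry '{inquiry}' was routed to {wrong}, but {right} is "
        f"actually responsible; it was re-routed."
    )

# At first, humans triage and the system merely logs their decisions…
record_routing("The office light bulb is out.", "Administrative Affairs")
# …later the LLM takes over, and a mistake is corrected by the
# recipient department; the correction is recorded the same way.
record_reroute("I forgot my access card.",
               "General Affairs", "Administrative Affairs")

print(len(knowledge_base))  # → 2
```

Every entry, including the correction, becomes training material for the next routing decision simply by being prepended to the LLM's prompt.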
The key point, again, is that the neural network parameters inside the LLM do not change at all, and what learning feeds back is not numerical values but a collection of natural language sentences.
Moreover, this is unequivocally a machine learning system, not human learning.
It is therefore a new form of machine learning: machine learning through natural language.
Strengths of Natural Language Machine Learning
Natural language machine learning offers many advantages over numerical machine learning.
In a word, its defining characteristic is its overwhelming learning efficiency.
Numerical machine learning generally requires a large amount of training data and iterative learning. Pre-processing of the training data is also necessary.
A large amount of training data is needed because the features one wants to learn are not contained within a single piece of data, but are distributed across a large volume of data.
For this reason, training data on the order of the square of the dimension of the truly desired features is required.
Iterative learning is necessary because the change in parameters during a single feedback loop must be small to ensure that the neural network parameters are learned appropriately without falling into local optima.
Pre-processing of training data, such as normalization and edge extraction, is necessary to highlight the truly desired features. This pre-processing also requires significant effort.
For example, if the division of duties between the General Affairs and Administrative Affairs departments were learned with a traditional neural network, and its features were 50-dimensional, on the order of the square of 50, i.e., a few thousand training data points, would be required. In addition, those data points might need to be iterated through approximately 100 times to achieve adequate learning accuracy.
Furthermore, if those thousands of data points contain extraneous words, variations in word spellings, or a variety of word orders and syntaxes, learning efficiency will decrease, or irrelevant features will be learned.
Therefore, pre-processing to remove extraneous words, standardize vocabulary to eliminate variations, and unify word order and syntax is indispensable.
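As a toy illustration of this pre-processing burden, a text-normalization step might look like the following; the filler-word and synonym lists are illustrative only.

```python
import re

# Illustrative lists; a real pipeline would need far larger ones.
FILLER = {"um", "please", "kindly", "hello"}
SYNONYMS = {"lightbulb": "light bulb", "keycard": "access card"}

def preprocess(text: str) -> str:
    """Remove extraneous words and standardize vocabulary."""
    text = text.lower()
    for variant, canonical in SYNONYMS.items():
        text = text.replace(variant, canonical)  # unify spellings
    words = [w for w in re.findall(r"[a-z']+", text) if w not in FILLER]
    return " ".join(words)

print(preprocess("Hello, the lightbulb is out, please fix it"))
# → the light bulb is out fix it
```

Natural language machine learning skips this step: the raw inquiry text goes into the prompt as-is.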
On the other hand, natural language machine learning requires less training data, does not require iteration with the same training data, and in many cases, does not require pre-processing.
If the features of the division of duties between the General Affairs and Administrative Affairs departments are 50-dimensional, 50 pieces of information corresponding to each dimension are often sufficient.
Moreover, this does not mean that 50 separate sentences are required.
A single sentence like "Duties related to A, B, C, and D are handled by the Administrative Affairs Department" can include information for four dimensions.
Furthermore, by abstracting language, information from multiple dimensions can be aggregated. A single sentence like "The Administrative Affairs Department is responsible for building consumables and equipment maintenance" aggregates information from a wide range of dimensions, including light bulb replacement and automatic door malfunctions.
This abstraction leverages the LLM's pre-trained knowledge and reasoning capabilities, thereby reducing the amount of training data needed.
And, fundamentally, natural language learning does not require iterative learning. Once the aforementioned sentence is added to the knowledge base, learning is complete.
Furthermore, pre-processing of the knowledge is not necessary. Even if descriptions of the General Affairs or Administrative Affairs departments are mixed in with various other sentences, they can still be used as knowledge.
Alternatively, raw data, such as logs of inquiries and assignments as in the previous example, can be immediately utilized as training data without pre-processing.
In this way, natural language machine learning can learn far more efficiently than numerical machine learning.
Conclusion
Compared to the high-speed numerical computation capabilities of computers, the natural language processing capabilities of large language models are quite slow.
However, natural language machine learning allows for more efficient learning compared to numerical machine learning.
This efficiency gain more than compensates for the gap between fast numerical computation and slow natural language processing.
Furthermore, large language models, which have evolved astonishingly through numerical learning, appear to be approaching the limit of the capability gains that scaling laws predict from simply scaling up.
In that case, it is highly conceivable that the focus will shift to improving capabilities through natural language machine learning.