Artificial intelligence gains intelligent behavior through a technology called machine learning.
Although this learning is carried out according to procedures developed by humans, it has not yet been explained why intelligence emerges from these procedures and from the structure of artificial intelligence.
In this article, I will explore the reasons why intelligence arises by considering the essence of learning itself.
And as we delve deeper into the concept of learning, we arrive at the idea that both artificial intelligence and our brains possess an innate tendency to learn how to learn.
This suggests the existence of a mechanism that can be called a "natural born frameworker."
Learning Through the Body vs. Learning Through Language
We learn about the world around us and expand our capabilities by seeing things with our eyes and moving our bodies.
This is also a form of learning, which can be called learning through the body.
On the other hand, when people generally speak of learning, they likely imagine increasing knowledge by reading textbooks or listening to a teacher's explanations.
In addition to such curriculum-based learning, we also acquire various knowledge from conversations with friends, online news, and so on.
This type of learning is not about memorizing images visually or learning by moving one's body; it is learning through language.
Sub-physical Learning and Metaphysical Learning
Within learning through language, there is information that can only be memorized through repetition, and information that can be memorized after hearing it once or a few times.
Alternatively, there is knowledge that, even if its details are not remembered, can be used by retrieving it from a bookshelf or the internet at the moment it is needed.
In the sense of acquiring knowledge and utilizing it appropriately when needed, both of these patterns can be called learning.
Of these, knowledge that can only be memorized through repetition can be called sub-physical knowledge. The learning process behind it is sub-physical learning, which involves memorizing the concepts themselves.
This is similar to physical learning, in which one learns by repeatedly seeing objects with one's eyes or moving one's body; such learning can likewise be classified as sub-physical learning.
On the other hand, the acquisition of knowledge that can be memorized with fewer repetitions, or looked up and used on the spot, can be called metaphysical learning.
In this case, pre-learned concepts acquired through sub-physical learning can be utilized to learn knowledge as types of those concepts or as combinations of concepts.
Since concepts already acquired through sub-physical learning can be utilized, metaphysical learning does not require repetition.
Natural Language Machine Learning
Let's apply this to machine learning in artificial intelligence.
Generally, the neural networks used in machine learning perform sub-physical learning, in which concepts are learned through repetition.
On the other hand, large language models, capable of natural language processing similar to humans, can perform learning through language.
During the pre-training and fine-tuning of large language models, sub-physical learning through language takes place.
Furthermore, a pre-trained large language model can answer by using the knowledge contained in its input, thereby performing immediate metaphysical learning (what is often called in-context learning).
Thanks to this ability of metaphysical learning through language, large language models can utilize new knowledge without repetitive learning.
This can be called natural language machine learning, in contrast to traditional numerical machine learning that iteratively adjusts model parameters.
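To make the contrast concrete, here is a minimal Python sketch. The first half fits a toy model by repeatedly adjusting parameters over the same data (numerical, sub-physical learning); the second half simply places a new piece of knowledge into a prompt, where it can be used immediately without any parameter update. The commented-out `generate` call is a hypothetical stand-in for any large language model, not a specific API.

```python
import numpy as np

# --- Numerical (sub-physical) machine learning: repetition is essential ---
# A toy linear model is fitted by iteratively adjusting its parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
for _ in range(500):                      # many repetitions over the same data
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.05 * grad                      # small parameter update each time

# --- Natural language (metaphysical) machine learning: no parameter update ---
# New knowledge is supplied once, inside the prompt, and used immediately.
new_knowledge = "A 'glorp' is a fictional unit equal to 3 meters."
question = "How many meters are 4 glorps?"
prompt = f"{new_knowledge}\n\nQuestion: {question}\nAnswer:"
# answer = some_llm.generate(prompt)      # hypothetical call to any LLM
```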
Natural Language as the Metaphysical Interface
Natural language is situated at the interface that separates sub-physical and metaphysical learning.
The fascinating aspect of natural language is that it can be acquired through sub-physical learning, and on top of that, it enables metaphysical learning.
Metaphysical Interfaces Other Than Natural Language
In reality, even in physical learning, both sub-physical and metaphysical learning exist. For instance, someone skilled in sports can quickly adapt to a new game they encounter for the first time.
Similarly, someone knowledgeable in biology can immediately understand the characteristics of a new species when they see it.
Thus, in physical learning as well, there exist metaphysical interfaces that hold a similar position to natural language.
Frameworks
At these interfaces sit frameworks: structures that are distinct from the elemental concepts and pieces of knowledge themselves, and that define their relationships and structure, or enable new ways of structuring them.
As a variety of sub-physical knowledge is acquired through sub-physical learning, it may be possible to learn the framework at the metaphysical interface from the connections between the pieces of sub-physical knowledge.
Frameworks acquired through physical learning enable new knowledge to be immediately learned metaphysically after acquisition. However, it is not easy to convey the knowledge gained through this metaphysical learning to others.
On the other hand, the framework acquired through learning by language is natural language itself.
Therefore, once the natural language framework has been learned, knowledge acquired through metaphysical learning can be passed on directly as input to other people's learning through language.
This applies not only to knowledge for which learning through language is fundamental, such as textbooks or online news.
An experienced soccer player, playing baseball for the first time, might be able to convey the metaphysical knowledge acquired about baseball to other soccer players through words. This means that if people share the same sub-physical knowledge, so-called "tips" or know-how can be communicated verbally.
Furthermore, one could share knowledge about a newly discovered species they witnessed with other biologists through words.
Thus, natural language is revealed to be a very powerful framework at the metaphysical interface.
Virtual Frameworks
Above natural language, one can acquire other frameworks.
These are domain-specific frameworks or formal frameworks.
Within various academic fields, business sectors, and daily life, there are diverse domain-specific frameworks.
Scholars, operating within the framework of their specialty, can make new discoveries and easily convey that knowledge to other scholars who possess the same framework.
The framework itself can sometimes be expressed in natural language, in which case it can be learned and understood by people or large language models that possess the natural language framework.
Business models and cooking recipes are also examples of such domain-specific frameworks that can be expressed in natural language.
Furthermore, mathematical formulas, programming languages, and business analysis frameworks are formal frameworks.
These too can have their frameworks expressed or explained in natural language.
These domain-specific and formal frameworks built upon natural language can be called virtual frameworks.
This is easy to understand if you imagine a virtual machine running a different OS on a physical computer. Another framework functions on top of the foundational framework of natural language.
Native Frameworks
Furthermore, while these virtual frameworks initially need to be understood via natural language, as one becomes accustomed to them they begin to bypass natural language explanation and understanding, functioning directly as metaphysical interface frameworks built on sub-physical knowledge.
This can be called a native framework.
Natural language is, in a sense, also a native framework, but only concerning one's mother tongue. Generally, languages other than one's mother tongue are acquired as virtual frameworks. As proficiency increases, they approach becoming native frameworks.
The same applies to domain-specific and formal frameworks. Mathematicians can communicate natively using mathematical formulas, and programmers can understand each other's intentions solely through source code without comments.
This suggests that the progression from virtual to native frameworks can also be applied to large language models.
The idea of detecting frequently used virtual frameworks, generating a large amount of example data with those frameworks, and then fine-tuning the model on that data so that the frameworks become native is worth trying immediately.
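A minimal sketch of what such an experiment might look like is given below; every function is a hypothetical placeholder rather than an existing API, and serves only to make the proposed loop of detection, data generation, and fine-tuning explicit.

```python
# A sketch of the proposed virtual-to-native pipeline; all functions here are
# hypothetical placeholders, not an existing library API.

def detect_frequent_frameworks(conversation_logs):
    """Hypothetical: find virtual frameworks (e.g. recipes, business analysis
    templates, a formal notation) that users invoke often via natural language."""
    ...

def generate_examples(framework, n=10_000):
    """Hypothetical: have the model produce n input/output pairs that exercise
    the framework directly, without natural language scaffolding."""
    ...

def fine_tune(model, examples):
    """Hypothetical: ordinary supervised fine-tuning on the generated data,
    intended to turn the virtual framework into a native one."""
    ...

def nativize(model, conversation_logs):
    for framework in detect_frequent_frameworks(conversation_logs):
        examples = generate_examples(framework)
        model = fine_tune(model, examples)
    return model
```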
Natural Born Frameworkers
Considering this, one realizes that large language models may be learning domain-specific and formal frameworks not only during fine-tuning but also during pre-training.
And in that process, it is conceivable that rather than learning domain-specific or formal frameworks natively from the start, they first learn the natural language framework and then, during or after mastering it, learn domain-specific and formal frameworks and make them native.
Delving deeper into this stepwise framework learning, it's also conceivable that natural language learning itself is a parallel pipeline of very fine-grained, stepwise framework learning.
In other words, from the massive amount of text provided as training data during pre-training, large language models might not only learn individual concepts but also a few very simple rules of natural language as a framework. Then, using these simple frameworks as a foundation, they repeatedly learn slightly more complex rules.
This would allow them to progress from a stage where they initially learned word concepts to memorizing compound words and basic grammar, and then to understanding sentences, and learning complex things like writing and expression techniques.
This can be understood as a model where they learn frameworks in a stepwise and complex manner, using one framework as the foundation for learning the next.
This highlights large language models as "natural born frameworkers," possessing a mechanism for learning frameworks from the very beginning.
Attention Mechanism
The technology that actualizes the natural-born frameworker is the attention mechanism.
The attention mechanism is, in essence, a way of selecting the relevant tokens from the context; it makes the relationships between tokens explicit. This is precisely the nature of a framework: abstracting by retaining the important concepts while clarifying the relationships between them.
By switching this selection for each token, it enables dynamic switching of frameworks.
Through the natural-born frameworker model, this explains why the attention mechanism is the technology that has determined the evolution of large language models.
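For reference, the following NumPy sketch of scaled dot-product self-attention (with the learned query, key, and value projections omitted for brevity) shows this selection in concrete form: each row of the weight matrix records which other tokens a given token attends to, i.e. which relationships it currently treats as important.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: each token's query is compared with every
    token's key, and the resulting weights select which tokens (and hence which
    relationships) matter for that position."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the context
    return weights @ V, weights                         # weighted mix of selected tokens

# Toy example: 4 tokens with 8-dimensional representations (self-attention).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
print(weights.round(2))   # each row: which tokens this position attends to
```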
Conclusion
If this mechanism is indeed at work during the pre-training of large language models, then their previously mysterious behavior can be explained using the ideas developed here: sub-physical and metaphysical learning, frameworks as metaphysical interfaces, natural language enabling both learning through language and virtual frameworks, and the attention mechanism realizing the natural-born frameworker.
Furthermore, two additional points are suggested from this.
First, natural language has a structure highly suited for progressively internalizing complex frameworks from simpler ones.
If natural language first appeared in human society in a simple form and gradually grew into a more complex and rich structure, this is a natural consequence.
Moreover, it would be advantageous for natural language to be structured in a way that allows rapid learning. Assuming that multiple societies with different natural languages competed, it is easy to form the hypothesis that the natural languages better suited to learning are the ones surviving today.
Reflecting on this nature of natural language leads to the second suggestion: that we humans are also natural-born frameworkers.
Even if the specific underlying foundations and mechanisms differ, our brains must also be equipped with a mechanism, similar to the attention mechanism, that allows for the stepwise learning and flexible adaptation of frameworks.