Today's generative AI is a technology that blossomed thanks to a major breakthrough: the invention of the Transformer.
If the Transformer had to be characterized in a single phrase, it would be the attention mechanism. This is succinctly expressed in the title of the paper that introduced the Transformer: "Attention Is All You Need."
The title has its roots in the research climate of the time: AI researchers were experimenting with many approaches to enable AI to handle natural language as skillfully as humans do, naming and publishing papers on each method that worked.
Many researchers believed that by combining these well-functioning mechanisms in diverse ways, AI that could handle natural language like humans would gradually emerge. They therefore worked on finding new mechanisms that functioned well in combination with others, and on discovering the optimal combinations.
The Transformer, however, overturned this conventional wisdom. Its title carries the message that combining various mechanisms is unnecessary: attention is all you need.
Of course, the Transformer itself incorporates several mechanisms, but there is no doubt that among them the attention mechanism was the most groundbreaking and distinctive.
Overview of the Attention Mechanism
The attention mechanism is a system that, while processing natural language word by word, learns which of the many preceding words in a sentence it should "pay attention to" when interpreting the current word.
This allows it to accurately resolve what is being referred to by words like "this," "that," or "the aforementioned" (which point back to words in earlier sentences), or by phrases like "the opening sentence," "the second example listed," or "the previous paragraph" (which indicate positions in the text).
Moreover, it can correctly interpret a word even when its modifiers are far away within the sentence, and even in a long text it can interpret the current word without losing its context among the other sentences.
This is the utility of "attention."
Conversely, this also means that words unnecessary for interpreting the current word are masked out and excluded from the interpretation.
By retaining only the words needed to interpret a given word and removing the irrelevant ones, the set of words under consideration stays small no matter how long the text grows, so the interpretation never becomes diluted.
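To make this concrete, here is a minimal NumPy sketch of the scaled dot-product attention from the Transformer paper. The toy shapes, the random vectors, and the causal mask are illustrative assumptions, not a full Transformer.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Scaled dot-product attention as in "Attention Is All You Need".

    Q, K, V: (seq_len, d) arrays of query, key, and value vectors.
    mask: optional boolean (seq_len, seq_len) array; False entries are
          excluded from attention (e.g. a causal mask over future words).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # relevance of each word to each other word
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # masked words get ~zero weight after softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row of weights sums to 1
    return weights @ V                         # each word becomes a weighted mix of attended words

# Toy usage: 4 "words" with 8-dimensional vectors, and a causal mask so
# each word attends only to itself and the words before it.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8)); K = rng.normal(size=(4, 8)); V = rng.normal(size=(4, 8))
causal = np.tril(np.ones((4, 4), dtype=bool))
out = scaled_dot_product_attention(Q, K, V, mask=causal)
print(out.shape)  # (4, 8)
```

The softmax weights are the "attention": words the mask excludes, or whose keys match the query poorly, contribute almost nothing to the result.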
Virtual Intelligence
Now, changing the subject slightly, I've been thinking about the concept of virtual intelligence.
Currently, when using generative AI for business, if you consolidate all the information within a company and provide it to the AI as knowledge, the sheer volume can actually make it difficult for the AI to handle any of it appropriately.
For this reason, it works better to divide knowledge by task, preparing AI chats for each task or creating AI tools specialized for specific operations.
This implies that for complex tasks, it becomes necessary to combine these segmented knowledge-based AI chats and tools.
This is a limitation of today's generative AI, but even for future generative AI, focusing only on the knowledge a specific task requires should yield higher accuracy.
Rather, I believe future generative AI will internally switch between the necessary knowledge sets depending on the situation, without humans having to segment the knowledge for it.
This capability is what I call virtual intelligence. It is like a virtual machine that can run multiple different operating systems on a single computer: within one intelligence, multiple virtual intelligences with different specializations can function.
Even current generative AI can already simulate discussions among multiple people or generate stories featuring multiple characters. Virtual intelligence, then, is not a special ability but an extension of what generative AI already does.
Micro Virtual Intelligence
Virtual intelligence, which narrows the knowledge in play down to what the task at hand requires, does something very similar to the attention mechanism: both focus only on what is relevant to the item currently being processed.
Conversely, the attention mechanism can be said to realize something like virtual intelligence. The difference is one of scale: the virtual intelligence I have in mind selects relevant knowledge from a body of knowledge, whereas the attention mechanism selects relevant words from a set of words.
For this reason, the attention mechanism can be called a micro virtual intelligence.
Explicit Attention Mechanism
If we view the attention mechanism as a micro virtual intelligence, then conversely, the virtual intelligence described earlier can be realized by constructing a macro attention mechanism.
And this macro attention mechanism need not be added to the internal structure of a large language model, nor does it require any neural network training.
It can simply be an explicit sentence written in natural language, such as "When performing Task A, refer to Knowledge B and Knowledge C."
This clarifies the knowledge needed for Task A. This sentence itself is a kind of knowledge.
This could be called an explicit attention mechanism, and the sentence itself attention knowledge: knowledge that explicitly states what should be attended to when performing Task A.
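As a rough sketch of how little machinery this needs, the attention knowledge can live entirely outside the model, as plain text that a thin routing layer consults before prompting. The task names, knowledge contents, and prompt layout below are hypothetical placeholders.

```python
# Attention knowledge: plain natural-language statements of what to focus on.
# (All task and knowledge names here are hypothetical examples.)
ATTENTION_KNOWLEDGE = {
    "Task A": "When performing Task A, refer to Knowledge B and Knowledge C.",
}

# The knowledge base itself, keyed by the names the attention knowledge uses.
KNOWLEDGE_BASE = {
    "Knowledge B": "...contents of Knowledge B...",
    "Knowledge C": "...contents of Knowledge C...",
    "Knowledge D": "...contents of Knowledge D...",
}

def build_prompt(task: str, request: str) -> str:
    """Include only the knowledge that the attention knowledge names for this task."""
    rule = ATTENTION_KNOWLEDGE.get(task, "")
    relevant = [text for name, text in KNOWLEDGE_BASE.items() if name in rule]
    return "\n\n".join(relevant) + f"\n\nTask: {task}\n{request}"

print(build_prompt("Task A", "Please handle this request."))  # Knowledge D is left out
```

Nothing about the model changes; the "attention" here is just a sentence deciding what the model gets to see.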
Furthermore, this attention knowledge can be generated or updated by generative AI.
If a task fails due to a lack of knowledge, then as a lesson learned, the attention knowledge can be updated to include additional knowledge that should be referenced for that task.
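A hedged sketch of that lesson-learned loop might look like the following, with `llm` standing in for any text-generation API; the prompt wording is an assumption.

```python
def update_attention_knowledge(task: str, failure_report: str, llm) -> None:
    """After a failure, ask the generative AI to rewrite the attention knowledge
    so that the missing knowledge is referenced for this task next time."""
    current = ATTENTION_KNOWLEDGE.get(task, f"When performing {task}, refer to: (nothing yet).")
    prompt = (
        f"The following task failed, apparently due to a lack of knowledge.\n"
        f"Task: {task}\n"
        f"Failure report: {failure_report}\n"
        f"Current attention knowledge: {current}\n"
        f"Available knowledge: {', '.join(KNOWLEDGE_BASE)}\n"
        "Rewrite the attention knowledge as one sentence that also names "
        "any additional knowledge this task should reference."
    )
    ATTENTION_KNOWLEDGE[task] = llm(prompt)  # the updated sentence is itself new knowledge
```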
Conclusion
The attention mechanism has dramatically improved the capabilities of generative AI.
It wasn't merely a mechanism that happened to work well; as we've seen here, the very act of dynamically narrowing down what to refer to in each situation seems to be the essence of advanced intelligence.
And, as virtual intelligence and explicit attention knowledge suggest, the attention mechanism is also the key to recursively advancing intelligence at its various layers.