Today's generative AI is a technology that took off thanks to the invention of the Transformer, which was a major breakthrough.
If you had to describe what makes the Transformer special in a single sentence, it would be the attention mechanism. That much is clear from the title of the paper that introduced the Transformer: "Attention Is All You Need."
At the time, AI researchers were running all kinds of experiments to make AI handle natural language as skillfully as humans do, naming each successful method and publishing papers about it.
Many researchers believed that if they combined these various mechanisms, each of which worked well in its own way, an AI that could handle natural language like a human would gradually emerge. So they worked to find new mechanisms that could be combined with existing ones, and to discover the best combinations among them.
The Transformer shattered that assumption. Its message, carried right in the paper's title, was that there is no need to combine many different mechanisms: the attention mechanism is all you need.
Of course, the Transformer itself contains several mechanisms, but there is no doubt that among them the attention mechanism is the truly groundbreaking and distinctive one.
What Is the Attention Mechanism?
Di "attention mechanism" na system wey, as e dey treat natural language word by word, e fit learn which of di many words wey don pass for one sentence e suppose "pay attention to" when e dey process one particular word.
Dis one make am fit understand well well wetin words like "dis," "dat," or "di one wey dem don mention before" (wey dey refer to words wey don dey inside previous sentences), or phrases like "di first sentence," "di second example wey dem list," or "di paragraph wey pass" (wey dey show positions for di text), dey refer to.
Plus, e fit interpret words correct even when di words wey dey describe dem far inside di sentence, and even if di text long, e fit interpret without losing di main meaning of di word wey e dey process now, even among other sentences.
Na dis one be di usefulness of "attention."
On di other hand, dis one still mean say when e dey interpret di word wey e dey work on now, e go cover up and comot unnecessary words from di interpretation.
By just keeping di words wey dey necessary for di interpretation of one particular word and comot di ones wey no concern am, di group of words wey e go interpret go remain small, no matter how long di text be, and dis one go stop di interpretation from scattering.
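To make this concrete, here is a minimal sketch of scaled dot-product attention for a single query word, in the spirit of the mechanism described above. It is only an illustration, not the paper's full multi-head formulation; the variable names and the toy data are invented for the example.

```python
import numpy as np

def scaled_dot_product_attention(query, keys, values):
    """Minimal single-query attention: weigh every previous word's value
    by how relevant its key is to the current word's query."""
    d_k = keys.shape[-1]
    # Relevance score between the current word and each previous word.
    scores = keys @ query / np.sqrt(d_k)
    # Softmax turns scores into weights; irrelevant words end up with
    # weights near zero, which is the "masking out" described above.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The output is a weighted mix of the attended words' values.
    return weights @ values, weights

# Toy example: 4 previous words, each an 8-dimensional vector.
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 8))
values = rng.normal(size=(4, 8))
query = keys[2] + 0.1 * rng.normal(size=8)  # current word resembles word 2

context, weights = scaled_dot_product_attention(query, keys, values)
print(np.round(weights, 3))  # most of the weight should land on word 2
```

The point of the sketch is simply that the weights concentrate on the few relevant words and suppress the rest, regardless of how many words precede the current one.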
Virtual Intelligence
Now, let me change the subject a little. I have been thinking about an idea I call "virtual intelligence."
At present, when generative AI is used in business, if you gather all of a company's information and feed it to the AI as "knowledge," the sheer volume of knowledge can make it hard for the AI to handle well.
Because of this, it works better to divide the knowledge by task: preparing an AI chat for each task, or building AI tools specialized for particular operations.
This means that for complicated tasks, you then need to combine these divided, knowledge-based AI chats and tools.
This is a limitation of how we use generative AI today, but even with future generative AI, focusing only on the knowledge a given task needs should yield better accuracy for that task.
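As a rough illustration of the task-divided setup just described, here is a minimal sketch. The `ask_llm` helper stands in for a call to some generative AI service, and the task names and knowledge snippets are invented; none of this comes from a specific product.

```python
# Hypothetical knowledge bundles, divided by task rather than pooled together.
KNOWLEDGE_BY_TASK = {
    "expense_report": ["Expense policy v3", "Approval workflow for receipts"],
    "customer_support": ["Product FAQ", "Escalation guidelines"],
}

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to some generative AI service."""
    raise NotImplementedError

def task_specific_chat(task: str, question: str) -> str:
    # Only the knowledge relevant to this task goes into the prompt,
    # instead of the whole company's knowledge at once.
    knowledge = "\n".join(KNOWLEDGE_BY_TASK[task])
    prompt = f"Use only this knowledge:\n{knowledge}\n\nQuestion: {question}"
    return ask_llm(prompt)
```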
That said, I believe future generative AI will be able to switch internally among the knowledge it needs depending on the situation, even without humans dividing that knowledge up for it.
That ability is what I call "virtual intelligence." It is like a virtual machine that can run several different operating systems on a single computer: within one intelligence, multiple virtual intelligences with different specializations can operate.
Even today's generative AI can already simulate a discussion among several people, or generate stories with multiple characters. So "virtual intelligence" is not some special new power; it is simply an extension of the generative AI we already have.
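One way to approximate that internal switching with today's tools is a two-step "choose, then answer" loop, sketched below. It reuses the hypothetical `ask_llm` and `KNOWLEDGE_BY_TASK` from the previous sketch, and the structure is just one possible approximation, not a claim about how future models would actually work internally.

```python
def virtual_intelligence_chat(question: str) -> str:
    """Let the model itself pick which knowledge bundle to 'load',
    instead of a human routing the question to a task-specific chat."""
    # Step 1: the model selects the relevant task / knowledge bundle.
    menu = ", ".join(KNOWLEDGE_BY_TASK)
    task = ask_llm(
        f"Which of these tasks does the question belong to: {menu}?\n"
        f"Question: {question}\nAnswer with the task name only."
    ).strip()
    # Step 2: answer using only that bundle, as in task_specific_chat.
    return task_specific_chat(task, question)
```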
Micro Virtual Intelligence
The way virtual intelligence works, narrowing down the knowledge it needs based on the task at hand, does something similar to the attention mechanism.
What I mean is that it resembles the attention mechanism in that it focuses only on the knowledge relevant to the task being processed at that moment.
Conversely, you could say the attention mechanism is a mechanism that gives rise to something like virtual intelligence. The difference is that the virtual intelligence I have in mind selects relevant knowledge from a body of knowledge, while the attention mechanism operates on a collection of words.
For that reason, the attention mechanism could be called "micro virtual intelligence."
An Explicit Attention Mechanism
If we see "attention mechanism" as small-small virtual intelligence, then on di other hand, di virtual intelligence wey I mention before fit happen if we build a big "attention mechanism."
And dis big "attention mechanism" no need to dey add am to di inside structure of big language models or involve neural network training.
E fit just be a clear sentence wey dem write for normal language, like "When you dey do Task A, check Knowledge B and Knowledge C."
Dis one make di knowledge wey Task A need clear. Dis sentence itself na type of knowledge.
Dem fit call dis one "explicit attention mechanism." You fit describe dis sentence as "attention knowledge," wey dey clearly state di knowledge wey dem suppose focus on when dem dey do Task A.
Still yet, dis "attention knowledge" fit dey generated or updated by generative AI.
If one task no work because of lack of knowledge, then as lesson, dem fit update di "attention knowledge" to include extra knowledge wey dem suppose check for dat task.
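Here is a minimal sketch of what such "attention knowledge" could look like in practice: plain-language sentences stored as data, consulted before a task, and appended to when a task fails. It again assumes the hypothetical `ask_llm` helper from earlier, and the entries are placeholders, not real operational knowledge.

```python
# "Attention knowledge": plain-language sentences stating which knowledge
# each task should consult. The example entry is a placeholder.
ATTENTION_KNOWLEDGE = [
    "When performing Task A, consult Knowledge B and Knowledge C.",
]

def select_knowledge(task: str) -> str:
    """Ask the model which knowledge to consult for a task,
    feeding it the attention knowledge as part of the prompt."""
    notes = "\n".join(ATTENTION_KNOWLEDGE)
    return ask_llm(
        f"{notes}\n\nWhich knowledge should be consulted when performing {task}?"
    )

def learn_from_failure(task: str, missing_knowledge: str) -> None:
    """Update the attention knowledge after a task failed for lack of knowledge."""
    ATTENTION_KNOWLEDGE.append(
        f"When performing {task}, also consult {missing_knowledge}."
    )
```

Because the attention knowledge is just text, the generative AI itself can write and revise these sentences, which is what makes this an "explicit" attention mechanism rather than a trained one.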
Closing Thoughts
Di "attention mechanism" don seriously ginger di power of generative AI.
E no just be mechanism wey just happen to work well; as we don see for here, di way e dey dynamically reduce di information wey e go refer to for each situation just be like di main point of high-level intelligence.
And just like virtual intelligence and clear "attention knowledge," di "attention mechanism" still be di main key to make intelligence dey advance layer by layer.