Conventional machine learning operates within a framework suited to computers, which excel at numerical computation: it learns from numerical data and stores what it learns as numerical parameters.
Humans, however, learn not only through numbers but also through language. We organize our experiences into words, write them down, and later recall, read, and apply them.
Large Language Models (LLMs) can likewise describe knowledge in words and apply knowledge by reading words.
By using LLMs as natural language processors, machine learning based on natural language, rather than on numbers alone, becomes possible.
The arrival of LLMs has thus opened up a new field: natural language machine learning.
The pre-training of an LLM is itself conventional numerical machine learning. The natural language machine learning discussed here, by contrast, is a new kind of machine learning that uses an already pre-trained LLM.
Basic Model of Natural Language Machine Learning
Natural language machine learning resembles conventional numerical machine learning in some respects and differs from it completely in others.
To build a first understanding, let us start with a basic model that emphasizes the parts it shares with traditional numerical machine learning.
From here on, a pre-trained Large Language Model will simply be called an "LLM". Note that the LLM's parameters do not change at all during this learning process.
The basic model is a supervised learning model targeting classification problems.
As training data, we prepare many pairs of input sentences and their correct classifications.
For example, suppose a company has a General Affairs Department and an Administrative Affairs Department.
The two departments have different responsibilities. For input sentences such as "The office light bulb has burned out," "I forgot my access card," or "I would like to book the main hall at headquarters," the classification indicates which department, General Affairs or Administrative Affairs, is responsible.
From this training data, only the input sentences are extracted and fed into the LLM.
Here, we deliberately constrain the answer with a system prompt such as: "Please state whether the department responsible for this inquiry is General Affairs or Administrative Affairs. Do not output anything other than 'General Affairs' or 'Administrative Affairs'."
At first, the LLM generates answers without knowing anything about this company. Naturally, it may be wrong, or it may occasionally happen to be correct by luck.
For each answer, a teaching system judges whether it is correct or incorrect. The combination of the input sentence, the LLM's answer, and the judgment is then saved to a knowledge base.
This process is repeated for roughly half of the training data.
For the remaining half of the training data, everything recorded in the knowledge base is added to the LLM's system prompt, and the same process is carried out.
By this point, the knowledge base contains information about how this company's General Affairs and Administrative Affairs departments divide their work, so the probability of a correct answer should be higher than it was for the first half of the data.
In this way, a system combining an LLM with a knowledge base can learn how a company's General Affairs and Administrative Affairs departments divide their work.
The learning mechanism itself resembles traditional numerical machine learning. The difference is that the learning results appear in the knowledge base, not in the parameters of the neural network inside the LLM. Moreover, the knowledge base records natural language, not numerical values.
This is the basic model of natural language machine learning.
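The whole loop fits in a short sketch. The following Python is purely illustrative: `llm_complete` is a hypothetical placeholder for any chat-completion API, and the training pairs, prompt wording, and knowledge-base format are assumptions made for this example.

```python
def llm_complete(system: str, user: str) -> str:
    """Hypothetical stand-in for a chat-completion API call."""
    raise NotImplementedError("Replace with a real LLM API call.")

SYSTEM_PROMPT = (
    "Please state whether the department responsible for this inquiry is "
    "General Affairs or Administrative Affairs. Do not output anything "
    "other than 'General Affairs' or 'Administrative Affairs'."
)

# (input sentence, correct department) pairs; invented for illustration.
training_data = [
    ("The office light bulb has burned out.", "Administrative Affairs"),
    ("I forgot my access card.", "General Affairs"),
    # ... many more pairs
]

knowledge_base: list[str] = []  # learning results, stored as sentences

def classify(inquiry: str, use_kb: bool) -> str:
    """Ask the LLM for a department, optionally with the knowledge base."""
    system = SYSTEM_PROMPT
    if use_kb and knowledge_base:
        system += "\nRecords so far:\n" + "\n".join(knowledge_base)
    return llm_complete(system=system, user=inquiry)

def learn(pairs, use_kb: bool) -> None:
    """Answer each inquiry, judge it, and record the result as a sentence."""
    for inquiry, correct in pairs:
        answer = classify(inquiry, use_kb)
        verdict = "correct" if answer == correct else f"incorrect; should be {correct}"
        knowledge_base.append(
            f"Inquiry: {inquiry} | Answer: {answer} | Judgment: {verdict}"
        )

half = len(training_data) // 2
learn(training_data[:half], use_kb=False)  # first half: the LLM answers blind
learn(training_data[half:], use_kb=True)   # second half: knowledge base in the prompt
```

Note that the only thing that changes between the two halves is the prompt; the model itself is untouched throughout.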
Realism of the Basic Model
As anyone who uses LLMs will quickly notice, this basic model is not very realistic.
That is because there is no need to go to the trouble of having a teaching system judge answers as correct or incorrect; one could simply put the training data itself into the system prompt from the start.
However, with only a small change of scenario, the basic model becomes realistic.
For example, imagine that the General Affairs and Administrative Affairs departments jointly set up an inquiry desk, and a human manually routes each incoming inquiry to the correct department.
A simple system is built to record these inquiries and their routing results in a knowledge base.
Then, using this knowledge base, the LLM can take over from the humans and route new inquiries to the departments.
In this setup, if the LLM mistakenly routes an inquiry that belongs to Administrative Affairs to General Affairs, the General Affairs staff will re-route it to Administrative Affairs. This re-routing is also recorded in the knowledge base.
This simple system for recording routing logs, combined with an LLM and a knowledge base, constitutes a realistic model of supervised natural language machine learning.
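Reusing the names from the sketch above, the realistic version might look roughly like this; `route_inquiry` and `record_correction` are hypothetical helpers invented for illustration.

```python
def route_inquiry(inquiry: str) -> str:
    """Route an inquiry to a department, informed by the routing log."""
    system = SYSTEM_PROMPT + "\nPast routing records:\n" + "\n".join(knowledge_base)
    department = llm_complete(system=system, user=inquiry)
    knowledge_base.append(f"Inquiry: {inquiry} | Routed to: {department}")
    return department

def record_correction(inquiry: str, wrong: str, right: str) -> None:
    """Log a human re-routing; this is the supervised feedback signal."""
    knowledge_base.append(
        f"Inquiry: {inquiry} | Initially routed to: {wrong} | "
        f"Re-routed by staff to: {right}"
    )
```

Every routing and every correction simply becomes one more sentence in the log; the log itself is the learned state.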
The key point, to repeat, is that the parameters of the neural network inside the LLM never change. Moreover, the result of this feedback learning is a collection of natural language sentences, not numerical values.
And this system clearly involves machine learning, not human learning.
It is, therefore, a new kind of machine learning: natural language machine learning.
Advantages of Natural Language Machine Learning
Compared with numerical machine learning, natural language machine learning has many advantages.
In a word, its defining strength is that it learns extremely fast.
Numerical machine learning typically requires large amounts of training data and repeated, iterative learning over that data. In addition, the training data must be pre-processed.
Large amounts of training data are needed because the features to be learned are not contained in any single data point; they are spread across many data points.
As a result, the amount of training data required grows roughly with the square of the dimensionality of the target features.
Iterative learning is needed so that the neural network's parameters converge properly without getting stuck in a poor local optimum, which means the parameter change per feedback step must be kept small.
Pre-processing of the training data, such as normalization and edge extraction, is needed to make the target features stand out. This pre-processing also takes considerable effort.
For example, to learn how the Administrative Affairs and General Affairs departments divide their work with an ordinary neural network, if the features have 50 dimensions, at least around 1,000 or more training samples would be needed. Moreover, to reach adequate accuracy, those 1,000-plus samples might have to be learned repeatedly, on the order of 100 times.
Furthermore, if those 1,000 training samples contain superfluous words, spelling variants, or differing word orders and sentence structures, learning efficiency drops, and irrelevant features may be learned.
Pre-processing is therefore essential: removing superfluous words, unifying terminology to avoid spelling variants, and standardizing word order and syntax.
Natural language machine learning, by contrast, needs little training data, never needs to learn the same training data repeatedly, and in most cases needs no pre-processing.
If the features describing how the Administrative Affairs and General Affairs departments divide their work have 50 dimensions, then 50 pieces of information, one per dimension, are enough.
Moreover, this does not mean 50 separate sentences are needed.
A single sentence such as "Duties related to A, B, C, and D are handled by the Administrative Affairs Department" can cover four dimensions of information.
Furthermore, by making the language more abstract, information from many dimensions can be consolidated. A sentence such as "Maintenance of building consumables and facilities is the responsibility of the Administrative Affairs Department" aggregates many dimensions of information, including replacing light bulbs and repairing broken automatic doors.
This abstraction can be seen as reducing the amount of training data by drawing on the LLM's pre-trained knowledge and reasoning ability.
And, fundamentally, natural language learning needs no iterative learning. Once a sentence like the one above is added to the knowledge base, learning is complete.
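Continuing the same sketch, the entire "training step" for such an abstract sentence is a single append to the knowledge base:

```python
# One abstract sentence covers many concrete cases at once.
knowledge_base.append(
    "Maintenance of building consumables and facilities is the "
    "responsibility of the Administrative Affairs Department."
)

# Both of these should now route to Administrative Affairs, even though
# neither light bulbs nor automatic doors were mentioned explicitly.
route_inquiry("The office light bulb has burned out.")
route_inquiry("The automatic door at the entrance will not open.")
```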
Nor is pre-processing of the knowledge needed. Even if explanations of the Administrative Affairs or General Affairs departments are scattered across different texts, they can still be used as knowledge.
Or, as in the earlier example, raw data such as inquiry and routing records can be used immediately as training data, without pre-processing.
Natural language machine learning can therefore learn far more efficiently than numerical machine learning.
Conclusion
Compared with the speed at which computers perform numerical calculation, the natural language processing of large language models is quite slow.
However, the efficient learning that natural language machine learning enables far outweighs the gap between fast numerical calculation and slow natural language processing.
Moreover, large language models, which have advanced dramatically through numerical learning, appear to be nearing the limit of the performance gains available simply from scaling up, as described by scaling laws.
In such a situation, attention is very likely to shift toward performance improvement through natural language machine learning.