Are you familiar with GitHub, the web service that has been used as a collaborative development platform by open-source software developers?
In recent years, its use as a collaborative workspace has expanded beyond open-source software to include corporate software development and even non-software-related applications.
I also use GitHub to manage my own programs and the drafts of the articles I write for this blog.
In this article, I will explore the possibility that GitHub's use will increasingly extend beyond software, becoming a shared space for open knowledge.
Wiki Site Generation by DeepWiki
Many software development tools using generative AI are designed to assist human programmers. In these tools, humans write the program, and AI provides support.
However, a new type of software development tool is emerging where humans only give instructions, and generative AI takes on the task of creating the program.
One pioneering tool of this kind that has attracted attention is Devin. Some say that introducing Devin is like adding another programmer to the development team. Human engineers reportedly still need to provide detailed support to use it effectively, but data on that support will no doubt be collected and used for further improvement.
The era in which a typical software development team consists of one human working alongside AI programmers like Devin is fast approaching.
Cognition, the developer of Devin, has also released a service called DeepWiki.
DeepWiki is a service that automatically generates a wiki site for each software development project on GitHub. This means that an AI like Devin reads and analyzes all programs and related documents of a project, and then creates all the documentation and design specifications.
Cognition reportedly generated wiki sites for more than 50,000 major public software development projects on GitHub and made them freely accessible to anyone.
Since these are public projects, there is no issue with doing so. Although the wiki sites are generated automatically, the work must have kept many generative AI instances running at full capacity for an extended period, at considerable cost.
By Cognition bearing these costs, a vast number of public projects benefited by acquiring documentation and design specifications for free.
If statistical data shows that these wiki sites are useful for public projects and have a significant impact on quality and productivity improvement, software development companies will likely adopt DeepWiki for their own projects.
Cognition must have invested in generating wiki sites for numerous public projects, believing that this would happen. This demonstrates Cognition's confidence in DeepWiki. And if DeepWiki is adopted, Devin will automatically follow, significantly accelerating the popularization of AI programmers.
GitHub as a Document Sharing Platform
GitHub has become a popular and de facto standard web service for sharing, collaboratively editing, and storing programs for open-source software development.
In recent years, its robust management and security features for enterprises have led to its commonplace use by advanced software development companies.
As a result, GitHub often carries the image of a web service primarily for program storage and sharing. However, in reality, it allows for sharing, collaborative editing, and storage of various documents and materials, completely unrelated to programs.
For this reason, many people use GitHub to manage documents they wish to collaboratively edit widely. These documents can be related to software or entirely unrelated.
Furthermore, blogs and websites are also documents: they either contain programs of a sort or are assembled and published by programs.
Therefore, it is not uncommon for individuals and companies to store blog and website content, along with programs for presentation and automatic site generation, together as a single GitHub project.
It is also possible to make such blog and website content public GitHub projects to enable collaborative editing.
Recently, in addition to using generative AI for software development, it is increasingly common to embed generative AI functionalities directly into software.
In such cases, detailed instructions for the generative AI, called prompts, are embedded within the program.
These prompts can also be considered a type of document.
Intellectual Factory
Although I am a software engineer, I also write articles for my blog.
While I want many people to read them, increasing the number of readers is quite challenging.
Of course, with enough effort and ingenuity, I could write articles designed to attract attention or contact influential people directly for advice.
However, considering my personality and the effort and stress involved, I am not enthusiastic about aggressive promotion. Moreover, spending time on such activities would divert time from the core aspects of my work: creating programs, thinking, and writing documents.
Therefore, I recently decided to try a "multimedia" or "omnichannel" strategy to expand the reach of my blog articles by deploying them across various content formats.
Specifically, this involves translating Japanese articles into English and posting them on an English blog site, and creating presentation videos to explain articles and publishing them on YouTube.
Furthermore, beyond publishing on general blog services, I am also considering creating my own blog site with an index of my past articles by category and linking related articles.
If I were to create all these manually each time a new article is added, it would defeat the purpose. Therefore, all tasks except writing the initial Japanese article are automated using generative AI. I call this an Intellectual Factory.
I need to develop programs to realize this system.
Currently, I have already created programs that can fully automate translation, presentation video generation, and YouTube uploads.
Now, I am in the process of creating basic programs for categorizing and linking existing blog articles.
Once that is complete, and I create a program to generate my custom blog site and automatically deploy it to a web server, the initial concept of my Intellectual Factory will be fully realized.
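As a rough illustration of how such a factory can be organized, here is a minimal Python sketch. It is not my actual implementation: the names Article, Outputs, run_factory, and the three placeholder stages are hypothetical, and each stage returns a dummy string where a real one would call a generative AI or an upload service. The point is only that each output format is one stage in an ordered list, and a later stage can build on what an earlier stage produced.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Article:
    """A source article; the only part written by hand."""
    slug: str
    title: str
    body_ja: str


@dataclass
class Outputs:
    """Everything derived from one article, keyed by stage name."""
    slug: str
    artifacts: Dict[str, str] = field(default_factory=dict)


# A stage receives the article plus the artifacts of earlier stages
# and returns one new artifact (here just a string placeholder).
Stage = Callable[[Article, Dict[str, str]], str]


def translate(article: Article, artifacts: Dict[str, str]) -> str:
    # Placeholder: a real stage would call a generative AI for translation.
    return f"[en] {article.title}"


def make_video(article: Article, artifacts: Dict[str, str]) -> str:
    # Placeholder: builds on the English text produced by "translate".
    return f"video({artifacts['translate']})"


def upload(article: Article, artifacts: Dict[str, str]) -> str:
    # Placeholder: a real stage would publish the video somewhere.
    return f"uploaded:{article.slug}"


def run_factory(article: Article, stages: List[Tuple[str, Stage]]) -> Outputs:
    """Run every stage in order, collecting one artifact per stage."""
    out = Outputs(slug=article.slug)
    for name, stage in stages:
        out.artifacts[name] = stage(article, out.artifacts)
    return out


if __name__ == "__main__":
    stages = [("translate", translate), ("video", make_video), ("upload", upload)]
    result = run_factory(Article("github-mine", "GitHub as a Mine", "本文"), stages)
    for name, artifact in result.artifacts.items():
        print(name, "->", artifact)
```

With this shape, adding a new output format, such as the custom blog site, means appending one more stage rather than changing the existing ones.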
Intellectual Factory in a Broader Sense
The drafts of my blog articles, which serve as raw materials for this Intellectual Factory, are also managed as GitHub projects. Currently they are private projects and not publicly available, but I am considering making them public in the future, along with the Intellectual Factory's programs.
Furthermore, the categorization of blog articles, the linking of articles, and the video explanations of blog articles that I am currently developing share the same underlying concept as DeepWiki.
With generative AI, original creative works serve as raw materials from which various content is produced. In addition, the information and knowledge within this content can be connected to create what can be called a knowledge base.
The only difference lies in whether the raw material is a program or a blog article. And for DeepWiki and my Intellectual Factory, powered by generative AI, this difference is largely insignificant.
In other words, if the term "Intellectual Factory" is interpreted in a general, broader sense, not limited to my specific programs, DeepWiki is also a type of Intellectual Factory.
Moreover, what an Intellectual Factory produces is not limited to translated articles in other languages, presentation videos, or self-made blog and wiki sites.
It will likely be capable of converting content into every conceivable medium and format, such as short videos, tweets, manga and anime, podcasts, and e-books.
Furthermore, the content within these media and formats can also be diversified to suit various audiences, including broader multilingualization, versions for experts or beginners, and versions for adults or children.
Ultimately, even on-demand generation of customized content will be possible.
GitHub as an Intellectual Mine
The raw materials for an Intellectual Factory can, in principle, be stored anywhere.
However, considering that GitHub has become the de facto standard for sharing, collaborative editing, and storing programs for open-source projects, and that various people—not just myself—use GitHub as a document storage location, it becomes apparent that GitHub has the potential to become the primary source of raw materials for Intellectual Factories.
In other words, GitHub will become an Intellectual Mine shared by humanity, supplying raw materials to Intellectual Factories.
The term "shared by humanity" here echoes the idea that open-source projects are a shared software asset for humanity.
The open-source philosophy that has underpinned GitHub will also fit well with the concept of open documents.
Furthermore, a culture of managing copyright information and licenses for each document, similar to programs, could emerge. Content automatically generated from source documents can easily be assigned the same license or comply with rules stipulated by the license.
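To make the idea of license inheritance concrete, here is a small Python sketch under stated assumptions. The Document record and the derive function are hypothetical, and simply copying the license is only one possible rule; a real system would have to follow whatever each license actually stipulates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Document:
    """A document with the license metadata it is published under."""
    path: str
    license_id: str  # illustrative identifier, e.g. "CC-BY-4.0"
    author: str
    derived_from: Optional[str] = None  # path of the source document, if any


def derive(source: Document, new_path: str) -> Document:
    """Create a record for generated content that inherits the source's license.

    Assumption: the derived work keeps the same license and credits the
    same author, and records which document it came from.
    """
    return Document(
        path=new_path,
        license_id=source.license_id,
        author=source.author,
        derived_from=source.path,
    )


if __name__ == "__main__":
    original = Document("posts/github-mine.ja.md", "CC-BY-4.0", "author")
    translated = derive(original, "posts/github-mine.en.md")
    print(translated)
```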
From the perspective of creating an Intellectual Factory, the consolidation of raw material documents on GitHub is ideal.
This offers two advantages. The first is development efficiency, since the Intellectual Factory only needs to be connected to GitHub. The second is the ability to demonstrate the functions and performance of one's own Intellectual Factory on publicly available documents, much as DeepWiki did.
In the future, as various Intellectual Factories are developed and connected to GitHub, and more individuals and companies manage documents on GitHub for processing by Intellectual Factories, GitHub's position as an Intellectual Mine should become firmly established.
Humanity's Shared Public Knowledge Base
With GitHub at the core, serving as an Intellectual Mine, and Intellectual Factories producing a wide variety of content and knowledge bases, this entire ecosystem will create a public knowledge base shared by humanity.
Moreover, this will be a dynamic, real-time knowledge base that automatically expands as the number of documents published on GitHub increases.
While this complex, enormous knowledge base, containing vast amounts of knowledge, will be beneficial to humans, fully extracting its potential value will likely be challenging for us.
However, AI will be able to fully leverage this publicly shared knowledge base of humanity.
Veins of Public Knowledge
When such an ecosystem is realized, various public information will naturally converge on GitHub.
This will not be limited to drafts of personal blogs or corporate websites.
Academic insights and data, such as pre-print papers, research ideas, experimental data, and survey results, will also gather there.
This will attract not only those who wish to contribute knowledge, ideas, and data for the benefit of all humanity, but also those who seek to rapidly disseminate discoveries to gain recognition.
Even academics and researchers may find value in having AI assess the validity, novelty, and impact of their work, in seeing it expressed through various content formats, and in gaining recognition by "going viral," rather than waiting for the lengthy peer-review process for papers.
And if their work catches the attention of other researchers or companies in this way, leading to collaborative research or funding, the benefits are tangible.
Furthermore, there will be a recirculation of AI's own knowledge.
While generative AI acquires vast amounts of knowledge through pre-training, it does not actively learn by exploring unexpected connections or similar structures among that enormous body of knowledge.
The same applies to new insights that emerge from connecting different pieces of knowledge.
On the other hand, when discussing such similarities and connections with a pre-trained generative AI, it can evaluate their value quite accurately.
Therefore, by inputting various pieces of knowledge to generative AI, comparing them randomly or exhaustively, it is possible to discover unexpected similarities and valuable connections.
Of course, given the immense number of combinations, covering everything is impractical. However, by appropriately streamlining and automating this process, it becomes possible to automatically unearth useful knowledge from existing knowledge.
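The following Python sketch shows one way to organize such a search. Everything in it is an assumption made for illustration: discover_connections and its parameters are hypothetical names, and word_overlap is a trivial stand-in for the evaluation that a generative AI would actually perform. The sketch compares every pair when the budget allows it, and otherwise samples random pairs up to the budget.

```python
import random
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def discover_connections(
    items: Sequence[str],
    score: Callable[[str, str], float],
    budget: int,
    threshold: float,
    seed: int = 0,
) -> List[Tuple[float, str, str]]:
    """Score pairs of items and keep those at or above the threshold.

    If the budget covers all pairs, the comparison is exhaustive;
    otherwise `budget` distinct pairs are drawn at random.
    """
    n = len(items)
    total_pairs = n * (n - 1) // 2
    if budget >= total_pairs:
        pairs = list(combinations(range(n), 2))
    else:
        rng = random.Random(seed)
        chosen = set()
        while len(chosen) < budget:
            i, j = rng.sample(range(n), 2)  # two distinct indices
            chosen.add((min(i, j), max(i, j)))
        pairs = sorted(chosen)

    found = []
    for i, j in pairs:
        s = score(items[i], items[j])
        if s >= threshold:
            found.append((s, items[i], items[j]))
    # Most promising connections first.
    found.sort(key=lambda t: t[0], reverse=True)
    return found


def word_overlap(a: str, b: str) -> float:
    """Stand-in scorer: Jaccard overlap of words. A real system would
    ask a generative AI to judge how valuable the connection is."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


if __name__ == "__main__":
    notes = ["open source software", "open source documents", "manga and anime"]
    for s, a, b in discover_connections(notes, word_overlap, budget=10, threshold=0.4):
        print(f"{s:.2f}  {a}  <->  {b}")
```

Random sampling is only the simplest form of streamlining. Grouping items by category first and comparing across groups would be another option.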
By achieving such automatic knowledge discovery and storing the discovered knowledge on GitHub, this loop could seemingly continue indefinitely.
Thus, within this Intellectual Mine, numerous undiscovered veins exist, and it will become possible to excavate them.
Conclusion
As a de facto standard shared knowledge base for humanity, such as GitHub, takes shape in this manner, it will likely be used for pre-training generative AI and for knowledge retrieval mechanisms such as RAG.
In such a scenario, GitHub itself will function like a massive cerebrum. Generative AIs will then share this cerebrum, distributing and expanding knowledge.
The knowledge additionally recorded there will not merely be factual records, new data, or classifications. It will also include knowledge that acts as a catalyst, promoting the discovery of other knowledge and new combinations.
I refer to such knowledge with a catalytic effect as an Intellectual Crystal, or a crystal of knowledge. This includes, for example, new frameworks of thought.
When frameworks are newly discovered or developed, and Intellectual Crystals are added, their catalytic effect enables new combinations and structuring of knowledge that were previously impossible, leading to an increase in new knowledge.
Sometimes, these may contain yet another Intellectual Crystal, which then further amplifies the knowledge.
This type of knowledge is closer to mathematical inquiry, engineering development, or invention than to scientific discovery. It is therefore knowledge that grows purely through thought, whereas scientific knowledge grows through new observational facts.
And GitHub, as an Intellectual Mine, along with countless generative AIs utilizing it, will accelerate the growth of such knowledge.
This rapidly discovered knowledge, far exceeding the pace of human-scale discovery, will be provided in an easily understandable format by Intellectual Factories.
In this way, knowledge that can be explored purely through thought will be rapidly unearthed.