Are you familiar with GitHub, the web service that has been used as a collaborative development platform among open-source software developers?
In recent years, its use as a platform for collaborative work has expanded, not only for open-source software but also for corporate software development and even for non-software-related purposes.
I also use GitHub to manage my own programs and the drafts of articles I write for this blog.
In this article, I will explore the possibility that GitHub's use will increasingly extend beyond software development in the future, becoming a place for open knowledge sharing.
Wiki Site Generation by DeepWiki
Many software development tools that use generative AI are designed to assist human programming tasks. Humans write the programs, and AI provides support.
On the other hand, a new type of software development tool is emerging where humans only give instructions, and generative AI takes over the task of creating programs.
Devin is one such tool, a pioneer that garnered attention. Some people even said that introducing Devin was like adding one more programmer to the development team. Human engineers reportedly still need to provide detailed support for it to be used effectively, but data from that support will surely be collected and used to improve it.
An era in which software development teams made up of one human and several AI programmers like Devin are commonplace is just around the corner.
Cognition, the developer of Devin, has also released a service called DeepWiki.
DeepWiki is a service that automatically generates a wiki site for each software development project on GitHub. This means that an AI, similar to Devin, reads and analyzes all the programs and related documents in the project and writes all of its manuals and design documents.
Using DeepWiki, Cognition reportedly created wiki sites for over 50,000 major public software development projects on GitHub, projects that anyone can access freely.
Since these are public projects, there is absolutely no problem with doing so. Although the wiki sites are generated automatically, doing so must have required many generative AI instances running at full capacity for a long period, and the cost must have been considerable.
By bearing these costs, Cognition has provided a great benefit to a vast number of public projects, allowing them to obtain explanations and design documents for free.
If statistical data shows that these wiki sites are useful for each public project and have a significant effect on improving quality and productivity, then software development companies will adopt DeepWiki for their own projects.
Cognition must have invested in generating wiki sites for a vast number of public projects, believing that this could happen. This demonstrates Cognition's confidence in DeepWiki. And when DeepWiki is adopted, Devin will automatically follow, significantly increasing the likelihood of widespread adoption of AI programmers.
GitHub as a Document Sharing Platform
GitHub has become a popular and de facto standard web service for sharing, co-editing, and storing programs for open-source software development.
In recent years, its management and security features for enterprises have been enhanced, making it a common tool in advanced companies that develop software.
For this reason, GitHub strongly evokes the image of a web service for storing and sharing programs. However, in reality, it can be used to share, co-edit, and store various documents and materials, completely unrelated to programs.
As a result, quite a few people use GitHub to manage documents they wish to co-edit with a wide range of collaborators. These can be documents related to software or entirely unrelated ones.
Moreover, blogs and websites are themselves documents that either contain a kind of program or are structured and published by programs.
Because of this, it is not uncommon for individuals and companies to store the content of blogs and websites, along with the programs that make them easy to view and the programs for automatic site generation, together as a single project on GitHub.
It is also possible to make such blogs and websites public projects on GitHub for co-editing their content.
Furthermore, generative AI is now not only used for software development but also often integrated into the software itself.
In this case, instructions called prompts, which tell the generative AI in detail what to do, are embedded within the programs.
These prompts can also be considered a type of document.
Intellectual Factory
Although I am a software development engineer, I also write articles for my blog.
While I want many people to read them, it is quite difficult to increase the number of readers.
Of course, I could try various things, such as writing articles designed to attract attention or actively contacting influential people for advice.
However, considering my personality and the effort and stress involved, I am reluctant to engage in aggressive promotion. Furthermore, spending time on such activities would detract from the core of my work, which involves programming, contemplating ideas, and documenting them.
Therefore, I recently decided to try a multimedia, or omnichannel, strategy, which involves expanding the reach of my blog posts by developing them into various forms of content.
Specifically, this includes translating Japanese articles into English and posting them on English blog sites, and creating presentation videos to explain articles and publishing them on YouTube.
Furthermore, in addition to publishing on general blog services, I am also considering creating my own blog site that lists and categorizes my past blog posts and links related articles.
If I were to spend time creating these every time a new article is written, it would be counterproductive. Therefore, all tasks other than writing the initial Japanese article are automated using generative AI. I call this an intellectual factory.
I need to develop programs to implement this mechanism.
Currently, I have already created programs that can fully automate translation, presentation video generation, and uploading to YouTube.
I am now in the process of creating basic programs for categorizing and linking existing blog posts.
Once that is complete, and I create a program to generate my own blog site and automatically reflect it on a web server, the initial concept of my intellectual factory will be complete.
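The structure of such a factory can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the actual programs described above: the names `Article`, `Product`, `translate_to_english`, `make_video_script`, and `run_factory` are hypothetical, and each stage is a placeholder where a call to a generative AI would go.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Article:
    """A source article, the raw material of the factory (hypothetical type)."""
    title: str
    body: str
    language: str = "ja"


@dataclass
class Product:
    """One piece of derived content, such as a translation or a video script."""
    kind: str
    source_title: str
    content: str


# A stage takes the source article and returns one derived product.
Stage = Callable[[Article], Product]


def translate_to_english(article: Article) -> Product:
    # Placeholder: a real stage would call a generative AI model here.
    return Product("translation_en", article.title, f"[EN] {article.body}")


def make_video_script(article: Article) -> Product:
    # Placeholder: a real stage would generate slides and narration.
    return Product("video_script", article.title, f"Narration for: {article.title}")


def run_factory(article: Article, stages: List[Stage]) -> List[Product]:
    """Run every stage on the same source article and collect the outputs."""
    return [stage(article) for stage in stages]


if __name__ == "__main__":
    draft = Article(title="GitHub as an Intellectual Mine", body="...")
    for product in run_factory(draft, [translate_to_english, make_video_script]):
        print(product.kind, "->", product.content)
```

Treating each output format as an independent stage means that a new format, such as a podcast or an e-book, could be added by writing one more function without touching the others.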
Intellectual Factory in a Broad Sense
The drafts of my blog posts, which serve as raw material for this intellectual factory, are also managed as a GitHub project. For now, they are private and not publicly available, but I am considering making them public projects along with the intellectual factory programs in the future.
The categorization of blog posts, the linking of related articles, and the video explanations of blog posts in my intellectual factory all share the same underlying concept as DeepWiki.
Both use generative AI to produce various kinds of content from original creative works as raw material. They can also connect the information and knowledge within those works, effectively creating a knowledge base.
The only difference is whether the raw material is a program or a blog post. And for DeepWiki and my intellectual factory powered by generative AI, that difference is almost meaningless.
In other words, if the term "intellectual factory" is interpreted in a general, broader sense, not limited to my program, then DeepWiki is also a type of intellectual factory.
And what intellectual factories produce is not limited to translated articles in other languages, presentation videos, self-made blog sites, or wiki sites.
They will likely be able to convert content into every conceivable medium and format, such as short videos, tweets, comics, animation, podcasts, and e-books.
Furthermore, the content within these media and formats can also be diversified to suit the recipient, such as broader multi-language support, versions for experts or beginners, and versions for adults or children.
Moreover, even on-demand generation of customized content is achievable.
GitHub as an Intellectual Mine
The raw materials for an intellectual factory can fundamentally be located anywhere.
However, considering that GitHub has become the de facto standard for sharing, co-editing, and storing open-source project programs, and that many people, not just myself, use GitHub as a document storage location, it becomes apparent that GitHub has the potential to become a primary source of raw materials for intellectual factories.
In other words, GitHub will become a shared intellectual mine for humanity, supplying raw materials to intellectual factories.
The term "shared by humanity" here echoes the idea that open-source projects are a shared software asset for humanity.
The open-source philosophy that has supported GitHub will also fit well with the concept of open documents.
Furthermore, a culture of managing copyright information and licenses for each document, as is done for programs, could emerge. Content automatically generated from source documents can easily be assigned the same license or made to comply with the rules the license stipulates.
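As a rough sketch of how generated content might inherit license information from its source, consider the following. The `Document` and `Derived` types and the `derive` function are hypothetical names introduced only for illustration, and the sketch assumes the simplest case, in which the derived content takes the same license as the source and records an attribution line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A source document with its license metadata (hypothetical type)."""
    path: str
    license_id: str  # for example, an SPDX identifier such as "CC-BY-4.0"
    author: str


@dataclass(frozen=True)
class Derived:
    """Metadata for content generated from a source document."""
    path: str
    license_id: str
    attribution: str
    source_path: str


def derive(source: Document, new_path: str) -> Derived:
    """Create metadata for generated content, inheriting the source license."""
    return Derived(
        path=new_path,
        license_id=source.license_id,
        attribution=f"Based on '{source.path}' by {source.author}",
        source_path=source.path,
    )


if __name__ == "__main__":
    original = Document("posts/github-mine.md", "CC-BY-4.0", "Example Author")
    print(derive(original, "videos/github-mine.mp4"))
```

Licenses with more specific conditions would need extra rules at this step, which this sketch does not attempt to cover.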
From the perspective of developing an intellectual factory, having the raw material documents centralized on GitHub is ideal.
This offers two benefits: improved development efficiency by simply connecting GitHub with the intellectual factory, and the ability to effectively demonstrate the functions and performance of one's own intellectual factory using publicly available documents, similar to DeepWiki.
In the future, as various intellectual factories are developed and become connectable to GitHub, and as more people and companies manage documents on GitHub and process them with intellectual factories, GitHub's position as an intellectual mine should become firmly established.
Humanity's Shared Public Knowledge Base
With GitHub at the center as an intellectual mine, and various contents and knowledge bases produced by intellectual factories, this entire ecosystem will create a public knowledge base shared by humanity.
Moreover, it is a dynamic and real-time knowledge base that will automatically expand as the number of documents published on GitHub increases.
While this vast and complex knowledge base will be useful to humans, it will be difficult for humans alone to fully extract its potential value.
However, AI will be able to fully utilize this public knowledge base, shared by all of humanity.
Veins of Public Knowledge
If such an ecosystem is realized, various public information will naturally converge on GitHub.
This won't be limited to drafts of personal blogs or corporate websites.
Academic insights and data, such as pre-publication papers and research ideas, experimental data, and survey results, will also accumulate.
This will attract not only those who want to use knowledge, ideas, and data for the benefit of all humanity, but also those who wish to quickly disseminate their discoveries and gain recognition.
Even for scholars and researchers, many would find value in having the validity, novelty, and impact of their work verified by AI, expressed through various content, and recognized in a way that goes viral, without having to wait for the lengthy peer-review process.
Alternatively, if their work catches the eye of other researchers or companies in this manner, leading to collaborative research or funding, there are practical benefits too.
In addition, there will likely be a return flow of AI's own knowledge.
Generative AI acquires vast amounts of knowledge through pre-training, but it doesn't actively explore unexpected connections or similar structures between that vast knowledge during learning.
The same applies to new insights that emerge from connecting different pieces of knowledge.
On the other hand, when such similarities and connections are explained to a pre-trained generative AI in conversation, it can assess their value quite accurately.
Therefore, by randomly or exhaustively comparing and connecting various pieces of knowledge and inputting them into a generative AI, it is possible to discover unexpected similarities and valuable connections.
Of course, since there are an enormous number of combinations, it's unrealistic to cover all of them. However, by appropriately streamlining and automating this process, it becomes possible to automatically discover useful knowledge from existing knowledge.
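The scale problem is easy to state: n pieces of knowledge form n(n-1)/2 pairs, so 10,000 items already yield about 50 million pairs. One way to streamline the search, sketched below under stated assumptions, is to sample pairs at random and keep only those that score above a threshold. The function names are hypothetical, and `word_overlap_score` is merely a stand-in for asking a generative AI to assess a connection; it is not how such an assessment would actually be made.

```python
import random
from typing import Callable, List, Tuple


def word_overlap_score(a: str, b: str) -> float:
    """Stand-in scorer: word overlap (Jaccard similarity) of two texts.

    A real system would instead ask a generative AI to rate the connection.
    """
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def sample_index_pairs(n: int, k: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Pick up to k distinct unordered index pairs without listing all pairs."""
    k = min(k, n * (n - 1) // 2)
    seen = set()
    while len(seen) < k:
        i, j = rng.sample(range(n), 2)
        seen.add((min(i, j), max(i, j)))
    return sorted(seen)


def discover_connections(
    items: List[str],
    scorer: Callable[[str, str], float],
    sample_size: int,
    threshold: float,
    seed: int = 0,
) -> List[Tuple[float, str, str]]:
    """Score a random sample of pairs and return those at or above threshold."""
    rng = random.Random(seed)
    results = []
    for i, j in sample_index_pairs(len(items), sample_size, rng):
        score = scorer(items[i], items[j])
        if score >= threshold:
            results.append((score, items[i], items[j]))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    notes = [
        "open source software license",
        "software license for documents",
        "cooking pasta at home",
    ]
    for score, a, b in discover_connections(notes, word_overlap_score, 10, 0.2):
        print(f"{score:.2f}: {a!r} <-> {b!r}")
```

Because the scorer is passed in as a parameter, the placeholder can be swapped for a model-based assessment without changing the sampling logic.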
By achieving such automatic knowledge discovery and storing the discovered knowledge on GitHub, it seems possible to repeat this loop indefinitely.
In this way, numerous undiscovered veins of knowledge exist within this intellectual mine, and it will become possible to excavate them.
Conclusion
Once a de facto standard, shared human knowledge base like GitHub is established, it will likely be used for pre-training generative AI and for knowledge retrieval techniques such as RAG (retrieval-augmented generation).
In that scenario, GitHub itself will function like a gigantic cerebrum. Generative AIs will share this cerebrum, distributing and expanding the knowledge it holds.
The knowledge additionally recorded there will include not only records of facts, new data, or classifications. It may also include catalytic knowledge that promotes the discovery of other knowledge or new combinations.
I call such knowledge with a catalytic effect "intellectual crystals" or "knowledge crystals." This includes, for example, new frameworks for thinking.
When a framework is newly discovered or developed and an intellectual crystal is added, its catalytic effect enables different combinations and structuring of knowledge than before, leading to the growth of new knowledge.
Among these, there may be other knowledge crystals. This, in turn, will further increase knowledge.
Such knowledge is not a scientific discovery but something closer to mathematical inquiry, engineering development, or invention. Therefore, it is knowledge that grows purely through thought, rather than through new observational facts as scientific knowledge does.
And GitHub as an intellectual mine, along with countless generative AIs utilizing it, will accelerate the growth of such knowledge.
Knowledge will be discovered piece after piece at a pace far exceeding the human scale of discovery, and intellectual factories will deliver it to us in a form that is easy to understand.
In this way, knowledge that can be explored purely through thought will be rapidly excavated.