This article was translated from Japanese by AI.
This article is in the public domain (CC0). Feel free to use it however you like. CC0 1.0 Universal

Automated Presentation Video Generation from Blog Posts

https://youtu.be/vmt_WVBJMj4?si=OZlzEqfEvWjPakYV

I built a system that uses generative AI to automatically create presentation videos from blog articles and upload them to YouTube.

With a little ingenuity, generative AI can not only outline the story of a presentation but also create the presentation materials themselves.

Furthermore, if generative AI writes a script for the presentation and a text-to-speech generative AI reads that script aloud, audio data can be generated as well.

Combining the presentation materials with the audio data produces a video.

By automating this series of tasks, I made it possible to generate presentation videos with a single click.

How It Works

The most important part of the whole process is generating the presentation materials.

Generative AI is very good at producing images, but mainly photographs and drawings. Documents that contain a lot of text and figures, such as presentation materials, are difficult for image-generation AI.

For that reason, I generate the text- and figure-centered materials in a text-based format, much like a programming language.

Several formats can be used to create such materials.

At first I tried Marp, a format designed specifically for presentations, but its capabilities were limited. So I decided to use SVG, the more general-purpose vector graphics format.

With a text-based format like SVG, an ordinary chat-based generative AI can create the materials if you ask it, "Please create presentation materials in SVG format that explain the content of this blog article."

Text Overflow Problem

The problem here is that text often overflows the outer frame of the document or the frames of the figures inside it.

A human looking at the finished document would notice text overflow immediately. Detecting it at the SVG text stage, without seeing the rendered document, is difficult.

As a result, chat-based generative AI often produces documents with a lot of text overflow.

Of course, the AI generates most of the content well, and I could simply correct the overflow by hand. But that would mean doing manual work every time.

It therefore became necessary both to take measures that prevent text overflow when SVG documents are generated and to develop a way to automatically detect whether any overflow is present in the generated SVG.

To prevent text overflow, I gave the generative AI basic rules, a working procedure, and warnings when instructing it to create presentation materials.

As rules, I told it not to use complex figures and to keep the font size of the text fixed.

I also told it to follow a procedure: count the number of characters in each sentence in the document, multiply that by the font size to estimate the width and height, and confirm in advance that the text will not overflow the frame or the figures.

During this process, I told the AI to record what it checked and the results as pre-check comments inside the SVG file.
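
The estimate described in this procedure can be sketched in a few lines of Python. This is only an illustration, not the prompt or code actually used: the function names, the `char_width_ratio` of 1.0 (roughly right for full-width Japanese characters; Latin text is usually narrower), and the `line_height_ratio` of 1.2 are all assumptions.

```python
def estimate_text_box(text, font_size, char_width_ratio=1.0, line_height_ratio=1.2):
    """Return an estimated (width, height) for text drawn at font_size.

    Both ratios are assumed values, not measurements of a real font.
    """
    lines = text.split("\n")
    longest = max(len(line) for line in lines)
    # Width: character count of the longest line times the font size.
    width = longest * font_size * char_width_ratio
    # Height: number of lines times the font size, with some line spacing.
    height = len(lines) * font_size * line_height_ratio
    return width, height


def fits_in_box(text, font_size, box_width, box_height):
    """Return True if the estimated text size fits inside the given box."""
    width, height = estimate_text_box(text, font_size)
    return width <= box_width and height <= box_height
```

For example, `fits_in_box("Automated video generation", 24, 400, 40)` is false under these assumptions, because 26 characters at font size 24 are estimated at 624 units wide.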

Adding these instructions improved things a little, but the initial accuracy was not satisfactory. So I repeatedly generated variations, added common error points to the instructions as warnings, and restated with emphasis in the prompt text any rules and instructions that were not being followed.

Through repeated prompt refinements like these, text overflow could be controlled to some extent.

Even with all this effort, however, perfection was not possible, so I decided to add a check at a later stage.

For this post-generation check, I tried a generative AI that can visually inspect images, but it could not detect text overflow effectively, so I abandoned that approach.

Next, I tried another method: feeding the SVG text back into a chat-based generative AI for checking.

This method detected text overflow better than the visual-inspection AI, but its accuracy was still not very high. Here again, by iteratively improving the detection instructions I reached a certain level of accuracy, though not a perfect one.

I therefore decided to write a program to detect text overflow more rigorously. The program checks whether text overflows the document frame or internal figures by calculating width and height from the sentence lengths and font sizes in the presentation materials, the same calculation I instructed the generative AI to perform.
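
A heavily simplified sketch of such a checker is shown below. It is not the actual program. It assumes single-line `<text>` elements with numeric `x`, `y`, and `font-size` attributes, a `viewBox` on the root element, `<rect>` elements as the only figures, no transforms, and one character being about one font-size wide. A text is treated as belonging to a rectangle when its anchor point lies inside that rectangle. All names here are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
CHAR_WIDTH_RATIO = 1.0  # assumption: each character is about one font-size wide


def _num(value):
    """Parse an SVG length such as '24' or '24px' as a float."""
    value = str(value)
    return float(value[:-2] if value.endswith("px") else value)


def text_bounds(el, default_font_size=16.0):
    """Estimate (left, top, right, bottom) of a single-line <text> element."""
    x, y = _num(el.get("x", "0")), _num(el.get("y", "0"))
    size = _num(el.get("font-size", default_font_size))
    width = len("".join(el.itertext()).strip()) * size * CHAR_WIDTH_RATIO
    anchor = el.get("text-anchor", "start")
    if anchor == "middle":
        left = x - width / 2
    elif anchor == "end":
        left = x - width
    else:
        left = x
    # y is the baseline, so glyphs extend roughly one font-size above it.
    return left, y - size, left + width, y


def find_overflows(svg_text):
    """Return the text of every <text> element that leaves its container."""
    root = ET.fromstring(svg_text)
    view_box = root.get("viewBox").replace(",", " ").split()
    min_x, min_y, vb_w, vb_h = (float(v) for v in view_box)
    frame = (min_x, min_y, min_x + vb_w, min_y + vb_h)
    rects = []
    for r in root.iter(SVG_NS + "rect"):
        rx, ry = _num(r.get("x", "0")), _num(r.get("y", "0"))
        rects.append((rx, ry, rx + _num(r.get("width", "0")),
                      ry + _num(r.get("height", "0"))))
    overflowing = []
    for el in root.iter(SVG_NS + "text"):
        left, top, right, bottom = text_bounds(el)
        ax, ay = _num(el.get("x", "0")), _num(el.get("y", "0"))
        # Check the document frame plus every rect containing the anchor point.
        containers = [frame] + [c for c in rects
                                if c[0] <= ax <= c[2] and c[1] <= ay <= c[3]]
        if any(left < c[0] or top < c[1] or right > c[2] or bottom > c[3]
               for c in containers):
            overflowing.append("".join(el.itertext()).strip())
    return overflowing


SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 100">
  <rect x="10" y="50" width="50" height="30"/>
  <text x="10" y="30" font-size="10">Short</text>
  <text x="15" y="70" font-size="10">Inside box label</text>
</svg>"""

print(find_overflows(SAMPLE_SVG))
```

In the sample, the second label starts inside the 50-unit-wide rectangle but is estimated at 160 units wide, so it is the one reported. A real checker would also need to handle wrapped text, `<tspan>` positioning, transforms, and actual font metrics.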

Writing this program took considerable effort, but it was eventually able to detect overflow accurately.

Apart from text overflow, the AI sometimes attempts complex charts and produces distorted output. For such cases I still use the approach of having a chat-based generative AI perform a rule-violation check.

This check determines whether the AI has created figures more complex than the rules allow and marks them as unacceptable.

With the program checking for overflow and the generative AI checking for rule violations, most problems can now be detected.

What Happens Next

If either check rejects the material, the generated SVG is discarded and generated again. This is because pointing out the problem areas and having them corrected often introduces other issues and ends up taking more time.
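
The discard-and-regenerate policy amounts to a simple retry loop. The sketch below is an assumption about how it might look: `generate` and the `checks` are hypothetical callables standing in for the SVG generation call and the two checks, and the `max_attempts` limit is not mentioned in the article.

```python
def generate_until_accepted(generate, checks, max_attempts=5):
    """Call generate() until every check accepts the result.

    generate: callable returning SVG text (hypothetical stand-in for the AI call).
    checks: callables that take SVG text and return True when it is acceptable.
    Returns the accepted SVG text, or None if every attempt was rejected.
    """
    checks = list(checks)
    for _ in range(max_attempts):
        svg = generate()
        # A rejected result is thrown away rather than patched.
        if all(check(svg) for check in checks):
            return svg
    return None
```

Each attempt starts from scratch, so a rejected SVG never influences the next one.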

Once presentation material without text overflow is complete, the next step is to give this material and the original blog article to the generative AI to create the narration script. No special ingenuity is needed here.

The narration script is then converted into audio data using a text-to-speech generative AI. Again, no special techniques are needed.

Finally, the SVG-format presentation material is converted into PNG images and then, using a tool called ffmpeg, into an mp4 video with audio. This completes the process.
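
As an illustration of the last step, the sketch below builds one possible ffmpeg command that combines a single slide image with its narration audio. The article does not give the actual command, so the options, the one-slide-per-audio-file layout, and the file names are all assumptions. The PNG is assumed to already exist and to have even pixel dimensions, which the `yuv420p` pixel format requires.

```python
import subprocess


def build_ffmpeg_command(png_path, audio_path, output_path):
    """Build an ffmpeg command turning one still image plus audio into an mp4."""
    return [
        "ffmpeg", "-y",                 # overwrite the output file if it exists
        "-loop", "1", "-i", png_path,   # repeat the still image as video frames
        "-i", audio_path,               # narration audio
        "-c:v", "libx264", "-tune", "stillimage",
        "-pix_fmt", "yuv420p",          # widely playable pixel format
        "-c:a", "aac",
        "-shortest",                    # stop when the audio ends
        output_path,
    ]


def render_slide_video(png_path, audio_path, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_ffmpeg_command(png_path, audio_path, output_path),
                   check=True)
```

Calling `render_slide_video("slide01.png", "slide01.mp3", "slide01.mp4")` would produce one clip per slide. The clips would then still need to be joined into the final video.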

The series of steps after the SVG slides are created can easily be automated by writing programs while asking generative AI for advice.

Conclusion

Having built and refined this automated presentation video generation system on my own, I began publishing videos on YouTube last week.

Shortly after the system was completed, however, NotebookLM, Google's AI tool, also gained a similar feature that automatically generates videos explaining text documents.

It is therefore to be expected that companies offering AI services will release similar services in the future, making it unnecessary for individuals to build such systems from scratch.

Nonetheless, developing a practical program that makes serious use of generative AI has been a significant achievement, as it allowed me to understand the key principles of using generative AI effectively.