Generating Audio and Video

When it comes to generating audio and video, one example is probably worth more than a thousand words. Check out the following podcast episode covering one of my papers. I would argue that its quality matches that of many human-made podcasts. All it took was uploading a PDF of the paper, pressing “generate”, and waiting about five minutes. This opens up new opportunities for disseminating your research findings.

Such tools can also serve as an additional feedback mechanism to improve your writing. If the generated summary doesn’t reflect your main ideas well, it could be that the model is just not good enough, but it could also indicate opportunities to make your original materials clearer.

For more flexibility, you could use a text-to-speech service such as ElevenLabs. This lets you generate a script with your preferred LLM (e.g., ChatGPT or Claude), edit it to ensure quality and a clear message, and then produce high-quality narration with ElevenLabs tools. You can even use a clone of your own voice! Here’s how my cloned voice sounds:
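
If you would rather script the narration step than use the web interface, the workflow above could look roughly like the sketch below. It is a minimal illustration and not an official client. The endpoint path, the `xi-api-key` header, and the `model_id` value reflect my understanding of the public ElevenLabs REST API and should be checked against the current documentation. The function names, the voice ID, and the API key are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed base URL of the ElevenLabs text-to-speech endpoint; verify before use.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(script, voice_id, api_key, model_id="eleven_multilingual_v2"):
    """Build (but do not send) a narration request for an already edited script.

    voice_id identifies the voice to use, e.g. your own cloned voice.
    model_id is an assumed default; pick whichever model your account offers.
    """
    payload = json.dumps({"text": script, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=payload,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def narrate(script, voice_id, api_key, out_path="narration.mp3"):
    """Send the request and save the returned audio bytes to out_path."""
    request = build_tts_request(script, voice_id, api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as audio:
        audio.write(response.read())
    return out_path
```

Calling `narrate(edited_script, "YOUR_VOICE_ID", "YOUR_API_KEY")` would then write the narration to `narration.mp3`. Keeping the script as plain text until this last step means you can review and edit the LLM draft before any audio is generated.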

Video generation is also possible through services like Synthesia, as you may have noticed in the introductory video to this course. This, however, remains prohibitively expensive to use on a large scale.

The technology even lets you generate a video clone of yourself, though I would say the results aren’t yet realistic enough for professional use. Given the pace of development in this field, however, that will likely change.