Seoul National University of Science and Technology Researchers Propose PV2DOC: A Tool to Summarize Presentation Videos into Structured Documents

Dec. 30, 2024, 14:36

The software converts presentation videos into searchable, well-organized PDFs with text summaries and relevant images

SEOUL, South Korea, Dec. 30, 2024 /PRNewswire/ -- You have likely encountered presentation-style videos that combine slides and spoken explanations. These videos have become a widely used medium for delivering information, particularly since the COVID-19 pandemic, when stay-at-home measures were implemented. While videos are an engaging way to access content, they have significant drawbacks: they take time to watch, and their large file sizes require considerable storage space.

Researchers led by Professor Hyuk-Yoon Kwon at Seoul National University of Science and Technology in South Korea aimed to address these issues with PV2DOC, a software tool that converts presentation videos into summarized documents. Other video summarizers require a transcript alongside the video and become ineffective when only the video is available. PV2DOC overcomes this limitation by combining the visual and audio data in the video and converting them into a document.

The paper describing PV2DOC was made available online on October 11, 2024, and was published in Volume 28 of the journal SoftwareX on December 1, 2024.

"For users who need to watch and study numerous videos, such as lectures or conference presentations, PV2DOC generates summarized reports that can be read within two minutes. Additionally, PV2DOC manages figures and tables separately, connecting them to the summarized content so users can refer to them when needed," explains Prof. Kwon.

For image processing, PV2DOC extracts frames from the video at one-second intervals and uses the structural similarity index measure (SSIM) to compare each frame with the previous one and identify unique frames. Objects in each frame, such as figures, tables, graphs, and equations, are then detected by two object detection models, Mask R-CNN and YOLOv5. During this process, some images may become fragmented due to whitespace or sub-figures. To resolve this, PV2DOC uses a figure merge technique that identifies overlapping areas and combines them into a single figure. It then applies optical character recognition (OCR) using the Google Tesseract engine to extract text from the images. The extracted text is organized into a structured format, such as headings and paragraphs.
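The frame-selection and figure-merge steps described above can be illustrated with a minimal Python sketch. It is not the PV2DOC implementation. It assumes that frames are flat lists of grayscale pixel values, that SSIM is computed over the whole frame as a single window (library implementations typically average over sliding windows), and that detected regions are axis-aligned boxes given as (x1, y1, x2, y2). The function names and the 0.9 threshold are hypothetical choices made for illustration.

```python
def global_ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM between two equal-size grayscale frames.

    Simplification (assumption): the whole frame is treated as one window.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    n = len(a)
    c1 = (0.01 * dynamic_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizer for the contrast/structure term
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def select_unique_frames(frames, threshold=0.9):
    """Return indices of frames that differ enough from the previous frame.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    kept = []
    for index, frame in enumerate(frames):
        if index == 0 or global_ssim(frames[index - 1], frame) < threshold:
            kept.append(index)
    return kept


def boxes_overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_overlapping_boxes(boxes):
    """Combine overlapping boxes into their bounding union until none overlap."""
    merged = []
    for box in boxes:
        box = tuple(box)
        changed = True
        while changed:  # a grown box may now overlap other merged boxes
            changed = False
            for other in merged:
                if boxes_overlap(box, other):
                    merged.remove(other)
                    box = (min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
        merged.append(box)
    return merged


# Toy data: two identical "slides" followed by a visibly different one.
slide_a = [10] * 8 + [200] * 8
slide_b = [200] * 8 + [10] * 8
unique_indices = select_unique_frames([slide_a, slide_a, slide_b])

# Two fragments of one figure plus a separate, distant figure.
figures = merge_overlapping_boxes([(0, 0, 10, 10), (8, 8, 20, 20), (50, 50, 60, 60)])
```

In this sketch the repeated slide is dropped because its similarity to the preceding frame stays above the threshold, and the two overlapping fragments are replaced by a single bounding box while the distant box is left alone.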

Simultaneously, PV2DOC extracts the audio from the video and uses the Whisper model, an open-source speech-to-text tool, to convert it into written text. The transcribed text is then summarized using the TextRank algorithm, creating a summary of the main points. The extracted images and text are combined into a Markdown document, which can be turned into a PDF file. The final document presents the video's content—such as text, figures, and formulas—in a clear and organized way, following the structure of the original video.
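The summarization and document-assembly steps can be sketched in the same way. The example below is a simplified, standard-library illustration of TextRank-style extractive summarization followed by Markdown output, not the code used in PV2DOC. It assumes the transcript is already available as a plain string (the speech-to-text step is not shown), uses word overlap as the sentence similarity, and introduces hypothetical function names, parameters, and file paths.

```python
import math
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word-overlap similarity, normalized by the log of each sentence's word count."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids a zero denominator, since log(1) == 0
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, k=2, damping=0.85, iterations=50):
    """Return the k highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):  # fixed-count power iteration of weighted PageRank
        scores = [(1 - damping) + damping * sum(
                      weights[j][i] / out_sum[j] * scores[j]
                      for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    return [sentences[i] for i in top]


def build_markdown(title, summary, figures):
    """Assemble a Markdown document from summary sentences and (caption, path) pairs."""
    lines = [f"# {title}", "", "## Summary", ""]
    lines += [f"- {sentence}" for sentence in summary]
    if figures:
        lines += ["", "## Figures", ""]
        lines += [f"![{caption}]({path})" for caption, path in figures]
    return "\n".join(lines) + "\n"


# Toy transcript: three related sentences and one unrelated closing remark.
transcript = (
    "PV2DOC extracts frames from the video. "
    "PV2DOC compares frames using structural similarity. "
    "Unique frames from the video feed object detection. "
    "Thanks everybody!"
)
summary = textrank_summary(transcript, k=2)
document = build_markdown("Example talk", summary, [("Figure 1", "figures/fig1.png")])
```

Because the closing remark shares no words with the other sentences, it receives only the baseline score and is left out of the summary. The resulting Markdown string could then be passed to any Markdown-to-PDF converter.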

By converting unorganized video data into structured, searchable documents, PV2DOC makes video content more accessible and reduces the storage space needed to share and store it. "This software simplifies data storage and facilitates data analysis for presentation videos by transforming unstructured data into a structured format, offering better information access and data management of presentation videos," says Prof. Kwon.

The researchers plan to further streamline video content into accessible formats. Their next goal is to train a large language model, similar to ChatGPT, to offer a question-answering service, where users can ask questions based on the content of the videos, with the model generating accurate, contextually relevant answers.

Reference
Title of original paper: PV2DOC: Converting the presentation video into the summarized document

Journal: SoftwareX

DOI: 10.1016/j.softx.2024.101922

About the institute
Seoul National University of Science and Technology (SEOULTECH)
Website: https://en.seoultech.ac.kr/

Media Contact:
Eunhee Lim
82-2-970-9166
388109@email4pr.com

View original content to download multimedia: https://www.prnewswire.com/news-releases/seoul-national-university-of-science-and-technology-researchers-propose-pv2doc-a-tool-to-summarize-presentation-videos-into-structured-documents-302340155.html

SOURCE Seoul National University of Science and Technology