
Multimodal Encoder Visualizer
Video Processing Explanation
App Demonstration
Above is a video explaining and demonstrating the web application I developed for research at DSLab. The work was sponsored by ETRI (Electronics and Telecommunications Research Institute) as part of the project "Development of Previsional Intelligence Based on Long-term Visual Memory Network".
We conducted research on multimodal video analysis for social forecasting of famous individuals, using a video dataset that we developed. The task is to predict whether a video will generate a spike in the Google Search Volume of the famous individuals who appear in it. This tool visually demonstrates how multimodal features are extracted from a video.
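To make the prediction target concrete, here is a minimal sketch of how a "spike" label could be derived from a search-volume series. The rule (peak volume after release compared against a pre-release baseline using a z-score), the function name `is_spike`, and the threshold of 2.0 are all illustrative assumptions, not the labeling rule used in the research.

```python
from statistics import mean, stdev


def is_spike(baseline, after, z_threshold=2.0):
    """Label a video as spike / no spike (hypothetical rule).

    baseline: search-volume values observed before the video was released
    after: search-volume values observed after the video was released
    z_threshold: assumed cutoff, in baseline standard deviations
    """
    if len(baseline) < 2 or not after:
        raise ValueError("need at least two baseline values and one post-release value")

    mu = mean(baseline)
    sigma = stdev(baseline)
    peak = max(after)

    # A perfectly flat baseline has zero spread, so any rise counts as a spike.
    if sigma == 0:
        return peak > mu

    return (peak - mu) / sigma > z_threshold


# Example with made-up numbers
baseline_volume = [10, 12, 11, 9, 10, 11, 10]
print(is_spike(baseline_volume, [11, 35, 20]))
print(is_spike(baseline_volume, [10, 11, 12]))
```

A binary label like this turns the forecasting problem into classification, which is what allows a transcript-only baseline and a multimodal model to be compared on the same target.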
Video dataset


Pie chart of the types of famous individuals
LLM Baseline results
Before switching to multimodal inputs, I fed video transcripts to an LLM (Gemini 2.5 Flash) and asked it to predict the public salience of the public figures, so that we could verify the feasibility of the research task.
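A transcript-only baseline of this kind can be sketched as a prompt builder plus a reply parser. The prompt wording, the helper names `build_prompt` and `parse_label`, and the 4000-character truncation are hypothetical and shown only for illustration. The actual model call is left out on purpose, so no particular client API is assumed.

```python
def build_prompt(person, transcript, max_chars=4000):
    """Build a yes/no question for the LLM (hypothetical wording)."""
    # Truncate long transcripts so the prompt stays within an assumed size budget.
    clipped = transcript[:max_chars]
    return (
        f"Public figure: {person}\n"
        "Transcript:\n"
        f"{clipped}\n"
        "Will this video cause a spike in search volume for this person? "
        "Answer Yes or No."
    )


def parse_label(reply):
    """Map a free-text reply to True / False, or None if it is unclear."""
    text = reply.strip().lower()
    if text.startswith("yes"):
        return True
    if text.startswith("no"):
        return False
    return None


prompt = build_prompt("Jane Doe", "Today we talk about the new album...")
print(prompt)
print(parse_label(" Yes, this is likely."))
```

Returning `None` for unclear replies keeps unparseable answers out of the accuracy count instead of silently scoring them as negatives.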

