Ducho is a newly proposed framework for extracting high-level features in multimodal-aware recommender systems. It comprises three core modules, Dataset, Extractor, and Runner, and allows extensive customization through a configuration component. Practitioners and researchers can use Ducho to process and extract features from modalities such as audio, visual, and textual data. The paper demonstrates the framework's capabilities in different extraction scenarios and lays the groundwork for future expansion, including support for additional backends and coupling with low-level feature extraction methods.
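The three-module design described above can be sketched conceptually as follows. This is an illustrative mock-up, not Ducho's actual API: the class names mirror the paper's module names, but the signatures, the stub extractor, and the orchestration logic are all assumptions made here for clarity.

```python
# Hypothetical sketch of Ducho's Dataset -> Extractor -> Runner pipeline.
# Names follow the paper's modules; all signatures are illustrative only.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Dataset:
    """Holds raw inputs for one modality (e.g. image paths, review texts)."""
    modality: str
    items: List[str]


@dataclass
class Extractor:
    """Wraps a pretrained backbone; here a toy stub instead of a real model."""
    model_name: str

    def extract(self, item: str) -> List[float]:
        # Stand-in for a forward pass through a pretrained network:
        # encode the first characters as floats to produce a "feature vector".
        return [float(ord(c)) for c in item[:4]]


class Runner:
    """Drives extraction for every configured modality."""

    def __init__(self, extractors: Dict[str, Extractor]):
        self.extractors = extractors

    def run(self, datasets: List[Dataset]) -> Dict[str, List[List[float]]]:
        features: Dict[str, List[List[float]]] = {}
        for ds in datasets:
            extractor = self.extractors[ds.modality]
            features[ds.modality] = [extractor.extract(x) for x in ds.items]
        return features


runner = Runner({"textual": Extractor("sentence-bert")})
feats = runner.run([Dataset("textual", ["great phone", "poor battery"])])
print(len(feats["textual"]))  # one feature vector per input item
```

The point of the separation is that swapping a backbone only touches the Extractor, while adding a modality only touches the configuration wiring the Runner together.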
Ducho is a framework for extracting high-level multimodal features, supporting researchers and practitioners in building advanced recommender systems.
The extraction pipeline is highly customizable, allowing users to configure modalities, sources of information, and extraction parameters to suit their needs.
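To make the configuration idea concrete, here is a hypothetical configuration fragment expressed as a Python dict. The keys, paths, and model names are illustrative assumptions, not Ducho's actual schema: each modality simply names its input source and the pretrained model (and, optionally, output layer) used for extraction.

```python
# Hypothetical per-modality configuration (keys and values are illustrative,
# not Ducho's actual configuration schema).
config = {
    "visual": {
        "input_path": "data/images",        # assumed source of item images
        "model": "ResNet50",                # assumed pretrained backbone
        "output_layer": "avgpool",          # assumed layer to read features from
    },
    "textual": {
        "input_path": "data/reviews.tsv",   # assumed source of item texts
        "model": "all-MiniLM-L6-v2",        # assumed sentence encoder
    },
}

# A runner could iterate the configured modalities and dispatch extraction.
for modality, opts in sorted(config.items()):
    print(f"{modality}: extract with {opts['model']} from {opts['input_path']}")
```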
Demonstrations of Ducho cover various scenarios, showcasing its ability to extract features from visual/textual, audio/textual, and textual interactions.
Future development plans for Ducho include broader backend support and a universal model interface for simplified multimodal feature extraction.