Persistent URL of this record https://hdl.handle.net/1887/3391031
Documents
- Full Text (under embargo until 2026-06-24)
- Title Pages_Contents (open access)
- Chapter 2 (under embargo until 2026-06-24)
- Bibliography (open access)
- Summary in English (open access)
- Summary in Dutch (open access)
- Curriculum Vitae_Acknowledgements (open access)
- Propositions (open access)
Multi modal representation learning and cross-modal semantic matching
Humans perceive the real world through their sensory organs: vision, taste, hearing, smell, and touch. In terms of information, we consider these different modes, also referred to as different channels of information or modalities. Considering multiple channels of information at the same time is referred to as multimodal, and the input as multimedia. By their very nature, multimedia data are complex and often involve intertwined instances of different kinds of information. We can leverage this multimodal perspective to extract meaning and an understanding of the world. This is comparable to how our brain processes these multiple channels: we learn how to combine them and extract meaningful information from them. In this thesis, the learning is done by computer programs and smart algorithms, which is referred to as artificial intelligence. To that end, we have studied multimedia information, with a focus on vision and language information representation for semantic mapping. The aims of the semantic mapping learning in this thesis are: (1) visually supervised word embedding learning; (2) fine-grained label learning for vision representation; (3) kernel-based transformation for image and text association; (4) visual representation learning via a cross-modal contrastive learning framework.
- All authors: Wang, X.
- Supervisor: Verbeek, F.J.
- Co-supervisor: Du, Y.; Verberne, S.
- Committee: Plaat, A.; Mentens, N.; Lew, M.S.; Trautman, H.; Guo, Y.
- Qualification: Doctor (dr.)
- Awarding Institution: Leiden Institute of Advanced Computer Science (LIACS), Faculty of Science, Leiden University
- Date: 2022-06-24
- ISBN (print): 9789464217773