The Internet of Things (IoT) produces streaming, large-scale, multimodal data (e.g., natural language, speech, image, video, audio, virtual reality, WiFi, GPS, RFID, vibration) over time. The statistical properties of these data often differ significantly across sensing modalities and temporal characteristics, and are hard to capture with conventional learning methods. Continual and multimodal learning enables knowledge learned from previous, heterogeneous experiential data to be integrated, adapted, and generalized to new situations. Continual and multimodal learning is therefore an important step toward improving the estimation, utilization, and security of real-world data from IoT devices.
This workshop aims to explore the intersection and combination of continual machine learning and multimodal modeling with applications in the Internet of Things. The workshop welcomes work addressing these issues in different applications and domains, such as natural language processing, computer vision, human-centric sensing, smart cities, health, etc. We aim to bring together researchers from different areas to establish a multidisciplinary community and share the latest research.
We focus on novel learning methods that can be applied to streaming multimodal data:
We also welcome continual learning methods that target:
Novel applications or interfaces for streaming multimodal data are also relevant topics.
As examples, the data modalities include, but are not limited to: natural language, speech, image, video, audio, virtual reality, biochemistry, WiFi, GPS, RFID, vibration, accelerometer, pressure, temperature, humidity, etc.
Please submit papers using the IJCAI author kit. We invite papers of 2 to 6 pages, plus additional pages for references; i.e., reference pages do not count toward the 6-page limit. The reviewing process is double-blind. Qualified accepted papers will be invited to submit extended versions to Frontiers in Big Data.
Abstract: Graph-structured data are ubiquitous and have been extensively used in many real-world applications. In this talk, I will present our recent work on graph representation learning with applications in multiple domains. First, we leverage line graph theory and propose novel graph neural networks that jointly learn embeddings for both nodes and edges. Second, we investigate how to incorporate commonsense and domain knowledge into graph representation learning, and present several applications in computer vision, natural language processing, and recommender systems. Finally, I will also discuss future work on knowledge-guided graph representation learning.
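To give a flavor of the general idea of jointly learning node and edge embeddings via a line-graph view (not the specific models presented in the talk), here is a minimal, self-contained PyTorch sketch; the layer sizes, update rules, and the toy triangle graph are illustrative assumptions.

```python
# Minimal sketch: one message-passing layer that updates both node and edge
# embeddings. Edges are treated as nodes of the line graph, so each edge
# aggregates information from its two endpoints. All design choices here are
# illustrative assumptions, not the models described in the abstract.
import torch
import torch.nn as nn

class JointNodeEdgeGNNLayer(nn.Module):
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        # Node update: self state + aggregated neighbor nodes + incident edges.
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + node_dim + edge_dim, node_dim), nn.ReLU())
        # Edge (line-graph node) update: self state + its two endpoint nodes.
        self.edge_mlp = nn.Sequential(
            nn.Linear(edge_dim + 2 * node_dim, edge_dim), nn.ReLU())

    def forward(self, x, e, edge_index):
        # x: [num_nodes, node_dim], e: [num_edges, edge_dim]
        # edge_index: [2, num_edges] with rows (source, target)
        src, dst = edge_index
        num_nodes = x.size(0)

        # Sum neighbor node states and incident edge states at each target node.
        nbr_sum = torch.zeros_like(x).index_add_(0, dst, x[src])
        edge_sum = torch.zeros(num_nodes, e.size(1)).index_add_(0, dst, e)
        x_new = self.node_mlp(torch.cat([x, nbr_sum, edge_sum], dim=-1))

        # Update each edge from its (updated) endpoint embeddings.
        e_new = self.edge_mlp(torch.cat([e, x_new[src], x_new[dst]], dim=-1))
        return x_new, e_new

# Toy usage: a directed triangle graph with random features.
x = torch.randn(3, 8)                              # 3 nodes, 8-dim features
e = torch.randn(3, 4)                              # 3 edges, 4-dim features
edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]])  # edges 0->1, 1->2, 2->0
layer = JointNodeEdgeGNNLayer(node_dim=8, edge_dim=4)
x, e = layer(x, e, edge_index)
```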
Bio: Dr. Sheng Li is an Assistant Professor of Computer Science at the University of Georgia (UGA). Before joining UGA in 2018, he was a Data Scientist at Adobe Research. He obtained his Ph.D. degree in computer engineering from Northeastern University in 2017. Dr. Li's research interests include graph-based machine learning, visual intelligence, user modeling, causal inference, and trustworthy artificial intelligence. He has published over 100 papers at peer-reviewed conferences and journals, and has received over 10 research awards, such as the INNS Young Investigator Award, M. G. Michael Award, Adobe Data Science Research Award, Cisco Faculty Award, and SIAM SDM Best Paper Award. He has served as Associate Editor of seven international journals such as IEEE Transactions on Circuits and Systems for Video Technology and IEEE Computational Intelligence Magazine, as an Area Chair of ICLR and ICPR, and as a Senior Program Committee member of AAAI and IJCAI. He is a senior member of IEEE.
Abstract: With the advent of models such as OpenAI's CLIP and DALL-E, transformer-based vision-and-language pre-training has become an increasingly hot research topic. In this talk, I will share some of our recent work in this direction and try to answer the following questions. First, how do we perform vision-and-language pre-training? Second, how can we enhance the performance of pre-trained models via adversarial training? Third, how robust are these pre-trained models? And finally, how can we extend image-text pre-training to video-text pre-training? Accordingly, I will present UNITER, VILLA, Adversarial VQA, HERO, and ClipBERT to answer these questions. To conclude, I will also briefly discuss the challenges and future directions for vision-and-language pre-training.
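For readers unfamiliar with CLIP-style image-text pre-training, the sketch below shows the general contrastive recipe (not UNITER, VILLA, HERO, or ClipBERT themselves). The encoders are placeholder linear layers standing in for a vision transformer and a text transformer; all dimensions and the toy batch are assumptions for illustration.

```python
# Minimal sketch of CLIP-style contrastive pre-training: encode images and
# texts into a shared embedding space and pull matched pairs together while
# pushing mismatched pairs apart within a batch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDualEncoder(nn.Module):
    def __init__(self, img_dim=512, txt_dim=300, embed_dim=128):
        super().__init__()
        self.image_encoder = nn.Linear(img_dim, embed_dim)   # stand-in for a ViT
        self.text_encoder = nn.Linear(txt_dim, embed_dim)    # stand-in for a text Transformer
        self.logit_scale = nn.Parameter(torch.tensor(2.659)) # learnable log-temperature

    def forward(self, images, texts):
        img = F.normalize(self.image_encoder(images), dim=-1)
        txt = F.normalize(self.text_encoder(texts), dim=-1)
        return img, txt

def clip_contrastive_loss(img, txt, logit_scale):
    # Cosine-similarity matrix over all image-text pairs in the batch;
    # matched pairs lie on the diagonal.
    logits = logit_scale.exp() * img @ txt.t()
    targets = torch.arange(img.size(0))
    # Symmetric cross-entropy: image-to-text and text-to-image directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random "features" for a batch of 8 paired examples.
model = ToyDualEncoder()
img_emb, txt_emb = model(torch.randn(8, 512), torch.randn(8, 300))
loss = clip_contrastive_loss(img_emb, txt_emb, model.logit_scale)
loss.backward()
```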
Bio: Dr. Zhe Gan is a Principal Researcher at Microsoft. He received his Ph.D. from Duke University in 2018, and his Master's and Bachelor's degrees from Peking University in 2013 and 2010, respectively. His current research interests include vision-and-language representation learning, self-supervised pre-training, and adversarial machine learning. He received the Best Student Paper Honorable Mention Award at CVPR 2021 and WACV 2021, and the Outstanding Senior Program Committee Member Award at AAAI 2020. He regularly serves as an Area Chair for NeurIPS, ICML, ICLR, ACL, and AAAI.