Research Goal
The broad objective of my research is to make human-language processing technology more inclusive by learning from limited labeled data.
In the modern era of artificial intelligence (AI), developing natural language processing (NLP) systems requires large-scale annotated data. Unfortunately, most large-scale labeled datasets are available in only a handful of domains and languages; for the vast majority of domains and languages, few or no annotations exist to power automated NLP applications. Hence, one focus of low-resource NLP research is to learn language representations from resource-rich domains or languages and transfer them to low-resource applications.

Representation learning has emerged as an indispensable ingredient of natural language understanding. It is used to capture notions such as the meanings of words, how words combine to form concepts, and how concepts relate to a specific NLP task. However, many crucial research questions remain mostly unsolved, including how to bridge the gap between languages (or domains) to learn universal language representations, how well such representations transfer across languages or domains, and how to exploit learning signals from multiple related tasks or unlabeled resources to learn generalizable representations.

My Ph.D. dissertation investigates new approaches to learning language representations that transfer across languages and domains. This research will benefit billions of users whose native languages are resource-scarce, and will facilitate text processing in essential domains such as public health, scientific literature, and security and privacy, where annotations are expensive to obtain.