[paper] [code] [data] Abstract: Causality can help human reasoning and decision making. We extract causality (cause-effect pairs) from a web corpus and build a web-scale causality network (CausalNet). Backed by this network, we propose a framework for commonsense causal reasoning. Cite: Zhiyi Luo, Yuchen Sha, Kenny Q. Zhu, Seung-won Hwang, Zhongyuan Wang, "Commonsense Causal Reasoning between Short Texts", Proc. of 15th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR'2016), Cape Town, South Africa.
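A minimal sketch of how a network of harvested (cause, effect) pairs can back a causal reasoning score: compute a PMI-style causal strength between two terms from pair counts. The counts and the exact `causal_strength` form below are illustrative assumptions, not the paper's formula.

```python
from collections import Counter
import math

# Hypothetical harvested cause-effect pair counts (the real network is web-scale).
pairs = Counter({("rain", "flood"): 120, ("rain", "wet"): 300,
                 ("fire", "smoke"): 250, ("smoke", "alarm"): 80})

total = sum(pairs.values())
cause_cnt, effect_cnt = Counter(), Counter()
for (c, e), n in pairs.items():
    cause_cnt[c] += n
    effect_cnt[e] += n

def causal_strength(cause, effect):
    """PMI-style strength between a cause and an effect (assumed form)."""
    joint = pairs[(cause, effect)]
    if joint == 0:
        return 0.0
    return math.log(joint * total / (cause_cnt[cause] * effect_cnt[effect]))

print(round(causal_strength("rain", "flood"), 3))
```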
[paper] [code] [data] Abstract: Predicting endings for narrative stories is a grand challenge for machine commonsense reasoning. The task requires accurate representation of the story semantics and structured logic knowledge. Pre-trained language models such as BERT made progress on this task recently by exploiting spurious statistical patterns in the test dataset, instead of “understanding” the stories per se. In this paper, we propose to improve the representation of stories by first simplifying the sentences to key concepts and then modeling the latent relationships between the key ideas within the story. Such enhanced sentence representation, when used with pre-trained language models, makes substantial gains in prediction accuracy on the popular Story Cloze Test without utilizing the biased validation data. Cite: Shanshan Huang, Kenny Q. Zhu*, Qianzi Liao, Libin Shen and Yinggong Zhao. Enhanced Story Representation by ConceptNet for Predicting Story Endings. In the Proceedings of CIKM 2020.
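Simplifying sentences to key concepts and linking them via ConceptNet could be sketched as filtering content words and looking up relations between concepts in adjacent sentences; the stopword filter and the tiny relation table below are illustrative stand-ins for the real pipeline.

```python
# A tiny stand-in for ConceptNet triples; in practice the real graph is queried.
concept_net = {("dog", "bark"): "CapableOf", ("rain", "wet"): "Causes"}
STOPWORDS = {"the", "a", "was", "and", "it", "got", "fell"}

def key_concepts(sentence):
    """Naive content-word filter standing in for sentence simplification."""
    return [w for w in sentence.lower().split() if w not in STOPWORDS]

def latent_relations(story_sentences):
    """Collect ConceptNet relations between concepts of consecutive sentences."""
    concepts = [key_concepts(s) for s in story_sentences]
    relations = []
    for i in range(len(concepts) - 1):
        for a in concepts[i]:
            for b in concepts[i + 1]:
                if (a, b) in concept_net:
                    relations.append((a, concept_net[(a, b)], b))
    return relations

print(latent_relations(["The rain fell", "It got wet"]))  # [('rain', 'Causes', 'wet')]
```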
[paper] [code] [data] Abstract: Audio tagging aims at predicting the sound events that occur in a recording. Traditional models require enormous amounts of laborious annotation; otherwise, their performance degrades sharply. We therefore investigate robust audio tagging models for low-resource scenarios, enhanced with knowledge graphs. Besides existing ontological knowledge, we propose a semi-automatic approach that constructs temporal knowledge graphs over diverse domain-specific label sets. Moreover, we leverage a variant of relation-aware graph neural network, D-GCN, to combine the strengths of the two knowledge types. Experiments on the AudioSet and SONYC urban sound tagging datasets suggest the effectiveness of the introduced temporal knowledge, and the advantage of the combined KGs with D-GCN over a single knowledge source. Cite: Zhiling Zhang, Zelin Zhou, Haifeng Tang, Guangwei Li, Mengyue Wu*, Kenny Q. Zhu*. Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging. In the Proceedings of CIKM 2021.
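As a rough illustration of a relation-aware graph convolution in the spirit of (but not identical to) D-GCN: each relation type (ontological, temporal) gets its own weight matrix, and per-relation messages are aggregated for every label node. Shapes and the relation set are made up.

```python
import numpy as np

def relation_aware_gcn_layer(h, adjs, weights):
    """h: (N, d) node features; adjs: dict rel -> (N, N) adjacency;
    weights: dict rel -> (d, d). One illustrative message-passing step."""
    out = np.zeros_like(h)
    for rel, a in adjs.items():
        deg = a.sum(axis=1, keepdims=True).clip(min=1)  # degree normalization
        out += (a / deg) @ h @ weights[rel]
    return np.tanh(out)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                              # 4 sound-event nodes
adjs = {"ontological": rng.integers(0, 2, (4, 4)).astype(float),
        "temporal":    rng.integers(0, 2, (4, 4)).astype(float)}
weights = {r: rng.normal(scale=0.1, size=(8, 8)) for r in adjs}
print(relation_aware_gcn_layer(h, adjs, weights).shape)  # (4, 8)
```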
[paper] [code] [data] Abstract: Convolutional neural networks (CNNs) have met with great success in abstractive summarization, but they cannot effectively generate summaries of desired lengths. Because generated summaries are used in different scenarios that may have space or length constraints, the ability to control summary length in abstractive summarization is an important problem. In this paper, we propose an approach to constrain the summary length by extending a convolutional sequence-to-sequence model. The results show that this approach generates high-quality summaries with user-defined lengths, and outperforms the baselines consistently in terms of ROUGE score, length variation and semantic similarity. Cite: Yizhu Liu, Zhiyi Luo and Kenny Q. Zhu. Controlling Length in Abstractive Summarization Using a Convolutional Neural Network. In the Proceedings of EMNLP 2018.
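One common way to make a decoder length-aware is to feed it a "remaining length" signal at every step; the paper extends a convolutional seq2seq model, and the exact mechanism may differ from this toy sketch, where a remaining-length embedding is concatenated to each decoder input.

```python
import torch
import torch.nn as nn

class LengthAwareDecoderInput(nn.Module):
    """Concatenate a remaining-length embedding to each decoder input.
    Illustrative only; the paper's actual conv seq2seq extension may differ."""
    def __init__(self, vocab, dim, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.rem = nn.Embedding(max_len + 1, dim)

    def forward(self, tokens, desired_len):
        # tokens: (B, T); the length budget counts down from desired_len.
        steps = torch.arange(tokens.size(1), device=tokens.device)
        remaining = (desired_len.unsqueeze(1) - steps).clamp(min=0)  # (B, T)
        return torch.cat([self.tok(tokens), self.rem(remaining)], dim=-1)

emb = LengthAwareDecoderInput(vocab=1000, dim=32)
x = torch.randint(0, 1000, (2, 10))
out = emb(x, desired_len=torch.tensor([30, 8]))
print(out.shape)  # torch.Size([2, 10, 64])
```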
[paper] [code] [data] Abstract: Product descriptions on e-commerce websites often suffer from missing important aspects. Clarification question generation (CQGen) is a promising approach to help alleviate the problem. Unlike traditional QGen, which assumes the existence of answers in the context and generates questions accordingly, CQGen mimics user behaviors of asking for unstated information. The generated CQs can serve as a sanity check or proofreading to help e-commerce merchants identify potentially missing information before advertising their products, and consequently improve the consumer experience. Due to the variety of possible user backgrounds and use cases, the information need can be quite diverse yet specific to a detailed topic, while previous works assume generating one CQ per context and the results tend to be generic. We thus propose the task of Diverse CQGen and also tackle the challenge of specificity. We propose a new model named KPCNet, which generates CQs with Keyword Prediction and Conditioning, to deal with these tasks. Automatic and human evaluation on two datasets (Home & Kitchen, Office) showed that KPCNet can generate more specific questions and promote better group-level diversity than several competing baselines. Cite: Zhiling Zhang and Kenny Q. Zhu. Diverse and Specific Clarification Question Generation with Keywords. In Proceedings of The Web Conference 2021 (WWW '21).
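The two-stage idea behind keyword prediction and conditioning can be sketched as: score vocabulary words as keywords for the product context, then condition the question decoder on different top-scoring keyword sets to get diverse, specific questions. The components below are toy stand-ins, not KPCNet's architecture.

```python
import torch
import torch.nn as nn

class KeywordPredictor(nn.Module):
    """Scores each vocabulary word as a keyword for the product context;
    a toy stand-in for KPCNet's keyword prediction component."""
    def __init__(self, dim, vocab):
        super().__init__()
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, ctx_emb):
        _, h = self.enc(ctx_emb)                # encode the product context
        return torch.sigmoid(self.out(h[-1]))   # (B, vocab) keyword probabilities

pred = KeywordPredictor(dim=16, vocab=500)
ctx = torch.randn(2, 7, 16)                     # embedded context tokens
topk = pred(ctx).topk(3, dim=-1).indices        # candidate keyword sets
print(topk.shape)  # conditioning the decoder on each set yields distinct CQs
```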
[paper] [code] [data] Abstract: Paraphrase generation plays a key role in NLP tasks such as question answering, machine translation, and information retrieval. In this paper, we propose a novel framework for paraphrase generation. It simultaneously decodes the output sentence using a pretrained wordset-to-sequence model and a round-trip translation model. We evaluate this framework on Quora, WikiAnswers, MSCOCO and Twitter, and show its advantage over previous state-of-the-art unsupervised and distantly-supervised methods by significant margins on all datasets. For Quora and WikiAnswers, our framework even performs better than some strongly supervised methods with domain adaptation. Further, we show that the generated paraphrases can be used to augment the training data for machine translation to achieve substantial improvements. Cite: Zilu Guo, Zhongqiang Huang, Kenny Zhu, Guandan Chen, Kaibo Zhang, Boxing Chen and Fei Huang. Automatic Paraphrasing via Sentence Reconstruction and Round-trip Translation. In Proceedings of IJCAI 2021.
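Decoding one output sentence with two models at once boils down to combining their next-token distributions at every step. A hedged sketch: log-linear interpolation is one simple combination rule, assumed here; the paper's actual rule may differ.

```python
import numpy as np

def joint_step(p_wordset2seq, p_roundtrip, alpha=0.5):
    """Combine next-token distributions from the two decoders
    via log-linear interpolation (an assumed combination rule)."""
    logp = alpha * np.log(p_wordset2seq + 1e-12) + \
           (1 - alpha) * np.log(p_roundtrip + 1e-12)
    p = np.exp(logp - logp.max())
    return p / p.sum()

p1 = np.array([0.1, 0.7, 0.2])   # wordset-to-sequence model's distribution
p2 = np.array([0.3, 0.3, 0.4])   # round-trip translation model's distribution
print(joint_step(p1, p2).round(3))
```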
[paper] [code] [data] Abstract: Abstractive summarization is useful in providing a summary or a digest of news or other web texts and enhancing users' reading experience, especially when they are reading on small displays such as mobile phones. However, existing encoder-decoder summarization models have difficulty learning the latent alignment between source documents and summaries because of their vast disparity in length. In this paper, we propose an extractor-abstractor framework in which a keyword-based extractor selects a few sets of salient sentences from the input document, and the abstractor then paraphrases these sets of sentences, which are more closely aligned to the summary, in parallel to generate the final summary. The new extractor and abstractor are pretrained on a set of "pseudo summaries" extracted by specially designed heuristics, and then further trained together in a reinforcement learning framework. The results show that the proposed model generates high-quality summaries with faster training speed and a smaller training memory footprint, and outperforms the state-of-the-art models on the CNN/Daily Mail, Webis-TLDR-17, Webis-Snippet-20, WikiHow and DUC-2002 datasets. Cite: Yizhu Liu, Qi Jia and Kenny Q. Zhu. Keyword-aware Abstractive Summarization by Extracting Set-level Intermediate Summaries. In Proceedings of The Web Conference 2021 (WWW '21).
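Pseudo summaries for pretraining extractors are often built by greedily picking source sentences that maximize overlap with the reference; this simplified sketch uses unigram F1 in place of the paper's specially designed heuristics.

```python
def unigram_f1(cand, ref):
    c, r = set(cand.lower().split()), set(ref.lower().split())
    if not c or not r:
        return 0.0
    overlap = len(c & r)
    p, q = overlap / len(c), overlap / len(r)
    return 2 * p * q / (p + q) if p + q else 0.0

def greedy_pseudo_summary(sentences, reference, k=3):
    """Greedily add the sentence that most improves overlap with the reference."""
    chosen = []
    while len(chosen) < k:
        current = unigram_f1(" ".join(chosen), reference)
        best, best_gain = None, 0.0
        for s in sentences:
            if s in chosen:
                continue
            gain = unigram_f1(" ".join(chosen + [s]), reference) - current
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:
            break
        chosen.append(best)
    return chosen

doc = ["the cat sat on the mat", "stocks fell on friday", "a cat was seen"]
print(greedy_pseudo_summary(doc, "the cat sat", k=2))
```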
[paper] [code] [data] Abstract: Convolutional sequence-to-sequence (CNN seq2seq) models have met with success in abstractive summarization. However, their outputs often contain repetitive word sequences and logical inconsistencies, limiting their practicality. In this paper, we identify the reasons behind the repetition problem in CNN-based abstractive summarization by observing the attention map between summaries with repetition and their corresponding source documents, and we mitigate the problem accordingly. We propose to reduce repetition in summaries with an attention filter mechanism (ATTF) and a sentence-level backtracking decoder (SBD), which dynamically redistribute attention over the input sequence as the output sentences are generated. The ATTF directly records previously attended locations in the source document and prevents the decoder from attending to these locations again. The SBD prevents the decoder from generating similar sentences more than once via backtracking at test time. The proposed model outperforms the baselines in terms of ROUGE score, repetition and readability. The results show that this approach generates high-quality summaries with minimal repetition and improves the reading experience. Cite: Yizhu Liu, Xinyue Chen, Xusheng Luo and Kenny Q. Zhu. Reducing Repetition in Convolutional Abstractive Summarization. Natural Language Engineering, 2021.
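The attention filter idea can be sketched as masking source positions that earlier decoding steps already attended to heavily, then renormalizing; the threshold and bookkeeping here are assumptions, not the paper's exact mechanism.

```python
import numpy as np

def attf_step(scores, attended_mask, threshold=0.5):
    """scores: (S,) raw attention scores for one decoding step.
    Mask positions already 'used', renormalize, and record new ones."""
    masked = np.where(attended_mask, -np.inf, scores)
    weights = np.exp(masked - masked.max())
    weights /= weights.sum()
    attended_mask = attended_mask | (weights > threshold)
    return weights, attended_mask

S = 6
mask = np.zeros(S, dtype=bool)
rng = np.random.default_rng(1)
for step in range(3):
    w, mask = attf_step(rng.normal(size=S), mask)
    print(step, w.round(2), mask)
```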
[paper] [code] [data] Abstract: Previous length-controllable summarization models mostly control length at the decoding stage, whereas the encoding, or the selection of information from the source document, is not sensitive to the desired length. They also tend to generate summaries as long as those in the training data. In this paper, we propose a length-aware attention mechanism (LAAM) to adapt the encoding of the source based on the desired length. Our approach works by training LAAM on a summary-length-balanced dataset built from the original training data, and then fine-tuning as usual. Results show that this approach is effective in generating high-quality summaries with desired lengths, including short lengths never seen in the original training set. Cite: Yizhu Liu, Qi Jia and Kenny Q. Zhu. Length Control in Abstractive Summarization by Pretraining Information Selection. In Proceedings of ACL 2022.
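One plausible reading of length-aware attention is to bias the attention distribution by a remaining-length signal, so a tight budget sharpens attention on the highest-scoring source tokens. This sketch is an assumption about the mechanism, not the paper's exact formulation.

```python
import numpy as np

def length_aware_attention(q, K, remaining_ratio):
    """Temperature-scale attention logits by the remaining length budget
    (assumed form): low budget -> sharper, more selective attention."""
    logits = K @ q / np.sqrt(q.size)
    temperature = max(remaining_ratio, 0.1)
    z = logits / temperature
    w = np.exp(z - z.max())
    return w / w.sum()

rng = np.random.default_rng(0)
q, K = rng.normal(size=8), rng.normal(size=(5, 8))
print(length_aware_attention(q, K, remaining_ratio=1.0).round(2))  # diffuse
print(length_aware_attention(q, K, remaining_ratio=0.2).round(2))  # sharp
```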
[paper] [code] [data] Abstract: A document can be summarized in a number of ways. Reference-based evaluation of summarization has been criticized for its inflexibility. In this paper, we propose a new automatic reference-free evaluation metric that compares the semantic distributions of the source document and the summary using pretrained language models, and also considers the summary compression ratio. The experiments show that this metric is more consistent with human evaluation in terms of coherence, consistency, relevance and fluency. Cite: Yizhu Liu, Qi Jia and Kenny Q. Zhu. Reference-free Summarization Evaluation via Semantic Correlation and Compression Ratio. In Proceedings of NAACL 2022.
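A minimal sketch of a reference-free score that multiplies a semantic-similarity term by a compression-ratio term. The bag-of-words cosine stands in for pretrained-LM embeddings, and the target ratio and combination rule are illustrative assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def ref_free_score(document, summary, target_cr=0.2):
    """Semantic term (a stand-in for PLM embeddings) times a compression
    term that peaks at an assumed target compression ratio."""
    sem = cosine(Counter(document.split()), Counter(summary.split()))
    cr = len(summary.split()) / max(len(document.split()), 1)
    compress = math.exp(-abs(cr - target_cr))
    return sem * compress

doc = "the market rallied as tech stocks led gains across the board"
print(round(ref_free_score(doc, "tech stocks led market gains"), 3))
```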
[paper] [code] [data] Abstract: Opinion summarization of multiple reviews suffers from the lack of reference summaries for training. Most previous approaches construct multiple reviews and their summary based on textual similarities between reviews, resulting in information mismatch between the review input and the summary. In this paper, we convert each review into a mix of structured and unstructured data, which we call opinion-aspect pairs (OAs) and implicit sentences (ISs). We propose a new method to synthesize training pairs with such mix-structured data as input and the textual summary as output, and design a summarization model with an OA encoder and an IS encoder. Experiments show that our approach outperforms previous methods on the Yelp, Amazon and RottenTomatoes datasets. Cite: Yizhu Liu, Qi Jia and Kenny Q. Zhu. Opinion Summarization by Weak-Supervision from Mix-structured Data. In Proceedings of EMNLP 2022.
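Splitting a review into opinion-aspect pairs and implicit sentences could be sketched as lexicon matching per sentence; the aspect and opinion lexicons below are fabricated, and the paper's actual extraction is more sophisticated.

```python
# Hypothetical lexicons; the paper's extraction method differs.
ASPECTS = {"food", "service", "price"}
OPINIONS = {"great", "slow", "cheap", "terrible"}

def split_review(review):
    """Split a review into opinion-aspect pairs (structured data) and
    implicit sentences (the unstructured leftovers)."""
    oas, implicit = [], []
    for sent in review.split("."):
        words = set(sent.lower().split())
        ops, asps = words & OPINIONS, words & ASPECTS
        if ops and asps:
            oas.append((ops.pop(), asps.pop()))
        elif sent.strip():
            implicit.append(sent.strip())
    return oas, implicit

print(split_review("Great food. Service was slow. We will come back"))
# ([('great', 'food'), ('slow', 'service')], ['We will come back'])
```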
[paper] [code] [data] Abstract: Matching question-answer relations between two turns in conversations is not only the first step in analyzing dialogue structures, but also valuable for training dialogue systems. This paper presents a QA matching model that considers both distance information and dialogue history through two simultaneous attention mechanisms called mutual attention. Given the scores computed by the trained model between each non-question turn and its candidate questions, a greedy matching strategy is used for final predictions. Because existing dialogue datasets such as the Ubuntu dataset are not suitable for the QA matching task, we further create a dataset with 1,000 labeled dialogues and demonstrate that our proposed model outperforms the state-of-the-art and other strong baselines, particularly for matching long-distance QA pairs. Cite: Qi Jia, Mengxue Zhang, Shengyao Zhang and Kenny Q. Zhu. Matching Questions and Answers in Dialogues from Online Forums. In the Proceedings of ECAI 2020.
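Given a score matrix between answer turns and candidate questions, greedy matching repeatedly takes the best remaining pair; the score matrix and one-to-one constraint below are illustrative.

```python
import numpy as np

def greedy_match(scores, threshold=0.5):
    """scores[i, j]: model score that answer turn i replies to question j.
    Repeatedly pick the best remaining (answer, question) pair."""
    scores = scores.copy()
    matches = []
    while True:
        i, j = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[i, j] < threshold:
            break
        matches.append((i, j))
        scores[i, :] = -1   # each answer matched at most once
        scores[:, j] = -1   # each question matched at most once
    return matches

S = np.array([[0.9, 0.2, 0.1],
              [0.4, 0.8, 0.3],
              [0.2, 0.3, 0.4]])
print(greedy_match(S))  # [(0, 0), (1, 1)]
```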
[paper] [code] [data] Abstract: Multi-turn response selection is a task designed for developing dialogue agents. Performance on this task has improved remarkably with pre-trained language models. However, these models simply concatenate the turns in the dialogue history as input and largely ignore the dependencies between the turns. In this paper, we propose a dialogue extraction algorithm to transform a dialogue history into threads based on their dependency relations. Each thread can be regarded as a self-contained sub-dialogue. We also propose a Thread-Encoder model that encodes threads and candidates into compact representations using pre-trained Transformers and finally obtains the matching score through an attention layer. The experiments show that dependency relations are helpful for dialogue context understanding, and that our model outperforms the state-of-the-art baselines on both DSTC7 and DSTC8*, with competitive results on UbuntuV2. Cite: Qi Jia, Yizhu Liu, Siyu Ren, Kenny Q. Zhu and Haifeng Tang. Multi-turn Response Selection using Dialogue Dependency Relations. In the Proceedings of EMNLP 2020.
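Given per-turn dependency (reply-to) links, threads can be read off by following each leaf turn back to the session root; the dependency list here is fabricated, and the paper's extraction algorithm may differ in detail.

```python
def extract_threads(reply_to):
    """reply_to[i] = index of the turn that turn i depends on (None = root).
    Each leaf-to-root chain is one self-contained sub-dialogue thread."""
    children = {p for p in reply_to if p is not None}
    leaves = [i for i in range(len(reply_to)) if i not in children]
    threads = []
    for leaf in leaves:
        chain, cur = [], leaf
        while cur is not None:
            chain.append(cur)
            cur = reply_to[cur]
        threads.append(list(reversed(chain)))
    return threads

# Turn 0 is the root; 1 and 3 reply to 0; 2 replies to 1; 4 replies to 3.
print(extract_threads([None, 0, 1, 0, 3]))  # [[0, 1, 2], [0, 3, 4]]
```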
[paper] [code] [data] Abstract: Interpersonal language style shifting in dialogues is an interesting and almost instinctive ability of humans. Understanding interpersonal relationships from language content is also a crucial step toward further understanding dialogues. Previous work mainly focuses on relation extraction between named entities in texts or within a single dialogue session. In this paper, we propose the task of relation classification of interlocutors based on their dialogues. We crawled movie scripts from IMSDb and annotated the relation labels for each session according to 13 pre-defined relationships. The annotated dataset DDRel consists of 6,300 dyadic dialogue sessions between 694 pairs of speakers with 53,126 utterances in total. We also construct session-level and pair-level relation classification tasks with widely-accepted baselines. The experimental results show that this task is challenging for existing models and that the dataset will be useful for future research. Cite: Qi Jia, Hongru Huang and Kenny Q. Zhu. DDRel: A New Dataset for Interpersonal Relation Classification in Dyadic Dialogues. In the Proceedings of AAAI 2021.
[paper] [code] [data] Abstract: Answering complex questions that involve multiple entities and multiple relations using a standard knowledge base is an open and challenging task. Most existing KBQA approaches focus on simpler questions and do not work very well on complex questions, because they are unable to simultaneously represent the question and the corresponding complex query structure. In this work, we encode such complex query structures into a uniform vector representation, and thus successfully capture the interactions between individual semantic components within a complex question. This approach consistently outperforms existing methods on complex questions while staying competitive on simple questions. Cite: Kangqi Luo, Fengli Lin, Xusheng Luo and Kenny Zhu. Knowledge Base Question Answering via Encoding of Complex Query Graphs. In the Proceedings of EMNLP 2018.
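The core idea, encoding a complex query graph into one vector that interacts with the question encoding, can be sketched as encoding each semantic component (relation path) and pooling; the specific layers and pooling choice below are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class QueryGraphEncoder(nn.Module):
    """Encode each semantic component (path) of a query graph, pool them
    into a uniform vector, and score it against the question encoding."""
    def __init__(self, dim):
        super().__init__()
        self.path_enc = nn.GRU(dim, dim, batch_first=True)
        self.score = nn.Bilinear(dim, dim, 1)

    def forward(self, component_embs, question_vec):
        # component_embs: (num_components, steps, dim) relation-path embeddings
        _, h = self.path_enc(component_embs)
        graph_vec = h[-1].max(dim=0).values.unsqueeze(0)   # pool components
        return self.score(graph_vec, question_vec.unsqueeze(0))

enc = QueryGraphEncoder(dim=16)
comps = torch.randn(3, 4, 16)   # 3 semantic components, 4 steps each
q = torch.randn(16)             # question encoding
print(enc(comps, q).shape)      # torch.Size([1, 1]) matching score
```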
[paper] [code] [data] Abstract: Previous dialogue summarization techniques adapt large language models pretrained on narrative text by injecting dialogue-specific features into the models. These features either require additional knowledge to recognize or make the resulting models harder to tune. To bridge the format gap between dialogues and narrative summaries in dialogue summarization tasks, we propose to post-train pretrained language models (PLMs) to rephrase dialogues into narratives. The model is then fine-tuned for dialogue summarization as usual. Comprehensive experiments show that our approach significantly improves vanilla PLMs on dialogue summarization and outperforms other SOTA models in terms of summary quality and implementation cost. Cite: Qi Jia, Yizhu Liu, Haifeng Tang and Kenny Q. Zhu. Post-Training Dialogue Summarization using Pseudo-Paraphrasing. In Findings of NAACL 2022.
[paper] [code] [data] Abstract: Existing automatic evaluation systems for chatbots mostly rely on static chat scripts as ground truth, which are hard to obtain and require access to the models of the bots as a form of “white-box testing”. Interactive evaluation mitigates this problem but requires human involvement. In our work, we propose an interactive chatbot evaluation framework in which chatbots compete with each other as in a sports tournament, using flexible scoring metrics. This framework can efficiently rank chatbots independently of their model architectures and the domains for which they are trained. Cite: Ruolan Yang, Zitong Li, Haifeng Tang and Kenny Q. Zhu. ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments. In Proceedings of ACL 2022.
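A tournament framework reduces to pairing bots, scoring each chat, and aggregating points into a ranking. A minimal round-robin sketch, with a random judge standing in for the framework's flexible scoring metrics:

```python
import itertools
import random

def judge(bot_a, bot_b):
    """Placeholder scoring metric for one bot-vs-bot chat; the real
    framework scores the actual chat log with flexible metrics."""
    return random.choice([(1, 0), (0, 1), (0.5, 0.5)])  # win/loss/draw points

def round_robin(bots, rounds=2):
    points = {b: 0.0 for b in bots}
    for _ in range(rounds):
        for a, b in itertools.combinations(bots, 2):
            pa, pb = judge(a, b)     # the bots chat; the judge scores the log
            points[a] += pa
            points[b] += pb
    return sorted(points.items(), key=lambda kv: -kv[1])

random.seed(0)
print(round_robin(["bot_A", "bot_B", "bot_C"]))
```

Because the ranking only depends on chat logs and a scoring function, it needs no access to model internals, which is the "black-box" appeal described above.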
[paper] [code] [data] Abstract: Few-shot relation classification seeks to classify incoming query instances after seeing only a few support instances. This ability is gained by training with large amounts of in-domain annotated data. In this paper, we tackle an even harder problem by further limiting the amount of data available at training time. We propose a few-shot learning framework for relation classification that is particularly powerful when the training data is very small. In this framework, models not only strive to classify query instances, but also seek underlying knowledge about the support instances to obtain better instance representations. The framework also includes a method for aggregating cross-domain knowledge into models by open-source task enrichment. Additionally, we construct a brand-new dataset, TinyRel-CM, a few-shot relation classification dataset in the health domain with purposely small training data and challenging relation classes. Experimental results demonstrate that our framework brings performance gains for most underlying classification models, outperforms the state-of-the-art results given small training data, and achieves competitive results with sufficiently large training data. Cite: Xiaoqing Geng, Xiwen Chen and Kenny Q. Zhu. MICK: A Meta-Learning Framework for Few-shot Relation Classification with Small Training Data. In the Proceedings of CIKM 2020.
[paper] [code] [data] Abstract: In this paper, we propose a novel configurable framework to automatically generate distractive choices for open-domain cloze-style multiple-choice questions. The framework incorporates a general-purpose knowledge base to effectively create a small distractor candidate set, and a feature-rich learning-to-rank model to select distractors that are both plausible and reliable. Experimental results on a new dataset across four domains show that our framework yields distractors that outperform previous methods in both automatic and human evaluation. The dataset can also serve as a benchmark for future distractor generation research. Cite: Siyu Ren and Kenny Q. Zhu. Knowledge-Driven Distractor Generation for Cloze-style Multiple Choice Questions. In the Proceedings of AAAI 2021.
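The pipeline reduces to retrieving candidates from a knowledge base and ranking them by plausibility/reliability features. A toy sketch with fabricated candidates, features, and hand-set weights standing in for the learned ranker:

```python
def rank_distractors(answer, candidates, features, weights):
    """Score each candidate by a weighted feature sum (a stand-in for the
    learning-to-rank model) and drop candidates equal to the answer."""
    scored = [(sum(w * f for w, f in zip(weights, features[c])), c)
              for c in candidates if c != answer]
    return [c for _, c in sorted(scored, reverse=True)]

# Hypothetical KB candidates for a cloze question whose answer is "oxygen",
# with (same_type, embedding_similarity, not_co_occurring) features.
features = {"hydrogen": (1.0, 0.8, 1.0), "nitrogen": (1.0, 0.7, 1.0),
            "water": (0.0, 0.5, 1.0), "oxygen": (1.0, 1.0, 0.0)}
print(rank_distractors("oxygen", list(features), features, (0.5, 0.3, 0.2)))
# ['hydrogen', 'nitrogen', 'water']
```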
[paper] [code] [data] Abstract: Illustration plays a vital role in children's reading. Appropriate illustrations add both interest and convenience to reading. However, generating illustrations by hand takes too much work and time, so in this project we propose an automatic illustration model based on image retrieval, which outputs several illustrations in a consistent style given texts as input. Our model consists of four parts: NER, a web crawler, object detection and style transfer. For demonstration, we built an online web demo in which one can simply input stories and efficiently get illustrations in a selected style.
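A toy end-to-end sketch of the four-stage pipeline described above; every stage here is a hypothetical placeholder (a capitalized-word "NER", string-valued "images"), not the project's actual implementation.

```python
def extract_entities(story):
    """Hypothetical NER stand-in: treat capitalized words as entities."""
    return [w.strip(".,") for w in story.split() if w[0].isupper()]

def crawl_image(entity):           # stage 2: web image retrieval (stand-in)
    return f"raw:{entity}"

def crop_main_object(image):       # stage 3: object detection (stand-in)
    return image.replace("raw:", "cropped:")

def transfer_style(image, style):  # stage 4: style transfer (stand-in)
    return f"{image}|{style}"

def illustrate(story, style):
    """Run the four stages in order, one illustration per detected entity."""
    return [transfer_style(crop_main_object(crawl_image(e)), style)
            for e in extract_entities(story)]

print(illustrate("Alice met the Rabbit in Wonderland.", "watercolor"))
```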
[paper] [code] [data] Abstract: News recommendation for anonymous readers is a useful but challenging task for many news portals, where interactions between readers and articles are limited to a temporary login session. Previous works tend to formulate session-based recommendation as a next-item prediction task, but they neglect the implicit feedback from user behaviors, which indicates what users really like or dislike. Hence, we propose a comprehensive framework that models user behaviors through positive feedback (i.e., the articles they spend more time on) and negative feedback (i.e., the articles they choose to skip without clicking). Moreover, the framework implicitly models the user by their session start time, and the article by its initial publishing time, in what we call neutral feedback. Empirical evaluation on three real-world news datasets shows the framework's promising performance, with more accurate, diverse and even unexpected recommendations than other state-of-the-art session-based recommendation approaches. Cite: Shansan Gong and Kenny Q. Zhu. Positive, Negative and Neutral: Modeling Implicit Feedback in Session-based News Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22), 2022.
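The positive/negative signals can be folded into training as per-item weights and penalties; the weighting scheme in this sketch (dwell-time up-weighting, a sigmoid penalty on skipped articles' scores) is an assumption, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def feedback_weighted_loss(logits, clicked, dwell_time, skipped):
    """Cross-entropy over next-article logits, up-weighting long-dwell
    positives and penalizing the scores of skipped (negative) articles."""
    pos_weight = 1.0 + torch.log1p(dwell_time)            # positive feedback
    ce = F.cross_entropy(logits, clicked, reduction="none")
    neg_penalty = logits.gather(1, skipped).sigmoid().mean(dim=1)
    return (pos_weight * ce + neg_penalty).mean()

B, N = 4, 50                        # batch of sessions, candidate articles
logits = torch.randn(B, N)          # session model's scores per article
clicked = torch.randint(0, N, (B,)) # the article actually read next
dwell = torch.rand(B) * 60          # seconds spent on the clicked article
skipped = torch.randint(0, N, (B, 3))  # articles shown but skipped
print(feedback_weighted_loss(logits, clicked, dwell, skipped))
```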