## TED: Teaching AI to Explain its Decisions

Noel C. F. Codella, Michael Hind, Karthikeyan Natesan Ramamurthy, Murray Campbell, Amit Dhurandhar, Kush R. Varshney, Dennis Wei, Aleksandra Mojsilovic

Artificial intelligence systems are being increasingly deployed due to their potential to increase the efficiency, scale, consistency, fairness, and accuracy of decisions. However, as many of these systems are opaque in their operation, there is a growing demand for such systems to provide explanations for their decisions. Conventional approaches to this problem attempt to expose or discover the inner workings of a machine learning model with the hope that the resulting explanations will be meaningful to the consumer. In contrast, this paper suggests a new approach to this problem. It introduces a simple, practical framework, called Teaching Explanations for Decisions (TED), that provides meaningful explanations that match the mental model of the consumer. We illustrate the generality and effectiveness of this approach with two different examples, resulting in highly accurate explanations with no loss of prediction accuracy for these two examples.

## Bio-YODIE: A Named Entity Linking System for Biomedical Text

Genevieve Gorrell, Xingyi Song, Angus Roberts

Ever-expanding volumes of biomedical text require automated semantic annotation techniques to curate and put to best use. An established field of research seeks to link mentions in text to knowledge bases such as those included in the UMLS (Unified Medical Language System), in order to enable a more sophisticated understanding. This work has yielded good results for tasks such as curating literature, but increasingly, annotation systems are more broadly applied. Medical vocabularies are expanding in size, and with them the extent of term ambiguity. Document collections are increasing in size and complexity, creating a greater need for speed and robustness. Furthermore, as the technologies are turned to new tasks, requirements change; for example greater coverage of expressions may be required in order to annotate patient records, and greater accuracy may be needed for applications that affect patients. This places new demands on the approaches currently in use. In this work, we present a new system, Bio-YODIE, and compare it to two other popular systems in order to give guidance about suitable approaches in different scenarios and how systems might be designed to accommodate future needs.

## On the practice of classification learning for clinical diagnosis and therapy advice in oncology

Flavio S Correa da Silva, Frederico P Costa, Antonio F Iemma

Artificial intelligence and medicine have a longstanding and proficuous relationship. In the present work we develop a brief assessment of this relationship with specific focus on machine learning, in which we highlight some critical points which may hinder the use of machine learning techniques for clinical diagnosis and therapy advice in practice. We then suggest a conceptual framework to build successful systems to aid clinical diagnosis and therapy advice, grounded on a novel concept we have coined drifting domains. We focus on oncology to build our arguments, as this area of medicine furnishes strong evidence for the critical points we take into account here.

## Reasoning From Data in the Mathematical Theory of Evidence

The Mathematical Theory of Evidence (MTE) is known as a foundation for reasoning when knowledge is expressed at various levels of detail. Though much research effort has been committed to this theory since its foundation, many questions remain open. One of the most important open questions seems to be the relationship between frequencies and the Mathematical Theory of Evidence. The theory is criticized for leaving frequencies outside (or aside of) its framework. The seriousness of this accusation is obvious: no experiment may be run to compare the performance of MTE-based models of real world processes against real world data.

In this paper we develop a frequentist model of the MTE that refutes the above argument against MTE. We describe how to interpret data in terms of MTE belief functions, how to reason from data about conditional belief functions, how to generate a random sample out of an MTE model, how to derive an MTE model from data, and how to compare the results of reasoning in an MTE model with reasoning from data.

It is claimed in this paper that MTE is suitable for modeling some types of destructive processes.

## Mathematical Theory of Evidence Versus Evidence

This paper is concerned with the apparent greatest weakness of the Mathematical Theory of Evidence (MTE) of Shafer \cite{Shafer:76}, which has been strongly criticized by Wasserman \cite{Wasserman:92ijar}.

Weaknesses of Shafer’s proposal \cite{Shafer:90b} of a probabilistic interpretation of MTE belief functions are demonstrated. Thereafter, a new probabilistic interpretation of MTE is proposed that conforms both to the definition of a belief function and to Dempster’s rule of combination of independent evidence. It is shown that Shaferian conditioning of belief functions on observations \cite{Shafer:90b} may be treated as selection combined with modification of data; that is, data is not viewed as it is, but is cast into one’s beliefs about what it should be like.
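For readers unfamiliar with Dempster’s rule of combination mentioned above, the following is a minimal sketch of the standard textbook rule, not of the new interpretation the paper proposes. Mass functions are represented as dictionaries from `frozenset` focal elements to masses; the function names and the two-element frame `{x, y}` are illustrative assumptions.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions given as {frozenset: mass} dictionaries."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Product mass that would fall on the empty set.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    # Renormalise so that the remaining masses sum to one.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


def belief(m, hypothesis):
    """Bel(A): total mass of the focal elements contained in A."""
    return sum(mass for focal, mass in m.items() if focal <= hypothesis)


# Hypothetical evidence over the frame {x, y}.
m1 = {frozenset("x"): 0.6, frozenset("xy"): 0.4}
m2 = {frozenset("y"): 0.5, frozenset("xy"): 0.5}
m12 = dempster_combine(m1, m2)
```

In this example the conflicting mass is 0.3, so after renormalisation the combined masses are 3/7 for `{x}`, 2/7 for `{y}`, and 2/7 for `{x, y}`.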

## Learning data augmentation policies using augmented random search

Mingyang Geng, Kele Xu, Bo Ding, Huaimin Wang, Lei Zhang

Previous data augmentation methods were designed manually, and the resulting augmentation policies are dataset-specific. Recently, an automatic data augmentation approach named AutoAugment was proposed, which uses reinforcement learning. AutoAugment searches for augmentation policies in a discrete search space, which may lead to a sub-optimal solution. In this paper, we employ the Augmented Random Search (ARS) method to improve the performance of AutoAugment. Our key contribution is to change the discrete search space to a continuous one, which improves search performance and maintains the diversity between sub-policies. With the proposed method, state-of-the-art accuracies are achieved on CIFAR-10, CIFAR-100, and ImageNet (without additional data). Our code is available at this https URL.
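To illustrate what searching a continuous space with ARS can look like, here is a minimal sketch of one basic ARS update on a real-valued parameter vector. It is not the authors' code: the function names, the hyperparameter values, and the quadratic `toy_reward` (a stand-in for the validation accuracy of an augmentation policy) are all assumptions made for illustration.

```python
import random
import statistics


def ars_step(theta, reward, rng, n_dirs=8, top_b=4, step=0.02, noise=0.1):
    """One Augmented Random Search update of a continuous parameter vector."""
    trials = []
    for _ in range(n_dirs):
        # Probe the reward on both sides of theta along a random direction.
        delta = [rng.gauss(0.0, 1.0) for _ in theta]
        r_plus = reward([t + noise * d for t, d in zip(theta, delta)])
        r_minus = reward([t - noise * d for t, d in zip(theta, delta)])
        trials.append((r_plus, r_minus, delta))
    # Keep the directions whose better endpoint scored highest.
    trials.sort(key=lambda trial: max(trial[0], trial[1]), reverse=True)
    best = trials[:top_b]
    # Normalise by the spread of the rewards that enter the update.
    used = [r for r_plus, r_minus, _ in best for r in (r_plus, r_minus)]
    sigma = statistics.pstdev(used) or 1.0
    scale = step / (top_b * sigma)
    return [
        t + scale * sum((r_plus - r_minus) * delta[i] for r_plus, r_minus, delta in best)
        for i, t in enumerate(theta)
    ]


# Hypothetical "ideal" policy parameters, e.g. operation probabilities.
TARGET = [0.5, 0.5, 0.5, 0.5]


def toy_reward(theta):
    """Toy stand-in for validation accuracy: highest at TARGET."""
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


rng = random.Random(0)
theta = [0.0] * 4
for _ in range(100):
    theta = ars_step(theta, toy_reward, rng)
```

Because the parameters are real-valued, each update can move every component by a small amount, which a search over a discrete grid of choices cannot do.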

## Quantum Reasoning using Lie Algebra for Everyday Life (and AI perhaps…)

We investigate the applicability of the formalism of quantum mechanics to everyday life. It seems to be directly relevant for situations in which the very act of coming to a conclusion or decision on one issue affects one’s confidence about conclusions or decisions on another issue. Lie algebra theory is argued to be a very useful tool in guiding the construction of quantum descriptions of such situations. Tests, extensions and speculative applications and implications, including for the encoding of thoughts in neural networks, are discussed. It is suggested that the recognition and incorporation of such mathematical structure into machine learning and artificial intelligence might lead to significant efficiency and generality gains in addition to ensuring probabilistic reasoning at a fundamental level.

## Reimplementation and Reinterpretation of the Copycat Project

We present the reinterpreted and reimplemented Copycat project, an architecture solving letter analogy domain problems. To support flexible implementation changes and a rigorous testing process, we propose an implementation method in DrRacket that uses functional abstraction, a naming system, initialization, and structural reference. Finally, benefits and limitations are analyzed for cognitive architectures along the lines of Copycat.

## Universal Marginalizer for Amortised Inference and Embedding of Generative Models

Robert Walecki, Albert Buchard, Kostis Gourgoulias, Chris Hart, Maria Lomeli, A. K. W. Navarro, Max Zwiessele, Yura Perov, Saurabh Johri

Probabilistic graphical models are powerful tools which allow us to formalise our knowledge about the world and reason about its inherent uncertainty. There exist a considerable number of methods for performing inference in probabilistic graphical models; however, they can be computationally costly due to significant time burden and/or storage requirements, or they lack theoretical guarantees of convergence and accuracy when applied to large scale graphical models. To this end, we propose the Universal Marginaliser Importance Sampler (UM-IS), a hybrid inference scheme that combines the flexibility of a deep neural network trained on samples from the model with the asymptotic guarantees of importance sampling. We show how combining samples drawn from the graphical model with an appropriate masking function allows us to train a single neural network to approximate any of the corresponding conditional marginal distributions, and thus amortise the cost of inference. We also show that the graph embeddings can be applied to tasks such as clustering, classification, and interpretation of relationships between the nodes. Finally, we benchmark the method on a large graph (>1000 nodes), showing that UM-IS outperforms sampling-based methods by a large margin while being computationally efficient.

## Combining Learned Lyrical Structures and Vocabulary for Improved Lyric Generation

Pablo Samuel Castro, Maria Attarian

The use of language models for generating lyrics and poetry has received increased interest in the last few years. They pose a unique challenge relative to standard natural language problems: because their ultimate purpose is creative, notions of accuracy and reproducibility are secondary to notions of lyricism, structure, and diversity. In this creative setting, traditional quantitative measures for natural language problems, such as BLEU scores, prove inadequate: a high-scoring model may either fail to produce output respecting the desired structure (e.g. song verses), be a terribly boring creative companion, or both. In this work we propose a mechanism for combining two separately trained language models into a framework that is able to produce output respecting the desired song structure, while providing a richness and diversity of vocabulary that renders it more creatively appealing.

## Differentiating Concepts and Instances for Knowledge Graph Embedding

Xin Lv, Lei Hou, Juanzi Li, Zhiyuan Liu

Concepts, which represent a group of different instances sharing common properties, are essential information in knowledge representation. Most conventional knowledge embedding methods encode both entities (concepts and instances) and relations as vectors in a low dimensional semantic space equally, ignoring the difference between concepts and instances. In this paper, we propose a novel knowledge graph embedding model named TransC by differentiating concepts and instances. Specifically, TransC encodes each concept in the knowledge graph as a sphere and each instance as a vector in the same semantic space. We use the relative positions to model the relations between concepts and instances (i.e., instanceOf), and the relations between concepts and sub-concepts (i.e., subClassOf). We evaluate our model on both link prediction and triple classification tasks on a dataset based on YAGO. Experimental results show that TransC outperforms state-of-the-art methods, and captures the semantic transitivity of the instanceOf and subClassOf relations. Our code and datasets can be obtained from https://github.com/davidlvxin/TransC.
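The geometric idea of concepts as spheres and instances as points can be sketched in a few lines. This is only an illustration of the containment tests, assuming Euclidean distance and hand-picked 2-D embeddings; it does not reproduce the model's loss functions or training, and all names and values below are hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_instance_of(instance, center, radius):
    """instanceOf holds when the instance vector lies inside the concept sphere."""
    return distance(instance, center) <= radius


def is_subclass_of(center_a, radius_a, center_b, radius_b):
    """subClassOf holds when sphere A lies entirely inside sphere B."""
    return distance(center_a, center_b) + radius_a <= radius_b


# Hypothetical embeddings chosen by hand: (center, radius) for concepts.
animal = ((0.0, 0.0), 3.0)
dog = ((1.0, 0.0), 1.0)
rex = (1.5, 0.0)

rex_is_dog = is_instance_of(rex, *dog)
dog_is_animal = is_subclass_of(*dog, *animal)
rex_is_animal = is_instance_of(rex, *animal)
```

Transitivity falls out of the geometry: a point inside the `dog` sphere is necessarily inside any sphere that contains the whole `dog` sphere.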

## Navigating Assistance System for Quadcopter with Deep Reinforcement Learning

Tung-Cheng Wu, Shau-Yin Tseng, Chin-Feng Lai, Chia-Yu Ho, Ying-Hsun Lai

In this paper, we present a deep reinforcement learning method that lets a quadcopter bypass obstacles on its flight path. In a previous study, the algorithm only controlled the forward direction of the quadcopter. In this letter, we use two functions to control the quadcopter. The first is a navigation function, which computes coordinate points and finds the straight path to the goal. The second is a collision avoidance function, which is implemented by a deep Q-network model. Both functions output a rotation angle, and the agent combines the two outputs to decide its turn. In addition, the deep Q-network can make the quadcopter fly up and down to bypass the obstacle and arrive at the goal. Our experimental results show that the collision rate is 14% after 500 flights. Based on this work, we will train on more complex scenes and transfer the model to a real quadcopter.

## Agent Embeddings: A Latent Representation for Pole-Balancing Networks

Oscar Chang, Robert Kwiatkowski, Siyuan Chen, Hod Lipson

We show that it is possible to reduce a high-dimensional object like a neural network agent into a low-dimensional vector representation with semantic meaning that we call agent embeddings, akin to word or face embeddings. This can be done by collecting examples of existing networks, vectorizing their weights, and then learning a generative model over the weight space in a supervised fashion. We investigate a pole-balancing task, Cart-Pole, as a case study and show that multiple new pole-balancing networks can be generated from their agent embeddings without direct access to training data from the Cart-Pole simulator. In general, the learned embedding space is helpful for mapping out the space of solutions for a given task. We observe in the case of Cart-Pole the surprising finding that good agents make different decisions despite learning similar representations, whereas bad agents make similar (bad) decisions while learning dissimilar representations. Linearly interpolating between the latent embeddings for a good agent and a bad agent yields an agent embedding that generates a network with intermediate performance, where the performance can be tuned according to the coefficient of interpolation. Linear extrapolation in the latent space also results in performance boosts, up to a point.

## Time-interval balancing in multi-processor scheduling of composite modular jobs (preliminary description)

The article describes a special time-interval balancing in multi-processor scheduling of composite modular jobs. This scheduling problem is close to the just-in-time planning approach. First, brief literature surveys are presented on just-in-time scheduling and due-date/due-window scheduling problems. Further, the problem and its formulation are proposed for the time-interval balanced scheduling of composite modular jobs. An illustrative real world planning example for modular home-building is described. Here, the main objective function consists in a balance between production of the typical building modules (details) and the assembly processes of the building(s) (by several teams). The assembly plan has to be modified to satisfy the balance requirements. The solving framework is based on the following: (i) clustering of the initial set of modular detail types to obtain about ten basic detail types that correspond to the main manufacturing conveyors; (ii) designing a preliminary plan of assembly for buildings; (iii) detection of unbalanced time periods; (iv) modification of the planning solution to improve the schedule balance. The framework implements a metaheuristic based on a local optimization approach. Two other applications (supply chain management, information transmission systems) are briefly described.

## End-to-end Structure-Aware Convolutional Networks for Knowledge Base Completion

Chao Shang, Yun Tang, Jing Huang, Jinbo Bi, Xiaodong He, Bowen Zhou

Knowledge graph embedding has been an active research topic for knowledge base completion, with progressive improvement from the initial TransE, TransH, DistMult, etc. to the current state-of-the-art ConvE. ConvE uses 2D convolution over embeddings and multiple layers of nonlinear features to model knowledge graphs. The model can be trained efficiently and scales to large knowledge graphs. However, there is no structure enforcement in the embedding space of ConvE. The recent graph convolutional network (GCN) provides another way of learning graph node embeddings by successfully utilizing graph connectivity structure. In this work, we propose a novel end-to-end Structure-Aware Convolutional Network (SACN) that combines the benefits of GCN and ConvE. SACN consists of an encoder, a weighted graph convolutional network (WGCN), and a decoder, a convolutional network called Conv-TransE. WGCN utilizes knowledge graph node structure, node attributes and relation types. It has learnable weights that collect an adaptive amount of information from neighboring graph nodes, resulting in more accurate embeddings of graph nodes. In addition, the node attributes are added as nodes and are easily integrated into the WGCN. The decoder Conv-TransE extends the state-of-the-art ConvE to be translational between entities and relations while keeping the same state-of-the-art performance as ConvE. We demonstrate the effectiveness of our proposed SACN model on the standard FB15k-237 and WN18RR datasets, and present about 10% relative improvement over the state-of-the-art ConvE in terms of HITS@1, HITS@3 and HITS@10.

## Explaining Deep Learning Models using Causal Inference

Tanmayee Narendra, Anush Sankaran, Deepak Vijaykeerthy, Senthil Mani

Although deep learning models have been successfully applied to a variety of tasks, due to the millions of parameters, they are becoming increasingly opaque and complex. In order to establish trust for their widespread commercial use, it is important to formalize a principled framework to reason over these models. In this work, we use ideas from causal inference to describe a general framework to reason over CNN models. Specifically, we build a Structural Causal Model (SCM) as an abstraction over a specific aspect of the CNN. We also formulate a method to quantitatively rank the filters of a convolution layer according to their counterfactual importance. We illustrate our approach with popular CNN architectures such as LeNet5, VGG19, and ResNet32.

## A Survey of Mixed Data Clustering Algorithms

Most datasets contain either numeric or categorical features. Mixed data comprises both numeric and categorical features, and it frequently occurs in various domains, such as health, finance, and marketing. Clustering is often sought on mixed data to find structures and to group similar objects. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of these datasets. In this paper, we review various types of mixed data clustering techniques in detail. We present a taxonomy to identify ten types of different mixed data clustering techniques. We also compare the performance of several mixed data clustering methods on publicly available datasets. The paper further identifies challenges in developing different mixed data clustering algorithms and provides guidelines for future directions in this area.

## Neural-based Pinyin-to-Character Conversion with Adaptive Vocabulary

Yafang Huang, Zhuosheng Zhang, Hai Zhao

Pinyin-to-character (P2C) conversion is the core component of a pinyin-based Chinese input method engine (IME). However, the conversion is seriously compromised by the ambiguities of Chinese characters corresponding to pinyin as well as by predefined fixed vocabularies. To alleviate such inconveniences, we propose a neural P2C conversion model augmented by a large online updating vocabulary with a target vocabulary sampling mechanism. Our experiments show that the proposed approach reduces the decoding time on CPUs by up to 50% on P2C tasks with the same or only a negligible change in conversion accuracy, and that the online updated vocabulary indeed helps our IME effectively follow user input behavior.

## Towards Governing Agent’s Efficacy: Action-Conditional $\beta$-VAE for Deep Transparent Reinforcement Learning

John Yang, Gyujeong Lee, Minsung Hyun, Simyung Chang, Nojun Kwak

We tackle the blackbox issue of deep neural networks in the setting of reinforcement learning (RL), where neural agents learn towards maximizing reward gains in an uncontrollable way. Such a learning approach is risky when the interacting environment includes an expanse of state space, because it is then almost impossible to foresee all unwanted outcomes and penalize them with negative rewards beforehand. Unlike the reverse analysis of learned neural features in previous works, our proposed method tackles the blackbox issue by encouraging an RL policy network to learn interpretable latent features through an implementation of a disentangled representation learning method. Toward this end, our method allows an RL agent to understand self-efficacy by distinguishing its influences from uncontrollable environmental factors, which closely resembles the way humans understand their scenes. Our experimental results show that the learned latent factors not only are interpretable, but also enable modeling the distribution of the entire visited state space with a specific action condition. Our experiments show that this characteristic of the proposed structure can lead to ex post facto governance for desired behaviors of RL agents.

## New Movement and Transformation Principle of Fuzzy Reasoning and Its Application to Fuzzy Neural Network

Chung-Jin Kwak, Son-Il Kwak, Dae-Song Kang, Song-Il Choe, Jin-Ung Kim, Hyok-Gi Chea

In this paper, we propose a new fuzzy reasoning principle, the so-called Movement and Transformation Principle (MTP). This principle obtains a new fuzzy reasoning result by moving and transforming the consequent fuzzy set in response to the Movement, Transformation, and Movement-Transformation operations between the antecedent fuzzy set and the fuzzified observation information. We then present fuzzy modus ponens and fuzzy modus tollens based on MTP. We compare the proposed method with the Mamdani fuzzy system, the Sugeno fuzzy system, the Wang distance-type fuzzy reasoning method, and the Hellendoorn functional-type method. We then apply MTP to learning experiments with a fuzzy neural network and compare it with the Sugeno method. In prediction experiments with the fuzzy neural network on precipitation data and security situation data, learning accuracy and time performance are clearly improved. Consequently, we show that our method based on MTP is computationally simple and does not involve nonlinear operations, so it is easy to handle mathematically.

## Random Dictators with a Random Referee: Constant Sample Complexity Mechanisms for Social Choice

Brandon Fain, Ashish Goel, Kamesh Munagala, Nina Prabhu

We study social choice mechanisms in an implicit utilitarian framework with a metric constraint, where the goal is to minimize *Distortion*, the worst case social cost of an ordinal mechanism relative to underlying cardinal utilities. We consider two additional desiderata: Constant sample complexity and Squared Distortion. Constant sample complexity means that the mechanism (potentially randomized) only uses a constant number of ordinal queries regardless of the number of voters and alternatives. Squared Distortion is a measure of variance of the Distortion of a randomized mechanism.

Our primary contribution is the first social choice mechanism with constant sample complexity *and* constant Squared Distortion (which also implies constant Distortion). We call the mechanism Random Referee, because it uses a random agent to compare two alternatives that are the favorites of two other random agents. We prove that the use of a comparison query is necessary: no mechanism that only elicits the top-k preferred alternatives of voters (for constant k) can have Squared Distortion that is sublinear in the number of alternatives. We also prove that unlike any top-k only mechanism, the Distortion of Random Referee meaningfully improves on benign metric spaces, using the Euclidean plane as a canonical example. Finally, among top-1 only mechanisms, we introduce Random Oligarchy. The mechanism asks just 3 queries and is essentially optimal among the class of such mechanisms with respect to Distortion.

In summary, we demonstrate the surprising power of constant sample complexity mechanisms generally, and just three random voters in particular, to provide some of the best known results in the implicit utilitarian framework.
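The Random Referee mechanism described above is simple enough to sketch directly. The version below assumes each voter's preferences are available as a ranked list (most preferred first) and that the three voters are drawn independently with replacement; both the data layout and the sampling detail are assumptions of this sketch rather than statements taken from the abstract.

```python
import random


def random_referee(rankings, rng=random):
    """Pick a winner using two top-choice queries and one comparison query.

    rankings[i] is voter i's list of alternatives, most preferred first.
    """
    # Assumption: voters are sampled independently, so repeats are possible.
    proposer_a, proposer_b, referee = (rng.choice(rankings) for _ in range(3))
    first, second = proposer_a[0], proposer_b[0]
    if first == second:
        return first
    # The referee answers a single pairwise comparison between the favorites.
    return first if referee.index(first) < referee.index(second) else second


# Hypothetical electorate with three alternatives.
voters = [
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["c", "a", "b"],
    ["a", "c", "b"],
]
winner = random_referee(voters, random.Random(0))
```

Only three voters are ever consulted, regardless of how many voters or alternatives exist, which is what constant sample complexity refers to.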

## Learning Latent Dynamics for Planning from Pixels

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson

Planning has been very successful for control tasks with known environment dynamics. To leverage planning in unknown environments, the agent needs to learn the dynamics from interactions with the world. However, learning dynamics models that are accurate enough for planning has been a long-standing challenge, especially in image-based domains. We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from pixels and chooses actions through online planning in latent space. To achieve high performance, the dynamics model must accurately predict the rewards ahead for multiple time steps. We approach this problem using a latent dynamics model with both deterministic and stochastic transition function and a generalized variational inference objective that we name latent overshooting. Using only pixel observations, our agent solves continuous control tasks with contact dynamics, partial observability, and sparse rewards. PlaNet uses significantly fewer episodes and reaches final performance close to and sometimes higher than top model-free algorithms.

## SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient

Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan

Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution. In such situations, existing methods usually resort to a diagonal approximation of the covariance matrix despite the fact that these matrices are known to give poor uncertainty estimates. To address this issue, we propose a new stochastic, low-rank, approximate natural-gradient (SLANG) method for variational inference in large, deep models. Our method estimates a “diagonal plus low-rank” structure based solely on back-propagated gradients of the network log-likelihood. This requires strictly fewer gradient computations than methods that compute the gradient of the whole variational objective. Empirical evaluations on standard benchmarks confirm that SLANG enables faster and more accurate estimation of uncertainty than mean-field methods, and performs comparably to state-of-the-art methods.

## ReDecode Framework for Iterative Improvement in Paraphrase Generation

Milan Aggarwal, Nupur Kumari, Ayush Bansal, Balaji Krishnamurthy

Generating paraphrases, that is, different variations of a sentence conveying the same meaning, is an important yet challenging task in NLP. Automatically generating paraphrases is useful in many NLP tasks such as question answering, information retrieval, and conversational systems, to name a few. In this paper, we introduce iterative refinement of generated paraphrases within a VAE-based generation framework. Current sequence generation models lack the capability to (1) make improvements once the sentence is generated; (2) rectify errors made while decoding. We propose a technique to iteratively refine the output using multiple decoders, each one attending to the output sentence generated by the previous decoder. We improve current state-of-the-art results significantly, with over 9% and 28% absolute increase in METEOR scores on the Quora question pairs and MSCOCO datasets respectively. We also show qualitatively through examples that our re-decoding approach generates better paraphrases compared to a single decoder by rectifying errors and making improvements in paraphrase structure, inducing variations and introducing new but semantically coherent information.

## An Initial Attempt of Combining Visual Selective Attention with Deep Reinforcement Learning

Liu Yuezhang, Ruohan Zhang, Dana H. Ballard

Visual attention serves as a feature selection mechanism in the perceptual system. Motivated by Broadbent’s leaky filter model of selective attention, we evaluate how such a mechanism could be implemented and how it affects the learning process of deep reinforcement learning. We visualize and analyze the feature maps of DQN on a toy problem, Catch, and propose an approach to combine visual selective attention with deep reinforcement learning. We experiment with optical flow-based attention and A2C on Atari games. Experimental results show that visual selective attention could lead to improvements in terms of sample efficiency on the tested games. An intriguing relation between attention and batch normalization is also discovered.

## User Modeling for Task Oriented Dialogues

Izzeddin Gur, Dilek Hakkani-Tur, Gokhan Tur, Pararth Shah

We introduce end-to-end neural network based models for simulating users of task-oriented dialogue systems. User simulation in dialogue systems is crucial from two different perspectives: (i) automatic evaluation of different dialogue models, and (ii) training task-oriented dialogue systems. We design a hierarchical sequence-to-sequence model that first encodes the initial user goal and system turns into fixed length representations using Recurrent Neural Networks (RNN). It then encodes the dialogue history using another RNN layer. At each turn, user responses are decoded from the hidden representations of the dialogue level RNN. This hierarchical user simulator (HUS) approach allows the model to capture undiscovered parts of the user goal without the need for explicit dialogue state tracking. We further develop several variants by utilizing a latent variable model to inject random variations into user responses to promote diversity in simulated user responses and a novel goal regularization mechanism to penalize divergence of user responses from the initial user goal. We evaluate the proposed models on a movie ticket booking domain by systematically interacting each user simulator with various dialogue system policies trained with different objectives and users.

## Optimizing Taxi Carpool Policies via Reinforcement Learning and Spatio-Temporal Mining

Ishan Jindal, Zhiwei Qin, Xuewen Chen, Matthew Nokleby, Jieping Ye

In this paper, we develop a reinforcement learning (RL) based system to learn an effective policy for carpooling that maximizes transportation efficiency so that fewer cars are required to fulfill the given amount of trip demand. For this purpose, first, we develop a deep neural network model, called ST-NN (Spatio-Temporal Neural Network), to predict taxi trip time from the raw GPS trip data. Secondly, we develop a carpooling simulation environment for RL training, using the output of ST-NN and the NYC taxi trip dataset. In order to maximize transportation efficiency and minimize traffic congestion, we choose the effective distance covered by the driver on a carpool trip as the reward. Therefore, the more effective distance a driver achieves over a trip (i.e., the more trip demand is satisfied), the higher the efficiency and the lower the traffic congestion. We compare the performance of the RL-learned policy to a fixed policy (which always accepts carpool) as a baseline and obtain promising results that are interpretable and demonstrate the advantage of our RL approach. We also compare the performance of ST-NN to that of state-of-the-art travel time estimation methods and observe that ST-NN significantly improves the prediction performance and is more robust to outliers.

## Langevin-gradient parallel tempering for Bayesian neural learning

Rohitash Chandra, Konark Jain, Ratneel V. Deo, Sally Cripps

Bayesian neural learning features a rigorous approach to estimation and uncertainty quantification via the posterior distribution of weights that represent knowledge of the neural network. This not only provides point estimates of the optimal set of weights but also the ability to quantify uncertainty in decision making using the posterior distribution. Markov chain Monte Carlo (MCMC) techniques are typically used to obtain sample-based estimates of the posterior distribution. However, these techniques face challenges in convergence and scalability, particularly in settings with large datasets and network architectures. This paper addresses these challenges in two ways. First, parallel tempering is used to explore multiple modes of the posterior distribution and is implemented in a multi-core computing architecture. Second, we make within-chain sampling schemes more efficient by using Langevin gradient information in forming Metropolis-Hastings proposal distributions. We demonstrate the techniques using time series prediction and pattern classification applications. The results show that the method not only improves the computational time, but also provides better prediction or decision making capabilities when compared to related methods.
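
The idea of using Langevin gradient information inside a Metropolis-Hastings proposal can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional standard-normal target, the step size, and the function names `log_post`, `grad_log_post`, `log_q`, and `langevin_mh` are all assumptions chosen for illustration, and parallel tempering is omitted.

```python
import math
import random


def log_post(theta):
    # Toy log-posterior (assumed): standard normal, up to a constant.
    return -0.5 * theta * theta


def grad_log_post(theta):
    # Gradient of the toy log-posterior.
    return -theta


def log_q(to, frm, step):
    # Log density (up to a constant) of proposing `to` from `frm`
    # under the Langevin proposal N(frm + step^2/2 * grad, step^2).
    mean = frm + 0.5 * step * step * grad_log_post(frm)
    return -((to - mean) ** 2) / (2.0 * step * step)


def langevin_mh(n_samples, step=0.5, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Gradient-informed proposal: drift along the gradient, then add noise.
        mean = theta + 0.5 * step * step * grad_log_post(theta)
        proposal = mean + step * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings ratio, including the asymmetric proposal terms.
        log_alpha = (log_post(proposal) + log_q(theta, proposal, step)
                     - log_post(theta) - log_q(proposal, theta, step))
        if math.log(1.0 - rng.random()) < log_alpha:
            theta = proposal
            accepted += 1
        samples.append(theta)
    return samples, accepted / n_samples


if __name__ == "__main__":
    draws, rate = langevin_mh(20000)
    print("acceptance rate:", rate)
    print("sample mean:", sum(draws) / len(draws))
```

Because the proposal is not symmetric, the acceptance ratio needs the forward and reverse proposal densities; dropping them would no longer target the intended posterior.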

## Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing

Ryosuke Furuta, Naoto Inoue, Toshihiko Yamasaki

This paper tackles a new problem setting: reinforcement learning with pixel-wise rewards (pixelRL) for image processing. After the introduction of the deep Q-network, deep RL has been achieving great success. However, the applications of deep RL for image processing are still limited. Therefore, we extend deep RL to pixelRL for various image processing applications. In pixelRL, each pixel has an agent, and the agent changes the pixel value by taking an action. We also propose an effective learning method for pixelRL that significantly improves the performance by considering not only the future states of an agent's own pixel but also those of the neighboring pixels. The proposed method can be applied to some image processing tasks that require pixel-wise manipulations, where deep RL has never been applied. We apply the proposed method to three image processing tasks: image denoising, image restoration, and local color enhancement. Our experimental results demonstrate that the proposed method achieves comparable or better performance, compared with the state-of-the-art methods based on supervised learning.

## PolyNeuron: Automatic Neuron Discovery via Learned Polyharmonic Spline Activations

Andrew Hryniowski, Alexander Wong

Automated deep neural network architecture design has received a significant amount of recent attention. However, this attention has not been equally shared by one of the fundamental building blocks of a deep neural network, the neurons. In this study, we propose PolyNeuron, a novel automatic neuron discovery approach based on learned polyharmonic spline activations. More specifically, PolyNeuron revolves around learning polyharmonic splines, characterized by a set of control points, that represent the activation functions of the neurons in a deep neural network. A relaxed variant of PolyNeuron, which we term PolyNeuron-R, loosens the constraints imposed by PolyNeuron to reduce the computational complexity for discovering the neuron activation functions in an automated manner. Experiments show both PolyNeuron and PolyNeuron-R lead to networks that have improved or comparable performance on multiple network architectures (LeNet-5 and ResNet-20) using different datasets (MNIST and CIFAR10). As such, automatic neuron discovery approaches such as PolyNeuron are a worthy direction to explore.

## Learning Shaping Strategies in Human-in-the-loop Interactive Reinforcement Learning

Chao Yu, Tianpei Yang, Wenxuan Zhu, Dongxu Wang, Guangliang Li

Providing reinforcement learning agents with informationally rich human knowledge can dramatically improve various aspects of learning. Prior work has developed different kinds of shaping methods that enable agents to learn efficiently in complex environments. All these methods, however, tailor human guidance to agents in specialized shaping procedures, thus embodying various characteristics and advantages in different domains. In this paper, we investigate the interplay between different shaping methods for more robust learning performance. We propose an adaptive shaping algorithm which is capable of learning the most suitable shaping method in an on-line manner. Results in two classic domains verify its effectiveness from both simulated and real human studies, shedding some light on the role and impact of human factors in human-robot collaborative learning.

## Towards Formula Translation using Recursive Neural Networks

Felix Petersen, Moritz Schubotz, Bela Gipp

While it has become common to perform automated translations on natural language, performing translations between different representations of mathematical formulae has thus far not been possible. We implemented the first translator for mathematical formulae based on recursive neural networks. We chose recursive neural networks because mathematical formulae inherently include a structural encoding. In our implementation, we developed new techniques and topologies for recursive tree-to-tree neural networks based on multi-variate multi-valued Long Short-Term Memory cells. We propose a novel approach for mini-batch training that utilizes clustering and tree traversal. We evaluate our translator and analyze the behavior of our proposed topologies and techniques based on a translation from generic LaTeX to the semantic LaTeX notation. We use the semantic LaTeX notation from the Digital Library for Mathematical Formulae and the Digital Repository for Mathematical Formulae at the National Institute of Standards and Technology. We find that a simple heuristics-based clustering algorithm outperforms the conventional clustering algorithms on the task of clustering binary trees of mathematical formulae with respect to their topology. Furthermore, we find a mask for the loss function, which can prevent the neural network from finding a local minimum of the loss function. Given our preliminary results, a complete translation from formula to formula is not yet possible. However, we achieved a prediction accuracy of 47.05% for predicting symbols at the correct position and an accuracy of 92.3% when ignoring the predicted position. In conclusion, our work advances the field of recursive neural networks by improving the training speed and quality of training. In the future, we will work towards a complete translation allowing a machine-interpretation of LaTeX formulae.

## Densely Connected Attention Propagation for Reading Comprehension

Yi Tay, Luu Anh Tuan, Siu Cheung Hui, Jian Su

We propose DecaProp (Densely Connected Attention Propagation), a new densely connected neural architecture for reading comprehension (RC). There are two distinct characteristics of our model. Firstly, our model densely connects all pairwise layers of the network, modeling relationships between passage and query across all hierarchical levels. Secondly, the dense connectors in our network are learned via attention instead of standard residual skip-connectors. To this end, we propose novel Bidirectional Attention Connectors (BAC) for efficiently forging connections throughout the network. We conduct extensive experiments on four challenging RC benchmarks. Our proposed approach achieves state-of-the-art results on all four, outperforming existing baselines by up to $2.6\%-14.2\%$ in absolute F1 score.

## CAPTAIN: Comprehensive Composition Assistance for Photo Taking

Farshid Farhat, Mohammad Mahdi Kamani, James Z. Wang

Many people are interested in taking astonishing photos and sharing them with others. Emerging high-tech hardware and software facilitate the ubiquity and functionality of digital photography. Because composition matters in photography, researchers have leveraged some common composition techniques to assess the aesthetic quality of photos computationally. However, composition techniques developed by professionals are far more diverse than well-documented techniques can cover. We leverage the vast underexplored innovations in photography for computational composition assistance. We propose a comprehensive framework, named CAPTAIN (Composition Assistance for Photo Taking), containing integrated deep-learned semantic detectors, sub-genre categorization, artistic pose clustering, personalized aesthetics-based image retrieval, and style set matching. The framework is backed by a large dataset crawled from a photo-sharing Website with mostly photography enthusiasts and professionals. The work proposes a sequence of steps that have not been explored in the past by researchers. The work addresses personal preferences for composition by presenting a ranked list of photographs to the user based on user-specified weights in the similarity measure. The matching algorithm recognizes the best shot among a sequence of shots with respect to the user’s preferred style set. We have conducted a number of experiments on the newly proposed components and reported findings. A user study demonstrates that the work is useful to those taking photos.

## Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction

Valts Blukis, Dipendra Misra, Ross A. Knepper, Yoav Artzi

We propose an approach for mapping natural language instructions and raw observations to continuous control of a quadcopter drone. Our model predicts interpretable position-visitation distributions indicating where the agent should go during execution and where it should stop, and uses the predicted distributions to select the actions to execute. This two-step model decomposition allows for simple and efficient training using a combination of supervised learning and imitation learning. We evaluate our approach with a realistic drone simulator, and demonstrate absolute task-completion accuracy improvements of 16.85% over two state-of-the-art instruction-following methods.

## Use of Neural Signals to Evaluate the Quality of Generative Adversarial Network Performance in Facial Image Generation

Zhengwei Wang, Graham Healy, Alan F. Smeaton, Tomas E. Ward

There is a growing interest in using Generative Adversarial Networks (GANs) to produce image content that is indistinguishable from a real image as judged by a typical person. A number of GAN variants for this purpose have been proposed; however, evaluating GANs is inherently difficult because current methods of measuring the quality of the output do not always mirror what is actually perceived by a human. We propose a novel approach that deploys a brain-computer interface to generate a neural score that closely mirrors the behavioral ground truth measured from participants discerning real from synthetic images. In this paper, we first compare the three most widely used metrics in the literature for evaluating GANs in terms of visual quality compared to human judgments. Second, we propose and demonstrate a novel approach using neural signals and rapid serial visual presentation (RSVP) that directly measures a human perceptual response to facial production quality independent of a behavioral response measurement. Finally, we show that our neural score is more consistent with human judgment compared to the conventional metrics we evaluated. We conclude that neural signals have potential application for high quality, rapid evaluation of GANs in the context of visual image synthesis.

## Reasoning over RDF Knowledge Bases using Deep Learning

Monireh Ebrahimi, Md Kamruzzaman Sarker, Federico Bianchi, Ning Xie, Derek Doran, Pascal Hitzler

Semantic Web knowledge representation standards, and in particular RDF and OWL, often come endowed with a formal semantics which is considered to be of fundamental importance for the field. Reasoning, i.e., the drawing of logical inferences from knowledge expressed in such standards, is traditionally based on logical deductive methods and algorithms which can be proven to be sound and complete and terminating, i.e. correct in a very strong sense. For various reasons, though, in particular, the scalability issues arising from the ever-increasing amounts of Semantic Web data available and the inability of deductive algorithms to deal with noise in the data, it has been argued that alternative means of reasoning should be investigated which bear high promise for high scalability and better robustness. From this perspective, deductive algorithms can be considered the gold standard regarding correctness against which alternative methods need to be tested. In this paper, we show that it is possible to train a Deep Learning system on RDF knowledge graphs, such that it is able to perform reasoning over new RDF knowledge graphs, with high precision and recall compared to the deductive gold standard.

## Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration

Helge Spieker, Arnaud Gotlieb, Dusica Marijan, Morten Mossige

Testing in Continuous Integration (CI) involves test case prioritization, selection, and execution at each cycle. Selecting the most promising test cases to detect bugs is hard if there are uncertainties about the impact of committed code changes or if traceability links between code and tests are not available. This paper introduces Retecs, a new method for automatically learning test case selection and prioritization in CI with the goal of minimizing the round-trip time between code commits and developer feedback on failed test cases. The Retecs method uses reinforcement learning to select and prioritize test cases according to their duration, previous last execution and failure history. In a constantly changing environment, where new test cases are created and obsolete test cases are deleted, the Retecs method learns to prioritize error-prone test cases higher under guidance of a reward function and by observing previous CI cycles. By applying Retecs on data extracted from three industrial case studies, we show for the first time that reinforcement learning enables fruitful automatic adaptive test case selection and prioritization in CI and regression testing.

## An Overview of Computational Approaches for Analyzing Interpretation

Philipp Blandfort, Jörn Hees, Desmond U. Patton

It is said that beauty is in the eye of the beholder. But how exactly can we characterize such discrepancies in interpretation? For example, are there any specific features of an image that make person A regard an image as beautiful while person B finds the same image displeasing? Such questions ultimately aim at explaining our individual ways of interpretation, an intention that has been of fundamental importance to the social sciences from the beginning. More recently, advances in computer science brought up two related questions: First, can computational tools be adopted for analyzing ways of interpretation? Second, what if the “beholder” is a computer model, i.e., how can we explain a computer model’s point of view? Numerous efforts have been made regarding both of these points, while many existing approaches focus on particular aspects and are still rather separate. With this paper, in order to connect these approaches we introduce a theoretical framework for analyzing interpretation, which is applicable to interpretation of both human beings and computer models. We give an overview of relevant computational approaches from various fields, and discuss the most common and promising application areas. The focus of this paper lies on interpretation of text and image data, while many of the presented approaches are applicable to other types of data as well.

## Stratified Constructive Disjunction and Negation in Constraint Programming

Arnaud Gotlieb, Dusica Marijan, Helge Spieker

Constraint Programming (CP) is a powerful declarative programming paradigm combining inference and search in order to find solutions to various types of constraint systems. Dealing with highly disjunctive constraint systems is notoriously difficult in CP. Apart from trying to solve each disjunct independently from each other, there is little hope and effort to succeed in constructing intermediate results combining the knowledge originating from several disjuncts. In this paper, we propose If Then Else (ITE), a lightweight approach for implementing stratified constructive disjunction and negation on top of an existing CP solver, namely SICStus Prolog clp(FD). Although constructive disjunction has been known for more than three decades, it does not have straightforward implementations in most CP solvers. ITE is a freely available library proposing stratified and constructive reasoning for various operators, including disjunction and negation, implication and conditional. Our preliminary experimental results show that ITE is competitive with existing approaches that handle disjunctive constraint systems.

## Suggesting Cooking Recipes Through Simulation and Bayesian Optimization

Eduardo C. Garrido-Merchán, Alejandro Albarca-Molina

Cooking typically involves a plethora of decisions about ingredients and tools that need to be chosen in order to write a good cooking recipe. Cooking can be modelled in an optimization framework, as it involves a search space of ingredients, kitchen tools, cooking times or temperatures. If we model as an objective function the quality of the recipe, several problems arise. No analytical expression can model all the recipes, so no gradients are available. The objective function is subjective, in other words, it contains noise. Moreover, evaluations are expensive both in time and human resources. Bayesian Optimization (BO) emerges as an ideal methodology to tackle problems with these characteristics. In this paper, we propose a methodology to suggest recipe recommendations based on a Machine Learning (ML) model that fits real and simulated data and BO. We provide empirical evidence with two experiments that support the adequacy of the methodology.

## A Very Brief and Critical Discussion on AutoML

This contribution presents a very brief and critical discussion on automated machine learning (AutoML), which is categorized here into two classes, referred to as narrow AutoML and generalized AutoML, respectively. The conclusions yielded from this discussion can be summarized as follows: (1) most existing research on AutoML belongs to the class of narrow AutoML; (2) advances in narrow AutoML are mainly motivated by commercial needs, while any possible benefit obtained definitely comes at the cost of an increased computing burden; (3) the concept of generalized AutoML has a strong tie in spirit with artificial general intelligence (AGI), also called “strong AI”, for which obstacles to pivotal progress abound.

## Analysis of Fleet Modularity in an Artificial Intelligence-Based Attacker-Defender Game

Because combat environments change over time and technology upgrades are widespread for ground vehicles, a large number of vehicles and equipment quickly become obsolete. A possible solution for the U.S. Army is to develop fleets of modular military vehicles, which are built from interchangeable substantial components also known as modules. One of the typical characteristics of modules is their ease of assembly and disassembly through simple means such as plug-in/pull-out actions, which allows for real-time fleet reconfiguration to meet dynamic demands. Moreover, military demands are time-varying and highly stochastic because commanders keep reacting to the enemy’s actions. To capture these characteristics, we formulated an intelligent agent-based model to imitate the decision-making process during fleet operation, which combines real-time optimization with artificial intelligence. The agents are capable of inferring the enemy’s future moves based on historical data and optimizing dispatch/operation decisions accordingly. We implement our model to simulate an attacker-defender game between two adversarial and intelligent players, representing the commanders of a modularized fleet and a conventional fleet, respectively. Given the same level of combat resources and intelligence, we highlight the tactical advantages of fleet modularity in terms of win rate, unpredictability and suffered damage.

## How Do Fairness Definitions Fare? Examining Public Attitudes Towards Algorithmic Definitions of Fairness

Nripsuta Saxena, Karen Huang, Evan DeFilippis, Goran Radanovic, David Parkes, Yang Liu

What is the best way to define algorithmic fairness? There has been much recent debate on algorithmic fairness. While many definitions of fairness have been proposed in the computer science literature, there is no clear agreement over a particular definition. In this work, we investigate ordinary people’s perceptions of three of these fairness definitions. Across two online experiments, we test which definitions people perceive to be the fairest in the context of loan decisions, and whether those fairness perceptions change with the addition of sensitive information (i.e., race of the loan applicants). We find a clear preference for one definition, and the general results seem to align with the principle of affirmative action.

## Stovepiping and Malicious Software: A Critical Review of AGI Containment

Jason M. Pittman, Jesus P. Espinoza, Courtney Soboleski Crosby

Awareness of the possible impacts associated with artificial intelligence has risen in proportion to progress in the field. While there are tremendous benefits to society, many argue that there are just as many, if not more, concerns related to advanced forms of artificial intelligence. Accordingly, research into methods to develop artificial intelligence safely is increasingly important. In this paper, we provide an overview of one such safety paradigm: containment with a critical lens aimed toward generative adversarial networks and potentially malicious artificial intelligence. Additionally, we illuminate the potential for a developmental blindspot in the stovepiping of containment mechanisms.

## A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing

Hongyang Jia, Yinqi Tang, Hossein Valavi, Jintao Zhang, Naveen Verma

This paper presents a programmable in-memory-computing processor, demonstrated in a 65nm CMOS technology. For data-centric workloads, such as deep neural networks, data movement often dominates when implemented with today’s computing architectures. This has motivated spatial architectures, where the arrangement of data-storage and compute hardware is distributed and explicitly aligned to the computation dataflow, most notably for matrix-vector multiplication. In-memory computing is a spatial architecture where processing elements correspond to dense bit cells, providing local storage and compute, typically employing analog operation. Though this raises the potential for high energy efficiency and throughput, analog operation has significantly limited robustness, scale, and programmability. This paper describes a 590kb in-memory-computing accelerator integrated in a programmable processor architecture, by exploiting recent approaches to charge-domain in-memory computing. The architecture takes the approach of tight coupling with an embedded CPU, through accelerator interfaces enabling integration in the standard processor memory space. Additionally, a near-memory-computing datapath both enables diverse computations locally, to address operations required across applications, and enables bit-precision scalability for matrix/input-vector elements, through a bit-parallel/bit-serial (BP/BS) scheme. Chip measurements show an energy efficiency of 152/297 1b-TOPS/W and throughput of 4.7/1.9 1b-TOPS (scaling linearly with the matrix/input-vector element precisions) at VDD of 1.2/0.85V. Neural network demonstrations with 1-b/4-b weights and activations for CIFAR-10 classification consume 5.3/105.2 $\mu$J/image at 176/23 fps, with accuracy at the level of digital/software implementation (89.3/92.4% accuracy).

## Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence

Athanasios Davvetas, Iraklis A. Klampanos, Vangelis Karkaletsis

In this paper we introduce evidence transfer for clustering, a deep learning method that can incrementally manipulate the latent representations of an autoencoder, according to external categorical evidence, in order to improve a clustering outcome. It is deployed on a baseline solution to reduce the cross entropy between the external evidence and an extension of the latent space. By evidence transfer we mean the process by which the categorical outcome of an external, auxiliary task is exploited to improve a primary task, in this case representation learning for clustering. Our proposed method makes no assumptions regarding the categorical evidence presented, nor the structure of the latent space. We compare our method against the baseline solution by performing k-means clustering before and after its deployment. Experiments with three different kinds of evidence show that our method effectively manipulates the latent representations when introduced with real corresponding evidence, while remaining robust when presented with low quality evidence.

## Learning Semantic Representations for Novel Words: Leveraging Both Form and Context

Timo Schick, Hinrich Schütze

Word embeddings are a key component of high-performing natural language processing (NLP) systems, but it remains a challenge to learn good representations for novel words on the fly, i.e., for words that did not occur in the training data. The general problem setting is that word embeddings are induced on an unlabeled training corpus and then a model is trained that embeds novel words into this induced embedding space. Currently, two approaches for learning embeddings of novel words exist: (i) learning an embedding from the novel word’s surface-form (e.g., subword n-grams) and (ii) learning an embedding from the context in which it occurs. In this paper, we propose an architecture that leverages both sources of information – surface-form and context – and show that it results in large increases in embedding quality. Our architecture obtains state-of-the-art results on the Definitional Nonce and Contextual Rare Words datasets. As input, we only require an embedding set and an unlabeled corpus for training our architecture to produce embeddings appropriate for the induced embedding space. Thus, our model can easily be integrated into any existing NLP system and enhance its capability to handle novel words.

## Sample-Efficient Policy Learning based on Completely Behavior Cloning

Qiming Zou, Ling Wang, Ke Lu, Yu Li

Direct policy search is one of the most important algorithms in reinforcement learning. However, learning from scratch needs a large amount of experience data and is prone to poor local optima. In addition, a partially trained policy tends to perform actions that are dangerous to the agent and the environment. In order to overcome these challenges, this paper proposes a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and uses a neural network to express this function. In this way, PLCBC can completely clone the MPC controller without any performance loss, and is totally training-free. The experiments show that this initialization strategy can help the agent learn in the high-reward state region, and converge faster and better.
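
To see why a neural network can express a piecewise affine function exactly and without training, consider the following minimal sketch. It does not reproduce the PLCBC construction, which the abstract does not detail: the saturated linear control law, the gain value, and the names `saturated_controller` and `relu_network` are hypothetical stand-ins for an explicit PWA controller.

```python
def relu(z):
    # Rectified linear unit, itself a piecewise affine function.
    return max(0.0, z)


def saturated_controller(x, gain=2.0):
    # Hypothetical PWA control law with three affine regions:
    # u = -1, u = gain * x, or u = 1, depending on the state x.
    return max(-1.0, min(1.0, gain * x))


def relu_network(x, gain=2.0):
    # One hidden layer with two ReLU units and hand-set weights,
    # followed by an affine output layer. No training is involved.
    h1 = relu(gain * x + 1.0)
    h2 = relu(gain * x - 1.0)
    return h1 - h2 - 1.0


if __name__ == "__main__":
    for x in (-2.0, -0.25, 0.0, 0.3, 5.0):
        print(x, saturated_controller(x), relu_network(x))
```

The weights are read off directly from the breakpoints of the control law, which is the sense in which such a clone can be obtained without any fitting step.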

## Deterministic and stochastic inexact regularization algorithms for nonconvex optimization with optimal complexity

S. Bellavia, G. Gurioli, B. Morini, Ph.L. Toint

A regularization algorithm using inexact function values and inexact derivatives is proposed and its evaluation complexity analyzed. This algorithm is applicable to unconstrained problems and to problems with inexpensive constraints (that is, constraints whose evaluation and enforcement have negligible cost) under the assumption that the derivative of highest degree is $\beta$-Hölder continuous. It features a very flexible adaptive mechanism for determining the inexactness which is allowed, at each iteration, when computing objective function values and derivatives. The complexity analysis covers arbitrary optimality order and arbitrary degree of available approximate derivatives. It extends results of Cartis, Gould and Toint (2018) on the evaluation complexity to the inexact case: if a $q$th order minimizer is sought using approximations to the first $p$ derivatives, it is proved that a suitable approximate minimizer within $\epsilon$ is computed by the proposed algorithm in at most $O(\epsilon^{-\frac{p+\beta}{p-q+\beta}})$ iterations and at most $O(|\log(\epsilon)|\epsilon^{-\frac{p+\beta}{p-q+\beta}})$ approximate evaluations. While the proposed framework remains so far conceptual for high degrees and orders, it is shown to yield simple and computationally realistic inexact methods when specialized to the unconstrained and bound-constrained first- and second-order cases. The deterministic complexity results are finally extended to the stochastic context, yielding adaptive sample-size rules for subsampling methods typical of machine learning.
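
As a quick check of the iteration bound stated above, the exponent can be evaluated for a few small cases. The choice $\beta = 1$ (a Lipschitz continuous highest derivative) and the particular values of $p$ and $q$ are assumptions made here for illustration only.

$$
\begin{aligned}
p = 1,\ q = 1:&\quad \frac{p+\beta}{p-q+\beta} = \frac{2}{1} = 2 &&\Rightarrow\quad O(\epsilon^{-2}),\\
p = 2,\ q = 1:&\quad \frac{p+\beta}{p-q+\beta} = \frac{3}{2} &&\Rightarrow\quad O(\epsilon^{-3/2}),\\
p = 2,\ q = 2:&\quad \frac{p+\beta}{p-q+\beta} = \frac{3}{1} = 3 &&\Rightarrow\quad O(\epsilon^{-3}).
\end{aligned}
$$

The first two cases match the familiar rates for first-order methods and for cubic regularization when seeking first-order points; the evaluation bound differs from each only by the additional $|\log(\epsilon)|$ factor.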