# [ICML 2019] Reading list of accepted papers

Time Series:

- Learning Hawkes Processes Under Synchronization Noise
- slides
- Multivariate Hawkes processes model the occurrence of discrete events in continuous time. They are especially relevant when an arrival in one dimension can affect future arrivals in other dimensions (the processes are self-exciting and mutually exciting). Before this paper, the usual approach assumed noiseless observations, that is, event arrival times recorded accurately and without delay. The authors introduce a new approach for learning the causal structure of multivariate Hawkes processes when events are subject to random and unknown time shifts. Each dimension can have a different, but constant, time shift applied to its observations. The idea of the paper is to define a new process, the desynchronized multivariate Hawkes process, parametrized by (z, theta), where z is the time-shift noise (treated as a parameter) and theta denotes the standard parameters of the multivariate Hawkes process. Estimating these parameters by maximum likelihood is challenging because the objective function is neither smooth nor continuous. To overcome this difficulty, the authors smooth the objective by approximating the kernels (which create the discontinuities) with functions that are differentiable everywhere. Stochastic gradient descent is then applied to maximize the log-likelihood.
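
To make the discontinuity issue concrete, here is a minimal sketch. It is not the paper's implementation: the sigmoid-gated `smooth_kernel`, the single-source `intensity` helper, and all parameter values (`alpha`, `beta`, `sharpness`, the event times) are assumptions chosen for illustration. It shows that the intensity computed with a causal exponential kernel jumps when a candidate time shift moves a source event past the evaluation time, while a smooth surrogate varies continuously.

```python
import math

def exp_kernel(t, alpha=0.5, beta=1.0):
    """Causal exponential kernel: zero for t <= 0, so it jumps at t = 0."""
    return alpha * beta * math.exp(-beta * t) if t > 0 else 0.0

def smooth_kernel(t, alpha=0.5, beta=1.0, sharpness=50.0):
    """Hypothetical surrogate: exponential decay times a sigmoid gate in t.

    Differentiable everywhere; needs sharpness > beta so that it
    vanishes as t goes to minus infinity.
    """
    if t >= 0:
        return alpha * beta * math.exp(-beta * t) / (1.0 + math.exp(-sharpness * t))
    # Same expression, rearranged algebraically to avoid overflow for t < 0.
    return alpha * beta * math.exp((sharpness - beta) * t) / (1.0 + math.exp(sharpness * t))

def intensity(t, observed_source_times, shift, mu, kernel):
    """Baseline mu plus excitation from one source dimension.

    Each observed source time s is mapped back to s - shift before the
    kernel is applied (shift plays the role of one component of z).
    """
    return mu + sum(kernel(t - (s - shift)) for s in observed_source_times)

# Sweep the candidate shift across the point where the source event crosses t.
for shift in (-0.21, -0.201, -0.199, -0.19):
    hard = intensity(1.2, [1.0], shift, 0.1, exp_kernel)
    soft = intensity(1.2, [1.0], shift, 0.1, smooth_kernel)
    print(f"shift={shift:+.3f}  hard={hard:.3f}  smooth={soft:.3f}")
```

Because the log-likelihood is built from such intensities, a jump in the intensity as a function of the shift becomes a jump in the objective, which is why a gradient-based method needs some form of smoothing.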

- Deep Factors for Forecasting
- A Statistical Investigation of Long Memory in Language and Music
- Weakly-Supervised Temporal Localization via Occurrence Count Learning
- Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces
- Imputing Missing Events in Continuous-Time Event Streams
- GitHub and related paper (Neural Hawkes Process), GitHub

Natural Language Processing:

- Analogies Explained: Towards Understanding Word Embeddings
- Parameter-Efficient Transfer Learning for NLP
- Deep Residual Output Layers for Neural Language Generation
- Improving Neural Language Modeling via Adversarial Training
- Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops
- MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization

New statistical distances, and other fancy stat-based methods:

- Optimal Transport for structured data with application on graphs
- Random Matrix Improved Covariance Estimation for a Large Class of Metrics
- Subspace Robust Wasserstein Distances

Clustering:

- Supervised Hierarchical Clustering with Exponential Linkage
- COMIC: Multi-view Clustering Without Parameter Selection

Misc:

- Graph Matching Networks for Learning the Similarity of Graph Structured Objects
- The Evolved Transformer
- Efficient Training of BERT by Progressively Stacking
- Similarity of Neural Network Representations Revisited
- Data Shapley: Equitable Valuation of Data for Machine Learning
- This paper proposes using Shapley values to quantify how valuable a data point is to a machine learning model. Shapley values have recently been used to quantify feature importance when interpreting black-box machine learning models; the aim here is different. The motivation is economic: Shapley values could be a way to ‘fairly’ remunerate people (or organizations) for contributing their data. The more useful a data point is (according to data Shapley) for the problem and model at hand, given all the other data points already collected, the more money it is worth.
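
As a rough illustration of valuing data points with Shapley values, the sketch below estimates them by averaging marginal contributions over random permutations of the training set. It is a simplified Monte Carlo estimator, not the paper's algorithm; the function names, the four-point dataset, and the 1-nearest-neighbour utility are all hypothetical choices made for this example.

```python
import random

def monte_carlo_shapley(n_points, utility, n_permutations=1000, seed=0):
    """Estimate each point's Shapley value by permutation sampling.

    utility(subset) takes a list of point indices and returns a score.
    """
    rng = random.Random(seed)
    values = [0.0] * n_points
    for _ in range(n_permutations):
        order = list(range(n_points))
        rng.shuffle(order)
        subset = []
        previous = utility(subset)
        for idx in order:
            subset.append(idx)
            current = utility(subset)
            # Marginal contribution of idx given the points that came before it.
            values[idx] += current - previous
            previous = current
    return [v / n_permutations for v in values]

# Toy 1-D dataset of (feature, label); the last point is mislabeled on purpose.
TRAIN = [(0.0, 0), (0.1, 0), (1.0, 1), (0.97, 0)]
VALID = [(0.05, 0), (0.95, 1)]

def one_nn_accuracy(subset):
    """Validation accuracy of a 1-nearest-neighbour rule using only `subset`."""
    if not subset:
        return 0.0  # assumed score when there is no training data
    correct = 0
    for x, y in VALID:
        nearest = min(subset, key=lambda i: abs(TRAIN[i][0] - x))
        correct += TRAIN[nearest][1] == y
    return correct / len(VALID)

values = monte_carlo_shapley(len(TRAIN), one_nn_accuracy)
for (x, y), v in zip(TRAIN, values):
    print(f"x={x:.2f} label={y} estimated value={v:+.3f}")
```

In this toy setup the mislabeled point has a negative exact Shapley value, since adding it can only flip a correct validation prediction to a wrong one more often than it helps. The estimates always sum to the utility of the full training set minus the utility of the empty set, which is the efficiency property that makes Shapley values attractive for splitting a payment.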

- Topological Data Analysis of Decision Boundaries with Application to Model Selection
- slides
- Another paper motivated by economic applications: matching vendor pre-trained models to customer data. Technically, the authors extend the standard Topological Data Analysis (TDA) toolkit, which works on point clouds of unlabeled data, to labeled point clouds. This lets them study the complexity of the decision boundaries of supervised machine learning models. They find that, when choosing a pre-trained network, one whose topological complexity matches that of the dataset yields good generalization. Therefore, on a model marketplace, vendors should report the topological complexity measures of their models, and customers should estimate the same measures on their data. Customers should then choose the model whose topological complexity measures match theirs most closely.

- Learning to Prove Theorems via Interacting with Proof Assistants
- slides
- GitHub
- Coq is a *French* (Cocorico!) formal proof management system. It was used to formally prove the four color theorem in 2005. The system has been developed and maintained by Inria, ENS de Lyon, and Ecole Polytechnique since 1984. Those who have used it (or tried to) know that formally proving theorems at a very low level is not easy, even with the help of the interactive Coq system. It relies heavily on intuition and experience. I only used it for a couple of months at ENS, and I was bad at it (unlike some other colleagues!). The paper proposes to remove the need for human experts to construct proofs by manually interacting with the proof assistant. To do so, the authors propose a deep learning-based model that generates proofs. They leverage a dataset, CoqGym, containing 71K human-written proofs from 123 projects developed with the Coq proof assistant. Their model can prove new theorems that were not provable by previous automated methods.