Interesting research topics from ICML 2018



This is an adaptation of my talks at Seoul National University and the Seoul AI meetup, and of my article in the Kakao AI report. 1

1. ML and security


Security is arguably a top priority in the tech industry. With the widespread application of data science and machine learning to big data, including private and corporate information, new security issues and their solutions are becoming ever more important. At ICML 2018 it was evident that the machine learning community shares the same concern.

Keynote by D. Song

Dawn Song’s keynote speech, titled “AI and Security: Lessons, Challenges and Future Directions”, demonstrated how new machine learning technologies can help solve traditional security problems, using deep-learning-based vulnerability detection for internet-of-things (IoT) devices as an example. She also showed what kinds of new security problems can arise, with examples of adversarial attacks on computer vision systems.

AI and security a

Vulnerability detection in IoT devices 1

Adversarial examples a

Among the various issues raised in the keynote, the most resonant one was this: if the capacity of a machine learning model is large enough to memorize all the training data, and if training actually proceeds by memorizing the data rather than by learning meaningfully encoded representations, can we pull the training data back out of the model by querying it? Understanding how deep models process input data and store their representations is still very much a work in progress, and considering that many deep models have the capacity to memorize random examples 2, it is easy to see how important answering this question is when applying machine learning to sensitive data such as medical records and personal credit information.
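
As a toy illustration of the memorization capacity mentioned above, the following sketch (a minimal PyTorch example with made-up dimensions) fits an over-parameterized network to purely random labels; in the spirit of 2, the training accuracy still approaches 100% even though there is nothing to generalize.

```python
# Over-parameterized networks can fit random labels (cf. 2): there is no
# signal here, so near-perfect training accuracy can only come from memorization.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 32)            # random inputs
y = torch.randint(0, 2, (256,))     # random labels: nothing to generalize

model = nn.Sequential(nn.Linear(32, 512), nn.ReLU(), nn.Linear(512, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):            # plain full-batch training
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    opt.step()

acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy on random labels: {acc:.2f}")  # approaches 1.00
```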

Best paper award, N. Carlini

One of the best paper awards 3 also went to research on the security of machine learning. During the best paper talk, Carlini said that “the threat model must assume the attacker has read the paper and knows the defender is using those techniques to defend”; in other words, machine learning security research should be performed the way real hackers do their job. His argument that attacks on deep learning models will be mounted by reading all the research papers on the models, understanding their details, and then striking at their vulnerabilities was very persuasive.

Debate

One of the Saturday workshops, “ICML 2018: The Debates”, was a series of panel discussions covering various topics in machine learning. Among the four propositions, one was about the vulnerability of machine learning systems, and one topic that surfaced while debating it was whether limited research resources should go solely to cutting-edge machine learning technologies, or whether a portion should be set aside from the beginning for enhancing the security of machine learning. Of course a single day of discussion will not lead to a conclusive agreement, but the very fact that the machine learning community recognizes the importance of security and exchanges ideas on how to cope with the threat shows the health of the community.

2. Fair ML


Machine learning models inevitably learn various social biases from training data unless regularized otherwise. This should be kept in mind when applying machine learning to real-world problems, and it has indeed been emphasized since the renaissance of machine learning that followed the revolutionary progress of deep learning in recent years.

Meanwhile, even before this turning point in machine learning, researchers in social science were very progressive in utilizing machine learning in their work. For example, in her 2016 Scientific Python Conference (SciPy) keynote, “Machine Learning for Social Science”, Hanna Wallach illustrated how machine learning helps social science research. In the past it was quite difficult to analyze the many government documents that have long been publicly available, because of the sheer amount of data, despite the documents being considered highly valuable. But thanks to developments in natural language processing such as Latent Dirichlet Allocation (LDA), it is now much easier to excavate a relevant document from the huge pile. Many researchers in econometrics and microeconomics who are actively working at the intersection of social science, mathematics, and data science have also benefited from the development of machine learning and deep learning. And the importance of machine learning and data science in political science keeps gaining momentum, ever since data-driven campaigning got much attention during the 2008 US presidential election.
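
As a small illustration of the kind of topic modeling mentioned above, here is a hedged sketch using scikit-learn's LDA implementation on an obviously synthetic corpus; the documents and topic count are made up for illustration.

```python
# Toy topic modeling with LDA: recover per-document topic mixtures that can
# be used to search or group a large document collection.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "budget tax revenue fiscal policy",      # synthetic "fiscal" document
    "election campaign vote senate debate",  # synthetic "electoral" document
    "tax policy election campaign budget",   # a mixture of both
]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X).round(2))             # rows are topic mixtures
```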

The issue of how to measure the influence of machine learning on society, and how to alleviate its adverse effects while capitalizing on its benefits, goes beyond the scope of the machine learning community alone. So it is crucial for machine learning researchers to cooperate with other research communities that have put much time and effort into those topics. The other best paper of ICML 2018 4, which treats the fairness of machine learning in a quantitative manner, is in line with this direction. There was also a workshop titled “Fairness, Accountability, and Transparency in Machine Learning”. All these efforts give a positive prospect for harmonious and productive cooperation between social science and machine learning.

Fair ML b

3. Bayesian inference in the era of big data


The frontier of machine learning had arguably been Bayesian inference, before the advance of GPGPU (general-purpose computing on graphics processing units) technology and the ensuing renaissance of deep learning. Even today, machine learning models based on Bayesian inference have an advantage over their deep learning counterparts: one can build an interpretable model that incorporates domain expert knowledge from a relatively small number of data points. But such models do not scale as the size of the training dataset increases. Still, thanks to these merits, Bayesian inference is going strong in various academic fields such as exoplanet discovery and gravitational wave detection, and in many industrial sectors, including the search for plane wreckage, traffic analysis, microcredit, wildlife conservation, and jet engine analysis.

One of the tutorials, “Variational Bayes and Beyond: Bayesian Inference for Big Data” by Broderick, was about overcoming this scaling difficulty in Bayesian inference. Instead of doing approximate Bayesian inference, she focused on exact Bayesian inference over an effectively sampled, weighted subset of the data, which she calls a Bayesian coreset.
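
To make the coreset idea concrete, here is a minimal sketch under simplifying assumptions: a 1-D Gaussian model, and naive uniform subsampling in place of a real coreset construction, which selects points and weights far more carefully. The point is only the interface: downstream inference touches a small weighted subset instead of all N points.

```python
# Sketch of the Bayesian coreset idea: replace the full-data log-likelihood
# with a weighted subset so inference (e.g. MCMC) only touches M << N points.
import numpy as np

rng = np.random.default_rng(0)
N, M = 100_000, 500
data = rng.normal(loc=2.0, scale=1.0, size=N)   # toy 1-D Gaussian data

idx = rng.choice(N, size=M, replace=False)      # naive uniform subsample
subset, weights = data[idx], np.full(M, N / M)  # weights sum to N

def log_likelihood(mu, x, w):
    # weighted Gaussian log-likelihood with unit variance
    return np.sum(w * (-0.5 * (x - mu) ** 2))

mus = np.linspace(0.0, 4.0, 401)
full = [log_likelihood(mu, data, np.ones(N)) for mu in mus]
core = [log_likelihood(mu, subset, weights) for mu in mus]
print("argmax full:   ", mus[np.argmax(full)])
print("argmax coreset:", mus[np.argmax(core)])  # close to the full-data answer
```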

Various applications of Bayesian inference c

Max Welling, one of the keynote speakers, has studied various applications of Bayesian inference to deep learning. In his keynote talk, “Intelligence per Kilowatthour”, he said that in order for current deep learning research to be applicable to the real world, “value created by AI must exceed the cost to run the service.” He suggested model compression using Bayesian inference as one candidate solution to this problem.
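
As one hedged sketch of how a Bayesian treatment enables compression: if (variational) training yields a posterior mean and standard deviation per weight, low signal-to-noise weights can be pruned. The posterior below is fabricated purely for illustration; this is not Welling's actual method, only the pruning step it enables.

```python
# If training gives a (mean, std) posterior per weight, weights the posterior
# is uncertain about (low signal-to-noise ratio) can be pruned away.
import numpy as np

rng = np.random.default_rng(0)
mu = rng.normal(size=(256, 256))           # posterior means (fabricated)
sigma = np.abs(rng.normal(size=mu.shape))  # posterior stds (fabricated)

snr = np.abs(mu) / sigma                   # signal-to-noise ratio per weight
mask = snr > 1.0                           # keep only confident weights
compressed = np.where(mask, mu, 0.0)       # sparse weight matrix

print(f"kept {mask.mean():.1%} of weights")
```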

Energy consumption of various deep models d

4. Theory of deep learning


Compared with traditional computer algorithms and data structures, recent deep learning models lack an explanation of why they work. In other words, we do not know why deep models learn well from data, what capacity they have, how they generalize to unseen data, and what architectures they require to learn from data.

To analyze such questions theoretically, it is necessary to formulate a simple but mathematically rigorous problem that admits a quantitative study. One such research effort at this ICML identified conditions under which the training error of neural networks for binary classification is zero at all local minima 5. The authors concluded that sufficient conditions are that 1) the activation function is increasing and strictly convex, 2) the neural network is either single-layered or multi-layered with a shortcut-like connection, and 3) the loss function is a smooth version of the hinge loss. But among the deep neural network architectures widely used these days, none satisfies all of the above conditions, which demonstrates the limits of the current theoretical understanding of deep learning.

Another theoretical work from this ICML identified a connection between a class of neural networks and tropical algebraic geometry 6, showing that the family of feedforward neural networks with ReLU activation and integer weight parameters is equivalent to the family of tropical rational maps. Albeit a beautiful result, it applies only to very simple models.
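
For intuition on the connection, the tropical (max-plus) semiring replaces the usual arithmetic operations by

\[
x \oplus y := \max(x, y), \qquad x \odot y := x + y,
\]

so that \(\mathrm{ReLU}(x) = \max(x, 0) = x \oplus 0\) is itself a tropical expression; this is the sense in which networks built from ReLU activations and affine maps compute tropical rational maps.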

In his tutorial, titled “Toward Theoretical Understanding of Deep Learning”, Arora pointed out such limitations of the current theoretical understanding of deep learning. It should be emphasized, however, that we should not discount these theoretical efforts, as each serves as a stepping stone toward a breakthrough in the theory of deep learning. He also stressed that theoretical research should try to provide model architectures and optimization methods that efficiently utilize training data, rather than merely explain existing ones, which he called “post-mortem” studies. Toward the end of the tutorial, he mentioned that insights from physics, such as the calculus of variations with Lagrangians and Hamiltonians, could play an important role, which was music to my physicist ears.

5. Representation learning in deep learning


One reason behind the current success of deep learning is that, unlike traditional machine learning models whose features must be hand-engineered, deep models learn features from data during training via back-propagation, which frees researchers from feature engineering and lets them focus on designing a model architecture and an objective function. But this poses an issue: such features are often hard to interpret, even for the person who designed the model, and relating them to human-interpretable representations is an active area of research.

Deep learning models such as convolutional neural networks have been highly successful in learning from data like images, which are regularly distributed on a Euclidean space. But for deep models to ingest data such as a 2-dimensional surface embedded in 3-dimensional space, or a graph, one of the important data structures in computer science, it is necessary to extend the domain of a deep model to non-Euclidean spaces. The umbrella term covering this area of research is geometric deep learning. 7

Two papers on embedding data with a hierarchical structure into hyperbolic geometry were presented at this ICML. One was a follow-up to the first deep learning research using hyperbolic geometry 8; it proposes more efficient Riemannian optimization by replacing the Poincaré model of hyperbolic space with the Lorentz model 9. The other defines a hyperbolic entailment cone in closed form and utilizes it for efficient embedding of directed acyclic graphs 10.
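
For reference, the Poincaré ball model used in 8 equips the open unit ball with the distance

\[
d(u, v) = \operatorname{arcosh}\!\left( 1 + 2\,\frac{\lVert u - v \rVert^2}{\left(1 - \lVert u \rVert^2\right)\left(1 - \lVert v \rVert^2\right)} \right),
\]

under which distances blow up near the boundary of the ball; this gives exponentially growing room for the leaves of a hierarchy, which is what makes hyperbolic embeddings natural for trees and DAGs.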

Other interesting papers on geometric deep learning include improvements to graph convolutional networks 11, 12, which are deep learning models for graph data, and deep generative models for realistic graphs 13, 14.
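
For readers unfamiliar with graph convolutional networks, here is a minimal numpy sketch of a single layer in the commonly used symmetric-normalization form, the kind of baseline that works such as 11 and 12 build on; the graph, features, and dimensions are made up.

```python
# One graph-convolution propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)
import numpy as np

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])     # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)    # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
H = np.random.default_rng(0).normal(size=(3, 4))              # node features
W = np.random.default_rng(1).normal(size=(4, 8))              # layer weights
print(gcn_layer(A, H, W).shape)  # (3, 8): new 8-dim feature per node
```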

Another idea of using geometry for better representation learning is to preserve the geometric symmetries of the features of a model, so that the model can learn more information with a smaller number of features. Equivariance plays the central role here, and is mathematically defined as follows:

\[
\Phi(T_g x) = T'_g\, \Phi(x),
\]

where \(\Phi\) is a nonlinear function corresponding to the whole deep neural network or a part of it, \(x\) is the input of the network, \(g\) is an element of the symmetry group of the input space, and \(T_g\), \(T'_g\) are the corresponding transformations on the input and feature spaces. In the workshop titled “Towards learning with limited labels: Equivariance, Invariance, and Beyond”, there were efforts 15 16 to design model architectures that preserve equivariance as much as possible, and there is also ongoing research applying the framework of principal G-bundles, widely used in differential geometry, to develop an overarching theory. All these topics are at an early stage, but they are expected to evolve into fruitful research areas.
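
A quick numerical check of the equivariance property above for the classic example, translation equivariance of convolution (circular 1-D convolution here, so that the shift is an exact group action):

```python
# Phi(T_g x) == T_g Phi(x): circular convolution commutes with cyclic shifts.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)          # input signal
k = rng.normal(size=16)          # filter of the same length

def circ_conv(x, k):
    # circular convolution via the convolution theorem
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

shift = lambda v: np.roll(v, 3)  # T_g: cyclic translation by three steps

print(np.allclose(circ_conv(shift(x), k), shift(circ_conv(x, k))))  # True
```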

To better understand a generative model, it is important to disentangle its representations in the latent space. There have been several results in this area, including a new training method 17, a new model architecture 18, and a new training objective 19. Many works on the topic were presented in a workshop titled “Theoretical Foundations and Applications of Deep Generative Models”.
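
For a concrete instance, the training objective of 19 schematically augments the usual variational autoencoder objective with a total correlation penalty that pushes the aggregate latent distribution toward a factorized one,

\[
\mathcal{L} = \mathrm{ELBO} - \gamma\, \mathrm{KL}\!\left( q(z)\,\middle\|\, \textstyle\prod_j q(z_j) \right),
\]

where \(\gamma\) controls the strength of the disentanglement pressure.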

6. Replacing heuristics with machine learning


Heuristic methods are frequently used in computer algorithms and data structures to find a good estimate of an exact answer that is hard to obtain in practice. In a sense, designing an algorithm or a data structure can itself be regarded as a heuristic, derived from researchers’ previous experience with and insights into the problem it is intended to solve.

Therefore, with the progress of machine learning, it is natural to think of replacing heuristics with learned models, as this lets researchers put effort not into the details of every step of a heuristic but into the design of a suitable training objective, reducing the trial and error often required to find good heuristics.

Several interesting research results in this direction preceded this ICML. One was about optimizing the partitioning of the computational graph of a deep model and its placement across multiple GPU nodes to reduce overall execution time 20. Another replaced the B-tree structures widely used in databases and file systems with learned models 21. The second is especially interesting because the authors compare the pros and cons of the traditional index structure and the learned one, and furthermore present a model architecture that uses both, thereby suggesting a universal approach to augmenting an existing data structure with a learned model.
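
Here is a minimal sketch of the learned-index idea from 21, with a single linear model standing in for the paper's recursive model hierarchy: the model predicts a key's approximate position in a sorted array, and a search bounded by the model's worst-case training error corrects the prediction.

```python
# A model maps key -> approximate position; a bounded local search fixes it.
import bisect
import numpy as np

rng = np.random.default_rng(0)
keys = np.sort(rng.uniform(0, 1e6, size=10_000))
pos = np.arange(keys.size)

slope, intercept = np.polyfit(keys, pos, deg=1)   # "train" the index
pred = np.clip(slope * keys + intercept, 0, keys.size - 1).astype(int)
max_err = int(np.max(np.abs(pred - pos)))         # worst-case error bound

def lookup(key):
    guess = int(np.clip(slope * key + intercept, 0, keys.size - 1))
    lo, hi = max(0, guess - max_err), min(keys.size, guess + max_err + 1)
    i = lo + bisect.bisect_left(keys[lo:hi].tolist(), key)  # search the window
    return i if i < keys.size and keys[i] == key else None

print(lookup(keys[1234]) == 1234)  # True
print("search window:", 2 * max_err + 1, "vs array size:", keys.size)
```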

One way to improve heuristics with machine learning is to apply imitation learning, training an agent with a heuristic method as a sub-optimal expert. One of this ICML’s tutorials, “Imitation Learning” by Yue, covered various imitation learning methods and applications that, used in a complementary way with reinforcement learning, are expected to improve upon heuristics.
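
As a minimal sketch of the simplest such setup, behavioral cloning against a heuristic expert: the "heuristic" below is a made-up threshold rule, and a classifier is fit to reproduce its state-to-action mapping. (Methods covered in the tutorial go well beyond this.)

```python
# Behavioral cloning: fit a policy to (state, action) pairs from a heuristic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(1000, 1))        # 1-D toy states
expert_actions = (states[:, 0] > 0.1).astype(int)  # the heuristic "expert"

policy = LogisticRegression().fit(states, expert_actions)
print("agreement with heuristic:",
      policy.score(states, expert_actions))        # close to 1.0
```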

Heuristics exist not only in software but also in the various hardware components of modern computer systems. With the end of Moore’s law it is quite difficult to increase the clock speed of a single CPU core, so finding a better cache policy, thereby improving the memory hierarchy that greatly affects overall system performance, is a very promising direction of research, despite expected difficulties with latency requirements and OS kernel security. One such research effort was presented at this ICML: the authors cast cache prediction as an N-gram-style sequence problem and suggest a model architecture that learns from the deltas of cache-miss addresses as inputs to an LSTM 22. Even the author presenting the paper admitted that the results raised more questions than answers, but it surely marks the beginning of an interesting research direction combining deep learning and computer hardware.
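
A hedged sketch of that framing: treat each cache-miss address delta as a token from a finite vocabulary and train an LSTM to predict the next one, just as an N-gram language model would. The trace below is synthetic and the dimensions are made up; the actual paper's vocabulary construction is more involved.

```python
# Next-delta prediction over a synthetic cache-miss trace, language-model style.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = 64                                   # number of distinct deltas
deltas = torch.randint(0, vocab, (1, 128))   # one synthetic miss trace

embed = nn.Embedding(vocab, 32)
lstm = nn.LSTM(32, 64, batch_first=True)
head = nn.Linear(64, vocab)
params = list(embed.parameters()) + list(lstm.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

x, y = deltas[:, :-1], deltas[:, 1:]         # inputs and next-delta targets
for step in range(100):
    opt.zero_grad()
    h, _ = lstm(embed(x))
    loss = nn.functional.cross_entropy(head(h).transpose(1, 2), y)
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.3f}")
```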

Using machine learning to improve computer hardware was actively pursued even before the current wave of deep learning; for example, in 2001 there was research using neural networks (perceptrons) for branch prediction 23, and in 2008 a very extensive and promising study of improving DRAM memory controllers with reinforcement learning 24. If deep learning models can be successfully plugged into computer hardware, even more impressive progress may follow.

7. Conclusion


Compared to ICML 2016, the last one I attended, ICML 2018 reflected the maturity of machine learning research rather than presenting eye-widening breakthroughs. It will be interesting to see whether NIPS 2018 continues the trend or comes up with surprising research results.


References

1

https://brunch.co.kr/@kakao-it/296

a

D. Song, AI and Security: Lessons, Challenges and Future Directions, ICML 2018

2

Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning requires rethinking generalization, https://arxiv.org/abs/1611.03530

3

Athalye, A., Carlini, N., Wagner, D. (2018). Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:274-283

4

Liu, L. T., Dean, S., Rolf, E., Simchowitz, M. & Hardt, M. (2018). Delayed Impact of Fair Machine Learning. Proceedings of the 35th International Conference on Machine Learning, ICML 2018

b

Liu, L.T., Best Paper Session 2, ICML 2018

c

T. Broderick, Variational Bayes and Beyond: Bayesian Inference for Big Data, ICML 2018

d

M. Welling, Intelligence per Kilowatthour, ICML 2018

5

Liang, S., Sun, R., Li, Y. & Srikant, R.. (2018). Understanding the Loss Surface of Neural Networks for Binary Classification. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:2835-2843

6

Zhang, L., Naitzat, G. & Lim, L.. (2018). Tropical Geometry of Deep Neural Networks. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:5824-5832

7

Bronstein, M. M., Bruna, J., Lecun, Y., Szlam, A., & Vandergheynst, P. (2017). Geometric Deep Learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4), 18-42. doi:10.1109/msp.2017.2693418

8

Nickel, M. & Kiela, D. (2017). Poincaré Embeddings for Learning Hierarchical Representations. NIPS 2017

9

Nickel, M. & Kiela, D.. (2018). Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:3779-3788

10

Ganea, O., Becigneul, G. & Hofmann, T.. (2018). Hyperbolic Entailment Cones for Learning Hierarchical Embeddings. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:1646-1655

11

Chen, J., Zhu, J. & Song, L.. (2018). Stochastic Training of Graph Convolutional Networks with Variance Reduction. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:942-950

12

Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K. & Jegelka, S.. (2018). Representation Learning on Graphs with Jumping Knowledge Networks. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:5453-5462

13

Bojchevski, A., Shchur, O., Zügner, D. & Günnemann, S.. (2018). NetGAN: Generating Graphs via Random Walks. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:610-619

14

You, J., Ying, R., Ren, X., Hamilton, W. & Leskovec, J.. (2018). GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:5708-5717

15

Cohen, T. S. & Welling, M. (2016). Group Equivariant Convolutional Networks. Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016

16

Sabour, S., Frosst, N., Hinton, G.E. (2017). Dynamic Routing Between Capsules. NIPS 2017

17

Kamnitsas, K., Castro, D., Folgoc, L.L., Walker, I., Tanno, R., Rueckert, D., Glocker, B., Criminisi, A. & Nori, A.. (2018). Semi-Supervised Learning via Compact Latent Space Clustering. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:2459-2468

18

Parascandolo, G., Kilbertus, N., Rojas-Carulla, M. & Schölkopf, B.. (2018). Learning Independent Causal Mechanisms. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:4036-4044

19

Kim, H. & Mnih, A.. (2018). Disentangling by Factorising. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:2649-2658

20

Mirhoseini, A., Pham, H., Le, Q.V., Steiner, B., Larsen, R., Zhou, Y., Kumar, N., Norouzi, M., Bengio, S. & Dean, J.. (2017). Device Placement Optimization with Reinforcement Learning. Proceedings of the 34th International Conference on Machine Learning, in PMLR 70:2430-2439

21

Kraska, T., Beutel, A., Chi, E. H., Dean, J., Polyzotis, N. (2018). The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). ACM, New York, NY, USA, 489-504. DOI: https://doi.org/10.1145/3183713.3196909

22

Hashemi, M., Swersky, K., Smith, J., Ayers, G., Litz, H., Chang, J., Kozyrakis, C. & Ranganathan, P.. (2018). Learning Memory Access Patterns. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:1919-1928

23

Jimenez, D. A., Lin, C. Dynamic branch prediction with perceptrons. In HPCA, pp. 197–206, 2001.

24

Ipek, E., Mutlu, O., Martínez, J. F., Caruana, R., Self-Optimizing Memory Controllers: A Reinforcement Learning Approach, 2008 International Symposium on Computer Architecture, Beijing, 2008, pp. 39-50. doi: 10.1109/ISCA.2008.21

Chan Y. Park

Principal Research Scientist