# Understanding Black-box Predictions via Influence Functions

This code replicates the experiments from the following paper:

> Pang Wei Koh and Percy Liang. Understanding Black-box Predictions via Influence Functions. International Conference on Machine Learning (ICML), 2017.

How can we explain the predictions of a black-box model? The paper uses influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying the training points most responsible for a given prediction. To scale influence functions up to modern machine learning settings, the paper develops a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products, and it works as long as you have a supervised learning problem.

We have a reproducible, executable, and Dockerized version of these scripts on Codalab, and the datasets for the experiments can also be found at the Codalab link. The reference implementation can be found here: link.
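To get a first result, usage is meant to look roughly like the following. This is a minimal sketch: the package import name, `init_logging`, `get_default_config`, and `calc_img_wise` follow the reference implementation's README, but treat the exact names and signatures as assumptions, and `get_model` / `get_loaders` are hypothetical helpers standing in for your own code.

```python
import pytorch_influence_functions as ptif

# Your own trained torch.nn.Module and torch.utils.data.DataLoaders
# (hypothetical helpers; any supervised learning setup works).
model = get_model()
trainloader, testloader = get_loaders()

ptif.init_logging()
config = ptif.get_default_config()  # dict of calculation parameters, see below

# Calculates, for each test image, the training images with the
# largest influence on its classification.
influences = ptif.calc_img_wise(config, model, trainloader, testloader)
```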
## How influence functions work

Linearization is one of our most important tools for understanding nonlinear systems, and influence functions are a linearization of a counterfactual question: how would the trained parameters, and hence a prediction, change if a single training point were upweighted? For a training point $z$ and parameters $\theta$, let $L(z, \theta)$ be the loss, and let the trained parameters $\hat{\theta}$ minimize the empirical risk $\frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta)$. The idea is to compute the parameter change if $z$ were upweighted by some small $\epsilon$, giving us new parameters

$$\hat{\theta}_{\epsilon,z} = \arg\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon\, L(z, \theta).$$

A classical result from robust statistics (Cook & Weisberg, 1982) gives the effect of this upweighting on the parameters in closed form:

$$\mathcal{I}_{\mathrm{up,params}}(z) = \left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1}\, \nabla_\theta L(z, \hat{\theta}),$$

where $H_{\hat{\theta}}$ is the Hessian of the empirical risk at $\hat{\theta}$. Chaining through the loss at a test point $z_{\mathrm{test}}$ gives

$$\mathcal{I}_{\mathrm{up,loss}}(z, z_{\mathrm{test}}) = -\nabla_\theta L(z_{\mathrm{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\, \nabla_\theta L(z, \hat{\theta}).$$

Both terms matter: the paper compares this quantity against variants that drop the inverse Hessian or the train-loss gradient and shows that both are necessary for picking up the truly influential training points. Influence measures of this kind were studied earlier for predictions from generalized linear models (Thomas & Cook) and, via the Bouligand influence function, for cross-validation in kernel methods (Liu, Jiang & Liao; see also Christmann & Steinwart on robustness properties of convex risk minimization).

Explicitly forming and inverting the Hessian is infeasible for neural networks, so the implementation never materializes it. Instead, it estimates `s_test`, the inverse Hessian applied to the test-loss gradient, using Hessian-vector products (Pearlmutter, "Fast exact multiplication by the Hessian"), each of which costs only about one extra backward pass. The influence of a training point is then the inner product of its gradient `grad_z` with `s_test`. When influences are needed for many test samples, or for the prediction outcomes of an entire dataset (even >1000 test samples), reusing the cached `grad_z` vectors speeds up the calculation significantly, since no duplicate calculations take place.
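The two numerical ingredients can be sketched in a few lines of PyTorch. This is a minimal sketch, not the reference implementation: the double-backward `hvp` is Pearlmutter's trick, the recursion in `s_test` is the stochastic inverse-HVP estimate described above, and `damp` and `scale` are illustrative constants rather than tuned values.

```python
import torch

def hvp(loss, params, v):
    """Hessian-vector product H v via double backward (Pearlmutter's trick)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Differentiate <grad, v> once more to get H v without ever forming H.
    dot = sum((g * u).sum() for g, u in zip(grads, v))
    return torch.autograd.grad(dot, params)

def s_test(v, params, loss_fn, batches, damp=0.01, scale=25.0):
    """Iteratively approximate H^{-1} v.

    The fixed point of  h = v + (1 - damp) h - (H h) / scale  satisfies
    (damp I + H / scale) h = v, so h / scale is approximately H^{-1} v
    when the damping is small.
    """
    h = [u.clone() for u in v]
    for batch in batches:
        loss = loss_fn(batch)  # fresh mini-batch loss, fresh graph
        Hh = hvp(loss, params, h)
        h = [u + (1 - damp) * hi - Hhi / scale
             for u, hi, Hhi in zip(v, h, Hh)]
    return [hi / scale for hi in h]
```

Here `v` would be the gradient of the test loss, e.g. `torch.autograd.grad(test_loss, params)`, and the influence of a training point is then `-sum((g * s).sum() for g, s in zip(grad_z, s_test_vec))`.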
## Installation and usage

You can either install this package directly through pip, or clone the repository and import it as a package once it is on your Python path. Dependencies: Numpy/Scipy/Scikit-learn/Pandas, plus PyTorch. To run the tests, there are further requirements listed in the repository.

The calculation is driven by `config`, a dict which contains the parameters used to calculate the influences; you can get the default config by calling `ptif.get_default_config()`. Among other things, it controls the recursion for the `s_test` estimate (for example, the initial value used for the Hessian approximation during the `s_test` calculation) and the output directory, which is created automatically to prevent a runtime error.

Storing the per-training-point gradients (`grad_z`) can take significant amounts of disk space (100s of GBs), so keeping them only makes sense if they can be loaded back faster than they can be recomputed, e.g. from a fast SSD. The payoff is that the influence functions for all test images can then be computed without duplicate calculations, which could otherwise number in the tens of thousands.

The result is a dict/json containing the influences calculated for all test samples. `harmful` is a list of numbers, which are the IDs of the training samples that most increased the loss on a given test sample, and `helpful` lists the IDs that most reduced it. The dict structure looks similar to this:
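An illustrative sketch of that structure; every field name and value below is a hypothetical placeholder rather than output copied from a real run:

```python
influences = {
    "0": {                               # id of the test sample
        "label": 3,                      # its true label
        "num_in_dataset": 0,             # its index in the test set
        "time_calc_influence_s": 129.6,  # wall-clock seconds for this sample
        "influence": [-0.04, 0.01],      # one influence value per training sample
        "harmful": [4921, 130, 877],     # ids of the most harmful training samples
        "helpful": [5544, 7221, 982],    # ids of the most helpful training samples
    },
    # ... one entry per test sample
}
```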
## Interpreting the results

Visualized, the output can look like this: the test image for which the influences were calculated sits at the top left, with the most influential training images next to it. The helpful images from the training dataset were the ones that most supported the correct prediction, whereas the harmful images were the ones that most pushed the model away from it.

The paper demonstrates that influence functions are useful for multiple purposes: understanding model behavior (they reveal insights about how models rely on and extrapolate from the training data), debugging models, fixing mislabeled training examples, and even creating visually-indistinguishable training-set attacks. Often we also want to identify an influential group of training samples behind a particular test prediction, rather than a single point.

## Background: training dynamics

Influence functions sit inside a broader toolbox for analyzing neural net training, covered for example in CSC2541. First-order Taylor approximations (gradients, directional derivatives) and second-order approximations (the Hessian) are the basic instruments, though keep in mind that some of these concepts, such as directional derivatives or Hessian-vector products, might not be so straightforward to use in some frameworks. Systems often become easier to analyze in the limit: in the infinite width limit, the Neural Tangent Kernel gives an elegant way to understand gradient descent dynamics in function space, one lens on why networks with far more than enough parameters to memorize the data still generalize well. Influence functions are also an instance of bilevel optimization, where the cost function is defined in terms of the optimal solution to another optimization problem; the two most common techniques for such problems are implicit differentiation (used here) and unrolling. The same toolbox covers second-order and natural-gradient optimization (Amari, "Natural gradient works efficiently in learning") together with its pitfalls, such as the limitations of the empirical Fisher approximation; when parallelism and batch size can speed up training; and the additional failure modes, such as rotation dynamics, that arise when multiple networks are trained simultaneously against different cost functions. Course projects there reproduce key ideas from papers like this one in Colab notebooks, often using JAX and the several neural net libraries built on top of it.

## Does the approximation hold up?

The derivation assumes a strictly convex, twice-differentiable loss, assumptions that modern deep models violate. The paper shows that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On convex models, the estimate can be checked directly against the ground truth of leave-one-out retraining.
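A self-contained sanity check in that convex setting. This sketch assumes NumPy and scikit-learn (both listed as dependencies above); the synthetic data, the regularization constant, and the probed training index are all illustrative choices.

```python
# Compare the influence-function estimate of removing one training point
# against the ground truth of actually retraining without it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(int)
x_test = rng.normal(size=d)
y_test = int(x_test @ w_true > 0)

C = 1e4  # weak L2 regularization keeps the Hessian invertible
theta = LogisticRegression(C=C, fit_intercept=False).fit(X, y).coef_.ravel()

def grad(x, label, th):
    """Gradient of the logistic loss at a single point."""
    p = 1.0 / (1.0 + np.exp(-x @ th))
    return (p - label) * x

def loss(x, label, th):
    """Logistic loss at a single point."""
    p = 1.0 / (1.0 + np.exp(-x @ th))
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# Hessian of the mean training loss (plus the small ridge term).
p = 1.0 / (1.0 + np.exp(-X @ theta))
H = (X * (p * (1 - p))[:, None]).T @ X / n + np.eye(d) / (C * n)

z = 17  # training point whose influence we probe
# I_up,loss(z, z_test) = -grad(z_test)^T H^{-1} grad(z)
infl = -grad(x_test, y_test, theta) @ np.linalg.solve(H, grad(X[z], y[z], theta))

# Removing z is an upweighting by eps = -1/n, so the predicted change in
# test loss is -infl / n; compare against leave-one-out retraining.
mask = np.arange(n) != z
theta_loo = LogisticRegression(C=C, fit_intercept=False).fit(X[mask], y[mask]).coef_.ravel()
print("predicted:", -infl / n)
print("actual:   ", loss(x_test, y_test, theta_loo) - loss(x_test, y_test, theta))
```

The two numbers should be close here; the approximation degrades as the convexity assumptions weaken, which is the regime the paper goes on to probe.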
