PLDI'15 Tutorial: Machine Learning for Code Analytics

June 14, 2015 (Afternoon)
Veselin Raychev, Martin Vechev
3 hours
2pm - 3:30pm and 4pm - 6pm
PLDI WT G (Oregon Convention Center)


The increased availability of massive codebases, sometimes referred to as "Big Code", creates a unique opportunity for new kinds of programming tools and techniques based on statistical models. These approaches extract useful information from existing codebases and use it to provide statistically likely solutions to problems that are difficult or impossible to solve with traditional techniques.

The tutorial is self-contained and will include both:
  • Theory: an introduction to several machine learning models suitable for learning from programs, and
  • Practice: a hands-on session showing how to apply the theory to build statistical programming tools.
    For this task, we will use the recently released Nice2Predict framework.

Tutorial Slides

The slides for the tutorial are available here: PDF


By the end of the tutorial, the participant should have:
  • Learned the fundamentals of several machine learning models, along with their pros and cons.
  • Learned how to combine these models with programming language concepts.
  • Learned how to build a statistical programming tool using the concepts in the tutorial.

Tutorial Outline:

Theory (Part I)

  • Machine Learning Models: graphical models (e.g., Markov networks, conditional random fields) and language models (e.g., n-gram models).
  • Prediction/Inference: MAP inference, max-marginals, belief propagation, optimal and approximate algorithms.
  • Training/Learning: discriminative and generative training, structured SVM learning, dealing with partition functions, asymptotic complexity.
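
As a rough illustration of the simplest model family listed above, the sketch below trains a bigram (2-gram) language model over code tokens and uses it to suggest the most likely next token. It is not taken from the tutorial materials or from Nice2Predict: the function names, the toy corpus, and the "<s>" / "</s>" boundary markers are all assumptions made for this example, and it uses plain maximum-likelihood counts with no smoothing.

```python
from collections import Counter, defaultdict

def train_bigram(token_sequences):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for tokens in token_sequences:
        # Hypothetical start/end markers so the first and last tokens have context.
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return counts

def probability(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 for an unseen context."""
    followers = counts.get(prev)
    if not followers:
        return 0.0
    return followers[cur] / sum(followers.values())

def predict_next(counts, prev):
    """Most frequent token seen after prev, or None for an unseen context."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus of already-tokenized code fragments (invented for illustration).
corpus = [
    ["for", "(", "i", "=", "0", ";"],
    ["for", "(", "j", "=", "0", ";"],
    ["if", "(", "i", "<", "n", ")"],
]
model = train_bigram(corpus)
suggestion = predict_next(model, "for")
p_i_after_paren = probability(model, "(", "i")
```

In this toy corpus "(" is followed by "i" twice and by "j" once, so the estimate for "i" after "(" is 2/3. A practical tool would add smoothing for unseen contexts and would usually condition on more than one preceding token.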

Practice (Part II)

  • Learning from Programs with Graphical Models
  • Building a statistical programming tool using the Nice2Predict framework.