Programming Tools with Big Data and Conditional Random Fields
A new research paper showing how to build programming tools based on probabilistic models learned from massive codebases is about to appear at the ACM Principles of Programming Languages Conference, 2015 (ACM POPL’15). The paper presents a machine learning framework for predicting facts about programs based on probabilistic graphical models. The full PDF of the paper can be found here:
Many software developers spend tremendous amounts of time making their code readable, extensible and maintainable by others. Achieving this at the usual fast pace of software development is difficult, and getting there is often considered an art form. Yet it can be the difference between a product succeeding and a product collapsing under its own bloat.
How does it work?
The general approach (as well as JSNice) is based on state-of-the-art machine learning:
- Conditional Random Fields (CRFs) as a general framework for learning from code. CRFs are graphical models which are tremendously popular in image processing and natural language processing. This work pioneers CRFs in the domain of programs.
- Fast prediction algorithms that take into account the existing names and types in order to predict new names and types. This prediction task is known as MAP (maximum a posteriori) inference.
- Maximum-margin training based on state-of-the-art efficient learning techniques from Support Vector Machines.
- An efficient, scalable and parallel implementation that learns from massive amounts of code quickly.
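To make the ingredients above concrete, here is a minimal sketch of how they could fit together on a toy problem. It is not the JSNice implementation: the program graph, the relation names, the candidate lists and all function names are hypothetical. It also uses exhaustive search in place of a fast MAP inference algorithm, and a plain unregularized subgradient step on the structured hinge loss in place of the training procedure described in the paper.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy program graph. Unknown nodes (v1, v2) are variables whose
# names we want to predict; every other node is a name already in the code.
EDGES = [
    ("v1", "length", "reads_property"),  # v1.length
    ("v2", "v1", "indexes_into"),        # v1[v2]
    ("v2", "0", "initialized_to"),       # v2 = 0
]
UNKNOWN = ["v1", "v2"]
CANDIDATES = {"v1": ["arr", "str", "i"], "v2": ["i", "len", "arr"]}
GOLD = {"v1": "arr", "v2": "i"}  # assumed "correct" names for training


def features(assignment):
    """Count the (label_a, label_b, relation) triples an assignment induces."""
    counts = defaultdict(int)
    for a, b, rel in EDGES:
        # Known nodes are not in the assignment and keep their own name.
        counts[(assignment.get(a, a), assignment.get(b, b), rel)] += 1
    return counts


def score(weights, assignment):
    """CRF-style score: weighted sum of the pairwise features."""
    return sum(weights.get(f, 0.0) * c for f, c in features(assignment).items())


def hamming(assignment, gold):
    """Number of unknown nodes labeled differently from the gold assignment."""
    return sum(assignment[v] != gold[v] for v in UNKNOWN)


def map_inference(weights, gold=None):
    """Exhaustive MAP inference; loss-augmented when a gold labeling is given.

    Exhaustive search only works for tiny graphs; a real system needs a
    faster (typically approximate) search.
    """
    best, best_score = None, float("-inf")
    for labels in product(*(CANDIDATES[v] for v in UNKNOWN)):
        assignment = dict(zip(UNKNOWN, labels))
        s = score(weights, assignment)
        if gold is not None:
            s += hamming(assignment, gold)
        if s > best_score:
            best, best_score = assignment, s
    return best


def max_margin_step(weights, gold, lr=0.1):
    """One subgradient step on the structured hinge loss (no regularizer)."""
    pred = map_inference(weights, gold=gold)
    if pred != gold:
        for f, c in features(gold).items():
            weights[f] = weights.get(f, 0.0) + lr * c
        for f, c in features(pred).items():
            weights[f] = weights.get(f, 0.0) - lr * c
    return weights


def train(gold, steps=50):
    """Repeat the max-margin step on a single training example."""
    weights = {}
    for _ in range(steps):
        weights = max_margin_step(weights, gold)
    return weights


if __name__ == "__main__":
    learned = train(GOLD)
    print(map_inference(learned))
```

In this sketch the weights are learned from a single example, so the result only demonstrates the mechanics: training pushes the score of the gold names above every competing assignment by a margin that grows with the number of wrong names, and prediction then picks the highest-scoring assignment.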
More Information
More information on this line of work, including talks, papers and slides, can be found at: http://www.srl.inf.ethz.ch/spas.php
Feedback / Comments / Questions?
We are working on extending JSNice with other capabilities that make the life of a developer much easier, and we look forward to your feedback and suggestions. For any comments or questions, contact Veselin Raychev and Martin Vechev.