My journey with machine learning: 10 years and counting
I had my first experience with machine learning 10 years ago when I was part of a team of development managers and test architects. This team was working on a prediction model for estimating testing effort for various types of software defects.
Like any typical systems integrator team, we provided testing services for both planned development releases and unplanned maintenance releases. Our struggle was with the maintenance releases. Customers often requested support for maintenance releases toward the end of the business day, with a window of one or two days for planning and execution. That left us with limited time to analyze the defects (generally more than 50 per release) and estimate the corresponding testing effort. Pulling experts away from other critical work for estimation was not always feasible. We had to bring method to the madness.
We discussed many techniques before shortlisting one that involved statistical analysis of the defect data. We asked ourselves this: Was there a pattern hidden in the defect data that we had to retrieve and use? In retrospect, we were working on very primitive machine learning logic and our algorithm was rudimentary. But at the time, all we knew was that we were trying to make sense out of dumb data.
We took five years’ worth of defect data and ran a few statistical methods on that data by using a MATLAB tool. The tool produced a mathematical formula that allowed us to determine testing effort for a given number of defects and their impact on applications:
Effort = f(n) + f(I) + f(A)
where n = number of defects, I = impact of defect (high, medium, or low), and A = application type.
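A minimal sketch of how such an additive model might be applied in practice. The per-defect, per-impact, and per-application weights below are illustrative assumptions; the source does not give the actual coefficients the MATLAB analysis produced.

```python
# Toy effort estimator of the form Effort = f(n) + f(I) + f(A).
# All weights are illustrative assumptions, not the coefficients
# derived from the original MATLAB analysis.

# f(n): base effort scaled by defect count (hours per defect).
def f_count(n):
    return 1.5 * n

# f(I): additional hours by impact level (high, medium, or low).
IMPACT_HOURS = {"high": 8.0, "medium": 4.0, "low": 1.0}

def f_impact(impact):
    return IMPACT_HOURS[impact]

# f(A): fixed overhead by application type (hypothetical categories).
APP_HOURS = {"web": 6.0, "batch": 3.0, "mobile": 5.0}

def f_app(app_type):
    return APP_HOURS[app_type]

def estimate_effort(n, impact, app_type):
    """Estimated regression-testing effort in person-hours."""
    return f_count(n) + f_impact(impact) + f_app(app_type)

# 50 high-impact defects in a web application:
print(estimate_effort(50, "high", "web"))  # 1.5*50 + 8 + 6 = 89.0
```

The appeal of an additive form like this is that each term can be calibrated independently against historical data.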
We then applied the formula to several past releases to validate it and found that it had an accuracy rate of almost 90% when compared to the actual testing effort spent in those releases. This was a significant achievement, considering the herculean task of combing through years of defect data and running a statistical method on it.
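The validation step can be sketched as comparing predicted against actual effort across past releases. The release figures and the accuracy metric (one minus relative error) are assumptions for illustration only.

```python
# Validate a predicted-effort model against actual effort recorded
# for past releases. The numbers below are fabricated purely for
# illustration; they are not the team's real data.
releases = [
    {"predicted": 92.0, "actual": 100.0},
    {"predicted": 48.0, "actual": 50.0},
    {"predicted": 130.0, "actual": 120.0},
]

def accuracy(predicted, actual):
    """Accuracy as 1 minus relative error, as a percentage."""
    return (1 - abs(predicted - actual) / actual) * 100

mean_acc = sum(accuracy(r["predicted"], r["actual"]) for r in releases) / len(releases)
print(f"Mean accuracy: {mean_acc:.1f}%")  # Mean accuracy: 93.2%
```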
This formula subsequently became our holy grail for estimating regression testing effort. We called this tool the Regression Cost and Effort Calculator.
The next time a customer gave us a list of defects, we only had to analyze them; the tool handled the rest, with no manual estimation effort needed. I later published a white paper on this approach at an international testing conference.
In the ensuing 10 years, machine learning has evolved significantly. However, its core principle remains unchanged: use statistical theory to build mathematical models that draw inferences from a sample. You can now create powerful algorithms that learn from data patterns and from their environment. These algorithms are also known as intelligent agents because of their learning behavior.
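That core principle, fit a model to a sample, then use it for inference, can be shown in a few lines. Here is a hand-rolled ordinary least squares fit on hypothetical defect-count and effort data (the numbers are assumptions for illustration):

```python
# Core ML idea in miniature: fit a statistical model to a sample,
# then infer a value for unseen input. Simple linear regression
# via ordinary least squares, computed by hand.
# Sample: defect counts (xs) and observed testing effort in hours (ys);
# the data points are illustrative assumptions.
xs = [10, 20, 30, 40, 50]
ys = [22, 41, 63, 79, 102]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope = covariance(x, y) / variance(x); intercept from the means.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# Inference: predict effort for an unseen release with 35 defects.
predicted = intercept + slope * 35
print(round(slope, 2), round(intercept, 2), round(predicted, 1))  # 1.98 2.0 71.3
```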
Today, Mindtree is doing some very exciting, innovative work in the machine learning and artificial intelligence (AI) space, especially in terms of software testing. As mentioned in my previous article, How is AI helping software testing?, several aspects of testing can be significantly improved by using AI.
Please talk to me if you want to know more about our AI initiatives in testing.