Data privacy with MORPH

When I first started my research into data privacy for software engineering, I focused on the literature on privacy-preserving data publishing and social data privacy. Three main issues stood out for me:

  1. privacy could not be 100% guaranteed,

  2. privatizing data could damage the usefulness (utility) of the data,

  3. and it was an open question whether the privacy of one matters when the utility of the data provides benefits for all.

On the first point, I've accepted that privatized data will not provide perfect privacy. On the second point, if data can be privatized with minimal or no loss of utility, then the third point, sacrificing one for the benefit of the whole, becomes much less of an issue.

Research into solving point 2 was done in conjunction with my advisor Dr. Tim Menzies.

Our first effort was MORPH, an instance mutator for privatizing numerical data. The intuition behind MORPH is to change the data enough to avoid information disclosure but not enough to degrade utility. For this research, utility meant how well the data supported defect prediction.

Our conjecture was that if an instance or data point was labeled as defective, its MORPHed version should not be,

  1. the same as an instance labeled as non-defective,

  2. and closer to any non-defective instances than defective ones - in other words, avoid crossing class boundaries.

We were able to accomplish this with the following equation:

Here x ∈ D is the original data point to be changed (D being the data set), y is the resulting MORPHed data point, and z ∈ D is the nearest unlike neighbor of x, that is, x's nearest neighbor (by Euclidean distance) with a different class label than x. Finally, the random number r is calculated with the property:

The result of our work showed MORPH offering 4 times more privacy than the non-privatized data and comparable utility.
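For readers who want to experiment, the mutation step described above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: it assumes the update has the form y = x ± (x − z) · r, that one r and one sign are drawn per instance, and that r is uniform over a small interval. The default bounds 0.15 and 0.35 and the names `nearest_unlike_neighbor` and `morph` are assumptions made for illustration.

```python
import math
import random


def nearest_unlike_neighbor(i, rows, labels):
    """Return the row closest to rows[i] (Euclidean) whose label differs.

    Raises ValueError if every row shares the same label.
    """
    unlike = [row for row, label in zip(rows, labels) if label != labels[i]]
    return min(unlike, key=lambda row: math.dist(rows[i], row))


def morph(rows, labels, lo=0.15, hi=0.35, rng=None):
    """Mutate every numeric row; class labels are left untouched.

    Assumed update: y = x + s * (x - z) * r, where z is the nearest unlike
    neighbor of x, s is +1 or -1 chosen at random, and r is drawn uniformly
    from [lo, hi] (assumed bounds).
    """
    rng = rng or random.Random()
    morphed = []
    for i, x in enumerate(rows):
        z = nearest_unlike_neighbor(i, rows, labels)
        r = rng.uniform(lo, hi)
        sign = rng.choice((-1.0, 1.0))
        morphed.append([xj + sign * (xj - zj) * r for xj, zj in zip(x, z)])
    return morphed


# Toy data: two "defective" rows and two "non-defective" rows.
rows = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
labels = [True, True, False, False]
private_rows = morph(rows, labels, rng=random.Random(1))
```

Under these assumptions, each row moves by at most 35% of the distance to its nearest unlike neighbor, so the MORPHed row stays closer to its original position than to any original row of the other class.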

This work was published at ICSE 2012 under the title "Privacy and Utility for Defect Prediction: Experiments with MORPH".

In later work, we paired MORPH with CLIFF, an instance pruner, and got even better results (TSE).