Much like artists of any kind, be they authors or painters, programmers have their own distinctive coding styles. Apparently, the more skilled you are, the easier you are to identify, as your code becomes more distinctive.
Rachel Greenstadt, associate professor of computer science at Drexel University, in partnership with Aylin Caliskan, assistant professor at George Washington University, has figured out how machine learning can be used to identify the author of a piece of code.
Their work could eventually be used to settle plagiarism disputes, and to help developers who open-source their code.
The cool thing about the algorithm they created is that it does not require large amounts of code to give accurate results: a few snippets are enough. Around 600 programmers submitted code for testing, with 8 samples each, and the system correctly identified 83% of them.
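To get a feel for how style alone can give an author away, here is a toy sketch in Python. It is not the researchers' algorithm, whose internals are not described here. It simply assumes that counts of three-character sequences (trigrams) capture some spacing, brace and naming habits, and it picks the known author whose profile is closest by cosine similarity. The authors "alice" and "bob", their snippets, and every function name are made up for illustration.

```python
from collections import Counter
import math


def trigram_profile(code: str) -> Counter:
    """Count overlapping character trigrams, a crude stand-in for style features."""
    return Counter(code[i:i + 3] for i in range(len(code) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(samples: dict[str, list[str]]) -> dict[str, Counter]:
    """Merge each author's known snippets into a single trigram profile."""
    profiles = {}
    for author, snippets in samples.items():
        profile = Counter()
        for snippet in snippets:
            profile.update(trigram_profile(snippet))
        profiles[author] = profile
    return profiles


def guess_author(snippet: str, profiles: dict[str, Counter]) -> str:
    """Return the author whose profile is most similar to the snippet."""
    target = trigram_profile(snippet)
    return max(profiles, key=lambda author: cosine(target, profiles[author]))


# Hypothetical toy data: two invented authors with visibly different habits
# (spaces and snake_case versus tabs, tight operators and camelCase).
KNOWN_SAMPLES = {
    "alice": [
        "for (int i = 0; i < n; i++) {\n    total_sum += values[i];\n}\n",
        "if (total_sum > max_value) {\n    max_value = total_sum;\n}\n",
    ],
    "bob": [
        "for(int i=0;i<n;++i)\n{\n\ttotalSum+=values[i];\n}\n",
        "if(totalSum>maxValue)\n{\n\tmaxValue=totalSum;\n}\n",
    ],
}

PROFILES = build_profiles(KNOWN_SAMPLES)

# An unseen snippet written with bob's habits but different variable names.
UNKNOWN = "while(idx<n)\n{\n\trunningTotal+=data[idx];\n\t++idx;\n}\n"
print(guess_author(UNKNOWN, PROFILES))  # prints: bob
```

Even with only two short samples per author, the unseen snippet lands on the right name, because brace placement, tabs and tight operators leave a trail of shared trigrams. A real system would need far richer features and a proper classifier to separate hundreds of programmers.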
Obviously, this system has pros and cons. While it can be useful for resolving plagiarism disputes, determining whether a developer broke an employment contract clause, or finding out who created a piece of malware, it can just as easily be used to trace programmers who wish to remain anonymous. That would be the case, for example, for programmers living under dictatorships who build tools that circumvent censorship.
However the system ends up being implemented, let’s hope someone comes up with regulations on how and when it can be used and, most importantly, whose hands it ends up in.