How to detect plagiarism in Java source code?

Plagiarism is the act of presenting someone else's words or ideas as your own creation, without permission and without giving due credit to the owner. Many people associate plagiarism with written work and images, but it is a broad problem that also occurs in computer programming.

Plagiarism does not just involve copying the source code. Someone who copies comments, interface designs, or program input data is also plagiarizing. The practice is not new: it has been going on for many years, ever since people first began expressing concerns about plagiarism in source code.

Several tools enable users to detect plagiarism in Java source code. Structure-oriented systems measure the similarity between two programs based on how alike the structure of their source code is. This approach compares source code in two phases. The first phase converts the programs into token streams, and the second phase compares these token streams using string matching algorithms.
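The first phase can be illustrated with a minimal sketch of a tokenizer. It is not the tokenizer of any real tool: the class name `SimpleTokenizer`, the small keyword set, and the token names `ID`, `NUM`, and `STR` are assumptions made for illustration. The idea it shows is that identifiers and literals collapse into generic tokens, so renaming variables or changing constants does not change the token stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tokenizer for illustration only. */
class SimpleTokenizer {
    // Deliberately incomplete keyword list (assumption for this sketch).
    private static final Set<String> KEYWORDS = Set.of(
            "class", "int", "for", "while", "if", "else",
            "return", "void", "static", "public", "new");

    // Alternatives are tried in order: comments, string literals,
    // numbers, words, then any other single non-blank character.
    private static final Pattern LEXEME = Pattern.compile(
            "(?<comment>//[^\\n]*|/\\*[\\s\\S]*?\\*/)"
            + "|(?<str>\"(?:\\\\.|[^\"\\\\])*\")"
            + "|(?<num>\\d+(?:\\.\\d+)?)"
            + "|(?<word>[A-Za-z_$][A-Za-z0-9_$]*)"
            + "|(?<sym>\\S)");

    /** Turns source text into a normalized token stream. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LEXEME.matcher(source);
        while (m.find()) {
            if (m.group("comment") != null) {
                continue; // comments are dropped entirely
            } else if (m.group("str") != null) {
                tokens.add("STR");
            } else if (m.group("num") != null) {
                tokens.add("NUM");
            } else if (m.group("word") != null) {
                String word = m.group("word");
                // Keywords keep their text; every other name becomes ID.
                tokens.add(KEYWORDS.contains(word) ? word : "ID");
            } else {
                tokens.add(m.group("sym"));
            }
        }
        return tokens;
    }
}
```

With this sketch, `int total = a + 1; // sum` and `int x = y + 42;` produce the same token stream, which is what lets the second phase see through simple renaming.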

SIM

SIM detects plagiarism in code written in Java, Miranda, Pascal, Lisp, and Modula-2. It also checks the similarity between plain text files. SIM converts the source code into strings of tokens and then uses dynamic programming string alignment to compare the strings. The same alignment technique is also used in DNA string matching.
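As a rough illustration of dynamic programming string alignment, here is a minimal sketch of a local alignment score over two token sequences. It is not SIM's actual implementation: the class name `TokenAlignment` and the scoring values for a match, a mismatch, and a gap are assumptions chosen only to make the example concrete.

```java
import java.util.List;

/** Hypothetical local alignment scorer for illustration only. */
class TokenAlignment {
    // Scoring values are assumptions, not taken from any real tool.
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -2;

    /** Returns the best local alignment score between two token sequences. */
    static int bestLocalScore(List<String> a, List<String> b) {
        // score[i][j] is the best score of an alignment ending at a[i-1], b[j-1].
        int[][] score = new int[a.size() + 1][b.size() + 1];
        int best = 0;
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                boolean same = a.get(i - 1).equals(b.get(j - 1));
                int diagonal = score[i - 1][j - 1] + (same ? MATCH : MISMATCH);
                int skipInA = score[i - 1][j] + GAP;
                int skipInB = score[i][j - 1] + GAP;
                // A score never drops below zero, so an alignment can restart anywhere.
                score[i][j] = Math.max(0, Math.max(diagonal, Math.max(skipInA, skipInB)));
                best = Math.max(best, score[i][j]);
            }
        }
        return best;
    }
}
```

A high score relative to the length of the shorter sequence suggests that a long run of tokens appears in both programs, possibly with small insertions or deletions in between.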

MOSS

MOSS supports Java, plain text, C, C++, and Pascal, and it runs on both Windows and UNIX operating systems. It detects plagiarism by first converting source code into tokens. It then uses the robust winnowing algorithm to take document fingerprints by choosing a subset of the token hashes. From these, MOSS builds an inverted index that maps each fingerprint to the documents that contain it and to its positions in each document. It then uses each program file as a query against the index, which returns the documents in the collection that have fingerprints in common with the query. The result is the number of matching fingerprints for each pair of documents in the set of files. MOSS sorts these results and shows the highest-scoring matches to the user.
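The fingerprinting step can be sketched as follows. This is a simplified illustration of winnowing, not MOSS's code: the class name `Winnowing`, the parameters `k` (tokens per hashed group) and `w` (window size), and the use of `String.hashCode` as the hash function are all assumptions. The sketch hashes every group of `k` consecutive tokens, slides a window over `w` consecutive hashes, and keeps the smallest hash in each window.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical winnowing sketch for illustration only. */
class Winnowing {
    /** Returns the selected fingerprints as a map from position to hash. */
    static Map<Integer, Integer> fingerprints(List<String> tokens, int k, int w) {
        // Hash every run of k consecutive tokens (hash choice is an assumption).
        List<Integer> hashes = new ArrayList<>();
        for (int i = 0; i + k <= tokens.size(); i++) {
            hashes.add(String.join(" ", tokens.subList(i, i + k)).hashCode());
        }
        // In each window of w hashes keep the minimum, preferring the rightmost on ties.
        Map<Integer, Integer> selected = new LinkedHashMap<>();
        for (int start = 0; start + w <= hashes.size(); start++) {
            int minPos = start;
            for (int j = start + 1; j < start + w; j++) {
                if (hashes.get(j) <= hashes.get(minPos)) {
                    minPos = j;
                }
            }
            // The same position is often the minimum of several windows; the map stores it once.
            selected.put(minPos, hashes.get(minPos));
        }
        return selected;
    }

    /** Counts the distinct fingerprint hashes that two token streams share. */
    static int sharedFingerprints(List<String> a, List<String> b, int k, int w) {
        Set<Integer> common = new HashSet<>(fingerprints(a, k, w).values());
        common.retainAll(new HashSet<>(fingerprints(b, k, w).values()));
        return common.size();
    }
}
```

Because every window contributes its minimum, two documents that share a run of at least `w + k - 1` tokens are guaranteed to share at least one fingerprint. Inputs with fewer than `w` hashed groups yield no fingerprints in this sketch. The positions kept in the map are what an inverted index would store next to each hash.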

JPlag

JPlag checks for plagiarism in source code written in Java, and it also works with Scheme, C, and C++. It parses all the source code and transforms it into token strings, which it then compares using the RKR-GST (Running Karp-Rabin Greedy String Tiling) algorithm. JPlag writes the comparison results to HTML files that users can open in any browser. The main results page lists the pairs of programs that may involve plagiarism, and users can view the results for each pair separately. The HTML result files use different fonts to convey different things: pairs that share similar code appear in a different font from the other pairs, which simplifies analysis of the results.
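The greedy string tiling idea behind this comparison step can be sketched in a few lines. The version below is a plain, unoptimized illustration and not JPlag's code: it leaves out the Karp-Rabin hashing that makes the real algorithm fast, and the class name `GreedyStringTiling`, the `minMatch` parameter, and the similarity formula are assumptions for the example. It repeatedly finds the longest common runs of unmarked tokens, marks them as tiles, and stops once no run of at least `minMatch` tokens remains.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy string tiling sketch for illustration only. */
class GreedyStringTiling {
    /** Returns how many tokens of a are covered by tiles shared with b. */
    static int coveredTokens(List<String> a, List<String> b, int minMatch) {
        if (minMatch < 1) {
            throw new IllegalArgumentException("minMatch must be at least 1");
        }
        boolean[] markedA = new boolean[a.size()];
        boolean[] markedB = new boolean[b.size()];
        int covered = 0;
        int maxMatch;
        do {
            maxMatch = minMatch;
            List<int[]> matches = new ArrayList<>(); // each entry: {startA, startB, length}
            for (int i = 0; i < a.size(); i++) {
                for (int j = 0; j < b.size(); j++) {
                    int len = 0;
                    while (i + len < a.size() && j + len < b.size()
                            && !markedA[i + len] && !markedB[j + len]
                            && a.get(i + len).equals(b.get(j + len))) {
                        len++;
                    }
                    if (len > maxMatch) { // a longer run replaces all shorter candidates
                        matches.clear();
                        maxMatch = len;
                    }
                    if (len == maxMatch) {
                        matches.add(new int[] {i, j, len});
                    }
                }
            }
            for (int[] m : matches) {
                // Skip a candidate that overlaps a tile marked earlier in this round.
                if (isFree(markedA, m[0], m[2]) && isFree(markedB, m[1], m[2])) {
                    for (int t = 0; t < m[2]; t++) {
                        markedA[m[0] + t] = true;
                        markedB[m[1] + t] = true;
                    }
                    covered += m[2];
                }
            }
        } while (maxMatch > minMatch);
        return covered;
    }

    private static boolean isFree(boolean[] marked, int start, int length) {
        for (int t = 0; t < length; t++) {
            if (marked[start + t]) {
                return false;
            }
        }
        return true;
    }

    /** Similarity in [0, 1]; this formula is an assumption for the sketch. */
    static double similarity(List<String> a, List<String> b, int minMatch) {
        int total = a.size() + b.size();
        return total == 0 ? 0.0 : 2.0 * coveredTokens(a, b, minMatch) / total;
    }
}
```

Because tiles may be matched in any order, this approach still finds the copied parts when blocks of code have been moved around, for example when two methods are swapped.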

The approaches above are designed to detect plagiarism in source code written in Java as well as in other programming languages. Each one focuses on specific characteristics of the code in order to check for plagiarism across programs written in different languages.