General discussions: Suggestion for problem

Goodwin Lu 2015-12-10 01:39:15

I was on CodeEval when I suddenly came upon... an UNSOLVABLE problem! No one on that website has been able to solve it! https://github.com/codeeval/Code-Plagiarism-Challenge/blob/master/Code-Plagiarism-Input.txt I thought it would be interesting if this site included it too; it would be cool to see whether any top player here can solve the "impossible", or so it seems. (Perhaps, instead of importing files, it could just test plagiarism between two pieces of text?) Also, since it seems so difficult to solve, I propose a reward of 20 blessing. Fair enough?

Rodion (admin) 2015-12-10 15:46:49

Hi! I'm sorry for not replying sooner (I did receive your mail).

There is a catch with that problem. I do not know whether it is really "unsolvable": they say they simply do not grant regular points for submitting a solution, but will reward it in some other way.

Well, there are several difficulties with the problem. But first I want to say that I myself am very interested in a solution, since we already have plagiarism here at CodeAbbey :(

The difficulties are these:

the problem is not well defined: it is not possible to draw a clear line between what is plagiarism and what definitely is not; there is only a vague "seems like plagiarism";
it is also simply a tough text-processing problem;
and on top of all this, there are many different languages, so a solution would have to handle many specific cases.

From all of this it follows that:

it would be hard to create clear test cases (as you know, we use randomized test cases so that people cannot solve problems manually);
I do not expect many people would be interested in trying.

However, I'm still thinking about whether we could somehow simplify the conditions to turn this into a better (though more abstract) problem.

Thank you for your suggestions anyway! :)

Goodwin Lu 2015-12-11 12:17:01

Well, as far as we're concerned, the plagiarism problem has been solved by NOBODY. The website says 100% of people failed it. If it were as simple as manually going through and seeing how much of one program is copied in the other, then surely at least one of the thousand users would have managed it. There are 18 test cases, and while the problem already gives you the first three answers (100, 95, 0), the others are incredibly difficult because they have different variable names and different program names but do the same thing. I think it would be good to ask what CodeEval considers "plagiarism". I would define it as how much of a program is the same. For example, I believe a good start would be to calculate how many bytes of characters are similar:

a = 1
b = 2
c = 1 + 2
print(c)

d = 1
e = 2
f = 1 + 2
print(f)

As you can see, the second program only has its variable names changed, like one of the test cases (although far simpler than the actual programs, obviously). I believe in this case you'd have to count how many bytes (d, e, f) were changed relative to the total number of bytes in the file, and then output that as a percentage. So basically, I believe the problem is how similar two text files are, except that the two text files are programs. That's how I interpret the problem.
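Goodwin's byte-counting idea, and the way renamed variables defeat it, can be sketched in Python. This is a minimal sketch under assumptions that do not come from CodeEval: `difflib.SequenceMatcher.ratio()` stands in for "how many characters are similar", and the second measure assumes Python source so that the standard `tokenize` module can replace every identifier with a placeholder numbered by first appearance. The function names `raw_similarity`, `normalized_tokens` and `normalized_similarity` are hypothetical, and neither score claims to be CodeEval's actual grading rule.

```python
import difflib
import io
import keyword
import tokenize


def raw_similarity(a: str, b: str) -> float:
    """Character-level similarity in percent, via difflib's matching-block ratio."""
    return 100 * difflib.SequenceMatcher(None, a, b).ratio()


# Token types ignored in the comparison (layout and comments, not logic).
SKIPPED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, replacing each non-keyword name with a
    placeholder (ID0, ID1, ...) numbered by order of first appearance."""
    names: dict[str, str] = {}
    tokens: list[str] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        # Builtins such as print are plain names too, so they also get
        # placeholders; that is harmless as long as both files use them alike.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            if tok.string not in names:
                names[tok.string] = f"ID{len(names)}"
            tokens.append(names[tok.string])
        else:
            tokens.append(tok.string)
    return tokens


def normalized_similarity(a: str, b: str) -> float:
    """Token-level similarity in percent after identifier renaming."""
    return 100 * difflib.SequenceMatcher(
        None, normalized_tokens(a), normalized_tokens(b)
    ).ratio()


prog1 = "a = 1\nb = 2\nc = 1 + 2\nprint(c)\n"
prog2 = "d = 1\ne = 2\nf = 1 + 2\nprint(f)\n"

print("raw:", round(raw_similarity(prog1, prog2), 1))                # raw: 87.1
print("normalized:", round(normalized_similarity(prog1, prog2), 1))  # normalized: 100.0
```

On the two example programs, the raw character score lands well below 100 even though only the names changed, while the normalized score is 100. Renaming identifiers to placeholders is a common first step in token-based similarity checkers, but it only handles renaming; reordered statements, other languages, or rewritten logic would need considerably more than this.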