I am new to data science and I am trying to write an implementation of DBSCAN. I can label each point as core, border or noise, but after this step I am stuck. How can I separate the density-reachable core points and build clusters from them?
A distance <= epsilon defines a neighbour relationship. This relationship turns the dataset into a graph-like structure. Then you have to apply some kind of breadth-first search algorithm to this graph. To exclude already processed points, apply a "visitor" pattern of some kind. Repeat this several times. Add file/input reading, conversions and other algorithms to your taste, and you will get it done!
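The approach above can be sketched roughly as follows. This is a minimal illustration, not the poster's solution: it assumes 2-D points given as (x, y) tuples, Euclidean distance, and the convention that a point counts itself when checking `min_pts`. The names `region_query` and `dbscan` are hypothetical. The `visited` list plays the role of the "visitor" bookkeeping, and the outer loop restarts BFS from every unvisited core point, so each restart produces one cluster.

```python
import math
from collections import deque


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) per point, or -1 for noise."""
    n = len(points)
    # Neighbour lists: the edges of the graph implied by "distance <= eps".
    neighbours = [region_query(points, i, eps) for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbours]

    labels = [-1] * n        # -1 means noise / not assigned to any cluster
    visited = [False] * n    # marks points already reached by some BFS
    cluster_id = 0

    for start in range(n):
        # Only an unvisited core point can start a new cluster.
        if visited[start] or not is_core[start]:
            continue
        queue = deque([start])
        visited[start] = True
        while queue:
            p = queue.popleft()
            labels[p] = cluster_id
            if not is_core[p]:
                continue  # border point: joins the cluster but does not expand it
            for q in neighbours[p]:
                if not visited[q]:
                    visited[q] = True
                    queue.append(q)
        cluster_id += 1

    return labels


print(dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], 1.5, 3))
# prints [0, 0, 0, 1, 1, 1, -1]
```

Computing all neighbour lists up front is O(n²), which is fine for small inputs; a border point reachable from two clusters simply keeps the first cluster that reaches it.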
Thanks for your answer, however it seems that this algorithm structure is beyond my knowledge. Is there a simpler way to do it?
Well, don't be afraid of all these buzzwords.
To encourage you, I can tell you that my solution for task #205 consists of 91 lines of Python code:
63 lines are spent on reading, computing stats, scaling and marking stars. You have already done all of these things.
25 lines go to the clustering part (with all these buzzwords :) ).
And 3 lines prepare and print the output.
Because the clustering code is so short, I cannot tell you more about its construction; otherwise it would become direct assistance, which would be unfair to others.
don't be afraid
Yep, these are words of wisdom. Usually things are not as difficult as they first sound.
We also have a problem on Breadth-First Search which may help...
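For reference, plain breadth-first search on its own is short. The sketch below is a generic illustration, not the input format of the site's BFS problem: it assumes the graph is a dict mapping each vertex to a list of neighbours, and the function name `bfs_distances` is hypothetical. It returns the number of edges from `start` to every reachable vertex; the `dist` dict doubles as the "already visited" marker.

```python
from collections import deque


def bfs_distances(graph, start):
    """Return {vertex: edge count from start} for every vertex reachable from start."""
    dist = {start: 0}          # also serves as the visited set
    queue = deque([start])     # FIFO queue gives the breadth-first order
    while queue:
        v = queue.popleft()
        for w in graph.get(v, []):
            if w not in dist:  # first time we reach w is via a shortest path
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c", "e"], "e": ["d"]}
print(bfs_distances(graph, "a"))
# prints {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 3}
```

DBSCAN cluster expansion uses the same loop, with "distance <= epsilon" supplying the neighbour lists and only core points allowed to push their neighbours onto the queue.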
Is there a simpler way to do it?
That's the general situation with almost every method in data science, and elsewhere: they are not very complicated, but they rely on some "well-known" algorithms. So we usually need to know these "well-known" algorithms or their analogues... Still, don't hesitate. You have already made good progress, so I believe there are no real problems to stop you. Just choosing the proper order may be important...
I don't know many algorithms in general; that's the real problem indeed.
I'll try to solve the Breadth-First Search problem first and then come back to this one.