Thanks to my colleague Zhanna Khaymedinova for the idea of this exercise!
Nowadays there is a huge amount of information on the internet. No wonder that such information, even when it is meant for humans, is often collected and processed by robots.
In this task you are to write a small program which collects data from a social network. Start from here:
John Doe at Fake Social Network
You can see that each page represents a person with a different name, date of birth and net worth. Each page also provides links to a few other people somehow related to the given one, so that from John Doe you can navigate to Dan Wagner (via "Friends") and from there to Dave Johnson (via messages on the "Wall").
The goal is to sum up the net worth figures for all persons with a specific last name (e.g. Johnson) who are reachable (via any number of links) from John Doe.
View the source of the page (by pressing Ctrl-U, or by using the Inspect element feature in Google Chrome or the Firebug plugin for Firefox) to see how the elements of text can be distinguished (with regexps or some other method).
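As a small illustration of the regexp route, here is a minimal sketch that pulls link targets out of a page source. It assumes the links appear as anchors of the form `<a href="./monika-s-smith.html">`, which is the form quoted in this task; the names `LINK_RE`, `extract_links` and `sample` are made up for the example, and the sample markup is not copied from the real site.

```python
import re

# Assumed link shape: <a href="./some-person.html">
LINK_RE = re.compile(r'<a\s+href="\./([^"]+\.html)"')

def extract_links(html):
    """Return the page file names linked from the given HTML source."""
    return LINK_RE.findall(html)

# Invented sample markup, only to show the call.
sample = '<p><a href="./monika-s-smith.html">Monika S Smith</a></p>'
print(extract_links(sample))  # ['monika-s-smith.html']
```

The name and net worth could be captured the same way once you have checked in the page source how they are actually marked up.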
A typical approach is as follows:
- fetch the page by its url into a string (for example, here are hints for Python);
- find the links to other pages (they look like <a href="./monika-s-smith.html"> - easy to fetch with regexp!);

There are also some things to note:
- if you have visited 1500 persons and the process still goes on, you are obviously in a loop of some kind, since there aren't that many pages;
- a small pause (200-300 milliseconds) after fetching each page may surprisingly reduce delays introduced by the site itself and speed up the process.

Input data will contain the last name (lowercased) which we are interested in.
Answer should contain the total net worth for all people with that last name who are reachable from the initial page.
Example:
input data:
doe
answer:
130000
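
The approach and the notes above could be put together roughly as follows. This is only a sketch under several assumptions, not the reference solution: `crawl_total`, `fetch` and `parse_person` are hypothetical names, the last name is assumed to be the final word of the person's name, and the in-memory `FAKE_SITE` with its `name=...;worth=...` markup is invented so that the example runs without network access. For the real site, `fetch` would download the page (for instance with `urllib.request.urlopen`) and `parse_person` would follow the real page source.

```python
import re
import time
from collections import deque

# Assumed link shape: <a href="./some-person.html">
LINK_RE = re.compile(r'<a\s+href="\./([^"]+\.html)"')

def crawl_total(start, last_name, fetch, parse_person, pause=0.25, limit=1500):
    """Sum net worth over all reachable pages whose person has last_name.

    fetch(page) returns the page source as a string;
    parse_person(html) returns (full_name, net_worth).
    Both are supplied by the caller (hypothetical helpers).
    """
    visited = {start}          # pages already queued, to avoid loops
    queue = deque([start])
    total = 0
    while queue:
        page = queue.popleft()
        html = fetch(page)
        name, worth = parse_person(html)
        # Assumption: the last name is the final word of the full name.
        if name.lower().split()[-1] == last_name:
            total += worth
        for link in LINK_RE.findall(html):
            if link not in visited:
                visited.add(link)
                queue.append(link)
        if len(visited) > limit:
            raise RuntimeError("too many pages, probably stuck in a loop")
        time.sleep(pause)      # small pause after each fetch
    return total

# Invented stand-in for the site; the markup is NOT the real one.
FAKE_SITE = {
    "john-doe.html": 'name=John Doe;worth=100 <a href="./dan-wagner.html">',
    "dan-wagner.html": 'name=Dan Wagner;worth=50 <a href="./jane-doe.html"> <a href="./john-doe.html">',
    "jane-doe.html": 'name=Jane Doe;worth=30 <a href="./john-doe.html">',
}

def parse_fake(html):
    """Parse the invented name=...;worth=... markup of FAKE_SITE."""
    m = re.search(r'name=([^;]+);worth=(\d+)', html)
    return m.group(1), int(m.group(2))

print(crawl_total("john-doe.html", "doe", FAKE_SITE.__getitem__, parse_fake, pause=0))  # 130
```

The `visited` set is what keeps the walk from cycling through links that point back to pages already seen, and the `limit` check mirrors the 1500-person hint as a safety net.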