User:Sdfjjelwjelk/Bitsnoop

Bitsnoop is a BitTorrent index launched on October 2, 2009. As of November 5, 2010, it had indexed 9,699,121 torrents from 9,881 trackers, 2,280 of which were listed as 'active'. Bitsnoop does not host its own tracker. Hundreds of torrents are added each hour.

Technology
Bitsnoop uses several exclusive technologies for the collection and verification of torrents. One is the TV Finder, a robot-organized collection of television shows. Another is TrackerMatch, which checks torrents for errors, typos, and junk tracker URLs, as well as for backup trackers, redirects, DNS aliases, and tracker clusters. The last one mentioned on the website is FakeSkan, in which bots "check if the torrent can be flagged 'Verified' automatically, based on how was it discovered, how is it tracked, named, what are torrent contents and some other variables. Then, for each and every torrent we run a bunch of checks (names, structure, file sizes, etc.) and adjust torrent rating (adding votes). Suspicious torrents will be lowered in search results and marked as such. Or even flagged as 'Fake'. You can see how many times robots voted at statistics page." -Bitsnoop

Bitsnoop also publishes details about how the site is run:

"Servers run Linux.

Indexer is Java / C++. Indexer is built quite similar to enterprise data processing systems — pluggable services/queues/message formats. We use only utility third-party libraries (Apache Commons, etc.), all infrastructure is coded from scratch. We do not use J2EE/MySQL/Sphinx/Lucene/whatever, we've invented our own bicycle.

Front end is PHP/nginx.

We get about 2 requests per second from botnets / script kiddies probing site for known vulnerabilities. We laugh at them: Ha! Ha!

Each word in your search is scanning around 24 millions of records in indexer's lexicon.

Every 30 seconds our robots: discover, download, parse, analyze encoding, normalize names, split infodata, upload to index, generate phonetic info, run SafeSearch checks, run FakeScan checks and make a new torrent available.

Also robots run around 200 jobs each day to keep systems healthy and update around 1,500 TrackerMatch records each second.

At any given moment there are at least 15 robots working on something useful. Or at least trying to look busy, you never know with robots." -Bitsnoop
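
The vote-based checking that the site describes for FakeSkan (run a set of checks on names, structure, and file sizes, adjust the rating by adding votes, then demote or flag suspicious torrents) can be illustrated with a small sketch. This is not Bitsnoop's code: the indexer is reported to be Java / C++, and every rule, weight, threshold, and name below (`Torrent`, `rate`, `rank`, `FAKE_THRESHOLD`) is a hypothetical stand-in chosen only to show the general idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Torrent:
    name: str
    files: Dict[str, int]  # file path -> size in bytes
    votes: int = 0         # running rating, adjusted by robot checks


# A check inspects a torrent and returns a vote:
# positive means "looks fine", negative means "looks suspicious".
Check = Callable[[Torrent], int]


def check_name(t: Torrent) -> int:
    # Hypothetical rule: bait words in the name count against the torrent.
    bait_words = ("password", "keygen")
    return -2 if any(w in t.name.lower() for w in bait_words) else 1


def check_structure(t: Torrent) -> int:
    # Hypothetical rule: a torrent that lists no files is malformed.
    return 1 if t.files else -3


def check_file_sizes(t: Torrent) -> int:
    # Hypothetical rule: a video torrent under 1 MB in total is implausible.
    video_exts = (".avi", ".mkv", ".mp4")
    has_video = any(path.lower().endswith(video_exts) for path in t.files)
    total_size = sum(t.files.values())
    return -2 if has_video and total_size < 1_000_000 else 1


CHECKS: List[Check] = [check_name, check_structure, check_file_sizes]
FAKE_THRESHOLD = -3  # assumed cut-off: at or below this, flag as fake


def rate(t: Torrent) -> str:
    """Run every check, add the votes to the rating, and return a label."""
    t.votes += sum(check(t) for check in CHECKS)
    if t.votes <= FAKE_THRESHOLD:
        return "fake"
    if t.votes <= 0:
        return "suspicious"
    return "ok"


def rank(torrents: List[Torrent]) -> List[Torrent]:
    """Order search results so that low-rated torrents sink to the bottom."""
    return sorted(torrents, key=lambda t: t.votes, reverse=True)
```

In this sketch a torrent that passes all three checks ends with a positive rating, while one that fails several checks is pushed down the result list or labelled as fake, mirroring the behaviour described in the quote.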

History
"Bitsnoop started as a proof of concept utility back in 2008. It was just something developed for the fun of it, running on Java and BerkeleyDB. Didn't even have a name.

It was a mere toy up to March 2009 when we've decided to start developing it on a larger scale and drew first architecture draft. This is how first live version started, code named "BitRover".

The thing went live in beta on 3rd of October 2009 with 1.5-something million torrent loaded during the previous month. It was pure-Java with Tomcat/nginx as front end servers.

In a week we've received a review on Torrentfreak (thanks, Ernesto!) and instantly became popular — this is when it hit the fan. It was a tough week. We've done lots of performance testing — but it's a bit different when you have a horde of users on site. Tomcat was crashing with "out of memory" errors or melting half of CPUs with 100% load. Or stopping to respond for no apparent reason. Figures.

We've had to code all of our front end in PHP — in 6 days, while desperately trying to keep Tomcat alive.

Things got stable at the end of October with 2 million torrents in index — and we could start reviewing performance and planning new features. This is when first draft of "BTek3" architecture started to appear. Development took more than 3 months (well, we have quite long New Year vacation here) — and here it is, new and kicking!

We've had a catastrophic failure of our storage subsystem in February 2010 (just after St. Valentine's), which caused Bitsnoop to be offline for some time. Now everything is fixed and improved tenfold." -Bitsnoop