Whatever I'm enthused about at the moment ;-p

IRC Database Project: An Attempt At Finding Local IRC Users :-P

2005-07-18 - Category: computing


It's amazing just how much information about a person you can gather on IRC. In fact, I really can't think of any other Internet resource that automatically offers so much information about a person. From a simple IRC /whois you can usually get someone's name and IP address. And from the IP address, you can deduce their ISP and location. This is all assuming, of course, that they include their name in their IRC client setup info and that their hostname isn't cloaked. About a month ago I decided I wanted to see if I could build a database of IRC users so I could search for local geeks. Why? Well, I've met local geeks before and it's a lot of fun getting together with people of like mind. Anyway, the following is the tale of how I went about this endeavor.

Networks and Channel Lists

My goal was to gather the largest possible number of IRC users to increase my chances of finding people in my small region of the world. To do this, I wanted to find only the most populated networks and channels and not bother with the small ones. Also, the larger channels and networks would be less likely to notice any silly little IRC bots I would create to gather my data, tee hee! Yes, I was rather unsure whether some parts of this project were 100% legal :-P

Anyway, I first went to places like searchirc.com and irc.netsplit.de to find out what the world's most populated networks were and got the hostnames of about the top 30. Then I made a simple bot to visit each one and do a /list command. (I use the POE::Component::IRC module in Perl to make my IRC bots, by the way. Yes, I'm a lame script kiddie XD) Actually, before I made this bot, I was using a program written in C by zer0python that performed the /list on a specified server, but I decided to write my own for the sake of flexibility. The /list approach worked well only about 50% of the time, since many networks truncate the output of the list command. Meh. I then recorded all channels with at least 30 users in a MySQL database so that I would have maximum flexibility for manipulating the data. Perl was a wise choice of programming language for this project since it involved extensive text parsing ;-)
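For the curious, here is roughly what the "keep only channels with at least 30 users" step looks like. This is a minimal sketch in Python rather than the Perl bot described above, and it assumes the server answers /list with standard RPL_LIST (numeric 322) lines of the form ":server 322 nick #channel count :topic". The function names and the example server name are made up for illustration.

```python
def parse_list_reply(line):
    """Parse one RPL_LIST (numeric 322) line into (channel, user_count).

    Returns None for any other kind of line or for a malformed count.
    """
    # Expected shape: ":server 322 mynick #channel 42 :topic text"
    parts = line.split(" ", 5)
    if len(parts) < 5 or parts[1] != "322":
        return None
    try:
        return parts[3], int(parts[4])
    except ValueError:
        return None


def busy_channels(lines, min_users=30):
    """Return (channel, user_count) pairs with at least min_users, biggest first."""
    found = []
    for line in lines:
        parsed = parse_list_reply(line.rstrip("\r\n"))
        if parsed and parsed[1] >= min_users:
            found.append(parsed)
    return sorted(found, key=lambda chan: chan[1], reverse=True)


if __name__ == "__main__":
    sample = [
        ":irc.example.net 322 bot #perl 120 :Perl help",
        ":irc.example.net 322 bot #tiny 3 :",
    ]
    for name, count in busy_channels(sample):
        print(name, count)
```

Each pair that comes back would then become one row in the channels table. Note that a truncated /list reply simply yields fewer lines, so this filter cannot tell you what the server left out.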

Venturing Into The Channels

I then wrote a bot for simply connecting to a specified network, joining a specified channel, performing a /who, sleeping for about 2 minutes, then disconnecting. After doing a few trial runs of this maneuver, I found out that most of the networks I was connecting to cloaked the hostnames of all the users. Drat. I needed to have complete hostnames or IP addresses of users or else this whole project would be rather pointless. So I set out to find out which of the 46 networks I had gathered didn't mess with the users' hostnames, and there were only 5.

At this point, I decided I'd better just find out the top 100 or so most populated channels of these networks, gather all those users, and go from there. So I created a script (netsplit_info.pl) to download the most highly populated channel listings for these networks from irc.netsplit.de and another script (netsplit_import.pl) to stash the channels in the database.

Lists O' Users

Next, I had to write a script to connect to a network and visit each of the channels for that network in the database. Thus I didst create *DUN DUN DUNNNN!!!* chan_spider.pl! Yes, I know, there are a lot of repetitious subroutines at the bottom of that program, but I wrote it that way because POE::Component would complain otherwise. Anyway, it's just a silly Perl script, relax. :-P It's supposed to be hideous. XD And here (link removed) is the script for parsing and dumping the output of the channel spider into the database. Pretty nifty, aye?
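Since the parsing script itself isn't shown here, this is a rough idea of what pulling user records out of /who output involves. It is a Python sketch, not the actual Perl script, and it assumes the server sends standard RPL_WHOREPLY (numeric 352) lines shaped like ":server 352 me #channel user host server nick flags :hopcount realname". The WhoEntry type and its field names are my own invention, picked to resemble the columns of the users table.

```python
from typing import NamedTuple, Optional


class WhoEntry(NamedTuple):
    channel: str
    username: str
    hostname: str
    nick: str
    ircname: str


def parse_who_reply(line: str) -> Optional[WhoEntry]:
    """Parse one RPL_WHOREPLY (numeric 352) line, or return None if it isn't one."""
    # Everything after the first " :" is the trailing part: "<hopcount> <realname>"
    head, _, trailing = line.rstrip("\r\n").partition(" :")
    fields = head.split()
    # fields: :server 352 me #channel user host server nick flags
    if len(fields) != 9 or fields[1] != "352":
        return None
    _hops, _, ircname = trailing.partition(" ")
    return WhoEntry(
        channel=fields[3],
        username=fields[4],
        hostname=fields[5],
        nick=fields[7],
        ircname=ircname,
    )


if __name__ == "__main__":
    reply = ":irc.example.net 352 bot #perl tlp 168-103-130-145.bois.qwest.net irc.example.net tlp H :0 tlp"
    print(parse_who_reply(reply))
```

Every non-None result is one candidate row for the database. The same person sitting in several channels produces several rows, so deduplication has to happen afterwards.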

After spidering freenode, I had about 18,000 user records in my database. But after using the SQL DISTINCT keyword, I found that I only had around 4,000 distinct users. This is partially because A) most users are in multiple channels and B) my channel spider script was quite buggy and did stupid things like performing a /who multiple times in the same channel and other bizarrities. I'm not sure whether this was a bug in POE::Component::IRC or my code. If you have a comment on this, feel free to give me a holler. :-P

So, to find out whether or not these people lived in my vicinity, I found out the netranges of local ISPs in my area using the whois program, which is available for all *nix operating systems. For those of you who aren't aware, when you give whois an IP address, it will look it up at one of the Regional Internet Registries (RIRs), such as APNIC, ARIN, LACNIC and RIPE NCC. The whois program then returns information about what ISP owns the IP address, the netrange that the IP address belongs to, and some other information.
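To make the netrange idea concrete, here is a small Python sketch of testing whether an address falls inside a range that whois reported. It assumes the range arrives as "first - last" text, the way ARIN-style NetRange lines are written. The addresses below are reserved documentation ranges standing in for a real ISP's netrange, and the function names are hypothetical.

```python
import ipaddress


def parse_netrange(text):
    """Turn 'first - last' (as in a whois NetRange line) into a list of networks."""
    first, _, last = text.partition("-")
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first.strip()),
            ipaddress.ip_address(last.strip()),
        )
    )


def in_any_range(ip, networks):
    """True if ip parses as an address and lies inside one of the networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # unresolved hostname, empty string, etc.
    return any(addr in net for net in networks)


if __name__ == "__main__":
    # Placeholder range (RFC 5737 documentation block), not a real ISP netrange
    local = parse_netrange("192.0.2.0 - 192.0.2.255")
    for candidate in ["192.0.2.77", "203.0.113.5", "not-an-ip"]:
        print(candidate, in_any_range(candidate, local))
```

Comparing parsed addresses also sidesteps a trap with dotted-quad text: as plain strings, "9.0.0.0" sorts after "10.0.0.0", so a range test on IP strings can give wrong answers unless the addresses are stored numerically.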

However, most of the user records in the database had hostnames that needed to be resolved into IP addresses before I could do an IP address netrange search. So I created a program to do that :-)

DNS Mayhem

So, I had approximately 3,000 DNS lookups to do. I couldn't do this with my ISP's DNS servers since they might notice and shut off my net connection and that would be a bad thing ;-p Many ISPs leave their DNS servers open to anyone who happens to want to use them to do a DNS resolution, so I compiled a list of several of these servers I found in the Seattle and Denver areas. 64 of them, to be exact. I just googled for Seattle and Denver ISPs since those cities are bound to have some pretty large ISPs that probably wouldn't notice a few hundred DNS lookups :-P I found out their nameservers by, again, using the oh-so-handy whois program.
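The general trick of spreading lookups over many nameservers can be sketched like this. This is Python and only an illustration of the round-robin idea, not what dns_lookup.pl actually does. Python's standard library can't aim a query at a particular nameserver, so the resolve argument is a stand-in you would have to supply yourself (for example, built on a third-party DNS library). All names here are hypothetical.

```python
from itertools import cycle


def assign_lookups(hostnames, nameservers):
    """Pair each distinct hostname with a nameserver, round-robin."""
    if not nameservers:
        raise ValueError("need at least one nameserver")
    servers = cycle(nameservers)
    # dict.fromkeys drops duplicate hostnames while keeping first-seen order
    return [(host, next(servers)) for host in dict.fromkeys(hostnames)]


def resolve_all(hostnames, nameservers, resolve):
    """Resolve every hostname via its assigned server.

    resolve(hostname, nameserver) is a caller-supplied function (an assumption
    of this sketch) that returns an IP string or raises OSError on failure.
    """
    results = {}
    for host, server in assign_lookups(hostnames, nameservers):
        try:
            results[host] = resolve(host, server)
        except OSError:
            results[host] = None  # leave unresolved hosts marked as missing
    return results


if __name__ == "__main__":
    def fake_resolve(host, server):
        # Stand-in resolver that returns a documentation address
        return "192.0.2.1"

    print(resolve_all(["a.example", "b.example"], ["ns1.example", "ns2.example"], fake_resolve))
```

With 64 servers and roughly 3,000 hostnames, a scheme like this would send each server only around 47 queries.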

The Not-So-Exciting Conclusion

After a couple hours of dns_lookup.pl chugging along, it finished and I feverishly tried my first netrange, which was one of cableone.net's Pocatello netranges:

mysql> select * from users where ip between '' and '';
| id  | chan_id | username | hostname                        | nick      | ircname   | ip             |
| 523 |     211 | furuba   | 24-116-157-215.cpe.cableone.net | buckminst | buckminst | |
1 row in set (0.03 sec)

Sigh. It's Bucky. Well, let's try another one! This time, let's try all the Qwest hostnames for Southeast Idaho! *types madly in MySQL!*

mysql> select * from users where hostname like '%bois.qwest.net';
| id   | chan_id | username | hostname                       | nick | ircname | ip              |
| 4587 |     305 | tlp      | 168-103-130-145.bois.qwest.net | tlp  | tlp     | |
1 row in set (0.02 sec)

ARGH! It's tlp! *groan* Why can't Sexy_Linux_ChiX0r be found?!?!?

Well, in conclusion, there are a lot of people on IRC.... but not THAT many. Searchirc.com records an average of a little over 1,000,000 users which, spread out over planet Earth, is actually pretty sparse. Ah, well, I learned a lot of nifty things with this project, even though it was a miserable failure. Ok, maybe not totally a failure... like, I can PM Bucky and tlp on IRC and say something like "w00t! my lil channel spider scriptie thingie found you!! ARRR HAR HAR HAR HAR!!!!". *ahem*

Copyright © 2001 - 2014, Korey Pelton -- Static HTML blog generated with PeltonBlog