A Shovel’s Work is Never Done

I have around 8K users on this site. Most of them are nonsense, but I do have several who actually exist. I want to bulk delete a bunch, so I am going in search of a tool which is better than the default ability in WP.

[Gave up on that.]

Here’s where I knew I had a problem. I have become kind of a numbers nut, seeing a power law distribution everywhere. If you can group things, you can get them to show you a power law. Well, typically. In the past I have gone into the WP user management area, searched for users from a given domain, and gone from there. For example, anybody with an email address ending in @mail.ru is gone for sure. Except on this, the first pass, there are over 4K users from @mail.ru. Now WP handles user bulk deletion in a terrible way — the user interface (the web page served on my computer) allows me to check “all” of the users displayed on a given screen.

Fine, but there are two things wrong with this. First, the page can only list something like 259 users at a time, which means that for over 4K users, I will have to do this entire process more than 17 times. And believe me, waiting for each page to load and then entering the search criteria, selecting all of the users, hitting delete, confirming, and starting over again is a pretty annoying mindless behavior. Skinner boxes have better schemes.

[And yet that’s exactly what I did.  Again!]

Second, what my browser will actually return to the system up on the server is a LONG URL filled with the user number of EACH user tacked onto the end of some sort of URL string. There is no way that the meaty long user numbers are going to survive the trip in a URL; they just don’t let URLs get that long.
RFC 2616 and RFC 7230 address the maximum length allowed in a URL. There isn’t one, only a recommendation that each server be able to handle URLs of at least 8K characters, and an admonition that each server be able to handle URLs as long as the ones it dishes out. Fair enough, but the internet is not a point-to-point connection, in most cases. It is, as Senator Ted Stevens aptly put it, “a series of tubes”. You don’t get to pick which tubes. So any node along the way can have its own limit on the length of a URL. I have experienced failure due to something kicking back long URLs before (hate when that happens), so I already know that simply hacking the code on my WP site to allow me to list more users on a screenful is not going to work.
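
To put rough numbers on that, here is a minimal Python sketch that builds the kind of query string a bulk action might carry and measures it. The `users[]` parameter name, the five-digit user IDs, and the 8,000-character budget are all assumptions for illustration, not values read off an actual WP request.

```python
from urllib.parse import urlencode

# Assumed budget: the roughly 8K characters mentioned above.
URL_BUDGET = 8000


def bulk_query_length(user_ids, param="users[]"):
    """Length of a query string carrying one `param=id` pair per user.

    `param` is a hypothetical parameter name; the real one may differ.
    """
    pairs = [(param, str(uid)) for uid in user_ids]
    return len(urlencode(pairs))


def fits(page_size, first_id=10000):
    """True if a page of `page_size` consecutive IDs stays within the budget."""
    ids = range(first_id, first_id + page_size)
    return bulk_query_length(ids) <= URL_BUDGET


if __name__ == "__main__":
    for page_size in (259, 1000, 4606):
        ids = range(10000, 10000 + page_size)
        print(page_size, bulk_query_length(ids), fits(page_size))
```

Under those assumptions a 259-user page stays within the budget, while a page of 1,000 does not, and that is before any node in the middle applies a smaller limit of its own.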

I need more power just to get rid of the very first group of users, the @mail.ru bunch.

A bit about my methodology:
I would like to group, count, and sort (descending) users by domain, so that I get a list with the most populous domain (in this case @mail.ru) and its user count (in this case, 4,606) listed first, and then the second-most populous, and so forth.
This way, the first action I take will be the most effective, followed by the second most effective, and so forth. Think of the worst case scenario: I work the list in reverse order, so that I spend an entire afternoon chewing through a whole bunch of domains which each sourced only a single user to the blog. I would consider it a minor victory to reach the level of knocking out users two at a time!
The problem is that, like many mail services, WP will not give me this listing without adding a plugin of some sort, or of course, rolling my own code. Well, I have sworn off of writing PHP/SQL/HTML code, as I do not want to bear the maintenance burden. I want everything I touch to be maintained in whole by somebody else. I’ll integrate the pieces, but I’m not diving under the damned hood ever again*.
I cannot get this listing directly. So I must use indirect methods. I assume (and there is a lot of fun math I want to do, but haven’t) that the first screenful of users is overwhelmingly likely to contain a user from the most populous group. Oh, the list of users is presented in alphabetical order, but if we assume all spammers may or may not be using the same tricks, then alphabetization should not matter in selecting the most populous spammer domain. In fact, all things being equal, it is likely to be the first entry. So I look at the domain of the first entry and search for all users matching that domain.
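
For reference, the group, count, and sort step itself is small once the email addresses are outside WP. Here is a minimal Python sketch, assuming the addresses have already been exported to a plain list by some means (the export is not shown, and the sample addresses are made up). `domain_counts` is a hypothetical helper name.

```python
from collections import Counter


def domain_counts(emails):
    """Group addresses by domain and return (domain, count) pairs, largest first."""
    counts = Counter()
    for email in emails:
        # Take everything after the last "@", lowercased; skip malformed rows.
        _, sep, domain = email.strip().rpartition("@")
        if sep and domain:
            counts["@" + domain.lower()] += 1
    # most_common() sorts by count, descending.
    return counts.most_common()


if __name__ == "__main__":
    # Made-up sample addresses, not real users from the site.
    sample = [
        "a@mail.ru", "b@mail.ru", "c@MAIL.ru",
        "d@gmail.com", "e@gmail.com",
        "f@example.org", "not-an-email",
    ]
    for domain, count in domain_counts(sample):
        print(domain, count)
```

Working down that output from the top is the "most effective action first" ordering described above.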

There are 8,295 users. 4,606 of them are from @mail.ru.
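
Some of the fun math can be sketched with those two numbers. The Python below assumes the listing order has nothing to do with the email domain (a big assumption, since the list is alphabetical by name) and that a screenful is 259 users. It computes the chance that the first entry is from the biggest group, and the chance that a whole screenful contains at least one member of it.

```python
from math import comb

TOTAL_USERS = 8295    # from the post
BIGGEST_GROUP = 4606  # @mail.ru, from the post
PAGE_SIZE = 259       # approximate screenful, from the post


def p_first_entry(total, group):
    """Chance that the first listed user is from the group, if order is random."""
    return group / total


def p_page_contains(total, group, page):
    """Chance that a page of `page` users holds at least one group member.

    Sampling without replacement: 1 - C(total - group, page) / C(total, page).
    """
    return 1 - comb(total - group, page) / comb(total, page)


if __name__ == "__main__":
    print(f"first entry: {p_first_entry(TOTAL_USERS, BIGGEST_GROUP):.4f}")
    print(f"whole page:  {p_page_contains(TOTAL_USERS, BIGGEST_GROUP, PAGE_SIZE):.6f}")
```

With these inputs the first-entry chance is a little over half (4,606 out of 8,295), and the whole-page chance is, for practical purposes, certainty.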

Total users: 8295
@mail.ru 4606
@gmail.com 846
@yandex.com 260
@… 145
@yandex.ru 111

And so on. I did some of this on another machine at home and am now trying to dredge these up from memory. I mean, who remembers things like this?


Anyway, this post is a work in progress.  I’ll post a graphic of the expected power law when I get it done.
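
Until the graphic exists, a quick numeric check is possible: on a log-log plot of count against rank, a power law shows up as a straight line. The Python sketch below fits that line by ordinary least squares to the five counts listed above. Those counts were recalled from memory, and five points are far too few to confirm anything, so treat the slope as a rough indication only. `loglog_slope` is a hypothetical helper name.

```python
import math

# Counts as recalled in the post, largest first; domain names are not needed here.
COUNTS = [4606, 846, 260, 145, 111]


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank), ranks starting at 1."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    print(f"log-log slope: {loglog_slope(COUNTS):.2f}")
```

A steeply negative slope is consistent with a few domains supplying most of the users, but it does not by itself distinguish a power law from other heavy-tailed shapes.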

One thing I noticed once I got down to about the yandex.com level of things was that the memorable “___Medamelve” pattern of bogus names is really numerous. By the time I was down to ~3500 remaining, 298 of them were named “JuliusMedamelve” or something like it, for differing values of Julius. I decided not to do the obviously productive thing and wipe out that population, as they were from all over the domains, and I do have a blog post about domains to finish, after all.

* Actually, I like diving under the hood, but it is like having enough money to pay somebody else to work on my car: I will work on hobby cars, but I will not spend an afternoon under my daily driver just so I can get to work the next day. Done with that.
