Things I Learned About Drupal And Odd 404s

Blog by Sumana Harihareswara, Changeset founder

18 Nov 2014, 9:46 a.m.

Hi, reader. I wrote this in 2014 and it's now more than five years old. So it may be very out of date; the world, and I, have changed a lot since I wrote it! I'm keeping this up for historical archive purposes, but the me of today may 100% disagree with what I said then. I rarely edit posts after publishing them, but if I do, I usually leave a note in italics to mark the edit and the reason. If this post is particularly offensive or breaches someone's privacy, please contact me.

Back on October 7th, I offered "Some Tips On Domain Names And Hosting", and said: "So, next step: choosing a provider, spinning up a server, loading it up, and pointing my new domain name at it!" And then an interesting unexpected thing came up, which takes up the majority of this post (see the "Weird spam and HTTP tricks" section).

I chose DigitalOcean mainly because a peer had a $10 referral coupon thing, so I could enjoy, for free, the benefits of a service whose business model makes sense and won't get all ad skeevy (relevant rant, parts one, two, and three).

Security stuff

I faced some two-factor auth problems basically because the most convenient 2FA solutions assume you are fine with installing a closed-source app on a computing device you control.

Also, when spinning up a DigitalOcean droplet for the first time and SSHing into it, I'd like to establish the authenticity of the host by verifying the ECDSA key fingerprint. Where in one's digitalocean.com settings or in the web UI should one look to find that? The answer: one can't. I looked on the web and asked around, and found a lot of people saying, "when you get to 'the authenticity of this host cannot be established, are you sure,' just say yes." There is apparently no way to verify that key fingerprint in the web UI. The attack vector is microscopic (someone else coming in and spoofing the IP address right after you spin it up and before you have a chance to SSH in). But it still annoys me. I hear Amazon EC2 has solved this problem and does give you a way to verify the fingerprint.
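
For what it's worth, the fingerprint SSH prints at that prompt is just a hash of the host's public key. So if a trusted channel to the droplet did exist (say, a provider console where you could read /etc/ssh/ssh_host_ecdsa_key.pub, the usual OpenSSH location on Ubuntu), you could compute the fingerprint yourself and compare. Here is a minimal Python sketch of that computation. The function name and the sample line are made up for illustration, and the sample blob is not a real key.

```python
import base64
import hashlib


def host_key_fingerprints(public_key_line):
    """Return the MD5 and SHA256 fingerprints for one line of a .pub file.

    A public key line looks like "<key-type> <base64-blob> [comment]".
    OpenSSH fingerprints are hashes of the decoded blob.
    """
    fields = public_key_line.split()
    blob = base64.b64decode(fields[1])

    # Older OpenSSH clients display colon-separated MD5 hex.
    md5_hex = hashlib.md5(blob).hexdigest()
    md5_fp = ":".join(md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2))

    # Newer clients display unpadded base64 of the SHA-256 digest.
    sha256_digest = hashlib.sha256(blob).digest()
    sha256_fp = base64.b64encode(sha256_digest).decode("ascii").rstrip("=")

    return {"md5": md5_fp, "sha256": "SHA256:" + sha256_fp}


# Hypothetical stand-in for a real key file; the blob is NOT a valid key.
sample_line = (
    "ecdsa-sha2-nistp256 "
    + base64.b64encode(b"not-a-real-key").decode("ascii")
    + " root@droplet"
)
print(host_key_fingerprints(sample_line))
```

Computing the hash is the easy part; the hard part is still getting the public key over a channel you already trust, which is exactly what the web UI did not offer.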

Server setup

I followed some useful tutorials to refresh my memory so I could set up an Ubuntu server and get a LAMP stack installed. Another helped me install Drupal. I have now successfully installed Drupal!

Drupal

Generally, if you want to make Drupal do what you want it to do, it's helpful to install modules that other people have made, and maybe themes. You can check out popular modules such as Views, and you can look up how to install modules and themes, and learn how to install modules and themes specifically in Drupal 7.

Thanks to much help from Fureigh (example), when I looked up an "installation profile" ("ngpprofile") that interested me, I found out about Drush and installed it. It seems as though Drush wants, or even needs, to do everything as root, which doesn't feel right to me, so maybe I misunderstood. Then again, a sysadmin of my acquaintance mentioned his "you gotta be kidding me" reaction to a Drupal installation HOWTO that blithely said "now chmod 777 the web directory", so maybe I just have a different attitude to privileging than Drupal does! Some more thoughts on Drush: a slide deck, GitHub, a homepage, and a project page.

And Fureigh submitted a patch to get ngpprofile to work properly with Drush! ... And then I ungratefully did not try to use ngpprofile, and instead looked at a very very simple theme, and then fiddled manually with templates and the admin dashboard to make my site look just slightly different from a regular stock Drupal site. Drupal theming seems to be a pretty deep skill in and of itself.

I got help from the #drupal-support IRC channel on Freenode as I went -- thanks! If I ever dip into Drupal again, I'll check out a video resource they recommended, including a "build your first Drupal 7 website" video sequence.

Weird spam and HTTP tricks

I bought a brand-new domain name via Hover and pointed it to my DigitalOcean droplet. The next day, I looked at various admin logs and noticed strange 404s that had nothing to do with my site. Clearly they were spam and the attackers hoped I would click on their URLs thinking they were referrers, or similar (if the attacked site's 404 logs are public, intentionally or accidentally, then this tactic would increase the spammer's pagerank). I'll reproduce one here, with the actual URL replaced with "myphishingsite.biz" and eliding the IP.

TYPE page not found
DATE Thursday, October 9, 2014 - 10:46
USER Anonymous (not verified)
LOCATION http://myphishingsite.biz/http://myphishingsite.biz
REFERRER 
MESSAGE ttp://myphishingsite.biz
SEVERITY warning
HOSTNAME [IP address elided]

Hmmm. The spammer left their URL in the LOCATION field somehow, but there's no referer (Drupal spells it "referrer" in the admin console). I found that I could cause a "page not found" log entry by going to a nonexistent page on my site, e.g. /bleeber, but then the LOCATION for that log entry was http://[hostname.tld]/bleeber. How was the spammer manufacturing an entry with a LOCATION of http://myphishingsite.biz? And what was up with the truncated initial "h" in the MESSAGE field?

With a few pointers from two Hacker School colleagues, a bit of reading up on how Drupal logs 404s, what access logs look like in Apache, and what 404 actually means, and some trial-and-error, I began to see what was happening. If I went to http://myhostname.tld/http://panix.com , then my access logs included GET /http://panix.com . But the attacker sent requests that logged as GET http://[spamsite] (notice that there is no leading /). So I began to suspect that the attacker programmatically sends GET requests with some kind of intentionally malformed header. (And then this helped me explain why, in the report overview in the web-based admin console, the spammed URLs miss their first character (the h in http) -- usually you don't care about the leading slash or about the base URL when you're skimming that overview, so Drupal programmers made some kind of "omit the first character" choice.)
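
The difference between the two kinds of request can be sketched in a few lines of Python. The HTTP specs call a target that starts with a slash "origin-form" and a full URL "absolute-form" (the latter is normally only sent to proxies). The drop_first_char function is purely a stand-in for the "omit the first character" behavior guessed at above; it is an assumption for illustration, not Drupal's actual code.

```python
def classify_request_target(request_line):
    """Split an HTTP request line and name the form of its target."""
    method, target, version = request_line.split(" ")
    if target.startswith("/"):
        form = "origin-form"      # a path on the server being contacted
    elif "://" in target:
        form = "absolute-form"    # a full URL, normally only sent to proxies
    else:
        form = "other"
    return target, form


def drop_first_char(target):
    """Hypothetical stand-in for what the admin overview appears to display."""
    return target[1:]


for line in (
    "GET /bleeber HTTP/1.1",
    "GET /http://panix.com HTTP/1.1",
    "GET http://myphishingsite.biz HTTP/1.1",
):
    target, form = classify_request_target(line)
    print(form, target, "->", drop_first_char(target))
```

Under that assumption, an ordinary path only loses its leading slash, while a target with no leading slash loses the "h" in "http", which matches the truncated MESSAGE field in the log entry.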

Time to break out netcat! Usually, the first string after GET on the first line of an HTTP request is the location of the resource you want on the host that you're sending the request to (below, "myhostname.tld" is the host that I'm sending the request to). You'll often see GET / or GET /favicon.ico, for instance. But there's no reason you can't do something like this:

$ nc myhostname.tld 80
GET http://berkeley.edu HTTP/1.1
Host: berkeley.edu
Referer: 
User-Agent: netcat

When I sent that HTTP request manually, I could replicate precisely what the spammers were doing, in terms of what characters showed up or got clipped in the relevant logs. For instance, the access log entry:

[IP address elided] - - [11/Oct/2014:16:23:47 -0400] "GET http://berkeley.edu HTTP/1.1" 404 7574 "-" "netcat"
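
Entries like that one can be picked out of an access log mechanically. The following is a rough Python sketch, assuming Apache's "combined" log format; the function name is invented, 203.0.113.7 is a documentation-range placeholder standing in for the elided IP, and the second sample line is made up for contrast.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status size "referer" "agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def suspicious_404s(log_lines):
    """Return (host, target) for 404s whose target lacks a leading slash."""
    hits = []
    for line in log_lines:
        match = COMBINED_LOG.match(line)
        if match is None:
            continue  # not a combined-format line
        parts = match.group("request").split(" ")
        if len(parts) != 3:
            continue  # request line is not "METHOD target VERSION"
        target = parts[1]
        if match.group("status") == "404" and not target.startswith("/"):
            hits.append((match.group("host"), target))
    return hits


# Placeholder IP; the second line is an invented ordinary 404 for contrast.
sample_log = [
    '203.0.113.7 - - [11/Oct/2014:16:23:47 -0400] '
    '"GET http://berkeley.edu HTTP/1.1" 404 7574 "-" "netcat"',
    '203.0.113.7 - - [11/Oct/2014:16:24:02 -0400] '
    '"GET /bleeber HTTP/1.1" 404 7574 "-" "netcat"',
]
print(suspicious_404s(sample_log))
```

Only the first sample line is reported, since an ordinary mistyped path such as /bleeber still starts with a slash.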

And if I were specifically attacking Drupal administrators and wanted them to click on things, and I knew about the initial truncated character in the web-based admin console view, I might send a GET request that includes an initial character to throw away:

$ nc myhostname.tld 80
GET /http://nyc.gov/ HTTP/1.1
Host: nyc.gov
Referer: 
User-Agent: netcat
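
Both hand-typed requests can also be reproduced without netcat or a public server. This is a hedged Python sketch that starts a throwaway HTTP server on 127.0.0.1 (Python's built-in http.server, not Apache or Drupal), sends the two raw request lines over a plain socket, and records what the server saw as the request target. The names RecordingHandler, send_raw_get, and demo are illustrative.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_targets = []  # request targets exactly as the server parsed them


class RecordingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        seen_targets.append(self.path)
        body = b"not found\n"
        self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def send_raw_get(port, target, host_header):
    """Send one hand-built GET request and return the response status line."""
    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "User-Agent: netcat\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).split(b"\r\n", 1)[0].decode("ascii")


def demo():
    seen_targets.clear()
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        statuses = [
            send_raw_get(port, "http://berkeley.edu", "berkeley.edu"),
            send_raw_get(port, "/http://nyc.gov/", "nyc.gov"),
        ]
    finally:
        server.shutdown()
        server.server_close()
    return statuses, list(seen_targets)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the server receives whatever string the client puts after GET, with or without a leading slash; what each piece of logging software then does with that string is a separate question.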

Success

So, my first week of my second Hacker School batch, I succeeded in learning a bunch about using the domain name system, hosting, and Drupal, AND I learned how to do hilariously wrong things with HTTP requests. (The site isn't up anymore, because that wasn't the point.) I then went on to build some more sites with different tools, and I'll blog about the rest of them in upcoming posts.

Comments

Sumana Harihareswara
http://harihareswara.net
18 Nov 2014, 12:25 p.m.

Oh right, there were a few more resources I meant to add to this post.

How to create your own Drupal 7 theme from scratch and how the theme system works. An overview of theme files, and a guide to writing theme ".info" files.

Also, the various strings or ints I needed to either save or look up over and over again while building the site were:

* my DigitalOcean web login credentials
* the droplet port I'd left open to SSH into
* the root password on the droplet
* the unprivileged user name I'd created to log into the droplet with, and the password (or SSH passphrase)
* the MySQL password
* the name and password of the Drupal user for MySQL
* the password for the Drupal site maintenance (admin) account