Blog by Sumana Harihareswara, Changeset founder
Things I Learned About Drupal And Odd 404s
Hi, reader. I wrote this in 2014 and it's now more than five years old. So it may be very out of date; the world, and I, have changed a lot since I wrote it! I'm keeping this up for historical archive purposes, but the me of today may 100% disagree with what I said then. I rarely edit posts after publishing them, but if I do, I usually leave a note in italics to mark the edit and the reason. If this post is particularly offensive or breaches someone's privacy, please contact me.
Back on October 7th, I offered "Some Tips On Domain Names And Hosting", and said: "So, next step: choosing a provider, spinning up a server, loading it up, and pointing my new domain name at it!" And then an interesting unexpected thing came up, which takes up the majority of this post (see the "Weird spam and HTTP tricks" section).
I chose DigitalOcean mainly because a peer had a $10 referral coupon thing, so I could for free enjoy the benefits of using a service that has a business model that makes sense and won't get all ad skeevy (relevant rant, parts one, two, and three).
I faced some two-factor auth problems basically because the most convenient 2FA solutions assume you are fine with installing a closed-source app on a computing device you control.
Also, when spinning up a DigitalOcean droplet for the first time and SSHing into it, I'd like to establish the authenticity of the host by verifying the ECDSA key fingerprint. Where in one's digitalocean.com settings or in the web UI should one look to find that? The answer: one can't. I looked on the web and asked around, and found a lot of people saying, "when you get to 'the authenticity of this host cannot be established, are you sure,' just say yes." There is apparently no way to verify that key fingerprint in the web UI. The attack vector is microscopic (someone else coming in and spoofing the IP address right after you spin it up and before you have a chance to SSH in). But it still annoys me. I hear Amazon EC2 has solved this problem and does give you a way to verify the fingerprint.
Generally, if you want to make Drupal do what you want it to do, it's helpful to install modules that other people have made, and maybe themes. You can check out popular modules such as Views, and you can look up how to install modules and themes, and learn how to install modules and themes specifically in Drupal 7.
Thanks to much help from Fureigh (example), when I looked up an "installation profile" ("ngpprofile") that interested me, I found out about Drush and installed it. It seems as though drush wants or seems to need to do everything as root, which doesn't feel right to me, so maybe I misunderstood. Then again, a sysadmin of my acquaintance mentioned his "you gotta be kidding me" reaction to a Drupal installation HOWTO that blithely said "now
chmod 777 the web directory", so maybe I just have a different attitude to privileging than Drupal does! Some more thoughts on Drush: a slide deck, GitHub, a homepage, and a project page.
And Fureigh submitted a patch to get ngpprofile to work properly with Drush! ... And then I ungratefully did not try to use ngpprofile, and instead looked at a very very simple theme, and then fiddled manually with templates and the admin dashboard to make my site look just slightly different from a regular stock Drupal site. Drupal theming seems to be a pretty deep skill in and of itself.
I got help from the
#drupal-support IRC channel on Freenode as I went -- thanks! If I ever dip into Drupal again, I'll check out a video resource they recommended, including a "build your first Drupal 7 website" video sequence.
Weird spam and HTTP tricks
I bought a brand-new domain name via Hover and pointed it to my DigitalOcean droplet. The next day, I looked at various admin logs and noticed strange 404s that had nothing to do with my site. Clearly they were spam and the attackers hoped I would click on their URLs thinking they were referrers, or similar (if the attacked site's 404 logs are public, intentionally or accidentally, then this tactic would increase the spammer's pagerank). I'll reproduce one here, with the actual URL replaced with "myphishingsite.biz" and eliding the IP.
Hmmm. The spammer left their URL in the LOCATION field somehow, but there's no referer (Drupal spells it "referrer in the admin console). I found that I could cause a "page not found" log entry by going to a nonexistent page on my site, e.g.TYPE page not found DATE Thursday, October 9, 2014 - 10:46 USER Anonymous (not verified) LOCATION http://myphishingsite.biz/http://myphishingsite.biz REFERRER MESSAGE ttp://myphishingsite.biz SEVERITY warning HOSTNAME [IP address elided]
/bleeber, but then the LOCATION for that log entry was
http://[hostname.tld]/bleeber. How was the spammer manufacturing an entry with a LOCATION of
http://myphishingsite.biz? And what was up with the truncated initial "h" in the MESSAGE field?
With a few pointers from two Hacker School colleagues, a bit of reading up on how Drupal logs 404s, what access logs look like in Apache, and what 404 actually means, and some trial-and-error, I began to see what was happening. If I went to http://myhostname.tld/http://panix.com , then my access logs included
GET /http://panix.com . But the attacker sent requests that logged as
GET http://[spamsite] (notice that there is no leading
/). So I began to suspect that the attacker programmatically sends
GET requests with some kind of intentionally malformed header. (And then this helped me explain why, in the report overview in the web-based admin console, the spammed URLs miss their first character (the h in http) -- usually you don't care about the leading slash or about the base URL when you're skimming that overview, so Drupal programmers made some kind of "omit the first character" choice.)
Time to break out
netcat! Usually, the first string after
GET in an HTTP request header is the location of the resource you want on the host that you're sending the request to (below, "myhostname.tld" is the host that I'm sending the request to). You'll often see
GET / or
GET /favicon.ico, for instance. But there's no reason you can't do something like this:
$ nc myhostname.tld 80 GET http://berkeley.edu HTTP/1.1 Host: berkeley.edu Referrer: User-Agent: netcat
When I sent that HTTP request manually, I could replicate precisely what the spammers were doing, in terms of what characters showed up or got clipped in the relevant logs. For instance, the access log entry:
[IP address elided] - - [11/Oct/2014:16:23:47 -0400] "GET http://berkeley.edu HTTP/1.1" 404 7574 "-" "netcat"
And if I were specifically attacking Drupal administrators and wanted them to click on things, and I knew about the initial truncated character in the web-based admin console view, I might send a
GET request that includes an initial character to throw away:
$ nc myhostname.tld 80 GET /http://nyc.gov/ HTTP/1.1 Host: nyc.gov Referrer: User-Agent: netcat
So, my first week of my second Hacker School batch, I succeeded in learning a bunch about using the domain name system, hosting, and Drupal, AND I learned how to do hilariously wrong things with HTTP requests. (The site isn't up anymore, because that wasn't the point.) I then went on to build some more sites with different tools, and I'll blog about the rest of them in upcoming posts.
18 Nov 2014, 12:25 p.m.