Cogito, Ergo Sumana
Sumana oscillates between focus and opportunity

(0) : Attachment: From The Young Buddhists' Path To Success, by Venerable Master Hsing Yun, 1987, p. 25:

What the Buddhist youth lack today is a sense of ambition.

I'm turning this over in my head, half thoughtful and half amused.



(0) : Normal's Just A Setting On The Dryer: (Title from some lost-to-the-ages sage, I think.)

I recently said to a friend that I'm pretty on-board with "Labels are for mailing things" and "Normal's just a setting on the dryer" (also said as "Normal: of or pertaining to someone named Norm"). I also shy away from calling things "real", from using the minimizing adverb "just", from saying that "everyone" does or is or doesn't or isn't something.

Today I was thinking about the assumptions that US children of immigrants make, about the fact that we know that "normal" is relative. When I'm in Mysore, it's as normal to sidestep cow poop on the road as it is to avoid clicking on phishing links in my email.

Evidence is mounting that several people consider me a role model and a leader. (And yes, if you thought "that must be something Sumana is resistant to acknowledging, to herself or publicly," then you are accurate in your predictions!) So I'm mulling that. Role models demonstrate that something is possible, for, lo, here is an existence proof. And leaders get to influence perceptions of what's normal, and what's bullshit.


(1) : Using Beautiful Soup, Pystache, and Lunr.js for an Archival Site: In the third week of my 2014 Hacker School batch, I decided to take on a project that I'd originally thought about doing a year before, during my first go at HS.

Between April 2005 and August 2007, I wrote a weekly column called "MC Masala" for the "Inside Bay Area" section of several papers in the San Francisco Bay Area, including the Oakland Tribune. My work circulated to about a million people, I'm told. A few years ago I grabbed a soft copy of almost all my archives off a periodicals database, and then in 2011 I made an abortive attempt to get the columns online, but gave up on all the fiddly text-munging bits.

But a few weeks ago I felt ready to make a go of it, and I figured this would be a fun and useful way to learn Beautiful Soup and learn to finagle a search engine. So I basically stopped doing the Matasano crypto challenges and started a new project.*

Beautiful Soup, Pystache, and sed

I wrote a script to take a list of HTML files of my old newspaper columns and scrape them using Beautiful Soup. (I only needed a tiny bit of live help from Leonard -- to wit, he got me to use the html5lib parser instead of the default.) My script output a Python dictionary containing the stories as structured data: headline, date, and body. And I wrote a script to render that data through Pystache templates I'd written, writing an HTML file for each story, plus a table of contents page. (I don't intend to add comments or restart the column, so I didn't think I'd want a CMS. Pystache, a Python implementation of the lightweight Mustache templating language, seemed like a reasonable choice.) I got some help on this, notably from a pairing session with Chase Lambert on testing Unicode stuff, and from a pairing session with Geoff Shannon on a Pystache type and inheritance problem.
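The core of it looked something like this (a sketch: the element names and template here are made up for illustration, not what my actual pages and scripts used):

# Hedged sketch: parse one saved column with Beautiful Soup's html5lib
# parser, then render it through a Mustache template with Pystache.
from bs4 import BeautifulSoup
import pystache

with open('column.html') as f:
    soup = BeautifulSoup(f.read(), 'html5lib')

story = {
    'headline': soup.find('h1').get_text(strip=True),
    'date': soup.find(class_='date').get_text(strip=True),
    'body': str(soup.find(class_='story-body')),
}

template = ('<h1>{{headline}}</h1>'
            '<p class="date">{{date}}</p>'
            '{{{body}}}')  # triple mustache: leave the body HTML unescaped

print(pystache.render(template, story))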

Unfortunately I never quite figured out how to get one Pystache template nested in another, so there's some code duplication (perhaps partials are the answer). And I had to hack my way around some loopback issues so as to put chronological next/previous links on each article. (Story URLs are just kebab-cased dates. So, my script gets the headline and date (and thus the URL) of the next or previous story by traversing a date-sorted list of dates-and-headlines dicts, then renders the dates and URLs into variables in the template. Oh right, this is where a CMS would have been nice! Lightweight is great until it's not.)
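In outline, the next/previous logic was something like this (a sketch with made-up data, not the script's actual code):

# Hedged sketch: derive next/previous links from a date-sorted list.
stories = sorted(
    [{'date': '2005-04-10', 'headline': 'First column'},
     {'date': '2005-04-17', 'headline': 'Second column'}],
    key=lambda s: s['date'])  # oldest first

for i, story in enumerate(stories):
    context = dict(story)
    if i > 0:  # everything but the oldest story gets a "previous" link
        context['prev_url'] = stories[i - 1]['date'] + '.html'
        context['prev_headline'] = stories[i - 1]['headline']
    if i < len(stories) - 1:  # everything but the newest gets a "next" link
        context['next_url'] = stories[i + 1]['date'] + '.html'
        context['next_headline'] = stories[i + 1]['headline']
    # ...then render context through the Pystache template as usual.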

(In the course of all this, I (with help from a sed FAQ) wrote my first real honest-to-goodness "changing a bunch of files in-place with sed" one-liner in years or possibly ever. A ton of links in several files were pointing to the parent directory instead of the current directory. So: sed -i '/head/s/\.\.\///' *.html means "In-place, on each line matching 'head', replace the first ../ with nothing, in all the .html files in this directory." Whoo!)

The look, the feel

(There was a cotton ad on TV when I was a kid, with the jingle, "The look / the feel / the fabric of our lives." Sometimes Nandini and I sing it to each other. I suppose if there were an ad for Cascading Style Sheets on TV today it could use the same motto.)

I wrote the stylesheet and arranged the proper elements in the template with a bunch of help from Mozilla Developer Network's guidance on boxes and tables, and that old standby, CSS Zen Garden. I gratefully and curiously perused several nice-looking styles for inspiration and edification. I now more thoroughly understand the difference between margin and padding, and grok better why modern sites have a zillion divs.

For a "home" image, I used a picture of me that Valerie Aurora took, and for a header decoration, I used the GNU Image Manipulation Program to stitch together repetitions of a photo that Kitt Hodsden took and blogged in 2012.

Lunr.js

I thought about adding a server-side search engine with something like Lucene or ElasticSearch, but then I heard about a client-side search engine, Lunr.js. My previous HS batch had included a little JS exploration, and I'd futzed with JavaScript in my Node project the previous week, so Lunr sounded like a good approach. I got it installed okay, and borrowed Ben Smith's minified JS package and Jared Dominguez's index-builder, and got a ton of experience with Chrome developer tools. Over the course of getting Lunr.js working on my site (with help from Nicholas Cassleman and Vito LaVilla) I wrote JS to query the index and return search results. I especially like that the result shows up in the same page, without the need for a redirect or full page refresh.

I've made database schema decisions before, but this was my first time deciding on a search index. It was cool that I had the power to change up the parsed output once I realized that the structured data ought to have hrefs as the unique IDs, rather than otherwise-useless unique doc IDs.
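The site's search itself is in JavaScript, but to illustrate the idea: the lunr package on PyPI (a later Python port of Lunr.js) builds and queries the same kind of index, and shows what "hrefs as the unique IDs" means. The documents here are made up:

# Hedged illustration using the lunr Python port, not the site's actual JS.
from lunr import lunr

documents = [
    {'href': 'april-10-2005.html', 'headline': 'First column',
     'body': 'Text of the first column...'},
    {'href': 'april-17-2005.html', 'headline': 'Second column',
     'body': 'Text of the second column...'},
]

# ref names the field to use as each document's unique ID.
idx = lunr(ref='href', fields=('headline', 'body'), documents=documents)

for result in idx.search('first'):
    print(result['ref'], result['score'])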

My site!

MC Masala is live! I am so happy that these columns have a nice home now, and that I made it. I got to exercise my Python, which is strong, and I got to strengthen a bunch of other skills along the way. It's not perfect, and I have a TODO list, but it's the nicest-looking site I've ever made, and it fulfills its function well. And I made it in just a few days.


* I basically stalled on the Matasano challenges, and will come back to them someday when I don't feel so time-constrained. I did get some use out of the ones I did! I now grok byte-level stuff much better, and learned about bytearrays thanks to Allison Kaptur. And I got some laughs out of the process. Example: in challenge six, the Hamming distance the player calculates should be 37. My first attempt came up with 14. Next: 598. I literally laughed aloud. Then, when I finally got 37, I thrust my arms into the air with great vigor because I WAS A DEITY OF PURE LIGHT. But then I started getting depressingly wrong answers and kept getting them; I got help from friends, but decided to hold off and only look at one friend's potentially-spoilery explanation when I'm ready to come back, and I still haven't looked at it. I tried to remind myself of a sort of Allison Kaptur/Carol Dweck "the edge of maybe-can't"/"the only thing that makes you smarter is doing hard things" attitude, that I am a Joseph Campbell hero and the greater my struggle the greater my triumph will be. But I was tearing up in frustration, and I decided to give myself a rest from crypto and level up on the main skill I'd come to Hacker School to learn, namely webdev. And I think that was the right decision. You gotta manage your own morale and momentum -- that's a resource too.



(0) : A Node.js Project, And Deciding to Shelve It: In my second week of my 2014 Hacker School batch, I asked:

What are red flags in scifi/fantasy magazines' calls for submissions? What words/phrases make you think "ew, avoid"? -- @brainwane, 3:48 PM - 13 Oct 2014

As Moss guessed, I was thinking of making an SF&F version of joblint.org, to automatically check for suspect wording in "please submit" pages and posts by speculative fiction publishers.

I take off my hat to Rowan Manning for creating the tool and the site, which I found easy to adapt (my fork of the tool, my fork of the site). The code's in Node.js, and despite an npm problem on Ubuntu, I found it fairly easy to figure out how to change the tests, regular expressions, and error messages, and to modify the package dependencies and update them appropriately (especially thanks to Hacker School colleagues). Check it out: package.json lets you point specifically to a git repo as a dependency, and specify a branch. Even though my JavaScript is terrible, I figured out how to check for the absence of a thing we want in calls for submissions (specifically, wordcount expectations). Overall, the tech side of this project was easier than I expected. (I also did a few of the Matasano crypto challenges that week, which was a very different approach to looking for signals in text!)
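That dependency trick looks something like this (a made-up example, not my actual fork's URL or branch):

{
  "dependencies": {
    "joblint": "git+https://github.com/example/joblint.git#my-branch"
  }
}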

But conversation with some SF&F community members led me to believe that the joblint approach wouldn't help here. In tech industry job descriptions, you can rely on certain buzzwords and key off them; joblint should be only part of a suite that catches problems, the way a code linter should be only part of a software engineering process, but it provokes thought and is useful on its own. But problems with SF&F calls for submissions often lie in subtler matters of approach rather than easy-to-match strings. So it didn't feel worthwhile for me to try a regexes-alone approach, and I didn't want to spend my Hacker School time thinking through the automated literature analysis part of this problem; that's not what I wanted to do in this batch.

So I shelved the project and I have not gotten it even close to launch. But the code's up with a TODO list, and y'all should feel free to grab it and run with it if it strikes your fancy!

And I got some hands-on time getting comfortable with Node and I reassured myself that I can cargo-cult JavaScript modifications when necessary, so that was cool. And I got and merged a pull request from an old Wikimedia acquaintance, which made me feel warm and fuzzy. I've left the Foundation, but relationships remain.



(1) : Things I Learned About Drupal And Odd 404s: Back on October 7th, I offered "Some Tips On Domain Names And Hosting", and said: "So, next step: choosing a provider, spinning up a server, loading it up, and pointing my new domain name at it!" And then an interesting unexpected thing came up, which takes up the majority of this post (see the "Weird spam and HTTP tricks" section).

I chose DigitalOcean mainly because a peer had a $10 referral coupon thing, so I could for free enjoy the benefits of using a service that has a business model that makes sense and won't get all ad skeevy (relevant rant, parts one, two, and three).

Security stuff

I faced some two-factor auth problems basically because the most convenient 2FA solutions assume you are fine with installing a closed-source app on a computing device you control.

Also, when spinning up a DigitalOcean droplet for the first time and SSHing into it, I'd like to establish the authenticity of the host by verifying the ECDSA key fingerprint. Where in one's digitalocean.com settings or in the web UI should one look to find that? The answer: one can't. I looked on the web and asked around, and found a lot of people saying, "when you get to 'the authenticity of this host cannot be established, are you sure,' just say yes." There is apparently no way to verify that key fingerprint in the web UI. The attack vector is microscopic (someone else coming in and spoofing the IP address right after you spin it up and before you have a chance to SSH in). But it still annoys me. I hear Amazon EC2 has solved this problem and does give you a way to verify the fingerprint.
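The workaround I'd try (assuming your provider gives you an out-of-band web console into the machine, as DigitalOcean does): log in through that console, print the host key fingerprint yourself, and compare it to what SSH shows you:

$ ssh-keygen -lf /etc/ssh/ssh_host_ecdsa_key.pub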

Server setup

I followed some useful tutorials to refresh my memory so I could set up an Ubuntu server and get a LAMP stack installed. Another helped me install Drupal. I have now successfully installed Drupal!

Drupal

Generally, if you want to make Drupal do what you want it to do, it's helpful to install modules that other people have made, and maybe themes. You can check out popular modules such as Views, and look up how to install modules and themes, including instructions specific to Drupal 7.

Thanks to much help from Fureigh (example), when I looked up an "installation profile" ("ngpprofile") that interested me, I found out about Drush and installed it. It seems as though Drush wants to do everything as root, which doesn't feel right to me, so maybe I misunderstood. Then again, a sysadmin of my acquaintance mentioned his "you gotta be kidding me" reaction to a Drupal installation HOWTO that blithely said "now chmod 777 the web directory", so maybe I just have a different attitude toward privileges than Drupal does! Some more thoughts on Drush: a slide deck, GitHub, a homepage, and a project page.
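For flavor, the Drush workflow looks roughly like this (from memory of Drush-6-era commands, not a transcript of my session):

$ drush dl views                 # download the Views module
$ drush en views                 # enable it
$ drush site-install ngpprofile  # install a site from an installation profile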

And Fureigh submitted a patch to get ngpprofile to work properly with Drush! ... And then I ungratefully did not try to use ngpprofile, and instead looked at a very very simple theme, and then fiddled manually with templates and the admin dashboard to make my site look just slightly different from a regular stock Drupal site. Drupal theming seems to be a pretty deep skill in and of itself.

I got help from the #drupal-support IRC channel on Freenode as I went -- thanks! If I ever dip into Drupal again, I'll check out the video resources they recommended, including a "build your first Drupal 7 website" video sequence.

Weird spam and HTTP tricks

I bought a brand-new domain name via Hover and pointed it to my DigitalOcean droplet. The next day, I looked at various admin logs and noticed strange 404s that had nothing to do with my site. Clearly they were spam, and the attackers hoped I would click on their URLs thinking they were referrers, or similar (if the attacked site's 404 logs are public, intentionally or accidentally, then this tactic would increase the spammer's PageRank). I'll reproduce one here, with the actual URL replaced by "myphishingsite.biz" and the IP elided.

TYPE page not found
DATE Thursday, October 9, 2014 - 10:46
USER Anonymous (not verified)
LOCATION http://myphishingsite.biz/http://myphishingsite.biz
REFERRER 
MESSAGE ttp://myphishingsite.biz
SEVERITY warning
HOSTNAME [IP address elided]

Hmmm. The spammer left their URL in the LOCATION field somehow, but there's no referer (Drupal spells it "referrer" in the admin console). I found that I could cause a "page not found" log entry by going to a nonexistent page on my site, e.g. /bleeber, but then the LOCATION for that log entry was http://[hostname.tld]/bleeber. How was the spammer manufacturing an entry with a LOCATION of http://myphishingsite.biz? And what was up with the truncated initial "h" in the MESSAGE field?

With a few pointers from two Hacker School colleagues, a bit of reading up on how Drupal logs 404s, what access logs look like in Apache, and what 404 actually means, and some trial and error, I began to see what was happening. If I went to http://myhostname.tld/http://panix.com , then my access logs included GET /http://panix.com . But the attacker sent requests that logged as GET http://[spamsite] (notice that there is no leading /). So I began to suspect that the attacker programmatically sends GET requests with an intentionally malformed request line. (This also explained why, in the report overview in the web-based admin console, the spammed URLs miss their first character (the h in http): since you usually don't care about the leading slash or the base URL when you're skimming that overview, the Drupal programmers evidently strip the first character, expecting it to be a slash.)

Time to break out netcat! Usually, the first string after GET in an HTTP request line is the location of the resource you want on the host that you're sending the request to (below, "myhostname.tld" is the host that I'm sending the request to). You'll often see GET / or GET /favicon.ico, for instance. But the request line can also carry an absolute URI -- that's how requests addressed to proxies are written -- so there's no reason you can't do something like this:

$ nc myhostname.tld 80
GET http://berkeley.edu HTTP/1.1
Host: berkeley.edu
Referrer: 
User-Agent: netcat

When I sent that HTTP request manually, I could replicate precisely what the spammers were doing, in terms of what characters showed up or got clipped in the relevant logs. For instance, the access log entry:

[IP address elided] - - [11/Oct/2014:16:23:47 -0400] "GET http://berkeley.edu HTTP/1.1" 404 7574 "-" "netcat"

And if I were specifically attacking Drupal administrators and wanted them to click on things, and I knew about the initial truncated character in the web-based admin console view, I might send a GET request that includes an initial character to throw away:

$ nc myhostname.tld 80
GET /http://nyc.gov/ HTTP/1.1
Host: nyc.gov
Referrer: 
User-Agent: netcat

Success

So, my first week of my second Hacker School batch, I succeeded in learning a bunch about using the domain name system, hosting, and Drupal, AND I learned how to do hilariously wrong things with HTTP requests. (The site isn't up anymore, because that wasn't the point.) I then went on to build some more sites with different tools, and I'll blog about the rest of them in upcoming posts.



(0) : Shelter and Memory: Mary Schmich wrote in that 1997 "wear sunscreen" advice dump, which has stuck with me and overall proven a good guide for adult Sumana:

Understand that friends come and go, but with a precious few you should hold on. Work hard to bridge the gaps in geography and lifestyle, because the older you get, the more you need the people who knew you when you were young.

This weekend I hung out with a couple of Wikimedia engineers I'd known for a while -- heck, I'd helped one of them move. One of them mentioned, "I was looking at the Wikipedia article for Team America: World Police --"

And I joked something like, "Oh, because it was interfering with the Education Program's Team America namespace?"

And he laughed at my joke, because he remembered that two years ago, we tried to help out professors by introducing a Course namespace (basically wiki pages starting with "Course:"), but that this caused a conflict with the article about the Star Trek: Voyager episode "Course: Oblivion". Such an obscure joke.

That's the time and the place for the coziness of an inside joke -- among friends, the ones who've helped you shape your identity, so the homosocial bonding doesn't exclude newbies and imply to them that if they don't get the joke then they don't belong. I wonder what idiom speakers of other languages use; the phrase "inside joke" carries these connotations of shelter and interiority to me.

There's a saying that you know you're a New Yorker when you point to a storefront and say "I remember when that was [something different]." I've been here going on nine years, longer than I have ever lived in any other city, and I can imagine visual diffs for scores of blocks. It makes me feel rooted, like a tree. I can sense -- and sometimes give in to -- the temptation to assume that the change began when I arrived and began to observe it, as though the only important change is the change I witnessed.

My family moved over and over when I was a child, and I was poor at socializing as a teen, and I've only retained a handful of college friendships. Today I'm doing a big inbox scouring, and this musing reminds me to prioritize replying to the old pals, the ones who knew a Sumana I can barely remember.


(1) : Sometimes Paths Are Useful: I just finished a six-week batch at Hacker School. As an alumna, I had the option of asking to come back for three months or for a six-week minibatch, and I decided on the latter. I'll be writing more about my lessons, but today I can mostly point to my programming partner's writeup and add a silly story.

I met Greg Hendershott at !!Con months back, and then we ended up in the same batch and found that we laugh at each other's jokes. So we tried to figure out what to work on together. He's way into functional programming, Racket, Clojure, stuff like that, and has for instance written an emacs mode for Racket. In contrast, I'm only fluent in Python and have been concentrating on web dev. We found common ground in Python and an interest in security, and made a webservice that runs a static analyzer on a user-submitted code sample and returns to the user a "report card" of vulnerabilities in their code. That's what I spent the last two weeks on.

In his post, Greg describes how we rejected smaller and smaller web frameworks, finally settling on subclassing from BaseHTTPServer (built into Python's standard library). When you do that, you have to define a method for each HTTP verb you want the server to handle, even the most basic ones like GET and POST. We defined POST but didn't define GET, because we didn't need to! It felt so tremendously subversive, creating a web service that gave you a 501 (Not Implemented) if you tried to GET / , and yet actually did other things. Deliciously wrong.
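The skeleton of such a server looks something like this (a hedged sketch, not our actual code; I'm using Python 3's http.server here, the renamed descendant of Python 2's BaseHTTPServer):

# A handler that implements POST and nothing else. BaseHTTPRequestHandler
# answers any verb without a matching do_* method with a 501.
from http.server import BaseHTTPRequestHandler, HTTPServer

class ReportCardHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        code_sample = self.rfile.read(length)  # user-submitted code
        report = b'{"vulnerabilities": []}'    # stand-in for analyzer output
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(report)
    # No do_GET here, so GET / earns you a 501.

if __name__ == '__main__':
    HTTPServer(('localhost', 8000), ReportCardHandler).serve_forever()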

(Also amazing: reading and subclassing from code whose initial code comments specifically and relevantly cite the work of Tim Berners-Lee and Roy Fielding. I felt such awe and gratitude, that I am part of a grand heritage of innovation and infrastructure. What an inheritance!)

So then a few days later we decided to make a simple web page or two, so that someone using a web browser could use the service. I loved the experience of API-first design, and felt amused when I implemented our server's second method, do_GET. (One nice thing about long-term collaboration is that you can pair some of the time and also do some bits on your own, bringing them to your partner for code review.) do_GET, like do_POST, didn't care about the path, because there's only one thing a user is ever going to do with our service. No URL routing required. A GET request always caused the server to return index.html.
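Continuing the hypothetical sketch above, do_GET amounted to something like this (again made up, but faithful to the path-ignoring behavior):

    # Added to the handler class from the earlier sketch.
    def do_GET(self):
        # Whatever the browser asks for -- /, /style.css, /favicon.ico --
        # it gets index.html. Foreshadowing!
        with open('index.html', 'rb') as f:
            body = f.read()
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.end_headers()
        self.wfile.write(body)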

Then I stubbed out a small index.html page, borrowing bits and pieces from other past projects where I'd solved similar problems. And I thought "well I'll style this a bit" and copied a style.css file from one of my old sites into the project directory, linked to it in the head element of index.html, futzed with some element names and IDs, and reloaded. Hmm, why no styling? Shift-reload. Still looked bare. I opened up the developer toolbar...

...and saw that "style.css" had the text of index.html. Because I had defined GET to always return index.html! And when you want a browser to be able to use a stylesheet, well, it'll have to GET it!

I laughed pretty hard, then inlined the CSS. (And we did end up writing a bit of URL routing so we could serve a favicon to browsers and to serve a capabilities document to service clients.)

I get so much joy out of playing with the building blocks of the Web. It's a great feeling. Thanks for working on this with me, Greg!





This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Permissions beyond the scope of this license may be available by emailing the author at sumanah@panix.com.