Note: A while back, I asked readers for suggestions of who I should interview next for the weblog. Lots of great suggestions, and I’m finally getting around to contacting folks. One of the more popular suggestions was Derek Willis, who works with data at washingtonpost.com. This interview was conducted Thursday, Oct. 3, 2007 via Instant Messaging.
ICM: First off, maybe you could explain for the audience what you do at WaPo.
Willis: I’m a database editor at washingtonpost.com, which technically is a separate company from The Washington Post, the newspaper. I work on finding and gathering data, do a little coding of database applications and generally try to think of cool ideas we can execute.
Most of the apps that I work on are politics-related, but I also try to do other stuff.
ICM: We hear a lot about database-driven journalism these days from industry folks. Is this a more concerted effort than in the past, or are we just hearing about it more?
Willis: Probably both, I think. Database journalism has been around for several decades and has a small but loyal group of practitioners, mostly at newspapers. But the Web has opened up the playing field in several ways, the most significant being that we’re no longer constrained by the limitations of a printed paper or broadcast.
So we’re hearing more because there’s more data that’s accessible, but this is not a new movement in journalism. Many of us doing this now owe a great deal to people like Elliot Jaspin, Phillip Meyer, Steve Doig, guys like that.
ICM: You wrote on your weblog the other day: “But until and unless journalism schools show that data has any sort of importance to them, most journalism students only will be exposed to the possibilities, but not the actual process, of working with data on the Web.” How are you personally doing that at GWU, for instance? And what are some ways college media advisers could get students interested in the subject?
Willis: In the class I teach, I try to make as much of the data we work with as local as possible, meaning the campus and its surrounding environment. In practice, that means campus crime data, student demographic statistics, even data on the academic backgrounds of faculty. Sports is a huge opportunity for this type of work - almost no one gets closer to student-athletes than other students.
So, for advisors, I’d suggest thinking about ways to collect, analyze and display data that’s important to the college community, whether that’s crime reports or a log of all of the football team’s plays.
ICM: And let’s suppose that a student gets hooked on database-driven journalism, but they have no (or very little) training in that area. Any suggestions for where they could get started?
Willis: Sure. Realizing that walking into the school’s computer science department with confidence isn’t always the easiest thing to do, I’d say that there are plenty of academic disciplines that use data all the time. So if a student knows a professor who does survey research, that’s usually database-oriented. Or political scientists who study election results or voter participation - they usually deal with datasets.
But the easiest and probably best way is to just start. Start keeping some information in a spreadsheet. Pretty soon you’ll learn how to deal with issues and problems. Then you’ll outgrow the spreadsheet and start looking around for a more robust solution.
Free or cheap database software and hosting is abundantly available, too, especially on university campuses. It’s a matter of asking for it.
ICM: How do you keep ideas coming for database projects?
Willis: One of the best ways for me is to read the paper or listen to the news and ask myself, “How do they know that?” whenever something interesting comes up. Or a matter of conventional wisdom. If you’re arguing with a friend over whether the basketball team always shoots more 3-pointers in the second half, well that’s measurable.
There are these things that “everybody knows” - where not to park, for example - and these things are largely based on anecdotal evidence.
I like trying to find ways to test out anecdotal theories.
ICM: For those who may be curious, what’s a good range of the turnaround time on projects you work on - and how many people get involved?
Willis: It varies pretty wildly. Yesterday I got handed something and finished it off, except for design, by today. Other projects take a week or several weeks. Our congressional votes database took Adrian Holovaty and me three months, with Adrian doing most of the work.
Designers are usually the first “other” people to be involved, outside the original source of the idea. And then you’ll get higher-level editors involved depending on the significance of the project.
ICM: What lessons have you learned over your career that might help college journalists in general, or specifically those who may be pursuing a career in your area of specialization? Like, “wow, I wish I’d known that when I was starting,” kind of things.
Willis: I think the first is to never assume that you can’t do this stuff, that databases are beyond the capabilities of college students. You have as much right and responsibility to do it as any other journalist.
The other thing is that as a journalist, you need to find out about sources of data on your beat. This is not a specialty thing anymore. If you don’t understand how the people or institutions you cover are using data, you won’t have a complete picture of what they are doing, and you’ll be at a disadvantage when it comes to evaluating their performance.
The use of data is an arms race, and journalists cannot dismiss it as “that techie stuff” anymore. So on every beat, find out how data is being used and try to replicate what you can in terms of being able to use it yourself.
ICM: That’s a very good point that I don’t know everyone has grasped yet.
Willis: It’s very difficult to explain, because it’s not transparently obvious. But if you’re a sports reporter, you know that the teams are using data and video to evaluate their opponents. You know that political candidates are doing the same with theirs. You know that police agencies use data more and more to justify budgets and patrol areas. It’s an increasingly important component of decision-making.
ICM: As someone who’s worked in the industry and with technology for a while, are you optimistic or pessimistic about the future of the news“paper” industry, and why?
Willis: I’m optimistic for the long-term, but the short-term scares me a little bit, because I don’t know that we’ve seen the worst in terms of layoffs and budget cuts at papers.
I do think that newspapers in particular have a large role to play in society and that we can continue it, with a few adjustments for this new environment we’re in. But as readers transition online, leaving a smaller but fairly stable paper readership, we’ve got to figure out how to support the creation of journalism - how to pay for the valuable stuff we do.
And given our industry’s track-record when it comes to innovation, that does cause me some concern.
ICM: Finally, I ask this of everyone I interview: What’s one project you’ve worked on recently that you’d like to point people to as a showcase of your work. Any “back story” that would explain the project?
Willis: Hmm. Even though it’s now two years ago, it’s hard for me to not say our votes database, because I’m extremely proud of that app. Not just because of the public service it provides, which was a major reason we did it, but because it essentially launched database journalism apps at washingtonpost.com.
The funny thing about the votes database is that Adrian and I were able to build it because no one else really cared about the data - we didn’t ask for permission because I had gathered the data and it didn’t fall under anyone’s responsibilities. So we worked basically alone on it and then, just before it launched, we showed it off to people at washingtonpost.com and at The Post.
And it continues to serve as an illustration of good web apps in that we’re never done with it. A few weeks ago I added a particular slice of votes (approval of the House Journal) because I noticed some interesting patterns there.
I’m really, really proud of that app, and very lucky to have a guy like Adrian make it work.
on Oct 5th, 2007 at 7:41 pm
Terrific interview. Thanks to Derek Willis for spending the time.
on Nov 11th, 2007 at 7:50 pm
Great interview. I wonder how to go about getting access to databases that should be public info?
If you weren’t able to scrape from a web published database (like the awesome chicagocrime.org), you would have to convince someone to hook you up with the data. Hmm…