Ping (zestyping) wrote,
@ 2008-03-25 02:34:00
Factville: a new project.
I'm starting a new open source project. It's something i've been thinking about for quite a while now, and have mentioned to people here and there. Let me tell you a bit about it.

Exhibit A: a book on gender differences

A little while ago, i wrote about a newspaper article in the SF Chronicle. The subject of the article was a new book by Louann Brizendine called The Female Brain. On the cover of the book is a brain-shaped mass of white plastic telephone cord, the old kind that comes in a long springy coil — a visual wisecrack depicting the book's central claim that women are born communicators ("excess testosterone shrinks the communications center").

The book jacket lists several gender stereotypes as bullet points. One of them is a specific numerical claim: A woman uses about 20,000 words per day while a man uses about 7,000. Other sources give a wide range of numbers, from "7,000 vs. 2,000" to "50,000 vs. 25,000". Perhaps a lot of people believe women are inherently more talkative. But there doesn't seem to be much evidence to back this up. Actually, a recent study suggests that men and women talk about the same amount.

Nonetheless, Brizendine's claim was quoted all over the media. It made a huge impact (the book was a bestseller), and a considerable amount of time went by before it was debunked. To a casual observer, the claim probably doesn't even appear to be debunked at all: a reputable scientist says one thing, a little while later another scientist says the opposite — who's to say which is right? Another virtual throwing up of the hands, another shaking of heads, another anecdote about those silly academics who can never agree on anything.

Catching and recovering from misconceptions

Of course, this sort of thing goes on all the time. Brizendine, as i said, is a reputable scientist — she is a medical doctor and has been on the faculty at Harvard and UCSF. Plenty of facts and figures quoted in the media are presented by people who don't even claim to be scientists or to have evidence. Public misconceptions are pervasive, stubborn, and can be enormously costly.

When you come across a fact, or something that's claimed to be a fact, how do you know whether it's true? Maybe you Google for it; after all, the Web is somewhat more democratic than the TV and print media. But the Internet is also notoriously good at spreading rumours. Maybe you check Wikipedia, trusting its community editing process to do a good job of weeding out errors. Or perhaps you visit Snopes, hoping that the rumour you heard is common enough that someone there has written an article about it, and trusting that the people who run that site are pretty decent at what they do.

But Wikipedia and Google are a little too general: they may give you an article that's broadly related to your topic, and then you need to examine it to see whether it mentions the particular claim you want to check. And in both cases, the filtering process is hard to examine: Google's ranking algorithm is secret, and although at Wikipedia everything is public, you could spend weeks reading through the discussion pages trying to find out how a particular claim got inserted into the article. Snopes offers an excellent overview of each rumour, but there's only so much that two people can write. And of course you have to trust those two people.

An idea for a new service

So i think there's a useful service that could be provided by a new website: something with the openness and democratic participation of Wikipedia, but more focused on specific claims and the evidence for them. Thus Factville: a community-edited database of facts and supporting evidence. The site i have in mind would not be an alternative to Wikipedia, but rather a tool to help Wikipedians. A large part of the debating on Wikipedia consists of people gathering sources to support statements they want to put in the article; Factville could help them organize these sources and settle these debates. Factville would also be a tool for bloggers and journalists. When a controversial claim appears in the media, articles spring up all over, taking sides on the claim, quoting and citing sources to support their position. Why not have a place to gather the complete list of sources? Why not discuss them and rate them, the way the Web has taught us to discuss and rate photos, discuss and rate URLs, discuss and rate movies?

That's what Factville is about. It's going to be a Frankensteinian cross between Wiki-style websites (community-edited, completely freeform text, with a recorded history of changes to establish accountability) and Flickr-style websites (community-maintained, structured information, with tags, comments, and ratings). The big challenge will be to make this simple and easy to use. Here is the basic design:
  • The site is a database of claims.
  • Each claim has lists of supporting and refuting citations.
  • Each citation quotes from a source and explains how it relates to the claim.
  • Each claim can also have supporting and refuting arguments.
Each of these things is an editable page, with associated discussion and rating tools.
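The four-part design above maps naturally onto a handful of record types. Here is a minimal sketch in Python, using plain dataclasses rather than Django models so it runs on its own. All class and field names (Claim, Citation, Argument, Source, Stance) are hypothetical illustrations, not the project's actual schema, and the editing, discussion, and rating tools are left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stance(Enum):
    """Which side of a claim a citation or argument is on."""
    SUPPORTS = "supports"
    REFUTES = "refutes"


@dataclass
class Source:
    """Any published work: a book, article, paper, video clip, blog entry..."""
    title: str


@dataclass
class Citation:
    """A quotation from a source plus an explanation of how it bears on a claim."""
    source: Source
    quotation: str
    explanation: str
    stance: Stance


@dataclass
class Argument:
    """A contributor's reasoned case for or against a claim."""
    text: str
    stance: Stance


@dataclass
class Claim:
    """One specific, checkable statement and the evidence gathered for it."""
    statement: str
    citations: list[Citation] = field(default_factory=list)
    arguments: list[Argument] = field(default_factory=list)

    def citations_for(self, stance: Stance) -> list[Citation]:
        """Return only the supporting (or only the refuting) citations."""
        return [c for c in self.citations if c.stance is stance]


# Example built from the claim discussed above; the refuting entry is a placeholder.
claim = Claim("Women speak about 20,000 words per day; men about 7,000.")
claim.citations.append(Citation(
    source=Source("The Female Brain (Louann Brizendine), book jacket"),
    quotation="A woman uses about 20,000 words per day while a man uses about 7,000.",
    explanation="States the claim directly.",
    stance=Stance.SUPPORTS,
))
claim.citations.append(Citation(
    source=Source("Recent word-count study (placeholder)"),
    quotation="(placeholder quotation)",
    explanation="Reports that men and women talk about the same amount.",
    stance=Stance.REFUTES,
))
```

Splitting citations by a stance field, rather than keeping two separate lists, is just one way to model "supporting and refuting"; it keeps a single list to edit and lets the same query serve both sides.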

A source can be any kind of published work — a newspaper article, a conference paper, a video clip, a blog entry, etc. Some sources stand on their own (like Brizendine's book); others belong to a publication venue and rest partly on the venue's reputation (the credibility of an article in the New York Times is related to your opinion of its editing standards).

Citations and sources are separate things because the same source could be used for several claims, or even cited as evidence on both sides of the same claim (perhaps quotations excerpted from different parts of the same source). Information on sources could also be automatically drawn from the syndication feeds of popular publications.
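Keeping sources separate from citations means one source record can back several citations, even on opposite sides of the same claim, and a source can optionally point to the venue whose reputation it leans on. The sketch below illustrates that under stated assumptions: the names (Venue, Source, Citation, sources_from_rss) are hypothetical, the feed importer assumes a plain RSS 2.0 feed with a <title> and <link> in each <item>, and the feed and venue in the example are made up.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Venue:
    """A publication venue whose editing standards partly back its articles."""
    name: str


@dataclass(frozen=True)
class Source:
    """A published work; standalone works (e.g. a book) have no venue."""
    title: str
    url: str = ""
    venue: Optional[Venue] = None


@dataclass
class Citation:
    """One use of a source: a quotation aimed at one claim, on one side."""
    claim: str
    source: Source
    quotation: str
    supports: bool  # True = supporting citation, False = refuting citation


def sources_from_rss(rss_xml: str, venue: Venue) -> list[Source]:
    """Build Source records from the <item> elements of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    return [
        Source(
            title=item.findtext("title", default="").strip(),
            url=item.findtext("link", default="").strip(),
            venue=venue,
        )
        for item in root.iter("item")
    ]


# Hypothetical feed and venue, for illustration only.
feed = """<rss version="2.0"><channel>
  <item><title>Example article</title><link>http://example.com/a</link></item>
</channel></rss>"""
venue = Venue("Example Daily")
(article,) = sources_from_rss(feed, venue)

# The same source cited on both sides, via excerpts from different parts of it.
claim = "Women talk more than men."
citations = [
    Citation(claim, article, "(placeholder excerpt from one section)", supports=True),
    Citation(claim, article, "(placeholder excerpt from another section)", supports=False),
]
```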

When a contributor wants to put together several sources or other claims on Factville, and combine them into a reasoned case for or against a claim, they can write an argument. Other visitors can rate the arguments up or down so that the most convincing arguments get the most attention.
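The post doesn't say how up and down ratings should turn into a ranking. One common choice, swapped in here purely as an illustration, is to sort by the lower bound of the Wilson score interval on the fraction of up-votes, so that an argument with one lucky up-vote doesn't outrank one with a hundred mostly positive votes. The names (Argument, wilson_lower_bound, rank_arguments) and the vote counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Argument:
    title: str
    up: int = 0
    down: int = 0


def wilson_lower_bound(up: int, down: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the up-vote fraction.

    z = 1.96 corresponds to roughly 95% confidence. Arguments with few votes
    get a cautious (low) score rather than a lucky 100%.
    """
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (centre - margin) / (1 + z2 / n)


def rank_arguments(arguments: list[Argument]) -> list[Argument]:
    """Most convincing first, by Wilson lower bound."""
    return sorted(arguments, key=lambda a: wilson_lower_bound(a.up, a.down), reverse=True)


args = [
    Argument("One enthusiastic vote", up=1),
    Argument("Broadly endorsed", up=90, down=10),
    Argument("Widely rejected", up=5, down=40),
]
for a in rank_arguments(args):
    print(f"{wilson_lower_bound(a.up, a.down):.3f}  {a.title}")
```

With these numbers, "Broadly endorsed" ranks first and "Widely rejected" last; sorting by the raw up-vote fraction would instead have put the single-vote argument on top.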

The ratings of claims, citations, and arguments are not supposed to tell you what is true. They can only tell you about other people's opinions. But the goal is to give you as complete a view as possible of all the evidence, and to let the collaborative power of a large crowd help you find the most relevant factors to consider, as you make your own decision whether to believe each claim.

A modest start

I don't have a running website yet. I have a lot of ideas, some in my head and some written down, many in this journal entry. And i have a start at some code that implements the database structure i just described. Today i registered a new project on Launchpad, an open source project hosting service. You can monitor my progress on the Factville page there. The code I've written so far is available from that page. It's written in Python and runs on Django, which i'm still learning.




istgut
2008-03-25 03:55 pm UTC
I've been thinking about this sort of thing a lot recently as we've gone through the process of trying to figure out what is healthy to eat. In some cases, it actually seems like there simply isn't enough information out there instead of too much. In the case of Acrylamide, for example, the FDA has done plenty of testing of foods, but it still isn't clear how much is safe or even whether it actually matters. There seem to be only a couple of people working on it. Looking at the listing of foods, the usual suspects of chips and fried foods are on there (one knows by now that one should avoid those... right? hmm...), but so are things like bread. It seems like the only way to be sure is to eat raw or boiled foods. Have the English had it right all along? (Hmm... Is it actually true that more English foods are boiled than other cuisines?)

My initial thoughts on this (weeks ago, when I was trying to figure out how much bread I can eat) were more along the lines of a more engaged "Google experts" sort of thing, with people pooling funds to actually pay someone to do the research. All research needs to be funded, but if the funding came aggregated anonymously, there might be less chance for bias. Wouldn't it be awesome to eventually see "this work was supported by Factville" in publications? It certainly need not be the primary activity of the site, but the ability to have a place to pool funds for research like that would be really, really cool. Researchers write grant proposals with TIGHT PAGE LIMITS and then anonymous donors add funds to the pot until the research is funded. All research results would be posted on the site and would be available for free, including all code and raw data, and it would become part of the body of knowledge of the site.



surpheon
2008-03-25 04:00 pm UTC
Applying the web to try to get the public discourse back to 'we can have different opinions, but we all have the same facts...' I've always wanted to set up a webpage to chew over social issues in a formal argument-rebuttal debate layout, but your idea has the potential (and you easily have the skill) to eclipse anything I've thought about. It's a great idea, good luck!



eviladmin
2008-03-25 05:58 pm UTC

I think this would be great! Among other things there is a lot of "newspaper facts" that aren't - someone captured a number in a newspaper story a long time ago and it is now accepted as "fact" purely on precedent. It must be true because all of the newspapers say it's true, even though it never was. (I can't find the citation for this, but I only looked quickly.)

The related thing that would be cool is a place to look up common things that people want to know - e.g., paper vs plastic, disposable vs cloth, etc. with references to decent science rather than Ipse Dixit ("He, himself, said it") arguments that usually end up clouding the decision.




Edited at 2008-03-25 06:00 pm UTC



chimerically
2008-03-27 09:21 am UTC
Among other things there is a lot of "newspaper facts" that aren't - someone captured a number in a newspaper story a long time ago and it is now accepted as "fact" purely on precedent.

My statistics class last quarter was full of these examples -- journalists taking a vague understanding of a new statistical result in medicine or social science and making it into a story only tangentially related to the original paper, often relying heavily on cultural stereotypes to make the story juicier. Of course, upon closer examination it turns out that many of the published academic papers were themselves based on faulty statistical premises and self-fulfilling prophecies. Unfortunately, it was rare in both of these cases to have any published rebuttals (in news or academic journals). I wonder how cases like this could be handled.

Since I'm in the process of getting everything we need for our new place, I've often wished for reputable sources on, say, which household cleaners are actually safe and environmentally friendly, or what the differences between shampoos really are. While these are piddling compared to some questions, and while there is a smattering of sources that give me bits of information on these kinds of topics, it sure would be nice to have one place to go that put it all together.

These thoughts are somewhat incoherent; I have others, but I should really focus on quals for now and stop reading LJ. :~) But hopefully we can talk sometime soon, about this and other things as well.


Great idea!
sebpaquet
2008-03-25 07:30 pm UTC
I see parallels with S.B. Shum's ClaiMaker system (http://claimaker.open.ac.uk/). Perhaps you can make something that is more approachable than what he did?


Re: Great idea!
zestyping
2008-03-25 07:37 pm UTC
I hope so. I have seen ClaiMaker; there are some good ideas there, but i'm not planning to do graphical argument maps in Factville.



progrium
2008-03-26 12:02 am UTC
I did some work a while back based on this very concept. It involved building a network of assertions and their supporting assertions, with the added twist that different people could have different interpretations of whether something supported an assertion or not. It was all based on a Bayes net. I can find you the paper our work was based on if you'd like.



flipzagging
2008-03-26 02:17 am UTC
I worry that any system such as this can be gamed, and then you may end up giving a pseudo-objective imprimatur to specious claims. It's not clear to me that there's a shortcut to understanding scientific consensus.

Still, it's worth trying. I've wanted something like this forever too. We have a problem with outright deception in this culture that I'm not sure your site could help with. But at the very least it helps people who sincerely disagree.



zestyping
2008-03-26 03:17 am UTC
I worry about this too. On the other hand, against all (apparent) odds, the world does seem to be a substantially better place with Wikipedia than without it. If you have any insight as to why, or how Factville could be designed to increase its chances of acquiring such magic, do tell.

I guess one way to look at it is that, although I can't guarantee that responsible scientists will "win", I can at least try to give them better odds.



flipzagging
2008-03-26 04:32 am UTC
I don't know. I'm not sure Wikipedia should be your model.

Wikipedia's mandate is to be comprehensive although shallow, so they encourage contribution, and prune mistakes when they come to the attention of admins. Most Wikipedia pages are not controversial, so this model works well.

It seems to me that Factville isn't about comprehensiveness or even correctness, but auditability. And it's going to zero in on controversial topics from day one. Finally, it might go very deep into certain topics.

Maybe this is an advantage because you can design a system around these requirements. So this could succeed on its own terms.

P.S. Like I said, I've thought about this sort of thing for years and years so I want to see someone try it. I really ought to be more positive when smart people launch ideas they care about.

If successful, it might make journalism as we know it obsolete. Instead of producing zillions of individual narratives the world could have a single resource, a massive heap of interlinked claims.



zestyping
2008-03-26 06:43 am UTC
I think individual narratives are cool; i just have this fantasy that one day they will cite Factville for support. Such is the craziness of my fantasy that it imagines, maybe when people are reading an article and come across a Factville citation, they'll actually be willing to accept that as valid evidence (or meta-evidence) — unlike today when they come across a Wikipedia citation and scoff that Wikipedia is unreliable. (Of course, a lot of people already use Wikipedia for their own purposes as if it were trustworthy, but just aren't willing to admit to it in polite company. Maybe with Factville it won't be taboo.)



flipzagging
2008-03-26 08:12 pm UTC
a lot of people already use Wikipedia for their own purposes as if it were trustworthy, but just aren't willing to admit to it in polite company

Huh? Ordinary people already treat Wikipedia like a real information source. Even among hackers and programmers (who have some degree of intellectual snobbery) citing Wikipedia wouldn't immediately brand you a n00b.

Is this an academia thing?



kragen
2008-04-15 07:59 pm UTC
What meaning of "citation" are you talking about? I frequently refer to Wikipedia to get evidence about what's true and what's not, and often tell other people, "Well, Wikipedia says..." --- even though I know that's relatively weak evidence, it's some evidence, and it's better than "Some random web page Google pointed me at says..." Are those "citations"?


Opinions matter, and probably personal networks, too
sebpaquet
2008-03-26 12:49 pm UTC
I see much promise in this if there is a way for me and others to indicate "I believe this claim is well-supported" or "I believe this is bogus" and to see who else, among the people I know, does and doesn't believe the claim.

Honestly, because I don't have the time to seriously dig into the evidence for all of the claims I'm building upon, I have to admit that I delegate a lot of my believing to other people. If I have an indication that someone I know to be a serious researcher has vetted a given claim in his area of expertise, I'll be more willing to build upon that claim.

Being able to examine controversies in terms of how they play out at the level of individuals would be mighty awesome.


Jyte
sebpaquet
2008-03-26 01:17 pm UTC
In a similar vein to my last comment, Jyte.com is pretty interesting to look at.


Re: Jyte
zestyping
2008-03-26 06:21 pm UTC
Jyte is interesting. It's quite pretty, and highly social. Because it's pure opinion, it seems to be much more about chatting with other people and finding people who agree with you, than actually figuring anything out. Factville will aim at the latter purpose.

My hope would be that Factville makes it a little bit easier for you to delegate less of your believing to other people.


I'd love to see it happen
montyz
2008-03-26 04:35 pm UTC
Some of my heroes are questioning economic reporting "facts" over at http://cepr.net

I just ran across this nice tool:

http://instacalc.com/blog/instacalc-example-youtube-analysis

It lets you embed a calculation in a web page and play with the variables. The example above is the valuation of YouTube, letting you play with estimates of the number of videos posted, monetization rates, etc. I think that documenting that kind of analysis and letting people play with the numbers or formulas could really be beneficial, especially if versioning/history could come into play.



kragen
2008-04-15 08:00 pm UTC
This is a great project! Is google.org going to be funding it?


