Monday, February 1, 2010

Wherein I respond (calmly!) to Copycense's charge of 'impetuous[ness]'

Copycense has posted a response to my post about Ed Felten and Sauhard Sahi's post about their "census" of infringing files available on BitTorrent. The main thrust of Copycense's post is that, before drawing any definitive conclusions, we should inquire further into Felten and Sahi's methodology and underlying data. I completely agree, and look forward to reading the final paper once it is published.

But Copycense's response to my own post is really off-base. First, Copycense calls my post "reflexive and impetuous." I don't know about "reflexive"; I saw Felten's post, thought it interesting and newsworthy (as did many others), and did a very short post summarizing his findings and alluding to its relevance to public policy debates. But "impetuous"? "[M]arked by impulsive vehemence or passion"? Read my post; sorry to disappoint those who prefer their meat raw, but you'll search in vain for anything remotely "impulsive," "vehemen[t]", or "passion[ate]."

Copycense then writes:
In this quote and subsequent responses to reader comments, Sheffner suggested that Internet service providers have a duty [sic] restrict infringing traffic on their network, and that this duty should manifest itself in a three-strikes/graduated response policy that has been adopted nationwide in France and is beginning to be adopted in other European Union countries.
I'll address the latter part of the sentence first. Copycense's allegation that I have advocated "a three-strikes/graduated response policy that has been adopted nationwide in France and is beginning to be adopted in other European Union countries" is simply false. I have said repeatedly that I am skeptical of a government-run program like Hadopi. If graduated response/three strikes for ISPs is to come to pass in the US, it will most likely be through voluntary agreements between ISPs and copyright owners (which I do support if appropriate safeguards are in place). As for my "suggest[ion] that Internet service providers have a duty [sic] restrict infringing traffic on their network," Copycense does not deign to state what, if anything, it finds incorrect in what I actually wrote. The "response[] to reader comments" I believe it's referring to merely quoted the DMCA verbatim. That statute unquestionably states that, in order for an ISP to benefit from the Section 512(a) safe harbor for "Transitory Digital Network Communications," it must "adopt[] and reasonably implement[], and inform[] subscribers and account holders ... of, a policy that provides for the termination in appropriate circumstances of subscribers and account holders of the service provider’s system or network who are repeat infringers." 17 U.S.C. § 512(i). There's room for debate about what those words mean, but, given the language of the DMCA, does Copycense (or anyone else) dispute that existing US law imposes some enforcement obligations on ISPs?

Lastly, I want to address Copycense's suggestion that there is something wrong with my statement that the Felten/Sahi conclusions about the level of infringement on BitTorrent are "[v]aluable information to keep in mind while debating net neutrality rules and ISPs’ right to manage their networks and fight piracy." I stand by that statement, which shouldn't be remotely controversial. Of course in evaluating any sort of copyright enforcement regime against an intermediary like an ISP or peer-to-peer network, the percentage of infringing works on that system is highly relevant. An overwhelming use for infringement will justify a harsher enforcement regime. But when use for infringement is lower, enforcement must be calibrated so as to avoid targeting innocent activity. Courts routinely cite the percentage of infringing works on a system in deciding whether to impose liability. See, e.g., Grokster ("nearly 90% of the files available for download on the FastTrack system were copyrighted works"); Napster ("as much as eighty-seven percent of the files available on Napster may be copyrighted and more than seventy percent may be owned or administered by plaintiffs."); IsoHunt ("According to Plaintiffs’ expert Richard Waterman, approximately 95% of downloads occurring through Defendants’ sites are downloads of copyright-infringing content"); ("Plaintiffs’ expert has testified that, based on a statistical analysis, over 94% of all content files offered in music-related binary newsgroups previously carried by Defendant UCI were found to be infringing or highly likely to be infringing."). And those numbers all come from copyright owners' paid experts; it's particularly noteworthy that even higher numbers come from Felten, a noted critic of the entertainment industry's enforcement tactics. (The law of course recognizes that "statements against interest" have inherent credibility. See Fed. R. Evid. 804(b)(3).)

By all means, let's have a look at the methodology and data underlying the Felten/Sahi "census." But pending that, surely there's nothing wrong with noting the findings and commenting on their relevance to the debate about the proper way to combat piracy.

Update: Please read Tom Sydnor's take on this kerfuffle. If only I had thought up the phrase "ineptly affected data-prudery" myself...


  1. "By all means, let's have a look at the methodology and data underlying the Felten/Sahi "census.""

    1000 files is not indicative of the bleep on my home machine much less all the content thereon. That's like taking a "census" of one file folder named "Ripped feature films" and calling it a census of a network. If they indeed found this number then their methodology is stupid.

    I take that back, their methodology is stupid on its face because of the sample size. 100000 files would probably not be representative of all the content on one of the larger networks. 1 million might be a statistically significant indicator.

  2. @Anonymous:

    I don't know the minimum number of files that must be examined to come to a statistically significant conclusion in this context. But I do know that pollsters routinely survey about 1000 or 1500 people -- out of a US population of 300 million -- and achieve pretty accurate results. Just labeling Felten's/Sahi's methodology "stupid" is, well, you know...

  3. I'd echo the comment made by Shane in your original post -- the choice to survey only a trackerless torrent DHT network may have biased the sample. Just another reminder that it's hard to come up with a "neutral" methodology to evaluate overall usage of a decentralized technology. And that we all have to be cautious about drawing generalized conclusions from limited data sets.

  4. Fred -

    Can you really, with any credibility, argue that there is anything but a trivial non-infringing population of files on any given BT network? Let's face it, BT traffic is shaped by popularity and current demand, which means that virtually the entirety of any traffic that you sample will consist of copyrighted works.

    I just did a survey of the most popular files across a number of the largest torrent sites. 100% of the top downloaded files represented copyrighted material, and virtually all of it was of recently released media (<1 week of release). And these are just the trackers with central servers. Why should it not get any worse when one takes tracking out of the equation?

    We need to face facts that virtually the entirety of the BT traffic on the internet today consists of the transport of copyrighted files. To deny that shows a profound naiveté of the demand this protocol satisfies.

  5. Anon 10:42am -

    "their methodology is stupid on its face because of the sample size. ... 1 million might be a statistically significant indicator"

    You clearly have little grasp of statistics, population sampling, or sample size selection. If you did, you would know that, to achieve a certain level of confidence, only a relatively small sample size is needed when the population size is comparatively very large. There are a number of resources to educate yourself online if you wish, as well as a variety of sample size calculators to satisfy your curiosity under a variety of scenarios.

    Let's assume, for a moment, that there are 1 billion torrents available. And let's assume that we want to achieve a level of confidence in our statistics of 99% plus or minus 3% (in other words, a very high degree of certainty in the result). What would our sample size need to be to achieve that?

    You assert that 1 million or more wouldn't be sufficient. Would you be surprised to learn that, on these facts, a sample size of only approx. 1850 would be sufficient to represent that population?

    A sample size of only approx. 16,000 would achieve a confidence level of 99% plus or minus 1% in a population of 1 billion files.

    Simply put, your attack on the methodology holds little merit.

  6. >> To deny that shows a profound naiveté of the demand this protocol satisfies.

    You all talk like it's all about the protocol when it absolutely has *nothing* to do with the protocol. If BT disappeared tomorrow there would, in short order, be another protocol to fill its shoes. Ban all the popular protocols that lend themselves to efficient filesharing and it will be shared with the same vigor over http using large dump sites.

    Let's not forget, either, that a *significant* portion of copyrighted material is literally served (knowingly, willingly, and without preventative measures) to subscribers via newsgroup servers hosted by the major ISPs.

  7. Just a note to commenters (and wanna-be commenters). Please re-read the instructions just above. Keep things civil and substantive. No name-calling. I reject comments that cross the line.

  8. Anon 6:50 -

    When the protocol becomes synonymous with copyright infringement, it's fair to talk about them interchangeably.

    Also, the fact that infringing files can be found on file upload sites and usenet is simply a deflection from the focus on BT networks. Yes, those sites are equivalent cesspools of infringement that need to be stamped out, but that's a discussion for a later time.

  9. To the "Anonymous" who asked whether there is "anything but a trivial non-infringing population of files on any given BT network," I'd answer sure.

    There are two problems with Mr. Anonymous' question (actually, assertion). First, as a technical matter, there is no single "BT network." And so yes, there are certainly BT networks where the vast majority of materials exchanged are non-infringing -- just consider the trackers that specialize in Linux distributions. Of course, if Mr. Anonymous focuses on "the largest torrent sites," he'll get a different answer, because those are different networks.

    Second, while Mr. Anonymous may see noninfringing use of BT as "trivial" as a matter of percentages, given the huge number of BT uses, even small percentages amount to large numbers of actual people downloading actual files. So it depends what you mean by "trivial."

    I'm not suggesting that, if we could somehow poll all BT uses (something that is both technically and legally difficult), we wouldn't discover that most of the activity is infringing. But we can't know over time how that might change (after all, in the days of VCRs before authorized movies on videocassettes, there was lots more infringing use).

    That's why the Sony Betamax doctrine asks whether a technology is "merely capable" of noninfringing uses, rather than focusing on percentages in any moment of time.

  10. Fred --

    I think it's fair to point out that the law has developed significantly since the Sony-Betamax decision in 1984. The list of cases where online infringement facilitators have tried, unsuccessfully, to invoke Sony-Betamax's "capable of substantial noninfringing uses" language as a shield from liability, is long: Napster, Grokster, Aimster, IsoHunt, and probably a few more that I've forgotten.

    You were successful in invoking Sony-Betamax in Grokster at the district court and the 9th Circuit. But the Supreme Court, in reversing, made very clear that the 9th Circuit (as urged by you) had over-read Sony-Betamax as providing an absolute defense to any product or service that was merely capable of substantial noninfringing uses, no matter the intent and activities of the operators or the degree of infringement occurring on the site:

    "The Ninth Circuit has read Sony’s limitation to mean that whenever a product is capable of substantial lawful use, the producer can never be held contributorily liable for third parties’ infringing use of it; it read the rule as being this broad, even when an actual purpose to cause infringing use is shown by evidence independent of design and distribution of the product, unless the distributors had “specific knowledge of infringement at a time at which they contributed to the infringement, and failed to act upon that information.” 380 F.3d, at 1162 (internal quotation marks and alterations omitted). Because the Circuit found the StreamCast and Grokster software capable of substantial lawful use, it concluded on the basis of its reading of Sony that neither company could be held liable, since there was no showing that their software, being without any central server, afforded them knowledge of specific unlawful uses.

    "This view of Sony, however, was error, converting the case from one about liability resting on imputed intent to one about liability on any theory. Because Sony did not displace other theories of secondary liability, and because we find below that it was error to grant summary judgment to the companies on MGM’s inducement claim, we do not revisit Sony further, as MGM requests, to add a more quantified description of the point of balance between protection and commerce when liability rests solely on distribution with knowledge that unlawful use will occur. It is enough to note that the Ninth Circuit’s judgment rested on an erroneous understanding of Sony and to leave further consideration of the Sony rule for a day when that may be required."

  11. Mr. Sheffner:

    Hi, this is Tom Sydnor from the Progress & Freedom Foundation. Here is a link to my own reply to the Copycense Editorial:

    Simply put, it's too cute by half. Sure, a published Sahi-Felten study will be more informative, reliable, and valuable than a summary. And, sure, the list of things that could go wrong during any sort of statistical analysis is much longer than even the Copycense editors suggest.

    That said, your focus on net neutrality makes the objections voiced by Copycense almost uniquely ridiculous. Policymaking based upon imperfect or preliminary data that at least aspires to be generalizable may not be ideal--but it sure beats policymaking based upon a few anecdotal cases in which zealots chose to engage in behavior that looks suspiciously like the electronic equivalent of using the butt of a screwdriver to pound nails....

    Regards. --Tom

  12. Fred -

    That was a non-response. In your earlier post, you assailed the statistics as "biased," thus leading to unwarranted "generalizations" about infringements on the network. The inference was that the level of infringement seen can't possibly be that high, so my question focused on the propriety of that argument. Instead of an answer, I got unrelated quotes from Grokster.

    On Grokster, however, I find it pertinent that the Court also said:

    "In sum, when an article is 'good for nothing else' but infringement ... there is no legitimate public interest in its unlicensed availability."

    When the surveys yield that 99-100% of the files surveyed are infringing, and that conceivably all of the BT traffic in any given day consists in the transport of copyrighted material, one thing is clear - the BT network is good for nothing else but infringement, the two simply become synonymous. When that happens, the Court reasoned, "the only practical alternative [is] to go against the distributor of the copying device[s]" because "it [becomes] impossible to enforce rights in the protected work effectively against all direct infringers."

    The only other interpretation of the Court's statement is that a technology will never be liable for secondary infringement because there's always some minuscule, conceptual non-infringing use. Clearly, the Court wouldn't adopt such an absurd logical assumption.


Comments here are moderated. I appreciate substantive comments, whether or not they agree with what I've written. Stay on topic, and be civil. Comments that contain name-calling, personal attacks, or the like will be rejected. If you want to rant about how evil the RIAA and MPAA are, and how entertainment companies' employees and attorneys are bad people, there are plenty of other places for you to go.