Automated Filtering vs Human-Powered Curation

With solutions like Paper.li and Browse My Stuff grabbing attention, Robin Good doing a series on Real-Time News Curation, and Ross Dawson telling us that curation has hit the tipping point, it seems like the concepts of curation, aggregation and filtering are suddenly a central conversation.  Of course, this has long been a conversation, as Robin, Ross or I could tell you.  What has really changed is, first, the explosion of content sources.  As Robin put it:

“You cannot follow and keep yourself updated in an effective way by simply subscribing to as many sources as possible.”

I would also say that what has really changed is the sophistication of automated filtering in extracting value from the noise.  I had an interesting exchange with Robin around the question of what constitutes filtering and how that differs from curation.  I’ll get to that in a bit.  Let me first provide some background on filtering.

Approaches to Automated Filtering

In Different Approaches To News Filtering, Mahendra Palsule, who is an editor at TechMeme, identifies several ways to automatically filter content.  Adding in a few from Louis Gray’s The Five Stages of Filtering, Relevance and Curation, I believe there are roughly the following types of automated filtering mechanisms:

  • Text Filtering – exclude or include specific keywords, terms and phrases.
  • Semantic Filtering – exclude or include based on semantic analysis of content.
  • Explicit Crowdsourced Filtering – use voting mechanisms to identify what people prefer, e.g., Digg.
  • Social Filtering – use social signals for implicit crowdsourced filtering.  This can take several forms:
    • Content Social Filtering – use social signals around particular content broadly, e.g., Browse My Stuff and Tweetmeme.
    • Social Graph – use signals from a particular social graph, e.g., Paper.li.
    • Influencers – use high influencers or people who influence you or your particular friends, e.g., Google Social Search.
  • Explicit Personalization – filter based on what you have told the system you like or don’t like, or where you are located, e.g., Netflix, MeeHive.
  • Implicit Personalization – filter automatically based on what you have done in the system.  This is being done by Google search as well as Amazon.com and my6sense.

Of course, it’s likely the case that any effective automated filtering mechanism will combine several of these to derive even better results.  And, of course, “better” is often hard to define.
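To make the idea of combining mechanisms a bit more concrete, here is a minimal sketch in Python.  It is not how any of the services named above actually work.  The Item fields, the weights and the function names are all assumptions made up for illustration.  It applies text filtering first, then ranks what is left by a weighted mix of a social signal (shares), an explicit crowdsourced signal (votes) and explicit personalization (topics the reader said they like).

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A piece of aggregated content. All fields are hypothetical."""
    title: str
    text: str
    shares: int = 0   # social signal: how often the item was shared
    votes: int = 0    # explicit crowdsourced signal: up-votes
    topics: set = field(default_factory=set)


def text_filter(item, include, exclude):
    """Text filtering: keep items with an include term and no exclude term."""
    body = f"{item.title} {item.text}".lower()
    if any(term in body for term in exclude):
        return False
    return any(term in body for term in include)


def score(item, liked_topics, w_shares=1.0, w_votes=2.0, w_topics=5.0):
    """Weighted sum of social, crowdsourced and personalization signals.

    The weights are arbitrary placeholders, not tuned values.
    """
    topic_matches = len(item.topics & liked_topics)
    return w_shares * item.shares + w_votes * item.votes + w_topics * topic_matches


def filter_and_rank(items, include, exclude, liked_topics, limit=10):
    """Remove the noise with the text filter, then rank the rest by score."""
    kept = [i for i in items if text_filter(i, include, exclude)]
    kept.sort(key=lambda i: score(i, liked_topics), reverse=True)
    return kept[:limit]
```

Even in a toy like this, “better” lives entirely in the choice of weights, which is exactly why it is so hard to define.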

Aggregation, Automated Filtering and Curation

In Real-Time News Curation – The Complete Guide Part 2: Aggregation Is Not Curation, Robin Good discusses the following points in detail:

  • Aggregation is not curation
  • Filtering is not curation
  • Aggregation is Automated, Curation is Manual
  • Automated Aggregation without Curation, is mostly spam
  • The Solution is in the MIX – Human Curation + Machine Aggregation

Robin and I agree on the definitions: aggregation is the bringing together of content from various sources, filtering is finding the good stuff (removing noise and ranking the rest), and curation is organizing, maintaining and adding value to a body of information for current and future use.

Where we disagree is whether there are forms of automated curation.  Robin believes that for something to be curated, a human must be doing the curation manually, and I believe he also sees that human as a single individual or a small group of editors.

Where I think he runs into problems is when there are many people involved, possibly a crowd, or even everyone.  There may even be different levels of these people.  The crowd does curation using explicit crowdsourced filtering or via social signals.  This results in filtering and ranking.  Then maybe a smaller group of individuals does additional filtering and ranking.  And maybe they do this explicitly or implicitly.

In any case, it’s very hard to separate social filtering from curation.  For me, I will continue to refer to curation via social signals as curation.  I don’t disagree that additional curation from direct involvement of active, human curators will produce additional information that can be used.  But I also see a bit of John Henry vs. the Steam Drill here.  See an example in Filtering and Curation of Specific Topics.

I’m also a bit worried that many traditional publishers will read Robin’s post and believe, to their detriment, that they should continue to focus on human-powered curation.  As Steve Rosenbaum asks, Can Curation Save Media?  To me, it can only save publishers if they rapidly move to take advantage of automated filtering rather than relying on human-powered curation alone.  Otherwise, they will be like John Henry and fight incredibly hard only to die.

Still, all of this argument likely leaves Robin and me in the same place.  I thought that David Koretz in You’re Not That Interesting captured it pretty well:

Successful media will become aggregators and editors of content, rather than creators. The smart money will build a technology to gather, sort, and filter stories from every corner of the world, and couple it with smart and thoughtful humans to do the editing.

He’s right on about who will likely win at the end.  Just be careful that you don’t make your editors do too much manual curation and die fighting the machine.

11 Responses to “Automated Filtering vs Human-Powered Curation”

  1. Tom Pick September 20, 2010 at 8:51 am #

    Great piece Tony – impressive research! I love the John Henry reference, very appropriate.

    One possible approach is to “tier” sources, e.g. Tier 1 is sources that become “trusted” over time, and can safely be republished automatically unless a problem occurs. Tier 2 might be “trust but verify” where the author is usually reliable but occasionally veers off course. Tier 3 would be those whose content always needs to be screened, either because the source is new or just inconsistent.

  2. admin September 20, 2010 at 9:19 am #

    Tom – that’s a great suggestion on how to screen. We are always looking for ways to better filter and when to involve human curation. I’ll have to think about how to handle this.

    And since your comment got caught in a spam filter – obviously there’s always a limit to what automated filtering can do.

  3. Robin Good September 20, 2010 at 12:46 pm #

    Hi Tony, great analysis and commentary. I really appreciate it, as it does help me question my own points and allows everyone to see this from multiple perspectives.

    You say: I will continue to refer to curation via social signals as curation.

    I say: In my view, “curation via social signals” is a wonderful thing, but, by definition it is something different from “automatic filtering based on social signals”.

    I think that “automatic filtering via social signals” is a great, powerful asset that curators should leverage for their work.

    As far as me scaring off some old traditional publishers with the claim that what I call real-time news curation can best be done with the help of human intelligence instead of algorithms, you may indeed be right. My limitation is probably due to the fact that I base my views on the experience I have in doing content and news curation. I use automated tools to filter out some of the stuff, or to find special things, but if I were to rely only on them, I’d hardly find interesting, emerging new sources or the dissenting voice.

    But again, I may be wrong and would love to learn something new on this.

    So my question is: excluding general news gathering a-la Twittertim.es, Paper.li or Flipboard, can you share some interesting examples of automated social news curation that you think are effective in curating a very specific topic – theme?

    I also couldn’t agree more with your statement: “I don’t disagree that additional curation from direct involvement of active, human curators will produce additional information that can be used”, because I think the point is all in there. In those simple words. Human curation can add a TON of additional value. This is key point indeed.

  4. Mud Kitten September 21, 2010 at 12:08 am #

    I’ve been reading about the new $50 buzzword curation for the past 6 months, and the various semantic debates over what it is and isn’t. It doesn’t matter what you call it. It’s all about getting product to the right people and there will be many ways to distribute/repurpose content. For some categories and users (e.g., entertainment and 20-year-olds), lowest common denominator filters like digg or whatever comes next may be good enough and serendipity is a lower bar. For users whose time is money, content will need to be filtered more finely, through a smaller select crowd or through paid editors. There will be no one-stop, one size fits all solution and no deus ex machina algorithm that solves all our content needs. Just like there wasn’t before 1996.

  5. admin September 21, 2010 at 8:46 am #

    Robin – great comment and a REALLY great question – “effective curating/filtering a very specific topic” – let me come back to you on that.

    Mud Kitten – you are likely right that this turns into a small semantic debate over whether and how humans are involved in anything that is called “curated”. And you raise a similar question as Robin (without having seen his comment).

    Clearly this is what needs to be addressed.

  6. Mud Kitten September 21, 2010 at 10:26 pm #

    I haven’t run across a world-class focused curation site yet. See some interesting and differing examples below of sites on the right track.

    For topics that interest me, I want someone to scour the web and elsewhere for the highest quality, “durable” content even if it’s long-form and I have to instapaper it, or (god forbid) print the text. I want stuff from blogs, traditional print, podcasts, videos, audio — whatever is relevant and good. And I want it now, and I want it forever and into the distant past. I want context and history but updates and forecasting. I want analysis not news. More thought than action. And I will pay. And in this world of info overload, I cannot possibly be alone, at least not for long. There is a tipping point coming when users especially professionals who have to ration time ruthlessly are going to conclude that they might be better off without the distraction of all these new fancy web tools. They might just pine for the return of something that looks like (gasp!) a portal (or at least lots of them focused deeply on key topics). So sites that have put some but not nearly all of the pieces together are:
    RealClearPolitics (yes, links are boring and they could do more, but the kernel is here — pick the most important things to read each day)
    Arts & Letters Daily (ok the site functionality sucks — no way to tell what you’ve read, no search, categories, etc) — but again, the basic premise is good — here’s the best stuff to read
    Eguiders (maybe not the best video recommendation site — not my area of expertise — but headed in the right direction by using experts to recommend stuff)
    Interested to see your lists.

  7. Sérgio Santos September 30, 2010 at 5:40 pm #

    Great round up on both automatic and manual curation techniques.

    One thought intrigues me. Will journalists and other publishers disclose publicly how much they’re aided by automatic methods? Won’t credibility be an issue here?

  8. Pietro Polsinelli October 21, 2010 at 2:11 am #

    I think there is quite a difference between real-time curation and curation by people who are not web journalists. The amount of information that non-journalists receive has become considerable too, and with social media, many non-journalists are in a way doing content curation for their listeners. But the tools, ends and approach may be a bit different in the two cases. A blog post about this, “Curation beyond social media”, is here:

    http://pietro.open-lab.com/2010/09/28/curation-beyond-social-media/

  9. Tony Karrer October 1, 2010 at 5:47 pm #

    Great question Sergio. I bet that most people won’t disclose that.

Trackbacks/Pingbacks

  1. Marketing via Aggregation, Filtering and Curation – Tools and Resources - January 31, 2011

    [...] Automated Filtering vs Human-Powered Curation [...]

  2. Digital curation, babes digitala « Deusto's Littera Media - May 9, 2012

    [...] on the close link between curation and information filtering, Tony Karrer says this: “It’s very hard to separate social filtering from curation. Automated aggregation without [...]
