Human-Machine Collaboration and the Problem of Information Overload

On August 17, 2009, I attended the third Toronto Semantic Web Meetup organized by William Mougayar.  The topic for this event was “Human-Machine Collaboration & the Problem of Information Overload”.  The presenter was Karl Dawson, CEO of phiScape AG, a company specializing in “distributed computing and data integration in heterogeneous environments, with particular application to digital media and knowledge management systems”.  Karl also gave us an advance screening of his new product Topicmarks, “a browser plugin which helps people learn more efficiently and effectively by extracting significant concepts from unstructured text”.  I first met Karl at the inaugural Information Overload Awareness Day Toronto event, where he presented “Information Overload – More Personal Than You Think” using the Pecha Kucha format.

My Thoughts Enclosed…Rb

Giving Back to the Presenter

What first caught my attention about this event was not the presentation itself but the guideline that framed it.  William Mougayar, the organizer, made a statement at the beginning of the presentation that set the tone for the evening.  He asked that, as our contribution back to Karl for his presentation, we try to provide ideas and comments that would help him tailor or market his product.  I found this to be a very insightful request to make prior to a presentation.

Quite often, presentations are followed by a Q&A session.  In these sessions, many attendees fail to engage in dialogue, and those who do typically ask self-benefiting questions.  There is nothing wrong with asking questions to further your understanding of the topic.  However, we often fail to provide value back to the presenter, who has taken the time to share his knowledge, often for little more than a round of muted applause.

What made this presentation different is that William made it clear at the outset that this would be a two-way exchange of knowledge.  To me this is what defines a successful presentation: attendee engagement.  By outlining the challenges or gaps the presenter is facing at the beginning of the presentation, you give attendees an opportunity to focus their contributions on the topic.

As we all know, it is often very hard to make money on the web in software development.  How to monetize innovation is a challenge that faces not only startups but also established pioneers of the web.  The key is to target the pain point of your audience.  Unless you draw a direct connection to a problem of significance, your audience is unlikely to part with their ever-tightening capital.  By targeting a solution to a problem that people can attach monetary value to, you can achieve market equilibrium for your product.  I also consider the failure to do so to be the fundamental failure of most companies in this tough market.  Too many staffers in organizations are not fully aware of the real goal of their employment.  Thus the company fails to meet the expectations of its clientele, but that is a topic for another day.

Our part in the presentation was not only to digest the knowledge being shared with us.  Our contribution would be to offer our insight into how Karl’s product could be tweaked to address potential pain points in the market.  This to me is a great use of crowdsourcing for product management and market analysis.  When you tap into the knowledge of your audience through an engaged presentation, all parties come away feeling that the presentation provided shared value.  It is my hope that the feedback given to Karl at the end of the presentation was both constructive and of value.

Those Who Control the Knowledge, Control the Masses

Karl indicated that there are between 20 and 45 billion pages of accessible content on the Internet.  He put forward that there are likely another 500 times as many pages of dynamic content that cannot be indexed by mainstream means.  Modern search engines rely heavily on diverse forms of indexes to aggregate their content.  Karl argued that harnessing this untapped content can be a marketable advantage.

Karl also put forward the notion that search engines provide not only the access points for the Web but also its advertising engine (e.g. Google AdSense).  As these search engines become more and more commercialized, their interests may not necessarily match your searching needs.  Their need for control over the results of your contextual search will likely become another challenge to attaining net neutrality.  How search engine providers and Internet service providers manage the content being made available is an ever-growing concern as the Internet we now know becomes increasingly commercialized.

Karl briefly discussed a study that set out to identify the best means of searching content.  The following query approaches were reviewed: Keywords (e.g. NLP-Reduce), Natural Language (e.g. Querix), Controlled Language (e.g. Ginseng), and Graph-based Formal Logic (e.g. Semantic Crystal).  The study found that full English questions were judged to be the best form for casual users.  However, keyword queries tended to produce better results.  As such, it can be concluded that a combination of keyword and natural language queries would most likely provide both ease of querying and quality of results.  The discussion in Talking to the Semantic Web: Query Interfaces to Ontologies for the Rest of Us goes further into the query options applicable to the Semantic Web.

The Semantic Web is a framework intended to provide context for both humans and machines.  It fundamentally starts with associating a Uniform Resource Identifier (URI) with Resource Description Framework (RDF) metadata.  RDF is an assertional language made up of what are referred to as RDF triples.  RDF triples make statements about URI resources using a subject, predicate, object structure (e.g. this blog entry (subject) was written by (predicate) Robert Lavigne (object)).  By binding properties to URI resources, we provide valuable context that can be used not only by humans but also by the systems that support their efforts.
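To make the subject, predicate, object idea concrete, here is a minimal sketch in plain Python.  It is not how any particular RDF library works; the blog entry URI is a made-up placeholder, the Triple type and the objects_of helper are hypothetical names introduced only for illustration, and the two predicate URIs are the Dublin Core "creator" and "title" elements.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URI standing in for this blog entry (an assumption).
BLOG_ENTRY = "http://example.org/blog/human-machine-collaboration"

# Dublin Core element URIs used as predicates.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# "This blog entry (subject) was written by (predicate) Robert Lavigne (object)."
triples = [
    Triple(BLOG_ENTRY, DC_CREATOR, "Robert Lavigne"),
    Triple(
        BLOG_ENTRY,
        DC_TITLE,
        "Human-Machine Collaboration and the Problem of Information Overload",
    ),
]


def objects_of(store: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object asserted for the given subject and predicate."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]


if __name__ == "__main__":
    print(objects_of(triples, BLOG_ENTRY, DC_CREATOR))
```

Because every statement has the same three-part shape, a machine can answer "who wrote this resource?" with a simple filter, without understanding the prose of the entry itself.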

Semantic Web Stack (via Wikipedia)

Ontology languages, such as the Web Ontology Language (OWL), provide specifications for the conceptualization of content.  Ontologies aim to address the n-space of exponential content growth and thus information overload.  By providing multi-dimensional connectivity to content, we can harness the full power of the metadata available.  Most ontologies have an upper level that provides a set of pre-defined definitions.  The problem we face, however, is that the web provides us with no upper-level ontology for the majority of its content.  Everyone has their own definition of content, each of which has evolved independently.  Merging these independent definitions is a large challenge in bringing definition to our global content.  Those who control the content will also control the access to content in a defined and controlled language.  The consolidation of language and the “battle of the bots” to structure the content will define our accessible knowledge.  The possibility of “gaming” the results is a real concern as we become more and more dependent on this evolving source of knowledge.

Prosumers and Their Role in Semantics

In a growing world of Prosumers (producer-consumers), our accessible knowledge no longer stems simply from the experts in a top-down distribution hierarchy.  We are now faced with a bottom-up creation hierarchy, with the weight of the masses contributing to our knowledge.  I Slept Through Class is a prime example of the commercialization of educational content in a Prosumer market.  The knowledge we access is no longer segregated into expert opinions and individualistic expression.  As such, we often must ask ourselves where this knowledge comes from and whether it can be trusted.  This is where the Prosumer plays a part not only in the creation of content but also in its validation.  Introducing a layer of accreditation and attribution is key to building trust in the “wisdom of the crowd”.

Trust is at the top of the Semantic Web framework and is supported by a layer of proof.  Establishing that trust is an ever-growing issue for the masses with the rise of Prosumers, and it is becoming more and more pressing as crowdsourcing becomes more mainstream.  Prosumers are also key to the successful contextualization of the untapped content the web has to offer.  As the author has the initial responsibility to annotate their content, the role of the producer should be clear.  Descriptive elements, such as Dublin Core, entered by the author should ideally precede publication of content to the web.  However, it is also the consumer's responsibility to expand on these annotations.  This is where concepts such as Folksonomies come into play.  Social tagging not only provides added context but also adds validation by the masses, which inevitably will build a layer of trust in the content being retrieved.
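As a rough illustration of how social tagging can double as validation, the sketch below counts how many distinct users applied each tag to a resource and keeps only the tags that several people agree on.  The function names, the sample users and tags, and the two-user threshold are all assumptions made for the example, not part of any tagging service.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def aggregate_tags(taggings: Iterable[Tuple[str, str]]) -> Counter:
    """Count distinct users per tag for one resource.

    Each tagging is a (user, tag) pair.  Tags are normalized to lower case,
    and a user applying the same tag twice is only counted once.
    """
    seen = set()
    counts: Counter = Counter()
    for user, tag in taggings:
        key = (user, tag.strip().lower())
        if key in seen:
            continue
        seen.add(key)
        counts[key[1]] += 1
    return counts


def trusted_tags(taggings: Iterable[Tuple[str, str]], min_users: int = 2) -> List[str]:
    """Return, in alphabetical order, the tags applied by at least min_users users."""
    counts = aggregate_tags(taggings)
    return sorted(tag for tag, n in counts.items() if n >= min_users)


# Hypothetical taggings of a single blog entry.
sample = [
    ("alice", "Semantic Web"),
    ("bob", "semantic web"),
    ("bob", "semantic web"),  # duplicate from the same user, ignored
    ("carol", "RDF"),
    ("alice", "rdf"),
    ("dave", "funny"),
]

if __name__ == "__main__":
    print(trusted_tags(sample))
```

A tag applied by only one person stays as added context, while a tag that several independent users converge on carries some of the validation by the masses described above.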

Knowledge is complex and highly subjective.  There exist two primary types of knowledge: a priori and a posteriori.  As such, we need to make use of heuristics to break out of our defined logical processes.  Our systems need to shift from deductive reasoning to an inductive-based model.  The development of content synonyms will hopefully lead to the lateral thinking required to achieve this goal.  With that, Personal and Shared Knowledge will hopefully meet and thus define the ontology required to achieve Human-Machine Collaboration.
