Monday, June 6th, 2016 9:00-10:30, Auditorium
Slides: PDF (27.7MB)
In this talk we ask the question: How far are we from collecting the knowledge in the world? We analyze the knowledge collected in Freebase in three categories: head knowledge in head verticals (e.g., music), long-tail knowledge in head verticals, and head knowledge in long-tail verticals, showing the limitations of and challenges for current knowledge-collection techniques.
We then present two key efforts at Google on collecting tail knowledge. The first, called Knowledge Vault, targeted tail knowledge in head verticals. It used 16 extractors to periodically extract knowledge from 1B+ Web pages, obtaining 3B+ distinct (subject, predicate, object) knowledge triples. The second, called Lightweight Verticals, targets head knowledge in tail verticals. It uses a crowd-sourcing approach to collect knowledge through website annotations, and currently serves millions of active Google Search users every day. We present some key technologies underlying both projects, namely, knowledge fusion for guaranteeing knowledge correctness, and knowledge-based trust for finding authoritative sources for knowledge curation.
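To make the knowledge-fusion idea concrete, here is a minimal sketch (not the talk's actual method) of how extractions from several imperfect extractors might be combined into a correctness probability per triple. The extractor names, precision values, and the noisy-or combination rule are illustrative assumptions only.

```python
from collections import defaultdict

# Hypothetical per-extractor precision estimates (illustrative, not from the talk).
EXTRACTOR_PRECISION = {
    "dom_parser": 0.80,
    "text_pattern": 0.65,
    "table_extractor": 0.70,
}


def fuse(extractions):
    """Estimate the probability that each (subject, predicate, object)
    triple is correct, given which extractors produced it.

    `extractions` is an iterable of (triple, extractor_name) pairs.
    A simple noisy-or combination is used here: a triple is assumed wrong
    only if every extractor that emitted it erred independently.
    """
    votes = defaultdict(set)
    for triple, extractor in extractions:
        votes[triple].add(extractor)

    scores = {}
    for triple, extractors in votes.items():
        p_all_wrong = 1.0
        for name in extractors:
            p_all_wrong *= 1.0 - EXTRACTOR_PRECISION.get(name, 0.5)
        scores[triple] = 1.0 - p_all_wrong
    return scores


if __name__ == "__main__":
    observed = [
        (("Barack Obama", "born_in", "Honolulu"), "dom_parser"),
        (("Barack Obama", "born_in", "Honolulu"), "text_pattern"),
        (("Barack Obama", "born_in", "Kenya"), "text_pattern"),
    ]
    for triple, score in fuse(observed).items():
        print(triple, round(score, 3))
```

In this toy example the triple supported by two extractors receives a higher correctness score than the one supported by a single, lower-precision extractor; the real systems additionally reason about source reliability, which is where knowledge-based trust comes in.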
Xin Luna Dong is a Senior Research Scientist at Google Inc. She is one of the major contributors to the Knowledge Vault project, and has led the Knowledge-based Trust project, which The Washington Post dubbed the "Google Truth Machine". She has co-authored the book "Big Data Integration", published 65+ papers in top conferences and journals, given 20+ keynotes, invited talks, and tutorials, and received the Best Demo Award at SIGMOD 2005. She was the PC co-chair for WAIM 2015 and has served as an area chair for SIGMOD 2017, SIGMOD 2015, ICDE 2013, and CIKM 2011.