Telling experts from spammers: Expertise ranking in folksonomies / Noll & Yeung (2009)
Citation - Noll, M. G., & Yeung, A. (2009). Telling experts from spammers: Expertise ranking in folksonomies.
Keyword - folksonomy, social network analysis
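- Object of study: resource-tagging behavior in the del.icio.us online community
- (Behavior) tagging: freely annotating resources with keywords
- (Purpose of the behavior): self-organizing resources, sharing, self-promotion, …
- (Collaborative tagging platforms): web services and community platforms that let users annotate resources with their own keywords (e.g., delicious.com, flickr.com)
- (How the platforms are used): searching for relevant resources, searching for experts in a particular domain
- (Resulting phenomenon): bottom-up “categorization” by end users, aka “folksonomy”
- Current state of the problem: existing rankings rely only on quantity and frequency, so they cannot tell expert tagging apart from high-volume spam tagging
- Research goal: design a new algorithm, SPEAR (SPamming-resistant Expertise Analysis and Ranking), that distinguishes experts from spammers and thereby improves search relevance.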
- Design principles (research assumptions): a user's level of expertise on a particular topic depends mainly on two factors:
We propose that the level of expertise of a user with respect to a particular topic is mainly determined by two factors: (1) there should be a relationship of mutual reinforcement between the expertise of a user and the quality of a resource; and (2) an expert should be one who tends to identify useful resources before other users discover them.
- (1) The greater a user's expertise, the higher the quality of the resources they share
Mutual reinforcement of user expertise and document quality: Expert users tend to have many high quality documents, and high quality documents are tagged by users of high expertise.
- (2) Experts discover useful resources earlier than other users
Discoverers vs. followers: Expert users are discoverers – they tend to be the first to bookmark and tag high quality documents, thereby bringing them to the attention of the user community. Think: researchers in academia.
- (Research design) Algorithm design: graph-based algorithm
- Builds on IR research that uses expert identification to improve retrieval relevance; the approach resembles citation analysis.
- (Evaluation): We carry out experiments on both simulated and real-world data sets obtained from Delicious, and show that SPEAR is able to detect the difference between different types of experts, and is more resistant to spammers than other methods.
- SPEAR: SPamming-resistant Expertise Analysis and Ranking
- Based on the HITS (Hypertext Induced Topic Search) algorithm
- Hubs: pages that point to many good pages
- Authorities: pages that are pointed to by many good pages
- The concepts of expertise and quality are analogous to hubs and authorities
- Users (experts) are hubs: we find useful pages through them
- High-quality pages are authorities: they provide relevant information
- Difference: only users (experts) can point to documents (authorities); the relation never runs the other way.
- Algorithm
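A minimal sketch of how the two design principles could be combined in a HITS-style iteration over the user-to-document graph. Mutual reinforcement is the alternating update of expertise and quality; the discoverer principle is modeled by giving earlier taggers of a document a larger edge weight. The function name `spear`, the `(user, document, timestamp)` triple format, the square-root credit function, the tie-breaking rule, and the sum normalization are all assumptions of this sketch, not details recorded in these notes.

```python
import math
from collections import defaultdict

def normalize(scores):
    """Scale scores so they sum to 1 (left unchanged if the sum is 0)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else scores

def spear(bookmarks, iterations=50, credit=math.sqrt):
    """Sketch of a SPEAR-like ranking.

    bookmarks: iterable of (user, document, timestamp) triples,
    assumed to contain at most one triple per (user, document) pair.
    """
    # Group the taggers of each document so they can be ordered by time.
    by_doc = defaultdict(list)
    for user, doc, ts in bookmarks:
        by_doc[doc].append((ts, user))

    # Edge weight = credit(1 + number of users who tagged the document later).
    # Assumption: timestamp ties are broken by user name.
    weight = {}
    for doc, events in by_doc.items():
        events.sort()
        n = len(events)
        for rank, (_ts, user) in enumerate(events):
            weight[(user, doc)] = credit(n - rank)

    expertise = {u: 1.0 for u, _ in weight}
    quality = {d: 1.0 for _, d in weight}

    for _ in range(iterations):
        # Expertise: weighted sum of the quality of the user's documents.
        new_e = dict.fromkeys(expertise, 0.0)
        for (u, d), w in weight.items():
            new_e[u] += w * quality[d]
        # Quality: weighted sum of the expertise of the document's taggers.
        new_q = dict.fromkeys(quality, 0.0)
        for (u, d), w in weight.items():
            new_q[d] += w * new_e[u]
        expertise, quality = normalize(new_e), normalize(new_q)
    return expertise, quality

# Toy data: "spam" holds the most bookmarks but tags the shared pages last.
bookmarks = [
    ("alice", "d1", 1), ("alice", "d2", 1),
    ("bob", "d1", 2), ("bob", "d2", 2),
    ("spam", "d1", 3), ("spam", "d2", 3),
    ("spam", "d3", 1), ("spam", "d4", 1), ("spam", "d5", 1),
]
expertise, quality = spear(bookmarks)
for user in sorted(expertise, key=expertise.get, reverse=True):
    print(user, round(expertise[user], 3))
```

Passing `credit=lambda x: 1.0` removes the timing bonus, which reduces the sketch to a plain HITS-style iteration in which users act only as hubs and documents only as authorities.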
- Experimental design
- Workaround: insert simulated users into real-world data from Delicious.com and check where they end up after ranking
- Comparison over 50 tags from Delicious.com, covering 515,000 real users, 71,300 real pages, and 2,190,000 real bookmarks
- Probabilistic simulation: simulated users are generated from four parameters
- P1: Number of the user's bookmarks: active or inactive user?
- P2: Newness: fraction of Web pages not already in the data set
- P3: Time preference: discoverer or follower?
- P4: Document preference: high quality or low quality?
- Six simulated user types
- Geek: lots of high quality documents, discoverer (Distinguished Researcher; interdisciplinary researcher)
- Veteran: high quality documents, discoverer (Professor)
- Newcomer: high quality documents, follower (PhD student)
- Flooder: lots of random documents, follower (found in Delicious)
- Promoter: some documents (most are his own), discoverer (found in Delicious)
- Trojan (looks like an ordinary netizen): some documents, follower (next-gen spammer)
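The four parameters and the user types above suggest a simple generative model. The following is an illustrative sketch only: the class `SimParams`, the function `simulate_user`, the page pools, and every numeric value are hypothetical and are not taken from the paper's experiments.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SimParams:
    n_bookmarks: int     # P1: how active the user is
    newness: float       # P2: chance that a page is not yet in the data set
    discoverer: float    # P3: chance of bookmarking before the real users
    high_quality: float  # P4: chance of picking from the high-quality pool

def simulate_user(name, params, good_pages, other_pages, rng):
    """Return (user, page, position) triples; position is 'first' or 'last'.

    A page may be drawn more than once; deduplication is left out for brevity.
    """
    triples = []
    for i in range(params.n_bookmarks):
        if rng.random() < params.newness:
            page = f"{name}-new-page-{i}"  # a page nobody else has bookmarked
        elif rng.random() < params.high_quality:
            page = rng.choice(good_pages)
        else:
            page = rng.choice(other_pages)
        position = "first" if rng.random() < params.discoverer else "last"
        triples.append((name, page, position))
    return triples

rng = random.Random(0)  # fixed seed so runs are repeatable
good, other = ["g1", "g2", "g3"], ["o1", "o2", "o3"]

# Hypothetical settings loosely matching two of the types listed above.
veteran_like = simulate_user("sim-veteran", SimParams(5, 0.0, 1.0, 1.0), good, other, rng)
flooder_like = simulate_user("sim-flooder", SimParams(50, 0.0, 0.0, 0.5), good, other, rng)
print(veteran_like[:2], len(flooder_like))
```

Under this model a Promoter-like user would combine a high `newness` value (mostly own pages) with a high `discoverer` value, while a Trojan-like user would combine a small `n_bookmarks` with a low `discoverer` value.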
- Three algorithms compared
- SPEAR
- HITS
- FREQ, a frequency-count ranking algorithm
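To see why a pure frequency count is easy to game, here is a minimal sketch of a FREQ-style baseline, assuming it ranks users simply by how many bookmarks they hold. The function name `freq_rank` and the toy data are hypothetical.

```python
from collections import Counter

def freq_rank(bookmarks):
    """Rank users by number of bookmarks, most active first."""
    counts = Counter(user for user, _doc, _ts in bookmarks)
    return [user for user, _count in counts.most_common()]

# Toy data: "spam" bookmarks the most pages, three of which nobody else tags.
bookmarks = [
    ("alice", "d1", 1), ("alice", "d2", 1),
    ("bob", "d1", 2), ("bob", "d2", 2),
    ("spam", "d1", 3), ("spam", "d2", 3),
    ("spam", "d3", 1), ("spam", "d4", 1), ("spam", "d5", 1),
]
ranking = freq_rank(bookmarks)
print(ranking)
```

Because only volume counts here, the user with the most bookmarks comes out on top regardless of page quality or timing, which is the weakness a Flooder exploits.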
- Results
- SPEAR distinguishes the three types of spammer more effectively than the other two algorithms
Note
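The paper conflates folksonomy with collaborative tagging in its definitions. This may be theoretically contentious, but it is acceptable if folksonomy is defined as a phenomenon. The authors' folksonomy is closer to a collaborative tagging graph.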