The idea of a reputation-based approach to P2P-IR systems is to mine users' past interaction data to assign reputation scores to documents and peers. However, our evaluation testbeds contain no past user interaction data. Hence, we simulate user interaction and assume that a user may click a relevant document when presented with a ranked list. We also consider a scenario where users may click some non-relevant documents as well. Our simulation is conducted in three steps, as follows:
\\
\noindent{\textbf{Preprocessing phase (or Simulate Queries):}} we use TREC topics 451--550 to generate 100 simulated queries for each topic (i.e., $Y = \{y_{1}, y_{2},\ldots, y_{100}\}$) to be used as a training set, and use the corresponding \ldots
$tf\!\cdot\!idf(t,d_{i}^{y})$ is the $tf\!\cdot\!idf$ weight of term $t$ in the retrieved document $d_{i}^{y}$ from the set $D_{y}$.
\item The extracted terms and their weights from the previous step (i.e., $T_{D_{y}}$) were combined into phrases to generate the simulated training queries. The most likely phrases were selected using the Tanimoto co-occurrence function \cite{Jose2008}, as follows:
\begin{equation}
\small
\label{relevance_Phrases}
\begin{split}
rel(q,t_{A} \land t_{B}) = \sum_{t_{C} \in q} (t_{A}^{Score} + t_{B}^{Score}) \times Tanimoto_{D_{y}}(t_{A} \land t_{B},t_{C}) \\
\forall t_{A},t_{B} \in T_{D_{y}} \land t_{A}\neq t_{B}
\end{split}
\end{equation}
$t_{A}$ and $t_{B}$ are two tuples of terms and weights as discussed before, i.e., $(t_{A}^{term},t_{A}^{Score})$ and $(t_{B}^{term},t_{B}^{Score})$ respectively, where
\begin{equation}
Tanimoto_{D_{y}}(t_{A} \land t_{B},t_{C}) = \frac{c_{ABC}}{c_{C}+c_{A}+c_{B}-c_{ABC}}
\label{2}
\end{equation}
\ldots
\item If the two top-ranked unigram-based terms appear together as a phrase among the top-ranked bi-gram-based phrases, the two terms are replaced by that phrase. \end{enumerate}
\end{enumerate}
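The phrase-selection step above can be sketched in Python. This is a minimal illustration of Eqs.~(\ref{relevance_Phrases}) and~(\ref{2}), not the authors' implementation: the representation of documents as term sets and the helper names (\texttt{tanimoto}, \texttt{rel\_phrase}) are our assumptions, and $c_{X}$ is taken to be the number of documents in $D_{y}$ containing all terms in $X$.

```python
from itertools import combinations

def tanimoto(docs, t_a, t_b, t_c):
    """Tanimoto co-occurrence of the pair (t_a, t_b) with term t_c.

    Hypothetical sketch of Eq. (2): c_ABC / (c_C + c_A + c_B - c_ABC),
    where c_X counts documents (term sets) containing every term in X.
    """
    def count(*terms):
        return sum(1 for d in docs if all(t in d for t in terms))

    c_abc = count(t_a, t_b, t_c)
    denom = count(t_c) + count(t_a) + count(t_b) - c_abc
    return c_abc / denom if denom else 0.0

def rel_phrase(query_terms, scored_terms, docs):
    """Score every candidate phrase (t_A, t_B) per the relevance equation:
    rel(q, t_A ^ t_B) = sum over t_C in q of
    (score_A + score_B) * Tanimoto(t_A ^ t_B, t_C)."""
    scores = {}
    for (ta, sa), (tb, sb) in combinations(scored_terms, 2):
        scores[(ta, tb)] = sum(
            (sa + sb) * tanimoto(docs, ta, tb, tc) for tc in query_terms
        )
    return scores
```

Under this sketch, the highest-scoring pairs in \texttt{rel\_phrase} would be the phrases kept for the simulated training queries.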

\noindent{\textbf{Training phase:}} In this phase, the system issues the 10,000 queries (i.e.~100 for each TREC topic), each from a randomly selected peer. The system then follows the flooding approach of semi-structured P2P-IR models to route each query to the peers and super-peers. Once the sender receives the final result list, the simulation algorithm mimics user behaviour as shown in Figure~\ref{SimulatingUserBehaviour}: if a real user were involved, he/she would download or click the relevant documents in the final ranked result list. Our approach uses the relevance judgement file of TREC topics 451--550 to determine the relevant documents in the result list. The system then randomly selects 10\% or fewer of the documents from the result list, assuming that the user implicitly downloaded or clicked them, as would occur in real-life use of a P2P-IR system. The feedback on these documents is sent back to the super-peers that manage the peers containing those documents. Each super-peer \ldots
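The click-simulation step of the training phase can be sketched as follows. This is a hedged reconstruction, not the authors' algorithm: the function name, the preference for relevant documents (which also allows occasional non-relevant clicks when relevant ones are scarce), and the fixed 10\% budget are our assumptions based on the description above.

```python
import random

def simulate_clicks(result_list, qrels, click_fraction=0.10, rng=None):
    """Mimic user feedback on a ranked result list (hypothetical sketch).

    result_list: ranked document ids returned for a simulated query.
    qrels: set of relevant document ids from the TREC judgement file.
    At most `click_fraction` of the list is "clicked"; relevant documents
    are preferred, so non-relevant clicks occur only when relevant
    documents are scarce, matching the noisy-user scenario in the text.
    """
    rng = rng or random.Random()
    budget = max(1, int(len(result_list) * click_fraction))
    relevant = [d for d in result_list if d in qrels]
    others = [d for d in result_list if d not in qrels]
    rng.shuffle(relevant)  # random choice among relevant documents
    rng.shuffle(others)    # occasional non-relevant clicks fill the rest
    return (relevant + others)[:budget]
```

The clicked document ids would then be reported back to the super-peers responsible for the peers holding those documents.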
