ANC Workshop: Jinli Hu/Xin He Chair: Ian Simpson

Filed under: ANC Workshop Talk
When: Nov 11, 2014, from 11:00 AM to 12:00 PM
Where: Room 4.31/4.33

Jinli Hu

Title: Latent Scoring Rules and Its Implementation


Scoring rules provide a simple way to elicit an agent's beliefs. The agent is required to provide a probabilistic report (a distribution) over the possible observations of a random variable, and a logarithmic scoring rule scores this report using the value that is actually observed.
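As a minimal sketch of the logarithmic scoring rule described above: the score is the log of the probability that the report assigned to the outcome that occurred. The outcome names and probabilities below are illustrative assumptions, not taken from the talk. The comparison at the end illustrates why such a rule elicits truthful beliefs: under the agent's own belief, the truthful report has a higher expected score than an exaggerated one.

```python
import math

def log_score(report, observed):
    """Logarithmic score: log of the probability the report gave to the observed outcome."""
    return math.log(report[observed])

def expected_log_score(report, belief):
    """Expected score of `report` when outcomes are drawn from `belief`."""
    return sum(p * math.log(report[x]) for x, p in belief.items() if p > 0)

# Hypothetical belief of the agent over two outcomes (illustrative values only).
belief = {"rain": 0.7, "dry": 0.3}
truthful = dict(belief)                   # agent reports what it believes
exaggerated = {"rain": 0.9, "dry": 0.1}   # agent overstates its confidence

truthful_value = expected_log_score(truthful, belief)
exaggerated_value = expected_log_score(exaggerated, belief)

print(truthful_value > exaggerated_value)
```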

There exists a close relationship between logarithmic scoring rules and log likelihood. By utilising this relationship we are able to extend the class of log scoring rules to score latent variables whose values will never be observed. The basic idea is to build a scoring procedure mimicking the EM algorithm.
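For reference, the standard facts behind this relationship can be written as follows. The notation is an assumption made here for illustration (report q, observations x_1, ..., x_n, latent variable z, auxiliary distribution r over z) and is not necessarily the speaker's. The total log score of a report is its log likelihood, and for a joint report over (x, z) Jensen's inequality gives the lower bound that the EM algorithm maximises, with equality when r(z) = q(z | x).

```latex
\sum_{i=1}^{n} S(q, x_i) = \sum_{i=1}^{n} \log q(x_i),
\qquad
\log q(x) = \log \sum_{z} q(x, z) \;\ge\; \sum_{z} r(z) \log \frac{q(x, z)}{r(z)} .
```

How the talk turns this bound into a scoring procedure is not stated in the abstract.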

Scoring rules raise more considerations than the standard EM setting. A particularly important one is ensuring that the latent scoring rules still elicit agents' truthful beliefs.

This latent log scoring rule is motivated in two ways: 1) it makes sense when the problem clearly contains latent variables; and 2) modelling the joint report over the observable and latent variables is simpler than modelling the marginal report over the observable alone, as is the case in the EM algorithm. Both arguments become clear when the report/distribution is restricted to the exponential family, which provides an easy way to store and represent it.


Xin He

Title: A generic data fusion framework for biological ontologies


Researchers often want to derive biological insight from lists of genes that have emerged from experimental and computational studies. A common approach is to perform gene set enrichment analysis (GSEA) to find out whether any particular biological properties are annotated to genes in the list more often than expected. The effectiveness of any such approach depends on the quality and coverage of the gene-association data. Whilst iterative improvements to Gene Ontology association data have been ongoing for decades, many other established and emerging ontologies that would be beneficial for the biological interpretation of gene lists have poor and inconsistent coverage.
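One common way to make "more often than expected" precise is an upper-tail hypergeometric test on the overlap between the gene list and the genes annotated with a term. The sketch below is illustrative only: the function name and the counts are assumptions, and it is not the topology-aware statistic used in the framework described in this talk.

```python
from math import comb

def enrichment_p_value(population, annotated, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of background genes
    annotated:  background genes annotated with the term
    list_size:  number of genes in the list of interest
    overlap:    genes in the list that carry the annotation
    """
    max_overlap = min(annotated, list_size)
    total = comb(population, list_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 200 of 20000 background genes carry the term,
# and 8 of the 100 listed genes do (about 1 would be expected by chance).
p = enrichment_p_value(population=20000, annotated=200, list_size=100, overlap=8)
print(p)
```

A small p-value suggests the term is over-represented in the list; in practice the p-values for many terms would also need a multiple-testing correction.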

To facilitate the adoption of other ontologies and data corpora for ontology-based GSEA, we have developed an integrated, generic framework, DisEnt. It first uses NCBO-Annotator and MetaMap to map ontology terms from free text, then integrates this multi-source mapping data into a unified relational database for GSEA. Finally, we use a modified version of the R/Bioconductor topGO package to calculate topology-aware statistics for term annotations between source and background gene lists. We generated a gene-disease association database with the Human Disease Ontology from sources including OMIM, GeneRIF and ensembl-variation, and obtained improved GSEA results.