December 29, 2008

Search-Related Labs in China + News Digest

By clfour in Work

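1. Harbin Institute of Technology, Information Retrieval Lab; Language Technology Network (语言技术网)
2. Fudan University, Media Computing and Web Intelligence Lab, Information Retrieval and Natural Language Processing Group
3. Peking University, Computer Networks and Distributed Systems Lab
4. Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, Information Retrieval Group
5. Shandong University, Information Retrieval Lab
6. Shanghai Jiao Tong University, APEX Data and Knowledge Management Lab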

Also, I went through my recent Google Alerts, and the following items are worth keeping an eye on.

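These two pieces discuss enterprise search versus web search; the first is a commentary on the second. Note that the search they discuss covers the general enterprise environment, including resources such as HR management, department management, and organizational structure.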
Recommind used a technique called Probabilistic Latent Semantic Analysis, which are statistical models that the system builds from your documents.
“It will look at the language and derive meanings, themes and concepts from within content, then relate them to similar concepts in a different batch of documents,” says Carpenter.
Autonomy uses Bayesian statistical models, similar to those used to filter spam, to determine the categories of documents. FAST uses a semantic index that can restrict the scope of a concept to a sentence or paragraph to get a more accurate answer. For example, a document might talk about both orange (the fruit) and orange (the colour) but a paragraph is more likely to be about one or the other. It also extracts ‘entities’ like names, phone numbers, addresses and companies.
Enterprise Search Is Not Web Search — A Revelation 
We explain why you can’t always get the best search results for your business from Google
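
The excerpt above mentions Bayesian statistical models, "similar to those used to filter spam", for assigning documents to categories. As a rough illustration of that general idea only (not Autonomy's actual system, which the article does not describe in detail), here is a minimal multinomial naive Bayes sketch. The class name `NaiveBayes`, the whitespace tokenizer, and the tiny training set built around the article's "orange" example are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic assumption: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Toy multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior: share of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero out the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, loosely based on the "orange" example above.
nb = NaiveBayes()
nb.train("orange juice fruit sweet", "fruit")
nb.train("orange paint colour bright", "colour")

fruit_guess = nb.classify("sweet juice")
colour_guess = nb.classify("bright paint")
```

A production system would need far more training data, better tokenization, and probably paragraph-level scoring as the FAST description suggests; this sketch only shows the word-counting core.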

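This paper uses a Topic Maps-based ontology to improve information retrieval performance; 40 participants took part in the evaluation. I have not yet found a downloadable copy.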
Information Organization and Retrieval Using a Topic Maps-Based Ontology : Results of a Task-Based Evaluation

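The links in this post point to several papers related to a Google query expansion patent (08.12.25), and the post briefly describes finding synonyms using probabilistic machine translation and context graphs.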
How a Search Engine Might Find Synonyms to Use to Expand Search Queries
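
To make the idea of synonym-based query expansion concrete, here is a minimal sketch. It does not reproduce the statistical methods from the post or the patent for discovering synonyms; it assumes a hand-written synonym table (`SYNONYMS`, hypothetical) and only shows how a query could be widened once synonyms are known. The function names `expand_query` and `matches` are likewise illustrative.

```python
# Hypothetical synonym table; a real engine would learn these from data.
SYNONYMS = {
    "car": ["automobile", "auto"],
}


def expand_query(query, synonyms):
    """Return one list of acceptable alternatives per query term."""
    expanded = []
    for term in query.lower().split():
        alternatives = [term] + [s for s in synonyms.get(term, []) if s != term]
        expanded.append(alternatives)
    return expanded


def matches(document, expanded):
    """A document matches if every query term, or one of its synonyms, appears."""
    words = set(document.lower().split())
    return all(any(alt in words for alt in group) for group in expanded)


expanded = expand_query("cheap car", SYNONYMS)
hit = matches("cheap automobile for sale", expanded)
miss = matches("cheap bike for sale", expanded)
```

Here the expanded query treats "automobile" and "auto" as acceptable stand-ins for "car", so a document that never uses the word "car" can still match.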

