What exactly is the ja.microsoft analyzer added to the Azure Search preview API?

The recently released 2015-02-28-Preview API for Azure Search adds new analyzers in addition to More Like This. Apparently they are based on technology that Microsoft developed and uses in Office and Bing.

50 analyzers backed by proprietary Microsoft natural language processing technology used in Office and Bing.

Azure Search Service REST API Version 2015-02-28-Preview

For Japanese, there are now two analyzers to choose from: one from Lucene and one from Microsoft.

ja.lucene
- Uses morphological analysis
- Normalizes common katakana spelling variations
- Light stopwords/stoptags removal
- Character width-normalization
- Lemmatization - reduces inflected adjectives and verbs to their base form
ja.microsoft
- Uses morphological analysis

I wanted to try it out, so I started by modifying RedDog.Search the same way I did last time. The version that supports the preview API is on its own branch, which I have pushed to GitHub.

<a href="https://github.com/shibayan/RedDog.Search/tree/2015-02-28-preview">shibayan/RedDog.Search</a>

With this fork of RedDog.Search, you can specify the new analyzer with code like the following.

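// Replace SEARCH_NAME and ADMIN_KEY with your service name and admin API key.
var connection = ApiConnection.Create("SEARCH_NAME", "ADMIN_KEY");

var client = new IndexManagementClient(connection);

// Create the "entries" index; title and body are analyzed with ja.microsoft.
// (await requires this to run inside an async method.)
await client.CreateIndexAsync(new Index("entries")
    .WithStringField("id", x => x.IsKey().IsRetrievable())
    .WithStringField("title", x => x.IsSearchable().IsRetrievable().Analyzer("ja.microsoft"))
    .WithStringField("body", x => x.IsSearchable().IsRetrievable().Analyzer("ja.microsoft")));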
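
For reference, here is roughly what I would expect the equivalent request body for the REST API's Create Index operation to look like. This is only a sketch: it assumes the request is a POST to https://SEARCH_NAME.search.windows.net/indexes?api-version=2015-02-28-Preview with the admin key in the api-key header, and that the client library maps each field option one-to-one onto the field attributes below. I have not captured the actual request the library sends, so treat the exact shape as an assumption.

```json
{
  "name": "entries",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "retrievable": true },
    { "name": "title", "type": "Edm.String", "searchable": true, "retrievable": true, "analyzer": "ja.microsoft" },
    { "name": "body", "type": "Edm.String", "searchable": true, "retrievable": true, "analyzer": "ja.microsoft" }
  ]
}
```

Changing "ja.microsoft" to "ja.lucene" on a field should select the Lucene analyzer instead, which makes it easy to create two indexes and compare the results.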

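When you check the index in the preview portal, it is displayed as shown below. It's a shame that the analyzer can't be configured from the portal, though.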

[Screenshot: the index as displayed in the preview portal]