项目作者: mathetake

项目描述 :
gann(go-approximate-nearest-neighbor) is a library for Approximate Nearest Neighbor Search written in Go
高级语言: Go
项目地址: git://github.com/mathetake/gann.git
创建时间: 2018-05-14T01:23:50Z
项目社区:https://github.com/mathetake/gann

开源协议:MIT License

下载


gann

CircleCI
MIT License

portfolio_view

gann (go-approximate-nearest-neighbor) is a library for approximate nearest neighbor search purely written in golang.

The implemented algorithm is truly inspired by Annoy (https://github.com/spotify/annoy).

feature

  1. purely written in Go: no dependencies out of Go world.
  2. easy to tune with a bit of parameters

installation

  1. go get github.com/mathetake/gann

parameters

setup phase parameters

name type description run-time complexity space complexity accuracy
dim int dimension of target vectors the larger, the more expensive the larger, the more expensive N/A
nTree int # of trees the larger, the more expensive the larger, the more expensive the larger, the more accurate
k int maximum # of items in a single leaf the larger, the less expensive N/A the larger, the less accurate

runtime (search phase) parameters

name type description time complexity accuracy
searchNum int # of requested neighbors the larger, the more expensive N/A
bucketScale float64 affects the size of bucket the larger, the more expensive the larger, the more accurate

bucketScale affects the size of bucket which consists of items for exact distance calculation.
The actual size of the bucket is calculated by int(searchNum * bucketScale).

In the search phase, we traverse index trees and continuously put items on reached leaves to the bucket until the bucket becomes full.
Then we calculate the exact distances between a item in the bucket and the query vector to get approximate nearest neighbors.

Therefore, the larger bucketScale, the more computational complexity while the more accurate result to be produced.

example

  1. package main
  2. import (
  3. "fmt"
  4. "math/rand"
  5. "time"
  6. "github.com/mathetake/gann"
  7. "github.com/mathetake/gann/metric"
  8. )
  9. var (
  10. dim = 3
  11. nTrees = 2
  12. k = 10
  13. nItem = 1000
  14. )
  15. func main() {
  16. rawItems := make([][]float64, 0, nItem)
  17. rand.Seed(time.Now().UnixNano())
  18. for i := 0; i < nItem; i++ {
  19. item := make([]float64, 0, dim)
  20. for j := 0; j < dim; j++ {
  21. item = append(item, rand.Float64())
  22. }
  23. rawItems = append(rawItems, item)
  24. }
  25. m, err := metric.NewCosineMetric(dim)
  26. if err != nil {
  27. // err handling
  28. return
  29. }
  30. // create index
  31. idx, err := gann.CreateNewIndex(rawItems, dim, nTrees, k, m)
  32. if err != nil {
  33. // error handling
  34. return
  35. }
  36. // search
  37. var searchNum = 5
  38. var bucketScale float64 = 10
  39. q := []float64{0.1, 0.02, 0.001}
  40. res, err := idx.GetANNbyVector(q, searchNum, bucketScale)
  41. if err != nil {
  42. // error handling
  43. return
  44. }
  45. fmt.Printf("res: %v\n", res)
  46. }

references

License

MIT