项目作者: foreverycast

项目描述 :
Search for duplicates
高级语言: Go
项目地址: git://github.com/foreverycast/similarity.git
创建时间: 2020-01-02T12:01:42Z
项目社区:https://github.com/foreverycast/similarity

开源协议:Apache License 2.0

下载


similarity

Search for duplicates

INTRODUCTION

Similarity script is for searching duplicates in master data. The script use Levenshtein distance to find similar names.

INSTALLATION

  • Create csv files for compare (replace the names in conf JSON (“duplicates.csv”) for file to load and (“result.csv”) for result csv)
  • If neccessary, the additional file can be added, important for intergrating data set isAddFileToCheck = True and replace “additional.csv” for file name
  • Go Lang is required
  • The package is still in development
  • Just execute the go run duplicatesearch.go

ABOUT THIS RELEASE

  • Tests are not ready
  • stopwords.csv is not working