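Project author: mozilla

Project description:
Provides high-speed filtering and aggregation over data
Primary language: JavaScript
Project URL: git://github.com/mozilla/ActiveData.git
Created: 2015-02-02T22:19:33Z
Project community: https://github.com/mozilla/ActiveData

License: Mozilla Public License 2.0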


ActiveData

Provides high-speed filtering and aggregation over data. See the ActiveData Wiki Page for project details.

Use it now!

ActiveData is a service! You can certainly set up your own instance, but it is easier to use Mozilla’s:

    curl -XPOST -d "{\"from\":\"unittest\"}" http://activedata.allizom.org/query
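
The same request can be built from Python. This is a minimal sketch using only the standard library. The helper names `build_query_request` and `send_query` are hypothetical (they are not part of ActiveData), and the sketch assumes the endpoint accepts a JSON body over POST, as the curl command suggests.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
ACTIVEDATA_URL = "http://activedata.allizom.org/query"


def build_query_request(query, url=ACTIVEDATA_URL):
    """Return a POST request carrying `query` as a JSON body.

    Hypothetical helper for illustration; not part of ActiveData.
    """
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def send_query(query, url=ACTIVEDATA_URL, timeout=30):
    """Send the query and decode the JSON response (needs network access)."""
    request = build_query_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_query({"from": "unittest"})` would mirror the curl command. Building the request is kept separate from sending it so the payload can be inspected without touching the network.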

Requirements

  • Python 2.7 installed
  • Elasticsearch version 6.x

Elasticsearch Configuration

Elasticsearch has a configuration file at config/elasticsearch.yml. You must modify it to handle a high number of scripts:

    script.painless.regex.enabled: true
    script.max_compilations_rate: 10000/1m

We enable compression for faster transfer speeds:

    http.compression: true

It is also a good idea to give your cluster a unique name, so it does not join other clusters on your local network:

    cluster.name: lahnakoski_dev
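
Before starting the node, it can be useful to confirm that the settings listed above are actually present in the file. The following is a rough sketch, not an ActiveData tool: the function names are hypothetical, and the parser only understands flat `key: value` lines rather than full YAML.

```python
# Settings named in the section above, with the values shown there.
# cluster.name is checked for presence only, since any unique name will do.
EXPECTED = {
    "script.painless.regex.enabled": "true",
    "script.max_compilations_rate": "10000/1m",
    "http.compression": "true",
}


def parse_flat_settings(text):
    """Parse flat `key: value` lines; comments and blank lines are skipped.

    This is not a YAML parser: nested mappings are not supported.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(text):
    """Return a sorted list of expected settings that are absent or differ."""
    found = parse_flat_settings(text)
    problems = [key for key, value in EXPECTED.items() if found.get(key) != value]
    if "cluster.name" not in found:
        problems.append("cluster.name")
    return sorted(problems)
```

Passing the contents of config/elasticsearch.yml to `missing_settings` returns an empty list when every setting is in place.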

Then you can run Elasticsearch:

    c:\elasticsearch>bin\elasticsearch

Elasticsearch listens on port 9200. Test that it is working:

    curl http://localhost:9200

You should expect something like the following (this sample was captured from a 1.7.5 node, so the version details will differ on 6.x):

    {
        "status" : 200,
        "name" : "dev",
        "cluster_name" : "lahnakoski_dev",
        "version" : {
            "number" : "1.7.5",
            "build_hash" : "00f95f4ffca6de89d68b7ccaf80d148f1f70e4d4",
            "build_timestamp" : "2016-02-02T09:55:30Z",
            "build_snapshot" : false,
            "lucene_version" : "4.10.4"
        },
        "tagline" : "You Know, for Search"
    }
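
To check the response programmatically rather than by eye, the fields of interest can be pulled out of the JSON. This is a small sketch with hypothetical helper names. It relies only on the `name`, `cluster_name` and `version.number` fields that appear in the sample response above.

```python
import json


def summarize_root_response(text):
    """Extract node name, cluster name and version number from the JSON
    returned by a GET on the node's root URL."""
    info = json.loads(text)
    return {
        "name": info.get("name"),
        "cluster_name": info.get("cluster_name"),
        "version": info.get("version", {}).get("number"),
    }


def major_version(summary):
    """Return the major version as an int, or None if it is unavailable."""
    number = summary.get("version")
    return int(number.split(".")[0]) if number else None
```

Comparing `major_version(...)` with the version listed under Requirements is a quick way to confirm that the node you reached is the one you intended to run.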

Installation

There is no PyPI package. Please clone the repository from GitHub and check out the master branch:

    git clone https://github.com/mozilla/ActiveData.git
    cd ActiveData
    git checkout master

Then install the requirements:

    pip install -r requirements.txt

Configuration

The ActiveData service requires a configuration file that points to the
default Elasticsearch index. You can find a few sample config files in
resources/config. simple_settings.json is the simplest one:

    {
        "flask":{
            "host":"0.0.0.0",
            "port":5000,
            "debug":false,
            "threaded":true,
            "processes":1
        },
        "constants":{
            "mo_http.http.default_headers":{"From":"https://wiki.mozilla.org/Auto-tools/Projects/ActiveData"}
        },
        "elasticsearch":{
            "host":"http://localhost",
            "port":9200,
            "index":"unittest",
            "type":"test_result",
            "debug":true
        }
        ...<snip>...
    }

The elasticsearch property must be updated to point to a specific cluster,
index and type. It is used as a default, and to find other indexes by name.
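
To see which cluster, index and type a settings file points at, the `elasticsearch` property can be read directly. The sketch below uses hypothetical helper names and assumes the settings file is plain JSON. It does not reproduce how ActiveData itself loads its configuration.

```python
import json


def elasticsearch_target(settings):
    """Return (base_url, index, type) from the `elasticsearch` property."""
    es = settings["elasticsearch"]
    base_url = "{}:{}".format(es["host"].rstrip("/"), es["port"])
    return base_url, es["index"], es["type"]


def load_settings(path):
    """Load a settings file, assuming it contains plain JSON."""
    with open(path) as settings_file:
        return json.load(settings_file)
```

With the sample values shown above, `elasticsearch_target` yields the base URL `http://localhost:9200`, the index `unittest` and the type `test_result`.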

Run

Jump to your git project directory, set your PYTHONPATH and run app.py:

    cd ~/ActiveData
    export PYTHONPATH=.:vendor
    python active_data/app.py --settings=resources/config/simple_settings.json

Verify

If you have no records in your Elasticsearch cluster, then you must add some before you can query them.

Make a table in Elasticsearch with one record (Elasticsearch 6.x requires the Content-Type header):

    curl -XPUT "http://localhost:9200/movies/movie/1" -H "Content-Type: application/json" -d "{\"name\":\"The Parent Trap\",\"released\":\"29 July 1998\",\"imdb\":\"http://www.imdb.com/title/tt0120783/\",\"rating\":\"PG\",\"director\":{\"name\":\"Nancy Meyers\",\"dob\":\"December 8, 1949\"}}"
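
Shell quoting makes the record above hard to read and easy to break. As an alternative, the same PUT request can be assembled in Python. This is a sketch using only the standard library: `build_index_request` is a hypothetical helper, the release date is written as 29 July 1998, and the Content-Type header is set explicitly because Elasticsearch 6.x rejects request bodies without a supported content type.

```python
import json
import urllib.request

# The record from the curl command above, as a Python dict.
MOVIE = {
    "name": "The Parent Trap",
    "released": "29 July 1998",
    "imdb": "http://www.imdb.com/title/tt0120783/",
    "rating": "PG",
    "director": {"name": "Nancy Meyers", "dob": "December 8, 1949"},
}


def build_index_request(doc, index="movies", doc_type="movie", doc_id="1",
                        base_url="http://localhost:9200"):
    """Return a PUT request that stores `doc` at /<index>/<type>/<id>.

    Hypothetical helper for illustration; defaults follow the curl example.
    """
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result to `urllib.request.urlopen` sends the record to a running local node.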

Assuming you used the defaults, you can verify the service is up if you can
access the Query Tool at http://localhost:5000/tools/query.html.
You may use it to send queries to your instance of the service. For example:

    {"from":"movies"}

Tests

The GitHub repo also includes the test suite, and you can run it against
your service if you wish. The tests create indexes on your
cluster, which are filled, queried, and destroyed.

Linux

    cd ~/ActiveData
    export PYTHONPATH=.:vendor
    python -m unittest discover -v -s tests

Windows

    cd ActiveData
    SET PYTHONPATH=.;vendor
    python -m unittest discover -v -s tests
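
The two command sequences differ mainly in how PYTHONPATH entries are separated (`:` on Linux, `;` on Windows). A launcher script can avoid that difference by using `os.pathsep`. The sketch below is one possible approach with hypothetical function names. It is not shipped with the repository.

```python
import os
import subprocess
import sys


def build_pythonpath(entries=(".", "vendor")):
    """Join path entries with the separator of the current platform."""
    return os.pathsep.join(entries)


def build_test_command():
    """Return the unittest discovery command used in the steps above."""
    return [sys.executable, "-m", "unittest", "discover", "-v", "-s", "tests"]


def run_tests(repo_dir):
    """Run the test suite from `repo_dir` and return the exit code."""
    env = dict(os.environ, PYTHONPATH=build_pythonpath())
    return subprocess.call(build_test_command(), cwd=repo_dir, env=env)
```

Calling `run_tests` with the path to your clone runs the same discovery command on either platform. As with the manual steps, the tests still need a reachable Elasticsearch cluster.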