Discovery App

Discovery App service to connect and crawler provider

  • Encharge to manager and authenticate in each provider
  • Crawler the data and record into db
  • Consume batch insert data

../../_images/discovery.png

Discovery using Flask, and python >3.5, has api rest, and tasks.

Setup dev env

cd devtool/

docker-compose up -d

Will be setup rabbitmq and redis

Windows Env

If you use windows, celery havent support for windows, the last version is 3.1.25.

pip3 install celery==3.1.25

npm run powershell

Important topics

  • Controller used factory dc abstract to create easy way to make CRUD in mongodb

  • The crawler is divided in:

    • api: connect in api provider and get result
    • translate: normalize the data
    • setup: reset tracker stats (used in datacenters to ensure a sync resource)

    • tracker: add list entry into tracker stats

      • insert: insert/update data in mongodb

      Each step have unique task.

  • Config is managed by env variables, need to be, because in production env like k8s is easier to manager the pods.

  • Repository has pymongo objects.


Flower - Debbug Celery

You can install a flower, it’s a control panel to centralize results throughout rabbitMQ, very useful to troubleshooting producer and consumers.

pip install flower

flower -A app.celery

npm run flower

Installation with python 3

  • Python >3.4
  • RabbitMQ

Download de repository

git clone https://github.com/maestro-server/discovery-api.git

Install dependences

pip install -r requeriments.txt

Install run api

python -m flask run.py

or

FLASK_APP=run.py FLASK_DEBUG=1 flask run

or

npm run server

Install run rabbit workers

celery -A app.celery worker -E -Q discovery --hostname=discovery@%h --loglevel=info

or

npm run celery

Warning

For production environment, use something like gunicorn.

# gunicorn_config.py

import os

bind = "0.0.0.0:" + str(os.environ.get("MAESTRO_PORT", 5000))
workers = os.environ.get("MAESTRO_GWORKERS", 2)

Env variables

Env Variables Example Description
MAESTRO_DATA_URI http://localhost:5010 Data Layer API URL
MAESTRO_SECRETJWT xxxx Same that Server App
MAESTRO_TRANSLATE_QTD 200 Prefetch translation process
MAESTRO_GWORKERS 2 Gunicorn multi process
CELERY_BROKER_URL amqp://rabbitmq:5672 RabbitMQ connection
CELERYD_TASK_TIME_LIMIT 10 Timeout workers