
Releases: Beit-Hatfutsot/mojp-dbs-pipelines

0.2.0

11 Jan 16:23
  • integrate with mojp-k8s
  • revive the clearmash pipelines - focus on getting a one-time dump of the data
  • fix bug in getting entity ids which prevented getting family trees data

0.1.4

08 Aug 12:32
  • Add a new pipeline for the external source 'Bagnowka':
    • Download, Convert and Sync (#71, #68)
  • added docker monitoring stack (grafana / cadvisor / influx) - http://devapi.dbs.bh.org.il:3180/dashboard/db/cadvisor
  • many fixes to deal with memory problems and other bugs in the clearmash pipeline
  • clearmash: limit downloading of related items to items that have a collection and fewer than 50 related items
  • add main image url for photoUnits (part of #24)
  • #78 improved handling and deletion of not allowed items
  • #76 added support for images attached to each doc
    • images field is synced to elasticsearch as part of the common schema
    • it contains an array of images, like this: [{"url": "http://url.to.image", "thumbnail_url": "http://url.to.thumbnail.image"}, ... ]
    • Modify Bagnowka pipeline (download and convert process) to populate the image and thumbnail urls.
  • added support for detecting collection by template_id (#72)
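
As a rough illustration of the images field from #76, the sketch below builds the array in the shape shown above. The function name `build_images_field` and the input format (pairs of url and thumbnail url) are assumptions made for this example and are not taken from the pipeline code.

```python
def build_images_field(image_pairs):
    """Build the `images` array from (url, thumbnail_url) pairs.

    Hypothetical helper: only the output shape (a list of dicts with
    "url" and "thumbnail_url" keys) comes from the release notes.
    """
    return [
        {"url": url, "thumbnail_url": thumbnail_url}
        for url, thumbnail_url in image_pairs
    ]


# Example document carrying the images attribute (the id is made up).
doc = {
    "id": "example_1",
    "images": build_images_field(
        [("http://url.to.image", "http://url.to.thumbnail.image")]
    ),
}
```

An item without images would simply get an empty array here; whether the real pipeline uses an empty array or omits the field is not stated in these notes.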

DB / schema changes

  • added column display_allowed to clearmash-entities table
  • added images attribute in clearmash (index mapping should be updated in Beit-Hatfutsot/dbs-back#215 before deploying)

0.1.3

26 Jul 13:35
  • #14 clearmash api and sync should support start date for images
  • #45 / #58 / #59 - clearmash api should support field types: related documents, child documents and media galleries
  • #49 added support for related_documents field in the common data source and in clearmash
    • field name in Elasticsearch is related_documents_* where * is source specific category / type of related document
    • for clearmash it is the field id, e.g.: related_documents__c6_beit_hatfutsot_bh_base_template_related_place
    • value could be null, string of single doc, or array of docs
    • each doc is represented by the main ES doc id (e.g. clearmash_261783)
  • #64 refactor to pipelines structure, might solve the photoUnits problem
    • see the PR for more details: #64
  • add postgresql DB + elasticsearch services + misc. docker improvements
  • #61 modify clearmash processors to allow efficient download of related docs
  • disabled the automatic deploy on push to master because we have long running pipelines which are interrupted on deploy
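
Because a related_documents_* value (#49) can be null, a single doc id string, or an array of doc ids, a consumer of the Elasticsearch documents may want to normalize it first. The following is a minimal sketch of that normalization; the function name `related_doc_ids` is hypothetical and the code is not part of the pipelines.

```python
def related_doc_ids(doc, prefix="related_documents_"):
    """Return {field_name: [doc ids]} for every related_documents_* field.

    Handles the three value shapes listed in the release notes:
    null, a single doc id string, or an array of doc ids.
    """
    result = {}
    for key, value in doc.items():
        if not key.startswith(prefix):
            continue
        if value is None:
            result[key] = []
        elif isinstance(value, str):
            result[key] = [value]
        else:
            result[key] = list(value)
    return result


# Field name and doc id taken from the examples in the notes above.
example = {
    "title": "some item",
    "related_documents__c6_beit_hatfutsot_bh_base_template_related_place": "clearmash_261783",
}
related = related_doc_ids(example)
```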

0.1.2

12 Jul 14:30
  • #25 Update api to get related docs of an item by field
  • #29 change clearmash pipeline to have a separate pipeline for each folder / collection
  • #19 add slugs for all items (#31)
  • #33 add slugs attribute for ES with all slugs from all languages
  • #34 add title_suggest field ensured to always have a value with min length of 1 (if no title, will use value of _)
  • #17 only sync items allowed to be shown
    • clearmash: only processes items that have right permissions (based on the old BHP logic, adjusted for CM)
    • common docs: only sync items that have content in either he or en
  • #41 ensure all processors provide details about which document failed when raising exceptions
  • added logging of all clearmash api calls in the pipelines dashboard log
  • fix failure of pipeline due to items missing many fields (#39)
  • allow running a processor only for specific items for debugging (using the CLEARMASH_OVERRIDE_ITEM_IDS environment variable)
  • #15 delete items which weren't synced
    • delete processor runs after sync, it aggregates all ids which were synced, then compares with all ids in ES and deletes any items which were not in the sync
    • this assumes we download all the data on every sync run (which we do for clearmash)
  • #35 ensure slug uniqueness (if a slug conflict is found, the item id is prepended to the slug)
  • sync to ES should save dicts as json
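
The delete step (#15) and the slug uniqueness rule (#35) can be sketched roughly as follows. This illustrates the logic described above and is not the actual processor code: the function names are hypothetical, and the underscore separator between item id and slug is an assumption.

```python
def ids_to_delete(synced_ids, es_ids):
    """Ids present in Elasticsearch but missing from the latest sync.

    Only valid when every sync run downloads all the data, as the
    notes say is the case for clearmash.
    """
    return set(es_ids) - set(synced_ids)


def ensure_unique_slug(slug, item_id, existing_slugs):
    """Prepend the item id to the slug when the slug is already taken.

    The "_" separator is an assumption; the notes only say the item id
    is prepended. `existing_slugs` is a set that is updated in place.
    """
    if slug in existing_slugs:
        slug = "{}_{}".format(item_id, slug)
    existing_slugs.add(slug)
    return slug


stale = ids_to_delete(
    synced_ids=["clearmash_1", "clearmash_2"],
    es_ids=["clearmash_1", "clearmash_2", "clearmash_3"],
)
```

Computing the difference only after the whole sync has finished matters here: running it against a partial sync would mark items that were not yet processed for deletion.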

deployment

  • first, deploy dbs-back 0.13.5 - and create the new index
  • then, deploy normally (merge)
  • check travis to ensure it deployed to dev

0.1.1

03 Jul 15:06
  • Add lower-case titles for all available languages.
  • travis: deploy to dev on push to master

deployment (to dev)

gcloud compute ssh bhs-dev-3
rm mojp-dbs-pipelines
git clone https://github.com/Beit-Hatfutsot/mojp-dbs-pipelines
cd mojp-dbs-pipelines
cp ../mojp-dbs-pipelines-0.0.2/docker-compose.override.yml ./
make docker-build
make docker-start
ssh-keygen -t rsa -b 4096 -C "deploy-mojp-dbs-pipelines" -f /home/bhs/deploy-mojp-dbs-pipelines.id_rsa
echo "command=\"/home/bhs/deploy-mojp-dbs-pipelines.sh \$SSH_ORIGINAL_COMMAND\" $(cat /home/bhs/deploy-mojp-dbs-pipelines.id_rsa.pub)" >> ~/.ssh/authorized_keys
echo '#!/usr/bin/env bash
cd /home/bhs/mojp-dbs-pipelines
make deploy' > /home/bhs/deploy-mojp-dbs-pipelines.sh
chmod +x /home/bhs/deploy-mojp-dbs-pipelines.sh

v0.1.0

14 Jun 14:19

deployment to dev environment

gcloud compute ssh bhs-dev-3
sudo su -l bhs
wget https://github.com/Beit-Hatfutsot/mojp-dbs-pipelines/archive/v0.0.2.tar.gz -O mojp-dbs-pipelines-v0.0.2.tar.gz
tar -xzvf mojp-dbs-pipelines-v0.0.2.tar.gz
ln -s mojp-dbs-pipelines-0.0.2/ mojp-dbs-pipelines
sudo apt-get remove docker docker-engine docker-compose docker.io
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable"
sudo apt-get update
sudo apt-get install docker-ce
curl -L https://github.com/docker/compose/releases/download/1.13.0/docker-compose-`uname -s`-`uname -m` | sudo tee /usr/local/bin/docker-compose > /dev/null
sudo chmod +x /usr/local/bin/docker-compose
sudo usermod -aG docker $USER
sudo su -l $USER
docker ps
cd ~/mojp-dbs-pipelines
make docker-build
cp docker-compose.override.yml{.example,}
nano docker-compose.override.yml

I had to set the IP address of the elasticsearch server because there was a problem with using the internal DNS name. It currently writes to bhs-dev-db, index next-mojp-dev.

v0.0.2

14 Jun 13:43
  • #10 finalize clearmash (not including persons) + docker for datapackage-pipelines framework

v0.0.1

01 May 09:50
  • adding more pipelines and tests, refactoring
    • support for jinja2 pipeline fixtures
    • MOJP_PIPELINES_ENV var which allows switching / forcing .env files
    • use the above env var to ensure the same environment is used for tests