DocStore

DocStore is a database backend API facilitating access to MongoDB.

CESSDA Metadata Aggregator - Document Store


HTTP server providing an API in front of a MongoDB cluster. This program is part of the CESSDA Metadata Aggregator.

Source code is hosted on GitHub: https://github.com/cessda/cessda.cdc.aggregator.doc-store

Features

DocStore provides CRUD access to records through a REST API. It also features a flexible Query API for filtering records.

DocStore feature list:

  • REST API for full control of records.

  • Query API for flexible filtering of records.

  • Logical deletions.

  • Streaming responses.

  • Support for MongoDB replicas.

  • Helper script to ease initial database setup.
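
As a rough illustration of the REST API listed above, the following sketch builds (but does not send) requests using only the Python standard library. The endpoint path is taken from the 0.1.0 changelog entry. The base URL `http://localhost:5001`, the helper names and the resource id are assumptions made for the example, not part of DocStore.

```python
import urllib.request

# Assumption: host and port of a running DocStore instance; adjust to your setup.
BASE_URL = "http://localhost:5001"


def study_url(resource_id, base_url=BASE_URL):
    """Return the URL of a single study record (hypothetical helper)."""
    return f"{base_url}/v0/studies/{resource_id}"


def build_request(method, resource_id, base_url=BASE_URL):
    """Build, without sending, a body-less request for one study record."""
    if method not in ("GET", "DELETE"):
        raise ValueError(f"unsupported method in this sketch: {method}")
    return urllib.request.Request(study_url(resource_id, base_url), method=method)


# Sending a request needs a running server, for example:
#     with urllib.request.urlopen(build_request("GET", "some-id")) as response:
#         print(response.read())
```

POST and PUT are left out because they need a request body, and the record format is not described here.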

Requirements

  • Python 3.8 or newer.

  • MongoDB 3.6 or newer.
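
The minimum versions above can be checked before installation. The sketch below is one possible way to do so; the helper names are hypothetical and the parser assumes plain dotted numeric version strings such as `3.6.8`.

```python
import sys

MIN_PYTHON = (3, 8)
MIN_MONGODB = (3, 6)


def parse_version(text):
    """Turn a dotted version string such as '3.6.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def meets_minimum(version, minimum):
    """Return True when the version tuple is at least the minimum tuple."""
    return tuple(version) >= tuple(minimum)


# True when the running interpreter satisfies the Python requirement.
python_ok = meets_minimum(sys.version_info[:2], MIN_PYTHON)
```

The MongoDB version string could come from `mongod --version` and be passed to `parse_version` and then `meets_minimum(..., MIN_MONGODB)`.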

Installation

On Ubuntu 20.04

Database setup

This is an example setup with a single virtual machine running three MongoDB replicas.

  1. Install MongoDB

    sudo apt install mongodb
    
  2. Create directories for replica data

    sudo mkdir /var/lib/mongodb/{r1,r2,r3}
    sudo chown mongodb:mongodb /var/lib/mongodb/{r1,r2,r3}
    sudo chmod 0755 /var/lib/mongodb/{r1,r2,r3}
    
  3. Configure the single MongoDB instance to use the r1 replica data directory.

    sudo sed -i 's#dbpath=/var/lib/mongodb#dbpath=/var/lib/mongodb/r1#' /etc/mongodb.conf
    
  4. Restart mongodb.service

    sudo systemctl restart mongodb.service
    
  5. Create the rootadmin user using the mongo shell. Replace 'password' with a strong password of your own.

    mongo
    use admin
    db.createUser({user: 'rootadmin', pwd: 'password', roles: [{role: 'root', db: 'admin'}]})
    exit
    
  6. Stop & disable mongodb.service

    sudo systemctl stop mongodb.service
    sudo systemctl disable mongodb.service
    
  7. Create directory for mongodb replica configuration

    sudo mkdir /etc/mongodb
    sudo chmod 0755 /etc/mongodb
    
  8. Generate keyfile for replica authentication

    sudo openssl rand -base64 756 | sudo tee /var/lib/mongodb/auth_key
    sudo chown mongodb:mongodb /var/lib/mongodb/auth_key
    sudo chmod 0600 /var/lib/mongodb/auth_key
    
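  9. Configure replicas. Example for /etc/mongodb/r1.conf. Create r2.conf and r3.conf the same way, giving each replica its own dbPath, log path and port (the commands later in this document use ports 27018 and 27019).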

    storage:
      dbPath: /var/lib/mongodb/r1
      journal:
        enabled: true
    
    systemLog:
      destination: file
      logAppend: true
      path: /var/lib/mongodb/r1.log
    
    net:
      port: 27017
      bindIp: 0.0.0.0
    
    processManagement:
      timeZoneInfo: /usr/share/zoneinfo
    
    security:
      authorization: enabled
      keyFile: /var/lib/mongodb/auth_key
    
    replication:
      replSetName: rs_cdcagg
    
  10. Ensure permissions

    sudo chmod 0644 /etc/mongodb/{r1,r2,r3}.conf
    
  11. Create systemd units for replicas. Example for /etc/systemd/system/mongod_r1.service. Create matching units for r2 and r3 that point to their own configuration and PID files.

    [Unit]
    Description=MongoDB Database Server
    Documentation=https://docs.mongodb.org/manual
    After=network.target
    
    [Service]
    Type=simple
    User=mongodb
    Group=mongodb
    ExecStart=/usr/bin/mongod --config /etc/mongodb/r1.conf
    Restart=always
    PIDFile=/var/run/mongodb/mongod_r1.pid
    # file size
    LimitFSIZE=infinity
    # cpu time
    LimitCPU=infinity
    # virtual memory size
    LimitAS=infinity
    # open files
    LimitNOFILE=64000
    # processes/threads
    LimitNPROC=64000
    # locked memory
    LimitMEMLOCK=infinity
    # total threads (user+kernel)
    TasksMax=infinity
    TasksAccounting=false
    # Recommended limits for mongod as specified in
    # http://docs.mongodb.org/manual/reference/ulimit/#recommended-settings
    
    [Install]
    WantedBy=multi-user.target
    
  12. Ensure permissions

    sudo chmod 0644 /etc/systemd/system/mongod_r{1,2,3}.service
    
  13. Enable replica services

    sudo systemctl enable mongod_r1.service
    sudo systemctl enable mongod_r2.service
    sudo systemctl enable mongod_r3.service
    
  14. Reload systemd manager configuration

    sudo systemctl daemon-reload
    
  15. Start services

    sudo systemctl start mongod_r1.service
    sudo systemctl start mongod_r2.service
    sudo systemctl start mongod_r3.service
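
The three replica configuration files from step 9 differ only in the replica name and port, so they can be generated from one template. The sketch below mirrors the r1.conf example. Ports 27018 and 27019 for r2 and r3 are inferred from the `--replica` arguments used later in this document, and the function name is hypothetical.

```python
from string import Template

# Template mirroring the r1.conf example; only ${name} and ${port} vary.
CONF_TEMPLATE = Template("""\
storage:
  dbPath: /var/lib/mongodb/${name}
  journal:
    enabled: true

systemLog:
  destination: file
  logAppend: true
  path: /var/lib/mongodb/${name}.log

net:
  port: ${port}
  bindIp: 0.0.0.0

processManagement:
  timeZoneInfo: /usr/share/zoneinfo

security:
  authorization: enabled
  keyFile: /var/lib/mongodb/auth_key

replication:
  replSetName: rs_cdcagg
""")


def render_configs(first_port=27017, count=3):
    """Return {file name: config text} for replicas r1..r<count>."""
    configs = {}
    for index in range(count):
        name = f"r{index + 1}"
        configs[f"{name}.conf"] = CONF_TEMPLATE.substitute(
            name=name, port=first_port + index
        )
    return configs
```

The function only returns the text. Writing the files to /etc/mongodb requires root privileges and is left to the operator.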
    

Get Package

Clone the repository using Git.

git clone https://github.com/cessda/cessda.cdc.aggregator.doc-store.git

Or fetch a specific release by its tag. For example, to get the 0.2.0 release:

git clone --branch 0.2.0 https://github.com/cessda/cessda.cdc.aggregator.doc-store.git

Install DocStore

It is recommended to install the packages inside a Python virtual environment to isolate the installation. This package also provides a Dockerfile to help set up a containerized environment.

Create the Python virtual environment and activate it.

python3 -m venv cdcagg-env
source cdcagg-env/bin/activate
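
Before installing packages, it can be useful to confirm that the virtual environment is really active. A minimal sketch, assuming the environment was created with `venv` as above (the function name is hypothetical):

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix
```

Running this with the activated environment's `python` should return `True`; with the system interpreter it returns `False`.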

Install Python packages.

cd cessda.cdc.aggregator.doc-store
pip install -r requirements.txt
pip install .

To upgrade an existing installation, add the --upgrade flag to the pip commands. Pip has used the only-if-needed upgrade strategy by default since version 10.0.0, but the example sets the option explicitly for backwards compatibility.

pip install --upgrade -r requirements.txt --upgrade-strategy=only-if-needed
pip install . --upgrade --upgrade-strategy=only-if-needed

Run application database setup

Replace <ip> with the IP address of the MongoDB virtual machine.

python -m cdcagg_docstore.db_admin --replica "<ip>:27017" --replica "<ip>:27018" --replica "<ip>:27019" initiate_replicaset setup_database setup_collections setup_users
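
When the setup is scripted, the same invocation can be assembled programmatically. The sketch below only builds the argument list from the command shown above; the function name and the placeholder address `192.0.2.10` are assumptions for illustration.

```python
import sys

# Operations taken from the command line shown above, in the same order.
OPERATIONS = [
    "initiate_replicaset",
    "setup_database",
    "setup_collections",
    "setup_users",
]


def db_admin_command(ip, ports=(27017, 27018, 27019), operations=OPERATIONS):
    """Build the argument list for the db_admin invocation."""
    command = [sys.executable, "-m", "cdcagg_docstore.db_admin"]
    for port in ports:
        command += ["--replica", f"{ip}:{port}"]
    return command + list(operations)


# Executing it requires the package to be installed and MongoDB to be running:
#     import subprocess
#     subprocess.run(db_admin_command("192.0.2.10"), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues around the `--replica` values.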

Database setup configuration reference

python -m cdcagg_docstore.db_admin --help

Running the server

Replace <ip> with the IP address of the MongoDB host.

python -m cdcagg_docstore --replica "<ip>:27017" --replica "<ip>:27018" --replica "<ip>:27019"
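
If the server fails to start, a first thing to rule out is that a replica is not accepting connections. The sketch below checks plain TCP reachability of each replica port with the standard library. It does not verify authentication or replica set health, and the function names are hypothetical.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_replicas(ip, ports=(27017, 27018, 27019)):
    """List the '<ip>:<port>' replicas that do not accept TCP connections."""
    return [f"{ip}:{port}" for port in ports if not is_reachable(ip, port)]
```

An empty list from `unreachable_replicas("<ip>")` means that all three ports accepted a connection.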

Server configuration reference

python -m cdcagg_docstore --help

License

See the LICENSE file.

Changelog

All notable changes to the CDC Aggregator DocStore will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

0.5.0 - 2024-04-30

Added

  • Support external_link, external_link_role, external_link_uri and external_link_title attributes in Study.principal_investigator.

Changed

  • Update dependencies:

    • Require CDC Aggregator Shared Library 0.7.0 in setup.py and requirements.txt. (Implements #31)

    • Require Kuha Common 2.4.0 or newer in setup.py and 2.4.0 in requirements.txt. (Implements #31)

    • Require Kuha Document Store 1.3.0 in setup.py and requirements.txt. (Implements #31)

0.4.0 - 2023-11-24

Added

  • Support study._direct_base_url (Implements #27)

0.3.0 - 2022-11-21

Added

  • Support grant & funding information and identifiers for related publications in studies collection. (Implements #20)

Changed

  • Update dependencies:

    • Require CDC Aggregator Shared Library 0.5.0 in setup.py and requirements.txt.

    • Require Kuha Common 2.0.0 or newer in setup.py and 2.0.1 in requirements.txt.

    • Require Kuha Document Store 1.1.0 in setup.py and requirements.txt.

    • Require tornado 6.2.0 in requirements.txt.

0.2.0 - 2021-12-17


Changed

  • Implement CDCAggDatabase._prepare_validation_schema(), which returns the validation schema for Study record. (Implements #14)

  • Require latest commit of Kuha Document Store master (Implements #14)

  • Update dependencies in requirements.txt.

    • ConfigArgParse 1.5.3

    • python-dateutil 2.8.2

    • Motor 2.5.1

    • PyMongo 3.12.0

    • Cerberus 1.3.4

    • Kuha Common to Git commit 8e7de1f16530decc356fee660255b60fcacaea23

    • Kuha Document Store to Git commit 31b277685fd7568032d037db4334cb15da2a28da

    • CDC Aggregator Shared Library 0.2.0

Added

  • Validation and indexing of Study record’s _aggregator_identifier field to MongoDB. (Fixes #13)

0.1.0 - 2021-09-21

Added

  • New codebase for CDC Aggregator DocStore.

  • HTTP API in front of a MongoDB cluster.

  • RESTful endpoint ‘/v0/studies/<resource_id>’ with support for GET, POST, PUT and DELETE.

  • Query endpoint ‘/v0/query/studies’ for SELECT, COUNT and DISTINCT types of DB queries.

  • Admin module to ease DB setup.