DocStore
DocStore is a database backend API facilitating access to MongoDB.
CESSDA Metadata Aggregator - Document Store
HTTP server providing an API in front of a MongoDB cluster. This program is part of CESSDA Metadata Aggregator.
Source code is hosted on GitHub: https://github.com/cessda/cessda.cdc.aggregator.doc-store
Features
DocStore provides CRUD access to the records via REST API. It also features a flexible Query API for filtering records.
DocStore feature list:
REST API for full control of records.
Query API for flexible filtering of records.
Logical deletions.
Streaming responses.
Support for MongoDB replicas.
Helper script to ease initial database setup.
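As a minimal sketch of how a client might address the REST API, the snippet below builds (without sending) requests for the studies endpoint using only the Python standard library. The path /v0/studies/<resource_id> and the four HTTP methods come from the 0.1.0 changelog entry. The base URL, the helper name build_study_request and the JSON request body are assumptions to adapt to the actual deployment.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address where DocStore listens; change to match your deployment.
BASE_URL = "http://localhost:6001"

# Methods listed for the studies endpoint in the 0.1.0 changelog entry.
ALLOWED_METHODS = ("GET", "POST", "PUT", "DELETE")


def build_study_request(method, resource_id, body=None, base_url=BASE_URL):
    """Build, but do not send, a request for /v0/studies/<resource_id>."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    url = f"{base_url}/v0/studies/{urllib.parse.quote(resource_id, safe='')}"
    data = None
    headers = {}
    if body is not None:
        # Assumption: record payloads are sent as JSON.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


if __name__ == "__main__":
    request = build_study_request("GET", "some-resource-id")
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen() would send it to a running server.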
Requirements
Python 3.8 or newer.
MongoDB 3.6 or newer.
Installation
On Ubuntu 20.04
Database setup
This example sets up a single virtual machine running three MongoDB replicas.
Install MongoDB
sudo apt install mongodb
Create directories for replica data
sudo mkdir /var/lib/mongodb/{r1,r2,r3}
sudo chown mongodb:mongodb /var/lib/mongodb/{r1,r2,r3}
sudo chmod 0755 /var/lib/mongodb/{r1,r2,r3}
Configure the single MongoDB instance to use the r1 replica data directory.
sudo sed -i 's#dbpath=/var/lib/mongodb#dbpath=/var/lib/mongodb/r1#' /etc/mongodb.conf
Restart mongodb.service
sudo systemctl restart mongodb.service
Create rootadmin user using the mongo shell
mongo
use admin
db.createUser({user: 'rootadmin', pwd: 'password', roles: [{role: 'root', db: 'admin'}]})
exit
Stop & disable mongodb.service
sudo systemctl stop mongodb.service
sudo systemctl disable mongodb.service
Create directory for mongodb replica configuration
sudo mkdir /etc/mongodb
sudo chmod 0755 /etc/mongodb
Generate keyfile for replica authentication
sudo openssl rand -base64 756 | sudo tee /var/lib/mongodb/auth_key
sudo chown mongodb:mongodb /var/lib/mongodb/auth_key
sudo chmod 0600 /var/lib/mongodb/auth_key
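For environments where openssl is unavailable, the keyfile content could be produced with the Python standard library instead. The following is a sketch, not part of DocStore: generate_keyfile_text and write_keyfile are hypothetical helper names. It encodes 756 random bytes as base64 wrapped at 64 columns, which mirrors what the openssl command above emits, and creates the file with mode 0600. Ownership still has to be set with chown as shown above.

```python
import base64
import os
import secrets


def generate_keyfile_text(num_bytes=756):
    """Return base64 text for num_bytes random bytes, wrapped at 64 columns."""
    encoded = base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")
    lines = [encoded[i:i + 64] for i in range(0, len(encoded), 64)]
    return "\n".join(lines) + "\n"


def write_keyfile(path, text):
    """Create or truncate the file; a newly created file gets mode 0600."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)


if __name__ == "__main__":
    # Hypothetical output path; the guide above uses /var/lib/mongodb/auth_key.
    write_keyfile("auth_key", generate_keyfile_text())
```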
Configure the replicas. Example for /etc/mongodb/r1.conf. Create r2.conf and r3.conf the same way, giving each replica its own dbPath, log path and port.
storage:
  dbPath: /var/lib/mongodb/r1
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/lib/mongodb/r1.log
net:
  port: 27017
  bindIp: 0.0.0.0
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
security:
  authorization: enabled
  keyFile: /var/lib/mongodb/auth_key
replication:
  replSetName: rs_cdcagg
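Since only the r1 file is shown, the remaining two configurations can be derived by substitution. The sketch below renders all three files from one template. It assumes that r2 and r3 listen on ports 27018 and 27019 (the addresses passed to the database setup command later in this guide) and that their data and log paths follow the r1 naming pattern. render_configs is a hypothetical helper name.

```python
CONFIG_TEMPLATE = """\
storage:
  dbPath: /var/lib/mongodb/{name}
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/lib/mongodb/{name}.log
net:
  port: {port}
  bindIp: 0.0.0.0
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
security:
  authorization: enabled
  keyFile: /var/lib/mongodb/auth_key
replication:
  replSetName: rs_cdcagg
"""

# Assumption: one port per replica, matching the --replica addresses
# used when running the database setup.
REPLICAS = {"r1": 27017, "r2": 27018, "r3": 27019}


def render_configs(replicas=REPLICAS):
    """Map each target config path to its rendered YAML text."""
    return {
        f"/etc/mongodb/{name}.conf": CONFIG_TEMPLATE.format(name=name, port=port)
        for name, port in replicas.items()
    }


if __name__ == "__main__":
    # Print instead of writing, so the output can be reviewed first.
    for path, text in render_configs().items():
        print(f"# {path}")
        print(text)
```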
Ensure permissions
sudo chmod 0644 /etc/mongodb/{r1,r2,r3}.conf
Create systemd units for the replicas. Example for /etc/systemd/system/mongod_r1.service. Create the r2 and r3 units the same way, pointing each at its own configuration file and PID file.
[Unit]
Description=MongoDB Database Server
Documentation=https://docs.mongodb.org/manual
After=network.target

[Service]
Type=simple
User=mongodb
Group=mongodb
ExecStart=/usr/bin/mongod --config /etc/mongodb/r1.conf
Restart=always
PIDFile=/var/run/mongodb/mongod_r1.pid
# file size
LimitFSIZE=infinity
# cpu time
LimitCPU=infinity
# virtual memory size
LimitAS=infinity
# open files
LimitNOFILE=64000
# processes/threads
LimitNPROC=64000
# locked memory
LimitMEMLOCK=infinity
# total threads (user+kernel)
TasksMax=infinity
TasksAccounting=false
# Recommended limits for mongod as specified in
# http://docs.mongodb.org/manual/reference/ulimit/#recommended-settings

[Install]
WantedBy=multi-user.target
Ensure permissions
sudo chmod 0644 /etc/systemd/system/mongod_r{1,2,3}.service
Enable replica services
sudo systemctl enable mongod_r1.service
sudo systemctl enable mongod_r2.service
sudo systemctl enable mongod_r3.service
Reload systemd manager configuration
sudo systemctl daemon-reload
Start services
sudo systemctl start mongod_r1.service
sudo systemctl start mongod_r2.service
sudo systemctl start mongod_r3.service
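Before continuing, it can help to confirm that each replica accepts TCP connections. The sketch below is a plain socket probe using the standard library. It does not authenticate or speak the MongoDB protocol, so it only shows that something is listening. The ports 27017 to 27019 are assumed from the replica addresses used later in this guide, and port_is_open and check_replicas are hypothetical helper names.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_replicas(host, ports=(27017, 27018, 27019)):
    """Probe each assumed replica port and report its reachability."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    for port, is_open in check_replicas("127.0.0.1").items():
        print(port, "open" if is_open else "closed")
```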
Get Package
Clone the repository using Git.
git clone https://github.com/cessda/cessda.cdc.aggregator.doc-store.git
Alternatively, fetch a specific release by its tag. For example, to get the 0.2.0 release:
git clone --branch 0.2.0 https://github.com/cessda/cessda.cdc.aggregator.doc-store.git
Install DocStore
It is recommended to install packages inside a Python virtual environment to isolate the installation. This package also provides a Dockerfile to help set up a containerized environment.
Create the Python virtual environment and activate it.
python3 -m venv cdcagg-env
source cdcagg-env/bin/activate
Install Python packages.
cd cessda.cdc.aggregator.doc-store
pip install -r requirements.txt
pip install .
To upgrade an existing installation, add the --upgrade flag to the pip commands. Pip has used the only-if-needed upgrade strategy by default since version 10.0.0, but the option is included in the example for backwards compatibility.
pip install --upgrade -r requirements.txt --upgrade-strategy=only-if-needed
pip install . --upgrade --upgrade-strategy=only-if-needed
Run application database setup
Replace <ip> with the IP address of the MongoDB virtual machine.
python -m cdcagg_docstore.db_admin --replica "<ip>:27017" --replica "<ip>:27018" --replica "<ip>:27019" initiate_replicaset setup_database setup_collections setup_users
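When the setup is scripted, the argument list can be assembled programmatically rather than typed by hand. The sketch below builds the same command as above from an IP address. The --replica option and the four operation names are taken from that command, while db_admin_argv is a hypothetical helper and the default ports are assumed to match the three replicas configured earlier.

```python
import shlex
import sys

# Operations in the order used by the setup command above.
SETUP_OPERATIONS = [
    "initiate_replicaset",
    "setup_database",
    "setup_collections",
    "setup_users",
]


def db_admin_argv(ip, ports=(27017, 27018, 27019), operations=SETUP_OPERATIONS):
    """Return the argument list for running cdcagg_docstore.db_admin."""
    # sys.executable keeps the call inside the active virtual environment.
    argv = [sys.executable, "-m", "cdcagg_docstore.db_admin"]
    for port in ports:
        argv += ["--replica", f"{ip}:{port}"]
    return argv + list(operations)


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run() to execute it.
    print(shlex.join(db_admin_argv("192.0.2.10")))
```

The address 192.0.2.10 is a documentation placeholder, not a real host.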
Database setup configuration reference
python -m cdcagg_docstore.db_admin --help
Running the server
Replace <ip> with the IP address of the MongoDB host.
python -m cdcagg_docstore --replica "<ip>:27017" --replica "<ip>:27018" --replica "<ip>:27019"
Server configuration reference
python -m cdcagg_docstore --help
License
See the LICENSE file.
Changelog
All notable changes to the CDC Aggregator DocStore will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
0.5.0 - 2024-04-30
Added
Support external_link, external_link_role, external_link_uri and external_link_title attributes in Study.principal_investigator.
Changed
0.4.0 - 2023-11-24
Added
Support study._direct_base_url (Implements #27)
0.3.0 - 2022-11-21
Added
Support grant & funding information and identifiers for related publications in the studies collection. (Implements #20)
Changed
Update dependencies:
Require CDC Aggregator Shared Library 0.5.0 in setup.py and requirements.txt.
Require Kuha Common 2.0.0 or newer in setup.py and 2.0.1 in requirements.txt.
Require Kuha Document Store 1.1.0 in setup.py and requirements.txt.
Require tornado 6.2.0 in requirements.txt.
0.2.0 - 2021-12-17
Changed
Implement CDCAggDatabase._prepare_validation_schema(), which returns the validation schema for Study record. (Implements #14)
Require latest commit of Kuha Document Store master. (Implements #14)
Update dependencies in requirements.txt.
ConfigArgParse 1.5.3
python-dateutil 2.8.2
Motor 2.5.1
PyMongo 3.12.0
Cerberus 1.3.4
Kuha Common to Git commit 8e7de1f16530decc356fee660255b60fcacaea23
Kuha Document Store to Git commit 31b277685fd7568032d037db4334cb15da2a28da
CDC Aggregator Shared Library 0.2.0
Added
Validation and indexing of Study record’s _aggregator_identifier field to MongoDB. (Fixes #13)
0.1.0 - 2021-09-21
Added
New codebase for CDC Aggregator DocStore.
HTTP API in front of a MongoDB cluster.
RESTful endpoint ‘/v0/studies/<resource_id>’ with support for GET, POST, PUT and DELETE.
Query endpoint ‘/v0/query/studies’ for SELECT, COUNT and DISTINCT types of DB queries.
Admin module to ease DB setup.