Client

Client provides a command line application for synchronizing records from XML files to DocStore.

CESSDA Metadata Aggregator - Client

Command line client for synchronizing records to CESSDA Metadata Aggregator DocStore. This program is part of the CESSDA Metadata Aggregator.

The source code is hosted on GitHub: https://github.com/cessda/cessda.cdc.aggregator.client.

Features

  • Synchronize a folder of XML files recursively to DocStore.

  • Keep a file-cache to speed up consecutive synchronization runs.

  • Supports DDI 1.2.2, DDI 2.5, DDI 3.1 and DDI 3.3 XML files.
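The file-cache feature can be illustrated with a minimal sketch. It assumes the cache is a pickled dict that maps each XML file path to its last modification time, so a file is handed to DocStore again only when its mtime has changed. The real cache format and logic of cdcagg_client are not described here; the helpers load_cache, save_cache and files_to_sync are hypothetical.

```python
import pickle
from pathlib import Path


def load_cache(cache_path: Path) -> dict:
    """Return the cached {path: mtime} mapping, or an empty dict on the first run."""
    # Hypothetical cache format: a pickled dict mapping str paths to float mtimes.
    if not cache_path.exists():
        return {}
    with cache_path.open("rb") as f:
        return pickle.load(f)


def save_cache(cache_path: Path, cache: dict) -> None:
    """Persist the {path: mtime} mapping for the next run."""
    with cache_path.open("wb") as f:
        pickle.dump(cache, f)


def files_to_sync(source_dir: Path, cache: dict) -> list:
    """Walk source_dir recursively and return (path, mtime) pairs for XML files
    that are new or have changed since they were last recorded in the cache."""
    changed = []
    for path in sorted(source_dir.rglob("*.xml")):
        mtime = path.stat().st_mtime
        if cache.get(str(path)) != mtime:
            changed.append((path, mtime))
    return changed
```

In this sketch a caller would synchronize each returned file, set cache[str(path)] = mtime only after the upload succeeds, and then call save_cache, so files that failed are retried on the next run. It also shows why the changelog advises removing the file-cache after some upgrades: unchanged files would otherwise never be re-read.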

Requirements

  • Python 3.8 or newer.

  • A running CESSDA Metadata Aggregator DocStore instance.

Installation

On Ubuntu 20.04

Get Package

Clone the repository using Git.

git clone https://github.com/cessda/cessda.cdc.aggregator.client

Alternatively, fetch a specific release by its tag. For example, to get the 0.2.0 release:

git clone --branch 0.2.0 https://github.com/cessda/cessda.cdc.aggregator.client

Install Client

It is recommended to install the packages inside a Python virtual environment to isolate the installation. The repository also provides a Dockerfile to help set up a containerized environment.

Create the Python virtual environment and activate it.

python3 -m venv cdcagg-env
source cdcagg-env/bin/activate

Install Python packages.

cd cessda.cdc.aggregator.client
pip install -r requirements.txt
pip install .

To upgrade an existing installation, use the --upgrade flag in the pip commands. Since version 10.0.0, pip uses the only-if-needed upgrade strategy by default; the option is included in the example for compatibility with older pip versions.

pip install --upgrade -r requirements.txt --upgrade-strategy=only-if-needed
pip install . --upgrade --upgrade-strategy=only-if-needed

Run

Replace <docstore-url> with the URL of the CDC Aggregator DocStore server, and <xml-sources> with the path to a folder containing the XML files to synchronize. The file given with --file-cache holds the file-cache between runs.

python -m cdcagg_client.sync --document-store-url <docstore-url> --file-cache file_cache.pickle <xml-sources>

Configuration reference

To list all configuration options, run:

python -m cdcagg_client.sync --help

License

See the LICENSE file.

Changelog

All notable changes to the CDC Aggregator Client will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

0.8.0 - 2024-04-30

Added

  • Map DDI-C /codeBook/stdyDscr/citation/rspStmt/AuthEnty/ExtLink to Study.principal_investigator attributes external_link, external_link_uri, external_link_role and external_link_title.
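The ExtLink mapping can be illustrated with the standard library's xml.etree.ElementTree. This is a hypothetical sketch, not the client's actual parser: it assumes the DDI-C ExtLink element carries URI, role and title attributes (as in DDI Codebook 2.5) and returns plain dicts whose keys mirror three of the Study.principal_investigator attribute names. How external_link itself is filled is not stated in this entry, so the sketch leaves it out. The {*} namespace wildcard requires Python 3.8 or newer, which matches the stated requirement.

```python
import xml.etree.ElementTree as ET

# Path from the changelog entry, relative to the /codeBook root element.
# "{*}" matches a tag in any namespace or in no namespace (Python 3.8+).
AUTH_ENTY_PATH = "./{*}stdyDscr/{*}citation/{*}rspStmt/{*}AuthEnty"


def principal_investigator_links(xml_text):
    """Return one dict per AuthEnty/ExtLink element (keys are illustrative)."""
    root = ET.fromstring(xml_text)  # root element is codeBook
    links = []
    for auth_enty in root.findall(AUTH_ENTY_PATH):
        for ext_link in auth_enty.findall("{*}ExtLink"):
            links.append({
                "external_link_uri": ext_link.get("URI"),
                "external_link_role": ext_link.get("role"),
                "external_link_title": ext_link.get("title"),
            })
    return links
```

Missing attributes come back as None from Element.get, so an ExtLink with only a URI still yields a link entry.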

Changed

  • Update dependencies:

    • Require Aggregator Shared Library 0.7.0 in requirements.txt and setup.py. (Implements #35)

    • Require Kuha Common 2.4.0 in requirements.txt and setup.py. Change URL to point to new source at gitlab.tuni. (Implements #35)

    • Require Kuha Client 1.4.0 in requirements.txt and setup.py. Change URL to point to new source at gitlab.tuni. (Implements #35)

0.7.0 - 2023-11-24

Added

  • Support Study._direct_base_url. (Implements #33)

Changed

  • Update dependencies: Require Aggregator Shared Library 0.6.0 in requirements.txt and setup.py.

0.6.0 - 2023-05-24

Changed

  • Parse DDI-C files with relPubl/citation elements that lack a titlStmt/titl child. Such files result in a study without any related publications. (Implements #27)

  • Update dependencies:

    • Require Kuha Common 2.2.0 in requirements.txt and setup.py.

0.5.0 - 2022-11-21

Added

  • Support grant and funding information and related publication identifiers for studies. (Implements #23)

Changed

  • Update dependencies:

    • Require Aggregator Shared Library 0.5.0 in requirements.txt and setup.py.

    • Require Kuha Common 2.0.1 in requirements.txt and setup.py.

    • Require Kuha Client 1.2.1 in requirements.txt and setup.py.

    • Require Tornado 6.2.0 in requirements.txt.

Fixed

  • Correctly read /codeBook/stdyDscr/stdyInfo/sumDscr/anlyUnit/concept/@vocabURI and /codeBook/stdyDscr/method/dataColl/collMode/concept/@vocabURI from DDI-Codebook XML.

Note: After upgrading, remove the file-cache to make sure all files are re-read and saved to the Document Store.

0.4.0 - 2022-06-29

Added

  • Support reading DDI 3.3 for study level metadata. (Implements #22)

Changed

  • Require Aggregator Shared Library 0.4.0 or newer in setup.py.

  • Require Kuha Common 1.1.0 or newer in setup.py.

  • Require Kuha Client 1.1.0 or newer in setup.py.

  • Update dependencies in requirements.txt:

    • Aggregator Shared Library 0.4.0

    • Kuha Common 1.1.0

    • Kuha Client 1.1.0

0.3.0 - 2022-05-18

Changed

  • Generate Study._aggregator_identifier using OAI-PMH provenance info. (Implements #20)

  • Require CDCAGG Common 0.3.0

0.2.1 - 2022-04-05

Changed

  • Use 1.0.0 releases of Kuha Common and Kuha Client dependencies in requirements.txt.

Fixed

  • Updating an existing record must not change the _aggregator_identifier value. (Fixes #19)

0.2.0 - 2021-12-17

Added

  • Sync entrypoint configuration option --fail-on-parse, which makes processing fail on errors during file parsing.

Changed

  • The default behaviour is now to skip files that cannot be parsed because of a MappingError. Other errors still terminate the processing. The behaviour can be controlled with the --fail-on-parse configuration option. (Fixes #11)

  • Update dependencies in requirements.txt:

    • ConfigArgParse 1.5.3

    • Kuha Common to Git commit 8e7de1f16530decc356fee660255b60fcacaea23

    • Kuha Client to Git commit 46ba0501e92f6db3475d721344f456627c01f459

    • Aggregator Shared Library 0.2.0
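The 0.2.0 parse-error handling described above can be sketched as follows. MappingError, parse_file and sync_files are hypothetical stand-ins rather than the client's real names: a mapping failure skips the file unless fail_on_parse is set, while any other exception is never caught and so always terminates the run.

```python
import logging

logger = logging.getLogger(__name__)


class MappingError(Exception):
    """Hypothetical stand-in for the error raised when a file cannot be mapped."""


def sync_files(paths, parse_file, fail_on_parse=False):
    """Parse each path with parse_file and return the successfully parsed records.

    A MappingError skips the file by default and is re-raised when
    fail_on_parse is True. Other exceptions are not caught here, so they
    always stop processing.
    """
    records = []
    for path in paths:
        try:
            records.append(parse_file(path))
        except MappingError:
            if fail_on_parse:
                raise
            logger.warning("Skipping %s: file could not be mapped", path)
    return records
```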

Fixed

  • Correct the query for records with a match in provenance information. This query is used to find duplicate records. A record is now considered a duplicate if its list of provenances contains a matching item (a baseUrl + identifier combination). The new record always overwrites the old one. (Fixes #12)
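The duplicate rule from this fix can be expressed as a set intersection over (baseUrl, identifier) pairs. The sketch assumes each provenance item is a dict with base_url and identifier keys; those key names and the helpers provenance_keys, is_duplicate and overwrite are hypothetical, and the client itself performs the check as a DocStore query. overwrite also keeps the existing _aggregator_identifier, following the 0.2.1 fix above.

```python
def provenance_keys(record):
    """Return the set of (baseUrl, identifier) pairs in a record's provenance list."""
    return {(item["base_url"], item["identifier"]) for item in record.get("provenance", [])}


def is_duplicate(new_record, existing_record):
    """True if the two records share at least one (baseUrl, identifier) pair."""
    return not provenance_keys(new_record).isdisjoint(provenance_keys(existing_record))


def overwrite(existing_record, new_record):
    """Return the new record, but keep the existing _aggregator_identifier."""
    merged = dict(new_record)
    if "_aggregator_identifier" in existing_record:
        merged["_aggregator_identifier"] = existing_record["_aggregator_identifier"]
    return merged
```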

0.1.0 - 2021-09-21

Added

  • New codebase for CDC Aggregator Client.

  • Support synchronizing records to CDC Aggregator DocStore.