Client
Client provides a command line application for synchronizing records from XML files to DocStore.
CESSDA Metadata Aggregator - Client
Command line client for synchronizing records to CESSDA Metadata Aggregator DocStore. This program is part of CESSDA Metadata Aggregator.
Source code is hosted on GitHub: https://github.com/cessda/cessda.cdc.aggregator.client
Features
Synchronize a folder of XML files recursively to DocStore.
Keep a file-cache to speed up consecutive synchronization runs.
Supports DDI 1.2.2, DDI 2.5, DDI 3.1 and DDI 3.3 XML files.
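The recursive scan and the file-cache can be illustrated with a minimal sketch. It is not the client's actual implementation: the cache layout (a pickled dict mapping each file path to its modification time) and the names load_cache, changed_xml_files and save_cache are assumptions made for illustration only.

```python
import os
import pickle
from pathlib import Path


def load_cache(cache_path):
    """Return the cached {path: mtime} dict, or an empty dict on the first run."""
    if not os.path.exists(cache_path):
        return {}
    with open(cache_path, 'rb') as handle:
        return pickle.load(handle)


def changed_xml_files(root, cache):
    """Yield XML files found recursively under root that are new or modified.

    The cache is updated in place for every yielded file.
    """
    for path in sorted(Path(root).rglob('*.xml')):
        mtime = path.stat().st_mtime
        if cache.get(str(path)) != mtime:
            cache[str(path)] = mtime
            yield path


def save_cache(cache, cache_path):
    """Persist the cache so the next run can skip unchanged files."""
    with open(cache_path, 'wb') as handle:
        pickle.dump(cache, handle)
```

On a second run over an unchanged folder, changed_xml_files yields nothing, so no file needs to be re-read. This sketch marks a file as seen as soon as it is yielded; a real tool would more likely update the cache only after the record has been stored successfully.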
Requirements
Python 3.8 or newer.
A running CESSDA Metadata Aggregator DocStore instance.
Installation
On Ubuntu 20.04
Get Package
Clone the repository using Git.
git clone https://github.com/cessda/cessda.cdc.aggregator.client
Or fetch a specific release using a tag. For example, to get the 0.2.0 release:
git clone --branch 0.2.0 https://github.com/cessda/cessda.cdc.aggregator.client
Install Client
It is recommended to install packages inside a Python virtual environment to isolate the install. This package also provides a Dockerfile to help set up a containerized environment.
Create the Python virtual environment and activate it.
python3 -m venv cdcagg-env
source cdcagg-env/bin/activate
Install Python packages.
cd cessda.cdc.aggregator.client
pip install -r requirements.txt
pip install .
To upgrade an existing install, use the --upgrade flag in the pip commands. Pip has used the only-if-needed upgrade strategy by default since version 10.0.0, but the option is also included in the example for backwards compatibility.
pip install --upgrade -r requirements.txt --upgrade-strategy=only-if-needed
pip install . --upgrade --upgrade-strategy=only-if-needed
Run
Change <docstore-url> to the CDC Aggregator DocStore server URL. Change <xml-sources> to a path pointing to a folder containing the files to synchronize.
python -m cdcagg_client.sync --document-store-url <docstore-url> --file-cache file_cache.pickle <xml-sources>
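For scheduled or scripted runs, the same command can be launched from Python. The following is a minimal sketch that uses only the options shown above. The helper names build_sync_command and run_sync are hypothetical and not part of the client.

```python
import subprocess
import sys


def build_sync_command(docstore_url, xml_sources, file_cache='file_cache.pickle'):
    """Build the argument list for the sync command shown above."""
    return [
        sys.executable, '-m', 'cdcagg_client.sync',
        '--document-store-url', docstore_url,
        '--file-cache', file_cache,
        xml_sources,
    ]


def run_sync(docstore_url, xml_sources, file_cache='file_cache.pickle'):
    """Run the sync; raises CalledProcessError if the exit status is non-zero."""
    command = build_sync_command(docstore_url, xml_sources, file_cache)
    return subprocess.run(command, check=True)
```

Calling run_sync with the DocStore URL and the source folder runs the module with the interpreter of the active virtual environment (sys.executable), so the installed client package is the one that gets used.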
Configuration reference
python -m cdcagg_client.sync --help
License
See the LICENSE file.
Changelog
All notable changes to the CDC Aggregator Client will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
0.8.0 - 2024-04-30
Added
Map DDI-C /codeBook/stdyDscr/citation/rspStmt/AuthEnty/ExtLink to Study.principal_investigator attributes external_link, external_link_uri, external_link_role and external_link_title.
Changed
Update dependencies:
Require Aggregator Shared Library 0.7.0 in requirements.txt and setup.py. (Implements #35)
Require Kuha Common 2.4.0 in requirements.txt and setup.py. Change URL to point to new source at gitlab.tuni. (Implements #35)
Require Kuha Client 1.4.0 in requirements.txt and setup.py. Change URL to point to new source at gitlab.tuni. (Implements #35)
0.7.0 - 2023-11-24
Added
Support Study._direct_base_url. (Implements #33)
Changed
Update dependencies: Require Aggregator Shared Library 0.6.0 in requirements.txt and setup.py.
0.6.0 - 2023-05-24
Changed
Parse DDI-C files with relPubl/citation that do not contain titlStmt/titl child. These files will result in a study without any related publications. (Implements #27)
Update dependencies:
Require Kuha Common 2.2.0 in requirements.txt and setup.py.
0.5.0 - 2022-11-21
Added
Support grant and funding information and related publication identifiers for studies. (Implements #23)
Changed
Update dependencies:
Require Aggregator Shared Library 0.5.0 in requirements.txt and setup.py.
Require Kuha Common 2.0.1 in requirements.txt and setup.py.
Require Kuha Client 1.2.1 in requirements.txt and setup.py.
Require Tornado 6.2.0 in requirements.txt.
Fixed
Read DDI-Codebook XML from /codeBook/stdyDscr/stdyInfo/sumDscr/anlyUnit/concept/@vocabURI and /codeBook/stdyDscr/method/dataColl/collMode/concept/@vocabURI correctly.
Note: After upgrading, the file-cache should be removed to make sure all files are re-read and saved to Document Store.
0.4.0 - 2022-06-29
Added
Support reading DDI 3.3 for study level metadata. (Implements #22)
Changed
Require Aggregator Shared Library 0.4.0 or newer at setup.py.
Require Kuha Common 1.1.0 or newer at setup.py.
Require Kuha Client 1.1.0 or newer at setup.py.
Update dependencies in requirements.txt:
Aggregator Shared Library 0.4.0
Kuha Common 1.1.0
Kuha Client 1.1.0
0.3.0 - 2022-05-18
Changed
Generate Study._aggregator_identifier using OAI-PMH provenance info. (Implements #20)
Require CDCAGG Common 0.3.0
0.2.1 - 2022-04-05
Changed
Use 1.0.0 releases of Kuha Common and Kuha Client dependencies in requirements.txt.
Fixed
Updating an existing record must not change the _aggregator_identifier value. (Fixes #19)
0.2.0 - 2021-12-17
Added
Sync entrypoint configuration option --fail-on-parse to make processing fail on errors during file parsing.
Changed
Default behaviour now is to skip files that cannot be parsed because of a MappingError. Other errors still lead to failures that terminate the processing. The behaviour can be controlled with the --fail-on-parse configuration option. (Fixes #11)
Update dependencies in requirements.txt:
ConfigArgParse 1.5.3
Kuha Common to Git commit 8e7de1f16530decc356fee660255b60fcacaea23
Kuha Client to Git commit 46ba0501e92f6db3475d721344f456627c01f459
Aggregator Shared Library 0.2.0
Fixed
Correct query for record with match in provenance information. This query is used to find duplicate records. Now a record is considered a duplicate if it has a matching item (baseUrl + identifier combination) in its list of provenances. A new record always overwrites the old one. (Fixes #12)
0.1.0 - 2021-09-21
Added
New codebase for CDC Aggregator Client.
Support synchronizing records to CDC Aggregator DocStore.