CKAN

By Health & Social Care Information Centre | 24 April 2013

Introduction

A lot of very useful NHS data is published by data.gov.uk. To find it, one uses their installation of CKAN and, in particular, its REST API. So far, so good. However, the CKAN API is just a little bit too difficult to use from the command line or Java.

So, as with other things, we wrote some open source clients to make it a lot easier. There are 6 clients:

  • hdn-ckan-list
  • hdn-ckan-details
  • hdn-ckan-relationships
  • hdn-ckan-query
  • hdn-ckan-dataset-search
  • hdn-ckan-resource-search

The split is to cater for differences in using and manipulating the CKAN API.

Each client can be used either as a standalone program (in which case it produces results in TSV format for ease of use with standard POSIX tools) or as a Java library (jar file). To make things as easy as possible, we’ve also written some wrappers for POSIX and Debian/Ubuntu. Your choices, in order of decreasing convenience, are:

  • Debian/Ubuntu deb packages, hosted in our apt repository at http://services.developer-test.nhs.uk/repositories/apt/hdn/
  • A tar ball, which contains a complete file system to untar over your root /. This should work on any POSIX system, including Mac OS X and Cygwin.
  • Six standalone Java jars, with all dependencies included, suitable for execution or as a library
  • A set of Java code libraries with source
  • Forking from github

The best way to get going is to use the command line. We’ll look later at how to create requests programmatically using the Java library.

Using the Clients on the Command Line

The way you do this varies depending on what you used above:

  • If you’ve installed the deb package or the tar ball, you’ll have the programs on your PATH. To use one of them, open a terminal console and type its name, eg hdn-ckan-list. They take standard POSIX options. The programs are:
    • hdn-ckan-list
    • hdn-ckan-details
    • hdn-ckan-relationships
    • hdn-ckan-query
    • hdn-ckan-dataset-search, and
    • hdn-ckan-resource-search
  • If you’ve installed the standalone jar files, you’ll need to run commands from the folder you downloaded the files to. Open a terminal console and change to that folder. For example, for hdn-ckan-list, type java -jar hdn-ckan-list.jar. Each jar takes the same standard POSIX options as the corresponding program above. For the rest of this document, wherever you see hdn-ckan-XXXX … you can substitute java -jar hdn-ckan-XXXX.jar …
  • If you’ve downloaded or forked the source from github, you can use IntelliJ to run the main classes. Open source\subprojects.ipr and run the main classes in the group ckan, sub-group client. There are already some sample configurations set up for you to debug in IntelliJ. If you don’t have or use IntelliJ (and you really should) then you can open the source in Eclipse or NetBeans. You’ll need to add the libraries ‘annotations’ (library/annotations/VERSION/annotations.jar) and ‘jopt-simple’ (library/jopt-simple/VERSION/jopt-simple-VERSION.jar) to the class path.

Using hdn-ckan-list

Checking everything’s OK

Before we get going, let’s check that everything works as expected. Run the command hdn-ckan-list --help (remember to substitute java -jar hdn-ckan-list.jar if you need to). You should see a list of supported options. At the time of writing, it looks like this:

Option           Description              
------           -----------              
--dataset-ids    dataset-ids              
--dataset-names  dataset-names            
--group-ids      group-ids                
--group-names    group-names              
--help           Displays help for options
--licences       licences                 
--tag-counts     tag-counts               
--tags           tags                     
--version        Displays version

If the output seems a bit compressed, it’s because we’re formatting for a 40 character wide screen – useful if you’re running this over ssh on Android. Whilst you can’t see it above, help output always produces an exit code of 2.

Since the options are regular POSIX long options (and are named similarly to those in the GNU coding standards), we can abbreviate them. Hence hdn-ckan-list -h and hdn-ckan-list --he will produce the same output. The only time you can’t do this is if the abbreviation would be ambiguous.

Let’s try out one of those options: --version.

Checking the version installed

Let’s run hdn-ckan-list --version:

hdn-ckan-list 2013.03.06.1127-development
© Crown Copyright 2013

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Written by Raphael Cohn (raphael.cohn@stormmq.com)

Standard GNU-like stuff. It’s worth understanding the version number, in this case, 2013.03.06.1127-development. The part before the hyphen is the timestamp of the last git check-in used to build the binary; you should be able to find it using git log. Additionally, this should match the version of the deb package. The part after is the git branch the code was built from. Usually this will be either development or master.

If instead it says unknown version then it means you’re using code that you’ve compiled yourself or that wasn’t released ‘officially’.
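As a rough illustration of the version layout described above, the following sketch splits a version string at its first hyphen using POSIX parameter expansion. The helper name split_version is made up for this example and is not part of the clients.

```shell
#!/bin/sh
# Split a version string such as "2013.03.06.1127-development" into its
# build timestamp and git branch.
# split_version is a hypothetical helper, not part of the hdn-ckan clients.
split_version() {
    version=$1
    timestamp=${version%%-*}   # everything before the first hyphen
    branch=${version#*-}       # everything after the first hyphen
    printf '%s\t%s\n' "$timestamp" "$branch"
}

# Prints the timestamp and the branch separated by a tab.
split_version "2013.03.06.1127-development"
```

Splitting at the first hyphen only means that a branch name which itself contains hyphens stays in one piece. A string with no hyphen at all is returned unchanged in both columns.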

Getting a List of all the Known Identifiers on data.gov.uk

Apart from the options --help and --version, you need to specify one, and one only, of the options above. Each option returns a list of identifiers, with a header row.

This really couldn’t be easier:

hdn-ckan-list --dataset-names

Retrieves a list of dataset names, and displays it on standard out (stdout):

datasetName
warwickshire-public-weighbridges
disclosure-ministerial-external-meetings-defra
sw-sha-financial-tansactions
staff-organograms-and-pay-joint-nature-conservation-committee
financial-transactions-over-25k-from-nottingham-city-pct-april-2012
student_loans_for_higher_education_in_england
special-waste-arisings-19867-to-2003
addressbase-premium
financial-transactions-data-bis-bbsrc-sept-2011

...

The exit code is 0. Effectively, the results are a single column of tab separated value (TSV) data, with a header row. The data in data.gov.uk isn’t particularly clean, and occasionally you’ll find data with embedded CR (Carriage Return), LF (Line Feed) and HT (Horizontal Tab) control codes. When these are encountered, the data returned uses the Unicode replacement character U+FFFD, as these control codes are invalid in TSV. You’ll know when this happens, as you’ll see a black diamond with a question mark in it on most modern terminals.
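If you would rather detect damaged values in a script than by eye, one possible approach is to count the lines that contain U+FFFD. This is only a sketch: it assumes the output is encoded as UTF-8 (where U+FFFD is the bytes EF BF BD), and count_replaced_lines is a made-up helper name. The sample rows are stand-ins, not real client output.

```shell
#!/bin/sh
# Count the lines on standard input that contain U+FFFD, the Unicode
# replacement character. Assumes UTF-8 input, where U+FFFD is EF BF BD.
# count_replaced_lines is a hypothetical helper name.
count_replaced_lines() {
    replacement=$(printf '\357\277\275')
    # grep -c prints the count; "|| true" hides its non-zero status for 0 matches.
    LC_ALL=C grep -c -F -e "$replacement" || true
}

# Stand-in input: a header, one clean row and one row with a replaced character.
printf 'datasetName\nclean-name\nbroken\357\277\275name\n' | count_replaced_lines
```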

By the way, the sort order of the data isn’t known, and the order of the results is unlikely to be stable. If you don’t want the header row you can remove it using tail on POSIX systems:

hdn-ckan-list --dataset-names | tail -n +2

Which gives:

warwickshire-public-weighbridges
disclosure-ministerial-external-meetings-defra
sw-sha-financial-tansactions
staff-organograms-and-pay-joint-nature-conservation-committee
financial-transactions-over-25k-from-nottingham-city-pct-april-2012
student_loans_for_higher_education_in_england
special-waste-arisings-19867-to-2003
addressbase-premium
financial-transactions-data-bis-bbsrc-sept-2011

...
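Because the order of results is not stable, comparing two runs is easier if the rows are sorted first. A minimal sketch, using a made-up helper called sorted_rows and two names taken from the listing above as stand-in input:

```shell
#!/bin/sh
# Drop the header row, then sort bytewise so that repeated runs can be
# compared with tools such as diff or comm.
# sorted_rows is a hypothetical helper name.
sorted_rows() {
    tail -n +2 | LC_ALL=C sort
}

# Stand-in input; in practice this would be piped from hdn-ckan-list.
printf 'datasetName\nwarwickshire-public-weighbridges\naddressbase-premium\n' | sorted_rows
```

In real use you would write something like hdn-ckan-list --dataset-names | sorted_rows > names.txt and compare the files from different days.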

There are several more lists you can retrieve:

  • dataset-ids
  • group-ids
  • group-names
  • tags
  • tag-counts
  • licences

All of these return single columns, except for licences and tag-counts. Let’s take a look at each of them.

Dataset Ids

hdn-ckan-list --dataset-ids

Retrieves a list of dataset ids, and displays it on standard out (stdout):

datasetId
00055483-dd79-4ada-b4be-eb54eeaec19b
0011142e-93ba-4fc1-b27a-fc708a0aa84b
0023048e-6173-419f-9828-50965ea76d78
00285196-57e4-4048-9266-d3afde801c30
0033ca90-c6b4-400b-9f94-20b4fbadc0d4

...

Dataset ids are considered more useful by CKAN (!), as a dataset name can change. An id, being a UUID, does not.

Group Ids

hdn-ckan-list --group-ids

Retrieves a list of group ids, and displays it on standard out (stdout):

groupId
f253ec11-900c-45da-86e0-4dd10f5f6b37
1cb62be8-50cd-4d87-869d-26dafeb5f649
27c0e1d8-d95e-4ead-8690-395228ec57d9
6c6263fc-bbe7-43cf-9907-cb5aad52e872
609c5a75-4967-40c4-959f-620d029cd390

...

Group Names

hdn-ckan-list --group-names

Retrieves a list of group names, and displays it on standard out (stdout):

groupName
2gether-nhs-foundation-trust
aberdeen-city-council
advantage-west-midlands
advisory-conciliation-and-arbitration-service
agri-food-and-biosciences-institute

...

Tags

hdn-ckan-list --tags

Retrieves a list of tags, and displays it on standard out (stdout):

tag
'Other' meteorological measurements
-10000
-asylum-seekers
-bad-weather
-bank-holidays

...

Tag Counts

hdn-ckan-list --tag-counts

Retrieves a list of tags and their counts, and displays it on standard out (stdout):

name    count
'Other' meteorological measurements 1
-10000  1
-asylum-seekers 1
-bad-weather    1
-bank-holidays  1

...
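Two-column output like this works well with sort. The sketch below orders tags by count, largest first; top_tags is a made-up helper name and the counts in the sample input are invented for illustration.

```shell
#!/bin/sh
# Order tag-count rows by the numeric count column, largest first, breaking
# ties by tag name. The header row is dropped before sorting.
# top_tags is a hypothetical helper name.
top_tags() {
    tab=$(printf '\t')
    tail -n +2 | LC_ALL=C sort -t "$tab" -k2,2nr -k1,1
}

# Invented sample counts, standing in for hdn-ckan-list --tag-counts output.
printf 'name\tcount\n-bad-weather\t1\nhealth\t12\n-bank-holidays\t3\n' | top_tags
```

Passing the tab explicitly as the field separator matters here, because some tags contain spaces.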

Licences

hdn-ckan-list --licences

Retrieves a list of licences and their details, and displays it on standard out (stdout):

status  maintainer  family  title   url is_generic  is_okd_compliant    is_osi_compliant    domain_data domain_content  domain_software id
active          License Not Specified       true    false   false   false   false   false   notspecified
active          Open Data Commons Public Domain Dedication and Licence (PDDL)   http://www.opendefinition.org/licenses/odc-pddl false   true    false   true    false   false   odc-pddl
active          Open Data Commons Open Database License (ODbL)  http://www.opendefinition.org/licenses/odc-odbl false   true    false   true    false   false   odc-odbl
active          Open Data Commons Attribution License   http://www.opendefinition.org/licenses/odc-by   false   true    false   true    false   false   odc-by
active          Creative Commons CCZero http://www.opendefinition.org/licenses/cc-zero  false   true    false   true    true    false   cc-zero

Using hdn-ckan-details

Checking everything’s OK

Before we get going, let’s check that everything works as expected. Run the command hdn-ckan-details --help (remember to substitute java -jar hdn-ckan-details.jar if you need to). You should see a list of supported options. At the time of writing, it looks like this:

Option                            Description              
------                            -----------              
--dataset-by-id <Dataset UUID>    dataset-by-id            
--dataset-by-name <Dataset Name>  dataset-by-name          
--group-by-id <Group UUID>        group-by-id              
--group-by-name <Group Name>      group-by-name            
--help                            Displays help for options
--revision-by-id <Revision UUID>  revision-by-id           
--version                         Displays version

If the output seems a bit compressed, it’s because we’re formatting for a 40 character wide screen – useful if you’re running this over ssh on Android. Whilst you can’t see it above, help output always produces an exit code of 2.

Since the options are regular POSIX long options (and are named similarly to those in the GNU coding standards), we can abbreviate them. Hence hdn-ckan-details -h and hdn-ckan-details --he will produce the same output. The only time you can’t do this is if the abbreviation would be ambiguous.

Let’s try out one of those options: --version.

Checking the version installed

Let’s run hdn-ckan-details --version:

hdn-ckan-details 2013.03.06.1127-development
© Crown Copyright 2013

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Written by Raphael Cohn (raphael.cohn@stormmq.com)

Standard GNU-like stuff. It’s worth understanding the version number, in this case, 2013.03.06.1127-development. The part before the hyphen is the timestamp of the last git check-in used to build the binary; you should be able to find it using git log. Additionally, this should match the version of the deb package. The part after is the git branch the code was built from. Usually this will be either development or master.

If instead it says unknown version then it means you’re using code that you’ve compiled yourself or that wasn’t released ‘officially’.

Getting a Dataset by name

Apart from the options --help and --version, you need to specify one, and one only, of the options above. Each option returns a single row of data, with a header row, as tab separated values (TSV).

This really couldn’t be easier:

hdn-ckan-details --dataset-by-name warwickshire-public-weighbridges

Retrieves details of warwickshire-public-weighbridges:

license_title   maintainer  maintainer_email    id  metadata_created    relationships   license metadata_modified   author  author_email    state   version license_id  type    resources   tags    tracking_summarytotal   tracking_summaryrecent  groups  name    isopen  notes_rendered  url ckan_url    notes   title   ratings_average extras  license_url ratings_count   revision_id
UK Open Government Licence (OGL)            00055483-dd79-4ada-b4be-eb54eeaec19b    2010-05-19T12:33:30.420431      UK Open Government Licence (OGL)    2012-11-12T09:09:11.337779  Trading Standards, Environment and Economy  opendata@warwickshire.gov.uk    active      uk-ogl      Resource(ResourceGroupId(f97959e2-c651-a5ad-8586-acb8746f7c38), 2012-06-27T04:19:59.597590, PackageId(00055483-dd79-4ada-b4be-eb54eeaec19b), null, false, ResourceId(fe5bd141-d2a4-40d6-ac8f-cc9225f5b5b9), 19968, /mnt/shared/ckan_resource_cache/fe/fe5bd141-d2a4-40d6-ac8f-cc9225f5b5b9/public-weighbridges.xls, 2012-06-27T04:19:59.597703, Hash(c436142061293acbca81ae81280fca06989f039f), , Format([Excel]), TrackingSummary(0, 0), null, application/vnd.ms-excel, KnownUrl(http://data.gov.uk/data/resource_cache/fe/fe5bd141-d2a4-40d6-ac8f-cc922

The exit code is 0. The data in data.gov.uk isn’t particularly clean, and occasionally you’ll find data with embedded CR (Carriage Return), LF (Line Feed) and HT (Horizontal Tab) control codes. When these are encountered, the data returned uses the Unicode replacement character U+FFFD, as these control codes are invalid in TSV. You’ll know when this happens, as you’ll see a black diamond with a question mark in it on most modern terminals. This is the case above.

By the way, the sort order of the data isn’t known, and the order of the results is unlikely to be stable.

If you don’t want the header row you can remove it using tail on POSIX systems:

hdn-ckan-details --dataset-by-name warwickshire-public-weighbridges | tail -n +2

Which gives:

UK Open Government Licence (OGL)            00055483-dd79-4ada-b4be-eb54eeaec19b    2010-05-19T12:33:30.420431      UK Open Government Licence (OGL)    2012-11-12T09:09:11.337779  Trading Standards, Environment and Economy  opendata@warwickshire.gov.uk    active      uk-ogl      Resource(ResourceGroupId(f97959e2-c651-a5ad-8586-acb8746f7c38), 2012-06-27T04:19:59.597590, PackageId(00055483-dd79-4ada-b4be-eb54eeaec19b), null, false, ResourceId(fe5bd141-d2a4-40d6-ac8f-cc9225f5b5b9), 19968, /mnt/shared/ckan_resource_cache/fe/fe5bd141-d2a4-40d6-ac8f-cc9225f5b5b9/public-weighbridges.xls, 2012-06-27T04:19:59.597703, Hash(c436142061293acbca81ae81280fca06989f039f), , Format([Excel]), TrackingSummary(0, 0), null, application/vnd.ms-excel, KnownUrl(http://data.gov.uk/data/resource_cache/fe/fe5bd141-d2a4-40d6-ac8f-cc9225f5b5b9/public-weighbridges.xls), null, null, KnownUrl(http://opendata.s3.amazonaws.com/public-weighbridges.xls), UnknownUnknownUrl, 0, null, 19968, 0, 1, The format entered for the resource doesn't match the description from the web server, application/vnd.ms-excel, null, null, null, , UnknownUnknownUrl, null, null, null, , UnknownUnknownUrl) warwickshirewarwickshire-county-councilweighbridges 0   0   d615a457-c8a6-4474-9974-02171a5de623    warwickshire-public-weighbridges    true    <p>A list of all public weighbridges in Warwickshire, including address, telephone, capacity, approximate dimensions, opening times and fees�</p>     http://data.gov.uk//dataset/warwickshire-public-weighbridges    A list of all public weighbridges in Warwickshire, including address, telephone, capacity, approximate dimensions, opening times and fees   Public Weighbridges         http://reference.data.gov.uk/id/open-government-licence 0   8894f663-cbab-4e06-a2bf-9b22db575b68

Sometimes, a dataset name might start with a hyphen. This makes it hard for the program to tell where the options finish. To overcome this, use the alternative POSIX long argument syntax:

hdn-ckan-details --dataset-by-name=-has-hyphen-name

If there is nothing found, then only the header is returned:

hdn-ckan-details --dataset-by-name BLAH_HDN

gives just:

license_title   maintainer  maintainer_email    id  metadata_created    relationships   license metadata_modified   author  author_email    state   version license_id  type    resources   tags    tracking_summarytotal   tracking_summaryrecent  groups  name    isopen  notes_rendered  url ckan_url    notes   title   ratings_average extras  license_url ratings_count   revision_id
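Since a missing dataset produces a header and nothing else, a script can treat "no data rows" as "not found". A small sketch of that check follows; has_data_rows is a made-up helper name, and the shortened header is a stand-in for real output.

```shell
#!/bin/sh
# Succeed (exit status 0) only if standard input contains at least one row
# after the header row.
# has_data_rows is a hypothetical helper name.
has_data_rows() {
    awk 'NR > 1 { found = 1 } END { exit !found }'
}

# Stand-in for the output of a lookup that found nothing: a header only.
if printf 'license_title\tmaintainer\n' | has_data_rows; then
    echo "found"
else
    echo "not found"
fi
```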

Other things you can get detail on are:

  • dataset-by-id
  • group-by-name
  • group-by-id
  • revision-by-id

Each of these returns a single row of data, with a header row. Let’s take a look at each of them.

Datasets by id

hdn-ckan-details --dataset-by-id 00055483-dd79-4ada-b4be-eb54eeaec19b

Gives the same information as above:

license_title   maintainer  maintainer_email    id  metadata_created    relationships   license metadata_modified   author  author_email    state   version license_id  type    resources   tags    tracking_summarytotal   tracking_summaryrecent  groups  name    isopen  notes_rendered  url ckan_url    notes   title   ratings_average extras  license_url ratings_count   revision_id
UK Open Government Licence (OGL)            00055483-dd79-4ada-b4be-eb54eeaec19b    2010-05-19T12:33:30.420431      UK Open Government Licence (OGL)    2012-11-12T09:09:11.337779  Trading Standards, Environment and Economy  opendata@warwickshire.gov.uk    active      uk-ogl      Resource(ResourceGroupId(f97959e2-c651-a5ad-8586-acb8746f7c38), 2012-06-27T04:19:59.597590, PackageId(00055483-dd79-4ada-b4be-eb54eeaec19b), null, false, ResourceId(fe5bd141-d2a4-40d6-ac8f-cc9225f5b5b9), 19968, /mnt/shared/ckan_resource_cache/fe/fe5bd141-d2a4-40d6-ac8f-cc9225f5b5b9/public-weighbridges.xls, 2012-06-27T04:19:59.597703, Hash(c436142061293acbca81ae81280fca06989f039f), , Format([Excel]), TrackingSummary(0, 0), null, application/vnd.ms-excel, KnownUrl(http://data.gov.uk/data/resource_cache/fe/fe5bd141-d2a4-40d6-ac8f-cc9225f5b5b9/public-weighbridges.xls), null, null, KnownUrl(http://opendata.s3.amazonaws.com/public-weighbridges.xls), UnknownUnknownUrl, 0, null, 19968, 0, 1, The format entered for the resource doesn't match the description from the web server, application/vnd.ms-excel, null, null, null, , UnknownUnknownUrl, null, null, null, , UnknownUnknownUrl) warwickshirewarwickshire-county-councilweighbridges 0   0   d615a457-c8a6-4474-9974-02171a5de623    warwickshire-public-weighbridges    true    <p>A list of all public weighbridges in Warwickshire, including address, telephone, capacity, approximate dimensions, opening times and fees�</p>     http://data.gov.uk//dataset/warwickshire-public-weighbridges    A list of all public weighbridges in Warwickshire, including address, telephone, capacity, approximate dimensions, opening times and fees   Public Weighbridges         http://reference.data.gov.uk/id/open-government-licence 0   8894f663-cbab-4e06-a2bf-9b22db575b68

Groups by name

hdn-ckan-details --group-by-name aberdeen-city-council

Gives:

users   display_name    description title   created approval_status state   extras  image_url   groups  revision_id packages    type    id  tags    name
User(null, User account imported from Drupal system., editor, UserName(user_d120811), 2012-11-15T08:26:40.000000, Hash(2d215346b4762b0f43f216a5f556302f), 0, 0, IanWatt, IanWatt, UserId(5a18c60a-ff25-4c73-a1ce-fab769ca5b2a)) User(null, null, admin, UserName(user_d13491), 2012-06-28T08:06:37.301558, Hash(de24ef8a7a6b6dbff054314c4565894d), 16, 0, hazel lee, hazel lee, UserId(0306e72b-17ba-4e6f-8fd3-67ca4e1ca426))   Aberdeen City Council   Aberdeen City Council   Aberdeen City Council   2013-01-08T09:11:23.076216  pending active              b71d3b1c-9648-419e-a28e-b39b6b2bfac9        publisher   1cb62be8-50cd-4d87-869d-26dafeb5f649        aberdeen-city-council

Groups by id

hdn-ckan-details --group-by-id 1cb62be8-50cd-4d87-869d-26dafeb5f649

Gives the same information as above:

users   display_name    description title   created approval_status state   extras  image_url   groups  revision_id packages    type    id  tags    name
User(null, User account imported from Drupal system., editor, UserName(user_d120811), 2012-11-15T08:26:40.000000, Hash(2d215346b4762b0f43f216a5f556302f), 0, 0, IanWatt, IanWatt, UserId(5a18c60a-ff25-4c73-a1ce-fab769ca5b2a)) User(null, null, admin, UserName(user_d13491), 2012-06-28T08:06:37.301558, Hash(de24ef8a7a6b6dbff054314c4565894d), 16, 0, hazel lee, hazel lee, UserId(0306e72b-17ba-4e6f-8fd3-67ca4e1ca426))   Aberdeen City Council   Aberdeen City Council   Aberdeen City Council   2013-01-08T09:11:23.076216  pending active              b71d3b1c-9648-419e-a28e-b39b6b2bfac9        publisher   1cb62be8-50cd-4d87-869d-26dafeb5f649        aberdeen-city-council

Revisions by id

hdn-ckan-details --revision-by-id b71d3b1c-9648-419e-a28e-b39b6b2bfac9

Gives revision details:

id  timestamp   message author  approved_timestamp  packages    groups
b71d3b1c-9648-419e-a28e-b39b6b2bfac9    2013-01-08T09:11:23.028649      user_d13491         1cb62be8-50cd-4d87-869d-26dafeb5f6494d1ddc1c-97f7-4d02-8c43-9bec05557d5a

Unfortunately, the groups are formatted as one long string of concatenated ids; you can split this every 36 characters using cut.
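As an alternative to calling cut once per id, the POSIX fold utility can break the string at a fixed width in one go. This sketch uses fold rather than cut, and split_ids is a made-up helper name; the input is the groups value from the revision shown above.

```shell
#!/bin/sh
# Break a string of concatenated UUIDs into one 36-character id per line.
# Uses fold instead of cut so that any number of ids is handled in one call.
# split_ids is a hypothetical helper name.
split_ids() {
    printf '%s\n' "$1" | fold -w 36
}

# The groups value from the revision above; prints two ids, one per line.
split_ids "1cb62be8-50cd-4d87-869d-26dafeb5f6494d1ddc1c-97f7-4d02-8c43-9bec05557d5a"
```

With cut, the same two ids would come from cut -c 1-36 and cut -c 37-72 respectively.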

Using hdn-ckan-relationships

This program finds the different kinds of relationships a dataset has to other datasets.

Checking everything’s OK

Before we get going, let’s check that everything works as expected. Run the command hdn-ckan-relationships --help (remember to substitute java -jar hdn-ckan-relationships.jar if you need to). You should see a list of supported options. At the time of writing, it looks like this:

Option                                  Description                           
------                                  -----------                           
--as-dataset-ids [Boolean: true if as   returns results as Dataset Ids (UUIDs)
  dataset ids; false or unspecified to                                        
  produce dataset names]                                                      
--child-of <DatasetName: Dataset Name   child-of                              
  or UUID (UUIDs do not work when                                             
  returning results as Dataset Names)>                                        
--dependency-on <DatasetName: Dataset   dependency-on                         
  Name or UUID (UUIDs do not work when                                        
--depends-on <DatasetName: Dataset      depends-on                            
--derives-from <DatasetName: Dataset    derives-from                          
--has-derivation <DatasetName: Dataset  has-derivation                        
--help                                  Displays help for options             
--linked-from <DatasetName: Dataset     linked-from                           
--links-to <DatasetName: Dataset Name   links-to                              
--parent-of <DatasetName: Dataset Name  parent-of                             
--version                               Displays version

If the output seems a bit compressed, it’s because we’re formatting for a 40 character wide screen – useful if you’re running this over ssh on Android. Whilst you can’t see it above, help output always produces an exit code of 2.

Since the options are regular POSIX long options (and are named similarly to those in the GNU coding standards), we can abbreviate them. Hence hdn-ckan-relationships -h and hdn-ckan-relationships --he will produce the same output. The only time you can’t do this is if the abbreviation would be ambiguous.

Let’s try out one of those options: --version.

Checking the version installed

Let’s run hdn-ckan-relationships --version:

hdn-ckan-relationships 2013.03.06.1127-development
© Crown Copyright 2013

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Written by Raphael Cohn (raphael.cohn@stormmq.com)

Standard GNU-like stuff. It’s worth understanding the version number, in this case, 2013.03.06.1127-development. The part before the hyphen is the timestamp of the last git check-in used to build the binary; you should be able to find it using git log. Additionally, this should match the version of the deb package. The part after is the git branch the code was built from. Usually this will be either development or master.

If instead it says unknown version then it means you’re using code that you’ve compiled yourself or that wasn’t released ‘officially’.

Finding the datasets a dataset is a child-of

Apart from the options --help and --version, you need to specify one, and one only, of the options above. Each option returns a list of data, with a header row, as tab separated values (TSV).

For example:

hdn-ckan-relationships --child-of warwickshire-public-weighbridges

Returns:

datasetName
warwickshire-public-weighbridges
disclosure-ministerial-external-meetings-defra
sw-sha-financial-tansactions
staff-organograms-and-pay-joint-nature-conservation-committee
financial-transactions-over-25k-from-nottingham-city-pct-april-2012

...

If you want the results as dataset ids, you can also use the --as-dataset-ids option:

hdn-ckan-relationships --child-of warwickshire-public-weighbridges --as-dataset-ids

Gives:

datasetId
00055483-dd79-4ada-b4be-eb54eeaec19b
0011142e-93ba-4fc1-b27a-fc708a0aa84b
0023048e-6173-419f-9828-50965ea76d78
00285196-57e4-4048-9266-d3afde801c30
0033ca90-c6b4-400b-9f94-20b4fbadc0d4

Note that the order does not have to match that for dataset names. --as-dataset-ids can also take an argument of either true or false: true returns ids; false returns names. This is useful for scripting:

hdn-ckan-relationships --child-of warwickshire-public-weighbridges --as-dataset-ids false

Gives:

datasetName
warwickshire-public-weighbridges
disclosure-ministerial-external-meetings-defra
sw-sha-financial-tansactions
staff-organograms-and-pay-joint-nature-conservation-committee
financial-transactions-over-25k-from-nottingham-city-pct-april-2012

...

It’s also possible to search by dataset id:

hdn-ckan-relationships --child-of 00055483-dd79-4ada-b4be-eb54eeaec19b

Gives:

datasetName
warwickshire-public-weighbridges
disclosure-ministerial-external-meetings-defra
sw-sha-financial-tansactions
staff-organograms-and-pay-joint-nature-conservation-committee
financial-transactions-over-25k-from-nottingham-city-pct-april-2012

...

And

hdn-ckan-relationships --child-of 00055483-dd79-4ada-b4be-eb54eeaec19b --as-dataset-ids

Gives:

datasetId
00055483-dd79-4ada-b4be-eb54eeaec19b
0011142e-93ba-4fc1-b27a-fc708a0aa84b
0023048e-6173-419f-9828-50965ea76d78
00285196-57e4-4048-9266-d3afde801c30
0033ca90-c6b4-400b-9f94-20b4fbadc0d4

Finding other kinds of relationship

To find a different kind of relationship, replace --child-of with one of:

  • --dependency-on
  • --depends-on
  • --derives-from
  • --has-derivation
  • --linked-from
  • --links-to
  • --parent-of

Using hdn-ckan-query

Checking everything’s OK

Before we get going, let’s check that everything works as expected. Run the command hdn-ckan-query --help (remember to substitute java -jar hdn-ckan-query.jar if you need to). You should see a list of supported options. At the time of writing, it looks like this:

Option                                 Description              
------                                 -----------              
--dataset-ids-by-tag <Tag>             dataset-ids-by-tag       
--dataset-names-by-tag <Tag>           dataset-names-by-tag     
--help                                 Displays help for options
--revisions-by-id <Revision UUID>      revisions-by-id          
--revisions-by-name <Revision Name>    revisions-by-name        
--revisions-since-id <Revision UUID>   revisions-since-id       
--revisions-since-timestamp            revisions-since-timestamp
  <Microsecond Timestamp, eg 2013-01-                           
  28T20:06:30.061645>                                           
--version                              Displays version

If the output seems a bit compressed, it’s because we’re formatting for a 40 character wide screen – useful if you’re running this over ssh on Android. Whilst you can’t see it above, help output always produces an exit code of 2.

Since the options are regular POSIX long options (and are named similarly to those in the GNU coding standards), we can abbreviate them. Hence hdn-ckan-query -h and hdn-ckan-query --he will produce the same output. The only time you can’t do this is if the abbreviation would be ambiguous.

Let’s try out one of those options: --version.

Checking the version installed

Let’s run hdn-ckan-query --version:

hdn-ckan-query 2013.03.06.1127-development
© Crown Copyright 2013

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Written by Raphael Cohn (raphael.cohn@stormmq.com)

Standard GNU-like stuff. It’s worth understanding the version number, in this case, 2013.03.06.1127-development. The part before the hyphen is the timestamp of the last git check-in used to build the binary; you should be able to find it using git log. Additionally, this should match the version of the deb package. The part after is the git branch the code was built from. Usually this will be either development or master.

If instead it says unknown version then it means you’re using code that you’ve compiled yourself or that wasn’t released ‘officially’.

Getting dataset ids from a tag

This really couldn’t be easier:

hdn-ckan-query --dataset-ids-by-tag Jurassic

Retrieves a list of dataset ids, and displays it on standard out (stdout):

datasetId
033c42fa-a433-4b13-af53-133bafa5c8df
59dab25a-ea5a-4df4-9c1d-e5a844bf5662
b54812d6-5706-4360-9a25-2e8664177c6d
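
Because the output is a header row followed by one value per line, it is easy to consume from another program. The following is a minimal sketch, assuming the first line is always the datasetId header and every other non-empty line is a UUID; the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

final class DatasetIdTsv
{
    // Reads the single-column output: a header line, then one UUID per line.
    static List<UUID> parse(final BufferedReader reader) throws IOException
    {
        final String header = reader.readLine();
        if (!"datasetId".equals(header))
        {
            throw new IOException("Unexpected header: " + header);
        }
        final List<UUID> ids = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null)
        {
            if (!line.isEmpty())
            {
                ids.add(UUID.fromString(line));
            }
        }
        return ids;
    }

    public static void main(final String[] args) throws IOException
    {
        final String sample = "datasetId\n"
            + "033c42fa-a433-4b13-af53-133bafa5c8df\n"
            + "59dab25a-ea5a-4df4-9c1d-e5a844bf5662\n";
        System.out.println(parse(new BufferedReader(new StringReader(sample))).size()); // 2
    }
}
```

In practice the reader would wrap the standard output of the command rather than a string.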

Getting dataset names from a tag

Likewise:

hdn-ckan-query --dataset-names-by-tag Jurassic

Retrieves a list of dataset names:

datasetName
biostratigraphical-masterpacks
distribution-of-uk-north-sea-lithostratigraphic-units
london-register-of-microfossils

Getting revisions

There are four ways to get revisions:

  • --revisions-by-id
  • --revisions-by-name
  • --revisions-since-id
  • --revisions-since-timestamp

These are fairly self-explanatory except the last, which is also probably the most interesting (and quite slow):

hdn-ckan-query --revisions-since-timestamp 2013-01-28T20:06:30.061645

which, using a timestamp with microsecond precision (but not accuracy) produces:

revisionId
ff24e891-407f-4152-972c-b8052f1d5737
e523c8a8-553d-4ec7-90a7-530abf7f911a
506e7168-af05-47b4-b911-b30e9b6257aa
1590e5e0-b21e-4e70-8e9a-a1b6d9abc557
06d48005-60fe-472e-9e3e-fae795dac974
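
If you need to compute such a timestamp rather than type it, standard java.time classes can read and write the format. This is a minimal sketch with a hypothetical class name; it treats the value as a local date-time because the example carries no time zone, and it is not the library’s own MicrosecondTimestamp class.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class TimestampSketch
{
    // Always prints six fractional digits, that is, microseconds.
    private static final DateTimeFormatter MICROSECONDS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSSSS");

    static LocalDateTime parse(final String timestamp)
    {
        // ISO-8601 local date-time; the default parser accepts up to nine fractional digits.
        return LocalDateTime.parse(timestamp);
    }

    static String format(final LocalDateTime timestamp)
    {
        return MICROSECONDS.format(timestamp);
    }

    public static void main(final String[] args)
    {
        final LocalDateTime since = parse("2013-01-28T20:06:30.061645");
        // One hour earlier, in the same format as the command line argument
        System.out.println(format(since.minusHours(1))); // 2013-01-28T19:06:30.061645
    }
}
```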

Using hdn-ckan-dataset-search

Checking everything’s OK

Before we get going, let’s check that everything works as expected. Run the command hdn-ckan-dataset-search --help (remember to substitute java -jar hdn-ckan-dataset-search.jar if you need to). You should see a list of supported options. At the time of writing, it looks like this:

Option                                  Description                           
------                                  -----------                           
--any <substring to search for case     search in any field                   
  insensitively>                                                              
--as-dataset-ids [Boolean: true if as   returns results as Dataset Ids (UUIDs)
  dataset ids; false or unspecified to                                        
  produce dataset names]                                                      
--author <substring to search for case  search in the author field            
--groups <substring to search for case  search in the groups field (name or   
  insensitively>                          UUID)                               
--help                                  Displays help for options             
--maintainer <substring to search for   search in the maintainer field        
  case insensitively>                                                         
--notes <substring to search for case   search in the notes field             
--tags <substring to search for case    search in the tags field (a hyphenated
  insensitively>                          tag)                                
--title <substring to search for case   search in the title field             
--update-frequency <substring to        search by update frequency (uncertain 
  search for case insensitively>          what this is)                       
--version                               Displays version

If the output seems a bit compressed, it’s because we’re formatting for a 40 character wide screen – useful if you’re running this over ssh on Android. Whilst you can’t see it above, help output always produces an exit code of 2.

Since the options are regular POSIX long options (and are named similarly to those in the GNU coding standards), we can abbreviate them. Hence hdn-ckan-dataset-search -h and hdn-ckan-dataset-search --he will produce the same output. The only time you can’t do this is if the abbreviation would be ambiguous.

Let’s try out one of those options: --version.

Checking the version installed

Let’s run hdn-ckan-dataset-search --version:

hdn-ckan-dataset-search 2013.03.06.1127-development
© Crown Copyright 2013

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Written by Raphael Cohn (raphael.cohn@stormmq.com)

Standard GNU-like stuff. It’s worth understanding the version number, in this case, 2013.03.06.1127-development. The part before the hyphen is the timestamp of the last git check-in used to build the binary – you should be able to find it using git log. Additionally, this should match the version of the deb package. The part after is the git branch the code was built from. Usually this will be either development or master.

If instead it says unknown version, then you’re using code that you’ve compiled yourself or that wasn’t released ‘officially’.

Getting datasets using a text search

The easiest search is one that looks for a string, case insensitively, in all the fields:

hdn-ckan-dataset-search --any health

This returns the by now familiar TSV format on standard out:

datasetName
focus_on_health
health_inequalities
health_inequalities_
health_analysis
warwickshire-health-deprivation
health-survey-for-england-2010-respiratory-health
mental_health_-_prevalence_of_common_mental_health_problems
health_survey_for_england_2009_health_and_lifestyles
health_profile_of_england

...

If you want dataset ids instead, you can use the --as-dataset-ids switch:

hdn-ckan-dataset-search --any health --as-dataset-ids

which gives:

datasetId
1b45312a-a784-473e-b2f7-def3eadddf96
7ca3a54c-2d64-41d8-a19e-45cbd7d75b53
cd00bc6a-5555-46ca-a412-db1ad7678110
c4c2d9be-a749-455f-85aa-1d5e3dd7f9c5
98fd10dc-12b1-4d0c-9bd2-e28c11be51f3
58602501-49a6-4337-9747-cfdbb026bcd3
8a671aef-e9e3-4e99-a9a4-7af6c03d6190
342515de-9cfa-49a4-ae2a-8fe1b403bd56
49016034-c80b-4b9a-b9a5-0d10858df59f

--as-dataset-ids works the same as it does for hdn-ckan-relationships.

Getting datasets by other criteria

You can get datasets by similar string searches on other fields:

  • --author
  • --groups
  • --maintainer
  • --notes
  • --tags
  • --title
  • --update-frequency

Despite what the help text says, these searches appear to be case sensitive and to match whole words only.

The data in author and maintainer is not particularly well populated in data.gov.uk.

An example using tags might be:

hdn-ckan-dataset-search --tags HEALTH

which gives:

datasetName
comparative-merits-of-consuming-vegetables-produced-locally-and-overseas-greenhouse-2004-2008
development-plan-landuse-zones-metadata

which differs from

hdn-ckan-dataset-search --tags health

with results

datasetName
13-butadiene-running-annual-mean-at-automatic-sites-comparison-with-health-objective-for-2003-u-2010
a_section_75_analysis_of_mortality_patterns_in_northern_ireland
abortion_statistics
abortion_statistics_england_and_wales

...

You have been warned!
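
The difference between the two behaviours can be modelled locally. The sketch below only illustrates the distinction between a case-insensitive substring match and a case-sensitive whole-word match; it is not how CKAN implements its search, the names are hypothetical, and splitting words on anything other than letters and digits is an assumption.

```java
import java.util.Arrays;
import java.util.Locale;

final class MatchingModes
{
    // Case-insensitive substring match: what the help text describes.
    static boolean containsIgnoringCase(final String field, final String term)
    {
        return field.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    // Case-sensitive whole-word match: what the per-field searches appear to do.
    static boolean containsWholeWord(final String field, final String term)
    {
        return Arrays.asList(field.split("[^\\p{Alnum}]+")).contains(term);
    }

    public static void main(final String[] args)
    {
        System.out.println(containsIgnoringCase("Public Health", "HEALTH")); // true
        System.out.println(containsWholeWord("Public Health", "HEALTH"));    // false
        System.out.println(containsWholeWord("healthcare", "health"));       // false
        System.out.println(containsWholeWord("public health", "health"));    // true
    }
}
```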

Combining criteria

It’s possible to combine any number of criteria as an ‘AND’ query. For example:

hdn-ckan-dataset-search --tags health --any nice

Produces just:

datasetName
use_of_nice-appraised_medicines_in_the_nhs_in_england
financial-transactions-data-nice
england-nhs-indicator-proportion-of-incident-cases-reviewed-by-multi-disciplinary-team-for-all-cance
england-nhs-indicator-for-iapt-services---the-number-of-people-who-are-moving-to-recovery-as-a-propo
england-nhs-indicator-percentage-compliance-with-peer-review-by-team-breast-lung-colorectal-local-an

Using hdn-ckan-resource-search

Checking everything’s OK

Before we get going, let’s check that everything works as expected. Run the command hdn-ckan-resource-search --help (remember to substitute java -jar hdn-ckan-resource-search.jar if you need to). You should see a list of supported options. At the time of writing, it looks like this:

Option                                  Description                         
------                                  -----------                         
--description <substring to search for  search in the description field     
  case insensitively>                                                       
--format <substring to search for case  search in the format field (stick to
  insensitively>                          file extensions)                  
--help                                  Displays help for options           
--url <substring to search for case     search in the url field             
  insensitively>                                                            
--version                               Displays version

If the output seems a bit compressed, it’s because we’re formatting for a 40 character wide screen – useful if you’re running this over ssh on Android. Whilst you can’t see it above, help output always produces an exit code of 2.

Since the options are regular POSIX long options (and are named similarly to those in the GNU coding standards), we can abbreviate them. Hence hdn-ckan-resource-search -h and hdn-ckan-resource-search --he will produce the same output. The only time you can’t do this is if the abbreviation would be ambiguous.

Let’s try out one of those options: --version.

Checking the version installed

Let’s run hdn-ckan-resource-search --version:

hdn-ckan-resource-search 2013.03.06.1127-development
© Crown Copyright 2013

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Written by Raphael Cohn (raphael.cohn@stormmq.com)

Standard GNU-like stuff. It’s worth understanding the version number, in this case, 2013.03.06.1127-development. The part before the hyphen is the timestamp of the last git check-in used to build the binary – you should be able to find it using git log. Additionally, this should match the version of the deb package. The part after is the git branch the code was built from. Usually this will be either development or master.

If instead it says unknown version, then you’re using code that you’ve compiled yourself or that wasn’t released ‘officially’.

Finding resources

Just like hdn-ckan-dataset-search, you can combine options to generate ‘AND’ queries. One difference is that the strings specified are supposed to be matched case insensitively, and that substrings are supposed to match.

A simple example to find all resources that are CSV files is:

hdn-ckan-resource-search --format CSV

which produces

resourceId
04ae49d6-7664-4e0e-a160-9451495455e9
17874140-59a8-46d7-90cc-4ec634a22d83
10594353-39f1-44a5-92f0-7d6837abcbd2
8c6e6230-daa6-4641-8fa7-d5b4072d96b5
560eb837-8ace-4955-a8bb-b60e21d2adc6

...

Sadly, resources are only returned by UUID.

A combined query might be

hdn-ckan-resource-search --format CSV --description health

which gives

resourceId
dcac1343-08fd-4aee-bd43-1d51d2c44d58
d244632a-33f3-4a66-bba8-1d596ca70951
fe900112-f6cb-4d69-9653-8a335b8d1f08
8f509afe-909a-471c-a59d-49551b28b382
8d0add74-e1da-462f-9e21-ac21b27494c7

Using the java library programmatically

The way you do this varies depending on what you used above:

  • If you’ve downloaded or forked source from github, you can use IntelliJ. Open source/subprojects.ipr and start hacking.
  • If you’ve downloaded the jars (and source zips), create a project or add them to an existing project in your favourite IDE (if it isn’t IntelliJ, then switch now).

You’ll need the jar files:

  • ckan-api
  • ckan-domain
  • ckan-schema

And their dependencies, which, at the time of writing, are:

  • common
  • common-http
  • common-http-client
  • common-http-client-json
  • common-reflection
  • common-exceptions
  • common-tuples
  • common-naming
  • common-parsers
  • common-parsers-json
  • common-serialisers
  • common-serialisers-separatedValues
  • and the library, annotations.jar.

This list may change. To find the most up-to-date list, either extract META-INF/MANIFEST.MF from ckan-api.jar and read the Class-Path entry, or open the IntelliJ project (source/subprojects.ipr) and look at the dependencies of the module ckan-api (sensibly, module names match jar names and source zip names).
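
Reading the Class-Path entry can also be done in code with the standard java.util.jar classes. This is a minimal sketch: the class name is hypothetical and the manifest text in main is made-up sample data, not the real contents of ckan-api.jar.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class ManifestClassPath
{
    // Returns the space-separated entries of the Class-Path attribute, or an empty list if absent.
    static List<String> classPathEntries(final InputStream manifestStream) throws IOException
    {
        final Manifest manifest = new Manifest(manifestStream);
        final String classPath = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (classPath == null || classPath.trim().isEmpty())
        {
            return Collections.emptyList();
        }
        return Arrays.asList(classPath.trim().split("\\s+"));
    }

    public static void main(final String[] args) throws IOException
    {
        // Sample manifest text for illustration only
        final String sample = "Manifest-Version: 1.0\r\nClass-Path: common.jar common-http.jar\r\n\r\n";
        final InputStream stream = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(classPathEntries(stream)); // [common.jar, common-http.jar]
    }
}
```

For a jar on disk, java.util.jar.JarFile.getManifest() returns a Manifest that can be queried in the same way.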

Making simple requests

The ‘guts’ of the java library’s API is the interface CkanApi. It’s in the package uk.nhs.hdn.ckan.api. This interface provides methods for all the read-only operations one might want to do against CKAN. The methods take some parameters and return an immutable ApiMethod. This works a bit like a java Method – think of it as a late-bound method – but is strongly typed. Effectively, it’s the same as a .NET delegate. Calls on this object are thread-safe. The design idea here is that configuring everything for a REST call is quite a bit of work, but one might want to make the call many times.
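
The ‘configure once, execute many times’ idea can be sketched in a few lines. This is not the library’s ApiMethod: the Call interface, the get method and the example.invalid URL are all hypothetical stand-ins, and no HTTP request is actually made.

```java
import java.net.URI;

final class LateBoundSketch
{
    // Hypothetical stand-in for the library's ApiMethod; not its real definition.
    interface Call<V>
    {
        V execute();
    }

    // Does the configuration work once and returns an immutable, reusable call.
    static Call<String> get(final String baseUrl, final String path)
    {
        final URI uri = URI.create(baseUrl).resolve(path); // validated once, up front
        // Nothing runs until execute() is called; a real implementation would make the HTTP request here.
        return () -> "GET " + uri;
    }

    public static void main(final String[] args)
    {
        final Call<String> allDatasets = get("https://example.invalid/api/", "rest/dataset");
        System.out.println(allDatasets.execute()); // GET https://example.invalid/api/rest/dataset
        System.out.println(allDatasets.execute()); // same call, reused
    }
}
```

Because the returned object holds only final state, it can be shared between threads without locking.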

A concrete instance of the CkanApi interface is the class ConcreteCkanApi. The easiest way to use this is with the constant DataGovUk which provides a properly configured instance for data.gov.uk. For example:

final ApiMethod<DatasetName[]> getAllDatasetNames = DataGovUk.allDatasetNames();
final DatasetName[] allDatasetNames = getAllDatasetNames.execute();

The first line uses a static reference to DataGovUk, which is an instance of ConcreteCkanApi, to create everything that’s needed to get all dataset names.

The second line makes the REST call and returns all the dataset names. If it goes wrong, one of three exceptions will be thrown:

  • CouldNotConnectHttpException – the service is unavailable due to network problems or remote server problems
  • UnacceptableResponseException – the response code was not 200 OK, there was no content or content was not JSON
  • CorruptResponseException – the response was not valid JSON that could be parsed and understood

Of course, a request might be one off, so you could chain it together:

final DatasetId[] execute = DataGovUk.allDatasetIds().execute();

And so pretend it’s like a regular java method (It’s worth noting that if this were Python, we’d make execute() the default method for the object, so one could just do DataGovUk.allDatasetIds()(), but java lacks such syntactic sugar).

So, putting it all together, a class might look like:

package uk.nhs.hdn.ckan.api;

import uk.nhs.hdn.ckan.domain.ids.DatasetId;
import uk.nhs.hdn.ckan.domain.uniqueNames.DatasetName;
import uk.nhs.hdn.common.http.client.ApiMethod;
import uk.nhs.hdn.common.http.client.exceptions.CorruptResponseException;
import uk.nhs.hdn.common.http.client.exceptions.CouldNotConnectHttpException;
import uk.nhs.hdn.common.http.client.exceptions.UnacceptableResponseException;

import static uk.nhs.hdn.ckan.api.ConcreteCkanApi.DataGovUk;

public class Example1
{
    public void example() throws UnacceptableResponseException, CouldNotConnectHttpException, CorruptResponseException
    {
        final ApiMethod<DatasetName[]> getAllDatasetNames = DataGovUk.allDatasetNames();
        final DatasetName[] allDatasetNames = getAllDatasetNames.execute();

        final DatasetId[] allDatasetIds = DataGovUk.allDatasetIds().execute();
    }
}

This particular code is in the class uk.nhs.hdn.ckan.api.Example1.

More Methods

The remaining methods are nearly all the same. Indeed these ones can be treated identically and take no parameters:

  • allDatasetNames()
  • allDatasetIds()
  • allGroupNames()
  • allGroupIds()
  • allTags()
  • allLicences()
  • tagCounts()

For example:

...

final GroupName[] allGroupNames = DataGovUk.allGroupNames().execute();
final GroupId[] allGroupIds = DataGovUk.allGroupIds().execute();
final TagName[] allTags = DataGovUk.allTags().execute();
final Licence[] allLicences = DataGovUk.allLicences().execute();
final TagCount[] allTagCounts = DataGovUk.tagCounts().execute();

...

Some methods might take an argument or two. These are simple domain objects that are constructed from strings (those ending in Name) or from valueOf (typically UUIDs and dates). For example:

...

final DatasetKey datasetName = new DatasetName("focus_on_health");
final Dataset dataset = DataGovUk.dataset(datasetName).execute();

...

and

...

final DatasetKey datasetId = DatasetId.valueOf("1b45312a-a784-473e-b2f7-def3eadddf96");
final Dataset anotherDataset = DataGovUk.dataset(datasetId).execute();

...

are identical. This is how we support both version 1 and version 2 of the CKAN API.

Other requests are just as simple:

...

final Group group = DataGovUk.group(aGroupName).execute();
final DatasetName[] datasetNamesWithTag = DataGovUk.datasetNamesWithTag(aTagName).execute();
final DatasetId[] datasetIdsWithTag = DataGovUk.datasetIdsWithTag(aTagName).execute();
final Revision[] revisionsForDataset = DataGovUk.datasetRevisions(datasetName).execute();
final Revision revision = DataGovUk.revision(aRevisionId).execute();
final RevisionId[] revisionsSinceRevisionId = DataGovUk.revisions(sinceRevisionId).execute();
final RevisionId[] revisionsSinceTimestamp = DataGovUk.revisions(sinceTimestamp).execute();

...

The variables for these are easily set-up:

...

final GroupName aGroupName = allGroupNames[0];
final TagName aTagName = allTags[0];
final RevisionId aRevisionId = RevisionId.valueOf("ff24e891-407f-4152-972c-b8052f1d5737");
final RevisionId sinceRevisionId = aRevisionId;
final MicrosecondTimestamp sinceTimestamp = microsecondTimestamp("2013-01-28T20:06:30.061645");

...

Slightly more complex are the two methods that find dataset relationships:

...

final DatasetName[] datasetNamesByLinkedFrom = DataGovUk.datasetRelationshipsByDatasetName((DatasetName) datasetName, linked_from).execute();
final DatasetId[] datasetIdsLinkedFrom = DataGovUk.datasetRelationshipsByDatasetId(datasetId, linked_from).execute();

...

Putting it all together, the code might look like:

package uk.nhs.hdn.ckan.api;

import uk.nhs.hdn.ckan.domain.*;
import uk.nhs.hdn.ckan.domain.dates.MicrosecondTimestamp;
import uk.nhs.hdn.ckan.domain.ids.DatasetId;
import uk.nhs.hdn.ckan.domain.ids.GroupId;
import uk.nhs.hdn.ckan.domain.ids.RevisionId;
import uk.nhs.hdn.ckan.domain.uniqueNames.DatasetKey;
import uk.nhs.hdn.ckan.domain.uniqueNames.DatasetName;
import uk.nhs.hdn.ckan.domain.uniqueNames.GroupName;
import uk.nhs.hdn.ckan.domain.uniqueNames.TagName;
import uk.nhs.hdn.common.http.client.exceptions.CorruptResponseException;
import uk.nhs.hdn.common.http.client.exceptions.CouldNotConnectHttpException;
import uk.nhs.hdn.common.http.client.exceptions.UnacceptableResponseException;

import static uk.nhs.hdn.ckan.api.ConcreteCkanApi.DataGovUk;
import static uk.nhs.hdn.ckan.api.RelationshipType.linked_from;
import static uk.nhs.hdn.ckan.domain.dates.MicrosecondTimestamp.microsecondTimestamp;

public class Example2
{
    public void example() throws UnacceptableResponseException, CouldNotConnectHttpException, CorruptResponseException
    {
        final GroupName[] allGroupNames = DataGovUk.allGroupNames().execute();
        final GroupId[] allGroupIds = DataGovUk.allGroupIds().execute();
        final TagName[] allTags = DataGovUk.allTags().execute();
        final Licence[] allLicences = DataGovUk.allLicences().execute();
        final TagCount[] allTagCounts = DataGovUk.tagCounts().execute();

        final DatasetKey datasetName = new DatasetName("focus_on_health");
        final Dataset dataset = DataGovUk.dataset(datasetName).execute();

        final DatasetKey datasetId = DatasetId.valueOf("1b45312a-a784-473e-b2f7-def3eadddf96");
        final Dataset anotherDataset = DataGovUk.dataset(datasetId).execute();

        final GroupName aGroupName = allGroupNames[0];
        final TagName aTagName = allTags[0];
        final RevisionId aRevisionId = RevisionId.valueOf("ff24e891-407f-4152-972c-b8052f1d5737");
        final RevisionId sinceRevisionId = aRevisionId;
        final MicrosecondTimestamp sinceTimestamp = microsecondTimestamp("2013-01-28T20:06:30.061645");

        final Group group = DataGovUk.group(aGroupName).execute();

        final DatasetName[] datasetNamesWithTag = DataGovUk.datasetNamesWithTag(aTagName).execute();

        final DatasetId[] datasetIdsWithTag = DataGovUk.datasetIdsWithTag(aTagName).execute();

        final Revision[] revisionsForDataset = DataGovUk.datasetRevisions(datasetName).execute();

        final Revision revision = DataGovUk.revision(aRevisionId).execute();

        final RevisionId[] revisionsSinceRevisionId = DataGovUk.revisions(sinceRevisionId).execute();

        final RevisionId[] revisionsSinceTimestamp = DataGovUk.revisions(sinceTimestamp).execute();

        final DatasetName[] datasetNamesByLinkedFrom = DataGovUk.datasetRelationshipsByDatasetName((DatasetName) datasetName, linked_from).execute();

        final DatasetId[] datasetIdsLinkedFrom = DataGovUk.datasetRelationshipsByDatasetId(datasetId, linked_from).execute();
    }
}

This is in the class Example2 in the package uk.nhs.hdn.ckan.api.

Search Requests

Search requests are quite complex. They follow the same pattern as other API methods, but take three parameters:

  • searchCriteria – zero or more clauses to match. Multiple clauses to match are treated as an AND.
  • offset – the CKAN API returns paged results. This specifies the first row to be returned.
  • limit – the maximum number of rows to return (values over 1000 are clamped to 1000).

Interestingly, the offset and limit are implemented internally as search criteria by CKAN. For convenience, there are two constants in UnsignedLongSearchCriterion to deal with useful values for offset and limit:

  • MinimumOffset
  • MaximumLimit

This design by CKAN means that one may have to issue the same search criteria more than once, interpreting the previous search response to see if one should increase the offset (and optionally reduce the limit). Such a design works well for a web page, but less well for a regular client. It is also not clear if the search results between requests are consistent (eg ‘pages’ of results don’t get introduced) and stably sorted.
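
The repeated-request pattern described above can be sketched as a simple loop. The PageFetcher interface and class name are hypothetical; the sketch assumes a page shorter than the limit marks the end of the results, and that results stay stable between requests, which, as noted, is not guaranteed.

```java
import java.util.ArrayList;
import java.util.List;

final class PagingSketch
{
    // Hypothetical: one call returns at most 'limit' rows, starting at row 'offset'.
    interface PageFetcher<T>
    {
        List<T> fetch(long offset, int limit);
    }

    // Keeps asking for the next page until a page comes back shorter than the limit.
    static <T> List<T> fetchAll(final PageFetcher<T> fetcher, final int limit)
    {
        if (limit <= 0)
        {
            throw new IllegalArgumentException("limit must be positive");
        }
        final List<T> all = new ArrayList<>();
        long offset = 0;
        while (true)
        {
            final List<T> page = fetcher.fetch(offset, limit);
            all.addAll(page);
            if (page.size() < limit)
            {
                return all; // short (or empty) page: nothing more to fetch
            }
            offset += page.size();
        }
    }

    public static void main(final String[] args)
    {
        final List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++)
        {
            rows.add(i);
        }
        // An in-memory fetcher standing in for the remote search
        final PageFetcher<Integer> fetcher = (offset, limit) -> rows.subList(
            (int) Math.min(offset, rows.size()), (int) Math.min(offset + limit, rows.size()));
        System.out.println(fetchAll(fetcher, 10).size()); // 25
    }
}
```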

The search criteria use a Humane Interface design. One creates an instance of a SearchCriterion and ‘ands’ further clauses on to it. For convenience, methods that create common search criteria are in StringSearchCriterion:

...

// Note that this is highly unlikely to match anything!
final SearchCriteria<Dataset> datasetSearchCriteria = datasetAnySearchCriterion("health").and(datasetAuthorSearchCriterion("Joe Bloggs")).and(datasetGroupsSearchCriterion("PCTs"));

...
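
For readers unfamiliar with this style, the chaining can be sketched with a small immutable class. This is a hypothetical illustration of the pattern only; it is not the library’s SearchCriteria, and the field=value rendering is invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CriteriaSketch
{
    private final List<String> clauses;

    private CriteriaSketch(final List<String> clauses)
    {
        this.clauses = Collections.unmodifiableList(clauses);
    }

    // Creates a single-clause criterion.
    static CriteriaSketch criterion(final String field, final String value)
    {
        final List<String> one = new ArrayList<>();
        one.add(field + "=" + value);
        return new CriteriaSketch(one);
    }

    // Returns a new instance holding both sets of clauses; neither operand is modified.
    CriteriaSketch and(final CriteriaSketch other)
    {
        final List<String> combined = new ArrayList<>(clauses);
        combined.addAll(other.clauses);
        return new CriteriaSketch(combined);
    }

    String describe()
    {
        return String.join(" AND ", clauses);
    }

    public static void main(final String[] args)
    {
        final CriteriaSketch criteria = criterion("any", "health").and(criterion("author", "Joe Bloggs"));
        System.out.println(criteria.describe()); // any=health AND author=Joe Bloggs
    }
}
```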

Instead of using the direct search queries, we’ve wrapped up the iterative calling in something we call a SearchDelegate. So, to search for datasets and get results by name is really easy:

...

final SearchDelegate<DatasetName,Dataset> datasetNameDatasetSearchDelegate = DataGovUk.datasetNameSearchDelegate();
final DatasetName[] datasetNames = datasetNameDatasetSearchDelegate.allResults(datasetSearchCriteria);

...

The search delegate object is similar to the ApiMethod described earlier – it is expensive to set up (but not terribly so) and so can be re-used by any number of threads.

One can also get the results as Dataset UUIDs:

...

final DatasetId[] datasetIds = DataGovUk.datasetIdSearchDelegate().allResults(datasetSearchCriteria);

...

This also shows how you don’t need to keep the instance of the search delegate.

Similarly, it’s also possible to search this way for resources:

...

final ResourceId[] someExcelResources = DataGovUk.resourceSearchDelegate().allResults(resourceFormatSearchCriterion("XLS"));

...

However, the quality of this data is pretty poor (the search above won’t return all kinds of Excel spreadsheet, as multiple bits of text have been entered).

Putting it all together as code:

package uk.nhs.hdn.ckan.api;

import uk.nhs.hdn.ckan.api.search.SearchCriteria;
import uk.nhs.hdn.ckan.api.search.searchDelegates.SearchDelegate;
import uk.nhs.hdn.ckan.domain.Dataset;
import uk.nhs.hdn.ckan.domain.ids.DatasetId;
import uk.nhs.hdn.ckan.domain.ids.ResourceId;
import uk.nhs.hdn.ckan.domain.uniqueNames.DatasetName;
import uk.nhs.hdn.common.http.client.exceptions.CorruptResponseException;
import uk.nhs.hdn.common.http.client.exceptions.CouldNotConnectHttpException;
import uk.nhs.hdn.common.http.client.exceptions.UnacceptableResponseException;

import static uk.nhs.hdn.ckan.api.ConcreteCkanApi.DataGovUk;
import static uk.nhs.hdn.ckan.api.search.StringSearchCriterion.*;

public class Example3
{
    public void example() throws UnacceptableResponseException, CouldNotConnectHttpException, CorruptResponseException
    {
        // Note that this is highly unlikely to match anything!
        final SearchCriteria<Dataset> datasetSearchCriteria = datasetAnySearchCriterion("health").and(datasetAuthorSearchCriterion("Joe Bloggs")).and(datasetGroupsSearchCriterion("PCTs"));
        final SearchDelegate<DatasetName,Dataset> datasetNameDatasetSearchDelegate = DataGovUk.datasetNameSearchDelegate();
        final DatasetName[] datasetNames = datasetNameDatasetSearchDelegate.allResults(datasetSearchCriteria);

        final DatasetId[] datasetIds = DataGovUk.datasetIdSearchDelegate().allResults(datasetSearchCriteria);

        final ResourceId[] someExcelResources = DataGovUk.resourceSearchDelegate().allResults(resourceFormatSearchCriterion("XLS"));
    }
}

This is in the class Example3 in the package uk.nhs.hdn.ckan.api.

The generic nature of the search delegate lets you create generic search code for different types of data.