Input Data Formats
Introduction
In this section we describe the input formats supported by the data parameter of the find and find_nhsbt API calls. Three input formats are currently supported: tab delimited, XML and JSON. The input format does not need to match the format of the output data, although using a single format throughout may be simpler for consistency.
Tab Delimited Input
A tab delimited file has four columns separated by tabs (spaces are also accepted). This was the first format used to model the input, and as such it supports only a subset of the features. In particular, donor ids cannot be specified in this format; if donor ids need to be returned, either JSON or XML must be used.
An example of data modelled using tab delimited format is shown below:
1 2 3 65
1 3 1 65
1 4 2 49
2 1 2 45
2 5 1 45
3 1 1 25
4 2 1 55
4 3 3 55
4 5 4 55
5 2 1 29
We now explain this format. Each line describes one compatibility between a donor and a patient, and its columns hold, in order: the id of the patient the donor is donating on behalf of, the id of a patient the donor is compatible with, the score for that pairing, and the donor's age. Consider line 1: a donor for patient 1 is compatible with patient 2, and the score between this donor and patient 2 is 3. The final column gives the donor's age, so the donor for patient 1 on line 1 is aged 65. Note, however, that patient 1 must have two donors, since the donor on line 3 is aged 49. This becomes ambiguous when a patient has several donors of the same age, because there is then no way to identify which donor was actually chosen. The JSON and XML formats handle this situation by allowing a donor id to be specified, and both are therefore preferred to the tab delimited format.
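As a rough illustration, the Python sketch below reads the tab delimited format into rows, assuming the column order described above (source patient, recipient, score, donor age). The names `TabRow` and `parse_tab_delimited` are hypothetical and not part of the service, and parsing the score as a float is an assumption. It also shows why the missing donor ids matter: grouping lines by donor age is only a heuristic.

```python
from collections import defaultdict
from typing import NamedTuple


class TabRow(NamedTuple):
    """One line of the tab delimited format (hypothetical type)."""
    source: int      # id of the patient the donor is donating on behalf of
    recipient: int   # id of a patient the donor is compatible with
    score: float     # score of the potential transplant (float is an assumption)
    donor_age: int   # age of the (unidentified) donor


def parse_tab_delimited(text: str) -> list[TabRow]:
    """Parse rows of four whitespace-separated columns, skipping blank lines."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # split() with no argument accepts tabs or spaces
        if not fields:
            continue
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        source, recipient, score, age = fields
        rows.append(TabRow(int(source), int(recipient), float(score), int(age)))
    return rows


SAMPLE = """1 2 3 65
1 3 1 65
1 4 2 49
2 1 2 45
2 5 1 45
3 1 1 25
4 2 1 55
4 3 3 55
4 5 4 55
5 2 1 29"""

rows = parse_tab_delimited(SAMPLE)

# Without donor ids, lines can only be grouped heuristically, e.g. by donor age;
# this breaks down when two donors for the same patient share an age.
ages_by_source = defaultdict(set)
for row in rows:
    ages_by_source[row.source].add(row.donor_age)

print(sorted(ages_by_source[1]))  # [49, 65]: patient 1 has at least two donors
```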
XML
Using XML as an input format allows meaning to be attached to the individual elements, unlike the tab delimited format, which relies on the reader being intimately familiar with its layout (which, given the confusing column meanings, can be difficult). The use of a structured format such as XML is therefore preferred.
A sample input file is shown below using XML:
<?xml version="1.0" ?>
<data>
  <entry donor_id="1">
    <sources>
      <source>1</source>
    </sources>
    <dage>65</dage>
    <matches>
      <match><recipient>2</recipient><score>3</score></match>
      <match><recipient>3</recipient><score>1</score></match>
      <match><recipient>4</recipient><score>2</score></match>
    </matches>
  </entry>
  <entry donor_id="2">
    <sources>
      <source>2</source>
    </sources>
    <dage>45</dage>
    <matches>
      <match><recipient>1</recipient><score>2</score></match>
      <match><recipient>5</recipient><score>1</score></match>
    </matches>
  </entry>
  <entry donor_id="3">
    <sources>
      <source>3</source>
    </sources>
    <dage>25</dage>
    <matches>
      <match><recipient>1</recipient><score>1</score></match>
    </matches>
  </entry>
  <entry donor_id="4">
    <sources>
      <source>4</source>
    </sources>
    <dage>55</dage>
    <matches>
      <match><recipient>3</recipient><score>2</score></match>
      <match><recipient>2</recipient><score>3</score></match>
      <match><recipient>5</recipient><score>4</score></match>
    </matches>
  </entry>
  <entry donor_id="5">
    <sources>
      <source>5</source>
      <source>6</source>
    </sources>
    <dage>30</dage>
    <matches>
      <match><recipient>4</recipient><score>2</score></match>
      <match><recipient>2</recipient><score>1</score></match>
    </matches>
  </entry>
  <entry donor_id="6">
    <dage>29</dage>
    <altruistic>true</altruistic>
    <matches>
      <match><recipient>7</recipient><score>10</score></match>
    </matches>
  </entry>
  <entry donor_id="7">
    <sources>
      <source>7</source>
    </sources>
    <dage>29</dage>
  </entry>
</data>
Looking at the data above, we see that the <entry> tag holds information about the set of patients that a donor (identified by the donor_id attribute) is compatible with. Note first that the donor_id attribute of the <entry> element is required; this is in contrast to the tab delimited format, in which donor ids cannot be specified. Within the entry tag we see the following elements:
- <sources> contains a set of <source> elements, each holding the id of a patient on whose behalf the donor identified by the donor_id attribute is willing to donate. The <sources> element must be supplied unless the donor is altruistic; failing to supply at least one <source> element results in an error. For an altruistic donor, supplying a source id is optional: if one is supplied, the <sources> element must contain exactly one <source>; otherwise the <sources> element must be omitted from the <entry>. For example, the first entry above shows that donor 1 is the willing donor for patient 1. Although unlikely, a donor may be willing to donate on behalf of more than one patient on the list, which is why the format allows multiple <source> elements within the <sources> tag (see donor 5).
- <dage> holds the age of the donor identified by the donor_id attribute.
- <altruistic> holds the value true if the donor identified by donor_id is an altruistic donor. If the element is absent or false, the donor is non-altruistic.
- <matches> contains a set of <match> elements listing the patients who are compatible with the donor identified by donor_id. Each <match> element contains two sub-elements, <recipient> and <score>: <recipient> holds the id of a potential recipient of donor donor_id's kidney, and <score> holds the score of the potential transplant from donor_id to <recipient>. For example, in the first <entry> of the sample file above, donor 1 is compatible with patients 2, 3 and 4, with scores 3, 1 and 2 respectively. The <matches> element may be omitted from an entry (see donor 7).
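The structure described above can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is not the service's own parser: `parse_xml_input` is a hypothetical name, it performs no validation beyond raising on malformed values, and it fills missing optional elements with defaults (no sources, not altruistic, no matches).

```python
import xml.etree.ElementTree as ET


def parse_xml_input(xml_text: str) -> dict[int, dict]:
    """Convert the XML input format into a dict keyed by integer donor id."""
    root = ET.fromstring(xml_text)
    if root.tag != "data":
        raise ValueError("input must be contained within a <data> element")
    donors = {}
    for entry in root.findall("entry"):
        donor_id = int(entry.attrib["donor_id"])  # required; KeyError if missing
        sources_el = entry.find("sources")
        sources = [] if sources_el is None else [
            int(s.text) for s in sources_el.findall("source")
        ]
        altruistic_el = entry.find("altruistic")
        altruistic = (altruistic_el is not None
                      and (altruistic_el.text or "").strip() == "true")
        matches_el = entry.find("matches")
        matches = [] if matches_el is None else [
            # int() and float() ignore whitespace surrounding the element text
            {"recipient": int(m.findtext("recipient")),
             "score": float(m.findtext("score"))}
            for m in matches_el.findall("match")
        ]
        donors[donor_id] = {
            "sources": sources,
            "dage": int(entry.findtext("dage")),
            "altruistic": altruistic,
            "matches": matches,
        }
    return donors


SAMPLE_XML = """<?xml version="1.0" ?>
<data>
  <entry donor_id="1">
    <sources><source>1</source></sources>
    <dage>65</dage>
    <matches>
      <match><recipient>2</recipient><score>3</score></match>
      <match><recipient>3</recipient><score>1</score></match>
    </matches>
  </entry>
  <entry donor_id="6">
    <dage>29</dage>
    <altruistic>true</altruistic>
    <matches>
      <match><recipient>7</recipient><score>10</score></match>
    </matches>
  </entry>
</data>"""

donors = parse_xml_input(SAMPLE_XML)
print(donors[1]["matches"])
print(donors[6]["altruistic"], donors[6]["sources"])
```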
JSON
One problem with XML as an input format is its verbosity: for very large amounts of data, the repeated opening and closing tags significantly increase the amount of traffic sent over the network. To save bytes while retaining the structure of the data, JSON can be used as the input format instead.
The same data shown in the XML example is presented below using JSON.
{
  "data" : {
    "1" : {
      "sources" : [1],
      "dage" : 65,
      "matches" : [
        { "recipient" : 2, "score" : 3 },
        { "recipient" : 3, "score" : 1 },
        { "recipient" : 4, "score" : 2 }
      ]
    },
    "2" : {
      "sources" : [2],
      "dage" : 45,
      "matches" : [
        { "recipient" : 1, "score" : 2 },
        { "recipient" : 5, "score" : 1 }
      ]
    },
    "3" : {
      "sources" : [3],
      "dage" : 25,
      "matches" : [
        { "recipient" : 1, "score" : 1 }
      ]
    },
    "4" : {
      "sources" : [4],
      "dage" : 55,
      "matches" : [
        { "recipient" : 2, "score" : 3 },
        { "recipient" : 3, "score" : 2 },
        { "recipient" : 5, "score" : 4 }
      ]
    },
    "5" : {
      "sources" : [5, 6],
      "dage" : 30,
      "matches" : [
        { "recipient" : 2, "score" : 1 },
        { "recipient" : 4, "score" : 2 }
      ]
    },
    "6" : {
      "altruistic" : true,
      "dage" : 29,
      "matches" : [
        { "recipient" : 7, "score" : 10 }
      ]
    },
    "7" : {
      "sources" : [7],
      "dage" : 29
    }
  }
}
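For clients that assemble the request in code, a payload in this shape can be produced with Python's standard `json` module. `donor_entry` is a hypothetical helper written for this illustration, not part of the API. It omits the matches key when there are no compatible patients, since a supplied matches key must be non-empty. Note that `json.dumps` turns the integer donor ids into string keys, because JSON object keys are always strings.

```python
import json


def donor_entry(dage, sources=(), matches=(), altruistic=False):
    """Build one donor object for the "data" object (hypothetical helper).

    matches is an iterable of (recipient, score) pairs.
    """
    entry = {"dage": dage}
    if sources:
        entry["sources"] = list(sources)
    if altruistic:
        entry["altruistic"] = True
    matches = [{"recipient": r, "score": s} for r, s in matches]
    if matches:  # omit the key entirely when there are no compatible patients
        entry["matches"] = matches
    return entry


payload = {
    "data": {
        # json.dumps converts these integer donor ids into string keys
        1: donor_entry(65, sources=[1], matches=[(2, 3), (3, 1), (4, 2)]),
        6: donor_entry(29, altruistic=True, matches=[(7, 10)]),
        7: donor_entry(29, sources=[7]),
    }
}

body = json.dumps(payload, indent=2)
print(body)
```

How the resulting body is passed to find or find_nhsbt is outside the scope of this sketch.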
The JSON format consists of a key called data that holds a JSON object whose keys are the donor ids. That is, each entry in the data object has the format:
"integer_value_representing_a_donor" : {
    "sources" : an_array_of_ids_representing_the_patients_that_the_donor_is_willing_to_donate_for,
    "dage" : age_of_the_donor,
    "altruistic" : true_if_the_donor_is_altruistic,
    "matches" : an_array_of_objects_where_each_object_holds_information_about_compatible_patients
}
In the JSON sample above, the donor with id 1 is a donor for patient 1, is aged 65, and is compatible with patients 2, 3 and 4 with scores 3, 1 and 2 respectively. As with the XML input format, the sources key can hold more than one source and is therefore represented as an array; donor 5 shows a donor with multiple sources. Each non-altruistic donor must have at least one source, and every donor must supply an age in the dage field. Altruistic donors are identified by adding the altruistic key with the value true to the donor's JSON object. Donor 6 in the sample shows a typical entry for an altruistic donor; notice that an altruistic donor need not supply any sources (however, any source id that is supplied will be returned consistently in the output).
To show the set of patients that are compatible with a donor we use the matches key. The matches key holds an array of JSON objects, each representing a compatible patient: the patient's id is stored in the recipient field, and the score between this donor and that patient in the score field. For example, donor 4 in the sample is compatible with three patients: patient 2 with score 3, patient 3 with score 2, and patient 5 with score 4. Finally, donor 7 shows the rare case of a donor with no compatible patients, in which case the matches key may be omitted.
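Reading the JSON format back is similarly short. The sketch below uses the hypothetical name `parse_json_input`; it converts the string keys to integer donor ids and fills in defaults for the optional keys (no sources, not altruistic, no matches) so that downstream code does not need to check for missing keys. It assumes the input has already passed validation.

```python
import json


def parse_json_input(json_text: str) -> dict[int, dict]:
    """Load the JSON input format into a dict keyed by integer donor id.

    Optional keys are filled with defaults: no sources, not altruistic,
    and no matches (as for donor 7). Assumes the input is already valid.
    """
    data = json.loads(json_text)["data"]
    return {
        int(donor_id): {
            "sources": entry.get("sources", []),
            "dage": entry["dage"],
            "altruistic": entry.get("altruistic", False),
            "matches": entry.get("matches", []),
        }
        for donor_id, entry in data.items()
    }


SAMPLE_JSON = """{ "data" : {
  "4" : { "sources" : [4], "dage" : 55,
          "matches" : [ { "recipient" : 2, "score" : 3 },
                        { "recipient" : 3, "score" : 2 },
                        { "recipient" : 5, "score" : 4 } ] },
  "6" : { "altruistic" : true, "dage" : 29,
          "matches" : [ { "recipient" : 7, "score" : 10 } ] },
  "7" : { "sources" : [7], "dage" : 29 } } }"""

donors = parse_json_input(SAMPLE_JSON)

# recipient -> score for donor 4
print({m["recipient"]: m["score"] for m in donors[4]["matches"]})  # {2: 3, 3: 2, 5: 4}
```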
Data Validations
Below we outline the validations performed on the input data supplied to the service. These validations apply to both the JSON and XML input formats described above; if any validation fails, an error response is returned. Checks that are specific to one data format are indicated in the text.
- Checks that the syntax of any XML or JSON input is valid.
- Checks that the input data is contained within a data key/tag.
- For the JSON format, checks that the value stored in data is an object and that each of its keys, representing a donor id, can be converted to an integer.
- For the XML format, checks that each <entry> has a donor_id attribute whose value is an integer.
- Checks that the sources key/tag is supplied for every non-altruistic donor and that it contains, in the case of JSON, an array of integer ids, and for XML, a set of <source> elements whose values are integer ids.
- Checks that the dage key/tag is supplied and contains an integer greater than zero and less than 150.
- Checks that the altruistic value, if present, is a boolean.
- Checks that, if the donor is altruistic, any sources key/tag supplied consists of a single entry.
- Checks that, if a matches key/tag is supplied, it is non-empty.
- Checks that each match element in XML, and each object in the matches array in JSON, consists of a recipient and a score.
- Checks that all recipient values are integers.
- Checks that every score is supplied and contains a double/float value.
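The JSON checks listed above can be approximated in a few dozen lines of Python. This is a sketch only: `validate_json_input` is a hypothetical name, the service's actual error messages and response format are not documented here, and the sketch collects every error rather than stopping at the first. Two details are assumptions: integer scores are accepted as valid numbers (the examples use integer scores), and an altruistic donor that supplies sources must supply exactly one, following the description of <sources> in the XML section.

```python
import json


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    # Integer scores are accepted as well as floats (an assumption)
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_json_input(text):
    """Return a list of validation error messages; an empty list means valid."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON syntax: {exc}"]
    if not isinstance(doc, dict) or "data" not in doc:
        return ["input must be contained within a 'data' key"]
    if not isinstance(doc["data"], dict):
        return ["'data' must hold a JSON object"]

    errors = []
    for key, donor in doc["data"].items():
        try:
            int(key)  # donor ids arrive as string keys
        except ValueError:
            errors.append(f"donor id {key!r} is not an integer")
            continue
        if not isinstance(donor, dict):
            errors.append(f"donor {key}: entry must be a JSON object")
            continue

        altruistic = donor.get("altruistic", False)
        if not isinstance(altruistic, bool):
            errors.append(f"donor {key}: 'altruistic' must be a boolean")

        sources = donor.get("sources")
        if sources is None:
            if altruistic is not True:
                errors.append(f"donor {key}: 'sources' is required")
        elif (not isinstance(sources, list) or not sources
              or not all(_is_int(s) for s in sources)):
            errors.append(f"donor {key}: 'sources' must be a non-empty array of integers")
        elif altruistic is True and len(sources) != 1:
            errors.append(f"donor {key}: an altruistic donor may supply only one source")

        dage = donor.get("dage")
        if not (_is_int(dage) and 0 < dage < 150):
            errors.append(f"donor {key}: 'dage' must be an integer between 1 and 149")

        if "matches" in donor:
            matches = donor["matches"]
            if not isinstance(matches, list) or not matches:
                errors.append(f"donor {key}: 'matches' must be a non-empty array")
            else:
                for match in matches:
                    if not isinstance(match, dict) or not _is_int(match.get("recipient")):
                        errors.append(f"donor {key}: each match needs an integer 'recipient'")
                    elif not _is_number(match.get("score")):
                        errors.append(f"donor {key}: each match needs a numeric 'score'")
    return errors


good = ('{"data": {"6": {"altruistic": true, "dage": 29,'
        ' "matches": [{"recipient": 7, "score": 10}]}}}')
bad = '{"data": {"x": {"dage": 40}, "2": {"dage": 200, "matches": []}}}'

print(validate_json_input(good))  # []
for message in validate_json_input(bad):
    print(message)
```

Here `bad` triggers four errors: a non-integer donor id, a missing sources key, an out-of-range age and an empty matches array.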