Input Data Formats

Introduction
Tab Delimited Input
XML
JSON
Data Validations

Introduction

This section describes the input formats supported by the data parameter of the find and find_nhsbt API calls. Three input formats are currently supported: tab delimited, XML and JSON. The input format does not need to match the format of the output data, although using a single format throughout is recommended for consistency.

Tab Delimited Input

A tab delimited file has four columns separated by tabs (spaces are also accepted). This was the first format used to model the input and, as such, it supports only a subset of the features. In particular, donor ids cannot be specified in this format. If donor ids must be returned, use either JSON or XML.

An example of data modelled using tab delimited format is shown below:

We now give an explanation of this format. Consider line 1: a donor for patient 1 is compatible with patient 2, and the score between this donor and patient 2 is 3. The final column gives the donor age, so the donor for patient 1 on line 1 is aged 65. However, patient 1 must have two donors, because the donor represented by line 3 is aged 49. This causes confusion when a patient has multiple donors of the same age, since there is no way of identifying which donor was actually chosen. To handle this situation the JSON and XML formats allow a donor id to be specified, and as such both formats are preferred to tab delimited.
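
The column layout described above can be sketched in Python as follows. This is a minimal illustration rather than the service's own parser: the sample rows are invented for the example (they only mirror the ages mentioned in the text), and grouping rows by source patient and donor age is an assumption about how donors would have to be told apart without donor ids.

```python
from collections import defaultdict

# Illustrative rows only, not taken from the service:
# source patient, compatible patient, score, donor age.
SAMPLE = "1\t2\t3\t65\n2\t1\t2\t45\n1\t3\t1\t49\n"


def parse_tab_delimited(text):
    """Group rows into donors keyed by (source patient, donor age)."""
    donors = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()  # splits on tabs or spaces
        if len(fields) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 columns, got {len(fields)}"
            )
        source, recipient, score, age = fields
        # Without a donor id, (source, age) is the only available handle,
        # so two donors of the same age for one patient would be merged.
        donors[(int(source), int(age))].append((int(recipient), float(score)))
    return dict(donors)


donors = parse_tab_delimited(SAMPLE)
ages_for_patient_1 = sorted(age for (source, age) in donors if source == 1)
```

Here patient 1 ends up with two separate donors only because their ages differ; had both been aged 65, the rows would collapse into one donor, which is the ambiguity the donor id in the XML and JSON formats removes.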

XML

Using XML as an input format attaches meaning to the individual elements, unlike the tab delimited format, which relies on the reader being intimately aware of the column layout (which, given the confusing nature of the column meanings, can be difficult). As such, a structured format such as XML is preferred.

A sample input file is shown below using XML:

<?xml version="1.0" ?>
<data>
  <entry donor_id="1">
    <sources>
      <source>1</source>
    </sources>
    <dage>65</dage>
    <matches>
      <match>
        <recipient>2</recipient>
        <score>3</score>
      </match>
      <match>
        <recipient> 3 </recipient>
        <score>1</score>
      </match>
      <match>
        <recipient>4</recipient>
        <score>2</score>
      </match>
    </matches>
  </entry>

  <entry donor_id="2">
    <sources>
      <source>2</source>
    </sources>
    <dage>45</dage>
    <matches>
      <match>
        <recipient> 1</recipient>
        <score>2</score>
      </match>
      <match>
        <recipient> 5 </recipient>
        <score>1</score>
      </match>
    </matches>
  </entry>

  <entry donor_id="3">
    <sources>
      <source>3</source>
    </sources>
    <dage>25</dage>
    <matches>
      <match>
        <recipient>1</recipient>
        <score>1</score>
      </match>
    </matches>
  </entry>

  <entry donor_id="4">
    <sources>
      <source>4</source>
    </sources>
    <dage>55</dage>
    <matches>
      <match>
        <recipient> 3 </recipient>
        <score>2</score>
      </match>
      <match>
        <recipient> 2 </recipient>
        <score>3</score>
      </match>
      <match>
        <recipient> 5 </recipient>
        <score>4</score>
      </match>
    </matches>
  </entry>

  <entry donor_id="5">
    <sources>
      <source>5</source>
      <source>6</source>
    </sources>
    <dage>30</dage>
    <matches>
      <match>
        <recipient>4</recipient>
        <score>2</score>
      </match>
      <match>
        <recipient> 2 </recipient>
        <score>1</score>
      </match>
    </matches>
  </entry>

  <entry donor_id="6">
    <dage>29</dage>
    <altruistic>true</altruistic>
    <matches>
      <match>
        <recipient>7</recipient>
        <score>10</score>
      </match>
    </matches>
  </entry>

  <entry donor_id="7">
    <sources>
      <source>7</source>
    </sources>
    <dage>29</dage>
  </entry>

</data>

Looking at the data above, the <entry> tag holds information about the set of patients that a donor (identified by the donor_id attribute) is compatible with. The donor_id attribute of the <entry> element is required; this contrasts with the tab delimited format, where donor ids cannot be specified. Within the <entry> tag we see the following elements:

<sources> contains a set of <source> elements, where each <source> element holds the id of a patient whose donor is identified by the donor_id attribute. Unless the donor is altruistic, the <sources> element must be supplied; failure to supply at least one <source> element results in an error being returned. For an altruistic donor, supplying a source id is optional: if a source id is supplied, the <sources> element must consist of a single <source>; otherwise the <sources> element must not be present in the <entry>. For example, in the first entry above we can see that donor 1 is the willing donor for patient 1. Although unlikely, a given donor may be a willing donor for more than one patient on the list, which is why the format allows multiple <source> elements within the <sources> tag.
<dage> holds the age of the donor identified by the donor_id attribute.
<altruistic> holds the value true if the donor specified by donor_id is an altruistic donor. If the element is absent or holds false, the donor is non-altruistic.
<matches> contains a set of <match> elements that show the set of patients that are compatible with the donor identified by donor_id. A <match> element contains two sub-entries namely <recipient> and <score>. Here <recipient> holds the id of a potential recipient of donor donor_id's kidney and <score> holds the score of the potential transplant from donor_id to <recipient>. For example, in the first <entry> in the sample file above, the donor with id 1 is compatible with patients 2, 3 and 4, with scores 3, 1, 2 respectively. We note that it is possible to omit the <matches> element from an entry.
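
To make the element descriptions above concrete, the following sketch reads entries with Python's standard xml.etree.ElementTree module. It is only an illustration of how a client might inspect its own input before sending it; the function name read_entries and the returned dictionary layout are hypothetical, and the sample is a cut-down copy of donors 5 and 6 from the file above.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<?xml version="1.0" ?>
<data>
  <entry donor_id="5">
    <sources>
      <source>5</source>
      <source>6</source>
    </sources>
    <dage>30</dage>
    <matches>
      <match>
        <recipient>4</recipient>
        <score>2</score>
      </match>
      <match>
        <recipient> 2 </recipient>
        <score>1</score>
      </match>
    </matches>
  </entry>
  <entry donor_id="6">
    <dage>29</dage>
    <altruistic>true</altruistic>
    <matches>
      <match>
        <recipient>7</recipient>
        <score>10</score>
      </match>
    </matches>
  </entry>
</data>
"""


def read_entries(xml_text):
    """Return {donor_id: {...}} for every <entry> under <data>."""
    root = ET.fromstring(xml_text)
    donors = {}
    for entry in root.findall("entry"):
        flag = entry.findtext("altruistic", default="false")
        donors[int(entry.get("donor_id"))] = {
            # <sources> may be absent for an altruistic donor.
            "sources": [int(s.text) for s in entry.findall("sources/source")],
            "dage": int(entry.findtext("dage")),
            "altruistic": flag.strip().lower() == "true",
            # <matches> may be omitted; int() and float() tolerate the
            # surrounding whitespace seen in values such as " 2 ".
            "matches": [
                (int(m.findtext("recipient")), float(m.findtext("score")))
                for m in entry.findall("matches/match")
            ],
        }
    return donors


donors = read_entries(SAMPLE_XML)
```

Missing <sources> and <matches> elements simply yield empty lists here; whether the service accepts a particular combination is decided by its own validation, not by this sketch.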

JSON

One problem with formatting input using XML is its verbosity: for very large amounts of data, the repeated opening and closing of tags significantly increases the amount of traffic passed over the network. To save bytes but retain the structure of the data we can use JSON as our input format.

The same data shown in the XML example is presented below using JSON.

{ "data" :
  {
    "1" : { "sources" : [1],
            "dage" : 65,
            "matches" : [ { "recipient" : 2, "score" : 3},
                          { "recipient" : 3, "score" : 1},
                          { "recipient" : 4, "score" : 2} ]},
    "2" : { "sources" : [2],
            "dage" : 45,
            "matches" : [ { "recipient" : 1, "score" : 2},
                          { "recipient" : 5, "score" : 1} ]},

    "3" : { "sources" : [3],
            "dage" : 25,
            "matches" : [ { "recipient" : 1, "score" : 1},
                          { "recipient" : 5, "score" : 1} ]},

    "4" : { "sources" : [4],
            "dage" : 55,
            "matches" : [ { "recipient" : 2, "score" : 3},
                          { "recipient" : 3, "score" : 2},
                          { "recipient" : 5, "score" : 4} ]},

    "5" : { "sources" : [5,6],
            "dage" : 30,
            "matches" : [ { "recipient" : 2, "score" : 1},
                          { "recipient" : 4, "score" : 2} ]},
    
    "6" : { "altruistic": true,
            "dage" : 29,
            "matches" : [ { "recipient" : 7, "score" : 10} ]},

    "7" : { "sources" : [7],
            "dage": 29 }
  }
}

The JSON format consists of a key called data that holds a JSON object whose keys are the donor ids. That is, each entry in the data object has the format:

      "integer_value_representing_a_donor" :
            { "sources" : an_array_of_ids_representing_the_patients_
                          that_the_donor_is_willing_to_donate_for,

              "dage" : age_of_the_donor,

              "altruistic": true_if_the_donor_is_altruistic,

              "matches" : an_array_of_objects_where_each_object_
                          holds_information_about_compatible_patients }

Looking at the example above, the donor with id 1 is a donor for patient 1, is aged 65, and is compatible with patients 2, 3 and 4 with scores 3, 1 and 2 respectively. As with the XML input format, the sources key can hold more than one source and so is represented as an array; donor 5 shows a donor with multiple sources. Each non-altruistic donor must have at least one source, and every donor must supply a donor age in the dage field. Altruistic donors are identified by adding the altruistic key with the value true to a donor's JSON object. Donor 6 in the example above shows a typical entry for an altruistic donor; notice that an altruistic donor need not supply any sources (however, any source id supplied will be returned consistently in the output). To show the set of patients that are compatible with a donor we use the matches key. The matches key holds an array of JSON objects, where each object represents a compatible patient: the patient's id is stored in the recipient field, and the score between this donor and the patient is stored in the score field. We can see this in the example above by noting that donor 4 is compatible with three patients: patient 2 with score 3, patient 3 with score 2, and patient 5 with score 4. Finally, donor 7 shows the rare case of a donor with no matches key.
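
As a rough illustration of the structure just described, the sketch below builds a small payload with Python's standard json module and reads it back. The summarise helper and the labels "altruistic" and "paired" are hypothetical names chosen for this example; how the encoded string is actually submitted as the data parameter is not shown here.

```python
import json

# A cut-down version of the example above: donors 1, 6 and 7.
payload = {
    "data": {
        "1": {"sources": [1], "dage": 65,
              "matches": [{"recipient": 2, "score": 3},
                          {"recipient": 3, "score": 1},
                          {"recipient": 4, "score": 2}]},
        "6": {"altruistic": True, "dage": 29,
              "matches": [{"recipient": 7, "score": 10}]},
        "7": {"sources": [7], "dage": 29},
    }
}

# JSON object keys are always strings, so donor ids travel as "1", "6", ...
encoded = json.dumps(payload)


def summarise(encoded_json):
    """Return (donor id, kind, sources, recipients) for each donor."""
    data = json.loads(encoded_json)["data"]
    rows = []
    for donor_key, donor in data.items():
        kind = "altruistic" if donor.get("altruistic", False) else "paired"
        recipients = [m["recipient"] for m in donor.get("matches", [])]
        rows.append((int(donor_key), kind, donor.get("sources", []), recipients))
    return rows


rows = summarise(encoded)
```

Optional keys are read with dict.get so that an altruistic donor without sources (donor 6) and a donor without matches (donor 7) are both handled without special cases.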

Data Validations

Below we outline the validations that are performed on the input data supplied to the service. These validations are undertaken for both the JSON and XML input formats above. If the data validation fails for any reason, an error response is returned. Any checks that are specific to one data format are indicated in the text.

Checks that the syntax of any XML and JSON file is valid.
Checks that the input data is contained within a data key/tag.
For the JSON format, checks that the value stored in data is an object and that each of the object's keys, representing a donor id, can be converted to an integer.
Checks that each <entry> in XML has a donor_id attribute whose value is an integer.
Checks that the sources key/tag is supplied for a non-altruistic donor, and that it contains, in the case of JSON, an array of integer ids, and for XML a set of <source> elements whose values are integer ids.
Checks that the dage key/tag is supplied and contains an integer whose value is greater than zero but smaller than 150.
Checks that the altruistic value is a boolean.
Checks that a matches key/tag is present and consists of a single entry if the donor is altruistic.
Checks that if a matches key/tag is supplied then it is non-empty.
Checks that a <match> element in XML, and each object in the matches array in JSON, consists of a recipient and a score.
Checks that any recipient key/tag values are integers.
Checks that any score keys/tags are supplied and contain a double/float.
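
A client may want to run similar checks locally before calling the service. The sketch below approximates most of the checks listed above for the JSON format, operating on an already-parsed payload. It is an assumption-laden illustration, not the service's validator: the function names and error messages are invented, integer scores are accepted as numeric (as in the examples above), the single-source rule for altruistic donors is taken from the XML section, and the altruistic-specific matches check is left out.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate(payload):
    """Return a list of error strings; an empty list means no problems."""
    if not isinstance(payload, dict) or not isinstance(payload.get("data"), dict):
        return ["input must be an object stored under a 'data' key"]
    errors = []
    for key, donor in payload["data"].items():
        where = f"donor {key!r}"
        try:
            int(key)
        except (TypeError, ValueError):
            errors.append(f"{where}: donor id is not an integer")
        if not isinstance(donor, dict):
            errors.append(f"{where}: entry must be an object")
            continue
        altruistic = donor.get("altruistic", False)
        if not isinstance(altruistic, bool):
            errors.append(f"{where}: altruistic must be a boolean")
        sources = donor.get("sources")
        if sources is None:
            if altruistic is not True:
                errors.append(f"{where}: sources is required unless altruistic")
        elif not (isinstance(sources, list) and sources
                  and all(map(_is_int, sources))):
            errors.append(f"{where}: sources must be a non-empty array of integers")
        elif altruistic is True and len(sources) > 1:
            errors.append(f"{where}: an altruistic donor may have one source at most")
        dage = donor.get("dage")
        if not (_is_int(dage) and 0 < dage < 150):
            errors.append(f"{where}: dage must be an integer between 1 and 149")
        matches = donor.get("matches")
        if matches is None:
            continue  # the matches key may be omitted
        if not isinstance(matches, list) or not matches:
            errors.append(f"{where}: matches must be non-empty when supplied")
            continue
        for match in matches:
            if not isinstance(match, dict):
                errors.append(f"{where}: each match must be an object")
                continue
            if not _is_int(match.get("recipient")):
                errors.append(f"{where}: each match needs an integer recipient")
            if not _is_number(match.get("score")):
                errors.append(f"{where}: each match needs a numeric score")
    return errors


good = {"data": {"6": {"altruistic": True, "dage": 29,
                       "matches": [{"recipient": 7, "score": 10}]},
                 "7": {"sources": [7], "dage": 29}}}
bad = {"data": {"x": {"dage": 200, "matches": []}}}
good_errors = validate(good)
bad_errors = validate(bad)
```

Collecting every problem into a list, rather than stopping at the first, lets a caller fix all issues in one pass; the service itself may well report errors differently.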
