
Using Predictions

By Connecting for Health | 2012

Prediction Representation

Once created, an outcome prediction needs to be stored and used before it has any utility. The representation of the prediction is usually in a proprietary format and in an implicit context. For example, a prediction probability may be stored as a real number in a single column of a database table. The context of the prediction, for instance that it represents the probability of an emergency admission in the next 12 months and was produced by the CPM model, is not explicitly defined. It is implicit within the context of the users of this tool. However, if you want to share a prediction with other systems and services that are outside of your end-to-end solution, then the prediction context needs to be made explicit in the prediction representation.

Unfortunately there is no standards-based prediction representation in the way that PMML provides one for prediction model representation. The key benefit of such a standards-based prediction representation would be interoperability between different prediction tools, and between prediction tools and downstream business applications.

This article presents a set of candidate requirements which could be used to develop an outcome prediction representation standard within the context of existing health and social care technical standards such as HL7 V3.


Value

The actual value of the prediction.


Value Type

The type of the prediction value. For example a probability, percentage or risk score.

Value Range

The minimum and maximum values a prediction value can hold. For a probability the range is 0.0 to 1.0 and for a percentage it is 0.0 to 100.0. A risk score may have any arbitrary range, for example 1 to 15.
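
As a minimal sketch of how a consuming system might check a value against its range, the Python below uses the two fixed ranges given above and requires a declared range for risk scores. The names `ValueType`, `value_range` and `in_range` are illustrative assumptions and not part of any standard.

```python
from enum import Enum


class ValueType(Enum):
    PROBABILITY = "probability"
    PERCENTAGE = "percentage"
    RISK_SCORE = "risk_score"


# Fixed ranges for the types whose bounds are stated in the text.
FIXED_RANGES = {
    ValueType.PROBABILITY: (0.0, 1.0),
    ValueType.PERCENTAGE: (0.0, 100.0),
}


def value_range(value_type, declared=None):
    """Return (minimum, maximum) for a value type.

    A risk score has no inherent range, so one must be declared explicitly.
    """
    if value_type in FIXED_RANGES:
        return FIXED_RANGES[value_type]
    if declared is None:
        raise ValueError("a risk score needs an explicitly declared range")
    return declared


def in_range(value, value_type, declared=None):
    """Check that a value lies within the range for its type (bounds inclusive)."""
    minimum, maximum = value_range(value_type, declared)
    return minimum <= value <= maximum
```

Carrying the range alongside the value means that a receiving system does not have to guess whether, say, 12 is a percentage or a score on a 1 to 15 scale.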

Is Missing

An indication that the prediction value is missing.

Is Unknown

An indication that the prediction value is unknown.

Creation Date Time

The date and time of when the prediction was created.

Is Relative

An indication that the prediction is relative rather than absolute.

Relative Reference

If the prediction is relative, a reference to a description of what it is relative to. This may have to be just a textual description although a set of basic codes could be devised around both geographical and health/social service boundaries. For example:

  • Geog.Country.England
  • Geog.Country.Scotland
  • NHS.CCG.<CCG id>
  • NHS.GP.<GP practice id>
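
If codes of this shape were adopted, a receiving system could split them into their parts. The sketch below assumes the three-part dotted layout used in the examples above; the function name `parse_reference` and the strictness of the validation are assumptions made for illustration.

```python
def parse_reference(code):
    """Split a dotted reference code into (domain, level, identifier).

    Only the first two dots are treated as separators, so an identifier
    that itself contains a dot is kept whole.
    """
    parts = code.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed reference code: {code!r}")
    domain, level, identifier = parts
    return domain, level, identifier


# Example using one of the codes listed above.
print(parse_reference("Geog.Country.England"))
```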


Event

A reference to a description of the event being predicted. This may have to be just a textual description, although a set of basic codes could be devised. If the events relate to, for example, disease diagnoses or provider activities (such as emergency admission), then existing code sets could be used.

Event Range

If applicable, the start and end dates and times for the event being predicted. For example, if the event was “emergency admission in the next 12 months”, the start date and time would be the Creation Date Time, and the end date and time would be Creation Date Time + 1 year.
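
The “Creation Date Time + 1 year” calculation can be sketched as follows. This is an illustration only: the function names are assumptions, and mapping 29 February onto 28 February in a non-leap year is one possible convention that a real standard would need to fix explicitly.

```python
from datetime import datetime


def add_years(moment, years):
    """Return the same date and time a whole number of years later."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return moment.replace(year=moment.year + years, day=28)


def event_range(creation, years=1):
    """Return (start, end) for an event window that opens at creation time."""
    return creation, add_years(creation, years)


start, end = event_range(datetime(2012, 3, 1, 9, 30))
print(start.isoformat(), end.isoformat())
```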


Model

A reference to a description of the prediction model used to make the prediction. This may have to be just a textual description. However, if a prediction model uses a standards-based representation such as PMML, and this is placed in a freely accessible repository, then a true reference in the form of a URN or URL could be used.


Expiration Date Time

If the prediction should only be used up to a certain date (a use-by date), this defines the expiration date and time.


Creator

A description of whom or what has created the prediction. For a prediction tool this will indicate vendor, product name and product version.


Derivation Reference

The same prediction outcome may be recorded as different types. For example, it may be recorded as a probability, which is then turned into a national risk score and also into a CCG risk score.

Where the same prediction is represented as different types within the same structure, the derivation reference identifies (if relevant) the prediction it is derived from.

Where relative and/or risk scores are used it becomes problematic how to compare predictions across domains. Is a relative probability (compared to the mean population probability) of 0.3 for CCG X worse or better than a relative probability (compared to the median population probability) of 0.2 for CCG Y? By being able to represent the same prediction as multiple types, with the base type being an absolute probability, and linking how these values are derived from each other, the prediction can be used for a greater variety of purposes.

Note that neither the identity of the individual the prediction applies to (if known) nor any security or IG constraints are included in the candidate requirements, as the prediction representation will be embedded within a wider structure which will define these.
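
To make the candidate requirements more concrete, the following sketch gathers them into a single Python structure. It is not a proposed standard: the field names, the string encoding of the value type, the vendor string and the risk score of 12 are all assumptions chosen for illustration. The `derived_from` link shows how a relative risk score can point back to the absolute probability it was calculated from.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Prediction:
    value: Optional[float]
    value_type: str                 # e.g. "probability", "percentage", "risk_score"
    value_min: float
    value_max: float
    created: datetime
    event: str                      # description of, or code for, the predicted event
    model: str                      # description of, or URN/URL for, the model
    creator: str                    # vendor, product name and product version
    is_missing: bool = False
    is_unknown: bool = False
    is_relative: bool = False
    relative_reference: Optional[str] = None
    event_start: Optional[datetime] = None
    event_end: Optional[datetime] = None
    expires: Optional[datetime] = None
    derived_from: Optional["Prediction"] = None

    def base(self):
        """Follow derivation links back to the prediction with no parent."""
        node = self
        while node.derived_from is not None:
            node = node.derived_from
        return node


created = datetime(2012, 6, 1, 12, 0)
event = "emergency admission in the next 12 months"

# Base type: an absolute probability.
absolute = Prediction(0.30, "probability", 0.0, 1.0, created, event,
                      "CPM", "ExampleVendor ExampleTool 1.0")

# A CCG-relative risk score derived from the absolute probability.
ccg_score = Prediction(12, "risk_score", 1, 15, created, event,
                       "CPM", "ExampleVendor ExampleTool 1.0",
                       is_relative=True,
                       relative_reference="NHS.CCG.<CCG id>",
                       derived_from=absolute)

print(ccg_score.base().value_type)
```

Keeping the absolute probability as the root of every derivation chain gives consumers a common value to compare across domains, whatever relative forms are also supplied.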

Prediction Patient Record Status

One application of creating predictions is to inform the decisions made by health and social care professionals to initiate interventions with individuals (case finding). As such it is recommended that outcome predictions are regarded with equal status to other information, such as medical diagnoses, examination results and medical opinions, which is included in a patient’s medical record. Therefore an outcome prediction (when identifiable with a patient) should be placed in a patient’s medical record.

Where a patient’s medical record is an EMR, this has the additional benefit that the prediction can be used in a more flexible and powerful way when the EMR is being used for group-based analysis. Many EMRs have the capability to analyse and group patients into categories based on, for example, diagnosis-related groups or treatments. Medical professionals can then assess the medical information of individual patients within each group to help inform their subsequent management and decide on service delivery priorities and load. With the predictions held as an intrinsic part of the EMR, they also become available for assessment irrespective of what grouping criteria are used.


Storage and Management

The storage and management of generated predictions, although a downstream process, is one that all prediction tools require.

As for model building and data marshalling, it is recommended that an RDBMS is used for prediction outcome storage and management.

As the prediction outcome RDBMS acts as a distribution hub to potentially many different downstream business applications, it needs to have adequate levels of availability, reliability and performance to satisfy their demands. Therefore server computer(s) are more appropriate than desktop computers. Consideration should be given to security, backup/recovery and RDBMS product features.

Where you already have either a central corporate RDBMS or a central Data Warehouse, it is recommended that you use it for model building, data marshalling and/or as the outcome repository.
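
A minimal sketch of an outcome store is shown below. It uses SQLite purely so that the example is self-contained; a production hub would use a server RDBMS as recommended above. The table and column names, the use of ISO 8601 text for timestamps and the `patient_ref` column (standing in for the wider structure that identifies the individual) are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE prediction (
    prediction_id INTEGER PRIMARY KEY,
    patient_ref   TEXT NOT NULL,
    value         REAL,
    value_type    TEXT NOT NULL,
    event         TEXT NOT NULL,
    model         TEXT NOT NULL,
    created       TEXT NOT NULL,   -- ISO 8601, so text order is time order
    expires       TEXT
);
CREATE INDEX prediction_by_patient ON prediction (patient_ref, created);
"""


def open_store(path=":memory:"):
    """Open a store at the given path and create the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def latest_predictions(conn, event, as_of):
    """Return each patient's most recent unexpired prediction for an event."""
    return conn.execute(
        """
        SELECT p.patient_ref, p.value, p.created
        FROM prediction AS p
        WHERE p.event = ?
          AND (p.expires IS NULL OR p.expires > ?)
          AND p.created = (
              SELECT MAX(q.created)
              FROM prediction AS q
              WHERE q.patient_ref = p.patient_ref
                AND q.event = p.event
                AND (q.expires IS NULL OR q.expires > ?)
          )
        ORDER BY p.patient_ref
        """,
        (event, as_of, as_of),
    ).fetchall()
```

Downstream applications would then query the hub for the current prediction per patient rather than holding their own copies of the full history.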

Downstream Integration

Downstream business applications use the generated prediction values for different purposes. In Long Term Conditions the two most common applications are Case Finding and Risk Stratification.

Case finding is normally used by primary care providers to identify and track patients who have a predicted high risk for an outcome such as emergency admission in the next 12 months. Primary care providers can either:

  • Use a common centralised shared case finding system
  • Use a dedicated centralised case finding system – one case finding system instance per provider
  • Use a dedicated local case finding system

The centralised shared approach has the technical advantages that the integration between the outcome prediction storage and the business application is simplified as they are co-located and economies of scale can be achieved. However strict access control and security must be implemented so that data is logically partitioned between different providers.

The centralised dedicated approach also has the integration advantages and now physically partitions data between different providers. However economies of scale are reduced as multiple individual instances of a case finding system must now be supported either on multiple physical or virtualised infrastructures. Strict access control and security must still be implemented.

Both centralised approaches are dependent on reliable network connections between provider clients and the centralised applications. As most case finding is not time critical this should not be a major constraint.

The localised approach has the technical advantages that access control and security are implemented locally and therefore can use existing provider security management systems. However outcome prediction data must now be transferred from central storage to local provider systems which in terms of network utilisation and import/export effort can be onerous.

A case finding system should provide the following functionality:

  • Allow definition of multiple groups based on configurable prediction outcome ranges
  • Classify individual patients within a provider population into one of the groups based on their predictive value
  • For each group, calculate basic statistics: number of patients classified within the group, mean predictive value and standard deviation of the predictive value
  • Textually and graphically display the statistics for each group
  • List all patients within a group
  • Textually and graphically display historical group data – highlighting statistical trends
  • Textually and graphically display historical patient data – highlighting individual climbers and fallers
  • Drill down from any group function to an individual patient – ideally this should allow linkage into the provider’s EMR
  • Textually and graphically display historical data for an individual patient
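
The grouping, statistics and climbers/fallers functions in the list above can be sketched in a few lines. This is an illustration under stated assumptions: groups are defined by a lower threshold only, the population standard deviation is used, and the function names and the change threshold are hypothetical.

```python
from statistics import mean, pstdev


def classify(value, groups):
    """Return the first group whose lower threshold the value reaches.

    `groups` is a list of (name, lower_threshold), highest threshold first.
    """
    for name, lower in groups:
        if value >= lower:
            return name
    return None


def group_statistics(predictions, groups):
    """Summarise a {patient: predictive value} mapping per group."""
    members = {name: [] for name, _ in groups}
    for patient, value in predictions.items():
        name = classify(value, groups)
        if name is not None:
            members[name].append((patient, value))
    stats = {}
    for name, rows in members.items():
        values = [value for _, value in rows]
        stats[name] = {
            "patients": [patient for patient, _ in rows],
            "count": len(values),
            "mean": mean(values) if values else None,
            "std_dev": pstdev(values) if values else None,
        }
    return stats


def movers(previous, current, threshold):
    """Split patients into climbers and fallers by change in predictive value."""
    climbers, fallers = [], []
    for patient in previous.keys() & current.keys():
        change = current[patient] - previous[patient]
        if change >= threshold:
            climbers.append((patient, change))
        elif change <= -threshold:
            fallers.append((patient, change))
    climbers.sort(key=lambda row: row[1], reverse=True)
    fallers.sort(key=lambda row: row[1])
    return climbers, fallers
```

Because the thresholds are passed in as data, a provider can redefine its groups without any change to the classification logic, which is the point of the configurable ranges in the first requirement.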

Where urgent care clinical dashboards are being implemented, it is recommended that you consider linking your case finding system into them.

Technical approaches to risk stratification are similar to case finding. The major difference is that patients are not normally identifiable within risk stratification; it is the statistical properties of the groups or strata that are of interest.

In both case finding and risk stratification it is recommended that measures to record both the usage and utility of the systems are implemented. Usage is normally relatively easy to measure as it relates to activities such as the number of user logons per week. Utility is more difficult to measure, but is crucial to evaluating the benefits any system is delivering. ITIL V3 provides a useful definition of utility and distinguishes it from the concept of “warranty”:

Utility – fitness for purpose. Functionality offered by a product or service to meet a particular need. Utility is often summarized as “what it does”.
Warranty – fitness for use. A promise or guarantee that a product or service will meet its agreed requirements. The availability, capacity, continuity and information security necessary to meet the customer’s requirements.

Both soft measures of utility (asking users what they use within a product) and hard measures of utility (analysing product logs to see which product functions and features are actually used) can be applied.
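
As a sketch of a hard measure, the following counts how often each product function appears in a log. The log layout (one `timestamp,user,feature` record per line) and the feature names are hypothetical; a real product's log format would need its own parser.

```python
from collections import Counter


def feature_usage(log_lines):
    """Count occurrences of each feature in `timestamp,user,feature` lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.strip().split(",")
        if len(fields) == 3:  # skip lines that do not match the assumed layout
            _timestamp, _user, feature = fields
            counts[feature] += 1
    return counts


sample_log = [
    "2012-06-01T09:00:00,user1,list_group",
    "2012-06-01T09:05:00,user2,patient_history",
    "2012-06-02T10:00:00,user1,list_group",
]
print(feature_usage(sample_log).most_common())
```

Functions that never appear in such counts are candidates for the soft-measure follow-up: asking users whether the function is unneeded or simply hard to find.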