
Can Big Data Rescue Clinical Trials?

If you had to pick one buzzword that perfectly encapsulates the latest and greatest in innovation, you could do a lot worse than “big data”. The last few years have seen big data and data mining introduced into nearly every field imaginable, often with disruptive results. Now, that same set of tools is being leveraged to improve drug development, with huge potential implications for identifying and correcting issues that arise during clinical trials. While there are no set standards or best practices around marshalling big data to identify and correct failing trials, we wanted to throw out a few ideas and suggestions that could go a long way towards making clinical trials succeed.

We see big data making the biggest impact in the areas of patient recruitment, process monitoring, and safety and data handling. In this post, we will give a quick overview of what big data is, explain how we look at and define program failures and faltering trials, and do a quick fly-by of the topics we will be covering in the future. In future blog posts, we will explore each of these in depth to identify where and how the promises of big data can best increase the efficiency and success of your clinical trials.

What is Big Data?

Before we consider what big data can do for the pharmaceutical industry, it’s important to understand what big data is and is not. Big data has volume and does not sample – it takes all observations and tracks every recorded data point with no bias. Big data has velocity – it is available quickly, often in real time. It has variety – the data sources are varied and include traditional data sets (numerical records on set scales) and non-traditional ones (text, images, unordered numerical sets). It utilizes machine learning – patterns and insights are discovered organically rather than through formal and structured queries. Finally, it is a “digital footprint” – the data is often a byproduct of standard activities and interactions, and not just collected as a primary activity.

Clinical trials are a prime candidate for big data practices and procedures. Trials must collect and save all data generated without relying on sampling, so the data has inherent volume. Thanks to the use of EDCs, eCTMSs, and other data collection systems, the data is constant and available almost as soon as it’s collected, giving it velocity. The variety of data collected, certainly across multiple trials but even within a single protocol, is staggering. Finally, these studies leave huge digital health footprints – a significant portion of patient data is not directly tied to a specific protocol, but exists as metadata ranging from information about site activities to patient vitals to secondary and tertiary data points about patient health and habits. All that’s missing is a machine learning approach to making sense of it all.

What is a Faltering Trial?

Now that we’ve identified “big data” and how it relates to the clinical trial industry, it’s important to talk about what, exactly, a failing trial is. Even cursory research shows wide discrepancies in what is and is not considered trial failure. A simple Google search can find statistics showing that anywhere from 20% to 50% of clinical trials fail, each correct within its own explicit definition of “failed trial” and with few actionable distinctions between them.

First, it’s important to note that a failed trial and a failed program are two completely different things. A failed program is the result of poor science, a mistake in the R&D stage of development, or sometimes financial constraints. A failed program fails because the therapeutic being developed either does not work (or does not work well enough), because the therapeutic fails safety requirements, or because it’s simply not commercially viable. A failed program will always, eventually, result in failed trials.

A failed trial, when it’s not the result of a failed or failing program, is always the result of a mistake in planning or project management. A failed trial, as we define it, is a discrete trial that does not produce usable results because of early cancellation or data quality issues. A failed trial does not always result in a failed program, though enough of them in a row can financially sink smaller organizations.

A failing trial, which is our focus here, is one that has missed key milestones or overshot key thresholds of time (missing enrollment deadlines, not locking the database on schedule, etc.), cost (significant budget overruns), or risk/quality (the data being collected is questionable). In our own independent research, we estimate that roughly a third of all active trials are currently failing to one extent or another, though there is some variance across venues.
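To make those three dimensions concrete, here is a minimal sketch of a rule-based check, assuming a hypothetical `TrialStatus` snapshot. The field names, tolerances, and the query-rate proxy for data quality are all illustrative assumptions, not an industry standard:

```python
from dataclasses import dataclass

@dataclass
class TrialStatus:
    """Hypothetical operational snapshot of an active trial (illustrative fields)."""
    enrolled: int            # subjects enrolled to date
    enrollment_target: int   # planned total enrollment
    days_elapsed: int        # days since trial start
    days_planned: int        # planned trial duration in days
    spend: float             # spend to date (any currency unit)
    budget: float            # total planned budget
    query_rate: float        # open data queries per 100 CRF pages (quality proxy)

def failing_dimensions(t: TrialStatus,
                       cost_overrun_tol: float = 0.10,
                       query_rate_limit: float = 5.0) -> list:
    """Return which of the three dimensions (time, cost, quality) are off track."""
    flags = []
    # Time: enrollment should at least keep pace with the elapsed timeline.
    expected = t.enrollment_target * t.days_elapsed / t.days_planned
    if t.enrolled < expected:
        flags.append("time")
    # Cost: spend beyond a tolerance over the pro-rated budget.
    if t.spend > t.budget * (t.days_elapsed / t.days_planned) * (1 + cost_overrun_tol):
        flags.append("cost")
    # Quality: a high open-query rate suggests questionable data.
    if t.query_rate > query_rate_limit:
        flags.append("quality")
    return flags
```

For example, a trial halfway through its timeline with only 40% of its enrollment target met would be flagged on the time dimension. A big data approach would learn these thresholds from historical trials rather than hard-coding them, but the flagged dimensions are the same.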

Data-Driven Recruitment and Site Selection

One of the most immediate needs for big data solutions is in patient recruiting, selection, and inclusion/exclusion. Failure to meet recruiting deadlines is one of the most common reasons trials fall behind, and can lead to costly and time-consuming fixes. How can we apply the principles and techniques of big data towards optimizing recruiting strategies? Moreover, how can we use big data ideas to ensure that underperforming recruiting doesn’t tank your clinical trial?

Dynamic Process Monitoring and Improvement

Processes are only as good as the assumptions they are based on, and failure of process is usually only discovered when the trial starts failing. A big data approach to clinical trial processes and infrastructure monitoring reads in data from a variety of project sources and can find red flags long before a conventional analysis would, and perhaps before a conventional analysis is even possible. It can also draw attention to things that are working better than they should be, allowing proactive clinical operations personnel to improve efficiency across the board.
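One simple way to sketch this idea is a rolling-baseline check over any operational metric stream: each new observation is compared against the trailing window, and sharp deviations in either direction are flagged. The window size, threshold, and example metrics are assumptions for illustration; a production system would ingest many streams and learn its baselines:

```python
import statistics

def red_flags(series, window=8, z_threshold=3.0):
    """Flag points that deviate sharply from a rolling baseline.

    `series` is any operational metric sampled over time (e.g. weekly
    query counts per site, visit-to-data-entry lag, screen-failure rate).
    Returns the indices where the value sits more than `z_threshold`
    standard deviations from the trailing window's mean.
    """
    flags = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu = statistics.mean(baseline)
        sigma = statistics.stdev(baseline)
        # Flag large deviations in either direction: problems AND
        # things working unexpectedly well both merit attention.
        if sigma > 0 and abs(series[i] - mu) > z_threshold * sigma:
            flags.append(i)
    return flags
```

Because the check is two-sided, it surfaces both trouble spots and processes performing better than expected, which is exactly the proactive signal described above.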

Safety and Data

Risk-based monitoring might be all the rage in the clinical trial data and safety space, but it’s hardly the be-all and end-all. Identifying potential pitfalls ahead of time and setting up monitoring thresholds and contingency plans for them is great, but what if the trouble comes from an unexpected angle? A big data approach allows for a more dynamic and proactive way to observe safety and data quality in an ongoing trial, and can find big problems that your CRO didn’t even consider.


A failing trial is not the same as a failed trial. If caught in time, failing trials can be rescued, and some CROs (Biorasi included) make an explicit point of specializing in rescuing trials that are missing milestones. However, recovering from failing is still an expensive proposition, and a failing trial can be so far gone that rescue is not even an option. Conventional tools are getting better at planning and executing studies, but a 33% failing rate means they are simply not enough. Big data, or at least big data approaches, will have a “big” role to play in decreasing that number substantially.

At Biorasi, we firmly believe that big data CAN prevent clinical trials from failing, and that it can do so earlier and more cheaply than a traditional intervention. Join us for the next installments in this series as we take a deep dive into how you can begin implementing these concepts immediately, as well as how you can prepare for a future where data science isn’t just a division inside a CRO – it’s the principal driver of trial success. And if you have any comments you’d like to share, or questions you’d like us to answer in the next few posts, please let us know!