
Data Science for pharma R&D

Accelerated insights generation and real-world analytics
Our work ranges from accelerated insights generation and real-world analytics to tailoring patient treatment as part of precision medicine.
We have a unique academic collaboration that allows us to build the next generation of high-quality, reliable analytical platforms.
We apply our 10 years of experience in clinical statistics and programming to finding the best clinical data solutions for our clients.

We rely on graphs as a fundamental approach to structuring and analyzing clinical data. When combined with modern machine learning techniques to visualize complex relationships in clinical datasets, graphs can be an effective solution for accelerated insights generation and real-world analytics.

Topology-based Data Mining

Topology-based clinical data mining (TCDM) is an application of data mining techniques used to reveal hidden patterns in clinical datasets without first having to make assumptions and develop a hypothesis.

By discovering subgroups of related patients, this geometric data-driven approach provides robust solutions to the most challenging issues the industry is facing – from accelerated insights generation and real-world analytics to tailoring patient treatment as part of precision medicine.


A group of related mathematical methods collectively known as topological data analysis (TDA) has recently been applied in different branches of bioinformatics, epidemiology, neuroscience, and oncology with promising results. TDA is a rapidly expanding field that is being actively developed by leading academic centers such as Stanford, Duke, UPenn, and Princeton, among many others.

TDA methods are based on the underlying idea of using topology – the mathematical study of qualitative properties of space and spatial relations – to detect and display hidden robust relationships in complex datasets.
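
As a rough illustration of how topology can surface robust structure, the sketch below computes zero-dimensional persistence, one of the simplest TDA summaries: it tracks how connected components of a point cloud merge as the distance scale grows. The function name `h0_persistence` and the toy points are assumptions made for this example, not part of any Intego product.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the scales at which connected components merge (0-dim persistence).

    Every point starts as its own component at scale 0. Edges are added in
    order of increasing length; when an edge joins two components, one of
    them "dies" at that length. The last surviving component never dies and
    is omitted from the result.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)
    return deaths


# Hypothetical toy data: two tight groups of subjects in a 2-feature space.
toy_points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
deaths = h0_persistence(toy_points)
print(deaths)
```

In this toy case, four merges happen at a small scale and the final merge happens at a much larger one. A gap of that kind is the sort of signal that suggests two robust groups rather than noise.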

Clinical Data Science Platform

By discovering hidden subgroups of subjects in a clinical dataset, our HIPAA-compliant, web-based data science application provides robust solutions for enhanced exploratory analysis, new hypothesis generation, risk-based monitoring, and many other challenges faced by the drug development industry.

Our Team of Data Scientists

Intego data scientists employ modern machine learning algorithms, advanced statistical methods, and data visualization techniques to solve the most challenging issues faced by the pharmaceutical industry. Our expertise ranges from accelerated insights generation and real-world analytics to building models that tailor patient treatment as part of precision medicine.

The combination of pharmaceutical industry expertise and experience in knowledge extraction methodologies is what separates Intego data scientists from other biometric CROs. We help our customers find data-related solutions to problems at each stage of drug development.

Intego data scientists have earned a strong reputation for the solutions they have provided in real-world analytics, exploratory analysis, data visualization, data mining, biomarker discovery, risk-based monitoring, and many other areas. They employ a wide range of technologies, including Python, R, SAS, Spotfire, R Shiny, MATLAB, and C++.

Case study

Winning the PHUSE/FDA Data Science Innovation Challenge

We’re excited to share that Intego Clinical’s team of data scientists won the 2022 PHUSE/FDA Data Science Innovation Challenge award for their research on modern machine learning algorithms and data visualization techniques!

The proposed research takes a novel approach that uses the purely geometric properties of datasets to reveal hidden patterns, identifying signals and similarities across multiple datasets with graph-based machine learning algorithms.

Case study

Understanding the spread of COVID-19 in the US

Real-world data can be essential for our understanding of clinical data, especially with the emergence of phenomena such as the COVID-19 outbreak. Leveraging topology and machine learning for time series, we analyzed how the spread of the pandemic advanced across the United States.

We identified the underlying geometry of the datasets and discovered a set of unrelated features that may be driving the similarity in the spread of COVID-19 across more than 3,000 counties.

Case study

Community search using graph-based machine learning

The expected outcome of an exploratory analysis is the identification of a sub-population of patients most responsive to the treatment under study. We applied machine learning algorithms to a clinical study with 1,041 participants from the Childhood Asthma Management Program.

Automatic detection of sub-populations using a graph-based community search revealed several patient communities sharing similarities in terms of the pre-defined outcomes.
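
To give a rough sense of what a graph-based community search involves, the sketch below links subjects whose outcome vectors are close and then groups them into communities. It uses connected components as the simplest possible stand-in for a community detection algorithm; the method actually used in the study is not described here. The function names, the `radius` parameter, and the outcome values are all assumptions invented for illustration.

```python
import math
from collections import deque


def similarity_graph(rows, radius):
    """Connect two subjects when their feature vectors lie within `radius`."""
    graph = {i: set() for i in range(len(rows))}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if math.dist(rows[i], rows[j]) <= radius:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def communities(graph):
    """Group nodes into connected components using breadth-first search."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbor in sorted(graph[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(group))
    return groups


# Hypothetical outcome vectors (two outcomes per subject), not real study data.
outcomes = [
    (0.1, 0.2), (0.2, 0.1), (0.15, 0.25),  # one tight group
    (0.9, 0.8), (0.85, 0.95),              # a second group
    (0.5, 5.0),                            # an isolated subject
]
found = communities(similarity_graph(outcomes, radius=0.3))
print(found)
```

On real data, the choice of distance, the linking threshold, and the community algorithm all shape the result, so a sketch like this is only a starting point for exploration.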

Case study

Handling missing data using Topological Data Analysis

Missing data is a major problem in clinical research. Topological Data Analysis (TDA) zooms in on robust patterns in the data and is largely insensitive to random noise. As a result, topological data maps do not change significantly when some data are missing at random.

We successfully applied TDA to a clinical dataset with 302 patients (~20% with missing values) to solve several problems, including the automatic selection of a suitable imputation method and the validation of imputation methods.

Case study

Cluster analysis vs. graph-based machine learning

Clustering methods divide data into smaller subgroups and thereby unveil patterns hidden in the data. Graph-based machine learning helps researchers visualize hidden patterns by reducing the number of dimensions required to describe the original dataset and revealing robust geometric structures within the data.

We ran algorithms from both categories on a clinical study dataset (839 subjects) to discover patterns, compare the results, identify the pros and cons of each category, and evaluate how the two categories can complement each other.

Case study

Risk-based monitoring of clinical study sites

To manage risk-based monitoring effectively, we need a strategic approach to allocating resources across a clinical study based on several key indicators, such as data criticality, patient safety, and protocol compliance. We applied topology-based data mining to a multicenter, double-blind, placebo-controlled clinical study managed across 11 sites.

As a result, two sites were flagged for further investigation of a possible violation of the participant selection procedure or of the data collection protocol.

Case study

Exploratory analysis of clinical study dataset

Atlas TDA was applied to the dataset from a clinical study that was conducted with the support of the NIDA Clinical Trials Network. A topological data map was generated from the dataset for 115 patients with 27 outcomes and 44 predictors.

Visual inspection of the graph identified three subgroups of patients represented as “communities of nodes” and suggested that there were robust patterns within the data. Further statistical analysis of predictors helped identify the reasons for these patients being combined into these three communities.

Ready to see what we can do for you?
Contact Us