We rely on graphs as a fundamental approach to structuring and analyzing clinical data. When combined with modern machine learning techniques to visualize complex relationships in clinical datasets, graphs can be an effective solution for accelerated insights generation and real-world analytics.
Topology-based clinical data mining (TCDM) is an application of data mining techniques used to reveal hidden patterns in clinical datasets without first having to make assumptions and develop a hypothesis.
By discovering subgroups of related patients, this geometric data-driven approach provides robust solutions to the most challenging issues the industry is facing – from accelerated insights generation and real-world analytics to tailoring patient treatment as part of precision medicine.
A group of related mathematical methods collectively known as topological data analysis (TDA) has recently been applied in different branches of bioinformatics, epidemiology, neuroscience, and oncology with promising results. TDA is a rapidly expanding field under active development at leading academic centers, including Stanford, Duke, UPenn, and Princeton, among many others.
TDA methods are based on the underlying idea of using topology – the mathematical study of qualitative properties of space and spatial relations – to detect and display hidden robust relationships in complex datasets.
By discovering hidden subgroups of subjects in the clinical dataset, our HIPAA-compliant Data Science Web-based application provides robust solutions for enhanced exploratory analysis, new hypothesis generation, risk-based monitoring, and many other challenges faced by the drug development industry.
Intego data scientists employ modern machine learning algorithms, advanced statistical methods, and data visualization techniques to solve the most challenging issues faced by the pharmaceutical industry. Our expertise ranges from accelerated insights generation and real-world analytics to building the models for tailoring the treatment of patients as part of precision medicine.
It is the combination of pharmaceutical industry expertise and experience in knowledge extraction methodologies that separates Intego data scientists from other biometric CROs. We help our customers find data-related solutions to problems at each stage of drug development.
Intego data scientists have earned a strong reputation for the solutions they have provided in real-world analytics, exploratory analysis, data visualization, data mining, biomarker discovery, risk-based monitoring, and many other areas. They employ a wide range of technologies, including Python, R, SAS, Spotfire, R Shiny, MATLAB, and C++.
Real-world data can be essential for our understanding of clinical data, especially with the emergence of phenomena such as the COVID-19 outbreak. Leveraging topology and machine learning for time series, we analyzed how the spread of the pandemic advanced across the United States.
We identified the underlying geometry of the datasets and discovered a set of unrelated features that could possibly be causing the similarity in the spread of COVID-19 across 3000+ counties.
The expected outcome of the exploratory analysis is the identification of a sub-population of patients most responsive to the treatment under study. We applied machine-learning algorithms to a clinical study with 1,041 participants from the Childhood Asthma Management Program.
Automatic detection of sub-populations using a graph-based community search revealed several patient communities sharing similarities in terms of the pre-defined outcomes.
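To give a sense of what a graph-based community search involves, here is a minimal sketch. The source does not name the algorithm used in the study, so this example swaps in two generic stand-ins: a k-nearest-neighbor graph over patient outcome vectors and a deterministic label propagation pass. The function names, the value of `k`, and the toy rows are all hypothetical and are not data from the Childhood Asthma Management Program.

```python
import math
from collections import Counter


def knn_graph(rows, k):
    """Undirected k-nearest-neighbor graph: node i is linked to its k closest rows."""
    n = len(rows)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(rows[i], rows[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def label_propagation(adj, max_sweeps=100):
    """Each node repeatedly adopts the most common label among its neighbors.

    Nodes are visited in ascending order and ties go to the smallest label,
    so the result is deterministic. Returns a sorted list of communities.
    """
    labels = {node: node for node in sorted(adj)}
    for _ in range(max_sweeps):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, []).append(node)
    return sorted(communities.values())


# Hypothetical toy data: eight "patients", two outcome measures each.
toy_rows = [(1.0, 0.9), (1.1, 1.0), (0.9, 1.1), (1.0, 1.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2)]

if __name__ == "__main__":
    print(label_propagation(knn_graph(toy_rows, k=3)))
```

On real clinical data the choice of distance, the value of `k`, and the community algorithm itself all affect which groups emerge, so any communities found this way are candidates for follow-up statistical analysis rather than conclusions.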
Missing data is a major problem in clinical research. Topological Data Analysis (TDA) focuses on robust patterns in the data and is not affected by random noise. Therefore, topological data maps do not change significantly if some data are missing at random.
We successfully applied TDA to a clinical dataset with 302 patients (~20% with missing values) to solve several problems, including the automatic selection of a suitable imputation method, validation of imputation methods, etc.
For illustrative purposes, a simple two-dimensional dataset was constructed where data points were arranged in a “zero-like” shape. The proprietary algorithm was applied to build a graph in which every node corresponds to a single data point.
To show the robustness of the topological approach, some data points from the original dataset were intentionally omitted at random, and additional graphs were then built. The graphs showed geometrical stability and kept forming “zero-like” shapes even when up to 90% of the data was missing.
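A rough version of this experiment can be reproduced with standard tools. The sketch below is not the proprietary algorithm. It assumes an ellipse as the “zero-like” shape, a fixed-radius neighborhood graph with one node per data point, and a count of independent loops (the first Betti number over GF(2)) in the clique complex built on that graph. All function names and parameter values are illustrative.

```python
import math
import random
from itertools import combinations


def zero_shape(n=120, a=1.0, b=2.0):
    """Points evenly spaced in angle on an ellipse (assumed 'zero-like' shape)."""
    return [(a * math.cos(2 * math.pi * i / n), b * math.sin(2 * math.pi * i / n))
            for i in range(n)]


def drop_at_random(points, fraction, seed=0):
    """Keep a random subset, omitting roughly `fraction` of the points."""
    keep = round(len(points) * (1 - fraction))
    return random.Random(seed).sample(points, keep)


def neighborhood_graph(points, eps):
    """One node per data point; an edge joins two points at most eps apart."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= eps]


def count_components(n, edges):
    """Number of connected components, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


def gf2_rank(rows):
    """Rank over GF(2) of rows encoded as integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]
    return len(pivots)


def components_and_loops(points, eps):
    """Return (connected components, independent loops) of the clique complex."""
    n = len(points)
    edges = neighborhood_graph(points, eps)
    edge_id = {edge: k for k, edge in enumerate(edges)}
    # Each triangle is stored as the bitmask of its three boundary edges.
    triangles = []
    for i, j in edges:
        for k in range(j + 1, n):
            if (i, k) in edge_id and (j, k) in edge_id:
                triangles.append((1 << edge_id[(i, j)])
                                 | (1 << edge_id[(i, k)])
                                 | (1 << edge_id[(j, k)]))
    components = count_components(n, edges)
    cycle_rank = len(edges) - n + components  # independent cycles in the graph
    return components, cycle_rank - gf2_rank(triangles)  # loops not filled by triangles


if __name__ == "__main__":
    full = zero_shape()
    for fraction in (0.0, 0.3, 0.5):
        kept = drop_at_random(full, fraction)
        print(fraction, components_and_loops(kept, eps=0.5))
```

A result of `(1, 1)` means one connected piece containing one loop, so the zero shape is intact. Because the radius is fixed, this naive graph eventually fragments as more points are dropped, and it should not be expected to match the stability reported above.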
To manage risk-based monitoring effectively, we need a strategic approach to allocating resources across a clinical study based on several key indicators, such as data criticality, patient safety, and protocol compliance. We applied topology-based data mining to a multicenter, double-blind, placebo-controlled clinical study managed across 11 sites.
As a result, two sites were flagged for further investigation of a possible violation of the participant selection procedure or the data collection protocol.
Atlas TDA was applied to the dataset from a clinical study that was conducted with the support of the NIDA Clinical Trials Network. A topological data map was generated from the dataset for 115 patients with 27 outcomes and 44 predictors.
Visual inspection of the graph identified three subgroups of patients represented as “communities of nodes” and suggested that there were robust patterns within the data. Further statistical analysis of predictors helped identify the reasons for these patients being combined into these three communities.