Cross-sectional Functional Stratification Based on Psychometric Profiling and Machine Learning in Patients With Substance Use Disorders (SUD)
SISAP-TUS
Unsupervised Deep Representation Learning for Clinical Stratification in Substance Use Disorders
observational
155
1 country
Brief Summary
Substance use disorders (SUDs) show considerable clinical heterogeneity that limits the usefulness of traditional categorical diagnoses. This observational, cross-sectional study applies an unsupervised deep learning method, an autoencoder, to learn continuous latent representations from standardised psychometric data and to explore whether those representations can help stratify clinical subpopulations.
The investigators will recruit 155 adults undergoing residential treatment for SUD. Participants will complete six validated instruments assessing impulsivity (BIS-11), anger regulation (STAXI-2), behavioural activation/avoidance (BADS), borderline symptomatology (BSL-23), generalised anxiety (GAD-7), and environmental reward (EROS). Demographic and clinical variables (age, sex, primary substance, years of use, prior treatments) will also be recorded.
After data cleaning and standardisation (z-scores), a symmetric autoencoder with a 12-dimensional bottleneck (architecture 21-32-24-12-24-32-21) will be trained using mean squared error (MSE) loss. Regularisation includes L2 weight decay and dropout. The model will be trained 30 times with different random seeds to assess stability; the five best models (by validation pseudo-R²) will be combined into a weighted ensemble. Five-fold cross-validation will evaluate generalisation. For comparison, principal component analysis (PCA) will be applied to the same data. Gaussian mixture models (GMM) will be fitted on the latent space to explore potential clinical subgroups.
The primary outcome is the stability of the latent representation (coefficient of variation of validation MSE across runs). Secondary outcomes include reconstruction performance (pseudo-R²) of the ensemble, comparison with PCA, and the interpretability of latent dimensions via correlations with original variables. GMM results will be described using BIC, silhouette width, bootstrap stability, and clinical characterisation of clusters.
This study does not involve any intervention. Results will be hypothesis-generating and require external validation. No automated clinical decisions will be made.
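The layer layout given in the summary (21-32-24-12-24-32-21) can be sketched as follows. This is a minimal pure-Python illustration with randomly initialised, untrained weights. The ReLU activation, the linear bottleneck and output, the initialisation, and every function name are assumptions that the record does not state; training (MSE loss, weight decay, dropout) is left out.

```python
import random

LAYER_SIZES = [21, 32, 24, 12, 24, 32, 21]  # layout taken from the registry record
BOTTLENECK_LAYER = 3  # the third dense layer outputs the 12-unit bottleneck


def init_layers(sizes, seed=0):
    """Create (weights, biases) per dense layer with small random values (assumed init)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def dense(x, weights, biases, relu):
    """One fully connected layer; ReLU is an assumed activation."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def forward(x, layers):
    """Return (latent, reconstruction) for one 21-value input vector."""
    latent = None
    h = x
    for i, (weights, biases) in enumerate(layers, start=1):
        is_bottleneck = i == BOTTLENECK_LAYER
        is_output = i == len(layers)
        # Bottleneck and output are kept linear so they can take negative values.
        h = dense(h, weights, biases, relu=not (is_bottleneck or is_output))
        if is_bottleneck:
            latent = h
    return latent, h


def count_parameters(sizes):
    """Weights plus biases summed over all dense layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


layers = init_layers(LAYER_SIZES)
z_scores = [0.0] * 21  # placeholder participant, not study data
latent, reconstruction = forward(z_scores, layers)
print(len(latent), len(reconstruction), count_parameters(LAYER_SIZES))
```

Under these assumptions the layout has 3,601 trainable parameters, which gives a sense of why regularisation is relevant for a sample of 155 participants.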
Study Timeline
Key milestones and dates
Study Start
First participant enrolled
March 25, 2024
Primary Completion
Last participant's last visit for primary outcome
February 18, 2026
Study Completion
Last participant's last visit for all outcomes
April 22, 2026
First Submitted
Initial submission to the registry
May 9, 2026
First Posted
Study publicly available on registry
May 15, 2026
Outcome Measures
Primary Outcomes (1)
Latent dimension scores
Twelve continuous latent dimensions derived from the bottleneck layer of a symmetric autoencoder trained on 21 standardized clinical variables. Each dimension represents a compressed, nonlinear combination of the original psychometric indicators (impulsivity, emotion regulation, behavioral activation, borderline symptoms, anxiety, and environmental reward). The dimensions are extracted for each participant after averaging the predictions of an ensemble of the five best autoencoder runs.
Unit of Measure: Standardized z-score (mean = 0, SD = 1 in the training sample)
Baseline (single assessment, cross-sectional)
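The ensemble step for the latent scores might look like the sketch below. The record says the five best runs are selected by validation pseudo-R² and combined with weights, but it does not give the weighting rule; weights proportional to pseudo-R² are an assumption here, as are the function names and the dictionary layout. The sketch also assumes positive pseudo-R² values and latent dimensions that are comparable across runs.

```python
def select_top_runs(runs, k=5):
    """Keep the k runs with the highest validation pseudo-R2."""
    return sorted(runs, key=lambda r: r["val_r2"], reverse=True)[:k]


def weighted_latent(runs):
    """Average latent vectors, weighting each run by its validation pseudo-R2."""
    total = sum(r["val_r2"] for r in runs)
    n_dims = len(runs[0]["latent"])
    return [
        sum(r["val_r2"] * r["latent"][d] for r in runs) / total
        for d in range(n_dims)
    ]


# Toy values for one participant and two latent dimensions (not study data).
runs = [
    {"seed": 1, "val_r2": 0.50, "latent": [1.0, 0.0]},
    {"seed": 2, "val_r2": 0.25, "latent": [0.0, 2.0]},
    {"seed": 3, "val_r2": 0.25, "latent": [2.0, 2.0]},
    {"seed": 4, "val_r2": 0.10, "latent": [9.0, 9.0]},
]
best = select_top_runs(runs, k=3)  # k=5 in the study; 3 keeps the toy example small
print(weighted_latent(best))
```

In this toy example the run with the lowest pseudo-R² (seed 4) is dropped before averaging, so its outlying latent values do not affect the ensemble score.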
Secondary Outcomes (6)
Gaussian mixture model cluster membership
Baseline
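For the mixture models, the BIC mentioned in the summary penalises the log-likelihood by the number of free parameters. The sketch below counts those parameters for a full-covariance Gaussian mixture in the 12-dimensional latent space; the covariance structure is an assumption, since the record does not specify it, and the function names are illustrative.

```python
import math


def gmm_free_parameters(n_components, n_dims):
    """Free parameters of a full-covariance Gaussian mixture (assumed covariance type)."""
    mixing = n_components - 1  # mixing weights sum to 1
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2  # symmetric matrices
    return mixing + means + covariances


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better under this sign convention."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


for k in (1, 2, 3):
    print(k, gmm_free_parameters(k, n_dims=12))
```

With 12 dimensions, a two-component full-covariance model already has 181 free parameters, more than the 155 participants, so the penalty term is likely to weigh heavily. Sign conventions for BIC differ between software packages, so the direction of "better" should be checked against the tool used.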
Autoencoder reconstruction pseudo-R²
Baseline (computed on the validation split and on the full sample after training)
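The record does not define pseudo-R². One common reading, assumed here, is one minus the ratio of the reconstruction sum of squares to the total sum of squares around each variable's mean, pooled over all variables. The sketch below computes that quantity together with the reconstruction MSE on toy data.

```python
def reconstruction_mse(original, reconstructed):
    """Mean squared error over every cell of the data matrix."""
    n_cells = len(original) * len(original[0])
    sse = sum(
        (o - r) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for o, r in zip(row_o, row_r)
    )
    return sse / n_cells


def pseudo_r2(original, reconstructed):
    """1 - SSE/SST pooled over variables; SST uses each variable's column mean."""
    n_rows, n_cols = len(original), len(original[0])
    col_means = [sum(row[j] for row in original) / n_rows for j in range(n_cols)]
    sse = reconstruction_mse(original, reconstructed) * n_rows * n_cols
    sst = sum((row[j] - col_means[j]) ** 2 for row in original for j in range(n_cols))
    return 1.0 - sse / sst


# Toy matrix with 2 participants and 2 variables (not study data).
original = [[1.0, 2.0], [3.0, 6.0]]
reconstructed = [[1.0, 3.0], [3.0, 5.0]]
print(reconstruction_mse(original, reconstructed), pseudo_r2(original, reconstructed))
```

Under this definition a perfect reconstruction scores 1, and a model that returns each variable's mean scores 0.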
Autoencoder reconstruction mean squared error
Baseline
Coefficient of variation of reconstruction MSE
Baseline (after all runs are completed)
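The stability measure can be illustrated with a few lines of standard-library Python. The coefficient of variation is taken here as the sample standard deviation divided by the mean, which is an assumption; the record does not say whether the sample or population form is used. The MSE values are invented for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values)


# Toy validation MSE values from three runs (the study plans 30; not study data).
val_mse = [0.40, 0.50, 0.60]
print(coefficient_of_variation(val_mse))
```

A smaller value means that the validation MSE changes little from one random seed to another.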
Cross-validated reconstruction R²
Baseline
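A five-fold split of the 155 participants could be generated as in the sketch below. The shuffling seed, the absence of stratification, and the function name are assumptions; the record only states that five-fold cross-validation is used.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Return a list of (train_indices, validation_indices) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Take every n_folds-th shuffled index so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


splits = k_fold_indices(155)
for train, validation in splits:
    print(len(train), len(validation))
```

With 155 participants each fold holds 31 validation cases and 124 training cases. In a setup like this, the z-score parameters would normally be estimated on the training indices only and then applied to the validation fold.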
- 1 more secondary outcome (not shown in this listing)
Study Arms (1)
Total sample (residential treatment)
Adult patients (N=155) with a DSM-5-TR substance use disorder receiving residential treatment. All participants completed six psychometric scales (BIS-11, STAXI-2, BADS, BSL-23, GAD-7, EROS) and provided demographic/clinical data in a single cross-sectional session. No intervention was administered.
Interventions
This is a purely observational study. No drug, device, behavioral therapy, or other intervention was assigned. The study only involved standardized psychometric measurements.
Eligibility Criteria
Adult patients (≥18 years) with a diagnosis of Substance Use Disorder (SUD) admitted to a residential detoxification and rehabilitation center. Participants were recruited consecutively between February 2024 and March 2026. The estimated final sample size is 155 participants. No healthy volunteers are included.
You may qualify if:
- DSM-5 diagnosis of Substance Use Disorder (SUD), confirmed by a psychiatrist or clinical psychologist.
- Age ≥ 18 years.
- Currently admitted to a residential addiction treatment center at the time of assessment.
- Ability to complete the psychometric questionnaires independently.
- Written informed consent.
You may not qualify if:
- Active psychotic disorder (e.g., schizophrenia, delusional disorder) not stabilized pharmacologically.
- Severe cognitive impairment (dementia, severe brain injury) that prevents understanding the questionnaire items.
- Language barriers or illiteracy that prevent self-administration of the scales.
- Scheduled discharge from the center within 7 days of the assessment date.
Study Sites (1)
Under The Tree
Ajijic, Jalisco, 45920, Mexico
Study Officials
- PRINCIPAL INVESTIGATOR
Lauro Gutiérrez Castro
Under The Tree
Study Design
- Study Type: observational
- Observational Model: CASE ONLY
- Time Perspective: CROSS SECTIONAL
- Target Duration: 1 Day
- Sponsor Type: OTHER
- Responsible Party: SPONSOR INVESTIGATOR
- PI Title: Principal Investigator
Study Record Dates
First Submitted
May 9, 2026
First Posted
May 15, 2026
Study Start
March 25, 2024
Primary Completion
February 18, 2026
Study Completion
April 22, 2026
Last Updated
May 18, 2026
Record last verified: 2026-05
Data Sharing
- IPD Sharing: Will share
- Shared Documents: STUDY PROTOCOL, SAP, ICF, CSR
- Time Frame: Beginning 9 months and ending 36 months after article publication
- Access Criteria: Data will be available to researchers who provide a methodologically sound proposal for purposes of replicating the results or conducting secondary analyses. Proposals should be directed to the corresponding author. Requestors will need to sign a data access agreement.
Individual participant data (IPD) that underlie the results reported in the manuscript will be shared after de-identification (anonymization). The data will include the 21 standardized clinical variables and the 12-dimensional latent representations for all 155 participants. Study protocol, statistical analysis plan, and R code will also be made available.