Real-World Data. Real-World Depth.
Explore the scale of our ambulatory dataset across key therapeutic areas. From top diagnosis codes to longitudinal patient journeys, get the evidence you need to drive discovery.
Primary Care & Family Medicine
Patient Reach: 12M+ Longitudinal Records
Top Diagnosis Codes:
I10 – Essential (primary) hypertension
E11.9 – Type 2 diabetes mellitus without complications
E78.5 – Hyperlipidemia, unspecified
Data Highlight: Decades of wellness visit trends and chronic disease management tracking.
Cardiology
Patient Reach: 1.6M+ Specialty Encounters
Top Diagnosis Codes:
I48.0 – Paroxysmal atrial fibrillation
I25.10 – Atherosclerotic heart disease of native coronary artery without angina pectoris
I50.9 – Heart failure, unspecified
Data Highlight: Rich integration of ECG results, vitals history, and cardiac-specific medication adherence.
OBGYN & Women’s Health
Patient Reach: 4.3M+ Active Patients
Top Diagnosis Codes:
Z34.90 – Supervision of normal pregnancy, unspecified trimester
N80.0 – Endometriosis of uterus
N95.1 – Menopausal and female climacteric states
Data Highlight: Comprehensive maternal health tracking and longitudinal reproductive history.
Pediatrics
Patient Reach: 1.6M+ Pediatric Patients
Top Diagnosis Codes:
J06.9 – Acute upper respiratory infection, unspecified
J30.9 – Allergic rhinitis, unspecified
J45.909 – Unspecified asthma, uncomplicated
Data Highlight: Growth chart tracking, immunization records, and pediatric-specific dosing trends.
Gastroenterology
Patient Reach: 600k+ Specialized Records
Top Diagnosis Codes:
K21.9 – GERD without esophagitis
K58.0 – IBS with diarrhea
K50.90 – Crohn's disease, unspecified, without complications
Oncology
Patient Reach: 83k+ Active Patients
Top Diagnosis Codes:
C50 – Malignant neoplasm of breast
C61 – Malignant neoplasm of prostate
C73 – Malignant neoplasm of thyroid gland
Data Highlight: Comprehensive oncology treatment records, staging insights, biomarker data, and therapy-specific medication adherence trends.
Beyond the Diagnosis
Clinical Notes
De-identified SOAP notes for deep-context NLP.
Lab Results
Standardized LOINC codes and longitudinal lab values.
Medications
Prescribing patterns, fill data, and therapeutic switches.
Vitals
BMI, Blood Pressure, and Heart Rate trends over time.

Where Does Our Data Come From?
Frequently Asked Questions
How is Sidus Insights data standardized for research use?
All data ingested into the Sidus platform is harmonized to a single proprietary curation standard, regardless of the originating electronic health record (EHR) or revenue cycle management (RCM) system. Lab results are standardized to LOINC codes, diagnoses to ICD-10, and medications to RxNorm, ensuring interoperability with analytical tools and consistency across specialties. This pre-standardization allows researchers to begin analysis immediately without building custom data-cleaning pipelines.
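As a rough illustration of why standardized codes matter, the sketch below selects a cohort by matching ICD-10 prefixes. The rows, field names (`patient_id`, `icd10`), and the `select_cohort` helper are hypothetical and are not the Sidus schema or API. They only show the kind of query that works without per-source cleanup once diagnoses share one vocabulary.

```python
# Hypothetical, hand-written rows; field names are assumptions, not the Sidus schema.
ENCOUNTERS = [
    {"patient_id": "p1", "icd10": "I10"},
    {"patient_id": "p2", "icd10": "E11.9"},
    {"patient_id": "p3", "icd10": "E78.5"},
    {"patient_id": "p1", "icd10": "E11.9"},
]


def select_cohort(encounters, icd10_prefixes):
    """Return the set of patient IDs with at least one matching diagnosis code."""
    # str.startswith accepts a tuple, so several prefixes are checked in one call.
    prefixes = tuple(icd10_prefixes)
    return {
        row["patient_id"]
        for row in encounters
        if row["icd10"].startswith(prefixes)
    }


if __name__ == "__main__":
    # Patients with any Type 2 diabetes code (E11.x) in the sample rows.
    print(sorted(select_cohort(ENCOUNTERS, ["E11"])))
```

Prefix matching is used here so that a category such as E11 also captures more specific codes like E11.9. A real study would define its code list according to its own protocol.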
Can Sidus Insights data be used for natural language processing (NLP) research?
Yes. Sidus Insights provides access to de-identified SOAP notes and unstructured clinical data that support NLP research, including clinical phenotyping, symptom extraction, and treatment pathway analysis. With approximately 165 million unstructured files, Sidus offers one of the largest ambulatory clinical text datasets in the U.S.
What is longitudinal patient data and why does it matter for research?
Longitudinal patient data tracks a patient's healthcare journey over time across multiple visits and care settings. It helps researchers study disease progression, treatment patterns, and outcomes, supporting more robust clinical and outcomes research.
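To make the idea concrete, here is a minimal sketch that turns flat visit rows into per-patient timelines and measures the follow-up span. The sample visits, the tuple layout, and the helper names `build_timelines` and `follow_up_days` are illustrative assumptions, not part of any Sidus product.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visits: (patient_id, visit_date, icd10_code). Not real data.
VISITS = [
    ("p1", date(2021, 3, 1), "I10"),
    ("p2", date(2022, 6, 15), "E11.9"),
    ("p1", date(2019, 1, 10), "E78.5"),
    ("p1", date(2023, 3, 1), "I10"),
]


def build_timelines(visits):
    """Group visits by patient and order each patient's events by date."""
    timelines = defaultdict(list)
    for patient_id, visit_date, code in visits:
        timelines[patient_id].append((visit_date, code))
    for events in timelines.values():
        events.sort()  # tuples sort by date first, then by code
    return dict(timelines)


def follow_up_days(timeline):
    """Days between a patient's first and last recorded visit."""
    return (timeline[-1][0] - timeline[0][0]).days


if __name__ == "__main__":
    for patient_id, timeline in build_timelines(VISITS).items():
        print(patient_id, follow_up_days(timeline), [code for _, code in timeline])
```

A patient with a single visit has a follow-up span of zero days, which is one simple way to separate cross-sectional records from truly longitudinal ones.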
How can a researcher access Sidus Insights data for a specific therapeutic area?
Researchers can contact Sidus Insights to discuss their data requirements. Sidus provides customized datasets and cohorts tailored to specific therapeutic areas, study criteria, and research objectives.
How many longitudinal patient records does Sidus Insights hold for Primary Care?
Sidus Insights holds more than 12 million longitudinal records in its Primary Care and Family Medicine dataset. This includes decades of wellness visit trends, chronic disease management data, and multi-year follow-up records covering conditions such as hypertension (ICD-10: I10), Type 2 diabetes mellitus (E11.9), and hyperlipidemia (E78.5).