A. Soledad Rosario

Abbey Rosario

Statistics · NLP · Satellite Data · Cybersecurity · Clinical Research · Information Theory

Building measurement tools, research pipelines, and products at the intersection of mathematics, language, and systems.

active builds

Planetary Visualizer · Personal Project

Bark Bark + Abbey Patty

Active

Color-coded 3D geocentric and heliocentric planetary moment viewer. Abbey Patty's and Bark Bark's natal charts are displayed simultaneously on the same orbit rings. Saturn stationed direct at 3°49′ Virgo the day Bark Bark was found, 0.64° from Abbey's natal Sun. Computes chesta bala (Vedic motional strength) for every planet.

Three.js · Vedic Astrology · Orbital Mechanics · WebGL

Deliverables

3D solar system — heliocentric + geocentric toggle

Chesta bala (motional strength) for all planets

Both charts simultaneously — gold (Abbey) + teal (Bark Bark)

Sidereal positions via Lahiri ayanamsa

Deploy to GitHub Pages

The origin

Saturn is the only planet at peak strength at its exact station. Saturn stationed direct at 3°49′ Virgo — 0.64° from Abbey's natal Sun.

Satellite Dashboard · Product

GreenTrace

Active

GEE satellite dashboard monitoring NO₂ and VIIRS nighttime lights along the trans-Pacific shipping corridor LA → Shanghai. Independently verifiable emissions monitoring for green corridor policy.

Google Earth Engine · Sentinel-5P · Python · SDG 9·14

Deliverables

GEE dashboard script (green_corridor_dashboard.js)

Statistical analysis notebook (regression, COVID natural experiment, 4 figures)

Slide deck content written (ISC_slide_content.md)

NO₂ layer rendering — needs verification + GEE screenshots

Real AIS corridor geometry (currently hand-drawn waypoints)

WDPA marine protected boundaries (currently placeholder)

Completed PPTX with screenshots inserted

Destinations

SpaceHACK 2027 · Remote Sensing journal short comm · ESG product pitch to logistics firms

Chat Application · Product

Rewind

Active

A chat interface with conversation checkpointing. Save state at any point, then restore it — discarding everything that came after. When an AI conversation goes off track, rewind to before it happened instead of starting over.

Flask · Vanilla JS · Anthropic API · PWA

Deliverables

Flask backend — snapshot storage, rewind API (server.py)

Frontend — mobile-ready PWA (static/index.html)

Deploy to EC2

Persistent storage (SQLite)

Next.js + Supabase production version

The insight

Linear tail-redaction covers 95% of real use cases. No branching needed. The same failure mode it solves (AI assumption drift) is what motivated building it.
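
A minimal sketch of linear tail-redaction, assuming a checkpoint only needs to remember how long the message log was when it was saved. The `Conversation` class and its method names are hypothetical and are not taken from server.py.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Message log with named checkpoints (hypothetical structure)."""
    messages: list = field(default_factory=list)
    checkpoints: dict = field(default_factory=dict)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def checkpoint(self, name: str) -> None:
        # A checkpoint stores only the log length at the moment it is saved.
        self.checkpoints[name] = len(self.messages)

    def rewind(self, name: str) -> int:
        """Drop every message after the checkpoint; return how many were dropped."""
        cut = self.checkpoints[name]
        dropped = len(self.messages) - cut
        del self.messages[cut:]
        # Checkpoints saved after the cut now point past the end, so drop them too.
        self.checkpoints = {k: v for k, v in self.checkpoints.items() if v <= cut}
        return dropped
```

Because the log is linear, a rewind is a single list truncation and no branch bookkeeping is needed.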

Analytics Scaffold · Tool

DataSprint

Complete

Theme-agnostic datathon analytics scaffold. Five tabs, auto-detects date/geo/categorical columns, runs statistical tests automatically. Built for DataHacks 2026 — UC San Diego DS3 (April 18–19).

Streamlit · Pandas · Plotly · SciPy

Deliverables

Streamlit app — 5 tabs, auto-detection, statistical tests (app.py)

requirements.txt + README

Deploy to Streamlit Cloud before April 18
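
One way column auto-detection could work, sketched with the standard library only rather than Pandas. The heuristics, the `GEO_NAMES` set, and the `infer_kind` function are assumptions for illustration, not the logic in app.py.

```python
from datetime import datetime

# Hypothetical list of column names treated as coordinates.
GEO_NAMES = {"lat", "latitude", "lon", "lng", "longitude"}


def _all_parse(values, parser):
    """True if the parser accepts every value in the column."""
    try:
        for v in values:
            parser(v)
    except (TypeError, ValueError):
        return False
    return True


def infer_kind(name, values, max_categories=20):
    """Classify a column of strings as geo, numeric, date, categorical, or text."""
    if name.strip().lower() in GEO_NAMES and _all_parse(values, float):
        return "geo"
    # Numeric is checked before date so plain numbers are never read as dates.
    if _all_parse(values, float):
        return "numeric"
    if _all_parse(values, datetime.fromisoformat):
        return "date"
    if len(set(values)) <= max_categories:
        return "categorical"
    return "text"
```

The sketch assumes a non-empty column and ISO-formatted dates; other date formats would need an extra parser.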

research

Epidemiology · SDSU DiMoLab · $6k SURP Funded

COVID-19 Reproduction Number Estimation

Research

Funded undergraduate research estimating basic and effective reproduction numbers (R₀, Rt) across US regions during COVID-19. Conducted at the SDSU DiMoLab, coordinated by Dr. Naveen Vaidya.

MATLAB · R · EpiEstim · SDSU 2021–22

Deliverables

Computational estimation pipeline and tools

Poster — SDSU Summer Research Symposium 2021

Research paper — not yet written

Natural extension

Rt Estimation Benchmarking Suite — synthetic data with known true Rt values. Vaidya is the natural co-author for both.

NLP · Meta-Science · Research Paper

AI Detection in Epidemiology Papers

Research · 60%

Sprint paper testing whether AI-generated methods sections are detectable via linguistic features, and whether AI generation correlates with lower methodological quality.

NLP · PubMed API · scikit-learn · spaCy

Deliverables

PubMed ingestion — 100+ methods sections, 7 quality indicators (pubmed_ingest.py)

AI generation pipeline — paired versions via Anthropic API

Classifier — logistic regression, 8 linguistic features, 4 figures (classifier.py)

Run on real PubMed data (needs API key)

Write the paper

Submit to PLOS ONE or JMIR

Information Theory · Research + Product

Document Spectrometer

Research · 40%

Multi-resolution measurement framework decomposing information content into mathematically distinct components — topology, dynamical structure, statistical entropy, compression complexity. Like a prism for text.

Info Theory · TDA · Rényi · NCD · Conformal Prediction

Deliverables

Entropy framework — Shannon, Rényi spectrum, sample entropy, JSD, NCD, conditional entropy

Topological component (persistent homology via scikit-tda)

Dynamical component (Lyapunov exponents on embedding trajectories)

Conformal prediction wrapper (distribution-free uncertainty)

Web interface + API product

Paper — Entropy (MDPI) or NeurIPS theory track
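
Three of the entropy-framework measures can be sketched in a few lines, assuming character-level distributions and zlib as the compressor for NCD. Both choices are illustrative assumptions; the framework itself may use other units of text or another compressor.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def renyi_entropy(text: str, alpha: float) -> float:
    """Rényi entropy of order alpha (alpha > 0), in bits."""
    if math.isclose(alpha, 1.0):
        return shannon_entropy(text)  # the alpha -> 1 limit
    counts = Counter(text)
    n = len(text)
    return math.log2(sum((c / n) ** alpha for c in counts.values())) / (1 - alpha)


def ncd(a: str, b: str) -> float:
    """Normalized compression distance with zlib as the compressor."""
    ca = len(zlib.compress(a.encode()))
    cb = len(zlib.compress(b.encode()))
    cab = len(zlib.compress((a + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)
```

Evaluating `renyi_entropy` over a range of `alpha` values gives the Rényi spectrum; the functions assume non-empty input.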

medicine & clinical research

R Package · Medicine

pvbench

Idea

Pharmacovigilance signal detection benchmarking. Synthetic FAERS-structured data with known true drug-event interaction signals. Realistic reporting biases: notoriety bias, Weber effect, missing fields, duplicates. Full evaluation metrics suite.

R · FAERS · Signal Detection · CRAN target

Functions

generate_pvdata() — synthetic FAERS-structured data

embed_signals() — known true interaction signals

evaluate_detection() — sensitivity, specificity, PPV, NPV, AUROC, partial AUROC

benchmark_report() — comparison output

CRAN submission

Paper — Pharmacoepidemiology and Drug Safety or Drug Safety
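
The threshold-based metrics planned for evaluate_detection() reduce to a confusion matrix over drug-event pairs. The sketch below is in Python rather than R and assumes signals are represented as sets of (drug, event) tuples; that representation is hypothetical. AUROC is left out because it needs a continuous score per pair.

```python
def evaluate_detection(flagged: set, true_signals: set, all_pairs: set) -> dict:
    """Confusion-matrix metrics for a set of flagged drug-event pairs."""
    tp = len(flagged & true_signals)
    fp = len(flagged - true_signals)
    fn = len(true_signals - flagged)
    tn = len(all_pairs - flagged - true_signals)

    def ratio(num, den):
        # An undefined metric (empty denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Because the embedded signals are known by construction, `true_signals` is exact ground truth rather than a reference set.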

R Package · Clinical Statistics

estimandr

Idea

ICH E9(R1) estimand framework implementation. Full coverage of five intercurrent event strategies: treatment policy, hypothetical, composite, while on treatment, principal stratum. Regulatory-submission-ready outputs. Known gap — no standard R package covers the full estimand framework.

R · ICH E9(R1) · Clinical Trials · CRAN target

Deliverables

Five intercurrent event strategy implementations

Regulatory-submission-ready output formats

CRAN submission

Paper — Statistics in Medicine or Pharmaceutical Statistics

Why this exists

No standard R package covers the full ICH E9(R1) estimand framework. Every clinical statistician working in pharma hits this gap.

Epidemiology · Extension of DiMoLab Work

Rt Estimation Benchmarking Suite

Idea

Synthetic outbreak data with known ground truth Rt values, for testing how accurately Rt estimation methods recover them. Calibrated to POLYMOD contact patterns with heterogeneous mixing across age and region.

R · POLYMOD · EpiEstim · Benchmarking

Deliverables

Synthetic generator with known ground truth Rt by region, age, time

Benchmarking suite comparing EpiEstim, EpiNow2, epidemia

Paper — Epidemics, PLOS Computational Biology, or BMC Infectious Diseases

Co-author

Dr. Naveen Vaidya (SDSU DiMoLab) — natural extension of existing COVID-19 work.
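
The core benchmarking loop can be illustrated with a noise-free renewal equation, I_t = R_t · Σ_s w_s · I_{t−s}. This is a simplified single-population sketch in Python with hypothetical function names and a made-up serial interval; it has no age or regional structure and no observation noise, and its naive estimator is a stand-in for EpiEstim, EpiNow2, or epidemia.

```python
def infection_pressure(incidence, serial_interval, t):
    """Sum of past incidence weighted by the serial-interval distribution."""
    return sum(
        w * incidence[t - s]
        for s, w in enumerate(serial_interval, start=1)
        if t - s >= 0
    )


def simulate_incidence(rt, serial_interval, seed_cases=10.0):
    """Expected incidence under the renewal equation for a known Rt series."""
    incidence = [seed_cases]
    for t in range(1, len(rt)):
        incidence.append(rt[t] * infection_pressure(incidence, serial_interval, t))
    return incidence


def naive_rt(incidence, serial_interval):
    """Invert the renewal equation; None where the estimate is undefined."""
    estimates = [None]
    for t in range(1, len(incidence)):
        pressure = infection_pressure(incidence, serial_interval, t)
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


def mean_abs_error(estimates, truth):
    """Score an estimator against the known ground truth Rt."""
    pairs = [(e, r) for e, r in zip(estimates, truth) if e is not None]
    return sum(abs(e - r) for e, r in pairs) / len(pairs)
```

With no noise the naive estimator recovers Rt almost exactly, which is the sanity check a benchmark needs before adding reporting delays and stochasticity.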

Causal Inference · Benchmarking

Synthetic Epidemiological Cohort Generator

Idea

Observational health study simulator for causal inference benchmarking. Binary exposure, binary outcome, known confounders. Parameters from NHANES, Framingham, CDC BRFSS. Explicit DAG output. Realistic MNAR missing data. Ground truth: true causal effect size and true confounder set.

R · Causal Inference · DAG · NHANES

Deliverables

Synthetic cohort generator with explicit DAG output

Benchmarking suite — propensity score methods, LASSO, imputation

Paper — IJE or American Journal of Epidemiology

drug discovery & cheminformatics

Cheminformatics · Statistical Method

Bias-Corrected MI for Molecular Features

Idea

Mutual information between molecular descriptors and biological activity. The naive (plug-in) MI estimator is severely biased on small datasets, so the packages implement a bias correction. R + Python packages. No chemistry required: the work stays in the statistical/information-theoretic lane.

R · Python · Mutual Information · Journal of Cheminformatics

Deliverables

Bias-corrected MI estimator (R + Python)

Paper — Journal of Cheminformatics
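
As one concrete example of a bias correction, the sketch below applies the Miller-Madow term, which adds (K − 1) / (2N) to each plug-in entropy, where K is the number of occupied bins and N the sample size. Choosing Miller-Madow is an assumption made for illustration; the project does not say which correction it will use. Inputs are assumed to be already discretized (binned) descriptor and activity values.

```python
import math
from collections import Counter


def entropy(samples, corrected=False):
    """Plug-in entropy in nats, optionally with the Miller-Madow term."""
    counts = Counter(samples)
    n = len(samples)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    if corrected:
        h += (len(counts) - 1) / (2 * n)
    return h


def mutual_information(x, y, corrected=False):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete samples."""
    joint = list(zip(x, y))
    return entropy(x, corrected) + entropy(y, corrected) - entropy(joint, corrected)
```

The joint entropy usually has the most occupied bins, so the correction tends to pull the MI estimate down; the corrected value can be slightly negative for independent variables.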

Drug Discovery · Optimization Tool

Maximum Entropy Library Design

Idea

Given N compounds, select the subset that maximizes entropy of the property distribution. Convex optimization over molecular descriptor space. Streamlit app + Python package. Target: biotech startups, CROs, academic drug discovery groups.

Python · Convex Optimization · Streamlit · Drug Discovery

Deliverables

Python package — convex optimization over descriptor space

Streamlit interface

Distribution strategy (build after research identity established)
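
The selection objective can be illustrated with a greedy heuristic: repeatedly add the compound that most increases the entropy of the binned property distribution. This is a simple stand-in, not the convex optimization the package is planned around, and the function names are hypothetical. It assumes one discretized property per compound and k no larger than the number of compounds.

```python
import math
from collections import Counter


def bin_entropy(bins):
    """Shannon entropy (nats) of a list of bin labels."""
    counts = Counter(bins)
    n = len(bins)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def greedy_max_entropy(property_bins, k):
    """Pick k compound indices, each step adding the one that raises entropy most."""
    chosen = []
    remaining = list(range(len(property_bins)))
    for _ in range(k):
        best = max(
            remaining,
            key=lambda i: bin_entropy(
                [property_bins[j] for j in chosen] + [property_bins[i]]
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Greedy selection gives no optimality guarantee, but it provides a baseline to compare an optimized selection against.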

medicine & clinical research

Medicine · Data Quality

FAERS Data Quality Framework

Idea

Statistical framework for quantifying and correcting data quality issues in spontaneous AE reporting. Duplicate detection, inconsistent drug naming, reporting bias, Weber effect correction. Real gap — no comprehensive statistical QA framework for FAERS exists.

FAERS · R · Signal Detection · FDA

Deliverables

Statistical QA framework for FAERS

Weber effect correction implementation

Paper — Drug Safety or Pharmacoepidemiology and Drug Safety

Connection

Upstream data quality layer for pvbench — natural to build together.

information theory

Information Theory · Bioinformatics

Information-Theoretic Causal Discovery for Biological Networks

Idea

Conditional mutual information tests for causal graph learning. More powerful than correlation-based tests for nonlinear biological dependencies. Proper finite sample corrections and uncertainty quantification for inferred edges. R package.

R · Causal Discovery · CMI · Bioinformatics

Deliverables

R package — CMI tests with finite sample corrections

Paper — Bioinformatics or PLOS Computational Biology

Note

Do not write prematurely: a foundation in Ch. 2, 7, and 11 of Cover & Thomas is necessary first.
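
The quantity behind the planned tests is the conditional mutual information I(X;Y|Z) = H(X,Z) + H(Y,Z) − H(X,Y,Z) − H(Z). The sketch below is the plain plug-in estimate for discrete samples, written in Python rather than R, with none of the finite sample corrections or uncertainty quantification the package is meant to add.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy in nats."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def conditional_mutual_information(x, y, z):
    """Plug-in I(X;Y|Z) from the four joint entropies."""
    return (
        entropy(list(zip(x, z)))
        + entropy(list(zip(y, z)))
        - entropy(list(zip(x, y, z)))
        - entropy(list(z))
    )
```

For a common-cause structure X ← Z → Y the estimate drops to zero once Z is conditioned on, which is the pattern a constraint-based learner uses to remove the X–Y edge.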

products & tools

NLP · Knowledge Tool

Lemma

Idea

Terminology extraction for conversations. Takes a text corpus, extracts domain-specific terms, generates definitions, classifies each term by field, and maps concept relationships as an interactive knowledge graph.

NLP · spaCy · Knowledge Graph · D3

Deliverables

Knowledge graph visualization designed

Extraction pipeline

Definition generation + field classification

Web interface

NLP · Research Tool

Paper Comparison Venn

Idea

Takes two paper abstracts or PDFs, extracts claims, categorizes as unique or shared. Renders a Venn where visual design communicates corroboration vs contradiction. Claims drift spatially — font weight encodes evidence strength.

NLP · D3 · Typographic UI

Deliverables

Claim extraction pipeline

Shared vs unique classification

Typographic Venn renderer

Cloud Infrastructure · Mobile

Noether

Idea

Plain English → cloud infrastructure from your phone. Build, change, deploy entire AWS/GCP infrastructure via mobile. Dry-run preview before any action. Sandbox before production. Project memory across sessions.

Mobile · AWS · Terraform · LLM

Deliverables

NL → infrastructure intent parsing

Dry-run preview layer (terraform plan equivalent)

Sandbox environment before production

Mobile interface

security

Text Generation · Consumer Tool · Gumroad

Netlog Generator

Complete

Automated network log generator for structured log output. Sold on Gumroad as a standalone tool. Freelance product built and distributed independently.

Python · Log Generation · Gumroad

Deliverables

Generator script — structured netlog output

Gumroad listing — live and available

nlp & automation tools

NLP · Data Quality · Freelance Tool

Fuzzy Matcher

Complete

Fuzzy string matching tool for deduplication and entity resolution. Built for messy real-world data — names, addresses, drug names. Freelance deliverable.

Python · FuzzyWuzzy · NLP · Entity Resolution

Deliverables

Matching pipeline — configurable threshold, multiple algorithms

Delivered to client
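
Threshold-based deduplication can be sketched with the standard library's difflib in place of FuzzyWuzzy. The function names and the 0.85 default threshold are illustrative assumptions, not the delivered pipeline.

```python
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(s.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records, threshold=0.85):
    """Return (i, j, score) for every pair scoring at or above the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = similarity(records[i], records[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs
```

The all-pairs loop is quadratic, so large datasets would need blocking (for example, comparing only records that share a first letter or postcode) before scoring.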

Finance · Automation · Freelance Tool

Bank Reconciler

Complete

Automated bank statement reconciliation tool. Matches transactions across accounts or statements, flags discrepancies, outputs reconciliation report. Freelance deliverable.

Python · Pandas · Finance Automation

Deliverables

Reconciliation pipeline — transaction matching, discrepancy flagging

Report output — reconciliation summary

Delivered to client
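
Transaction matching and discrepancy flagging can be sketched as a greedy one-to-one match on equal amounts within a date window. The tuple layout, the three-day window, and the `reconcile` name are assumptions for illustration; the delivered tool uses Pandas and may match differently.

```python
from datetime import date


def reconcile(ledger, statement, max_days=3):
    """Match transactions on equal amount within a date window.

    Each transaction is a (date, amount_in_cents) tuple.
    Returns (matches, unmatched_ledger, unmatched_statement).
    """
    unmatched_statement = list(statement)
    matches, unmatched_ledger = [], []
    for entry in ledger:
        candidates = [
            s for s in unmatched_statement
            if s[1] == entry[1] and abs((s[0] - entry[0]).days) <= max_days
        ]
        if candidates:
            # Prefer the statement line closest in time to the ledger entry.
            best = min(candidates, key=lambda s: abs((s[0] - entry[0]).days))
            unmatched_statement.remove(best)
            matches.append((entry, best))
        else:
            unmatched_ledger.append(entry)
    return matches, unmatched_ledger, unmatched_statement
```

Amounts are kept in integer cents so equality checks are exact; the two unmatched lists are the discrepancies that would go into the report.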

NLP · Pattern Detection

Motif

Idea

Recurring pattern and motif detection in text corpora. Identifies structural and semantic patterns across documents — useful for research integrity, stylometric analysis, and content auditing.

NLP · Pattern Detection · Python

Deliverables

Pattern extraction pipeline

Semantic motif clustering

Visualization + report
