A. Soledad Rosario

Abbey Rosario

Statistics · NLP · Satellite Data · Cybersecurity · Clinical Research · Information Theory

Building measurement tools, research pipelines, and products at the intersection of mathematics, language, and systems.

Planetary Visualizer · Personal Project
Bark Bark + Abbey Patty
Active

Color-coded 3D geocentric and heliocentric planetary movement viewer. Abbey Patty's and Bark Bark's natal charts are displayed simultaneously on the same orbit rings. Saturn stationed direct at 3°49′ Virgo on the day Bark Bark was found, 0.64° from Abbey's natal Sun. Chesta bala (Vedic motional strength) is computed for every planet.

Three.js · Vedic Astrology · Orbital Mechanics · WebGL
Deliverables
3D solar system — heliocentric + geocentric toggle
Chesta bala (motional strength) for all planets
Both charts simultaneously — gold (Abbey) + teal (Bark Bark)
Sidereal positions via Lahiri ayanamsa
Deploy to GitHub Pages
The origin
Saturn is the only planet at peak strength at its exact station. Saturn stationed direct at 3°49′ Virgo — 0.64° from Abbey's natal Sun.
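The sidereal positions and the 0.64° figure come down to two small pieces of arithmetic: subtracting the ayanamsa from a tropical longitude, and taking the shortest arc between two ecliptic longitudes. A minimal sketch, assuming longitudes in degrees and an ayanamsa value supplied by the caller (the Lahiri value itself is not computed here, and all function names are hypothetical):

```python
SIGNS = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
         "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]


def to_sidereal(tropical_deg: float, ayanamsa_deg: float) -> float:
    """Convert a tropical ecliptic longitude to sidereal by subtracting the ayanamsa."""
    return (tropical_deg - ayanamsa_deg) % 360.0


def format_position(lon_deg: float) -> str:
    """Render a longitude as degrees and arcminutes within its 30° sign."""
    total_min = round(lon_deg * 60) % (360 * 60)  # round once, in whole arcminutes
    sign, within = divmod(total_min, 30 * 60)
    deg, minutes = divmod(within, 60)
    return f"{deg}°{minutes:02d}′ {SIGNS[sign]}"


def separation(a_deg: float, b_deg: float) -> float:
    """Shortest angular distance between two longitudes, in [0, 180]."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)
```

For example, `format_position(153 + 49 / 60)` returns `"3°49′ Virgo"`, Saturn's station point. Rounding to arcminutes before splitting into sign and degree avoids outputs like `3°60′`.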
Satellite Dashboard · Product
GreenTrace
Active

GEE satellite dashboard monitoring NO₂ and VIIRS nighttime lights along the trans-Pacific shipping corridor LA → Shanghai. Independently verifiable emissions monitoring for green corridor policy.

Google Earth Engine · Sentinel-5P · Python · SDG 9 & 14
Deliverables
GEE dashboard script (green_corridor_dashboard.js)
Statistical analysis notebook (regression, COVID natural experiment, 4 figures)
Slide deck content written (ISC_slide_content.md)
NO₂ layer rendering — needs verification + GEE screenshots
Real AIS corridor geometry (currently hand-drawn waypoints)
WDPA marine protected boundaries (currently placeholder)
Completed PPTX with screenshots inserted
Destinations
SpaceHACK 2027 · Remote Sensing journal short communication · ESG product pitch to logistics firms
Chat Application · Product
Rewind
Active

A chat interface with conversation checkpointing. Save state at any point, then restore it — discarding everything that came after. When an AI conversation goes off track, rewind to before it happened instead of starting over.

Flask · Vanilla JS · Anthropic API · PWA
Deliverables
Flask backend — snapshot storage, rewind API (server.py)
Frontend — mobile-ready PWA (static/index.html)
Deploy to EC2
Persistent storage (SQLite)
Next.js + Supabase production version
The insight
Linear tail-redaction covers 95% of real use cases. No branching needed. The same failure mode it solves (AI assumption drift) is what motivated building it.
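Because history is linear, a checkpoint can be nothing more than an index into the message list, and rewinding is a truncation. A minimal in-memory sketch of that idea; the `Conversation` class and its methods are hypothetical and not taken from `server.py`, which per the deliverables handles snapshot storage and the rewind API:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Linear message log whose checkpoints are plain indices (hypothetical model)."""
    messages: list[dict] = field(default_factory=list)
    checkpoints: dict[str, int] = field(default_factory=dict)

    def append(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def checkpoint(self, name: str) -> None:
        # Saving state is just remembering how long the log is right now.
        self.checkpoints[name] = len(self.messages)

    def rewind(self, name: str) -> list[dict]:
        """Cut the log back to a checkpoint and return the discarded tail."""
        cut = self.checkpoints[name]
        tail = self.messages[cut:]
        del self.messages[cut:]
        # Checkpoints beyond the cut point at messages that no longer exist.
        self.checkpoints = {k: v for k, v in self.checkpoints.items() if v <= cut}
        return tail
```

Since nothing after the cut survives, there is no branch tree to store or render; later checkpoints are simply dropped along with the tail.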
Analytics Scaffold · Tool
DataSprint
Complete

Theme-agnostic datathon analytics scaffold. Five tabs, auto-detects date/geo/categorical columns, runs statistical tests automatically. Built for DataHacks 2026 — UC San Diego DS3 (April 18–19).

Streamlit · Pandas · Plotly · SciPy
Deliverables
Streamlit app — 5 tabs, auto-detection, statistical tests (app.py)
requirements.txt + README
Deploy to Streamlit Cloud before April 18
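The auto-detection step can be approximated with simple per-column heuristics. A standard-library sketch under stated assumptions: values arrive as strings (as from `csv.DictReader`), geo columns are recognized by name tokens, and the 20-level categorical cutoff is an arbitrary illustrative choice. The rules in the actual `app.py` may differ:

```python
from datetime import datetime

# Hypothetical name tokens that mark a geographic column.
GEO_TOKENS = {"lat", "latitude", "lon", "lng", "longitude", "country",
              "state", "city", "county", "zip", "region"}


def _parses(value: str, parser) -> bool:
    try:
        parser(value)
        return True
    except ValueError:
        return False


def detect_column_type(name: str, values: list[str], max_levels: int = 20) -> str:
    """Label a column as geo, numeric, date, categorical, text, or empty."""
    vals = [v.strip() for v in values if v and v.strip()]
    tokens = set(name.lower().replace("-", "_").replace(" ", "_").split("_"))
    if tokens & GEO_TOKENS:
        return "geo"
    if not vals:
        return "empty"
    # Numeric is checked before date so that plain integers are not read as dates.
    if all(_parses(v, float) for v in vals):
        return "numeric"
    if all(_parses(v, datetime.fromisoformat) for v in vals):
        return "date"
    if len(set(vals)) <= max_levels:
        return "categorical"
    return "text"
```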
Epidemiology · SDSU DiMoLab · $6k SURP Funded
COVID-19 Reproduction Number Estimation
Research

Funded undergraduate research estimating basic and effective reproduction numbers (R₀, Rt) across US regions during COVID-19, carried out in the SDSU DiMoLab coordinated by Dr. Naveen Vaidya.

MATLAB · R · EpiEstim · SDSU 2021–22
Deliverables
Computational estimation pipeline and tools
Poster — SDSU Summer Research Symposium 2021
Research paper — not yet written
Natural extension
Rt Estimation Benchmarking Suite: synthetic data with known true Rt values. Vaidya is the natural co-author for both the paper and the suite.
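EpiEstim's standard method (Cori et al., 2013) treats incidence as a renewal process, I_t ~ Poisson(R_t Λ_t) with Λ_t = Σ_s I_{t−s} w_s, and places a gamma prior on R_t over a sliding window, which yields a closed-form gamma posterior. A standard-library sketch of the posterior mean, assuming a discretized serial interval whose first entry is the weight for a one-day lag and EpiEstim's default prior (mean 5, SD 5, i.e. shape 1, scale 5). It illustrates the estimator and is not the project's MATLAB/R pipeline:

```python
from typing import List, Optional


def cori_rt(incidence: List[float], serial_interval: List[float], window: int = 7,
            prior_shape: float = 1.0, prior_scale: float = 5.0) -> List[Optional[float]]:
    """Posterior mean of R_t for the window ending on each day t."""
    n = len(incidence)
    # Total infectiousness: Lambda_t = sum_{s>=1} I_{t-s} * w_s, with w_s = serial_interval[s - 1].
    lam = [sum(incidence[t - s] * serial_interval[s - 1]
               for s in range(1, min(t, len(serial_interval)) + 1))
           for t in range(n)]
    estimates: List[Optional[float]] = []
    for t in range(n):
        if t < window:
            estimates.append(None)  # window would reach day 0, where Lambda is undefined
            continue
        cases = sum(incidence[t - window + 1:t + 1])
        infectiousness = sum(lam[t - window + 1:t + 1])
        shape = prior_shape + cases
        scale = 1.0 / (1.0 / prior_scale + infectiousness)
        estimates.append(shape * scale)  # mean of Gamma(shape, scale)
    return estimates
```

With a flat series of 100 cases per day and a one-day serial interval, the estimate settles just above 1; the small excess is the prior's pull.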
NLP · Meta-Science · Research Paper
AI Detection in Epidemiology Papers
Research · 60%

Sprint paper testing whether AI-generated methods sections are detectable via linguistic features, and whether AI generation correlates with lower methodological quality.

NLP · PubMed API · scikit-learn · spaCy
Deliverables
PubMed ingestion — 100+ methods sections, 7 quality indicators (pubmed_ingest.py)
AI generation pipeline — paired versions via Anthropic API
Classifier — logistic regression, 8 linguistic features, 4 figures (classifier.py)
Run on real PubMed data (needs API key)
Write the paper
Submit to PLOS ONE or JMIR
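The classifier step can be illustrated without dependencies: extract a few linguistic features per methods section, standardize them, and fit logistic regression by gradient descent. The three features below are hypothetical stand-ins for the project's eight, which are not listed here, and hand-rolled gradient descent replaces scikit-learn's `LogisticRegression` only to keep the sketch self-contained:

```python
import math
import re


def features(text: str) -> list[float]:
    """Three illustrative features; the project's actual eight are not specified here."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    n_words = max(len(words), 1)
    return [
        len(words) / max(len(sentences), 1),  # mean sentence length in words
        len(set(words)) / n_words,            # type-token ratio (lexical diversity)
        text.count(",") / n_words,            # commas per word
    ]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_standardizer(X):
    """Column means and population SDs; a zero SD is replaced by 1."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds


def standardize(x, means, sds):
    return [(v - m) / s for v, m, s in zip(x, means, sds)]


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on mean log-loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x):
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Standardizing first keeps features on comparable scales, so one learning rate works for all weights and the fitted coefficients are roughly comparable as effect sizes.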
Information Theory · Research + Product
Document Spectrometer
Research · 40%

Multi-resolution measurement framework decomposing information content into mathematically distinct components — topology, dynamical structure, statistical entropy, compression complexity. Like a prism for text.

Info Theory · TDA · Rényi · NCD · Conformal Prediction
Deliverables
Entropy framework — Shannon, Rényi spectrum, sample entropy, JSD, NCD, conditional entropy
Topological component (persistent homology via scikit-tda)
Dynamical component (Lyapunov exponents on embedding trajectories)
Conformal prediction wrapper (distribution-free uncertainty)
Web interface + API product
Paper — Entropy (MDPI) or NeurIPS theory track
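Two of the entropy-framework measures are short enough to sketch directly: a Rényi spectrum over a character distribution (α = 1 is the Shannon limit, α = 0 the log of the support size) and normalized compression distance, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)). The sketch assumes zlib as the compressor C and character unigrams as the distribution; both are illustrative choices, not the framework's actual configuration:

```python
import math
import zlib
from collections import Counter


def char_distribution(text: str) -> list[float]:
    """Empirical probabilities of each distinct character."""
    counts = Counter(text)
    return [c / len(text) for c in counts.values()]


def renyi_entropy(p: list[float], alpha: float) -> float:
    """Rényi entropy in bits; alpha = 1 falls back to Shannon entropy."""
    if alpha == 1:
        return -sum(pi * math.log2(pi) for pi in p if pi > 0)
    return math.log2(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance with zlib (level 9) as the compressor."""
    cx, cy, cxy = (len(zlib.compress(s, 9)) for s in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A spectrum is then `{a: renyi_entropy(p, a) for a in (0, 0.5, 1, 2)}`, which is non-increasing in α. Deflate only finds repeats within a 32 KB window, so NCD on long documents may call for a compressor with a larger window.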
R Package · Medicine
pvbench
Idea

Pharmacovigilance signal detection benchmarking. Synthetic FAERS-structured data with known true drug-event interaction signals. Realistic reporting biases: notoriety bias, Weber effect, missing fields, duplicates. Full evaluation metrics suite.

R · FAERS · Signal Detection · CRAN target
Functions
generate_pvdata() — synthetic FAERS-structured data
embed_signals() — known true interaction signals
evaluate_detection() — sensitivity, specificity, PPV, NPV, AUROC, partial AUROC
benchmark_report() — comparison output
CRAN submission
Paper — Pharmacoepidemiology and Drug Safety or Drug Safety
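With embedded ground truth, evaluation reduces to comparing each drug-event pair's true label against what a detection method flagged or scored. A Python sketch of the threshold metrics plus a rank-based AUROC (the Mann-Whitney formulation); it is a hypothetical analogue of `evaluate_detection()`, not its R implementation, and it omits partial AUROC:

```python
def detection_metrics(truth: list[int], flagged: list[int]) -> dict[str, float]:
    """Threshold metrics for binary labels (1 = true signal / flagged signal)."""
    pairs = list(zip(truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)
    fp = sum(1 for t, f in pairs if not t and f)
    fn = sum(1 for t, f in pairs if t and not f)
    tn = sum(1 for t, f in pairs if not t and not f)

    def ratio(num: int, other: int) -> float:
        # NaN when the denominator is empty (e.g. nothing flagged, so PPV is undefined).
        return num / (num + other) if num + other else float("nan")

    return {"sensitivity": ratio(tp, fn), "specificity": ratio(tn, fp),
            "ppv": ratio(tp, fp), "npv": ratio(tn, fn)}


def auroc(truth: list[int], scores: list[float]) -> float:
    """P(score of a random true signal > score of a random non-signal); ties count 1/2."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC loop is quadratic; for large benchmarks a sort-based rank-sum computation gives the same value faster.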
R Package · Clinical Statistics
estimandr
Idea

ICH E9(R1) estimand framework implementation. Full coverage of five intercurrent event strategies: treatment policy, hypothetical, composite, while on treatment, principal stratum. Regulatory-submission-ready outputs. Known gap — no standard R package covers the full estimand framework.

R · ICH E9(R1) · Clinical Trials · CRAN target
Deliverables
Five intercurrent event strategy implementations
Regulatory-submission-ready output formats
CRAN submission
Paper — Statistics in Medicine or Pharmaceutical Statistics
Why this exists
No standard R package covers the full ICH E9(R1) estimand framework. Every clinical statistician working in pharma hits this gap.
Epidemiology · Extension of DiMoLab Work
Rt Estimation Benchmarking Suite
Idea

Synthetic outbreak data with known ground truth Rt values, for testing how accurately Rt estimation methods perform. Calibrated to POLYMOD contact patterns with heterogeneous mixing across age and region.

R · POLYMOD · EpiEstim · Benchmarking
Deliverables
Synthetic generator with known ground truth Rt by region, age, time
Benchmarking suite comparing EpiEstim, EpiNow2, epidemia
Paper — Epidemics, PLOS Computational Biology, or BMC Infectious Diseases
Co-author
Dr. Naveen Vaidya (SDSU DiMoLab) — natural extension of existing COVID-19 work.
Causal Inference · Benchmarking
Synthetic Epidemiological Cohort Generator
Idea

Observational health study simulator for causal inference benchmarking. Binary exposure, binary outcome, known confounders. Parameters from NHANES, Framingham, CDC BRFSS. Explicit DAG output. Realistic MNAR missing data. Ground truth: true causal effect size and true confounder set.

R · Causal Inference · DAG · NHANES
Deliverables
Synthetic cohort generator with explicit DAG output
Benchmarking suite — propensity score methods, LASSO, imputation
Paper — IJE or American Journal of Epidemiology
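A stripped-down version of the idea: one binary confounder L raises both exposure A and outcome Y, and the causal risk difference of A on Y is fixed in advance, so the crude estimate can be checked against the truth and against standardization over L. All probabilities and names are hypothetical, and the NHANES/Framingham/BRFSS calibration and MNAR missingness are omitted:

```python
import random

# Explicit DAG (parent -> children); the true confounder set is {"L"}.
DAG = {"L": ["A", "Y"], "A": ["Y"], "Y": []}


def simulate_cohort(n: int, risk_difference: float = 0.10, seed: int = 0):
    """Rows of (L, A, Y) with a known causal risk difference of A on Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        has_l = rng.random() < 0.4                      # confounder prevalence
        exposed = rng.random() < (0.6 if has_l else 0.2)  # L raises exposure probability
        baseline = 0.30 if has_l else 0.10               # L raises outcome risk
        outcome = rng.random() < baseline + (risk_difference if exposed else 0.0)
        rows.append((int(has_l), int(exposed), int(outcome)))
    return rows


def _risk(rows, a, level=None):
    ys = [y for lv, av, y in rows if av == a and (level is None or lv == level)]
    return sum(ys) / len(ys)


def naive_rd(rows):
    """Crude risk difference, ignoring L."""
    return _risk(rows, 1) - _risk(rows, 0)


def standardized_rd(rows):
    """Risk difference standardized over the distribution of L (g-formula)."""
    n = len(rows)
    total = 0.0
    for level in (0, 1):
        p_level = sum(1 for row in rows if row[0] == level) / n
        total += p_level * (_risk(rows, 1, level) - _risk(rows, 0, level))
    return total
```

With these settings the crude risk difference is about 0.18 in expectation, nearly double the true 0.10, which is the kind of gap the benchmark is meant to expose.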
Cheminformatics · Statistical Method
Bias-Corrected MI for Molecular Features
Idea

Mutual information between molecular descriptors and biological activity. The naive (plug-in) MI estimator is severely biased for small datasets, so the project implements a bias correction. R + Python packages. No chemistry required: it stays in the statistical/information-theoretic lane.

R · Python · Mutual Information · Journal of Cheminformatics
Deliverables
Bias-corrected MI estimator (R + Python)
Paper — Journal of Cheminformatics
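The bias is easy to see with discrete data: for independent X and Y with m_X and m_Y levels, the plug-in MI estimate is inflated by roughly (m_X − 1)(m_Y − 1)/(2N) nats. One standard remedy, shown here as an assumed choice because the project's correction method is not specified, is the Miller-Madow correction, which adds (m − 1)/(2N) to each plug-in entropy, with m the number of occupied bins. Continuous descriptors would need to be binned first:

```python
import math
from collections import Counter


def _entropy(counts: Counter, n: int, miller_madow: bool) -> float:
    """Plug-in entropy in nats, optionally plus the Miller-Madow term (m - 1) / (2n)."""
    h = -sum(c / n * math.log(c / n) for c in counts.values())
    if miller_madow:
        h += (len(counts) - 1) / (2 * n)  # a Counter holds only occupied bins
    return h


def mutual_information(x, y, correct: bool = True) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences, in nats."""
    n = len(x)
    return (_entropy(Counter(x), n, correct) + _entropy(Counter(y), n, correct)
            - _entropy(Counter(zip(x, y)), n, correct))
```

Applied to MI, the correction works out to (m_X + m_Y − m_XY − 1)/(2N), which is negative whenever the joint table has more occupied cells than m_X + m_Y − 1, pulling the inflated plug-in value back toward zero.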
Drug Discovery · Optimization Tool
Maximum Entropy Library Design
Idea

Given N compounds, select the subset that maximizes entropy of the property distribution. Convex optimization over molecular descriptor space. Streamlit app + Python package. Target: biotech startups, CROs, academic drug discovery groups.

Python · Convex Optimization · Streamlit · Drug Discovery
Deliverables
Python package — convex optimization over descriptor space
Streamlit interface
Distribution strategy (build after research identity is established)
Medicine · Data Quality
FAERS Data Quality Framework
Idea

Statistical framework for quantifying and correcting data quality issues in spontaneous adverse event (AE) reporting: duplicate detection, inconsistent drug naming, reporting bias, and Weber effect correction. Real gap: no comprehensive statistical QA framework for FAERS exists.

FAERS · R · Signal Detection · FDA
Deliverables
Statistical QA framework for FAERS
Weber effect correction implementation
Paper — Drug Safety or Pharmacoepidemiology and Drug Safety
Connection
Upstream data quality layer for pvbench — natural to build together.
Information Theory · Bioinformatics
Information Theory Causal Discovery for Biological Networks
Idea

Conditional mutual information tests for causal graph learning. More powerful than correlation-based tests for nonlinear biological dependencies. Proper finite-sample corrections and uncertainty quantification for inferred edges. R package.

R · Causal Discovery · CMI · Bioinformatics
Deliverables
R package — CMI tests with finite sample corrections
Paper — Bioinformatics or PLOS Computational Biology
Note
Do not write prematurely: Chapters 2, 7, and 11 of Cover & Thomas are a necessary foundation.
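The basic test is small: estimate I(X;Y|Z) from plug-in counts and compare it against a null built by permuting X within each stratum of Z, which preserves the X-Z and Y-Z dependence while breaking any X-Y link given Z. A sketch for discrete variables with a single conditioning variable; it deliberately omits the finite-sample corrections the package is meant to provide, and the function names are hypothetical:

```python
import math
import random
from collections import Counter, defaultdict


def cmi(x, y, z) -> float:
    """Plug-in I(X;Y|Z) in nats: sum p(x,y,z) log[p(z) p(x,y,z) / (p(x,z) p(y,z))]."""
    n = len(x)
    c_xyz, c_xz, c_yz, c_z = Counter(zip(x, y, z)), Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    return sum(c / n * math.log(c * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)]))
               for (xi, yi, zi), c in c_xyz.items())


def cmi_permutation_test(x, y, z, n_perm: int = 500, seed: int = 0):
    """Return (observed CMI, permutation p-value) for H0: X independent of Y given Z."""
    rng = random.Random(seed)
    observed = cmi(x, y, z)
    strata = defaultdict(list)
    for i, zi in enumerate(z):
        strata[zi].append(i)
    exceed = 0
    for _ in range(n_perm):
        xp = list(x)
        for idx in strata.values():
            vals = [x[i] for i in idx]
            rng.shuffle(vals)  # shuffle X only among rows sharing the same Z
            for i, v in zip(idx, vals):
                xp[i] = v
        if cmi(xp, y, z) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

The +1 in numerator and denominator keeps the permutation p-value valid (never exactly zero) for a finite number of permutations.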
NLP · Knowledge Tool
Lemma
Idea

Terminology extraction for conversations. Takes a text corpus, extracts domain-specific terms, generates definitions classified by field type, and maps concept relationships as an interactive knowledge graph.

NLP · spaCy · Knowledge Graph · D3
Deliverables
Knowledge graph visualization designed
Extraction pipeline
Definition generation + field classification
Web interface
NLP · Research Tool
Paper Comparison Venn
Idea

Takes two paper abstracts or PDFs, extracts claims, categorizes as unique or shared. Renders a Venn where visual design communicates corroboration vs contradiction. Claims drift spatially — font weight encodes evidence strength.

NLP · D3 · Typographic UI
Deliverables
Claim extraction pipeline
Shared vs unique classification
Typographic Venn renderer
Cloud Infrastructure · Mobile
Noether
Idea

Plain English → cloud infrastructure from your phone. Build, change, and deploy entire AWS/GCP infrastructure from a mobile device. Dry-run preview before any action. Sandbox before production. Project memory across sessions.

Mobile · AWS · Terraform · LLM
Deliverables
NL → infrastructure intent parsing
Dry-run preview layer (terraform plan equivalent)
Sandbox environment before production
Mobile interface
Text Generation · Consumer Tool · Gumroad
Netlog Generator
Complete

Automated network log generator for structured log output. Sold on Gumroad as a standalone tool. Freelance product built and distributed independently.

Python · Log Generation · Gumroad
Deliverables
Generator script — structured netlog output
Gumroad listing — live and available
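A structured generator of this kind can be small. A sketch that emits JSON Lines records; the field schema (timestamp, private source and destination IPs, port, protocol, byte count, allow/deny action) and all distributions are hypothetical and are not the schema of the Gumroad product:

```python
import ipaddress
import json
import random
from datetime import datetime, timedelta, timezone
from typing import Iterator, Optional

PORT_PROTOCOL = {22: "TCP", 53: "UDP", 80: "TCP", 123: "UDP", 443: "TCP", 3389: "TCP"}
PRIVATE_BASE = int(ipaddress.IPv4Address("10.0.0.0"))


def _private_ip(rng: random.Random) -> str:
    # Uniform address inside 10.0.0.0/8.
    return str(ipaddress.IPv4Address(PRIVATE_BASE + rng.randrange(2 ** 24)))


def generate_netlog(n: int, seed: int = 0, start: Optional[datetime] = None) -> Iterator[str]:
    """Yield n JSON-encoded log records with strictly increasing timestamps."""
    rng = random.Random(seed)
    ts = start or datetime(2024, 1, 1, tzinfo=timezone.utc)
    for _ in range(n):
        ts += timedelta(milliseconds=rng.randint(1, 5000))
        port = rng.choice(list(PORT_PROTOCOL))
        yield json.dumps({
            "timestamp": ts.isoformat(),
            "src_ip": _private_ip(rng),
            "dst_ip": _private_ip(rng),
            "dst_port": port,
            "protocol": PORT_PROTOCOL[port],
            "bytes": rng.randint(40, 1_500_000),
            "action": rng.choices(["ALLOW", "DENY"], weights=[9, 1])[0],
        })
```

Seeding a private `random.Random` instance makes every run reproducible, which matters when generated logs are used as test fixtures; `"\n".join(generate_netlog(1000))` gives a ready-to-write JSON Lines file.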
nlp & automation tools
NLP · Data Quality · Freelance Tool
Fuzzy Matcher
Complete

Fuzzy string matching tool for deduplication and entity resolution. Built for messy real-world data — names, addresses, drug names. Freelance deliverable.

Python · FuzzyWuzzy · NLP · Entity Resolution
Deliverables
Matching pipeline — configurable threshold, multiple algorithms
Delivered to client
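The core loop of a threshold-based matcher can be shown with the standard library's `difflib.SequenceMatcher` standing in for FuzzyWuzzy: normalize both strings, score them with a plain ratio and a token-sorted ratio (so word order does not matter), and accept the best candidate only above a configurable threshold. The 0-100 scale mirrors FuzzyWuzzy's, but scores will not match it exactly, and the default of 85 is an arbitrary illustration:

```python
import re
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    s = re.sub(r"[^a-z0-9 ]+", " ", s.lower())
    return " ".join(s.split())


def token_sort(s: str) -> str:
    return " ".join(sorted(normalize(s).split()))


def similarity(a: str, b: str) -> float:
    """Best of the plain and token-sorted ratios, scaled to 0-100."""
    plain = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    sorted_tokens = SequenceMatcher(None, token_sort(a), token_sort(b)).ratio()
    return 100 * max(plain, sorted_tokens)


def match(queries, candidates, threshold: float = 85.0):
    """Map each query to (best candidate or None, score)."""
    results = {}
    for q in queries:
        score, best = max(((similarity(q, c), c) for c in candidates), key=lambda t: t[0])
        results[q] = (best if score >= threshold else None, score)
    return results
```

Returning the score alongside a `None` match lets near misses be routed to manual review instead of being silently dropped.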
Finance · Automation · Freelance Tool
Bank Reconciler
Complete

Automated bank statement reconciliation tool. Matches transactions across accounts or statements, flags discrepancies, outputs reconciliation report. Freelance deliverable.

Python · Pandas · Finance Automation
Deliverables
Reconciliation pipeline — transaction matching, discrepancy flagging
Report output — reconciliation summary
Delivered to client
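The matching step can be sketched as a greedy one-to-one pass: each ledger entry claims the closest-dated unclaimed statement line with the same amount inside a date tolerance, and whatever is left on either side becomes a discrepancy. The field names, the three-day tolerance, and the greedy strategy are assumptions for illustration; amounts use `Decimal` so equality is exact:

```python
from datetime import date
from decimal import Decimal


def reconcile(ledger: list[dict], statement: list[dict], tolerance_days: int = 3) -> dict:
    """Match on exact amount within a date window; report unmatched items on each side."""
    open_lines = list(statement)
    matched, ledger_only = [], []
    for entry in ledger:
        candidates = [s for s in open_lines
                      if s["amount"] == entry["amount"]
                      and abs((s["date"] - entry["date"]).days) <= tolerance_days]
        if candidates:
            best = min(candidates, key=lambda s: abs((s["date"] - entry["date"]).days))
            open_lines.remove(best)  # each statement line can be claimed only once
            matched.append((entry, best))
        else:
            ledger_only.append(entry)
    return {"matched": matched, "ledger_only": ledger_only, "statement_only": open_lines}


if __name__ == "__main__":
    ledger = [{"date": date(2024, 3, 1), "amount": Decimal("120.00")},
              {"date": date(2024, 3, 5), "amount": Decimal("45.10")}]
    statement = [{"date": date(2024, 3, 2), "amount": Decimal("120.00")},
                 {"date": date(2024, 3, 9), "amount": Decimal("9.99")}]
    report = reconcile(ledger, statement)
    print({k: len(v) for k, v in report.items()})
```

Greedy matching is simple but order-dependent; with many same-amount transactions, an optimal assignment on date distance would be the more robust choice.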
NLP · Pattern Detection
Motif
Idea

Recurring pattern and motif detection in text corpora. Identifies structural and semantic patterns across documents — useful for research integrity, stylometric analysis, and content auditing.

NLP · Pattern Detection · Python
Deliverables
Pattern extraction pipeline
Semantic motif clustering
Visualization + report
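A surface-level starting point for the pattern-extraction step is to index word n-grams by the documents they occur in and keep those that recur across several documents. The sketch is lexical only: the semantic motif clustering would need embeddings and is not attempted here, and the function name and defaults are hypothetical:

```python
import re
from collections import defaultdict


def recurring_ngrams(docs: list[str], n: int = 3, min_docs: int = 2) -> dict[str, list[int]]:
    """Word n-grams found in at least `min_docs` distinct documents, most widespread first."""
    seen_in = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = re.findall(r"[a-z0-9']+", text.lower())
        for i in range(len(words) - n + 1):
            seen_in[tuple(words[i:i + n])].add(doc_id)
    motifs = {" ".join(g): sorted(ids) for g, ids in seen_in.items() if len(ids) >= min_docs}
    # Sort by document count (descending), then alphabetically for stable output.
    return dict(sorted(motifs.items(), key=lambda kv: (-len(kv[1]), kv[0])))
```

Counting documents rather than raw occurrences keeps one repetitive document from dominating, which suits research-integrity and stylometric checks where cross-document reuse is the signal.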