RAG AI Agent for Multidimensional Research

Personal research AI assistant built on a 145-polysaccharide, 144-parameter extremophile database that answers domain questions, surfaces patterns, and automates up to 95% of manual data mining for cryobiology researchers.

About the project

This project transformed a high-dimensional, literature-derived polysaccharide database into an interactive AI research assistant for cryobiology and biopolymer science. By encoding 20,000+ structured data points (145 entries × 144 parameters) into a queryable knowledge base, the assistant enables rapid hypothesis testing, pattern discovery, and evidence retrieval around cryoprotective mechanisms that would otherwise require hours of manual paper mining. The tool acts as a domain-aware copilot for researchers, narrowing the gap between raw multidimensional data and actionable scientific insight.

Date:

Jan 30, 2025

Client:

Myself

Services:

Project Details

The core of the assistant is a curated dataset of extremophilic microbial polysaccharides, organized into dimensions that capture microorganism identity, growth and EPS production conditions, composition, structure, macromolecular fractions, physicochemical properties, biological functions, and cryoprotection outcomes. A Python notebook preprocesses the Excel dataset, selects the relevant feature blocks, and encodes categorical and numerical variables for downstream modelling and semantic querying. On top of this, I designed an interaction layer where researchers can ask how habitat, temperature, salinity, or charge distribution relate to observed cryoprotective performance, using multidimensional analysis and visual outputs to surface non-trivial structure–function relationships. The result is a personal AI assistant that replaces manual spreadsheet filtering and cross-referencing with guided analytical prompts, plots, and ranked insights.
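As a rough illustration of the encoding step described above, the sketch below one-hot encodes categorical columns and z-scores numerical ones with pandas. The column names, the sample rows, and the `encode_block` helper are all hypothetical; the actual notebook's schema and encoding choices are not shown in this write-up and may differ.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the real column names and values
# are not given in the project description.
raw = pd.DataFrame({
    "organism": ["Strain A", "Strain B", "Strain C", "Strain D"],
    "habitat": ["sea ice", "hot spring", "sea ice", "salt lake"],
    "growth_temp_c": [4.0, 65.0, 10.0, 37.0],
    "salinity_pct": [3.5, 0.5, 3.0, 20.0],
    "charge": ["anionic", "neutral", "anionic", "anionic"],
})


def encode_block(df, categorical, numerical):
    """Return one feature block: one-hot categoricals plus z-scored numericals."""
    # One indicator column per category level, e.g. "habitat_sea ice".
    cats = pd.get_dummies(df[categorical], columns=categorical, dtype=float)
    # Standardize each numerical column to zero mean and unit variance.
    nums = df[numerical].astype(float)
    nums = (nums - nums.mean()) / nums.std(ddof=0)
    return pd.concat([cats, nums], axis=1)


features = encode_block(
    raw,
    categorical=["habitat", "charge"],
    numerical=["growth_temp_c", "salinity_pct"],
)
print(features.round(2))
```

Keeping each feature block behind one function like this makes it easy to encode, say, the growth-condition block and the composition block separately before combining them for a query.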

“Instead of spending days cross-checking papers and spreadsheets, I can now ask targeted questions to the assistant and get structured, evidence-backed answers in seconds. This has fundamentally changed how we approach cryoprotective polymer design. We've saved hundreds of hours of refining experimental parameters.”

Filomena Freitas

Principal Investigator, BIOENG

Things I Did

I compiled and structured the underlying extremophile polysaccharide database by extracting data from 300+ research papers and organizing 145 entries into 144 dimensions spanning identity, growth conditions, composition, structure, functional properties, and cryoprotection outcomes. I engineered the data model and preprocessing pipeline in Python (Pandas, NumPy, Plotly, Seaborn) to support multidimensional queries and visual analytics. I implemented the logic that links experimental variables (e.g., temperature ranges, salinity, polyanionicity) to hypothesized cryoprotective outcomes, enabling the assistant to surface mechanistic patterns consistent with the “natural selection” hypothesis described in the research. I designed the interactive dashboard interface and the workflows researchers use to explore the database, generate publication-ready figures, and dramatically reduce the manual burden of data mining and meta-analysis. This saved over 2,000 hours over a 4-year period.
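One simple way to link experimental variables to a cryoprotection outcome is to rank them by Spearman rank correlation. The sketch below is an assumption about how such a ranking could look, not the project's actual logic: the column names (including `uronic_acid_pct` as a stand-in for polyanionicity), the values, and the `rank_by_association` helper are hypothetical and for illustration only.

```python
import pandas as pd

# Hypothetical values for illustration only; not data from the actual database.
df = pd.DataFrame({
    "growth_temp_c": [4, 10, 15, 37, 65],
    "salinity_pct": [3.5, 3.0, 8.0, 20.0, 0.5],
    "uronic_acid_pct": [30, 25, 22, 10, 5],
    "post_thaw_viability_pct": [85, 80, 70, 40, 20],
})


def rank_by_association(data, outcome):
    """Rank variables by the absolute Spearman correlation with the outcome."""
    # Spearman's rho equals Pearson's r computed on the ranks of the data.
    rho = data.rank().corr()[outcome].drop(outcome)
    # Strongest associations first, keeping the sign of each coefficient.
    order = rho.abs().sort_values(ascending=False).index
    return rho.reindex(order)


ranking = rank_by_association(df, "post_thaw_viability_pct")
print(ranking.round(2))
```

A rank-based measure is used here because it tolerates monotonic but non-linear relationships, which seems a reasonable default for a small, heterogeneous literature-derived dataset. Correlation alone does not establish mechanism, so a ranking like this is best read as a list of candidates for closer inspection.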
