RAG AI Agent for Multidimensional Research
About the project
Date:
Jan 30, 2025
Client:
Myself
Services:
Project Details
The core of the assistant is a curated dataset of extremophilic microbial polysaccharides, organized into dimensions covering microorganism identity, growth and EPS production conditions, composition, structure, macromolecular fractions, physicochemical properties, biological functions, and cryoprotection outcomes. The notebook preprocesses the Excel dataset, selects the relevant feature blocks, and encodes categorical and numerical variables for downstream modelling and semantic querying. On top of this, I designed an interaction layer where researchers can ask how habitat, temperature, salinity, or charge distribution relate to observed cryoprotective performance, using multidimensional analysis and visual outputs to surface non-trivial structure–function relationships. The result is a personal AI assistant that replaces manual spreadsheet filtering and cross-referencing with guided analytical prompts, plots, and ranked insights.
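The preprocessing step described above (selecting a feature block, then encoding categorical and numerical variables) could look roughly like the sketch below. It is a minimal illustration, not the project's actual notebook: the column names, the values, the FEATURE_BLOCK layout, and the encode_block helper are all hypothetical, and one-hot encoding plus min-max scaling are assumed here as one common choice of encoding.

```python
import pandas as pd

# Hypothetical columns and made-up values, for illustration only.
records = pd.DataFrame(
    {
        "habitat": ["deep sea", "sea ice", "hot spring", "sea ice"],
        "growth_temp_c": [4.0, -2.0, 65.0, 0.0],
        "salinity_pct": [3.5, 7.0, 0.5, 6.0],
        "notes": ["a", "b", "c", "d"],  # column outside the feature block
    }
)

# Hypothetical feature block: the subset of columns kept for modelling.
FEATURE_BLOCK = {
    "categorical": ["habitat"],
    "numerical": ["growth_temp_c", "salinity_pct"],
}


def encode_block(df, block):
    """One-hot encode categorical columns and min-max scale numerical ones."""
    cat = pd.get_dummies(df[block["categorical"]], dtype=float)
    num = df[block["numerical"]].astype(float)
    # Replace a zero range with 1 so constant columns do not divide by zero.
    span = (num.max() - num.min()).replace(0.0, 1.0)
    num_scaled = (num - num.min()) / span
    return pd.concat([cat, num_scaled], axis=1)


encoded = encode_block(records, FEATURE_BLOCK)
print(encoded.round(2))
```

Keeping the block definition as plain data makes it easy to add or swap feature blocks (composition, structure, and so on) without touching the encoding logic.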
“Instead of spending days cross-checking papers and spreadsheets, I can now ask targeted questions to the assistant and get structured, evidence-backed answers in seconds. This has fundamentally changed how we approach cryoprotective polymer design. We've saved hundreds of hours of refining experimental parameters.”
Filomena Freitas
Principal Investigator, BIOENG
Things I Did
I compiled and structured the underlying extremophile polysaccharide database by scraping data from more than 300 research papers, organizing 145 entries across 144 dimensions spanning identity, growth conditions, composition, structure, functional properties, and cryoprotection outcomes. I engineered the data model and preprocessing pipeline in Python (Pandas, NumPy, Plotly, Seaborn) to support multidimensional queries and visual analytics. I implemented the logic that links experimental variables (e.g., temperature ranges, salinity, polyanionicity) to hypothesized cryoprotective outcomes, enabling the assistant to surface mechanistic patterns consistent with the “natural selection” hypothesis described in the research. I designed the interactive dashboard interface and the workflows that researchers use to explore the database, generate publication-ready figures, and dramatically reduce the manual burden of data mining and meta-analysis. In total, this saved more than 2,000 hours of work over a four-year period.
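One simple way to produce the kind of ranked insight mentioned above is to order experimental variables by how strongly they track a cryoprotection outcome. The sketch below assumes a Spearman rank correlation for this, computed as a Pearson correlation on ranks; that is an illustrative stand-in, not necessarily the logic used in the project. The column names, the rank_variables helper, and every value are made up for the example and are not taken from the database.

```python
import pandas as pd

# Made-up values for illustration; not taken from the actual database.
entries = pd.DataFrame(
    {
        "growth_temp_c": [-2.0, 4.0, 15.0, 65.0, 37.0],
        "salinity_pct": [7.0, 3.5, 6.0, 0.9, 0.5],
        "uronic_acid_pct": [30.0, 25.0, 22.0, 8.0, 5.0],
        "post_thaw_viability_pct": [85.0, 70.0, 74.0, 40.0, 35.0],
    }
)


def rank_variables(df, outcome):
    """Order variables by absolute Spearman correlation with the outcome."""
    # Pearson correlation on ranks is equivalent to Spearman correlation.
    ranks = df.rank()
    corr = ranks.corr()[outcome].drop(outcome)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


ranking = rank_variables(entries, "post_thaw_viability_pct")
print(ranking.round(2))
```

A correlation ranking like this only flags candidate relationships for a researcher to inspect; it does not establish a mechanism on its own.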



