Where messy experiments become usable AI systems.

I am Shijie (Jason) Chen, an MSCS student at Northwestern building agentic AI, evaluation tools, and research-facing software for deep thinkers, bold creators, and teams that need clarity in complex data.

Northwestern MSCS · NYU Neural Science Honors + Computer Science · AI systems, ML evaluation, and research tooling

Selected work

Projects with visible systems thinking.

Each card is structured around the engineering problem, the build, and the signal it creates for research or product work.

Agentic healthcare AI

medical-agent

A medical AI assistant workspace focused on retrieval-grounded answers, agent workflows, and repeatable offline evaluation.

Problem

Healthcare-facing AI needs grounded context and reproducible validation, not only convincing chat demos.

Build

Built a Python agent/RAG workflow with evaluation scripts and tests for inspecting retrieval and response behavior offline.

Signal

Turns a medical assistant prototype into an evidence-aware system whose behavior can be measured and improved.

Python · RAG · LangChain · Evaluation · Pytest

Full-stack recommendation system

doctor-recommendation-system

A doctor recommendation platform that combines patient-facing search, backend recommendation flows, caching, and feedback capture.

Problem

Provider discovery gets slow and difficult to reason about when external lookups and recommendations are mixed together.

Build

Built Spring Boot services, a React interface, Redis-backed provider lookup caching, and benchmark scripts for cache behavior.

Signal

Shows production-minded engineering around API latency, cache instrumentation, and performance claims kept within what the benchmarks actually measure before deployment.

Spring Boot · React · Redis · API caching · Benchmarking

Scientific visualization

alife-companion

A local dashboard for comparing ALife-Sim robot outputs, saved generations, and fitness-history traces from training runs.

Problem

ALife experiments produce dense run artifacts that are difficult to compare quickly across generations.

Build

Built a browser-based analysis surface for loading saved outputs, visualizing robot runs, and scanning fitness trajectories.

Signal

Turns simulation output into a readable research workflow instead of a folder of disconnected artifacts.

Python · HTML · CSS · Visualization · ALife-Sim

LLM evaluation research

thinkingVSperformance

A team project studying how reasoning length and model size affect GSM8K and DROP performance under controlled prompting.

Problem

Reasoning traces are often treated as universally helpful, but benchmark gains vary by task and model size.

Build

Designed controlled prompting experiments, evaluation scripts, and analysis notebooks for GSM8K and DROP.

Signal

Frames model behavior as a measurable systems question instead of a prompt anecdote.

Python · Jupyter · Transformers · GSM8K · DROP

NLP robustness

ner_model_eva

An evaluation workspace for testing Named Entity Recognition models across clean and noisy text domains.

Problem

NER systems can look strong on clean examples while failing when text becomes noisy or domain-specific.

Build

Compared traditional (CRF) and transformer-based (BERT) pipelines, with evaluation utilities for inspecting robustness.

Signal

Makes model reliability visible before an NLP pipeline reaches production-facing data.

Python · NER · BERT · CRF · Robustness

Java application

BankSystem

A desktop banking prototype with customer registration, login, transfers, statements, expense tracking, and admin flows.

Problem

A banking interface needs clear role flows, transactional actions, and predictable navigation in a compact app.

Build

Implemented Java Swing screens for core customer operations, admin views, and account activity tracking.

Signal

Shows product thinking beyond scripts: stateful UI, role-based flows, and user-facing application structure.

Java · Swing · JFrame · Desktop UI · Role flows

Technical stack

Built around reliable systems and measurable models.

The point is not just shipping code. It is building interfaces, services, and evaluation loops where behavior can be inspected and improved.

Agentic AI systems

Retrieval, orchestration, and interfaces that make model behavior inspectable.

Evaluation tooling

Scripts, notebooks, and dashboards for turning experiments into measurable signals.

Product engineering

Interfaces and backend flows that stay understandable under real user actions.

Neuroscience data

Scientific context for models, analysis surfaces, and research-facing software.

Python · Java · C/C++ · TypeScript · FastAPI · React · Next.js · Node.js · Spring · LangChain · OpenAI API · RAG · AWS · Docker · Kubernetes · Redis · MySQL · MongoDB · FAISS · ChromaDB · PyTorch · TensorFlow

Contact

Let's build intelligent software that is clear, useful, and measurable.

Email: shijie.jason.chen@gmail.com · Evanston, IL