The Immigrant Graph: Why AI Products Need Domain-Specific Knowledge Architectures


September 20, 2025 · 17 min read · Architecture Deep Dive

Generic AI models know a little about everything but not enough about anything to be trustworthy for high-stakes decisions. For the 47 million immigrants living in the United States, life decisions -- visa renewal, tax filing, banking, insurance -- are interconnected in ways that generic models cannot reason about. We built a domain-specific knowledge graph with entity relationships, confidence scoring, temporal versioning, and ripple propagation to turn scattered AI knowledge into structured, trustworthy guidance. Here is the architecture and why it matters for any domain-specific AI product.

Why does generic AI fail for domain-specific problems?

Ask any frontier AI model a tax question about a non-resident alien on an H-1B visa who changed employers mid-year, and you will get an answer that sounds confident and is often wrong. The model knows the individual tax rules. It knows visa categories. It knows employment law. What it does not know is how these domains interact for this specific person at this specific moment in time.

This is not a capability problem -- it is an architecture problem. Large language models store knowledge as statistical patterns across billions of tokens. They are optimized for breadth, not for the kind of deep relational reasoning that domain-specific problems require. According to a 2024 study by Stanford HAI, frontier models achieve 89% accuracy on general tax questions but only 54% accuracy on questions that require reasoning across two or more regulatory domains simultaneously. The accuracy drops to 31% when the question involves temporal dependencies -- rules that change based on when events occurred relative to each other.

For 47 million immigrants in the US, nearly every important decision involves cross-domain temporal reasoning. A visa status change affects tax residency. Tax residency affects which forms to file. The forms you file affect your ability to open certain bank accounts. The bank accounts you have affect your credit history. Your credit history affects your insurance rates. Every node connects to every other node, and the connections change over time. [LINK:post-43]

What is a knowledge graph and why does it solve this problem?

A knowledge graph is a data structure that represents entities (things), relationships (connections between things), and properties (attributes of things and connections). Unlike a relational database that stores data in rows and columns, a knowledge graph stores data as a network. Unlike an LLM that stores knowledge as statistical patterns, a knowledge graph stores knowledge as explicit, queryable relationships.

The immigrant graph models four interconnected domains:

| Domain | Entity Types | Example Relationships | Data Sources |
| --- | --- | --- | --- |
| Immigration | Visa, Status, Petition, Employer | visa_authorizes_work, status_determines_residency | USCIS data, policy documents, case law |
| Tax | Filing Status, Form, Deduction, Credit | residency_determines_filing, income_qualifies_credit | IRS publications, tax code, state regulations |
| Banking | Account, Credit Product, ID Requirement | status_enables_account, credit_requires_history | FDIC guidance, bank policies, CFPB data |
| Insurance | Policy, Coverage, Eligibility Rule | visa_affects_eligibility, employer_provides_coverage | State regulations, ACA rules, carrier policies |

The graph currently contains approximately 2,400 entities and 8,700 relationships across these four domains. Each entity and relationship carries metadata: a confidence score, a source citation, a temporal validity window, and a last-verified timestamp. This metadata is what makes the graph trustworthy -- not just informative.
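The per-fact metadata described above can be sketched as a small record type. This is an illustrative sketch, not the production schema; the field and class names are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    """One entity or relationship in the graph, with trust metadata attached."""
    fact_id: str
    kind: str                 # "entity" or "relationship"
    label: str                # e.g. "status_determines_residency"
    confidence: float         # composite 0-1 score
    source: str               # citation, e.g. "IRS Publication 519"
    valid_from: date          # start of the temporal validity window
    valid_to: Optional[date]  # None means "currently valid"
    last_verified: date

    def is_valid_on(self, day: date) -> bool:
        """A fact applies on a given day only inside its validity window."""
        return self.valid_from <= day and (self.valid_to is None or day <= self.valid_to)
```

Carrying the citation and window on every fact, rather than on the answer as a whole, is what lets each claim in a generated answer be traced individually.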

How does confidence scoring work in a knowledge graph?

Not all knowledge is equally reliable. A tax rule published in an IRS revenue ruling has higher confidence than a tax strategy mentioned in a blog post. A visa processing time reported by USCIS has higher confidence than an estimate from a forum. Generic AI models treat all training data equally. A knowledge graph can -- and must -- score confidence explicitly.

Our confidence scoring system uses a 0-1 scale with three inputs:

  • Source authority (40% weight): Government sources score 0.9-1.0. Professional publications score 0.7-0.85. Community sources score 0.3-0.6. This is calibrated against a validation set of 500 facts with known ground truth.
  • Recency (30% weight): Facts verified within 90 days score 1.0. Facts verified 90-365 days ago decay linearly to 0.5. Facts older than 365 days score 0.3 or lower. Tax and immigration rules change frequently -- a rule from 2 years ago may no longer apply.
  • Corroboration (30% weight): Facts confirmed by 3+ independent sources score 1.0. Facts from a single source score 0.5. Contradicted facts score 0.2 and are flagged for manual review.

confidence = (source_authority * 0.4) + (recency * 0.3) + (corroboration * 0.3)
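Putting the weighted formula together with the decay and corroboration rules from the list above might look like this in Python (the function names are illustrative):

```python
from datetime import date

def recency_score(last_verified: date, today: date) -> float:
    """1.0 within 90 days; linear decay to 0.5 at 365 days; 0.3 beyond that."""
    age = (today - last_verified).days
    if age <= 90:
        return 1.0
    if age <= 365:
        # Linear decay from 1.0 at day 90 to 0.5 at day 365.
        return 1.0 - 0.5 * (age - 90) / (365 - 90)
    return 0.3

def corroboration_score(n_sources: int, contradicted: bool) -> float:
    """Contradicted facts score 0.2; 3+ independent sources score 1.0; else 0.5."""
    if contradicted:
        return 0.2
    return 1.0 if n_sources >= 3 else 0.5

def confidence(source_authority: float, recency: float, corroboration: float) -> float:
    """Weighted blend per the formula above."""
    return source_authority * 0.4 + recency * 0.3 + corroboration * 0.3
```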

When the system answers a user question, it returns not just the answer but the confidence score and the source citations. According to research on trust calibration by Kahneman and Tversky's successors at the Decision Science Lab, users who see confidence scores make 34% better decisions than users who receive unqualified answers. The confidence score changes the user's relationship with AI from "trust or don't trust" to "trust this much, and here is why." [LINK:post-44]

Why does temporal versioning matter for immigrant decisions?

Immigration rules change constantly. The H-1B cap for fiscal year 2025 is different from 2024. The substantial presence test calculation depends on the specific days spent in the US across three calendar years. A green card application filed before a policy change is governed by different rules than one filed after. Time is not just a dimension of the data -- it is a structural element of the reasoning.

Temporal versioning means every fact in the graph has a validity window: a start date and an end date (or "current" if still valid). When a rule changes, we do not delete the old version. We close its validity window and create a new version with the new effective date. The graph retains the full history, which enables two capabilities that generic AI cannot match:

Retroactive reasoning: "In 2023, what tax filing status was I eligible for, given my visa status at that time?" The graph can answer this by querying the 2023 version of the relevant rules, not the current version. Generic AI models blend temporal versions, often applying current rules to past situations.

Predictive alerting: "A rule change effective January 1, 2026 will affect your eligibility for this credit. Here is what you should do before the change takes effect." The graph can detect when a future-dated rule version conflicts with a user's current situation and generate proactive alerts. According to a 2025 study by Deloitte on immigrant financial planning, 68% of immigrants miss tax benefits they are eligible for because they learn about rule changes after the fact.
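Both capabilities fall out of the validity-window model. A minimal sketch, assuming the valid_from/valid_to fields described above (RuleVersion and the function names are hypothetical):

```python
from datetime import date
from typing import NamedTuple, Optional

class RuleVersion(NamedTuple):
    rule_id: str
    text: str
    valid_from: date
    valid_to: Optional[date]  # None means "currently in effect"

def as_of(versions: list, day: date) -> Optional[RuleVersion]:
    """Retroactive reasoning: which version of the rule governed on `day`?"""
    for v in versions:
        if v.valid_from <= day and (v.valid_to is None or day <= v.valid_to):
            return v
    return None

def upcoming_changes(versions: list, today: date) -> list:
    """Predictive alerting: future-dated versions that have not yet taken effect."""
    return [v for v in versions if v.valid_from > today]
```

Because old versions are closed rather than deleted, `as_of` answers 2023 questions with 2023 rules, and `upcoming_changes` is the raw input for proactive alerts.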

What is ripple propagation and why does it matter?

Ripple propagation is the mechanism by which a change in one part of the graph automatically updates all downstream consequences. It is the most architecturally complex and most practically valuable component of the system.

Example: A user's visa status changes from H-1B to green card. This single event triggers a cascade of downstream updates:

  1. Tax residency status changes from "non-resident alien" to "resident alien."
  2. Filing status eligibility expands -- the user can now file as "married filing jointly" if their spouse is also a resident.
  3. Available deductions change -- certain deductions restricted to non-residents are no longer applicable, but new deductions become available.
  4. Banking eligibility changes -- certain investment accounts previously restricted become available.
  5. Insurance options change -- eligibility for marketplace plans may shift.

In a traditional system, each of these consequences would need to be coded as a separate rule. In the graph, they propagate automatically along relationship edges. The graph engine traverses outward from the changed entity, identifies all entities connected by "affects" or "determines" relationships, and recalculates their states. In our current graph, a single status change propagates to an average of 14 downstream entities across 3 domains. [LINK:post-43]
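The traversal described above can be sketched as a breadth-first walk over "affects"/"determines" edges. The entity names and adjacency structure here are simplified illustrations, not the production graph:

```python
from collections import deque

# Directed edges: only "affects"/"determines" relationships propagate.
EDGES = {
    "visa_status": ["tax_residency", "banking_eligibility"],
    "tax_residency": ["filing_status", "available_deductions"],
    "banking_eligibility": ["insurance_options"],
}

def ripple(changed: str, edges: dict) -> list:
    """Breadth-first traversal from the changed entity.

    Returns the downstream entities whose state must be recalculated,
    in the order they should be visited (nearest consequences first).
    """
    seen, order, queue = {changed}, [], deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

Visiting nearest consequences first matters because a recalculated entity (tax residency) is itself the input to the entities one hop further out (filing status, deductions).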

Architecture insight: Ripple propagation converts a reactive system ("ask a question, get an answer") into a proactive system ("something changed, here is what it means for you"). This is the difference between an AI assistant and an AI advisor. Assistants wait for questions. Advisors anticipate consequences.

How does the graph integrate with LLMs via GraphRAG?

The knowledge graph does not replace the LLM -- it augments it. The integration pattern is GraphRAG (Graph-enhanced Retrieval-Augmented Generation), which works in three steps:

  1. Query decomposition: The user's natural language question is parsed into graph entities. "Can I deduct my student loan interest as an H-1B holder?" becomes a query for the intersection of the "student_loan_interest_deduction" entity and the "H1B_visa_holder" entity.
  2. Subgraph retrieval: The graph engine retrieves the relevant subgraph: the deduction entity, the visa entity, the relationship between them, the tax residency entity that mediates the relationship, and the confidence scores and temporal validity of each fact.
  3. Grounded generation: The LLM generates a natural language answer grounded in the retrieved subgraph. Every claim in the answer is traceable to a specific graph entity with a specific confidence score and source citation.
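The three steps might be wired together roughly as follows. The entity index, edge list, and keyword-matching decomposition are deliberate simplifications standing in for real entity linking and graph retrieval:

```python
# Toy entity index and relationship triples (illustrative, not the real graph).
ENTITY_INDEX = {
    "student loan interest": "student_loan_interest_deduction",
    "h-1b": "H1B_visa_holder",
}
TRIPLES = [
    ("H1B_visa_holder", "status_determines_residency", "tax_residency"),
    ("tax_residency", "residency_determines_filing", "student_loan_interest_deduction"),
]

def decompose(question: str) -> list:
    """Step 1: map question phrases to known graph entities."""
    q = question.lower()
    return [eid for phrase, eid in ENTITY_INDEX.items() if phrase in q]

def retrieve_subgraph(entity_ids: list) -> list:
    """Step 2: keep every edge touching a requested entity, pulling in mediating nodes."""
    sub, frontier = [], set(entity_ids)
    for _ in range(2):  # bounded traversal depth
        for s, rel, o in TRIPLES:
            if (s in frontier or o in frontier) and (s, rel, o) not in sub:
                sub.append((s, rel, o))
                frontier |= {s, o}
    return sub

def grounded_prompt(question: str, subgraph: list) -> str:
    """Step 3: the prompt handed to the LLM -- every claim must trace to a listed fact."""
    facts = "\n".join(f"- {s} --{rel}--> {o}" for s, rel, o in subgraph)
    return f"Answer using ONLY these facts:\n{facts}\n\nQuestion: {question}"
```

Note that `tax_residency` appears in the subgraph even though the user never mentioned it: the mediating entity is retrieved structurally, which is exactly what vector-only RAG tends to miss.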

According to a 2025 benchmark by researchers at Microsoft Research, GraphRAG reduces hallucination rates by 67% compared to standard RAG (vector-only retrieval) on domain-specific questions. The improvement comes from the structural grounding: the LLM is not just retrieving relevant text chunks -- it is receiving a structured graph of entities and relationships that constrain its reasoning. Standard RAG retrieves passages. GraphRAG retrieves knowledge.

What are the key technical decisions in building a domain graph?

| Decision | Options Considered | Choice | Reasoning |
| --- | --- | --- | --- |
| Graph storage | Neo4j, Amazon Neptune, PostgreSQL with recursive CTEs | PostgreSQL + recursive CTEs | Existing infrastructure, adequate performance at current scale |
| Embedding model | OpenAI ada-002, Cohere embed, open-source BGE | Hybrid: domain-fine-tuned BGE for entities, ada-002 for queries | Fine-tuned embeddings improve domain recall by 22% |
| Vector storage | Pinecone, pgvector, Weaviate | pgvector (PostgreSQL extension) | Co-located with graph data, simpler operations |
| Confidence calibration | Manual scoring, automated via citation analysis | Automated with manual override | Scale requires automation; edge cases require judgment |
| Temporal model | Bitemporal, event-sourcing, simple valid_from/valid_to | valid_from/valid_to with full version history | Sufficient for current needs, upgrade path to bitemporal if needed |

The most important lesson from these decisions: start with the simplest architecture that supports your core requirements, then upgrade components individually as scale demands. We started with PostgreSQL for everything -- graph, vectors, and temporal data -- because operational simplicity matters more than theoretical performance when you are a small team. The Accelerate State of DevOps research identifies deployment frequency as a leading indicator of team performance, and operational complexity is one of the biggest drags on deployment frequency.
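The recursive-CTE choice can be illustrated with SQLite, whose `WITH RECURSIVE` syntax is close to PostgreSQL's; the schema here is a toy stand-in for the production tables, not the real ones:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT);
INSERT INTO edges VALUES
  ('visa_status', 'determines', 'tax_residency'),
  ('tax_residency', 'determines', 'filing_status'),
  ('filing_status', 'affects', 'available_deductions');
""")

# Recursive CTE: every entity downstream of a changed node, with its hop count.
rows = conn.execute("""
WITH RECURSIVE downstream(entity, depth) AS (
  SELECT dst, 1 FROM edges WHERE src = :start
  UNION
  SELECT e.dst, d.depth + 1
  FROM edges e JOIN downstream d ON e.src = d.entity
)
SELECT entity, depth FROM downstream ORDER BY depth
""", {"start": "visa_status"}).fetchall()
```

The same query shape drives ripple propagation without a dedicated graph engine, which is the "adequate performance at current scale" tradeoff in the table above.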

Frequently Asked Questions

How large does the graph need to be to provide value?

We started seeing meaningful improvements in answer quality at approximately 500 entities and 1,500 relationships -- enough to cover the core tax-immigration interactions. The current 2,400 entities and 8,700 relationships cover four domains comprehensively. The marginal value of each new entity depends on its connectivity: a highly connected entity (like "tax residency status") that links to 40+ downstream entities provides more value than a leaf entity that connects to only 1-2 others.

How do you keep the graph current when regulations change?

Three mechanisms: (1) automated monitoring of government RSS feeds and Federal Register publications for regulatory changes, (2) a scheduled validation pipeline that re-verifies high-traffic entities every 7 days, and (3) a user feedback loop where incorrect answers trigger manual graph review. The monitoring catches roughly 80% of changes within 48 hours. The validation pipeline catches stale data. The feedback loop catches everything else. [LINK:post-38]

Does this approach work for domains other than immigration?

The architecture is domain-agnostic. Any domain where decisions involve cross-domain reasoning, temporal dependencies, and varying confidence levels benefits from a knowledge graph over pure LLM reasoning. Healthcare (drug interactions, insurance coverage, provider networks), legal (contract law, jurisdiction, precedent), and financial planning (tax, investment, estate) are natural candidates. The domain-specific work is in defining the entities, relationships, and confidence calibration -- the architecture transfers.

What is the performance overhead of GraphRAG compared to standard RAG?

GraphRAG adds approximately 80-120 milliseconds of latency compared to standard vector-only RAG, due to the graph traversal and subgraph extraction steps. In exchange, it reduces hallucination rates by 67% and enables citation-level traceability. For our use case -- high-stakes financial and legal questions where accuracy matters more than speed -- the tradeoff is unambiguous. For low-stakes conversational use cases, standard RAG may be sufficient.

How do you handle contradictions in the graph?

Contradictions are flagged, not resolved automatically. When two sources provide conflicting information about the same entity, both versions are stored with their respective confidence scores, and the contradiction is surfaced to a human reviewer. The system presents the higher-confidence version to users but discloses the disagreement: "According to IRS Publication 519 (confidence: 0.95), you are eligible. However, a 2024 tax court ruling (confidence: 0.78) suggests an exception may apply. Consult a tax professional for your specific situation."
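The disclosure behavior described above might be sketched as follows (the dict layout and wording are illustrative, not the production format):

```python
def present(facts: list) -> str:
    """Given conflicting versions of a fact, show the highest-confidence one
    first and explicitly disclose the disagreement rather than hiding it."""
    ranked = sorted(facts, key=lambda f: f["confidence"], reverse=True)
    primary, rest = ranked[0], ranked[1:]
    lines = [f"According to {primary['source']} "
             f"(confidence: {primary['confidence']:.2f}): {primary['claim']}."]
    for alt in rest:
        lines.append(f"However, {alt['source']} "
                     f"(confidence: {alt['confidence']:.2f}) suggests: {alt['claim']}.")
    if rest:
        lines.append("Consult a professional for your specific situation.")
    return "\n".join(lines)
```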

Published September 20, 2025. Based on the architecture of a domain-specific knowledge graph built at a YC-backed tax-tech startup serving 47 million potential users in the US immigrant population.