🏆 2020 Innovation Bronze Prize  ·  1 National Patent
🧠

NLP Center Platform

Taiping Financial Technology (Shanghai) Co., Ltd. — Fortune Global 500

Sep 2019 – Mar 2022  ·  Role: NLP Platform Technical Project Manager

01
Project Overview
One-stop intelligent content service platform for a Fortune Global 500 insurance group

Problem Statement

Knowledge was fragmented across 8+ siloed systems (internal portal, office automation (OA), multiple knowledge bases, and data platforms) with no unified search or recommendation layer. Agents and staff had to switch between apps repeatedly to find information, which cost productivity. No intelligence layer existed to surface relevant content proactively.

Siloed knowledge · No smart search · Zero personalization · Manual content mgmt

Solution & Mandate

Build an enterprise-grade NLP platform — branded as "Taiping Encyclopedia" — that crawls, indexes, and semantically understands content from all internal and external sources, then surfaces it intelligently across all user touchpoints (mobile, PC portal, OA) via smart search and AI recommendation.

Unified search layer · AI recommendations · Multi-app integration · Innovation award winner
02
High-Level Architecture
6-layer platform: data sources → ingestion → storage → NLP services → application → users
USER LAYER
👔 40K Internal Staff
Staff Mobile App
🤝 400K+ Agents
Agent Sales App
👥 Customers
Customer App / WeChat
🖥️ PC Intranet
Internal Portal
↕ REST API · WebSocket · Lightweight SDK Integration
APP SERVICES
🔍 Smart Search
Intent + Full-text
✨ Recommendations
Daily Picks · Hot Rank
🎙️ Voice Input
ASR / TTS
💬 Smart Q&A
FAQ + NLU dialog
🃏 Card Display
Unified Content Card
↕ Internal Microservice Bus
NLP ENGINE
🎯 Intent Recognition
BERT-based routing
📊 BM25+ Ranking
Finance-domain tuned
🔗 Semantic Matching
Vector embeddings
🧩 CF Recommendation
Collab filtering
😊 Emotion-aware Rec
R&D innovation
↕ Data Access Layer · Permission Control · Content Penetration API
DATA STORES
🔎 ES Index
Full-text search
📐 Vector DB
Semantic embeddings
🕸️ Graph DB
Content relations
🏷️ Tag & Label DB
Content taxonomy
👁️ Behavior DB
User action logs
🔐 Auth Store
Org-level permissions
↕ Content Penetration Framework (Proprietary)
DATA SOURCES
🏢 Intranet Portal
📋 Workflow System
📚 Knowledge Bases
📈 Data Platform
🌐 External Sources
⚙️ Business Systems
Infrastructure: GPU Servers (inference) · ElasticSearch Cluster · MySQL · Redis · MinIO Object Storage · Kubernetes / Docker
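The NLP engine layer lists BM25 ranking alongside vector-based semantic matching. How the platform combined the two is not stated here, so the following is only a minimal sketch under stated assumptions: plain Okapi BM25 (not the finance-tuned variant), a linear blend controlled by a hypothetical `alpha` parameter, and toy tokens and two-dimensional vectors invented for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenised document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Blend max-normalised BM25 with cosine similarity; return doc indices, best first.

    The linear blend and `alpha` are assumptions made for this sketch.
    """
    lexical = bm25_scores(query_tokens, docs_tokens)
    top = max(lexical) or 1.0  # avoid dividing by zero when nothing matches
    blended = [
        alpha * (lex / top) + (1 - alpha) * cosine(query_vec, vec)
        for lex, vec in zip(lexical, doc_vecs)
    ]
    return sorted(range(len(docs_tokens)), key=lambda i: blended[i], reverse=True)


# Toy data, invented for illustration only.
docs = [["annuity", "policy", "claim"], ["travel", "expense", "form"], ["policy", "renewal", "notice"]]
vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(hybrid_rank(["policy", "claim"], [1.0, 0.0], docs, vecs))
```

In a deployment like the one described, the lexical score would presumably come from the ElasticSearch cluster and the vectors from the vector store; the pure-Python scorer is used here only to keep the sketch self-contained.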
03
Data Dashboard & Key Metrics
Platform-level analytics showing scale, usage, and business impact
NLP Platform Operations Dashboard · FY2021 Projections & Actuals
🔍
25,000
Daily Active Users (DAU)
↑ Primary staff portal (pilot)
~9.9M
Projected Annual Searches
↑ 4 searches + 1 rec / user / day
⏱️
33 min
Monthly Time Saved / User
↑ 33% info-retrieval time cut
💰
¥25.83M
Estimated Annual Labour Saving
↑ 21,000 person-days freed
User Base by Touchpoint
Agents
400K+
Internal Staff
40K
Staff Portal DAU
25K
Staff Portal MAU
35.6K
Data Source Coverage
Internal Portal
90%
Workflow System
75%
Knowledge Bases
100%
Data Platform
60%
Recommendation Strategy Mix
4 Types
User feature-based (33%)
Behaviour history (25%)
Similar audience CF (22%)
Content correlation (19%)
Project Resource Allocation
Internal (PM+Dev)
44 person-months
Vendor (Algo)
62 person-months
Vendor (Dev)
62 person-months
Total Budget
¥3.2M
04
Core NLP API Modules
6 microservice APIs forming the platform's intelligence layer
API Module | Category | Core Technology | Description | Consumers
/search/semantic | Search | ES + BM25 (finance-tuned) + vector similarity | Multi-source full-text + semantic search with intent routing | All 4 touchpoints
/intent/classify | NLP | BERT fine-tuned on insurance domain | Query intent recognition; routes to Q&A, search, or report lookup | Search gateway
/recommend/feed | Rec Engine | Collaborative filtering + content-based + feature correlation | Personalised content feed using 4-strategy ensemble model | Mobile apps, portal
/content/penetrate | Infra | Custom crawl framework + permission binding | Proprietary content penetration: crawls source systems without exposing raw access; permission-bound at org level | Indexing pipeline
/tags/extract | NLP | Jieba + domain lexicon + TF-IDF | Auto-extracts semantic tags and entity labels from indexed content | Indexing pipeline
/behaviour/track | Infra | Event probe + Kafka + Flink (streaming) | Real-time user behaviour ingestion across all apps; feeds the recommendation model | Rec engine, analytics
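The `/recommend/feed` module is described as a 4-strategy ensemble. A minimal sketch of one way such an ensemble could merge per-strategy candidates is shown below. It assumes each strategy returns scores already scaled to [0, 1], and it reuses the dashboard's strategy-mix percentages as blend weights. Both are assumptions made for illustration; the names `STRATEGY_WEIGHTS` and `blend_recommendations` are hypothetical, not the platform's API.

```python
from collections import defaultdict

# Hypothetical blend weights: the dashboard's strategy-mix shares reused as weights.
STRATEGY_WEIGHTS = {
    "user_features": 0.33,
    "behaviour_history": 0.25,
    "similar_audience_cf": 0.22,
    "content_correlation": 0.19,
}


def blend_recommendations(candidates_by_strategy, weights=STRATEGY_WEIGHTS, top_n=5):
    """Merge per-strategy {content_id: score in [0, 1]} maps into one ranked feed."""
    total_weight = sum(weights.values())  # the shares sum to 0.99, so renormalise
    combined = defaultdict(float)
    for strategy, candidates in candidates_by_strategy.items():
        weight = weights.get(strategy, 0.0) / total_weight
        for content_id, score in candidates.items():
            combined[content_id] += weight * score
    # Highest blended score first; ties broken by content id for a stable order.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [content_id for content_id, _ in ranked[:top_n]]


# Toy candidates, invented for illustration only.
feed = blend_recommendations(
    {
        "user_features": {"a": 1.0, "b": 0.2},
        "behaviour_history": {"b": 1.0},
        "similar_audience_cf": {"c": 1.0},
        "content_correlation": {"a": 0.5},
    },
    top_n=2,
)
print(feed)
```

A weighted sum keeps each strategy's contribution easy to inspect and to re-tune; a production ensemble might equally use a learned ranker.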
05
Integrated Systems & Data Sources
4 front-end touchpoints + 6 back-end data sources = enterprise-wide coverage

Front-End Touchpoints (Consumer Systems)

📱
Staff Mobile App
Internal staff mobile app · 35,000+ users · Primary pilot touchpoint
💼
Agent Sales App
Agent-facing mobile app · 400K+ agents · Search + business data recommendation
🛡️
Customer App
Customer-facing mobile app · Content ecosystem + policy search
🖥️
Internal Web Portal
PC intranet portal · Embedded search widget + recommendation cards

Back-End Data Sources (10+ Feeds)

📋
Workflow & Document System
Internal documents, announcements, policies, approvals
📚
Specialist Knowledge Bases
Product knowledge, compliance, training materials
📈
Internal Data Platform
Business KPIs, reporting dashboards, performance metrics
🌐
External Authoritative Sources
Regulatory bodies, industry news, insurance standards
🏢
Intranet Portal + Additional Systems
Intranet articles, HR systems, legacy knowledge repositories
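Content from these sources is described as permission-bound at org level. The sketch below illustrates one possible reading of that idea: each indexed document carries the org units allowed to see it, and results are filtered against the user's org path before display. The `IndexedDoc` type, the `allowed_orgs` field, and the slash-separated org paths are all hypothetical and not taken from the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    title: str
    allowed_orgs: frozenset  # org-unit paths bound at crawl time (hypothetical field)


def can_view(user_org_path, allowed_orgs):
    """A user may view a doc if their org unit is, or sits under, an allowed unit."""
    return any(
        user_org_path == org or user_org_path.startswith(org + "/")
        for org in allowed_orgs
    )


def filter_by_permission(docs, user_org_path):
    """Keep only the documents the given org path is allowed to see."""
    return [doc for doc in docs if can_view(user_org_path, doc.allowed_orgs)]


# Toy index, invented for illustration only.
index = [
    IndexedDoc("1", "Group-wide announcement", frozenset({"group"})),
    IndexedDoc("2", "Life claims handbook", frozenset({"group/life"})),
    IndexedDoc("3", "P&C pricing notes", frozenset({"group/pc"})),
]
visible = filter_by_permission(index, "group/life/claims")
print([doc.doc_id for doc in visible])
```

Matching on `org + "/"` rather than a bare prefix avoids treating an org unit such as `group/lifex` as a child of `group/life`.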
06
ROI & Business Value
Quantified cost savings and strategic value across all user segments

Annual Savings Calculation (Primary Portal Pilot)

Per user: (1 rec @50s saved + 4 searches @10s saved) × 22 working days ÷ 60 = 33 min/month saved. Across 25,000 DAU, that totals about 21,000 person-days/year. At ¥360K/person-year all-in cost, this gives ¥25.83M gross saving; minus ¥2.29M project cost, the net saving is ¥23.5M from the primary portal alone.
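The savings arithmetic above can be checked step by step. The sketch below uses only figures quoted in this section, plus one assumption that is not stated in the source: an 8-hour person-day for converting minutes into person-days.

```python
DAU = 25_000
SECONDS_SAVED_PER_DAY = 1 * 50 + 4 * 10  # 1 recommendation @50s + 4 searches @10s
WORKING_DAYS_PER_MONTH = 22
HOURS_PER_PERSON_DAY = 8  # assumption, not stated in the source

# Per-user saving: seconds per day -> minutes per month.
minutes_per_user_month = SECONDS_SAVED_PER_DAY * WORKING_DAYS_PER_MONTH / 60

# Whole pilot population over 12 months, converted to person-days.
minutes_per_year = DAU * minutes_per_user_month * 12
person_days_per_year = minutes_per_year / 60 / HOURS_PER_PERSON_DAY

GROSS_SAVING_M = 25.83  # ¥ million, as quoted in the text
PROJECT_COST_M = 2.29   # ¥ million, as quoted in the text
net_saving_m = GROSS_SAVING_M - PROJECT_COST_M
gross_to_cost = GROSS_SAVING_M / PROJECT_COST_M  # the ratio that yields 11.3x

print(f"{minutes_per_user_month:.0f} min saved per user per month")
print(f"{person_days_per_year:,.0f} person-days freed per year")
print(f"¥{net_saving_m:.2f}M net saving, {gross_to_cost:.1f}x gross saving / cost")
```

Under the 8-hour assumption the total comes to 20,625 person-days, which the section rounds to 21,000.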

33 min
Monthly time saved per user
33% info-retrieval cut
21K
Person-days freed annually
25K users × 33 min × 12 months
¥25.83M
Gross annual labour saving
Primary portal only
11.3×
Projected ROI (Year 1)
Gross saving / project cost
07
Project Management Approach
Full project lifecycle ownership — strategy, planning, cross-team coordination, and delivery
Q4 2019 – Q1 2020
Platform Inception & Architecture Design
Stakeholder alignment with group VP, requirements from 3 mobile app teams, architecture design, team formation (44 person-months internal staff).
Q2 2020 – Q4 2020
Core Platform Build & Innovation Prize
Content penetration framework, NLP engine, ES cluster setup. Won 2020 Innovation Bronze Prize.
Q1 2021 – Q3 2021
Pilot Launch on Primary Staff Portal
Integrated with the primary staff portal (35K users). Deployed smart search and the recommendation engine. Embedded behaviour-tracking probes. Submitted formal business requirements to the workflow and portal teams.
Q4 2021 – Q1 2022
Expansion & Handover
Rolled out recommendation engine v2 (4-strategy ensemble). Began agent sales app integration. Established operational runbooks and handed over to successor team.

Team & Governance

Internal core team: 8 engineers + PM
Vendor algo engineers: 5 FTEs (62 person-months)
Vendor dev engineers: 5 FTEs (62 person-months)
Total budget: ¥3.2M
Stakeholder groups: 5 business units
🎯
Stakeholder Management
Coordinated with group VP, 3 app teams, OA, intranet, and data platform. Delivered formal requirements to 5 related systems in Q3 2021.
⚠️
Risk Management
Identified schedule risk from third-party system dependencies; mitigated by submitting formal business requirements documents (BRDs) early and holding bi-weekly syncs. Technical risk (new algorithm territory) was mitigated via external benchmarking and A/B routing.
🔄
Agile + Milestone Hybrid
Used sprint-based internal development with quarterly milestone gates for exec reporting and budget review.
08
Technology Stack
Full-stack NLP platform built on open-source and proprietary AI components
🔍
ElasticSearch
Search & Indexing
🐍
Python
NLP + Backend
🧠
BERT / Transformers
Intent Recognition
📐
Vector DB (Milvus)
Semantic Search
🕸️
Neo4j / Graph DB
Knowledge Graph
Kafka + Flink
Event Streaming
🗄️
MySQL + Redis
Storage & Cache
🏗️
Kubernetes
Orchestration
🖥️
GPU Servers
Model Inference
✂️
Jieba + Domain Lex
Chinese NLP
📊
Spring Boot
API Gateway
🎙️
ASR / TTS
Voice Input