…CI/CD
- OpenAI streaming chat endpoint with get_events and provide_links tools
- Server actions for knowledge CRUD, reseed, import, usage metrics
- Event filtering/formatting utilities with timezone-aware LA time handling
- System prompt builder with profile-aware personalization and prefix caching
- Vector search context retrieval with retry/backoff
- Knowledge base JSON (55 entries: FAQ, tracks, judging, submission, general)
- CI/CD seed scripts for hackbot_knowledge to hackbot_docs
- Auth session extended with position, is_beginner, name fields
- Tailwind hackbot-slide-in animation keyframe
- Dependencies: ai@6, @ai-sdk/openai
Closes #441
…o hackbot-server-core
Pull request overview
Introduces the initial “HackBot” server core: a streaming chat API route backed by OpenAI + MongoDB vector search, plus supporting hackbot utilities/actions and deployment seeding.
Changes:
- Added /api/hackbot/stream streaming endpoint with get_events/provide_links tools and custom data-stream output.
- Implemented hackbot utilities (system prompt builder, event filtering/formatting, retry/backoff, embeddings) + server actions for knowledge/metrics.
- Added CI/CD seed scripts and workflow steps to embed knowledge docs into hackbot_docs; added new AI SDK dependencies.
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| tailwind.config.ts | Adds hackbot-slide-in keyframe + animation for HackBot UI. |
| scripts/hackbotSeedCI.mjs | CI seeding script to upsert knowledge docs into hackbot_docs with embeddings. |
| scripts/hackbotSeed.mjs | Local interactive seeding script for hackbot docs. |
| package.json | Adds hackbot:seed script + AI SDK dependencies. |
| package-lock.json | Locks new dependency graph for ai / @ai-sdk/openai and transitive deps. |
| auth.ts | Extends NextAuth user/session/jwt fields for HackBot personalization. |
| app/_types/hackbot.ts | Adds shared HackBot types for docs/messages/events/links. |
| app/_data/hackbot_knowledge_import.json | Adds initial knowledge base content for importing. |
| app/(api)/api/hackbot/stream/route.ts | Implements streaming HackBot endpoint + get_events and provide_links tools. |
| app/(api)/_utils/hackbot/systemPrompt.ts | Adds system prompt builder with profile/page-context personalization and caching strategy. |
| app/(api)/_utils/hackbot/retryWithBackoff.ts | Adds retry/backoff helper used by vector search embedding step. |
| app/(api)/_utils/hackbot/eventFormatting.ts | Adds LA-timezone-aware date parsing/formatting helpers. |
| app/(api)/_utils/hackbot/eventFiltering.ts | Adds profile relevance/recommendation and time-filtering helpers. |
| app/(api)/_utils/hackbot/embedText.ts | Adds embedding helper using the ai SDK + OpenAI embedding model. |
| app/(api)/_datalib/hackbot/getHackbotContext.ts | Adds vector-search context retrieval from hackbot_docs. |
| app/(api)/_actions/hackbot/saveKnowledgeDoc.ts | Adds server action to create/update knowledge docs + embeddings. |
| app/(api)/_actions/hackbot/reseedHackbot.ts | Adds server action to re-embed all knowledge docs into hackbot_docs. |
| app/(api)/_actions/hackbot/importKnowledgeDocs.ts | Adds server action to bulk import knowledge docs + embeddings. |
| app/(api)/_actions/hackbot/getUsageMetrics.ts | Adds server action to aggregate token usage metrics. |
| app/(api)/_actions/hackbot/getKnowledgeDocs.ts | Adds server action to list knowledge docs for admin UI. |
| app/(api)/_actions/hackbot/getHackerProfile.ts | Adds server action to read profile fields from session for prompt personalization. |
| app/(api)/_actions/hackbot/deleteKnowledgeDoc.ts | Adds server action to delete knowledge docs and embedded docs. |
| app/(api)/_actions/hackbot/clearKnowledgeDocs.ts | Adds server action to clear knowledge + embedded docs. |
| .github/workflows/staging.yaml | Runs hackbot seeding during deploy and syncs OpenAI-related env vars. |
| .github/workflows/production.yaml | Runs hackbot seeding during deploy and syncs OpenAI-related env vars. |
0387f1d to 3e4a8ff
Refactored How to run the no UI test: In a new terminal, run pretty stream test: It needs in the .env to work, unless you want a specific email/pwd to be used, which you can pass in by |
✅ There are no secrets present in this pull request anymore. If these secrets were true positive and are still valid, we highly recommend you to revoke them. 🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Pull request overview
Copilot reviewed 34 out of 35 changed files in this pull request and generated 7 comments.
    const sanitizedMessages: HackbotClientMessage[] = [];
    for (const message of messages) {
      const role = message?.role;
      const content = message?.content;
      if (
        !ALLOWED_MESSAGE_ROLES.has(role) ||
        typeof content !== 'string' ||
        !content.trim()
      ) {
        return Response.json(
          { error: 'Invalid message history format.' },
          { status: 400 }
        );
      }
      sanitizedMessages.push({
        role: role as 'user' | 'assistant',
        content,
      });
    }
Only the last message is capped by length, but earlier messages entries can be arbitrarily long / numerous. That allows a client to send a huge history that is fully parsed/validated server-side even though you later slice to the last N messages, creating an avoidable memory/CPU/latency vector. Consider enforcing (1) a max messages.length and (2) a max content length for every message (or at least a total payload char budget) during validation.
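One way to apply both limits during validation is sketched below. The constants `MAX_HISTORY_MESSAGES` and `MAX_MESSAGE_CHARS`, the `validateHistory` helper, and its result type are hypothetical names chosen for illustration, not values or functions from this PR; the route would still translate a failed result into its existing 400 response.

```typescript
// Hypothetical limits (assumptions); tune them for the real endpoint.
const MAX_HISTORY_MESSAGES = 20;
const MAX_MESSAGE_CHARS = 2000;
const ALLOWED_ROLES = new Set(['user', 'assistant']);

type ClientMessage = { role: 'user' | 'assistant'; content: string };
type ValidationResult =
  | { ok: true; messages: ClientMessage[] }
  | { ok: false; error: string };

function validateHistory(input: unknown): ValidationResult {
  if (!Array.isArray(input)) {
    return { ok: false, error: 'Messages must be an array.' };
  }
  // Reject oversized histories before inspecting individual entries.
  if (input.length === 0 || input.length > MAX_HISTORY_MESSAGES) {
    return { ok: false, error: 'Invalid message count.' };
  }
  const messages: ClientMessage[] = [];
  for (const raw of input) {
    const role = (raw as { role?: unknown } | null)?.role;
    const content = (raw as { content?: unknown } | null)?.content;
    if (
      typeof role !== 'string' ||
      !ALLOWED_ROLES.has(role) ||
      typeof content !== 'string' ||
      !content.trim() ||
      content.length > MAX_MESSAGE_CHARS // cap applies to every message
    ) {
      return { ok: false, error: 'Invalid message history format.' };
    }
    messages.push({ role: role as ClientMessage['role'], content });
  }
  return { ok: true, messages };
}
```

Checking the array length first keeps the per-message loop bounded, so the cost of validation no longer scales with whatever the client chooses to send.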
    }

    export function isSimpleGreetingMessage(content: string): boolean {
      return /^(hi|hello|hey|thanks|thank you|ok|okay)\b/i.test(content.trim());
isSimpleGreetingMessage matches any message that merely starts with "thanks"/"ok"/etc (e.g. "ok what’s the judging rubric?"), which will be misclassified as a greeting and will skip context retrieval. Tighten this to only treat messages as simple greetings when the entire content is just a greeting/ack (optionally with punctuation), or add a max-word/length constraint.
Suggested change:

    - return /^(hi|hello|hey|thanks|thank you|ok|okay)\b/i.test(content.trim());
    + const trimmed = content.trim();
    + const normalized = trimmed.replace(/[.!?]+$/, '');
    + return /^(hi|hello|hey|thanks|thank you|ok|okay)$/i.test(normalized);
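A quick check of the whole-message match is sketched below. The function body mirrors the suggested change; the sample inputs are made up for illustration.

```typescript
// Whole-message greeting check, as in the suggestion above.
function isSimpleGreetingMessage(content: string): boolean {
  // Drop trailing sentence punctuation so "Thanks!" still counts.
  const normalized = content.trim().replace(/[.!?]+$/, '');
  return /^(hi|hello|hey|thanks|thank you|ok|okay)$/i.test(normalized);
}

// Illustrative inputs (assumptions, not taken from the PR's tests).
const samples = ['hi', 'Thanks!', 'ok what’s the judging rubric?', 'hey there'];
for (const sample of samples) {
  console.log(JSON.stringify(sample), isSimpleGreetingMessage(sample));
}
// "hi" and "Thanks!" print true; the other two print false.
```

With the `$` anchor, a real question that merely opens with an acknowledgement is no longer classified as a greeting, so it still goes through context retrieval.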
    function buildSearchPattern(search: string): string {
      const q = search.trim();
      if (!q) return q;

      // Day 2 meal phrasing is often "lunch" in user language, but schedule uses
      // "brunch". Include both so meal queries still resolve correctly.
      if (/\blunch\b/i.test(q)) {
        return q.replace(/\blunch\b/gi, '(lunch|brunch)');
      }

      return q;
buildSearchPattern returns user-/LLM-controlled text that is passed directly into MongoDB $regex. This allows regex meta-characters (and potentially catastrophic backtracking patterns) to be injected into the query, which can become a DoS/perf issue. Escape regex special characters in search before building the pattern (and then apply the lunch→(lunch|brunch) expansion on the escaped form).
Suggested change:

    - function buildSearchPattern(search: string): string {
    -   const q = search.trim();
    -   if (!q) return q;
    -   // Day 2 meal phrasing is often "lunch" in user language, but schedule uses
    -   // "brunch". Include both so meal queries still resolve correctly.
    -   if (/\blunch\b/i.test(q)) {
    -     return q.replace(/\blunch\b/gi, '(lunch|brunch)');
    -   }
    -   return q;
    + function escapeRegex(input: string): string {
    +   // Escape characters with special meaning in regular expressions so that
    +   // user-controlled input is treated as literal text in MongoDB $regex queries.
    +   return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    + }
    + function buildSearchPattern(search: string): string {
    +   const trimmed = search.trim();
    +   if (!trimmed) return trimmed;
    +   // First escape all regex meta-characters so the user input is treated literally.
    +   const escaped = escapeRegex(trimmed);
    +   // Day 2 meal phrasing is often "lunch" in user language, but schedule uses
    +   // "brunch". Include both so meal queries still resolve correctly.
    +   if (/\blunch\b/i.test(escaped)) {
    +     // Apply the lunch→(lunch|brunch) expansion on the escaped form.
    +     return escaped.replace(/\blunch\b/gi, '(lunch|brunch)');
    +   }
    +   return escaped;
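The effect of escaping before expansion can be checked with a small sketch. It condenses the two suggested functions and uses JavaScript's `RegExp` as a stand-in for MongoDB's `$regex` engine, which is an assumption: the two engines differ in places, although the escaped meta-characters behave the same way in both. The sample inputs are invented.

```typescript
// Escape regex meta-characters so input is matched as literal text.
function escapeRegex(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Escape first, then expand "lunch" on the already-escaped string.
function buildSearchPattern(search: string): string {
  const escaped = escapeRegex(search.trim());
  return escaped.replace(/\blunch\b/gi, '(lunch|brunch)');
}

// A nested-quantifier input that would be risky as a raw pattern.
const hostile = buildSearchPattern('(a+)+$');
console.log(new RegExp(hostile).test('(a+)+$')); // true: matches only the literal text
console.log(new RegExp(hostile).test('aaaa')); // false: the quantifiers are inert

// The meal expansion still works around escaped parentheses.
const meal = buildSearchPattern('Lunch (Day 2)');
console.log(new RegExp(meal, 'i').test('Brunch (Day 2)')); // true
```

The order matters: running the expansion on the escaped string means the only unescaped group in the final pattern is the `(lunch|brunch)` alternation the server itself inserted.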
    const typeFiltered = include_activities
      ? dateFiltered
      : dateFiltered.filter(
          (ev: any) => (ev.type ?? '').toUpperCase() !== 'ACTIVITIES'
        );
If the model calls get_events with type: "ACTIVITIES" but leaves include_activities as null/false, the query will fetch only ACTIVITIES from Mongo and then typeFiltered will immediately filter them all out, returning empty results. Consider treating type === "ACTIVITIES" as implicitly include_activities: true (or changing the post-filter to only exclude activities when type is not ACTIVITIES).
Suggested change:

    - const typeFiltered = include_activities
    -   ? dateFiltered
    -   : dateFiltered.filter(
    -       (ev: any) => (ev.type ?? '').toUpperCase() !== 'ACTIVITIES'
    -     );
    + // Treat an explicit request for ACTIVITIES as implicitly including activities,
    + // even if include_activities was not set to true.
    + const activitiesRequested = (type ?? '').toUpperCase() === 'ACTIVITIES';
    + const typeFiltered =
    +   include_activities || activitiesRequested
    +     ? dateFiltered
    +     : dateFiltered.filter(
    +         (ev: any) => (ev.type ?? '').toUpperCase() !== 'ACTIVITIES'
    +       );
        ])}\n`
      );
    } else if (part?.type === 'tool-result') {
      if (textHasBeenOutput) suppressText = true;
Text suppression only activates after a tool-result if some text was already emitted (textHasBeenOutput). If the model ever does a bare tool call first (or emits tool results before its intro sentence), any subsequent text deltas will still stream and can violate the “no text after tool results” UI contract. Consider suppressing text after any tool result (or at least after get_events / provide_links results) regardless of prior output.
Suggested change:

    - if (textHasBeenOutput) suppressText = true;
    + // Suppress any subsequent text after a tool result to honor the
    + // "no text after tool results" UI contract.
    + suppressText = true;
    @@ -0,0 +1,271 @@
    import type { HackerProfile } from '@typeDefs/hackbot';

    // TODO: StarterKit id's need to be updated
Spelling/grammar: "id's" should be "IDs" (no apostrophe).
Suggested change:

    - // TODO: StarterKit id's need to be updated
    + // TODO: StarterKit IDs need to be updated
    const { model, maxOutputTokens } = getModelConfig();

    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    const result = streamText({
      model: openai(model) as any,
      messages: chatMessages.map((m: any) => ({
        role: m.role as 'system' | 'user' | 'assistant',
        content: m.content,
      })),
      maxOutputTokens,
      stopWhen: shouldStopStreaming,
      tools: {
        get_events: tool({
          description:
            'Fetch the live HackDavis event schedule from the database. Use this for ANY question about event times, locations, schedule, or what is happening when.',
          inputSchema: GET_EVENTS_INPUT_SCHEMA,
          execute: (input) =>
            executeGetEvents(input, profile, lastMessage.content),
        }),
        provide_links: tool({
          description: PROVIDE_LINKS_DESCRIPTION,
          inputSchema: PROVIDE_LINKS_INPUT_SCHEMA,
          execute: executeProvideLinks,
        }),
      },
    });

    const stream = createResponseStream(result, model);
streamText is typically async (returns a Promise). Here it’s called without await, so result may be a Promise and createResponseStream will later try to iterate result.fullStream, causing a runtime failure. If streamText is async in the ai@6 version used here, change this to const result = await streamText(...) before passing it to createResponseStream.
HackBot server core: API route, actions, utils, types, data, and CI/CD
What's Added:
All work was done on the hackbot branch and moved to this one for PR purposes.