# BigQuery Logging for BioMCP
This document outlines how BioMCP uses Google BigQuery for logging user interactions and API usage.
## Overview
BioMCP integrates with Google BigQuery to log user interactions, queries, and API usage. This logging provides valuable insights into how the system is being used, helps with debugging, and enables analytics for improving the service.
## Prerequisites
- A Google Cloud Platform (GCP) account
- A BigQuery dataset and table created in your GCP project
- A GCP service account with BigQuery permissions
## Setting Up BigQuery for BioMCP

1. **Create a BigQuery Dataset and Table**
   - In the Google Cloud Console, navigate to BigQuery
   - Create a new dataset (e.g., `biomcp_logs`)
   - Create a table within the dataset (e.g., `worker_logs`) with a schema covering the fields you plan to log (an example schema follows this list)
   - Adjust the schema as needed for your specific logging requirements
2. **Create a Service Account**
   - Navigate to "IAM & Admin" > "Service Accounts" in the Google Cloud Console
   - Create a new service account with a descriptive name (e.g., `biomcp-bigquery-logger`)
   - Assign the "BigQuery Data Editor" role to the service account
   - Create and download a JSON key for the service account
Configure BioMCP with BigQuery Credentials
-
Open
wrangler.toml
in the BioMCP project - Update the following variables with your BigQuery information:
- For the service account key, use Cloudflare's secret management: When prompted, paste the entire JSON content of your service account key file
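For reference, a minimal table schema covering the default fields described under "Logged Information" below could look like the following JSON definition; the names and modes here are illustrative and should match whatever your worker actually logs:

```json
[
  { "name": "timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" },
  { "name": "userEmail", "type": "STRING", "mode": "NULLABLE" },
  { "name": "query", "type": "STRING", "mode": "NULLABLE" }
]
```

The exact variable names in `wrangler.toml` depend on your worker's code; a hypothetical configuration might be:

```toml
[vars]
BQ_PROJECT_ID = "your-gcp-project"
BQ_DATASET = "biomcp_logs"
BQ_TABLE = "worker_logs"
```

The service account key itself can be stored with `wrangler secret put` (the secret name below is illustrative):

```bash
wrangler secret put BQ_SERVICE_ACCOUNT_KEY
```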
## How BigQuery Logging Works
The BioMCP worker uses the following process to log data to BigQuery:
- **Authentication**: The worker generates a JWT using the service account credentials
- **Token Exchange**: The JWT is exchanged for a Google OAuth access token
- **Data Insertion**: The worker uses BigQuery's streaming insert API to log events
The implementation includes:
- Token caching to minimize authentication requests
- Error handling for failed logging attempts
- Automatic retry logic for transient failures
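As a rough sketch of these three steps plus token caching in a Cloudflare Worker (not the worker's actual implementation; all names here are illustrative, and the worker's own helpers are listed under "Example Code" below):

```typescript
// Illustrative sketch of the JWT -> OAuth token flow with caching.
// Assumes the service account key JSON (client_email, private_key) is available.
interface ServiceAccountKey {
  client_email: string;
  private_key: string; // PEM-encoded PKCS#8 RSA key
}

let cachedToken: { value: string; expiresAt: number } | null = null;

function b64urlBytes(bytes: Uint8Array): string {
  let bin = "";
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

const b64url = (s: string) => b64urlBytes(new TextEncoder().encode(s));

function pemToDer(pem: string): ArrayBuffer {
  const body = pem.replace(/-----[^-]+-----/g, "").replace(/\s+/g, "");
  const bin = atob(body);
  return Uint8Array.from(bin, (c) => c.charCodeAt(0)).buffer;
}

async function getAccessToken(key: ServiceAccountKey): Promise<string> {
  // Token caching: reuse the token until shortly before it expires.
  if (cachedToken && Date.now() < cachedToken.expiresAt - 60_000) {
    return cachedToken.value;
  }

  // 1. Build and sign a JWT with the service account's RSA key (RS256).
  const now = Math.floor(Date.now() / 1000);
  const unsigned =
    b64url(JSON.stringify({ alg: "RS256", typ: "JWT" })) +
    "." +
    b64url(
      JSON.stringify({
        iss: key.client_email,
        scope: "https://www.googleapis.com/auth/bigquery.insertdata",
        aud: "https://oauth2.googleapis.com/token",
        iat: now,
        exp: now + 3600,
      }),
    );
  const cryptoKey = await crypto.subtle.importKey(
    "pkcs8",
    pemToDer(key.private_key),
    { name: "RSASSA-PKCS1-v1_5", hash: "SHA-256" },
    false,
    ["sign"],
  );
  const sig = await crypto.subtle.sign(
    "RSASSA-PKCS1-v1_5",
    cryptoKey,
    new TextEncoder().encode(unsigned),
  );
  const jwt = `${unsigned}.${b64urlBytes(new Uint8Array(sig))}`;

  // 2. Exchange the signed JWT for a Google OAuth access token.
  const resp = await fetch("https://oauth2.googleapis.com/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "urn:ietf:params:oauth:grant-type:jwt-bearer",
      assertion: jwt,
    }),
  });
  if (!resp.ok) throw new Error(`Token exchange failed: ${resp.status}`);
  const data = (await resp.json()) as { access_token: string; expires_in: number };
  cachedToken = {
    value: data.access_token,
    expiresAt: Date.now() + data.expires_in * 1000,
  };
  return data.access_token;
}
```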
## Logged Information
By default, the following information is logged to BigQuery:
- `timestamp`: When the event occurred
- `userEmail`: The email address of the authenticated user (if available)
- `query`: The query or request that was made
You can extend the logging schema to include additional information as needed.
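An individual logged row might therefore look like this (all values below are illustrative):

```json
{
  "timestamp": "2025-01-15T10:30:00Z",
  "userEmail": "user@example.com",
  "query": "article search: BRAF melanoma"
}
```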
## Accessing and Analyzing Logs
To access and analyze the logs:
1. **Query the BigQuery Table**
   - Use the BigQuery console or SQL to query your logs (an example query to see recent logs follows this list)
2. **Create Visualizations**
   - Use Google Data Studio to create dashboards based on your BigQuery data
   - Connect Data Studio to your BigQuery table and create visualizations
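For example, a query to view the most recent entries might look like the following (substitute your own project, dataset, and table names):

```sql
SELECT timestamp, userEmail, query
FROM `your-project.biomcp_logs.worker_logs`
ORDER BY timestamp DESC
LIMIT 100;
```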
## Security Considerations
- The service account key is sensitive information and should be protected
- Use Cloudflare's secret management to store the key securely
- Consider implementing field-level encryption for sensitive data
- Implement data retention policies to comply with privacy regulations
- **IMPORTANT**: Never include PHI (Protected Health Information) or PII (Personally Identifiable Information) in queries or logs
- Ensure all queries are sanitized to remove patient identifiers, medical record numbers, and other sensitive information
- Consider implementing automatic redaction of potential PHI/PII from logs (a sketch follows this list)
- Regularly audit logs to ensure compliance with HIPAA and other privacy regulations
- Remember that BigQuery logs are not designed for storing protected health information
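As one possible starting point for redaction, a naive sketch is shown below; the function name and patterns are illustrative, and no pattern list can reliably catch all PHI/PII:

```typescript
// Naive, illustrative redaction of a few common identifier patterns.
// Real PHI/PII detection requires much more than regex matching.
function redactForLogging(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[EMAIL]") // email addresses
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]") // US SSN-shaped numbers
    .replace(/\bMRN[:#\s]*\d+\b/gi, "[MRN]"); // medical record numbers
}
```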
## Troubleshooting
- **Authentication Failures**: Verify that the service account key is correctly formatted and that the account has the necessary permissions
- **Insertion Errors**: Check that the BigQuery table schema matches the data being inserted
- **Missing Logs**: Ensure that the worker has network access to the BigQuery API
## Example Code
The worker includes the following key functions for BigQuery logging:
- `getBQToken()`: Fetches and caches a BigQuery OAuth token
- `insertEvent()`: Inserts a single row into BigQuery via streaming insert
These functions handle the authentication and data insertion process automatically.
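A minimal sketch of what such a streaming insert can look like is shown below, assuming a token helper like the one sketched earlier and the hypothetical `BQ_*` variables from the configuration example; this is not the worker's actual code:

```typescript
// Illustrative streaming insert via BigQuery's tabledata.insertAll endpoint.
interface LogEvent {
  timestamp: string; // ISO 8601, e.g. new Date().toISOString()
  userEmail?: string;
  query: string;
}

async function insertEvent(
  event: LogEvent,
  env: { BQ_PROJECT_ID: string; BQ_DATASET: string; BQ_TABLE: string },
  token: string, // OAuth access token, e.g. from a getBQToken()-style helper
): Promise<void> {
  const url =
    `https://bigquery.googleapis.com/bigquery/v2/projects/${env.BQ_PROJECT_ID}` +
    `/datasets/${env.BQ_DATASET}/tables/${env.BQ_TABLE}/insertAll`;

  const resp = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ rows: [{ json: event }] }),
  });

  // insertAll can return 200 with per-row errors, so check both.
  const body = (await resp.json()) as { insertErrors?: unknown[] };
  if (!resp.ok || (body.insertErrors && body.insertErrors.length > 0)) {
    throw new Error(`BigQuery insert failed: ${JSON.stringify(body)}`);
  }
}
```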