BigQuery Logging for BioMCP

This document outlines how BioMCP uses Google BigQuery for logging user interactions and API usage.

Overview

BioMCP integrates with Google BigQuery to log user interactions, queries, and API usage. This logging provides valuable insights into how the system is being used, helps with debugging, and enables analytics for improving the service.

Prerequisites

  • A Google Cloud Platform (GCP) account
  • A BigQuery dataset and table created in your GCP project
  • A GCP service account with BigQuery permissions

Setting Up BigQuery for BioMCP

  1. Create a BigQuery Dataset and Table

     1. In the Google Cloud Console, navigate to BigQuery
     2. Create a new dataset (e.g., biomcp_logs)
     3. Create a table within the dataset (e.g., worker_logs) with the following schema:

        timestamp: TIMESTAMP
        userEmail: STRING
        query: STRING

     4. Adjust the schema as needed for your specific logging requirements; a command-line alternative is sketched below
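
If you prefer the command line, the same dataset and table can be created with the bq CLI. This is a sketch under the example names above; substitute your own project ID for your-gcp-project-id:

    bq mk --dataset your-gcp-project-id:biomcp_logs
    bq mk --table your-gcp-project-id:biomcp_logs.worker_logs \
        timestamp:TIMESTAMP,userEmail:STRING,query:STRING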

  2. Create a Service Account

     1. Navigate to "IAM & Admin" > "Service Accounts" in the Google Cloud Console
     2. Create a new service account with a descriptive name (e.g., biomcp-bigquery-logger)
     3. Assign the "BigQuery Data Editor" role to the service account
     4. Create and download a JSON key for the service account (a gcloud equivalent is sketched below)
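
The same steps can be scripted with the gcloud CLI. Again a sketch under the example names above; adjust the project ID and account name to match your environment:

    gcloud iam service-accounts create biomcp-bigquery-logger \
        --project=your-gcp-project-id

    gcloud projects add-iam-policy-binding your-gcp-project-id \
        --member="serviceAccount:biomcp-bigquery-logger@your-gcp-project-id.iam.gserviceaccount.com" \
        --role="roles/bigquery.dataEditor"

    gcloud iam service-accounts keys create key.json \
        --iam-account=biomcp-bigquery-logger@your-gcp-project-id.iam.gserviceaccount.com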

  3. Configure BioMCP with BigQuery Credentials

     1. Open wrangler.toml in the BioMCP project
     2. Update the following variables with your BigQuery information:

        BQ_PROJECT_ID = "your-gcp-project-id"
        BQ_DATASET = "biomcp_logs"
        BQ_TABLE = "worker_logs"

     3. For the service account key, use Cloudflare's secret management:

        npx wrangler secret put BQ_SA_KEY_JSON

        When prompted, paste the entire JSON content of your service account key file.
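
Inside the worker, these values arrive as environment bindings. A minimal TypeScript sketch of the corresponding bindings follows; the interface name Env is illustrative, not necessarily the worker's actual type:

    // Bindings that mirror the wrangler.toml vars and secret above
    interface Env {
      BQ_PROJECT_ID: string;
      BQ_DATASET: string;
      BQ_TABLE: string;
      BQ_SA_KEY_JSON: string; // set via `npx wrangler secret put BQ_SA_KEY_JSON`
    }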

How BigQuery Logging Works

The BioMCP worker uses the following process to log data to BigQuery:

  1. Authentication: The worker generates a JWT token using the service account credentials
  2. Token Exchange: The JWT is exchanged for a Google OAuth access token
  3. Data Insertion: The worker uses BigQuery's streaming insert API to log events

The implementation includes:

  • Token caching to minimize authentication requests
  • Error handling for failed logging attempts
  • Automatic retry logic for transient failures
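
The authentication and token-exchange steps can be sketched in TypeScript along the following lines. This is illustrative rather than the worker's actual code: names such as fetchAccessToken and ServiceAccountKey are assumptions, caching and retries are omitted here, and the sketch assumes the key JSON stored in BQ_SA_KEY_JSON and the Web Crypto API available in Cloudflare Workers:

    interface ServiceAccountKey {
      client_email: string;
      private_key: string; // PEM-encoded PKCS#8 key from the downloaded JSON
    }

    // Base64url-encode a string or raw signature bytes (JWT encoding)
    function b64url(data: ArrayBuffer | string): string {
      const bytes =
        typeof data === "string" ? new TextEncoder().encode(data) : new Uint8Array(data);
      let bin = "";
      bytes.forEach((b) => (bin += String.fromCharCode(b)));
      return btoa(bin).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
    }

    async function fetchAccessToken(key: ServiceAccountKey): Promise<string> {
      const now = Math.floor(Date.now() / 1000);
      const header = b64url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
      const claims = b64url(
        JSON.stringify({
          iss: key.client_email,
          scope: "https://www.googleapis.com/auth/bigquery.insertdata",
          aud: "https://oauth2.googleapis.com/token",
          iat: now,
          exp: now + 3600, // Google caps the assertion lifetime at one hour
        }),
      );
      // Import the PEM private key so Web Crypto can sign with RS256
      const pem = key.private_key.replace(/-----[A-Z ]+-----|\n/g, "");
      const der = Uint8Array.from(atob(pem), (c) => c.charCodeAt(0));
      const cryptoKey = await crypto.subtle.importKey(
        "pkcs8",
        der.buffer,
        { name: "RSASSA-PKCS1-v1_5", hash: "SHA-256" },
        false,
        ["sign"],
      );
      const input = `${header}.${claims}`;
      const sig = await crypto.subtle.sign(
        "RSASSA-PKCS1-v1_5",
        cryptoKey,
        new TextEncoder().encode(input),
      );
      const jwt = `${input}.${b64url(sig)}`;
      // Exchange the signed JWT for a short-lived OAuth access token
      const resp = await fetch("https://oauth2.googleapis.com/token", {
        method: "POST",
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
        body: new URLSearchParams({
          grant_type: "urn:ietf:params:oauth:grant-type:jwt-bearer",
          assertion: jwt,
        }),
      });
      const { access_token } = (await resp.json()) as { access_token: string };
      return access_token;
    }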

Logged Information

By default, the following information is logged to BigQuery:

  • timestamp: When the event occurred
  • userEmail: The email address of the authenticated user (if available)
  • query: The query or request that was made

You can extend the logging schema to include additional information as needed.
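For example, to record which MCP tool handled each request, you could add a column with BigQuery DDL and include the matching field in each inserted row. The toolName column here is hypothetical, not part of the default schema:

    ALTER TABLE `your-project.biomcp_logs.worker_logs`
    ADD COLUMN toolName STRING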

Accessing and Analyzing Logs

To access and analyze the logs:

  1. Query the BigQuery Table

     Use the BigQuery console or SQL to query your logs. For example, to see recent logs:

        SELECT timestamp, userEmail, query
        FROM `your-project.biomcp_logs.worker_logs`
        ORDER BY timestamp DESC
        LIMIT 100

  2. Create Visualizations

     Use Google Data Studio to create dashboards based on your BigQuery data: connect Data Studio to your BigQuery table and build visualizations from it. A sample aggregation to start from appears below.
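
For instance, this aggregation (a sketch against the example table above) counts logged queries per day, a common starting point for a usage dashboard:

    SELECT DATE(timestamp) AS day, COUNT(*) AS query_count
    FROM `your-project.biomcp_logs.worker_logs`
    GROUP BY day
    ORDER BY day DESC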

Security Considerations

  • The service account key is sensitive information and should be protected
  • Use Cloudflare's secret management to store the key securely
  • Consider implementing field-level encryption for sensitive data
  • Implement data retention policies to comply with privacy regulations
  • IMPORTANT: Never include PHI (Protected Health Information) or PII (Personally Identifiable Information) in queries or logs
  • Ensure all queries are sanitized to remove patient identifiers, medical record numbers, and other sensitive information
  • Consider implementing automatic redaction of potential PHI/PII from logs
  • Regularly audit logs to ensure compliance with HIPAA and other privacy regulations
  • Remember that BigQuery logs are not designed for storing protected health information

Troubleshooting

  • Authentication Failures: Verify that the service account key is correctly formatted and has the necessary permissions
  • Insertion Errors: Check that the BigQuery table schema matches the data being inserted
  • Missing Logs: Ensure that the worker has network access to the BigQuery API

Example Code

The worker includes the following key functions for BigQuery logging:

  • getBQToken(): Fetches and caches a BigQuery OAuth token
  • insertEvent(): Inserts a single row into BigQuery via streaming insert

These functions handle the authentication and data insertion process automatically.
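
A hedged TypeScript sketch of what these two functions might look like follows. It is illustrative rather than the worker's actual implementation: fetchAccessToken and ServiceAccountKey are the assumed helpers sketched earlier, Env matches the bindings shown in the setup section, and the retry logic noted above is omitted for brevity:

    // Module-level cache so repeated requests reuse one OAuth token
    let cachedToken: { token: string; expiresAt: number } | null = null;

    // Fetches and caches a BigQuery OAuth token, refreshing 60s early
    async function getBQToken(env: Env): Promise<string> {
      if (cachedToken && Date.now() < cachedToken.expiresAt - 60_000) {
        return cachedToken.token;
      }
      const key = JSON.parse(env.BQ_SA_KEY_JSON) as ServiceAccountKey;
      const token = await fetchAccessToken(key); // sketched earlier
      cachedToken = { token, expiresAt: Date.now() + 3600_000 };
      return token;
    }

    // Inserts a single row via BigQuery's tabledata.insertAll REST endpoint
    async function insertEvent(env: Env, userEmail: string, query: string): Promise<void> {
      const url =
        `https://bigquery.googleapis.com/bigquery/v2/projects/${env.BQ_PROJECT_ID}` +
        `/datasets/${env.BQ_DATASET}/tables/${env.BQ_TABLE}/insertAll`;
      const resp = await fetch(url, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${await getBQToken(env)}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          rows: [{ json: { timestamp: new Date().toISOString(), userEmail, query } }],
        }),
      });
      // insertAll reports per-row problems in the response body, not the HTTP status
      const result = (await resp.json()) as { insertErrors?: unknown[] };
      if (!resp.ok || result.insertErrors?.length) {
        console.error("BigQuery insert failed", JSON.stringify(result));
      }
    }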