Blog | Entur Data Docs

Using BigQuery ML to categorise email inquiries

July 3, 2025 · 2 min read

Problem description

We want to categorize incoming emails so that we can either generate automatic responses or reduce the amount of manual work needed to handle them.

The Data

We have extracted and anonymized actual inquiries received by Entur's customer service center (Kundesenter), along with their corresponding categories.

The model

To create (and train) the model we simply run:

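-- Train an AutoML classifier on word n-grams extracted from each email.
CREATE OR REPLACE MODEL
  `entur-analytics-rtd.hackathon_2023_q1_lag4.auto_ml_all_top4`
OPTIONS (
  model_type = 'AUTOML_CLASSIFIER',
  input_label_cols = ['label'],  -- column the model learns to predict
  budget_hours = 1               -- upper bound on training time
) AS
SELECT
  -- Lowercase the email, split it into words, and build unigrams and bigrams,
  -- matching the [1, 2] range used at prediction time.
  ML.NGRAMS(REGEXP_EXTRACT_ALL(LOWER(Email), '[a-zæøå]+'), [1, 2]) AS words,
  operation AS label
FROM
  `entur-analytics-rtd.hackathon_2023_q1_lag4.emails_top4`
WHERE
  operation IS NOT NULL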

Here `input_label_cols` specifies the column(s) we want to predict. The query creates a BigQuery ML model that we can query to obtain predictions on new data. For instance, we could predict the category of the email "Jeg ønsker å avbestille min billett fra Oslo til Bergen" ("I want to cancel my ticket from Oslo to Bergen") as follows:

SELECT *
FROM ML.PREDICT(
  MODEL `entur-analytics-rtd.hackathon_2023_q1_lag4.logreg_train_top4`,
  (
    SELECT
      ML.NGRAMS(REGEXP_EXTRACT_ALL(LOWER("Jeg ønsker å avbestille min billett fra Oslo til Bergen"), '[a-zæøå]+'), [1, 2]) AS words
  )
)

which returns the likelihood of each category according to the model. In this case the model correctly predicts the category "avbestilling/refusjon" (cancellation/refund) with a likelihood of 40%.
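To make the feature extraction concrete, here is a minimal Python sketch of what the `REGEXP_EXTRACT_ALL(LOWER(...), '[a-zæøå]+')` and `ML.NGRAMS(..., [1, 2])` steps in the prediction query produce for the example email. It is an illustration only and not part of the BigQuery pipeline: the function names `tokenize` and `ngrams` are hypothetical, a single space is assumed as the n-gram separator, and the exact ordering of BigQuery's output may differ.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of the letters a-z, æ, ø and å."""
    return re.findall(r"[a-zæøå]+", text.lower())


def ngrams(tokens: list[str], min_n: int = 1, max_n: int = 2, sep: str = " ") -> list[str]:
    """Build all n-grams with min_n <= n <= max_n, joined by `sep` (assumed separator)."""
    result = []
    for start in range(len(tokens)):
        for n in range(min_n, max_n + 1):
            if start + n <= len(tokens):
                result.append(sep.join(tokens[start:start + n]))
    return result


email = "Jeg ønsker å avbestille min billett fra Oslo til Bergen"
words = ngrams(tokenize(email))
print(words)
```

The resulting list mixes single words such as "avbestille" with word pairs such as "avbestille min", which is the kind of `words` array the model receives as input.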

Kafka migration to VPC networks

July 3, 2025 · 3 min read

Virtual Private Cloud (VPC) peering is a method of connecting separate private cloud networks (on AWS, Google Cloud, or Azure) with each other. It allows virtual machines in different private networks to talk to each other directly without going through the public internet.

Aiven's VPC peering only allows private networks in the same cloud provider to talk to each other without going through the public internet. This means that our Azure cloud users can access Kafka services that are migrated to the Google VPC network only via public URLs.

Kafka platform changes

Team Data Platform and Team Platform have created the necessary VPC resources to migrate the existing internal Kafka clusters to VPC networks.
Kafka clusters serving external Kafka users are NOT migrated, and their usage remains the same as before:
1. entur-kafka-test-ext
2. entur-kafka-prod-ext
This migration has no impact on Entur applications running in GKE clusters in the dev, staging and production environments.
The following clusters will be migrated:
1. entur-kafka-test-int
2. entur-kafka-prod-int
The following Kafka users are affected by this migration. Public URLs are created for them by adding a public- prefix to the existing bootstrap and schema registry server URLs:
1. Entur applications running in other cloud networks, such as Azure Cloud
2. CI/CD applications
3. Entur developers
➡️ These users must switch to the public URLs once the clusters have moved to VPC networks, because the old/existing URLs are then assigned to the private networks.
The same Kafka user credentials should work as before.
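The `public-` prefix rule described above can be sketched in a few lines of Python. The helper name `add_public_prefix` is hypothetical and only rewrites the hostname. Note that the lookup table in the next section lists a different bootstrap port for the public URLs (for example 11878 instead of 11877), so treat that table as the source of truth for ports rather than this helper.

```python
def add_public_prefix(url: str) -> str:
    """Put 'public-' in front of the hostname, keeping any scheme and port unchanged."""
    scheme, separator, rest = url.partition("://")
    if not separator:
        # No scheme present, e.g. a bootstrap server given as "host:port".
        return "public-" + url
    return f"{scheme}://public-{rest}"


private_registry = "https://entur-kafka-test-int-entur-test.aivencloud.com:11867"
print(add_public_prefix(private_registry))
```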

Kafka services URL lookup

| Cluster ID | Bootstrap Server URL | Schema Registry Server URL | Access | VPC |
| --- | --- | --- | --- | --- |
| `entur-kafka-test-ext` | `entur-kafka-test-ext-entur-test.aivencloud.com:11877` | `https://entur-kafka-test-ext-entur-test.aivencloud.com:11867` | public | no |
| `entur-kafka-prod-ext` | `entur-kafka-prod-ext-entur-prod.aivencloud.com:14019` | `https://entur-kafka-prod-ext-entur-prod.aivencloud.com:14009` | public | no |
| `entur-kafka-test-int` | `entur-kafka-test-int-entur-test.aivencloud.com:11877` | `https://entur-kafka-test-int-entur-test.aivencloud.com:11867` | private | yes |
| `entur-kafka-test-int` | `public-entur-kafka-test-int-entur-test.aivencloud.com:11878` | `https://public-entur-kafka-test-int-entur-test.aivencloud.com:11867` | public | yes |
| `entur-kafka-prod-int` | `entur-kafka-prod-int-entur-prod.aivencloud.com:14019` | `https://entur-kafka-prod-int-entur-prod.aivencloud.com:14009` | private | yes |
| `entur-kafka-prod-int` | `public-entur-kafka-prod-int-entur-prod.aivencloud.com:14020` | `https://public-entur-kafka-prod-int-entur-prod.aivencloud.com:14009` | public | yes |
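As a rough illustration of how the lookup table above might be used, the following Python sketch picks the endpoints for a cluster depending on whether the client runs inside Entur's GKE clusters. The URLs are copied from the table; the names `Endpoints`, `KAFKA_ENDPOINTS` and `endpoints_for` are hypothetical and not part of any Entur library.

```python
from typing import NamedTuple


class Endpoints(NamedTuple):
    bootstrap_server: str
    schema_registry_url: str


# Keys are (cluster ID, access); values are copied from the lookup table.
KAFKA_ENDPOINTS = {
    ("entur-kafka-test-ext", "public"): Endpoints(
        "entur-kafka-test-ext-entur-test.aivencloud.com:11877",
        "https://entur-kafka-test-ext-entur-test.aivencloud.com:11867"),
    ("entur-kafka-prod-ext", "public"): Endpoints(
        "entur-kafka-prod-ext-entur-prod.aivencloud.com:14019",
        "https://entur-kafka-prod-ext-entur-prod.aivencloud.com:14009"),
    ("entur-kafka-test-int", "private"): Endpoints(
        "entur-kafka-test-int-entur-test.aivencloud.com:11877",
        "https://entur-kafka-test-int-entur-test.aivencloud.com:11867"),
    ("entur-kafka-test-int", "public"): Endpoints(
        "public-entur-kafka-test-int-entur-test.aivencloud.com:11878",
        "https://public-entur-kafka-test-int-entur-test.aivencloud.com:11867"),
    ("entur-kafka-prod-int", "private"): Endpoints(
        "entur-kafka-prod-int-entur-prod.aivencloud.com:14019",
        "https://entur-kafka-prod-int-entur-prod.aivencloud.com:14009"),
    ("entur-kafka-prod-int", "public"): Endpoints(
        "public-entur-kafka-prod-int-entur-prod.aivencloud.com:14020",
        "https://public-entur-kafka-prod-int-entur-prod.aivencloud.com:14009"),
}


def endpoints_for(cluster_id: str, inside_entur_gke: bool) -> Endpoints:
    """Use private URLs from inside Entur's GKE clusters when they exist, public URLs otherwise."""
    if inside_entur_gke and (cluster_id, "private") in KAFKA_ENDPOINTS:
        return KAFKA_ENDPOINTS[(cluster_id, "private")]
    return KAFKA_ENDPOINTS[(cluster_id, "public")]


# A developer laptop or CI/CD job is outside the GKE clusters, so it gets the public URLs.
print(endpoints_for("entur-kafka-test-int", inside_entur_gke=False))
```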

Changes in usage with `entur-kafka-spring-starter` library

The default configuration does not change as long as your application is accessing either of the following:

1. External Kafka clusters, which are not migrated to VPC networks
2. Internal Kafka clusters from within Entur's GKE clusters

Here is an example configuration for accessing the `entur-kafka-test-int` cluster with the `entur-kafka-spring-starter` library from a client that has to use the public URLs:

entur:
  kafka:
    bootstrapServer: "public-entur-kafka-test-int-entur-test.aivencloud.com:11878"
    schemaRegistryUrl: "https://public-entur-kafka-test-int-entur-test.aivencloud.com:11867"
    schemaRegistryBasicAuth: "${KAFKA_USER_NAME}:${KAFKA_USER_PASSWORD}"
    sasl:
      username: "${KAFKA_USER_NAME}"
      password: "${KAFKA_USER_PASSWORD}"
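Before switching an application over, it can be useful to confirm that the public bootstrap server is reachable from where the client runs. The following Python sketch is one hedged way to do that with a plain TCP connection; the helper names `split_host_port` and `is_reachable` are hypothetical, and the check only tests network reachability, not TLS or the Kafka credentials.

```python
import socket


def split_host_port(bootstrap_server: str) -> tuple[str, int]:
    """Split a 'host:port' bootstrap server string into its two parts."""
    host, _, port = bootstrap_server.rpartition(":")
    return host, int(port)


def is_reachable(bootstrap_server: str, timeout_seconds: float = 5.0) -> bool:
    """Return True if a TCP connection to the bootstrap server can be opened."""
    host, port = split_host_port(bootstrap_server)
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, calling `is_reachable("public-entur-kafka-test-int-entur-test.aivencloud.com:11878")` from a developer machine or CI/CD runner indicates whether the public URL can be reached at the network level.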

Migration strategy

First, all Kafka applications that access the clusters from outside Entur's GKE clusters can start using the public URLs:
1. Azure Cloud users
2. CI/CD pipeline users
3. Entur developers
Next, the entur-kafka-test-int cluster will be migrated to the VPC network.
Finally, the entur-kafka-prod-int cluster will be migrated to the VPC network.
