Using BigQuery ML to categorize email inquiries
Problem description
We want to categorize incoming emails so that we can either generate automatic responses or reduce the manual work needed to handle them.
The Data
We have extracted and anonymized real inquiries received by Entur's customer service centre (Kundesenter), each with its corresponding category.
The model
To create (and train) the model we simply run:
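CREATE OR REPLACE MODEL
  `entur-analytics-rtd.hackathon_2023_q1_lag4.auto_ml_all_top4`
OPTIONS (
  model_type = 'AUTOML_CLASSIFIER',
  input_label_cols = ['label'],  -- column(s) to predict
  budget_hours = 1               -- training budget in hours
) AS
SELECT
  -- Lower-case the email, split it into words (a-z plus æ, ø, å),
  -- and build n-grams with the default n-gram range
  ML.NGRAMS(REGEXP_EXTRACT_ALL(LOWER(Email), '[a-zæøå]+')) AS words,
  operation AS label
FROM
  `entur-analytics-rtd.hackathon_2023_q1_lag4.emails_top4`
WHERE
  operation IS NOT NULL  -- skip emails without a category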
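Here `input_label_cols` names the column(s) we want to predict, and `budget_hours` limits training to one hour. The query creates a BigQuery ML model that we can query to obtain predictions on new data. The example below queries the model `logreg_train_top4` from the same dataset. For instance, we could predict the category for the email "Jeg ønsker å avbestille min billett fra Oslo til Bergen" ("I would like to cancel my ticket from Oslo to Bergen") as follows: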
SELECT
  *
FROM
  ML.PREDICT(
    MODEL `entur-analytics-rtd.hackathon_2023_q1_lag4.logreg_train_top4`,
    (
      -- Build the `words` feature from the raw email text (unigrams and bigrams)
      SELECT ML.NGRAMS(REGEXP_EXTRACT_ALL(LOWER("Jeg ønsker å avbestille min billett fra Oslo til Bergen"), '[a-zæøå]+'), [1, 2]) AS words
    )
  )
which returns the model's predicted probability for each category. In this case the model correctly predicts the category "avbestilling/refusjon" (cancellation/refund) with a probability of 40%.
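
The per-category probabilities come back as an array, so it can help to flatten them into one row per category. The query below is a minimal sketch of one way to do that, not necessarily how the results were inspected originally. It assumes the output column is named `predicted_label_probs` (BigQuery ML derives the name from the label column, here `label`) and that each array element has the fields `label` and `prob`. The CTE name `predictions` and the output aliases `category` and `probability` are made up for illustration.

```sql
-- Sketch: one row per category, highest probability first.
-- Assumes the ML.PREDICT output column is `predicted_label_probs`
-- with elements of the form STRUCT<label, prob>.
WITH predictions AS (
  SELECT
    *
  FROM
    ML.PREDICT(
      MODEL `entur-analytics-rtd.hackathon_2023_q1_lag4.logreg_train_top4`,
      (
        -- Same feature expression as in the prediction query above
        SELECT
          ML.NGRAMS(
            REGEXP_EXTRACT_ALL(
              LOWER("Jeg ønsker å avbestille min billett fra Oslo til Bergen"),
              '[a-zæøå]+'),
            [1, 2]) AS words
      )
    )
)
SELECT
  p.label AS category,
  p.prob AS probability
FROM
  predictions,
  UNNEST(predictions.predicted_label_probs) AS p  -- expand the array into rows
ORDER BY
  probability DESC
```

The first row of the result is then the predicted category. Comparing its probability with the runner-up gives a rough idea of how confident the model is.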