使用SQL型Logistic回歸預測客戶流失
預測客戶流失可透過可操作的深入分析提高滿意度和忠誠度,協助企業保留客戶、最佳化資源及提高盈利能力。
探索如何使用SQL型Logistic回歸來預測客戶流失。 使用本全方位的SQL指南,根據關鍵行為量度(例如購買頻率、平均訂購值和上次購買間隔時間),將原始電子商務資料轉換為有意義的客戶深入分析。 本檔案涵蓋從資料準備和特徵工程到模型建立、評估和預測的整個過程。
使用本指南建立強大的流失預測模型,用於識別高風險客戶、改良保留策略,以及推動更好的業務決策。 其中包含逐步指示、SQL查詢和詳細解釋,協助您在資料環境中自信地套用機器學習技術。
快速入門
在建立流失模型之前,請務必探索關鍵客戶功能和資料需求。 以下各節概述精確模型訓練所需的基本客戶屬性和必要資料欄位。
定義客戶功能 define-customer-features
為了準確地分類流失率,此模型會分析購買習慣和趨勢。 下表概述模型中使用的主要客戶行為功能:
total_purchasestotal_revenueavg_order_valuecustomer_lifetimedays_since_last_purchasepurchase_frequency假設和必填欄位 assumptions-required-fields
若要產生客戶流失預測,此模型依賴webevents表格中擷取客戶交易詳細資料的主要欄位。 您的資料集必須包括下列欄位:
identityMap['ECID'][0].idproductListItems.priceTotal[0]productListItems.quantity[0]timestampcommerce.order.purchaseID資料集必須包含結構化歷史客戶交易記錄,每一列代表購買事件。 每個事件都必須包含與SQL DATEDIFF函式相容之適當日期 — 時間格式的時間戳記(例如YYYY-MM-DD HHSS)。 此外,每個記錄都必須在ECID欄位中包含有效的Experience Cloud ID (identityMap),才能唯一識別客戶。
建立模型 create-a-model
若要預測客戶流失,您必須建立以SQL為基礎的Logistic回歸模型,分析客戶購買記錄和行為量度。 模型會判斷客戶在過去90天內是否購買,以將其分類為churned或not churned。
使用SQL建立流失預測模型 sql-create-model
SQL型模型會處理webevents資料,方法是彙總關鍵量度,並根據90天非使用狀態規則指派流失標籤。 此方法可區分活躍客戶與高風險客戶。 SQL查詢還會執行功能工程來增強模型準確性並改善流失分類。 這些見解讓您的企業能夠實施針對性的保留策略、減少流失並最大化客戶期限價值。
DATEDIFF(CURRENT_DATE, MAX(timestamp)) > 90條件。使用下列SQL陳述式來建立具有指定功能和標籤的retention_model_logistic_reg模型:
CREATE MODEL retention_model_logistic_reg
TRANSFORM (
vector_assembler(array(total_purchases, total_revenue, avg_order_value, customer_lifetime, days_since_last_purchase, purchase_frequency)) features
-- Combines selected customer metrics into a feature vector for model training
)
OPTIONS (
MODEL_TYPE = 'logistic_reg', -- Specifies logistic regression as the model type
LABEL = 'churned' -- Defines the target label for churn classification
)
AS
WITH customer_features AS (
SELECT
identityMap['ECID'][0].id AS customer_id, -- Extract the unique customer ID from identityMap
AVG(COALESCE(productListItems.priceTotal[0], 0)) AS avg_order_value, -- Calculates the average order value, and handles null values with COALESCE
SUM(COALESCE(productListItems.priceTotal[0], 0)) AS total_revenue, -- The sum of all purchase values per customer
COUNT(COALESCE(productListItems.quantity[0], 0)) AS total_purchases, -- The total number of items purchased by the customer
DATEDIFF(MAX(timestamp), MIN(timestamp)) AS customer_lifetime, -- The days between first and last recorded purchase
DATEDIFF(CURRENT_DATE, MAX(timestamp)) AS days_since_last_purchase, -- The days since the last purchase event
COUNT(DISTINCT CONCAT(YEAR(timestamp), MONTH(timestamp))) AS purchase_frequency -- The count of unique months with purchases
FROM
webevents
WHERE EXISTS(productListItems, value -> value.priceTotal > 0) -- Filters transactions with valid total price
AND commerce.`order`.purchaseID <> '' -- Ensures the order has a valid purchase ID
GROUP BY customer_id
),
customer_labels AS (
SELECT
identityMap['ECID'][0].id AS customer_id, -- Extract the unique customer ID for labeling
CASE
WHEN DATEDIFF(CURRENT_DATE, MAX(timestamp)) > 90 THEN 1 -- Marks the customer as churned if no purchase occurred in the last 90 days
ELSE 0
END AS churned
FROM
webevents
WHERE EXISTS(productListItems, value -> value.priceTotal > 0)
AND commerce.`order`.purchaseID <> ''
GROUP BY customer_id
)
SELECT
f.customer_id,
f.total_purchases,
f.total_revenue,
f.avg_order_value,
f.customer_lifetime,
f.days_since_last_purchase,
f.purchase_frequency,
l.churned
FROM
customer_features f
JOIN
customer_labels l
ON f.customer_id = l.customer_id -- Join features with churn labels
ORDER BY RANDOM() -- Shuffles rows randomly for training
LIMIT 500000; -- Limit the dataset to 500,000 rows for model training
模型輸出 model-output
輸出資料集包含客戶相關量度及其流失狀態。 每一列代表客戶、其功能值及其流失狀態。 您可以使用此輸出來分析客戶行為、訓練預測模型並開發目標性的保留策略以保留有風險的客戶。 輸出表範例如下所示:
customer_id | total_purchases | total_revenue | avg_order_value | customer_lifetime | days_since_last_purchase | purchase_frequency | churned |
|--------------+-----------------+---------------+------------------+-------------------+--------------------------+--------------------+----------
100001 | 25 | 1250.00 | 50.00 | 540 | 20 | 10 | 0
100002 | 3 | 90.00 | 30.00 | 120 | 95 | 1 | 1
100003 | 60 | 7200.00 | 120.00 | 800 | 5 | 24 | 0
100004 | 15 | 750.00 | 50.00 | 365 | 60 | 8 | 0
100005 | 1 | 25.00 | 25.00 | 60 | 180 | 1 | 1
churned使用SQL評估模型 model-evaluation
接下來,評估流失預測模型,以判斷其在識別風險客戶方面的有效性。 使用測量精確度和可靠性的關鍵量度來評估模型效能。
若要測量retention_model_logistic_reg模型預測客戶流失的準確度,請使用model_evaluate函式。 以下SQL範例會使用如訓練資料結構化的資料集來評估模型:
SELECT *
FROM model_evaluate(retention_model_logistic_reg, 1,
WITH customer_features AS (
SELECT
identityMap['ECID'][0].id AS customer_id,
AVG(COALESCE(productListItems.priceTotal[0], 0)) AS avg_order_value,
SUM(COALESCE(productListItems.priceTotal[0], 0)) AS total_revenue,
COUNT(COALESCE(productListItems.quantity[0], 0)) AS total_purchases,
DATEDIFF(MAX(timestamp), MIN(timestamp)) AS customer_lifetime,
DATEDIFF(CURRENT_DATE, MAX(timestamp)) AS days_since_last_purchase,
COUNT(DISTINCT CONCAT(YEAR(timestamp), MONTH(timestamp))) AS purchase_frequency
FROM
webevents
WHERE EXISTS(productListItems, value -> value.priceTotal > 0)
AND commerce.`order`.purchaseID <> ''
GROUP BY customer_id
),
customer_labels AS (
SELECT
identityMap['ECID'][0].id AS customer_id,
CASE
WHEN DATEDIFF(CURRENT_DATE, MAX(timestamp)) > 90 THEN 1
ELSE 0
END AS churned
FROM
webevents
WHERE EXISTS(productListItems, value -> value.priceTotal > 0)
AND commerce.`order`.purchaseID <> ''
GROUP BY customer_id
)
SELECT
f.customer_id,
f.total_purchases,
f.total_revenue,
f.avg_order_value,
f.customer_lifetime,
f.days_since_last_purchase,
f.purchase_frequency,
l.churned
FROM
customer_features f
JOIN
customer_labels l
ON f.customer_id = l.customer_id); -- Joins customer features with churn labels
模型評估輸出
評估輸出包括關鍵效能量度,例如AUC-ROC、精確度、精確度和召回率。 這些量度可讓您深入瞭解模型有效性,以便用於調整保留策略及制定資料導向式決策。
auc_roc | accuracy | precision | recall
|---------+----------+-----------+--------
1 | 0.99998 | 1 | 1
auc_rocaccuracyprecisionrecall模型預測 model-prediction
評估模型後,使用model_predict將其套用至新資料集並預測客戶流失。 您可以使用這些預測來識別有風險的客戶,並實作目標性的保留策略。
使用SQL產生流失預測 sql-model-predict
下列SQL查詢使用retention_model_logistic_reg模型,透過如訓練資料結構化的資料集來預測客戶流失:
SELECT *
FROM model_predict(retention_model_logistic_reg, 1, -- Applies the trained model for churn prediction
WITH customer_features AS (
SELECT
identityMap['ECID'][0].id AS customer_id,
AVG(COALESCE(productListItems.priceTotal[0], 0)) AS avg_order_value,
SUM(COALESCE(productListItems.priceTotal[0], 0)) AS total_revenue,
COUNT(COALESCE(productListItems.quantity[0], 0)) AS total_purchases,
DATEDIFF(MAX(timestamp), MIN(timestamp)) AS customer_lifetime,
DATEDIFF(CURRENT_DATE, MAX(timestamp)) AS days_since_last_purchase,
COUNT(DISTINCT CONCAT(YEAR(timestamp), MONTH(timestamp))) AS purchase_frequency
FROM
webevents
WHERE EXISTS(productListItems, value -> value.priceTotal > 0) -- Ensures only valid purchase data is considered
AND commerce.`order`.purchaseID <> ''
GROUP BY customer_id
),
customer_labels AS (
SELECT
identityMap['ECID'][0].id AS customer_id,
CASE
WHEN DATEDIFF(CURRENT_DATE, MAX(timestamp)) > 90 THEN 1 -- Identify customers who have not purchased in the last 90 days
ELSE 0
END AS churned
FROM
webevents
WHERE EXISTS(productListItems, value -> value.priceTotal > 0)
AND commerce.`order`.purchaseID <> ''
GROUP BY customer_id
)
SELECT
f.customer_id,
f.total_purchases,
f.total_revenue,
f.avg_order_value,
f.customer_lifetime,
f.days_since_last_purchase,
f.purchase_frequency,
l.churned
FROM
customer_features f
JOIN
customer_labels l
ON f.customer_id = l.customer_id); -- Matches features with their churn labels for prediction
模型預測輸出 prediction-output
輸出資料集包含關鍵客戶功能及其預測的客戶流失狀態,可指出客戶是否可能流失。 運用這些見解實施主動式保留策略並減少客戶流失。
total_purchases | total_revenue | avg_order_value | customer_lifetime | days_since_last_purchase | purchase_frequency | churned | prediction
|-----------------+---------------+-----------------+-------------------+--------------------------+--------------------+---------+------------
2 | 299 | 149.5 | 0 | 13 | 1 | 0 | 0
1 | 710 | 710.00 | 0 | 149 | 1 | 1 | 1
1 | 19.99 | 19.99 | 0 | 30 | 1 | 0 | 0
1 | 4528 | 4528.00 | 0 | 26 | 1 | 0 | 0
1 | 21.84 | 21.84 | 0 | 90 | 1 | 0 | 0
1 | 16.64 | 16.64 | 0 | 268 | 1 | 1 | 1
prediction後續步驟
您現在已經瞭解如何建立、評估及使用以SQL為基礎的模型來預測客戶流失。 有了這個基礎,您就可以分析客戶行為、識別有風險的客戶,並實作主動保留策略以改善客戶保留率。 若要進一步增強並套用流失預測模型,請考慮下列步驟: