
Facebook Optimization with RL

In the fast-paced world of digital advertising, platforms like Meta dominate, generating over $100 billion in revenue as of 2024 [1]. For digital marketing agencies, navigating Meta’s advertising ecosystem poses unique challenges. Meeting specific campaign goals, such as a target number of impressions, within a constrained budget often requires continuous monitoring and adjustment, making the process resource-intensive and error-prone.


Problem Statement

Unlike traditional advertising channels, where metrics such as impressions can be controlled directly, Meta’s platform relies on a performance-driven model influenced by budget allocation. Achieving specific goals — whether impressions or conversions — within budget constraints necessitates constant oversight. This manual process is labor-intensive and prone to inefficiencies.

pi_optimal addresses these challenges by leveraging Reinforcement Learning (RL) to automate and optimize advertising campaign management. Designed for data science teams, even those unfamiliar with advanced RL concepts, pi_optimal enables efficient control over campaigns to achieve consistent outcomes.

Our goal is twofold:

  • Develop a model that optimally configures Facebook campaign settings to meet target impressions.
  • Understand the model’s decision-making process to build trust in its predictions.

Dataset

The dataset, provided by a mid-sized German marketing agency, consists of over 2,574 Facebook campaigns conducted between 2019 and 2022. Campaign data is aggregated daily, with the following key features:

Dataset Features

  • campaign_id: Unique identifier for each campaign
  • date: Date of the data point
  • day_of_week: Day of the week
  • impression_goal: The target number of impressions for the campaign by the end of the runtime
  • daily_impressions: The number of impressions generated by the campaign on that day
  • total_impressions: The total number of impressions generated until that day
  • target_daily_impressions: The target number of impressions for that day
  • target_total_impressions: The target number of impressions until that day
  • missing_impressions: The difference between the target total and the actual total impressions up to that day
  • total_days: The total number of days the campaign is scheduled to run
  • remaining_days: The number of days remaining for the campaign to run
  • set_cpm: The cost per thousand impressions set for the campaign
  • feed_facebook_status: Indicates whether the campaign ran on the Facebook Feed placement that day
  • instant_article_facebook_status: Indicates whether the campaign ran on the Instant Articles placement that day
  • facebook_stories_facebook_status: Indicates whether the campaign ran on the Facebook Stories placement that day
  • marketplace_facebook_status: Indicates whether the campaign ran on the Marketplace placement that day
  • right_hand_column_facebook_status: Indicates whether the campaign ran on the Right Hand Column placement that day

The dataset provides a comprehensive view of campaign performance, enabling us to train and evaluate our RL agent.

import pandas as pd

df_historical_fb_campaigns = pd.read_csv('data/fb_history.csv', parse_dates=['date'])
df_historical_fb_campaigns.sort_values(['campaign_id', 'date'], inplace=True)
df_historical_fb_campaigns.head()
|   | campaign_id | date | day_of_week | impression_goal | daily_impressions | total_impressions | target_daily_impressions | target_total_impressions | missing_impressions | total_days | remaining_days | set_cpm | feed_facebook_status | instant_article_facebook_status | facebook_stories_facebook_status | marketplace_facebook_status | right_hand_column_facebook_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2022-05-30 | Monday | 21000 | 1244.0 | 1244.0 | 1400.0 | 1400.0 | 156.0 | 15.0 | 13.0 | 3.0 | Active | Active | Active | Active | Inactive |
| 1 | 0 | 2022-06-01 | Wednesday | 21000 | 915.0 | 2159.0 | 1400.0 | 2800.0 | 641.0 | 15.0 | 11.0 | 3.0 | Active | Active | Active | Active | Inactive |
| 2 | 0 | 2022-06-08 | Wednesday | 21000 | 5225.0 | 7384.0 | 1400.0 | 4200.0 | -3184.0 | 15.0 | 4.0 | 3.0 | Active | Active | Active | Active | Active |
| 3 | 0 | 2022-06-09 | Thursday | 21000 | 4402.0 | 11786.0 | 1400.0 | 5600.0 | -6186.0 | 15.0 | 3.0 | 3.0 | Active | Active | Active | Active | Active |
| 4 | 0 | 2022-06-10 | Friday | 21000 | 3202.0 | 14988.0 | 1400.0 | 7000.0 | -7988.0 | 15.0 | 2.0 | 2.0 | Active | Active | Active | Active | Active |

Speeding up Training and Inference

To speed up training and inference, you can use the sklearnex package, which provides optimized implementations of certain scikit-learn functions for Intel CPUs. To enable this optimization, install sklearnex and uncomment the following lines:

```python
# Uncomment the following lines for faster training and inference if you have sklearnex installed and are using an Intel CPU

# import numpy as np
# from sklearnex import patch_sklearn
# patch_sklearn()
```

Defining the Reward Function

To guide the RL model’s learning, we need to define a reward function. This function evaluates the quality of a given campaign state. The RL model then optimizes its decisions to maximize the reward.

In our case, the reward is the negative absolute difference between the target and actual impressions delivered so far (i.e., it penalizes both over- and under-delivery). While this example reward function is straightforward, it can be extended to account for additional factors such as budget utilization or audience targeting.

Implementation

# Function to calculate the reward
def calculate_reward(row):
    # Reward is the negative absolute difference between the target and actual total impressions so far
    # (equivalently, the negative absolute value of missing_impressions)
    reward = -abs(row.total_impressions - row.target_total_impressions)
    return reward
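One way to extend the reward is sketched below: a hypothetical variant that also penalizes overspending. The daily_spend and daily_budget columns and the budget_weight parameter are illustrative assumptions and are not part of this dataset.

```python
# Hypothetical extension: additionally penalize spend above the daily budget.
# daily_spend, daily_budget and budget_weight are illustrative; this dataset contains no spend columns.
def calculate_reward_with_budget(row, budget_weight=0.5):
    impression_gap = abs(row.total_impressions - row.target_total_impressions)
    overspend = max(row.daily_spend - row.daily_budget, 0)
    return -(impression_gap + budget_weight * overspend)
```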

Apply the reward function to the dataset

# Apply the reward calculation
df_historical_fb_campaigns['reward'] = df_historical_fb_campaigns.apply(calculate_reward, axis=1)
df_historical_fb_campaigns.head()
|   | campaign_id | date | day_of_week | impression_goal | daily_impressions | total_impressions | target_daily_impressions | target_total_impressions | missing_impressions | total_days | remaining_days | set_cpm | feed_facebook_status | instant_article_facebook_status | facebook_stories_facebook_status | marketplace_facebook_status | right_hand_column_facebook_status | reward |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2022-05-30 | Monday | 21000 | 1244.0 | 1244.0 | 1400.0 | 1400.0 | 156.0 | 15.0 | 13.0 | 3.0 | Active | Active | Active | Active | Inactive | -156.0 |
| 1 | 0 | 2022-06-01 | Wednesday | 21000 | 915.0 | 2159.0 | 1400.0 | 2800.0 | 641.0 | 15.0 | 11.0 | 3.0 | Active | Active | Active | Active | Inactive | -641.0 |
| 2 | 0 | 2022-06-08 | Wednesday | 21000 | 5225.0 | 7384.0 | 1400.0 | 4200.0 | -3184.0 | 15.0 | 4.0 | 3.0 | Active | Active | Active | Active | Active | -3184.0 |
| 3 | 0 | 2022-06-09 | Thursday | 21000 | 4402.0 | 11786.0 | 1400.0 | 5600.0 | -6186.0 | 15.0 | 3.0 | 3.0 | Active | Active | Active | Active | Active | -6186.0 |
| 4 | 0 | 2022-06-10 | Friday | 21000 | 3202.0 | 14988.0 | 1400.0 | 7000.0 | -7988.0 | 15.0 | 2.0 | 2.0 | Active | Active | Active | Active | Active | -7988.0 |

Dataset Preparation

To train the RL agent, we prepare the dataset by specifying:

  • State columns:
    Variables describing the campaign’s current environment (e.g., daily_impressions, day_of_week, etc.).

  • Action columns:
    Variables representing the campaign settings that can be controlled (e.g., set_cpm, feed_facebook_status).

  • Reward Column:
    The reward function output, indicating the quality of a decision.

  • Unit and time columns:
    Identifiers for each campaign and the temporal sequence of data points.

Example Configuration

state_cols = [
    'daily_impressions', 'day_of_week', 'impression_goal',
    'total_impressions', 'target_daily_impressions', 'missing_impressions',
]

action_cols = [
    'set_cpm', 'feed_facebook_status', 'instant_article_facebook_status',
    'facebook_stories_facebook_status', 'marketplace_facebook_status', 'right_hand_column_facebook_status',
]

reward_col = 'reward'
timestamp_col = 'date'
unit_col = 'campaign_id'

Configuration

Besides the columns mentioned above, we can also adjust the number of lookback timesteps used to predict how the campaign will evolve in the future. In our case, we consider the last 7 days of data when making a decision. Next, the dataset is initialized: pi_optimal preprocesses the features and prepares the data for training.

import pi_optimal as po

LOOKBACK_TIMESTEPS = 7
historical_dataset = po.datasets.timeseries_dataset.TimeseriesDataset(
    df=df_historical_fb_campaigns,
    state_columns=state_cols,
    action_columns=action_cols,
    reward_column=reward_col,
    timestep_column=timestamp_col,
    unit_index=unit_col,
    lookback_timesteps=LOOKBACK_TIMESTEPS)
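To get an intuition for what a 7-step lookback means, the snippet below simply pulls the most recent LOOKBACK_TIMESTEPS rows of one campaign. This is only an illustration of the kind of window the agent conditions on, not how pi_optimal builds its internal representation.

```python
# Illustration only: the last 7 days of state features for campaign 0,
# i.e. the window of recent history available when a decision is made.
example_campaign = df_historical_fb_campaigns[df_historical_fb_campaigns['campaign_id'] == 0]
print(example_campaign.sort_values('date').tail(LOOKBACK_TIMESTEPS)[state_cols])
```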

Agent Initialization and Training

Using pi_optimal, we initialize and train an RL agent to optimize campaign performance. Constraints keep the recommendations feasible and cost-effective, for example by bounding the CPM or restricting which placements can be activated.

from pi_optimal.agents.agent import Agent
import numpy as np

agent = Agent()


agent.train(dataset=historical_dataset,
            constraints={
                # Values in order: CPM, Feed, Instant Article, Stories, Marketplace, Right Hand Column
                'min': np.array([0.1, "Active", "Active", "Active", "Active", "Active"]),
                'max': np.array([9.0, "Inactive", "Inactive", "Inactive", "Inactive", "Inactive"])
            })

Evaluation and Action Prediction

The trained RL agent is evaluated using unseen campaign data. For example, we select an underperforming campaign and analyze the agent’s recommendations for the last seven days.

Key Steps:

  1. Extract recent campaign data.
  2. Create an inference dataset using the same configuration as the training set.
  3. Use the agent to predict optimal actions.

Load Current Data

Here we select two campaigns that were not part of the training set. One campaign is underperforming, so the agent should increase the CPM; the other is overperforming, so the agent should decrease the CPM. We start with the underperforming campaign.

# Load the current adset control data

# df_current_adset_control = pd.read_csv('data/fb_current_over_deliver.csv', parse_dates=['date'])
df_current_adset_control = pd.read_csv('data/fb_current_under_deliver.csv', parse_dates=['date'])

# Apply the reward calculation
df_current_adset_control["reward"] = df_current_adset_control.apply(calculate_reward, axis=1)
from util_plot import plot_campaign

plot_campaign(df_current_adset_control.copy())

Example Social Campaign

Create Current Dataset

For this campaign, we would like to see what happens if we let the agent make the decisions for the last 7 days. After cutting off the last 7 days of the campaign, we can create an inference dataset for the agent to predict the optimal actions.

df_current_adset_control = df_current_adset_control.iloc[:-7]

current_dataset = po.datasets.timeseries_dataset.TimeseriesDataset(
    df=df_current_adset_control,
    dataset_config=historical_dataset.dataset_config,
    lookback_timesteps=LOOKBACK_TIMESTEPS,
    train_processors=False,
    is_inference=True)

Predict Optimal Actions

best_actions = agent.predict(current_dataset)

Interpreting the Results

The agent provides a sequence of optimal actions for the time horizon. Here we print them:

for i in range(len(best_actions)):
    print(f"Timestep {i}:")
    print("Maximum CPM:", best_actions[i][0])
    print("Feed Facebook Status:", best_actions[i][1])
    print("Instant Article Facebook Status:", best_actions[i][2])
    print("Facebook Stories Facebook Status:", best_actions[i][3])
    print("Marketplace Facebook Status:", best_actions[i][4])
    print("Right Hand Column Facebook Status:", best_actions[i][5])
    print()
    print("--------------------")
    print()
Timestep 0:
Maximum CPM: 4.1440664897204105
Feed Facebook Status: Inactive
Instant Article Facebook Status: Inactive
Facebook Stories Facebook Status: Active
Marketplace Facebook Status: Inactive
Right Hand Column Facebook Status: Active

--------------------

Timestep 1:
Maximum CPM: 5.8248877751997234
Feed Facebook Status: Inactive
Instant Article Facebook Status: Inactive
Facebook Stories Facebook Status: Active
Marketplace Facebook Status: Active
Right Hand Column Facebook Status: Active

--------------------

Timestep 2:
Maximum CPM: 5.629406419293266
Feed Facebook Status: Active
Instant Article Facebook Status: Inactive
Facebook Stories Facebook Status: Active
Marketplace Facebook Status: Active
Right Hand Column Facebook Status: Active

--------------------

Timestep 3:
Maximum CPM: 7.144393154823279
Feed Facebook Status: Inactive
Instant Article Facebook Status: Active
Facebook Stories Facebook Status: Inactive
Marketplace Facebook Status: Inactive
Right Hand Column Facebook Status: Active

--------------------
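For easier inspection, the recommended actions can also be collected into a DataFrame. This assumes best_actions behaves like an array with one row per timestep and one entry per action column, which matches the printout above.

```python
# Assumes best_actions is array-like of shape (planning horizon, number of action columns)
df_best_actions = pd.DataFrame(best_actions, columns=action_cols)
print(df_best_actions)
```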

Visualization of Scenarios

pi_optimal provides visualization tools to analyze the agent’s recommendations. For instance, we can compare the agent’s suggested CPM values and placement statuses with actual performance, ensuring transparency and interpretability.

from pi_optimal.utils.trajectory_visualizer import TrajectoryVisualizer

trajectory_visualizer = TrajectoryVisualizer(agent, current_dataset, best_actions=best_actions)
trajectory_visualizer.display()

Trajectory Visualization


Conclusion

This notebook demonstrates how pi_optimal can be used to automate the control of Facebook campaigns so that a specific impression goal is reached. We have also seen how pi_optimal can be used to run hypothetical scenarios, which helps us understand how the agent would react to different settings.

Key Highlights

  • Dataset Preparation: The dataset was prepared for training by defining the key parameters required for the RL agent. There is no need to manually preprocess the data, as the pi_optimal package streamlines this process.
  • Agent Training: The RL agent was trained on the dataset, learning to optimize the campaign settings to reach the desired impressions goal.
  • Action Prediction: The agent predicted the optimal actions for two campaigns, one underperforming and one overperforming, to reach the desired impressions goal.
  • Simulation Visualization: The results of the agent were visualized to understand how the agent would react to different scenarios.

Next Steps

  1. Enhance Reward Function: Use custom reward functions that incorporate additional features, such as budget constraints, audience targeting, or creative performance. By designing more complex reward functions, you can guide the agent to make decisions that align with broader campaign objectives.

  2. Hyperparameter Tuning: Optimize the hyperparameters of the RL agent to improve its performance. By conducting a hyperparameter search, you can identify the optimal configuration that maximizes the agent's ability to reach the impressions goal.

  3. Real-Time Decision-Making: Implement the agent in a real-time decision-making system to automate the optimization of Facebook campaigns. By integrating the agent into your existing ad management platform, you can leverage its capabilities to enhance campaign performance and efficiency (a minimal sketch of such a loop follows below).
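The following is a minimal sketch of such a daily control loop, reusing only the pi_optimal calls shown earlier in this notebook. The load_latest_campaign_data and push_actions_to_meta functions are hypothetical placeholders for your own data export and Meta Marketing API integration.

```python
# Hypothetical daily control loop: the data loading and the Meta API push are placeholders;
# only the pi_optimal calls mirror the ones used earlier in this notebook.
def daily_campaign_update(agent, dataset_config, load_latest_campaign_data, push_actions_to_meta):
    df_live = load_latest_campaign_data()                      # e.g. an export of yesterday's campaign stats
    df_live['reward'] = df_live.apply(calculate_reward, axis=1)

    live_dataset = po.datasets.timeseries_dataset.TimeseriesDataset(
        df=df_live,
        dataset_config=dataset_config,
        lookback_timesteps=LOOKBACK_TIMESTEPS,
        train_processors=False,
        is_inference=True)

    next_actions = agent.predict(live_dataset)
    push_actions_to_meta(next_actions[0])                      # apply only the first recommended action
    return next_actions
```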

References

[1] https://investor.atmeta.com/home/