There are a lot of possibilities with your tool, but considering the current situation, what would you recommend we do first?
We strongly recommend beginning your journey into game personalization with a use case that delivers high impact while carrying minimal risk. This approach enables fast wins, builds internal confidence, and establishes the technical and organizational foundation needed to scale personalization across the game.
Ideal starting points include:
Interstitial Frequency Optimization
Dynamically adjusting interstitial ad frequency based on player context such as session length, engagement level, or churn risk can significantly enhance user experience while protecting monetization. For example, reducing ad exposure for highly engaged players or increasing it for at-risk users allows you to balance retention with revenue. This is a technically simple and low-risk use case that offers immediate insight into the value of personalization.
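For intuition only, a hand-written fallback for this kind of logic might look like the sketch below; the thresholds and field names (engagement_score, churn_risk) are hypothetical, and in practice the policy is learned per context rather than hard-coded.

```python
def interstitial_interval_seconds(player: dict) -> int:
    # Hypothetical thresholds; the learned policy replaces these hard-coded rules.
    if player["engagement_score"] > 0.8:
        return 300   # highly engaged: fewer interstitials to protect retention
    if player["churn_risk"] > 0.7:
        return 90    # at-risk: more ad exposure to protect monetization
    return 180       # default cadence
```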
Bid Floor Personalization
Personalizing bid floors based on signals like predicted lifetime value, engagement behavior, or monetization potential lets you unlock more revenue from ad impressions without sacrificing player experience. By intelligently matching the bid floor to the value of each impression, you can drive measurable gains in ARPDAU with minimal implementation overhead.
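As a simplistic illustration of the idea (not Metica's actual pricing logic), a bid floor could be scaled with predicted lifetime value; the base floor, scaling factor, and caps below are placeholders.

```python
def personalized_bid_floor(predicted_ltv: float, base_floor: float = 0.50) -> float:
    # Hypothetical mapping: scale the floor with predicted value, capped to keep fill rates healthy.
    multiplier = min(3.0, max(0.5, predicted_ltv / 10.0))
    return round(base_floor * multiplier, 2)

print(personalized_bid_floor(predicted_ltv=25.0))  # e.g. 1.25 for a high-value impression
```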
When should we use bandits vs. A/B testing?
A/B Testing: Best suited to situations where learning is the goal, e.g., evaluating new features.
Bandits: Best for real-time optimization and yield improvement of established features where success metrics are well-defined.
Core Proposition & Methodology
How does your Machine Learning determine what & when to offer each variant?
The model continuously learns from user behavior. If a player doesn’t respond to Variant A, the system will learn to try Variant B more often when it next sees another player with similar behavioral traits. With every new interaction the model’s decisioning quality improves as it seeks to maximise its success metric.
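Conceptually, this behaves like a contextual bandit keeping running estimates per context. The minimal epsilon-greedy sketch below is only an illustration of that idea, not the actual model; the data structures and exploration rate are hypothetical.

```python
import random
from collections import defaultdict

# Running average of the success metric per (context, variant) pair.
values = defaultdict(lambda: defaultdict(lambda: {"n": 0, "mean": 0.0}))

def choose_variant(context_key: str, variants: list, epsilon: float = 0.1) -> str:
    # Occasionally explore; otherwise exploit the best-known variant for this context.
    if random.random() < epsilon or not values[context_key]:
        return random.choice(variants)
    return max(variants, key=lambda v: values[context_key][v]["mean"])

def record_outcome(context_key: str, variant: str, reward: float) -> None:
    stats = values[context_key][variant]
    stats["n"] += 1
    stats["mean"] += (reward - stats["mean"]) / stats["n"]  # incremental mean update
```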
How does it determine what a segment is?
Segments are not predefined. Instead, they're dynamically formed from observed behavioral patterns and the user attributes (e.g., geography, UA source) selected as context user properties. Two users are deemed similar on the fly if they share similar traits. In rare cases, some user properties such as country are pre-grouped by Metica using embeddings to speed up the system's learning. For example, Chile and Bolivia can be passed to the model as a single geo rather than as distinct ones.
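To illustrate the pre-grouping idea (not Metica's actual embeddings or clustering), one could cluster small country embedding vectors and pass the cluster label to the model as a single geo:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D embeddings; real embeddings would be learned from behavioural data.
country_embeddings = {
    "CL": [0.81, 0.12],  # Chile
    "BO": [0.79, 0.15],  # Bolivia
    "US": [0.10, 0.92],
    "CA": [0.12, 0.88],
}
codes = list(country_embeddings)
X = np.array([country_embeddings[c] for c in codes])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Countries sharing a cluster are treated as a single geo input.
geo_group = {code: f"geo_{label}" for code, label in zip(codes, labels)}
print(geo_group)  # CL and BO land in the same group, US and CA in another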
How does your agent differentiate randomness from causality?
Using statistical methods like Bayesian inference and significance testing, the system continuously updates its understanding of what causes changes in outcomes, as opposed to random fluctuations.
Are models based on uncertainty?
Yes. Uncertainty is a core feature. The model balances exploration (trying unknowns) with exploitation (using what’s known to work), adapting its strategy based on confidence levels.
Do we actually have uncertainty in those models, or is this more random?
The model actively manages uncertainty. For example, during early exploration, it purposefully tests different options to learn, then gradually shifts toward exploiting known best-performing variants.
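As a toy example of uncertainty-driven exploration, Thompson sampling keeps a posterior per variant and samples from it, so high-uncertainty variants still get tried while confident winners are served more often. This is a generic sketch, not a description of Metica's internal model.

```python
import random

class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over variants (illustrative only)."""

    def __init__(self, variants):
        self.posteriors = {v: {"alpha": 1, "beta": 1} for v in variants}

    def choose(self) -> str:
        # Sample each variant's posterior; wider posteriors (more uncertainty) explore more.
        draws = {v: random.betavariate(p["alpha"], p["beta"]) for v, p in self.posteriors.items()}
        return max(draws, key=draws.get)

    def update(self, variant: str, success: bool) -> None:
        self.posteriors[variant]["alpha" if success else "beta"] += 1
```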
For training, will you use our available data?
Yes. Both historical and real-time data from your game are used to refine learning. While the model can learn from scratch, historical data helps guide initial segmentation and feature development.
Can historical data be used for the AI learning?
Yes. Although the model has a 7-day exploration period to learn from scratch, historical data can accelerate learning and improve early performance.
Is it possible to isolate the uplift from a specific feature like the IAP offer?
We do report per-variant performance against the holdout. However, users may be in multiple test groups, so it is difficult to attribute results directly to one feature due to overlapping influences.
When a user is first assigned during the initial exploration period, will they be re-assigned at the end of their AssignmentDuration or directly after the exploration period?
After their Assignment Duration.
When a user installs for the first time after the initial global exploration period, will there be a “personal” exploration period where they are randomly assigned?
No. Outside of the global exploration period, users are assigned based on their context and the expected most valuable variant. In a small percentage of cases where the uncertainty for a context is large, the model may decide to deliver the second-best variant in order to learn.
Data Integration
How long is the technical integration usually? What team size is required?
The data integration using the SDK takes most teams 1-2 days. The starter use cases, Bid Floors and Interstitial Frequency, take about 1 day to integrate. More bespoke use cases can be more involved; the effort depends on your design choices.
Should we send events for all users or just those on the SDK?
For all users. This enables better historical analysis and more robust comparisons.
Does the letter case of the country code matter for registerCountry?
No. Case-insensitive.
For clientIpAddress, which IP should be used?
The IP address at the time of the visit, not the installation.
Are platform names fixed (e.g., ios, android)?
No. You may use your internal labels, such as ios, gp (Google Play), or az (Amazon).
Is there a fixed set of values for deviceType?
Yes. Use standard categories: tablet, phone, desktop.
Are the uaNetwork, uaCampaign, and uaCreative parameters mandatory?
No. Partial info is acceptable. Send what’s available; updates can be appended later.
Is there a specific event template required for real-time learning?
Yes, we provide a standardized format which is customizable to suit your game’s schema.
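As a rough, non-authoritative sketch, an ingestion call could look like the snippet below; the endpoint URL, event name, and any field not discussed in the questions above are placeholders rather than Metica's actual schema.

```python
import requests

# Placeholder endpoint and event name; the field values mirror the guidance above.
event = {
    "eventType": "adImpression",           # hypothetical event name
    "userId": "user-123",
    "country": "us",                       # case-insensitive
    "clientIpAddress": "203.0.113.7",      # IP at time of visit, not installation
    "platform": "gp",                      # internal labels accepted (ios, gp, az, ...)
    "deviceType": "phone",                 # tablet | phone | desktop
    "uaNetwork": "network_a",              # optional; send what is available
}
requests.post("https://ingest.example.com/events", json=event, timeout=5)
```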
Platform Information
What is the Assignment Duration and Attribution Window?
Assignment Duration: The time a user remains in their assigned group even if their context changes.
Attribution Window: The duration post-assignment during which revenue is considered for performance measurement.
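As a worked example, assuming a 14-day Assignment Duration and a 7-day Attribution Window (both values are illustrative): a user assigned on day 0 stays in their group until day 14 even if their context changes, and only revenue generated between day 0 and day 7 counts toward the performance measurement. A minimal check of the attribution rule:

```python
from datetime import datetime, timedelta

def revenue_is_attributed(assigned_at: datetime, revenue_at: datetime, window_days: int = 7) -> bool:
    # Revenue counts toward the experiment only inside the attribution window.
    return assigned_at <= revenue_at <= assigned_at + timedelta(days=window_days)
```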
What is Context?
You can think of Metica's system as "segmentation on steroids".
Users are grouped by what we call "contexts": essentially segmentation criteria such as platform and country, but also behavioural criteria including player progression and their recent ad and IAP revenues.
The platform then uses a reinforcement learning model that learns, for each individual player based on their current context, which variant has the highest expected success metric.
The machine optimises directly for the success metric (mostly revenue); for monitoring we use additional metrics such as games played, impressions, CPMs, and retention.
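For concreteness, a single player's context might be represented as a small set of properties like the sketch below; the exact property names and values are illustrative, and the property set is configured per use case.

```python
# Illustrative context snapshot for one player.
context = {
    "platform": "android",
    "country": "us",
    "progression_level": 42,       # player progression
    "ad_revenue_last_7d": 0.35,    # recent ad revenue (USD)
    "iap_revenue_last_7d": 4.99,   # recent IAP revenue (USD)
}
```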
How much does each context contribute to the assignment?
We have a set of internal reports that track the performance of each experiment. However, this functionality is not currently available on the platform.
What is the Primary Success Metric for?
The primary success metric is the main outcome used to judge if an experiment achieved its goal, chosen before the test and aligned with its objective.
How do we decide which success metric to use?
Revenue is recommended in 90%+ of cases, as it is closest to what you are actually optimizing for.
How does the Secondary Metric work?
It’s used for monitoring and analysis but doesn’t influence the machine learning system. Only the primary success metric drives variant allocation decisions.
Does having both a primary and a secondary success metric affect performance?
Secondary metrics are used strictly for monitoring purposes and do not influence model learning or delivery. To incorporate multiple metrics into learning, a hybrid primary metric would need to be designed.
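As a purely hypothetical example of such a hybrid, ad and IAP revenue could be blended with a retention bonus into a single number; the weights and the retention term below are placeholders that would need to be agreed before launch.

```python
def hybrid_success_metric(ad_revenue: float, iap_revenue: float, retained_d7: bool,
                          retention_bonus: float = 0.50) -> float:
    # Hypothetical blend: total revenue plus a fixed bonus if the user is retained at day 7.
    return ad_revenue + iap_revenue + (retention_bonus if retained_d7 else 0.0)
```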
What should be the baseline variant?
Always use your current default experience as the baseline to properly compare uplift.
Can test parameters or variants be changed after launch?
Yes. Arms can be added/removed, and filters updated without affecting the model, unless bandit parameters themselves are altered. Contexts should remain the same.
What are the requirements for experimentation?
At least 100 responses per variant per week
7-day learning phase
A statistically significant improvement of at least 3%
Can users participate in multiple use cases?
Yes, with appropriate configuration to avoid interference.
How is the holdout group assigned?
Randomized via a seed/user ID pair. Sample size and diversity are ensured via statistical checks such as winsorization and regression balancing.
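One common way to implement a seed/user ID assignment (shown here only as a sketch, not necessarily Metica's exact scheme) is to hash the pair into a deterministic bucket; the 10% default is a placeholder.

```python
import hashlib

def in_holdout(user_id: str, seed: str, holdout_pct: float = 10.0) -> bool:
    # Hash the seed/user ID pair into a stable bucket in [0, 100); below the cutoff -> holdout.
    digest = hashlib.sha256(f"{seed}:{user_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10000) / 100.0
    return bucket < holdout_pct
```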
When generating the holdout seed, should it be the same or different across experiments?
There are pros and cons to configuring a different holdout seed for each new experiment. When two experiments use different holdout seeds, they become completely independent of one another; the benefit is that this lets you understand the isolated effect of any given use case on revenue.
Is there a way to view holdouts within the platform?
Not at the moment, but the team is actively working on adding this feature. In the meantime, we can provide Google Sheet outputs and periodic snapshots.
How do you handle outliers?
Using Winsorization techniques to limit skew from extreme data points.
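For example, winsorizing per-user revenue at the 99th percentile caps extreme spenders without discarding them; a minimal sketch (the percentile cutoffs are illustrative):

```python
import numpy as np

def winsorize(values, lower_pct: float = 0.0, upper_pct: float = 99.0):
    # Clip values to the chosen percentiles so a few extreme spenders do not dominate the comparison.
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, lo, hi)
```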
How is uplift measured?
By comparing variant performance against the holdout (control) group on the primary success metric, most often revenue.
What is z_stat?
The z-statistic of the significance test; it maps to confidence levels, e.g., 1.65 ≈ 90% and 1.96 ≈ 95%.
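For a difference in per-user revenue means, the z-statistic can be computed roughly as below (a generic two-sample sketch applied after winsorization, not necessarily the platform's exact test):

```python
import numpy as np

def z_stat(variant_revenue, control_revenue) -> float:
    # Two-sample z-test on per-user revenue: difference in means over its standard error.
    v = np.asarray(variant_revenue, dtype=float)
    c = np.asarray(control_revenue, dtype=float)
    se = np.sqrt(v.var(ddof=1) / len(v) + c.var(ddof=1) / len(c))
    return (v.mean() - c.mean()) / se  # |z| >= 1.65 ~ 90% confidence, >= 1.96 ~ 95%
```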
How many variants should be tested?
As many as required; however, we recommend 3-5 as optimal to balance learning speed with exploration depth. The variants should be diverse for optimal performance.
How critical is device memory as a context?
Very important. It’s one of the most powerful predictors in performance analysis.
Can the holdout evolve over time?
Yes, with caution. Decreasing the holdout size is safe. Increasing it risks including previously exposed users, potentially biasing results.
How does the allocation system work during the exploration phase?
During the exploration phase (the first 7 days), users are allocated equally across variants; only afterwards does the system start adjusting allocations and moving users around. From then on, the bandits apply a dynamic rate of exploitation (going for the highest revenue) and exploration (learning more), which depends on the level of confidence the model has for specific contexts.
Bid Floor Personalization
Why do publishers need Metica for bid floor optimization?
Static, rule-based approaches are blunt tools that leave revenue on the table. Metica uses real-time user behavior to dynamically optimize bid floors at the individual level, unlocking significantly higher revenue and performance that manual segmentation simply can’t match.
How quickly does Metica return bid floor recommendations?
Within 200ms of receiving user data.
How often are bid floors updated?
Currently once per session. We’re developing more frequent update capabilities.
How does Metica refine bid floors over time?
By analyzing user behavior, engagement, fill rates, CPM trends, and more, learning from each session.
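A toy version of such a feedback rule might look like the following; the thresholds and multipliers are placeholders, and the real system learns these adjustments from the signals listed above rather than applying fixed rules.

```python
def refine_bid_floor(current_floor: float, fill_rate: float, avg_cpm: float) -> float:
    # Illustrative feedback rule: back off when fill drops, push up when fill is healthy and CPMs allow it.
    if fill_rate < 0.80:
        return round(current_floor * 0.9, 2)
    if avg_cpm > current_floor * 1.5:
        return round(current_floor * 1.1, 2)
    return current_floor
```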