Introducing Conductrics Market Research

The Fusion of Experimentation with Market Research

Customer Centric

For over a decade, Conductrics has been providing innovative software for online experimentation and optimization. Our innovations include the industry’s first API-driven A/B Testing and multi-armed bandit software, and we were the first to introduce transparency into machine-learning predictive targeting with human-interpretable contextual bandits.

Working alongside our client partners, it became clear that even though tools for experimentation and optimization are extremely powerful, until now they have been inherently limited. Limited because, no matter the type of A/B Test or ML algorithm, experimentation has lacked visibility into a key and necessary source of information – the voice of the customer.

We are excited to announce Conductrics Market Research, a first-of-its-kind solution that integrates optimization with market research. Current approaches require separate tools and treat experimentation and research as separate solutions. We believe the true power of customer experience optimization can only be unlocked when experimentation and research are treated as one, like a film and its soundtrack: full understanding and meaning emerge only when both are perfectly synced with one another.

Top-level benefits of combining research with experimentation and optimization include:

  1. Learn directly from customers about what they care about and their unmet needs, to feed your experimentation program and drive the development of new products and new customer experiences.
  2. Learn not only ‘what’ works but ‘why’ it works, by adding direct customer feedback alongside customer behavior in your A/B Test metrics.
  3. Discover whether new customer experiences and product features improve customer attitudes and long-term loyalty (e.g. NPS) as well as sales and conversions.
  4. Tailor the customer’s journey in real time, and show customers that you are actually listening, by delivering customized online experiences based on their specific survey feedback.
  5. Strengthen your culture of experimentation by improving internal messaging: combine direct customer quotes and your tests’ impact on NPS with your current statistical reporting.

New Platform Capabilities

  • Easy Setup and Launch of Native Surveys directly from Conductrics
  • Connect any A/B Test to any Survey: 
    • Numeric and Yes/No survey responses can be used directly as goals for A/B Tests.
    • Collect and connect qualitative/open text customer responses back to each A/B Test treatment experience. 
  • Append in-session user data (e.g. logged-in status, loyalty status, A/B test treatments) to survey response data for enhanced survey reporting.
  • Create in-session user traits directly from survey responses and attach targeting rules to them to customize online user experience in real-time or to run targeted A/B tests.
  • Attach any standard behavioral goals, such as sales, to surveys to auto-generate A/B Tests to determine if offering the survey adversely affects sales or conversions.  

Conductrics Market Research

As we roll out the new integrated platform, customers will gain access to Conductrics Market Research and see it as a new top-level capability alongside A/B Testing, Predictive Targeting, and Rule-Based Targeting.

For over a decade, Conductrics has been innovating and delivering industry-first capabilities. We are very excited to be the first to combine direct voice-of-the-customer data with traditional experimentation data to provide the integrated capabilities needed to create better user experiences, drive long-term loyalty, and increase direct revenue.

Get a first look at some of the unique capabilities rolling out at Conductrics for your experimentation and research teams.

To learn more contact us here.


Conductrics Market Research Features

A First Look

In this post we take a first look at some of the new features in the Conductrics Market Research release. We will follow up over the coming weeks and months with more details on the new capabilities.

As we roll out the new integrated platform, customers will gain access to Conductrics Market Research and see it as a new top-level capability alongside A/B Testing, Predictive Targeting, and Rule-Based Targeting.

When creating a new survey, along with the standard survey creation options, there are three primary additional features that are unique to Conductrics. 

1 Survey Responses as A/B Test Goals

When creating a numeric survey question, you will be given the option to ‘Track as a Goal/Conversion’ event. If this option is selected, the value of this question can be used as a goal in any of your A/B Tests – just like any other behavioral measure.

For example, say we have a three-question survey and we would like to capture ‘How Likely To Recommend’ as a survey response, but we would also like to see how an upcoming A/B Test might affect the likelihood to recommend. By clicking the option to track the Recommend question as a goal, all new A/B Tests will be eligible to use this survey response as a conversion goal.

2 Auto-Create NPS Goals

While any numeric survey question can be used as an A/B Testing goal, often what the C-suite really cares about is changes to NPS. By also selecting the optional ‘Enable for Net Promoter Score (NPS)’ shown in the image above, Conductrics will auto-create four NPS-related goals:

  1. NPS score: (proportion of Promoters – proportion of Detractors) × 100
  2. Number of Promoters
  3. Number of Detractors
  4. Number of Passives

What is so powerful about this is that you can now see NPS scores – with their associated confidence intervals and test statistics – in your A/B Testing reports alongside your sales and conversion goals.

We believe this is the only survey solution, even including dedicated survey tools, that provides confidence intervals around NPS and the ability to use NPS in A/B Tests.
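To make the NPS arithmetic concrete, here is a minimal sketch (illustrative Python, not Conductrics code) that classifies raw 0–10 responses into the four auto-created goals and attaches a normal-approximation confidence interval to the NPS score:

```python
import math

def nps_summary(scores, z=1.96):
    """Summarize 0-10 'likely to recommend' responses.

    Promoters score 9-10, Passives 7-8, Detractors 0-6. The interval
    uses a normal approximation, treating each respondent as
    +1 (promoter), 0 (passive), or -1 (detractor).
    """
    vals = [1 if s >= 9 else (-1 if s <= 6 else 0) for s in scores]
    n = len(vals)
    nps = 100.0 * sum(vals) / n          # (prop. promoters - prop. detractors) * 100
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    se = 100.0 * math.sqrt(var / n)
    return {
        "nps": nps,
        "promoters": vals.count(1),
        "passives": vals.count(0),
        "detractors": vals.count(-1),
        "ci95": (nps - z * se, nps + z * se),
    }
```

For example, the responses [10, 9, 9, 10, 8, 7, 6, 0] contain four promoters, two passives, and two detractors, giving an NPS of (4/8 − 2/8) × 100 = 25.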

3 Real-Time Customer Experience Customization 

As part of the survey creation workflow, there is also the option of assigning survey responses as user traits – for filtering and enriching A/B Test reporting, or for creating real-time customized in-session customer experiences.

For example, if we wanted to customize the customer experience based on how a user responded to the “What is the Purpose of Your Visit?” question, we just select ‘Retain Response as Visitor Trait’ in the question setup, and the value of ‘Purpose of Visit’ will automatically be stored as the in-session Conductrics visitor trait ‘Purpose’.

This makes this information immediately available to all reporting, A/B Testing, and Predictive Targeting modules within Conductrics.

To customize the in-session experience we can use the Conductrics Rule-Based Targeting module. Once the above survey is saved, it will auto-populate the Conductrics User Targeting Conditions builder. Below we see that the rules builder auto-generated a user trait called ‘Purpose’ with the four associated survey response values as options. These can be used either directly or in any logical combination with any other set of in-session visitor traits.

To keep it simple, we set up a collection of rules that trigger the relevant in-session user experience based solely on the user’s response to the ‘Purpose of Visit’ question. In the following screenshot, we show our variations, or user experiences, on the left; on the right are the targeting rules that trigger and deliver each experience based on the user’s response to the ‘Purpose of Visit’ question. These user experiences can be content from Conductrics web creation tools, backend content like feature flags, or even different server-side algorithms.

For example, if a customer submits a survey and answers the ‘What is the Purpose of Your Visit?’ question with ‘To Purchase Now’, they will immediately become eligible for the targeting rule that delivers the ‘Product Offers’ content.
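As a rough mental model (not the Conductrics rules builder itself), the targeting logic behaves like a lookup from the retained ‘Purpose’ trait to a content variation. Only the ‘To Purchase Now’ → ‘Product Offers’ pairing comes from the example above; every other response value and experience below is a hypothetical placeholder:

```python
# Hypothetical sketch of survey-driven targeting. Only 'To Purchase Now'
# -> 'Product Offers' comes from the example; the other response values
# and experiences are invented placeholders.
TARGETING_RULES = {
    "To Purchase Now": "Product Offers",
    "Just Browsing": "Inspiration Gallery",   # placeholder
    "Customer Support": "Support Shortcut",   # placeholder
    "Check Order Status": "Order Tracker",    # placeholder
}

def experience_for(traits, default="Default Experience"):
    """Pick the variation to deliver based on the 'Purpose' visitor trait."""
    return TARGETING_RULES.get(traits.get("Purpose"), default)
```

A visitor whose trait is ‘To Purchase Now’ is routed to the ‘Product Offers’ content; visitors with no retained trait fall through to the default experience.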

Survey Response Data

Of course, Conductrics Market Research also provides survey reporting to aid human decision-making. Conductrics provides simple-to-use filtering and summary features to help you more easily understand what your customers are trying to communicate to you.

Along with the tabular view, there is also a card view, so you can ‘flip through’ individual survey responses and see all of an individual customer’s responses, enriched with all of the associated in-session data available to the Conductrics platform.

Optionally, you can download the in-session enriched survey data for use in your favorite statistical software package.

Are Surveys Affecting Conversions?

An important question that arises with in-session market research is whether offering a survey in a sensitive area of the site might adversely affect customer purchases or conversions. Conductrics provides the answer. Simply assign a behavioral conversion goal, like sales or sign-ups, to the survey. Conductrics will automatically run a special A/B Test alongside the survey, comparing all eligible users who were offered the survey against those who were not, to measure the impact the survey offer has on conversion.

Now your market research teams can learn where and when it is okay to place customer surveys, by determining whether conversions are actually affected and, if so, whether the impact is large enough to offset the value of direct customer feedback.
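Under the hood, a comparison like this is a standard two-proportion test on offered-vs-holdout conversion rates. A minimal sketch (normal approximation; not the exact statistics Conductrics reports):

```python
import math

def survey_impact(conv_offered, n_offered, conv_holdout, n_holdout):
    """Difference in conversion rate and z statistic, survey vs holdout."""
    p1, p2 = conv_offered / n_offered, conv_holdout / n_holdout
    pooled = (conv_offered + conv_holdout) / (n_offered + n_holdout)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_offered + 1 / n_holdout))
    return p1 - p2, (p1 - p2) / se
```

A z statistic near zero suggests the survey offer is not moving conversion; a large negative z flags a placement worth reconsidering.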

For over a decade, Conductrics has been innovating and delivering industry-first capabilities. We are very excited to be the first to combine direct voice-of-the-customer data with traditional experimentation data to provide the integrated capabilities needed to create better user experiences, drive long-term loyalty, and increase direct revenue. This is just a first quick look at some of the unique capabilities rolling out at Conductrics for your experimentation and research teams.

To learn more contact us here.


AB Testing: Ruling Out To Conclude

The seemingly simple ideas underpinning AB Testing can be confusing. Rather than getting into the weeds around the definitions of p-values and significance, AB Testing might be easier to understand if we reframe it as a simple ruling-out procedure.

Ruling Out What?

There are two things we are trying to rule out when we run AB Tests:

  1. Confounding
  2. Sampling Variability/Error

Confounding is the Problem; Random Selection is a Solution

What is Confounding?

Confounding occurs when unobserved factors that can affect our results are mixed in with the treatment we wish to test. A classic example of potential confounding is the effect of education on future earnings. While people with more years of education tend to have higher earnings, economists like to ask whether the extra education drives earnings, or whether natural ability, which is unobserved, determines both how many years of education and how much in earnings people receive. Here is a picture of this:

Ability as Confounder DAG

We want to be able to test whether there is a direct causal relationship between education and earnings, but what this simple DAG (Directed Acyclic Graph) shows is that education and earnings might be jointly determined by ability – which we can’t directly observe. So we won’t know if it is education that is driving earnings or if earnings and education are both just outcomes of ability.

The general picture of confounding looks like this:

General DAG

What we want is a way to break the connection between the potential confounder and the treatment.

Randomization to the Rescue

Amazingly, if we are able to randomize which subjects are assigned to each treatment, we can break, or block, the effect of unobserved confounders and make causal statements about the effect of the treatment on the outcome of interest.

Randomization breaks Confounding

Why? Since assignment is made by random draw, the user – and hence any potential confounder – is no longer mixed in with the treatment assignment. You might say the confounder no longer gets to choose its treatment. For example, if we were able to randomly assign people to education levels, then high- and low-ability students would be equally likely to land in the low- and high-education groups, and their effect on earnings would balance out, on average, leaving just the direct effect of education on earnings. Random assignment lets us rule out potential confounders, allowing us to focus on the causal relationship between treatment and outcomes*.
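A small simulation illustrates the point. Below, unobserved ‘ability’ raises both education and earnings, while education itself has no direct effect on earnings; the naive (confounded) comparison still finds a large earnings gap, whereas coin-flip assignment finds essentially none. (Illustrative only; the model and numbers are made up.)

```python
import random

def education_gap(randomized, n=100_000, seed=42):
    """Mean earnings gap between high- and low-education groups."""
    rng = random.Random(seed)
    hi, lo = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)                     # unobserved confounder
        if randomized:
            educated = rng.random() < 0.5             # random assignment
        else:
            educated = ability + rng.gauss(0, 1) > 0  # ability picks education
        earnings = 2.0 * ability + rng.gauss(0, 1)    # NO education effect
        (hi if educated else lo).append(earnings)
    return sum(hi) / len(hi) - sum(lo) / len(lo)
```

In this toy model the confounded gap is large (all of it due to ability), while the randomized gap is statistically indistinguishable from the true direct effect of zero.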

So are we done? Not quite. We still have to deal with uncertainty that is introduced whenever we try to learn from sample observations.

Sampling Variation and Uncertainty

Analytics is about making statements about the larger world via induction – the process of observing samples from the environment, then applying the tools of statistical inference to draw general conclusions. One aspect of this that often goes underappreciated is that there is always some inherent uncertainty due to sampling variation. Since we never observe the world in its entirety, but only finite, random samples, our view of it will vary based on the particular sample we draw. This is the reason for the tools of statistical inference – to account for this variation when we try to draw conclusions.

A central idea behind induction/statistical inference is that we are only able to make statements about the truth within some bound, or range, and that bound only holds in probability.

For example, suppose the true value is represented as the little blue dot below. But this is hidden from us.


Instead what we are able to learn is something more like a smear.

We Learn in Smears

The smear tells us that the true value of the thing we are interested in will lie somewhere between x and x′ with some probability P. So there is some probability 1 − P, perhaps 0.05, that our smear won’t cover the true value.

Truth not under Smear

This means that there are actually two interrelated sources of uncertainty:

1) the width, or precision, of the smear (more formally called a bound)

2) the probability that the true value will lie within the smear rather than outside of its upper and lower range.

Given a fixed sample (and a given estimator), we can reduce the width of the smear (make it tighter, more precise) only by reducing the probability that the truth lies within it – and vice versa: we can increase the probability that the truth lies in the smear only by increasing its width (making it looser, less precise). The confidence interval is an example of this more general concept – we say the treatment effect likely lies within some interval (bound) with a given probability (say 0.95). We will always be limited in this way. Yes, we can decrease the width, and increase the probability that it holds, by increasing our sample size, but always with diminishing returns (on the order of O(1/√n)).
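The diminishing returns are easy to see numerically: the half-width of a normal-approximation interval scales as 1/√n, so quadrupling the sample size only halves the smear. A quick sketch:

```python
import math

def smear_half_width(sd, n, z=1.96):
    """Half-width of a 95% normal-approximation interval (the 'smear' radius)."""
    return z * sd / math.sqrt(n)

# Each 4x increase in sample size only halves the width.
for n in (100, 400, 1600, 6400):
    print(n, round(smear_half_width(1.0, n), 4))
```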

AB Tests and P-values: Ruling Out Sampling Variation

Assuming we have collected the samples appropriately, and certain assumptions hold, then with potential confounders removed there are just two potential sources of variation between our A and B interventions:

1) the inherent sampling variation that is always part of sample based inference that we discussed earlier; and
2) a causal effect – the effect on the world that we hypothesize exists when doing B vs A.

AB Tests are a simple, formal process to rule out, in probability, the sampling variability. Through the process of elimination, if we rule out sampling variation as the main source of the observed effect (with some probability), then we might conclude the observed difference is due to a causal effect. The p-value – the probability of seeing the observed difference, or greater, due to random sampling alone – relates to the threshold we will tolerate in order to rule out sampling variation as a likely source of the observed difference.
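The rule-out step itself is just a tail-probability computation. Given a z statistic for the observed A/B difference, a sketch of the two-sided p-value under the normal approximation:

```python
import math

def p_value_two_sided(z):
    """P(seeing a difference at least this large from sampling alone)."""
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# Rule-out logic: if p is below our tolerance (say 0.05), we rule out
# sampling variation as a likely explanation and conclude a causal effect.
def rule_out_sampling(z, alpha=0.05):
    return p_value_two_sided(z) < alpha
```

For instance, z = 1.96 gives a p-value of roughly 0.05, the conventional boundary for ruling out sampling variation.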

For example, in the first case we might not be willing to rule out sampling variability, since our smears overlap with one another – indicating that the true value of each might well be covered by either smear.

Don’t Rule Out Sampling Error

However, in this case, where our smears are mostly distinct from one another, we have little evidence that sampling variability alone could lead to such a difference between our results, and hence we might conclude the difference is due to a causal effect.

Result unlikely due to Sampling Error Alone – Rule Out

So we look to rule out in order to conclude.**

To summarize: AB Tests/RCTs randomize treatment selection to generate random samples from each treatment, blocking confounding so that we can safely use the tools of statistical inference to make causal statements.

* RCTs are not the only way to deal with confounding. When studying the effect of education on earnings, unable to run RCTs, economists used the method of instrumental variables to try to deal with confounding in observational data.

** Technically, ‘reject the null’ – think of tennis if ‘null’ trips you up; it’s like zero. We ask: is there evidence, after we account for the likely difference due to sampling, to reject that the difference we see (e.g. the observed difference in conversion rate between B and A) is due to just sampling variation?

***If you want to learn about other ways of dealing with confounding beyond RCTs a good introduction is Causal Inference: The Mixtape – by Scott Cunningham.


Some are Useful: AB Testing Programs

As AB Testing becomes more commonplace, companies are moving beyond thinking about how best to run experiments to how best to set up and run experimentation programs. Unless the required time, effort, and expertise are invested in designing and running the AB Testing program, experimentation is unlikely to be useful.

Interestingly, some of the best guidance for getting the most out of experimentation can be found in a paper published almost 45 years ago by George Box. If that name rings a bell, it is because Box is credited with coining the phrase “All models are wrong, but some are useful”. In fact, from the very same paper that this phrase comes from, we can draw some guiding principles for running a successful experimentation program.

In 1976, Box published Science and Statistics in the Journal of the American Statistical Association. In it he discusses what he considers to be the key elements to successfully applying the scientific method. Why might this be useful for us? Because in a very real sense, experimentation and AB Testing programs are the way we apply the scientific method to business decisions. They are how companies DO science. So learning how best to employ the scientific method directly translates to how we should best set up and run our experimentation programs.

Box argues that the scientific method is made up, in part, of the following:
1) Motivated Iteration
2) Flexibility
3) Parsimony
4) Selective Worry

According to Box, the attributes of the scientific method can best be thought of as “motivated iteration in which, in succession, practice confronts theory, and theory, practice.” He goes on to say that, “Rapid progress requires sufficient flexibility to profit from such confrontations, and the ability to devise parsimonious but effective models [and] to worry selectively …”.

Let’s look at what he means in a little more detail and how it applies to experimentation programs. 

Learning and Motivated Iteration

Box argues that learning occurs through the iteration between theory and practice. Experimentation programs formalize the process for continuous learning about marketing messaging, customer journeys, product improvements, or any other number of ideas/theories. 

Box: “[L]earning is achieved, not by mere theoretical speculation on the one hand, nor by the undirected accumulation of practical facts on the other, but rather by a motivated iteration between theory and practice. Matters of fact can lead to a tentative theory. Deductions from this tentative theory may be found to be discrepant with certain known or specially acquired facts. These discrepancies can then induce a modified, or in some cases a different, theory. Deductions made from the modified theory now may or may not be in conflict with fact, and so on.”

As part of the scientific method, experimentation of ideas naturally requires BOTH a theory about how things work AND the ability to collect facts/evidence that may or may not support that theory. By theory, in our case, we could mean an understanding of what motivates your customer, why they are your customer and not someone else’s, and what you might do to ensure that they stay that way. 

Many times marketers purchase technology and tools in an effort to better understand their customers. However, without a formulated experimentation program, they are missing one half of the equation. The main takeaway is that just having AB Testing and other analytics tools is not sufficient for learning. It is vital for YOU to also have robust theories about customer behavior, what customers care about, and what is likely to motivate them. The theory is the foundation and drives everything else. It is through the iterative process of guided experimentation, which then feeds back into the theory and so on, that we establish a robust and useful system for continuous learning.


Flexibility

Box: “On this view efficient scientific iteration evidently requires unhampered feedback. In any feedback loop it is … the discrepancy between what tentative theory suggests should be so and what practice says is so that can produce learning. The good scientist must have the flexibility and courage to seek out, recognize, and exploit such errors … . In particular, using Bacon’s analogy, he must not be like Pygmalion and fall in love with his model.”

Notice the words that Box uses here: “unhampered” and “courage”. Just as inflexible thinkers are unable to consider alternative ways of thinking, and hence never learn, so it is with inflexible experimentation programs. Just having a process for iterative learning is not enough. It must also be flexible. By flexible Box doesn’t only mean it must be efficient in terms of throughput. It must also allow for ideas and experiments to flow unhampered, where neither influential stakeholders nor the data science team holds too dearly to any pet theory. People must not be afraid of creating experiments that seek to contradict existing beliefs, nor should they fear reporting any results that do.  


Parsimony

Box: “Since all models are wrong the scientist cannot obtain a “correct” one by excessive elaboration. On the contrary, following William of Occam [we] should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so over elaboration and overparameterization is often the mark of mediocrity.”

This is where the “All models are wrong” saying comes from! I take this to mean that rather than spending effort seeking the impossible, we should instead seek what is most useful and actionable – “how useful is this model or theory in helping to make effective decisions?”

In addition, we should keep analysis and experimental methods as simple as the problem requires. Companies can get distracted, or worse, seduced by a new technology or method that adds complexity without advancing the cause. This is not to say that more complexity is always bad, but whatever the solution, it should be the simplest one that can do the job. That said, the ‘job’ may really be signaling/optics rather than solving a specific task – for example, differentiating a product or service as more ‘advanced’ than the competition, regardless of whether it actually improves outcomes. It is not for me to say whether those are good enough reasons for making something more complex, but I do suggest being honest about it and going forward forthrightly and with eyes wide open.

Worry Selectively

Box: “Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad.”

This is my favorite line from Box. Being “alert to what is importantly wrong” is perhaps the most fundamental and yet underappreciated analytic skill. It is vital, not just when building an experimentation program but for any analytics project, to be able to step back and ask, “While this isn’t exactly correct, will it matter to the outcome, and if so, by how much?” Performing this type of sensitivity analysis, even informally in your own mind’s eye, is an absolutely critical part of good analysis. You don’t have to be an economist to think, and decide, at the margin.

Of course, whether something is a mouse or a tiger will depend on the situation and context. That said, in general, at least to me, the biggest tiger in AB Testing is fixating on solutions or tools before having defined the problem properly. Companies can easily fall into the trap of buying, or worse, building a new testing tool or technology without having thought about: 1) exactly what they are trying to achieve; 2) the edge cases and situations where the new solution may not perform well; and 3) how the solution will operate within the larger organizational framework.

As for the mice, they are legion. They have nests in all the corners of any business, and whenever one is spotted, people rush from one approach to another in the hope of not being caught out. Here are a few of the ‘mice’ that have scampered around AB Testing:

  • One Tail vs Two Tails (eek! A two-tailed mouse – sounds horrible)
  • Bayes vs Frequentist AB Testing
  • Fixed vs Sequential designs
  • Full Factorial Designs vs Taguchi designs

There is a pattern here. All of these mice tend to be features or methods introduced by vendors or agencies as new and improved, frequently over-selling their importance and implying that some existing approach is ‘wrong’. It isn’t that there aren’t often principled reasons for preferring one approach over another. In fact, depending on the problem, all of them can be useful (except maybe Taguchi MVT – I’m not sure that was ever really useful for online testing). It is just that none of them, or others, will be what makes or breaks a program’s usefulness.

The real value in an experimentation program is the people involved, and the process and culture surrounding it – not some particular method or software. Don’t get me wrong: selecting the software and statistical methods most appropriate for your company matters, a lot, but it isn’t sufficient. I think what Box says about the value of the statistician should be top of mind for any company looking to run experimentation at scale:

“… the statistician’s job did not begin when all the work was over – it began long before it started. … [The statistician’s] responsibility to the scientific team was that of the architect, with the crucial job of ensuring that the investigational structure of a brand new experiment was sound and economical.”

So too for companies looking to bring experimentation into their workflow. It is the experimenter’s responsibility to ensure that the experiment is both sound and economical, and it is the larger team’s responsibility to provide an environment and process – in part by following Box – that will enable their success.

If you are looking to upgrade your AB Testing software and improve your experimentation program please contact us here to learn more.


Conductrics and ITP

What’s the impact on your Conductrics implementation?

As you are likely aware, many browsers have begun to restrict the ability to “track” visitor behavior, in an effort to protect visitor privacy. The focus is especially on third-party scripts that could track users as they move between totally unrelated sites on the Internet.

Apple’s efforts in this regard are particularly well-known, with the introduction of their Intelligent Tracking Prevention (ITP) policies.

  • ITP has gone through several revisions, each placing additional restrictions on how visitor information can be stored and for how long.
  • While ITP is most well-known for affecting Safari users on iPhone and iPad, it also affects other browsers on those devices, such as Firefox for iOS / iPadOS. Safari users on the Mac are also affected.

While Conductrics has never engaged in any sort of visitor-tracking outside of your own agents/tests, ITP and other similar restrictions are now a fact of life. There will be some impact on what you can do with Conductrics (or other similar service).

When Using Client-Side Conductrics in Web Pages

When you use Conductrics Express or our Local JavaScript API in your pages, your Conductrics tests/agents do their work locally in the browser. They also store their “visitor state” information locally in the browser.

This visitor state information includes the variation assignments per agent (such as whether the visitor has been selected for the “A” or “B” variation for each of your tests), and some other information such as which goals/conversions have been reached and whether any visitor traits have been set.

ITP says that this visitor state information will be cleared between a user’s visits to your pages if more than 7 days have passed since their last visit. For some visit types, the visitor state will be cleared after just one day (the one-day rule is triggered when visitors arrive at your site via link decoration, which is common in social media campaigns).

What does this all mean?

  • If a visitor goes 7 days (*) or more between visits, your client-side Conductrics implementation will see the visitor as a “new” visitor.
  • So, after 7 days (*), the visitor might get a different variation than on their previous visit(s). They would also be counted again in the Conductrics reporting.
  • If the visitor hits a goal/conversion event, but it’s been 7 days (*) since they were exposed to a variation for a given test/agent, the goal/conversion will not be counted.

How should we change our testing or optimization practice?

  • For A/B or MVT testing, try to focus on tests that would reasonably be expected to lead to a conversion event within 7 days (*).
  • Generally speaking, focus on tests or optimizations where it wouldn’t be “absolutely terrible” if the visitor were to be re-exposed to a test, possibly getting a different variation.
  • You could consider using rules/conditions within your Conductrics agents such that Safari browsers are excluded from your tests. However, that will likely reduce the number of visitors that you do expose (particularly on mobile), probably requiring you to run the test for longer, and also possibly “skewing” your results since you’d be likely excluding most visitors on Apple devices.
  • You could consider looking at ITP vs non-ITP browsers (or iOS / iPadOS vs non) when evaluating the results of a test, to see if there are any important, noticeable differences between the two groups for your actual visitors.
  • Conversely, Conductrics could be configured to treat all visitors as if they were subject to ITP’s “7 day rule” (or one day), even on browsers that don’t currently impose such restrictions by default (thus leveling the field between ITP and non-ITP browsers). Contact Conductrics to discuss.


(*) The “7 day rule” might actually be one day, depending on whether the visitor lands on your site via “link decoration” (that is, via a URL that has identifiers included in query parameters or similar). See this article from the WebKit team for details.

Frequently Asked Questions

These frequently asked questions are about client-side Conductrics usage.

Q: How is the Visitor State information stored?

A: By default, client-side Conductrics stores its visitor state information in the browser’s “Local Storage” area. Alternatively, it can be configured to store the information as a cookie instead. The main reason to choose the cookie option is to allow the visitor state to be shared between subdomains. ITP applies equally whether your Conductrics implementation is set to use Local Storage or cookies.

Q: What does Conductrics do to work around ITP?

A: We don’t try to defeat or “work around” ITP or similar policies. The WebKit team has made it very clear that ITP is a work in progress and will continue to address any loopholes. Rather than try to implement workarounds that would likely be short-lived, we think our customers will be better served if we focus on helping you make sense of your test results in an ITP world.

Q: Which browsers are affected by ITP?

A: Technically, ITP is implemented in WebKit browsers, which includes Safari on mobile and desktop. That said, other browsers such as Firefox have similar policies, so often the term “ITP” is used colloquially to refer to any browser restrictions on cookies and other browser-based storage and tracking.

When Using the Conductrics REST API

All of the above is about using client-side Conductrics. If you use the Conductrics REST-style API, the “visitor state” information is retained on the Conductrics side, using an identifier that you pass in when calling the API.

Because the identifier is “on your side” conceptually, whether your REST API tests are affected by ITP will depend on how you store the identifier that you provide to Conductrics.

  • If the identifier is associated with a “logged in” or “authenticated” session, it is probably stored in some way on your side such that it can live for a long time, and thus would not be affected by ITP.
  • If the identifier is being stored as a first-party cookie by your client-side code, it is also subject to ITP or similar policies, so your Conductrics tests will be affected to more or less the same degree as discussed above. However, if the identifier is being stored as “server cookie” only (with the HttpOnly and Secure flags on the cookie itself), then it is probably not affected by ITP.
  • For native mobile apps (or other devices or systems such as kiosks, set-tops, or IVRs), you probably have an identifier that is stored “natively”, without having anything to do with web pages, so your implementation would probably not be affected by ITP.
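As a quick sketch of the “server cookie” idea mentioned above – this is an illustrative example assuming a hypothetical Python backend, not part of Conductrics itself – a long-lived, script-inaccessible identifier can be issued like so:

```python
import secrets
from http.cookies import SimpleCookie

def visitor_cookie_header(visitor_id=None):
    """Build a Set-Cookie header for a server-managed visitor ID.

    HttpOnly keeps the cookie out of reach of client-side script, and
    Secure restricts it to HTTPS. These attributes matter here because
    ITP's expiry caps target script-writable storage, not cookies set
    by the server in an HTTP response.
    """
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id or secrets.token_hex(16)
    cookie["visitor_id"]["httponly"] = True
    cookie["visitor_id"]["secure"] = True
    cookie["visitor_id"]["max-age"] = 60 * 60 * 24 * 365  # one year
    cookie["visitor_id"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")

print(visitor_cookie_header("abc123"))
```

Your own framework almost certainly has a higher-level way to set cookie flags; the point is simply that the `HttpOnly` and `Secure` attributes are set server-side.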


As always, feel free to contact Conductrics regarding ITP, visitor data, or any other questions you may have. We’re here to help!

Posted in Uncategorized | Leave a comment

Conductrics Announces Search Discovery as a Premier Partner

December 1, 2020 – Austin, Texas – 

Conductrics, a digital experimentation and artificial intelligence (AI) SaaS company, announces its partnership with Search Discovery, a premier data transformation company. Search Discovery will now offer Conductrics optimization technology along with industry-leading optimization consulting support.

Together, the two companies will offer clients a superior integrated solution that satisfies a market need: Clients across industries are searching for both technological solutions and strategic guidance to help drive their internal innovation and growth. This partnership will make it simple for clients to work smarter and faster, use better experimentation techniques, and leverage both Conductrics’ and Search Discovery’s core competencies to build best-in-class optimization and experimentation programs.

Conductrics offers robust and flexible experimentation software that supports the specific requirements of Marketing, Product, and IT departments. Teams are able to seamlessly manage and quickly deploy their experiments using Conductrics’ integrated communication and implementation tools. 

Search Discovery provides strategic consulting for clients to manage and run successful experimentation and personalization programs at scale.

“Aside from the natural business fit, our two teams work well together. The expert team at Search Discovery has an impressive track record of helping A-list clients build and grow world-class optimization programs,” comments Matt Gershoff, Conductrics’ co-founder and CEO. “This new partnership will enable us to provide clients with the optimal combination of technology and experimentation expertise.”

“The Conductrics platform has optimal flexibility, transparency, and the power needed to help us support our clients’ data-driven decision-making across every digital experience—even in today’s increasingly complex privacy environment,” says Kelly Wortham, Search Discovery’s Senior Optimization Director. “The Conductrics team’s ability to quickly customize the platform with our clients’ rapidly changing requirements makes this partnership even more exciting for Search Discovery.”

About Conductrics

In 2010, Conductrics released one of the industry’s first REST APIs for delivering AB Testing, multi-arm bandits, and predictive targeting to empower both Marketing and IT professionals. With Conductrics, marketers, product managers, and consumer experience stakeholders can quickly and easily optimize the customer journey, while IT departments benefit from the platform’s simplicity, ease of use, and integration with existing technology stacks. Visit Conductrics at

About Search Discovery

Search Discovery is a data transformation company that helps organizations use their data with purpose to drive measurable business impact. Their services and solutions help global organizations at every stage of data transformation, including strategy, implementation, optimization, and organizational change management. Search Discovery delivers efficient operations, deeper insights, and improved decision-making across marketing, sales, finance, operations, and human resources. Visit

Posted in Uncategorized | Leave a comment

Video: Conductrics Platform Overview

Nate Weiss, Conductrics CTO, demonstrates key new features and functionalities of the latest software release.  This latest release focuses on streamlining and improving the user experience for faster and easier execution of optimization programs.
The new features that are demonstrated include: 
  • A revamped and simplified user experience
  • Streamlined A/B Testing/Optimization workflows
  • Improved Program Management tools
  • Upgraded API and mobile developer libraries

Posted in Uncategorized | Leave a comment

Conductrics Announces Updated Release of its SaaS Experimentation Platform

SEPTEMBER 24, 2020 – AUSTIN, TX – Digital experimentation and Artificial Intelligence (AI) SaaS company Conductrics today announced the latest major release of its experience optimization platform, built expressly for marketers, developers, and IT professionals. The updated platform, which shares the company’s name, is a cloud-based A/B testing and decision optimization engine. This latest release focuses on streamlining and improving the user experience for faster and easier execution of optimization programs.

These upgrades make it easier for clients to scale optimization programs across different organizational and functional teams in order to deliver ideal digital experiences for their customers. The new features include: 

  • A revamped and simplified user experience (UX), 
  • Streamlined A/B Testing and Optimization workflows, 
  • Improved Program Management tools,  
  • Upgraded API and mobile developer libraries.

“Since our start in 2010, our goal has been to make it faster and easier for developers and marketers to work together in order to confidently discover and deliver the best customer experiences,” says Matt Gershoff, Conductrics’ co-founder and CEO. “As other technologies morph and become increasingly more complex, we remain focused on developing accessible, leading-edge optimization and experimentation technology.” 

The new release will be available in mid-October.  Current clients will have the option to use the legacy platform or the new platform – no action is needed on their part. A webinar will be held on October 13th to demonstrate the new features and benefits – a link to register is on the company website.

About Conductrics

In 2010, Conductrics released one of the industry’s first REST APIs for delivering AB Testing, multi-arm bandits and predictive targeting to empower both Marketing and IT professionals. With Conductrics, marketers, product managers, and consumer experience stakeholders can quickly and easily optimize the customer journey, while IT departments benefit from the platform’s simplicity, ease of use, and integration with existing technology stacks. Visit Conductrics at 

For more information, contact

Posted in Uncategorized | Leave a comment

Headline Optimization at Scale: Conductrics Macros

Conductrics has a history of providing innovative approaches to experimentation, testing, and optimization. In 2010, we introduced one of the industry’s first REST APIs for experimentation, which also supported multi-armed bandits and predictive targeting. Continuing our goal to provide innovative solutions, we’ve developed Express Macros and Templates to safely and easily build tests at scale.

To illustrate, let’s say you frequently run headline testing on a news page or change certain areas of a landing page over and over. In such situations, you don’t want test authors to have full page-editing capabilities – you just want them to have access to specific sections of the page or the site. Additionally, because variations of the same basic test will be conducted over and over, it is imperative that the test setup is simple, easy to repeat, and scalable.

Conductrics Templates and Express Macros

Conductrics Templates and Express Macros are the answer. They are an easy way to create reusable forms that allow your team to set up and run multiple versions of similar tests just by filling out a simple form and clicking a button.

EXAMPLE: When to Use Conductrics Express Macros

One of our national media clients wanted to optimize News headlines for their news homepage. This meant that rather than just provide a single Headline for each news article, the client wanted to try out several potential Headlines for each story, see which worked best, and then use the winning headline going forward. To do this, they wanted to take advantage of Conductrics multi-armed bandits, which automatically discover and deploy the most effective Headline for each story.

However, they have scores of articles each day, so they needed to be able to run hundreds of these tests every month. In addition, these tests needed to be set up by multiple, non-technical editors safely, so as to not risk breaking the homepage. 

This is where Express Macros helped. Macros let the client extend and customize the Conductrics test creation process by:

1) creating simple custom forms to make it easy to set up each test; and

2) applying custom JavaScript to that form in order for the test to execute properly. 

How the Express Macro Works

For example, this macro will create a form with two input fields, “Post Id” and “Article Headline”. You can, of course, create any number of fields that are needed.

Now that we have specified what data to request from the test creator, we just need to provide the JavaScript that will use the values taken in by the Macro’s form to run the test. In this case we will want to use the ‘Post Id’ (alternatively, this could be a URL or some other identifier) to tell Conductrics which article to run the test on. We also include in our JavaScript the logic to swap in the alternative Headline(s) for the test.

Here is an example of what that might look like: 

While this might look complicated if you don’t know JavaScript, don’t worry, this is something any front-end developer can do easily (or you can just ask us for help). 

All that there is left to do is to name and save it. I have named it ‘Headline Optimization’.

There is just one last step before we can let our Headline editors start to run tests, and that is to assign the Macro to a Template. 

Template Example

Express Macros was developed to bridge the workflows of programmers and non-technical users. Now that the Macro has been created by the programmer, it is assigned/converted to a template for use by non-technical users. This makes the process easy to use, scalable, reproducible, and secure.

Creating a Template is just like setting up any Conductrics Agent. The only difference is that by assigning the Agent to a Macro, it will become a Template that can be used to generate new tests easily. 

For example, here I have created an Agent named ‘Headline Optimization’. In the bottom portion of the set-up page, I select ‘Template’. This brings up a list of all of the Macros I am authorized to use. In this case, there is just the ‘Headline Optimization’ Macro we just created. By selecting this Macro, the Agent will be converted into a Template for all of the ‘Headline Optimization’ tests going forward.

Now comes the amazing part. All the test creator needs to do is go to the Conductrics Agent List Page, and they will see the custom button created for Headline Tests. 

Clicking the ‘Headline Optimization’ button will bring up the custom form. For our simple ‘Headline Optimization’ example, it looks like this:

Notice that it asks for two pieces of information, the Post Id and the alternative Article Headline to test (you can add multiple alternative headlines to each test using the ‘Add Another’ option).

Once the Post Id and the alternative headline are entered, the test author just clicks ‘Create’ and that’s it! The test will be scheduled and pushed live. 

Not only does this make it super simple for non-technical users to set up hundreds of these tests easily, but it also provides guard rails to prevent accidental, or unintended, selections of erroneous page sections. 


Communication of test results is automated with Conductrics notification streams. Users receive top-level results of each ‘Headline Optimization’ test directly in their Slack channel, including company members who are not Conductrics users, so all relevant stakeholders can be part of the discussion around what types of Headlines seem to be most compelling and effective.

Here is a simple example – once the Conductrics bandit algorithm has selected a winner, a notification like the following is sent to the client’s Slack with the following summary information.

The winning variation is noted, along with the number of visitors and the click through rate for each headline. 


In this example, the client was able to scale from a handful of tests per month to hundreds of tests per month, and the guard rails allowed multiple non-technical users to have more control over the testing while freeing the developers to work on more complex problems. 

Express Macros and Templates are the ideal solution for digital marketers and CX professionals who have multiple, repeatable versions of a particular test design. They streamline the process, allow for set up in an easy-to-use form, and ensure compliance by controlling what can be modified. Express Macros solve the problem of so many ideas, so little time. If you would like to learn more about Conductrics Express Macros and Templates, please contact us.

Posted in Uncategorized | Leave a comment

Getting Past Statistical Significance: Foundations of AB Testing and Experimentation

How often is AB Testing reduced to the following question: ‘what sample size do I need to reach statistical significance for my AB Test?’ On the face of it, this question sounds reasonable. However, unless you know why you want to run a test at a particular significance level, or what the relationship is between sample size and that significance level, you are most likely missing some basic concepts that will help you get even more value out of your testing programs.

There are also a fair number of questions around how to run AB Tests, what methods are best, and the various ‘gotchas’ to look out for. In light of this, I thought it might be useful to step back and review some of the very basics of experimentation and why we are running hypothesis tests in the first place. This is not a how-to guide, nor a collection of different types of tests to run, nor a list of best practices.

What is the problem we are trying to solve with experiments?

We are trying to isolate the effect on some objective result, if any, of taking some action on our website (mobile app, call center, etc.). For example, if we change the button color to blue rather than red, will that increase conversions, and if so, by how much?

What is an AB test?

AB and multivariate tests are versions of randomized controlled trials (RCT). An RCT is an experiment where we take a sample of users and randomly assign them to control and treatment groups. The experimenter then collects performance data, for example conversions or purchase values, for each of the groups (control, treatment).

I find it useful to think of RCTs as having three main components:  1) data collection; 2) estimating effect sizes; and 3) assessing our uncertainty of the effect size and mitigating certain risks around making decisions based on these estimates.


Collection of the Data

What do you mean by sample?

A sample is a subset of the total population under investigation. Keep in mind that in most AB testing situations, while we randomly assign users to treatments, we don’t randomly sample. This may seem surprising, but in the online situation the users present themselves to us for assignment (e.g. they come to the home page).  This can lead to selection bias if we don’t try to account for this nonrandom sampling in our data collection process. Selection bias will make it more difficult, if not impossible, to draw conclusions about the population we are interested in from our test results. One often effective way to mitigate this is by running our experiments over full weeks, or months etc. to try to ensure that our samples look as much as possible like our user/customer population.

Why do we use randomized assignments?

Because of “Confounding”. I will repeat this several times, but confounding is the single biggest issue in establishing a causal relation between our treatments and our performance measure.

What is Confounding?

Confounding is when the treatment effect gets mixed together with the effects from any other outside influence. For example, consider we are interested in the treatment effect of Button Color (or Price, etc.) on conversion rate (or average order size, etc). When assigning users to button color we give everyone who visits on Sunday the ‘Blue’ button treatment, and everyone on Monday the ‘Red’ button treatment. But now the ‘Blue’ group is comprised of both Sunday users and the Blue Button, and the ‘Red’ group is both Monday users and the Red Button. Our data looks like this:

Sunday:Red 0%   Monday:Red 100%
Sunday:Blue 100%   Monday:Blue 0%

We have mixed together the data such that any of the user effects related to day are tangled together with the treatment effects of button color.

What we want is for each of our groups to both: 1) look like one another except for the treatment selection (no confounding); and 2) to look like the population of interest (no selection bias).

If we randomly assign the treatments to users, then we should on average get data that looks like this:

Sunday:Red 50%   Monday:Red 50%
Sunday:Blue 50%   Monday:Blue 50%

Where each day we have a 50/50 split of button color treatments.  Here the relationship between day and button assignment is broken, and we can estimate the average treatment effects without having to worry as much about influences of outside effects (this isn’t perfect of course, since it holds only on average – it is possible due to sampling error that for any given sample we don’t have a balanced sample over all of the cofactors/confounders.)
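A tiny simulation can make this concrete. This is just an illustrative sketch of the day/color example above, not anything Conductrics-specific:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Simulate 10,000 visitors, roughly half arriving Sunday and half Monday,
# each randomly assigned a button color INDEPENDENT of arrival day.
counts = {("Sunday", "Red"): 0, ("Sunday", "Blue"): 0,
          ("Monday", "Red"): 0, ("Monday", "Blue"): 0}
for _ in range(10_000):
    day = random.choice(["Sunday", "Monday"])
    color = random.choice(["Red", "Blue"])
    counts[(day, color)] += 1

# Within each day, the Red/Blue split lands close to 50/50, so any
# day-of-week effect no longer travels together with the treatment.
for day in ["Sunday", "Monday"]:
    total = counts[(day, "Red")] + counts[(day, "Blue")]
    print(day, round(counts[(day, "Red")] / total, 2))
```

The splits come out near 0.5 but not exactly 0.5 – which is the “holds only on average” caveat from the paragraph above, made visible.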

Of course, this mixing need not be this extreme – it is often much more subtle. When Ronny Kohavi advises being alert to ‘sample ratio mismatch’, it is because of confounding. For example, say a bug in the treatment arm breaks the experience in such a way that some users don’t get assigned. If this happens only for certain types of users, perhaps just for users on old browsers, then we no longer have a fully randomized assignment. The bug breaks randomization and lets the effect of old browsers leak in and mix with the treatment effect.

Confounding is the main issue one should be concerned about in AB Testing.  Get this right and you are most of the way there – everything else is secondary IMO.

Estimating Treatment Effects

We made sure that we got random selections, now what?

Well, one thing we might want to do is use our data to get an estimate of the conversion rate (or AOV etc.) for each group in our experiment.  The estimate from our sample will be our best guess of what the true treatment effect will be for the population under study.

For most simple experiments we usually just calculate the treatment effect using the sample mean from each group, subtracting the control from the treatment: (Treatment Conversion Rate) – (Control Conversion Rate) = Treatment Effect. For example, if we estimate that the Blue Button has a conversion rate of 0.1 (10%) and the Red Button has a conversion rate of 0.11 (11%), then the estimated treatment effect is -0.01 (a drop of one percentage point).

Estimating Uncertainty

Of course the goal isn’t to calculate the sample conversion rates; the goal is to make statements about the population conversion rates. Our sample conversion rates are based on the particular sample we drew. We know that if we had drawn another sample, we almost certainly would have gotten different data, and would calculate a different sample mean.

One way to assess uncertainty is by estimating a confidence interval for each treatment and control group’s conversion rate. The main idea is that we construct an interval that is guaranteed to contain, or trap, the true population conversion rate with a frequency that is determined by the confidence level. So a 95% confidence interval will contain the true population conversion rate 95% of the time. We could also calculate the difference in conversion rates between our treatment and control groups and calculate a confidence interval around this difference.
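As a rough illustration, here is a simple normal-approximation (“Wald”) interval for the difference in conversion rates, sketched with just the Python standard library – the exact method your testing tool uses may well differ:

```python
from statistics import NormalDist

def diff_ci(conv_control, n_control, conv_treat, n_treat, confidence=0.95):
    """Normal-approximation confidence interval for the difference in
    conversion rates (treatment minus control)."""
    p_c = conv_control / n_control
    p_t = conv_treat / n_treat
    # Standard error of the difference between two independent proportions
    se = (p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treat) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_t - p_c
    return diff - z * se, diff + z * se

# Control: 1,000 conversions out of 10,000; Treatment: 1,100 out of 10,000
low, high = diff_ci(1_000, 10_000, 1_100, 10_000)
print(f"95% CI for the lift: [{low:.4f}, {high:.4f}]")
```

Here the interval for the +1 percentage point lift sits entirely above zero but is wide – a direct read-out of our uncertainty about the effect size.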

Notice that so far we have been able to: 1) calculate the treatment effect; and 2) get a measure of uncertainty in the size of the treatment effect with no mention of hypothesis testing.

Mitigating Risk

If we can estimate our treatment effect sizes and get a measure of our uncertainty around the size, why bother running a test in the first place? Good question.  One reason to run a test is to control for two types of error we can make when taking an action on the basis of our estimated treatment effect.

Type 1 Error – a false positive.

  1. One Tail: We conclude that the treatment has a positive effect size (it is better than the control) when it doesn’t have a real positive effect (it really isn’t any better).
  2. Two Tail: We conclude that the treatment has a different effect than the control (it is either strictly better or strictly worse) when it doesn’t really have a different effect than the control.

Type 2 Error – a false negative.

  1. One Tail: We conclude that the treatment does not have a positive effect size (it isn’t better than the control) when it does have a real positive effect (it really is better).
  2. Two Tail: We conclude that the treatment does not have a different effect than the control (it isn’t either strictly better or strictly worse) when it really does have a different effect than the control.

How to specify and control the probability of these errors?

Controlling Type 1 errors – the probability that our test will make a Type 1 error is called the significance level of the test. This is the alpha level you have probably encountered. An alpha of 0.05 means that we want to run the test so that we only make Type 1 errors up to 5% of the time. You are of course free to pick whatever alpha you like – perhaps an alpha of 1% makes more sense for your use case, or maybe an alpha of 0.1%. It is totally up to you! It all depends on how damaging it would be for you to take some action based on a positive result when the effect doesn’t exist.

Also keep in mind that this does NOT mean that if you get a significant result, only 5% (or whatever your alpha is) of the time it will be a false positive. The rate at which a significant result is a false positive will depend on how often you run tests that have real effects. For example, if you never run any experiments where the treatments are actually any better than the control, then all of your significant results will be false positives. In this worst-case situation, you should expect to see significant results in up to 5% (alpha%) of your tests, and all of them will be false positives (Type 1 errors).

You should spend as much time as needed to grok this idea, as it is the single most important idea you need to know in order to thoughtfully run your AB Tests.

Controlling Type 2 errors – this is based on the power of the test, which in turn is based on the beta. For example, a beta of 0.2 (power of 0.8) means that of all of the times that the treatment is actually superior to the control, your test would fail, on average, to discover this up to 20% of the time. Of course, like the alpha, it is up to you, so maybe a power of 0.95 makes more sense, so that you make a Type 2 error only up to 5% of the time. Again, this will depend on how costly you consider this type of mistake. This is also important to understand well, so spend some time thinking about what this means.

What is amazing, IMO, about hypothesis tests is that, assuming that you collect the data correctly, you are guaranteed to limit the probability of making these two types of errors based on the alpha and beta you pick for the test. Assuming we are mindful about confounding, all we need to do is collect the correct amount of data. When we run the test after we have collected our pre-specified sample, we can be assured that we will control these two errors at our specified levels.


“The sample size is the payment we must make to control Type 1 and Type 2 errors.”


What about Sample Size?

There is a relationship between alpha, beta, and the associated sample size. In a very real way, the sample size is the payment we must make in order to control Type 1 and Type 2 errors. Increasing the error control on one means you either have to lower the control on the other or increase the sample size.  This is what power calculators are doing under the hood — calculating the sample size needed, based on a minimum treatment effect size, and desired alpha and beta.
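For the curious, here is roughly what such a calculator computes for a two-proportion test, sketched with just the Python standard library (real calculators may use slightly different approximations):

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_control, p_treatment, alpha=0.05, beta=0.2):
    """Approximate per-group sample size for a two-sided, two-proportion
    test, using the standard normal-approximation formula:
    n = (z_{1-alpha/2} + z_{1-beta})^2 * (var_c + var_t) / effect^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    var = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * var / effect ** 2)

# Detecting a lift from 10% to 11% at alpha = 0.05 and power = 0.8
print(sample_size_per_group(0.10, 0.11))
```

For this example the formula works out to roughly 15,000 visitors per group – which is exactly the “payment” the quote above is talking about, and why small expected lifts demand large samples.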


What about Sample Size for continuous conversion values, like average order value?

Calculating sample sizes for continuous conversion variables is really the same as for conversion rates/proportions. For both we need some guess of both the mean of the treatment effect and the standard deviation. However, because the standard deviation of a proportion is determined by its mean, we don’t need to provide it for most calculators. For continuous conversion variables, on the other hand, we need an explicit guess of the standard deviation in order to conduct the sample size calculation.

What if I don’t know the standard deviation?

This isn’t exact, and in fact it might not be that close, but in a pinch you can use the range rule as a stand-in for the standard deviation. If you know the minimum and maximum values that the conversion variable can take (or some reasonable guess), use standard deviation ≈ (Max - Min)/4 as a rough guess.
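Putting the range rule together with the sample size formula for a difference in means – again just an illustrative sketch under the normal approximation, with made-up order values:

```python
import math
from statistics import NormalDist

def range_rule_std(min_value, max_value):
    """Rough standard-deviation guess: (max - min) / 4."""
    return (max_value - min_value) / 4

def sample_size_continuous(effect, std_dev, alpha=0.05, beta=0.2):
    """Approximate per-group sample size for a difference in means:
    n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * std_dev / effect)^2
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(1 - beta)
    return math.ceil(2 * (z * std_dev / effect) ** 2)

# Suppose order values run from $0 to $200, so we guess std dev = 50,
# and we want to detect a $5 lift in average order value.
std_guess = range_rule_std(0, 200)
print(std_guess, sample_size_continuous(5, std_guess))
```

Note how sensitive the answer is to the standard deviation guess: doubling it quadruples the required sample, so a sloppy range-rule guess can be expensive.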

What if I make a decision before I have collected all of the planned data?

You are free to do whatever you like. Trust me, there is no bolt of lightning that will come out of the sky if you stop a test early, or make a decision early. However, it also means that the Type 1 and Type 2 risk guarantees that you were looking to control will no longer hold. So to the degree that they were important to you and the organization, that will be the cost of taking an early action.

What about early stopping with sequential testing?

Yes, there are ways to run experiments in a sequential way.  That said, remember how online testing works. Users present themselves to the site (or app or whatever), and then we randomly assign them to treatments. That is not the same as random selection.

Why does that matter?

Firstly, because of selection bias. If users are self-selecting when they present themselves to us, and if there is some structure to when different types of users arrive, then our treatment effect will be a measure of only the users we have seen, and won’t be a valid measure of the population we are interested in. As mentioned earlier, often the best way to deal with this is to sample in the natural period of your data – normally this is weekly or monthly.

Secondly, while there are certain types of sequential tests that don’t bias our Type 1 and Type 2 error control, they do, ironically, bias our estimated treatment effect – especially when stopping early, which is the very reason you would run a sequential test in the first place. Early stopping can lead to a type of magnitude bias – where the absolute value of the reported treatment effects will tend to be too large. There are ways to try to adjust for this, but it adds even more approximation and complexity into the process.

So the fix for dealing with the bias in Type 1 error control due to early stopping/peeking CAUSES bias in the estimated treatment effects, which, presumably, are also of importance to you and your organization.

The Waiting Game

However, if all we do is just wait –  c’mon, it’s not that hard 😉 – and run the test after we collect our data based on the pre-specified sample size, and in weekly or monthly blocks, then we don’t have to deal with any issues of selection bias or biased treatment effects. This is one of those cases where just doing the simplest thing possible gets you the most robust estimation and risk control.

What if I have more than one treatment?

If you have more than one treatment you may want to adjust your Type 1 error control to ‘know’ that you will be making several tests at once. Think of each test as a lottery ticket. The more tickets you buy, the greater the chance you will win the lottery, where ‘winning’ here means making a Type 1 error.

The chance of making a single Type 1 error over all of the treatments is called the Familywise Error Rate (FWER). The more tests, the more likely you are to make a Type 1 error at a given significance level (alpha). I won’t get into the details here, but to ensure that the FWER is not greater than your alpha, you can use any of the following methods: Bonferroni, Sidak, Dunnett’s, etc. Bonferroni is the least powerful (in the Type 2 error sense), but it is the simplest with the fewest assumptions, so it is a good safe bet, especially if Type 1 error is a very real concern. One can argue about which is best, but it will depend, and for just a handful of comparisons it won’t really matter much which correction you use, IMO.
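For reference, the Bonferroni and Sidak corrections amount to one-line adjustments of the per-comparison alpha (illustrative Python):

```python
def bonferroni_alpha(alpha, m):
    """Bonferroni: test each of the m comparisons at alpha / m."""
    return alpha / m

def sidak_alpha(alpha, m):
    """Sidak: slightly less conservative; assumes independent tests.
    Solves 1 - (1 - a)^m = alpha for the per-comparison level a."""
    return 1 - (1 - alpha) ** (1 / m)

# Three treatments compared against control at a familywise alpha of 0.05
m = 3
print(round(bonferroni_alpha(0.05, m), 5))  # 0.01667
print(round(sidak_alpha(0.05, m), 5))       # 0.01695
```

The two are nearly identical for a handful of comparisons, which is the “it won’t really matter” point above.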

Another measure of familywise error is the False Discovery Rate (FDR). To control the FDR, you could use something like the Benjamini–Hochberg procedure. While controlling the FDR means a more powerful test (less Type 2 error), there is no free lunch: it comes at the cost of allowing more Type 1 errors. Because of this, researchers often use the FDR as a first step to screen out possibly interesting treatments in cases where there are many (thousands of) independent tests. Then, from the set of significant results, more rigorous follow-up testing occurs. Claims about preference for controlling either FDR or FWER are really implicit statements about the relative risk of Type 1 and Type 2 error.
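Here is a minimal sketch of the Benjamini–Hochberg step-up procedure, for illustration:

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return indices of hypotheses rejected under BH at the given FDR.

    Sort the p-values, find the LARGEST rank k such that
    p_(k) <= (k / m) * fdr, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            k = rank  # step-up: keep the largest qualifying rank
    return sorted(order[:k])

# Note the step-up behavior: ranks 3 and 4 miss their own thresholds
# (0.03 and 0.04), but rank 5 passes (0.042 <= 0.05), so ALL five are
# rejected. Bonferroni at 0.05 would reject only the first two.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042]
print(benjamini_hochberg(pvals))
```

This “largest qualifying rank wins” behavior is what makes BH more powerful than FWER corrections, at the cost of admitting more Type 1 errors.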

Wrapping it up

The whole point of the test is to control risk – you don’t have to run any tests to get estimates of treatment effects, or a measure of uncertainty around those effects. However, it often is a good idea to control for these errors, so the more you understand their relative costs, the better you can determine how much you are willing to pay to reduce the chances of making them. Rather than look at the sample size question as a hassle, perhaps look at it as an opportunity for you and your organization to take stock and discuss what the goals, assumptions, and expectations are for the new user experiences under consideration.


Posted in Uncategorized | Leave a comment