Are you drowning in data but struggling to find insights? Maybe you have valuable data, but can’t seem to give the right people access to it, or the quality of your data is unreliable.
Data is a valuable asset
Most organisations understand the critical importance of data, but many struggle to realise its full potential. Common data challenges can slow down decision making and lead to unnecessary costs and inefficiency.
That’s where our Data Health Check service can help. Our team of data experts evaluate the data strategy for your product and company, and help to identify and address any challenges or gaps in your data process.
Do you need a data health check?
We can help with most data challenges, thanks to years of experience supporting clients to solve data problems such as:
- Lots of data, but slow insights: Many companies have large amounts of data, but struggle to gather insights and make effective decisions quickly. This can be due to a lack of clarity around business goals, unclear data pipelines, or inadequate analytics tools.
- Locked data: Some companies have highly valuable data, but it’s only accessible to a small group. This can limit the impact of the data and prevent it from being used to drive business decisions.
- Difficulties accessing data: End users may find it difficult to access the data they need, leading to data silos and duplication of work. This can be due to inadequate data access controls or a lack of clarity around data ownership.
- Unreliable data quality: Inaccurate data can lead to mistrust and confusion, with different teams using different metrics to measure success. This can be due to data quality issues, such as inconsistent data sources or poor validation processes.
- Unnecessary costs: Some companies run data pipelines that are more capable than they actually need, which leads to wasted resources and increased costs.
- Lack of a clear data strategy: Without a clear strategy, companies may find themselves moving towards what they’ve been asked to do rather than what is best for the business. This can lead to missed opportunities and misaligned priorities.
- Lack of governance: Companies may be generating large amounts of data, but without clear ownership and governance, it can be difficult to use it effectively.
How our service works
The data health check addresses these challenges through a three-stage approach:
- First, we work with you to understand your business goals and desired outcomes from the data of a specific domain.
- Second, we will evaluate your current data pipelines and analytics processes.
- Third, we’ll present actionable recommendations for improvement.
A data health check usually lasts for two weeks, and will be conducted by a pair of Equal Experts practitioners who can cover data architecture, data engineering and data science (if required).
What can you expect from a data health check?
A Data Health Check service can guide your company to overcome common data challenges and leverage the full potential of your data.
We’ll identify and address gaps in your data processes, so that you can make faster, more informed decisions that drive better business outcomes. For example, we recently helped one Equal Experts client to save $3 million by implementing the recommendations from a Data Health Check.
At Equal Experts, we’re committed to helping our clients succeed, and we’d love to help your company do the same. Contact us today to learn more about our Data Health Check.
In my first blog post on ChatGPT I noticed that, to get good results for most of the work I wanted, I needed to use the latest model, which is tuned for instruction/chatgpt type tasks (text-davinci-003); the base models just didn’t seem to cut it.
I was intrigued by this, so I decided to have another go at fine-tuning ChatGPT, to see if I could find out more. Base models (e.g. davinci as opposed to text-davinci-003) use pure GPT-3. (Actually, I think they are GPT-3.5 but let’s not worry about that for now). They generate completions based on input text; give them the start of a sentence and they will tell you what the next words will typically be. However, this is different from answering a question or following an instruction.
In the following example you can see the top 5 answers to the prompt “Tell me a story” when using the base GPT-3 model (which is davinci – currently the most powerful we have access to):
You can see from these answers how GPT-3 works; it has found completions based on examples in the training corpus. “Tell me a story, mama,” the youngster said is likely a common completion in several novels. But this is not what I wanted! What I wanted was for ChatGPT to actually tell me a story.
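As a toy illustration of this completion behaviour, the sketch below continues a prompt with whichever word most often followed the previous one in a tiny corpus. The corpus, the function names and the word-pair counting are all assumptions made for illustration; GPT-3 is a large neural network and does not work by counting word pairs, but the effect on "Tell me a story" is similar in spirit.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus standing in for the model's training data (hypothetical).
CORPUS = [
    "tell me a story mama the youngster said",
    "tell me a story mama the youngster begged",
    "tell me a story about the sea",
]

def build_next_word_counts(sentences):
    """Count which word follows each word in the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def complete(prompt, counts, max_words=3):
    """Extend the prompt with the most frequent next word, one word at a time."""
    words = prompt.lower().split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # nothing in the corpus ever followed this word
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

counts = build_next_word_counts(CORPUS)
print(complete("Tell me a story", counts))  # tell me a story mama the youngster
```

The toy model continues the sentence rather than telling a story, which is the same gap between completing text and following an instruction.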
If I give the same query to a different model – text-davinci-003 – I get the sort of results I was looking for. (The answers have been restricted to a short length, but it could carry on extensively.)
Clearly the text-* models have significantly enhanced functionality compared with the standard GPT-3 models. In fact, one of the important things that ChatGPT has done is focus explicitly on understanding the user’s intention when they create a prompt. And as well as allowing ChatGPT to understand the intention of the user, the approach also prevents it from responding in toxic ways.
How they do this is fascinating; they use humans in the loop, and reinforcement learning. First, a group of people create the sort of training set you’d expect in a supervised learning approach. For a given prompt “Tell me a story”, a person creates the response “Once upon a time there was a …” These manually created prompt-response pairs are used to fine-tune ChatGPT’s base model. Then, people (presumably a different group of people) are used to evaluate the performance by ranking outputs from the model – this is used to create a separate reward model. Finally, the reward model is used in a reinforcement learning approach to update the fine-tuned model made in the first stage. (Reinforcement learning is an ML technique where you try lots of changes and see which ones do better against some reward function. It’s the approach that DeepMind used to train a computer to play video games.)
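To make the ranking step a little more concrete, here is a minimal sketch of a pairwise ranking loss of the kind commonly used to train a reward model from human preferences. Whether it matches the exact formulation used for these models is an assumption, and the scores are made up; in practice they would come from a neural network scoring each response.

```python
import math

def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """Small when the preferred response scores higher than the rejected one."""
    margin = reward_preferred - reward_rejected
    # Equivalent to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))

# Hypothetical reward scores for two responses to the same prompt.
agrees_with_human = pairwise_ranking_loss(2.0, 0.5)     # model ranks them as the human did
disagrees_with_human = pairwise_ranking_loss(0.5, 2.0)  # model ranks them the other way round

print(agrees_with_human < disagrees_with_human)
```

Minimising this loss over many human-ranked pairs pushes the reward model to score preferred responses higher, which gives the reinforcement learning stage something to optimise against.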
It’s worth noting the sort of investment required to do this. Using people to label and evaluate machine learning outputs is expensive, but it is a great way to improve the performance of a model. In recent LinkedIn posts I have seen people claiming to have been offered jobs to do this sort of work, so it looks like OpenAI are continuing to refine the model in this way. Reinforcement learning is usually a very expensive way of learning something; reinforcement learning on top of deep learning is even more expensive, so there has clearly been a lot of time, effort and money expended in developing the model.
OpenAI are open in their approach – here’s their take on how to use ChatGPT; I really applaud their openness. I’m aware there are a lot of negative reactions to the tool, but it’s worth pointing out that the developers are clearly aware that the model is not perfect. If you look at the limitations section of the document they note several things including:
- It can still generate toxic or biased outputs
- It can be instructed to produce unsafe outputs
- It is trained only in English, so will be culturally-biased to English-speaking people
In my opinion, generative models are not deterministic in their outputs so these sorts of risks will always remain. But I think serious thought and expense has been applied in order to reduce the likelihood of problematic results.
I started this investigation into fine-tuning ChatGPT because I noticed that I wasn’t able to fine-tune on the text-* versions of the model. In fact, it clearly states in the ChatGPT fine-tuning guide that only the base models can be fine-tuned. Now I understand why. Unfortunately, I also understand better that the text-* models are those which have been improved with the human in the loop process. And these are the ones that have a lot of the secret sauce for ChatGPT; it is the refinement using human guidance that gives it that fantastic human-like ability which seems so impressive.
The implication for the use-cases I had in mind – fine-tuning ChatGPT to specific contexts, such as question answering about Equal Experts – is that you cannot use the model which understands your intention. So you will not get the sort of natural-sounding responses we see in all the examples people have been posting on the internet. That’s a bit sad for me, but at least now I know.
Pretty much everyone by now has heard of – and probably played with – ChatGPT. (If you haven’t, go to https://openai.com/blog/chatgpt/ to see what all the fuss is about.) It’s a chatbot developed by OpenAI, based on an extremely large natural language processing model, which automatically answers questions based on written prompts.
It’s user friendly, putting AI in the hands of the masses. There are loads of examples of people applying ChatGPT to their challenges and getting some great results. It can tell stories, write code, or summarise complex topics. I wanted to explore how to use ChatGPT to see if it would help with the sorts of technical business problems that we are often asked to help with.
One obvious area to exploit is its ability to answer questions. With no special training I asked the online ChatGPT site “What is Continuous Delivery?”
That’s a pretty good description and, unlike traditional chatbots, it has not been made from curated answers. Instead, it uses the GPT-3 model which was trained on a large corpus of documents from CommonCrawl, WebText, Wikipedia articles, and a large corpus of books. GPT-3 is a generative model – if you give it some starter text it will complete it based on what it has encountered in its training corpus. Because the training corpus contains lots of facts and descriptions, ChatGPT is able to answer questions. (Health warning – because the corpus contains information that is not always correct, and because ChatGPT generates responses from different texts, in many cases the answers sound convincing but might not be 100% accurate.)
This is great for questions about generally known concepts and ideas. But what about a specific company? Can we use ChatGPT to tell us about what an individual company does?
- Can it immediately answer questions about a company, or does it need to be provided with more information? Can customers and clients use ChatGPT to find out more about an organisation?
- If we give it specific texts about a company can it accurately answer questions about that company?
Let’s look at the first question. We’ll pretend I’m a potential customer who wants to find out about Equal Experts.
That’s disappointing. Let’s try rephrasing:
It’s still not the answer I was hoping for. Although I do now know that I need a diverse team with a range of skills!
Clearly ChatGPT is great for general questions but won’t help potential customers find out about Equal Experts (or most other businesses). So maybe I can improve things. ChatGPT has the ability to ‘fine-tune’ models. You can provide additional data – extra documents which contain the information you want – and then retrain the model so it also uses this new information.
So, can we fine-tune GPT to create a model which knows about Equal Experts? I found some available, good quality text on Equal Experts – the case-studies on our website – and used it as training material.
I wrote a small number of sample questions and answers for individual case-studies and submitted them for training. I used a simplified version of the approach given by OpenAI. (For those interested, I didn’t use a discriminator because of the low training data volumes.) I then asked some questions for which I knew the answers were in the training data, and displayed the top 5 answers from our new model via the ChatGPT API:
I think we can agree that these are not great results. I should caveat with the fact that creating training examples takes a lot of time. (It is often the biggest activity in an AI or machine learning initiative.) The recommendation is to use 100s of training examples for ChatGPT and I only created 12 training prompts. So results could well be improved with a bigger training set. On the other hand, one of the big selling points of GPT-based models is that they facilitate one-shot or few-shot learning (only needing a few examples to train the model on a new concept – in this case Equal Experts). So I’m disappointed that training has made no difference and that the answers are poor compared to the web interface.
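For readers who have not seen fine-tuning data, the sketch below shows roughly how question-and-answer pairs can be written out as one JSON record per line, with `prompt` and `completion` fields as in OpenAI's fine-tuning guide at the time. The two pairs, the separator and the stop marker are assumptions for illustration, not the training set actually used here.

```python
import json

# Illustrative pairs only; the real training prompts were written from the case studies.
examples = [
    {"question": "How long does a data health check usually last?",
     "answer": "It usually lasts for two weeks."},
    {"question": "Who conducts a data health check?",
     "answer": "A pair of Equal Experts practitioners."},
]

PROMPT_SUFFIX = "\n\n###\n\n"  # assumed marker for the end of the prompt
STOP = " END"                  # assumed marker for the end of the completion

def to_jsonl(pairs):
    """Turn question/answer pairs into JSONL, one training record per line."""
    lines = []
    for pair in pairs:
        record = {
            "prompt": pair["question"] + PROMPT_SUFFIX,
            "completion": " " + pair["answer"] + STOP,
        }
        # json.dumps escapes the newlines, so each record stays on a single line.
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_jsonl(examples))
```

Writing each record by hand is where the time goes, which is why a dozen examples is easy to produce and several hundred is not.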
In fact the results are also affected by the training procedure. The training approach to using ChatGPT gives a context first (some text), then a question, then the answer. If I give context with the question (e.g. some relevant text such as an EE case study) and then ask a question, ChatGPT responds much better.
These are much better answers (although far from perfect), but I have had to supply relevant text in the first place, which rather defeats the object of the exercise.
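Supplying the context along with the question amounts to assembling a prompt in the order described above: context, then question, then an open slot for the answer. A minimal sketch follows; the wording of the instruction line and the `build_prompt` helper are assumptions, not the exact prompt I used.

```python
def build_prompt(context, question):
    """Place the supporting text first, then the question, and leave the answer open."""
    return (
        "Answer the question using the context below.\n\n"
        f"Context: {context.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )

# Hypothetical context text; in practice this would be a whole case study.
prompt = build_prompt(
    "A data health check usually lasts for two weeks.",
    "How long does a data health check last?",
)
print(prompt)
```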
So, could I get ChatGPT to identify the relevant text to use as part of the prompt? You can, in fact, use ChatGPT to generate embeddings – a representation of the document as a mathematical vector – which can be used to find similar documents. So this suggests a slightly different approach:
- For each case-study – generate embeddings
- For a given question – find the case-studies which are most similar to the question (using the embeddings)
- Use the similar case-studies as part of the prompt
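The three steps above can be sketched as follows. To keep the example self-contained, `toy_embed` is a stand-in that counts words; it is an assumption for illustration, and a real embedding model would return a dense numeric vector instead. The case-study snippets are also made up.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding call: a bag-of-words count vector."""
    return Counter(word.strip(".,?!").lower() for word in text.split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as Counters."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(question, embeddings, top_k=1):
    """Return the names of the documents whose embeddings are closest to the question."""
    query = toy_embed(question)
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query, embeddings[name]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical case-study snippets.
case_studies = {
    "data-health-check": "A data health check evaluates data pipelines and analytics processes.",
    "online-ordering": "A new online ordering service with collection points in store.",
}

# Step 1: embed each case study. Step 2: find the closest ones to the question.
embeddings = {name: toy_embed(text) for name, text in case_studies.items()}
best = most_similar("What is a data health check?", embeddings)

# Step 3: use the matching text as the context part of the prompt.
context = "\n\n".join(case_studies[name] for name in best)
print(best, context)
```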
This gives some pretty good responses; this image shows the top 5 responses for some questions about our case studies:
These look like pretty relevant responses but they’re a bit short, and they definitely don’t contain all the available information in the case-studies. It turned out that the default response size is 16 tokens (sort of related to word count). When I set it to 200 I get these results for the question “What is a data health check?”:
Well that’s a much better summary, and it gave good results for other queries also:
Throughout this activity, I found a number of other notable points:
- Using the right model in ChatGPT is really important. I ended up using the latest model, which is tuned for instruction/chatgpt type tasks (text-davinci-003), and got good results for most of the work. Base models (e.g. davinci) just didn’t seem to cut it. This matters when you are thinking of fine-tuning a model because you cannot use the specialised models as the basis of a new one.
- The davinci series models used a lot of human feedback as part of their training, so they give the best results compared to simpler models. (Although I accept the need to experiment more.)
- Quite often the model would become unavailable and I would get the following message. If you want to use the API in a live service, be aware that this can happen quite regularly.
- Each question cost about 5-6 cents.
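If you do call the API from a live service, one common way to cope with the model being temporarily unavailable is to retry with an increasing delay. The sketch below is a generic pattern, not something I ran as part of this investigation; `ModelUnavailableError` and `send_request` are hypothetical stand-ins for whatever error and request function your API client provides.

```python
import time

class ModelUnavailableError(Exception):
    """Hypothetical error standing in for whatever the API client raises."""

def call_with_retries(send_request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send_request, waiting longer after each failure before trying again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()
        except ModelUnavailableError:
            if attempt == max_attempts:
                raise  # give up and let the caller handle the outage
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...

# Usage with a fake request that fails twice and then succeeds.
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ModelUnavailableError("model unavailable")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_request, sleep=delays.append)
print(result, delays)
```

Bear in mind that if every retry is a paid request, retries also add to the per-question cost.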
I have noticed a wide range of reactions from people using ChatGPT, from ‘Wow – this will change the world!’ to ‘It’s all nonsense – it’s not intelligent at all.’ – even amongst people with AI backgrounds. Having played with it, I think there are lots of tasks it can help with; organisations that can figure out where it will help them and how to get the most out of it will really benefit. Watching how things play out with ChatGPT and its successors (hello GPT-4) is going to be a fascinating ride.
What is generative research?
Generative research is an exploratory technique used to understand users’ experiences, needs and behaviours and generate insights to support design and delivery. The goal is to identify problems to be solved first, rather than jumping straight into solutions and validating the idea with users afterwards, which risks teams building the wrong thing. By gaining an in-depth understanding of users’ motivations and challenges, teams can build empathy with their user base and explore solutions that meet user needs.
Generative discovery can take different forms:
Interviews: At the Department for Health and Social Care we conducted video interviews with overweight people with comorbid health conditions to understand barriers to accessing weight management services. This helped us to understand the needs of a wide range of users and make recommendations for improved signposting into digital services.
Contextual interviews: At the Ministry of Justice, we visited prisons to see how staff accessed and used data, to help understand how to replace a legacy reporting service. This allowed us to see user challenges first hand whilst using existing tools, and understand the environment. Is it noisy, what technology are people using, what workarounds do people use, and how could they be turned into opportunities for a new service?
Observations: At Pret a Manger, we visited coffee shops to observe the flow of customers, and to help understand how new digital services would intersect with in-store processes. If your prospective service has an offline touchpoint or will be used in a specific environment, immersing yourself in the environment and trying out services can help you to design a service that solves a whole problem for users, on and offline. In this example, seeing the flow of customers in store meant that we were able to make recommendations for how collection points could work for a new online ordering and digital messaging service.
Ethnographic research: At Jio we shadowed broadband installers in Mumbai to understand staff and customer experiences of broadband installation. This helped us to identify opportunities for new digital tools that track installation. We created customer journey maps to better understand the challenges faced by end users, and support the design of a new app for installers.
Surveys: One to one conversations are the preferred method for generative research so you can dig in with follow up questions and see context. However, surveys can be used to supplement this information and gain insight from a broader range of users. At the Department for Health and Social Care we also sent out a questionnaire with exploratory questions, which enabled us to gain insights from a broad range of users that were harder to reach through interviews.
I’m replacing a legacy service, do I need to do this?
Yes. Often, working practices have evolved around systems rather than actually supporting users to do what they need to do. Investing time in some formative research helps to uncover unmet user needs and pain points, ensuring existing inefficiencies are not baked into a new system.
Can’t we just do prototype testing?
It’s tempting to jump ahead to prototyping solutions, but there is a risk of making assumptions about the right problem to solve and partially deciding the solution. Once you start down this road, it can feel much harder to go back, because money has been spent and stakeholders are invested in an idea. It’s easy to come up with ideas; the trick is selecting the good ones that solve real problems and add value. Generative research will give you the evidence and confidence you need to choose the right solution.
Even if you have a couple of different ideas to solve an assumed problem, there are risks in showing users multiple prototypes and asking which they want most. Cognitive psychology has taught us that humans don’t have reliable insight into why they make the choices they make, so we can have little confidence in asking users what they want and why.
We have a few ideas already – can’t we just get started?
David Travis and Philip Hodgson share a useful anecdote illustrating this issue in a brilliant book called ‘Think like a UX researcher’. In a consumer study, people were presented with four pairs of ladies’ tights labelled A, B, C and D. All of the tights were identical, but the majority of users chose option D. This was because of a known position effect, where people have a tendency to choose things from the right hand side. More interesting is the fact that people were able to give reasons for why they had chosen pair D, for example that they were better quality or they had more elasticity.
By exploring needs, rather than wants, through carefully curated open questions and observations we can start to uncover users’ challenges. This helps us to be confident that the solutions we’re proposing will solve real problems and delight users. In the example above, if tights are our ‘product’, we haven’t learned if people actually need to buy tights, because we forced a decision upon them in the research. We can conclude that if people want to buy tights, they want them to be good quality with good elasticity, but what if the solution people needed wasn’t actually tights? Perhaps the real problem is something else: keeping legs warm, dressing modestly or making a fashion statement. We won’t know unless we ask open questions. By giving people a solution before we fully understand the problem and asking them to choose A, B, C or D, we limit our opportunity to identify the right thing.
How to conduct generative research
Successful generative research should follow the steps below. You can read about these steps in more detail in this blog post.
- Identify your goals – what are you trying to achieve?
- Create an interview plan to meet these goals
- Identify users from each subsection of your target audience
- Run the research including interviews and observations
- Analyse and synthesise the data into themes
- Create actionable insights from the analysis
How to make sense of the data
You’re going to generate a lot of qualitative data through the research and this can feel overwhelming. Avoid analysis paralysis by remembering the goals you defined in step one. To make sense of data, you can run a ‘thematic’ analysis. You’re looking for:
- Commonalities in the kinds of things users say – this can help you identify the biggest pain points or opportunities.
- Differences in what different groups of users say – this can help to establish different personas for your target product or service.
There are a number of practical ways to collate and sort this information, including:
- Capture insights in a spreadsheet, and turn them into Post It notes using a digital whiteboard tool such as Miro, then start to group them into themes
- Use a specialist digital tool such as Dovetail to tag the data and identify themes.
- Go old school and stick insights on Post It notes on a big wall
Whichever method you use to collate data, start by identifying big themes like ‘pain points’ and ‘plus points’. Perhaps the pain points theme can be broken down into further subgroups, like time management, technology challenges or quality of data.
The words that users choose are important here, so try to flag some of these in your analysis, as they can help you understand the mental model users have of the problem area. Natural language can be really helpful in the later design phase to produce content and name new products and services in a way that resonates for the target user.
Analysing findings can easily take double the time taken to run each interview, so it’s important to budget adequate time to explore data.
How to turn insights into something useful
Your insights can help you to identify the needs that must be met to solve users’ problems. You can communicate these as user needs or as jobs to be done. Be careful not to solutionise at this stage; it’s important to get to the roots of the problem before trying to solve it. Involving your team can be helpful in identifying appropriate actions from your insights.
A well written user need communicates the problem to be solved without solutionising. For example, as an events planner, I need to know how many people will be attending my event two weeks in advance, so I can plan how many staff I need to run the event.
When you have the problems formulated, you can use techniques like ‘How Might We’ statements to start generating ideas. Let’s imagine we have found through research that users are reluctant to buy from smaller retailers online because they are worried about the returns process. Turning these problems into statements can highlight opportunities for a product or service that we are confident meet real user needs.
You might also want to map these user needs visually to bring the information to life for the team. This could be as personas which communicate user needs for different types of users and user journey maps which can show the experience at different touch points across a given task.
Insight: some customers are nervous to shop online with small retailers because they are worried about the time and costs associated with the returns process
How might we: How might we create a fast and seamless returns process?
How might we: How might we provide all the information users need to feel confident in placing an order knowing that they can return it if it’s not right?
Can we design something yet?
At the end of this activity you might find that people don’t really need to solve a problem in the area you were expecting. Don’t be disheartened: money has been saved on building something that wouldn’t add value, and you can now use the insights to pivot to a different area with confidence.
If you identify a problem to be solved you can start sketching ideas for solutions at this point. For every solution you and the team come up with to solve the problem, cross reference it against how well it meets the core user needs you have identified, as well as technical feasibility and complexity to deliver.
You can do this in a simple table, working as a whole team to formulate potential solutions and weigh them up against their feasibility and potential to help and delight end users.
The next step will be validating these ideas with your end users through the opposite of generative research – evaluative research. Methods like prototype testing and A/B testing can help you to validate with users that you have got the solution right. This can then be fine-tuned through ongoing feedback loops with your end users.
This sounds long winded, do we have to do it?
Generative research can help reduce the risk of delivering something that users don’t want, or missing the opportunity to identify solutions that would add the most value. By spending time gathering insights and evidence about user behaviour you can build confidence in the solutions you are proposing. You may feel reluctant to commit to the cost of a discovery, but this can be done with a fairly small team. It may be easier to justify when you consider the alternative costs of building the wrong solution: lost revenue, reputational damage and staff inefficiencies, as well as development costs incurred by staffing engineers, QAs, maintenance and support for the wrong solution.