When our client receives project requests from their customers, a lot of time and cost is spent on resourcing project management – determining which teams should conduct the work – before we even start delivering the work. Here’s how we used data science and dashboarding to speed up this processing time and provide up-to-date metrics on the project delivery process.
The problem of resourcing project management
Our client runs an ever-growing department of over 800 people, delivering numerous projects in parallel, and the number of projects grows year after year. However, the client’s method of distributing work to the relevant teams (what they call their impacting process) hasn’t scaled with the success the client is having.
Impacting is a resource-intensive process requiring each team to read multiple documents – sometimes up to 25k words – to identify whether they are required for the project, and often they’re not. This results in a slow, manual process that requires multiple redundant points of contact.
After a project has been through the impacting process and is being delivered, there is no automated reporting. Typically, reporting is triggered by a status request from a senior leader, at which point the data is manually collected, creating slow and infrequent feedback loops.
This is an intensive process which puts tremendous strain on an already busy department, especially as they currently have to process over 100 project requests a week.
Our aim is to reduce the number of people involved in impacting a project to only the most relevant individuals, and to reduce the amount of reading required to understand the project.
Leveraging data science for improved project resourcing and reporting
As the client had no clear insight on in-progress projects, we determined that the most useful first step was to provide reporting on these projects using data from their Jira ticketing system. This allows senior leaders to access project delivery information quickly and interactively, enabling them to identify issues and bottlenecks before they become problems.
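As a sketch of the kind of reporting involved, the snippet below reduces a list of Jira-style issues to two simple delivery metrics: issue counts by status and average cycle time. The field names and flat shape here are assumptions for illustration; real Jira REST responses nest these fields differently, and the dashboards we built drew on far richer data.

```python
from collections import Counter
from datetime import datetime

def summarise_issues(issues):
    """Reduce Jira-style issue dicts to counts by status and average cycle time.

    Assumes each issue carries a status name plus ISO created/resolved dates;
    unresolved issues contribute to status counts but not to cycle time.
    """
    status_counts = Counter(issue["status"] for issue in issues)
    cycle_days = [
        (datetime.fromisoformat(i["resolved"]) - datetime.fromisoformat(i["created"])).days
        for i in issues
        if i.get("resolved")
    ]
    avg_cycle = sum(cycle_days) / len(cycle_days) if cycle_days else None
    return {"by_status": dict(status_counts), "avg_cycle_days": avg_cycle}

# Hypothetical issues for illustration.
issues = [
    {"status": "Done", "created": "2023-01-01", "resolved": "2023-01-11"},
    {"status": "Done", "created": "2023-01-05", "resolved": "2023-01-09"},
    {"status": "In Progress", "created": "2023-01-08", "resolved": None},
]
print(summarise_issues(issues))
```

Feeding a rolling extract of tickets through a function like this is enough to surface bottlenecks (for example, a climbing average cycle time) without anyone manually collating a status report.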
We then focused on reducing the resource overhead in the impacting process. Project impacting is designed to determine which teams are required to work on a project. In this case, it involved a lot of people reading large documents which were potentially irrelevant to their team’s specialism.
So we sought to improve the impacting process in two ways:
- Can we reduce the amount of time needed to understand the project?
- Can we highlight the project to only the relevant teams?
The scope of data science
Reducing time to understanding
With a typical design document being approximately 25,000 words, it takes a person roughly 3-4 hrs to read. Reducing the amount of text needed to understand the document would result in significant time savings per person.
We did this in two ways. First, we used an AI model to summarise the text while retaining important information, allowing users to control the degree of summarisation. This summarisation method is also being used to create executive summaries for senior leaders, who constantly switch context between pieces of work and need to understand different projects very quickly.
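The model used on the engagement is not something we can reproduce here, but the idea of user-controlled summarisation can be illustrated with a deliberately simple extractive sketch: score each sentence by the frequency of its words across the document, then keep a user-chosen fraction of the highest-scoring sentences in their original order.

```python
import re
from collections import Counter

def summarise(text, ratio=0.3):
    """Tiny frequency-based extractive summariser.

    `ratio` controls the degree of summarisation: the fraction of
    sentences to keep (always at least one).
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    keep = max(1, round(len(sentences) * ratio))
    # Pick the top-scoring sentences, then restore document order.
    top = sorted(sorted(sentences, key=score, reverse=True)[:keep], key=sentences.index)
    return " ".join(top)

text = "Cats are great. Cats purr and cats sleep. Dogs bark."
print(summarise(text, ratio=0.34))  # → "Cats purr and cats sleep."
```

A modern abstractive model rewrites rather than selects sentences, but the control knob works the same way: the user trades completeness for reading time.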
Secondly, we extracted keywords from the text so the user can rapidly determine important terms within the document.
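The article doesn't specify the extraction technique, so as a hedged illustration, here is about the simplest approach that works: rank words by how often they appear, after discarding common stopwords. Real keyword extractors (TF-IDF, RAKE and the like) refine this by weighting against a background corpus, but the principle is the same.

```python
import re
from collections import Counter

# A small, illustrative stopword list; production systems use much larger ones.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "for",
    "on", "that", "this", "with", "be", "are", "as", "it", "by",
}

def extract_keywords(text, top_n=5):
    """Return the `top_n` most frequent non-stopword terms in the text."""
    tokens = re.findall(r"[a-z][a-z-]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

print(extract_keywords(
    "The network upgrade requires network downtime and network testing.",
    top_n=1,
))  # → ['network']
```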
These tools have proved very useful in enabling individuals to quickly establish whether they need to read the document in full, cutting reading time from a few hours down to a few minutes.
Identifying Relevant People
Typically 12+ people can end up reading these documents, meaning that each project takes 6+ days of work just to impact – and many of these people are not even relevant to the project. Therefore, reducing the number of people reading these documents to only the most relevant compounds the savings given through document summarisation.
To do this we developed a machine learning classifier to determine which teams were relevant to a project, reducing the people required for impacting. Additionally, we identified similar existing projects and the teams involved in those, to further assist in establishing the right teams for the work.
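The classifier itself is client-specific, but the second idea, surfacing similar past projects, can be sketched with a plain bag-of-words cosine similarity. Everything below, from the project names to the matching approach, is a hypothetical illustration rather than the model we actually built.

```python
import math
import re
from collections import Counter

def vectorise(text):
    """Bag-of-words representation: word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar_project(new_doc, past_projects):
    """Return the name of the past project closest to the new request.

    The teams that delivered the best match are a strong hint for
    which teams should impact the new project.
    """
    new_vec = vectorise(new_doc)
    return max(past_projects, key=lambda name: cosine(new_vec, vectorise(past_projects[name])))

# Hypothetical project history.
past = {
    "db-migration": "migrate the customer database to a new platform",
    "mobile-app": "build a mobile app for field engineers",
}
print(most_similar_project("database migration for customer records", past))
```

In practice we would compare embeddings rather than raw word counts, but even this crude similarity shows how a new request can be routed towards the teams that delivered its nearest neighbours.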
A future enhancement we wish to add is building a recommender system that automatically alerts people if new projects arrive that are similar to previous projects they have delivered, further reducing the operational overhead.
The business value of improving project resourcing and reporting through data science
The client is now able to direct incoming projects to the relevant teams much faster, reducing the delay between a project’s request and work starting, and improving new customer satisfaction. The people involved in impacting now have time freed up to lead the deliveries of in-progress projects, which also benefits existing customers and team efficiency.
In the mid-2010s there was a step change in the rate at which businesses started to focus on gaining valuable insights from data.
As the years have passed, the importance of data management has started to sink in throughout the industry. Organisations have realised that you can build the best models, but if your data isn’t of good quality, your results will be wrong.
There are many varied job roles within the data space, and I always thought the distinctions between them were pretty obvious. However, a lot has been written recently about the differences between the various data roles, and more specifically between Data Scientists and Data Engineers.
It’s important to understand these differences, because not knowing them can be instrumental in teams failing or underperforming with data. That’s why I am writing this article: to clarify the roles, what they mean, and how they fit together. I hope it helps you to understand the differences between a Data Scientist and a Data Engineer within your organisation.
What do the Data Engineer and Data Scientist roles involve?
So let’s start with the basics. Data Engineers make data available to the business, and Data Scientists enable decisions to be made with the data.
Data Engineers, at a senior level, design and implement services that enable the business to gain access to its data. They do this by building systems that automagically ingest, transform and publish data, whilst gathering relevant metadata (lineage, quality, category, etc.), enabling the right data to be utilised.
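To make “ingest, transform and publish, whilst gathering relevant metadata” concrete, here is a deliberately tiny sketch of the pattern. Every field name and cleaning rule in it is invented for illustration; real pipelines run on orchestration and data-quality tooling, not a single function.

```python
def run_pipeline(raw_rows):
    """Toy ingest -> transform -> publish pipeline that also gathers metadata."""
    ingested = list(raw_rows)  # ingest: snapshot the source records
    transformed = [
        {"name": r["name"].strip().title(), "age": int(r["age"])}
        for r in ingested
        if r.get("age") not in (None, "")  # transform: drop rows failing a quality check
    ]
    metadata = {
        "source_rows": len(ingested),
        "published_rows": len(transformed),
        "dropped_rows": len(ingested) - len(transformed),  # a crude quality signal
        "lineage": "raw_rows -> clean -> publish",          # where the data came from
    }
    return transformed, metadata  # publish the data and its metadata together

rows = [{"name": " ada ", "age": "36"}, {"name": "bob", "age": ""}]
data, meta = run_pipeline(rows)
print(data, meta["dropped_rows"])
```

The point of the sketch is the shape, not the code: the pipeline publishes metadata about the data alongside the data itself, which is what lets consumers trust and find what they are given.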
Data Scientists not only utilise the data made available, but also uncover additional data that can be combined and processed to solve business problems.
Both Data Scientists and Data Engineers apply similar approaches to their work. They identify a problem, they look for the best solution, then they implement the solution. The key difference is the problems they look at and, depending on their experience, the approach taken to solving them.
Data Engineers, like Software Engineers (or engineers more generally), tend to use a process of initial development, refinement and automation.
Initial development, refinement and automation explained, with cars.
In 1908, Ford released the Model T. It had many of the same features as a modern car – wheels on each corner, a bonnet, a roof, seats, a steering wheel, brakes, gears.
In 1959 the first Mini was released. It had all the same features as the Model T Ford. However, it was more comfortable, cheaper, easier to drive, easier to maintain, and more powerful. It also incorporated new features like windscreen wipers, a radio, indicators, rear view mirrors. Basically, the car had, over 50 years, been incrementally improved.
Step forward in time to 2012, and Tesla released the Model S, followed by the Model X in 2015. These too have many of the features we can see in the Model T and the Mini. But now they also contain some monumental changes.
The internal combustion engine is replaced with electric drive. It has sat-nav, autopilot, and even infotainment. All of which combine to make the car much easier and more pleasurable to drive.
What we are seeing is the evolution of the car from the initial production line – basic but functional – through multiple improvements in technology, safety, economy, driver and passenger comforts. All of which improve the driving experience.
In other words we are seeing initial development, refinement and automation. A process that Data Engineers and Data Scientists know only too well.
For Data Engineers the focus is on data: getting it from source systems to targets, ensuring its quality is verified, its lineage captured, its attributes tagged, and access to it controlled.
What about Data Scientists? They absolutely follow the same pattern, but they additionally look to develop analytics along the Descriptive, Diagnostic, Predictive, Prescriptive scale.
So why is there confusion between the Data Scientist and Data Engineer roles?
There is of course not a single answer but some of the common reasons include:
- At the start, both Data Scientists and Data Engineers spend a lot of time data wrangling: trying to get the data into a shape where it can be used to deliver business benefits.
- At first, the teams are often small and work very closely together; in fact, in very small organisations they may even be the same person – so it’s easy to see where the confusion comes from.
- Data Engineers are often asked to “productionise” analytics models created by Data Scientists.
- Many Data Engineers and Data Scientists dabble in each other’s areas, as there are many skills both roles need to employ, including data wrangling, automation and algorithms.
As the seniority of data roles develops, so do the differences.
When I talk to and work with Data Engineers and Data Scientists, I can often place them into one of three categories – Junior, Seasoned, Principal – and when I work with Principals in either space, you can tell they are a world apart in their respective fields.
So what differentiates the different levels and roles?
That’s it. I hope this article helps you to more easily understand the differences between a Data Scientist and a Data Engineer. I also hope this helps you to more easily identify both within your organisation. If you’d like to learn more about our Data Practice at Equal Experts, please get in touch using the form below.
What do Data Science and User Experience have in common?
On the surface, you might expect very little, as the two appear to oppose one another. But what about when attempting to understand human behaviour? Both UX and Data Science specialists try to solve such problems, but with different approaches. On a recent engagement, we found that combining techniques from both disciplines yielded powerful results.
Our client wanted to understand their users’ needs while using a job-posting website. User personas are a popular tool for communicating user needs off the back of user research. On this engagement, we wanted to see if we could use data science techniques to provide quantitative validation of the initial qualitative user research.
The Tension Model
We worked in partnership with Koos Service Design. One of the techniques Koos use to develop personas is to investigate conflicting user needs, called “Tensions”. For example, a tension when applying for a job could be the conflict between ‘finding the perfect job’ and ‘finding a job quickly’. Initial research to capture user needs was conducted through in-depth interviews, surveys and exploratory data analysis of user logs.
From this small pool of data, an initial set of tensions was identified, onto which personas (detailed below) were placed that encompass the different needs groups of users.
This approach was based on low-volumes of qualitative user research data. To enhance and refine the personas we would need to conduct further testing and experimentation with a much larger dataset.
With the information gathered during the initial user research, we developed a small survey asking True/False questions aimed at testing our hypotheses about the combination of needs people experienced.
This created a much larger dataset, on which we were able to use machine learning to group users together based on similarity.
The technique utilised was unsupervised k-means clustering, which aims to group (or cluster) data that behaves similarly. An optimal number of five clusters was identified using the elbow method, which minimises the error in the model without creating too many clusters. The number of personas was then revised to reflect this new information.
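The idea can be sketched end to end in a few lines. Below, each hypothetical respondent is a vector of True/False survey answers; a plain k-means implementation groups them, and the inertia (total squared distance of points to their cluster centres) is what the elbow method plots against k – the “elbow” is where adding clusters stops paying off. Note this uses a deterministic farthest-point initialisation for reproducibility, not the library implementation we used on the engagement.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length answer vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means; returns (labels, inertia).

    Centroids are initialised with a farthest-point heuristic so the
    result is deterministic for a given input.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # recompute each centroid as the mean of its members
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    inertia = sum(dist2(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia

# Hypothetical True/False survey answers, one row per respondent.
responses = [
    (1, 1, 0, 0), (1, 1, 1, 0), (1, 0, 0, 0),  # one needs group
    (0, 0, 1, 1), (0, 1, 1, 1), (0, 0, 0, 1),  # another needs group
]
for k in (1, 2, 3):
    _, inertia = kmeans(responses, k)
    print(k, round(inertia, 2))  # the elbow is where the drop in inertia flattens
```

With real survey data one would reach for a library such as scikit-learn, but the mechanics are exactly these: assign each respondent to the nearest centre, move the centres, repeat, and read the cluster assignments off as candidate persona groups.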
There was a lot of similarity between the initial personas and the final data-driven personas; the key divergence was the removal of one persona. However, other sets of behaviours persisted between the initial and data-driven personas: for example, both the Survivor and the Quick Win capture a desire to get a well-paid job quickly without any other strong preferences.
With these personas, the client was able to tailor individual user experiences based on their needs, ultimately improving customer satisfaction and engagement with the system.
This highlights how Data Science can bolster insights from UX design, leading to an end product more useful than using either technique in isolation.