Data-Driven Market Formation in On-Demand Transport

As on-demand transport providers (e.g., Uber) are adopting increasingly sophisticated mechanisms to allocate and price both passengers and drivers, new issues are arising. In a series of posts (starting here), I have been describing different aspects of these issues including the ways to allocate and price (the mechanism design) and also simulation tools to evaluate performance in realistic environments (capturing both the road network and the behavior of passengers and drivers).

In this post, I want to turn to a different aspect: the market formation problem.

In practice, many providers need to allocate and price a large number of passengers and drivers over short periods of time. This can be difficult as many available drivers can potentially service each passenger. The market mechanisms I described in this post have different ways of dealing with this issue:

  • In posted price (e.g., Uber) and hybrid PP/A mechanisms, one request is dealt with at a time. This means that a set of drivers must be offered a journey with each passenger. Problem: how should this set of drivers be selected?
  • In double auction mechanisms, multiple drivers and passengers are allocated simultaneously. Problem: how should a provider decide which groups of passengers and drivers can be matched?

In each mechanism, the challenge is to find compatible passengers and drivers. This is important for two reasons:

  1. not all passengers can be serviced by all drivers; and
  2. it is undesirable for available drivers to deal with a large number of offers as it can be a distraction from driving safely.

As such, we need to decompose the initial market consisting of all drivers and passengers into submarkets consisting of compatible drivers and passengers. Note that this problem is not usually addressed in the economic or computer science literature on mechanism design, where it is assumed that all agents in the initial market are compatible. This is due to the homogeneous goods assumption. In practice, however, ensuring that each submarket contains only compatible drivers and passengers is an important problem.

So, when are drivers and passengers compatible? Or, alternatively, when is a given passenger-driver pair incompatible?

There are two reasons why a given driver and passenger may not be compatible:

  • Hard constraints are not satisfied; for example, the driver cannot reach the passenger at a desired pick-up time as the pair are too far apart, or the driver’s vehicle type is not acceptable to the passenger.
  • Soft constraints are not satisfied; for example, a passenger is not likely to be accepted by a driver because the driver does not typically serve passengers with the requested pick-up or drop-off locations.
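To make the hard constraints concrete, here is a minimal sketch of a filter that checks the two examples given above: whether the driver can reach the passenger by the desired pick-up time, and whether the vehicle type is acceptable. The class names, field names, and the assumed average speed are all hypothetical and chosen only for illustration; they are not taken from any provider's system.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Driver:
    # Hypothetical fields, for illustration only.
    driver_id: str
    distance_to_pickup_km: float
    vehicle_type: str


@dataclass(frozen=True)
class Request:
    # Hypothetical fields, for illustration only.
    minutes_until_pickup: float
    accepted_vehicle_types: FrozenSet[str]


# Assumption: a fixed average speed stands in for a real travel-time estimate.
ASSUMED_SPEED_KMH = 30.0


def satisfies_hard_constraints(driver: Driver, request: Request,
                               speed_kmh: float = ASSUMED_SPEED_KMH) -> bool:
    """Return True if the driver can reach the pick-up in time
    and drives a vehicle type the passenger accepts."""
    travel_minutes = 60.0 * driver.distance_to_pickup_km / speed_kmh
    reachable = travel_minutes <= request.minutes_until_pickup
    acceptable = driver.vehicle_type in request.accepted_vehicle_types
    return reachable and acceptable
```

Soft constraints do not reduce to a yes/no check of this kind, which is why they are harder to use in submarket formation.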

In traditional taxi services, hard constraints are usually enforced; however, soft constraints are more difficult to quantify and use in submarket formation. Often providers use expert knowledge to design heuristic rules, but these are typically not rigorously verified.

An alternative approach is data-driven market formation. In this approach, historical data available to the provider is used to develop statistical models to quantify the probability that a given passenger-driver pair is compatible. This data can be obtained via transaction records, which is now possible due to the ubiquitous use of sophisticated smartphone apps.

Such a data-driven market formation approach is highly desirable as it means that the market formation rule can exploit a range of features that may not be obvious even to experts. It can lead to an increase in the proportion of passengers that are served while reducing the number of offers to each driver—promoting safe driving.

How are the statistical models for data-driven market formation developed? In a project led by Jan Mrkos and Jan Drchal at the Czech Technical University in Prague, a data-driven market formation algorithm has been proposed to aid providers in finding compatible submarkets.

Here is the basic idea:

  1. Select a number of features that potentially influence when a driver will respond to a passenger request.
  2. Using a data set of historical transactions, determine the influence of each feature on the probability that a driver will respond to a request; i.e., construct the statistical model.
  3. For each passenger-driver pair, compute the probability that the pair is compatible.
  4. For each request, select the set of drivers that are offered the journey so that the probability at least one driver responds is greater than a threshold (e.g., 90%).
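Step 4 can be sketched as follows. If driver i responds with probability p_i, and if responses are assumed to be independent (an assumption made here for illustration, not a claim about the algorithm in the technical report), then the probability that at least one of the selected drivers responds is 1 minus the product of (1 - p_i) over the selected drivers. Under that assumption, adding drivers in descending order of p_i reaches the threshold with the fewest offers. The function name and inputs below are hypothetical.

```python
from typing import Dict, List


def select_drivers(response_prob: Dict[str, float],
                   threshold: float = 0.9) -> List[str]:
    """Greedily pick drivers until P(at least one responds) >= threshold.

    Assumes driver responses are independent. If the threshold cannot
    be reached, every candidate driver is returned.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")

    selected: List[str] = []
    prob_no_response = 1.0  # probability that none of the selected drivers responds

    # Consider the most likely responders first.
    ranked = sorted(response_prob.items(), key=lambda kv: kv[1], reverse=True)
    for driver_id, p in ranked:
        if 1.0 - prob_no_response >= threshold:
            break
        selected.append(driver_id)
        prob_no_response *= 1.0 - p

    return selected
```

For example, with response probabilities 0.7, 0.6, 0.5 and 0.2, the first two drivers give 1 - 0.3 × 0.4 = 0.88, which is below 90%, so a third driver is added to reach 1 - 0.3 × 0.4 × 0.5 = 0.94.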

This algorithm has been tested on a data set kindly provided by Liftago, which is based in the Czech Republic. A limited data set is also available upon request. Liftago’s mechanism falls in the hybrid PP/A class, which is described in this earlier post.

Here are some of the observations, which were obtained by learning the statistical model and running simulations on Liftago’s data set. Further discussion is available in the technical report here.

Feature ranking: Let’s start by looking at some different features that might be important in forming submarkets. Clearly, factors like distance will matter, but how much do other factors such as driver histories affect compatibility? The following table gives the complete list of features that were considered.

Table 1: Features derived from the Liftago transactions dataset

Feature             Description
pickup_distance     direct Euclidean distance to pickup (km)
ride_distance       direct Euclidean distance from pickup to destination (km)
pickup_center       pickup Euclidean distance from the Prague center (km)
ride_center         destination Euclidean distance from the Prague center (km)
hour                time of day (h)
day                 day of the week (0 – 6)
mean_accept_rate    driver’s mean accept rate over all transaction records
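As a rough illustration of how the features in Table 1 might be computed for one passenger-driver pair, here is a minimal sketch. Several details are assumptions of mine rather than facts about the Liftago dataset: positions are taken to be (latitude, longitude) pairs, the straight-line distance uses a simple planar (equirectangular) approximation, the coordinates of the Prague center are approximate, ride_center is computed from the destination, and day 0 is taken to be Monday. All function names are hypothetical.

```python
import math
from datetime import datetime
from typing import Dict, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

# Assumption: approximate coordinates of the Prague center.
PRAGUE_CENTER: Point = (50.0755, 14.4378)
EARTH_RADIUS_KM = 6371.0


def planar_distance_km(a: Point, b: Point) -> float:
    """Straight-line distance in km using an equirectangular approximation,
    which is adequate over city-scale distances."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2.0)
    y = lat2 - lat1
    return EARTH_RADIUS_KM * math.hypot(x, y)


def build_features(driver_pos: Point, pickup: Point, destination: Point,
                   request_time: datetime,
                   mean_accept_rate: float) -> Dict[str, float]:
    """Assemble the Table 1 features for one passenger-driver pair."""
    return {
        "pickup_distance": planar_distance_km(driver_pos, pickup),
        "ride_distance": planar_distance_km(pickup, destination),
        "pickup_center": planar_distance_km(pickup, PRAGUE_CENTER),
        # Assumption: measured from the destination, not the pickup.
        "ride_center": planar_distance_km(destination, PRAGUE_CENTER),
        "hour": request_time.hour,
        # Assumption: Monday = 0, following Python's datetime.weekday().
        "day": request_time.weekday(),
        "mean_accept_rate": mean_accept_rate,
    }
```

The mean accept rate is passed in directly here; in practice it would be computed per driver from the historical transaction records.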

Fig. 1 shows the relative impact of each feature on the probability a given driver will respond to a passenger. Observe that the pickup distance and the mean accept rate are the most important features, which means that driver histories play an important role.

Fig. 1: Ranking of features for submarket formation.

Performance of the Statistical Model: The model outperforms Liftago’s initial market formation algorithm on two measures: the average ratio of responses per request (0.867 for the model versus 0.476 for Liftago’s original approach) and the average number of drivers that are offered a journey (fewer than 4 for the model, and more for Liftago’s original approach).

A key conclusion of this study is that the data-driven market formation approach appears to outperform the heuristic algorithm initially adopted by Liftago, based on results from the available data set. This suggests that adopting a data-driven approach, as opposed to purely expert-based heuristics, is a promising way to find compatible markets in on-demand transport.

For more details, a technical report is available on arXiv:

Mrkos, J., Drchal, J., Egan, M. and Jakob, M., “Liftago on-demand transport dataset and market formation algorithm based on machine learning,” arXiv (2016).

For other blog posts in this series on market-based approaches to on-demand transport, see:

Mechanism design for on-demand transport
Market-based on-demand transport
A simulation tool for market-based on-demand transport

For a collection of research papers related to market-based on-demand transport, see here.

