The useful comments above about surface area, velocity, and volume of air treated make sense, and they got me thinking further about how we would measure these (and other) metrics.
For prototypes that generate their own airflow (e.g. fan-based designs), an airflow monitor could be fitted to measure the actual airflow. If we want this, we should tell participants in advance that it is a requirement (and we would probably want to use standardised, identical monitors).
For mechanically passive surface technologies that have no artificially created airflow, on poor air quality days with no wind (or vertical air currents) the airflow will be low, and in a real-world on-street scenario difficult to measure. Under such conditions pollutants are being dispersed by flowing traffic (turbulent eddy currents) and diffusion. Turbulent air flows, by their nature, do not have a uniform or fixed velocity. This means that real-world airflow velocities will be low and hard to measure on the days when pollution hot spots (e.g. near high traffic flows, junctions, and street canyons) are at their worst - and when the air pollution removal technology is most required (e.g. to keep pollutant concentrations below legal limits).
The above real-world challenges inspired this thought: perhaps we should have two stages in the XPRIZE challenge: laboratory testing followed by real-world testing.
Lab testing would let us monitor all of the metrics of interest in a well-controlled, well-measured environment that is identical for every participant. Putting mechanically passive surface technologies in a wind tunnel would allow an (artificial) wind velocity and airflow to be measured, so that the mass extracted per unit volume of air could be derived, if desired.
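As a rough sketch of how the wind tunnel figure could be derived: the volume of air passed over the prototype is the air velocity times the tunnel cross-section times the test duration, and the metric is the mass removed divided by that volume. The function name, parameter names, and units below are my own assumptions for illustration, not an agreed protocol.

```python
def mass_removed_per_volume(mass_removed_ug, velocity_m_s, cross_section_m2, duration_s):
    """Micrograms of pollutant removed per cubic metre of air passed through the tunnel.

    All names and units here are illustrative assumptions.
    """
    if velocity_m_s <= 0 or cross_section_m2 <= 0 or duration_s <= 0:
        raise ValueError("velocity, cross-section and duration must be positive")
    # Volume of air treated = velocity x cross-sectional area x time
    volume_m3 = velocity_m_s * cross_section_m2 * duration_s
    return mass_removed_ug / volume_m3


# Hypothetical numbers: 1200 ug removed, 2 m/s through a 0.5 m2 tunnel for 10 minutes
print(mass_removed_per_volume(1200.0, 2.0, 0.5, 600.0))
```

This assumes a steady, uniform velocity across the tunnel section, which is exactly what the lab can provide and the street cannot.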
The second phase, real-world testing, is clearly of interest to see how prototypes perform under a range of challenging environmental conditions. A one-year test across all seasons would also indicate how robust and effective the prototypes are in practice. Looking at the following sketch, we should also put pollution sensors at the head height of pedestrians. [Note that, ironically, young children are often at the most vulnerable height - near the exhaust emissions.]

Micrograms of pollutant per unit surface area (of the technology’s interface) per minute seems to be a useful metric for the real-world tests on passive surface technologies, since airflow is hard to measure on the street for the reasons given above. In the lab, mass extracted per unit volume could also be used, as already noted.
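To make the surface-based metric concrete, here is a minimal sketch of the calculation. The function name, parameter names, and the example figures are hypothetical; how the mass removed would actually be measured is a separate question.

```python
def removal_per_area_per_minute(mass_removed_ug, surface_area_m2, duration_min):
    """Micrograms of pollutant removed per square metre of interface per minute.

    Names and units are illustrative assumptions.
    """
    if surface_area_m2 <= 0 or duration_min <= 0:
        raise ValueError("surface area and duration must be positive")
    return mass_removed_ug / (surface_area_m2 * duration_min)


# Hypothetical numbers: 900 ug removed by a 1.5 m2 surface over one hour
print(removal_per_area_per_minute(900.0, 1.5, 60.0))
```

Because it needs no airflow measurement, this figure can be computed for passive surfaces on still days as well as windy ones.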
Another potentially useful metric is mass extracted per unit of energy consumed by the prototype (e.g. grams per kWh). This would act as a kind of environmental efficiency rating. We want to extract a large quantity of pollutants whilst consuming little energy (because even renewable energy has some environmental impact over its complete lifetime).
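A minimal sketch of the energy-efficiency figure follows, again with assumed names and made-up numbers. One thing the arithmetic makes obvious: a fully passive prototype that consumes no energy has no defined grams-per-kWh value, so the sketch rejects zero energy rather than guessing how such devices should be scored.

```python
def grams_per_kwh(mass_removed_g, energy_consumed_kwh):
    """Grams of pollutant removed per kWh of energy consumed by the prototype.

    Names and units are illustrative assumptions.
    """
    if energy_consumed_kwh <= 0:
        # Undefined for zero-energy (passive) devices; scoring those is an open question.
        raise ValueError("energy consumed must be positive for this metric")
    return mass_removed_g / energy_consumed_kwh


# Hypothetical numbers: 50 g removed while consuming 2.5 kWh
print(grams_per_kwh(50.0, 2.5))
```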
I am assuming that air quality monitoring professionals will be used in the lab and real-world trials. They’ll probably want to record additional metrics such as meteorological values (e.g. temperature, pressure, humidity, and wind speed). Note: prototypes might function at different levels of efficiency as these values change.
Yet another factor to consider is how a prototype's efficiency changes when it is exposed to different levels (and types) of pollution.