Engineer Boards

# 85th Percentile Speed Question

If you'd send to penotes@gmail.com, I'd appreciate it. Thanks!

##### Share on other sites

Major: I think your randomization gives exactly the results expected - thanks for setting it up.

I think sac is trying to copy another problem's solution without really understanding what 85th percentile means. You can't take the mean of a bin and say it's the representative speed of the whole bin's percentile - that's just not how histograms work.

##### Share on other sites

jaa046's explanation is correct. Also, we're not talking about histograms, we're talking about percentiles based on average speed.

Also, no one can assume the specific speeds of the vehicles, so for all we know, all 24 cars in the 40-44 mph category could have been travelling at 40 miles per hour. If this is the case (i.e. based on actual speeds), then the 85th percentile is 41 mph.

I'm not disputing that the question is ambiguous, I'm just saying that the method of computing the 85th percentile is not the same as just adding the number of cars.

##### Share on other sites

Also, we're not talking about histograms, we're talking about percentiles based on average speed.

Yes, we are. A histogram is a way of representing the distribution of data by their frequency.

Also, no one can assume the specific speeds of the vehicles, so for all we know, all 24 cars in the 40-44 mph category could have been travelling at 40 miles per hour. If this is the case (i.e. based on actual speeds), then the 85th percentile is 41 mph.

Not so... even if all 24 vehicles were traveling at 40 mph, the 85th vehicle is still in the 45-49 bin. Really, this is the point of our disagreement, so it would go a long way towards resolution if you could explain why you think the 85th vehicle (after arranging the observed speeds from slowest to fastest) could be anywhere in the 40-44 bin. Would you please address just this difference in our perspectives?

P.S. Major made the assumption that the speeds are randomly distributed, which is as good an assumption as any.
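One way to check which bin holds a given ranked vehicle is to walk the cumulative counts. This is only a sketch: the counts are derived from figures quoted in this thread (84 vehicles cumulative through the 40-44 bin, 24 of them in that bin, 97 cumulative through 45-49, 100 vehicles total). The lumped "below 40" and "above 49" groups and the name `bin_for_rank` are illustrative, not from the original problem.

```python
from bisect import bisect_left
from itertools import accumulate

# Counts derived from figures quoted in the thread; the first and last
# groups are lumped together because their detailed split is not needed here.
labels = ["below 40", "40-44", "45-49", "above 49"]
counts = [60, 24, 13, 3]
cumulative = list(accumulate(counts))  # running totals: 60, 84, 97, 100


def bin_for_rank(rank):
    """Return the label of the bin holding the vehicle at this 1-based rank
    (vehicles ordered from slowest to fastest)."""
    return labels[bisect_left(cumulative, rank)]


print(bin_for_rank(84))
print(bin_for_rank(85))
```

Under these counts, rank 84 falls in the 40-44 bin and rank 85 falls in the 45-49 bin, whatever the individual speeds inside each bin were.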

##### Share on other sites

This is quite the debate! I hope I don't complicate matters even further but I come up with an 85th percentile speed of 42.4 mph, so I guess my answer would be C) 43.

I am trying to attach my calcs, hopefully it will work. Please forgive the poor picture quality - I used the camera on the back of a tablet PC.

What do you guys think?

##### Share on other sites

Let me try another angle... if the 85th percentile was in the 40-44 bin, don't you see that the cumulative frequency for that bin would have to be 25? And even still, the 85th percentile would be the highest speed observed in that bin, which is almost assuredly not the mean of the bin.

Really, this comes down to a fundamental understanding of what percentiles mean. I wish I could explain it more clearly so sac and others could understand. But so long as they just keep insisting they're right, rather than explaining what the fundamental approach they're taking is, I don't think we'll come to a meeting of the minds.

##### Share on other sites

Ill, relax dude - this is a friendly discussion.

I see what you guys are saying. And I know for sure the source of our confusion is because the problem statement provides speed intervals instead of 100 exact speeds. But, in the absence of additional data, and not knowing exactly what is going on within each interval, I think it is fair to take the average speed for each interval.

So, in my view (and I could be wrong of course), the 84th speed % of the vehicles was in the range of 40-44 mph, or an average of 42mph in the absence of additional data (in other words, 84% of the observed vehicles were traveling at or below 42mph). Similarly, the 97th speed % was 45-49, or 47mph average. Thus, the speed at which 85% of the observed vehicles are traveling at or below is somewhere between 42mph and 47mph but will be closer to 42 (84th speed %) than 47 (97th speed %).

This, of course, only works if you can make the assumption that, for example, all 18 cars traveling in the 30-34mph interval were traveling at 32mph. Which they probably weren't but, without additional data, I think this is a valid assumption. For all we know, all 18 cars could have been traveling at 30mph, all at 34mph, etc.

I think if it weren't for the speed interval ranges, we'd all be able to easily calculate and agree on an answer. I think it's the interpretation of how to handle the ranges that's wigging us out.

##### Share on other sites
This, of course, only works if you can make the assumption that, for example, all 18 cars traveling in the 30-34mph interval were traveling at 32mph. Which they probably weren't but, without additional data, I think this is a valid assumption. For all we know, all 18 cars could have been traveling at 30mph, all at 34mph, etc.

But it still isn't correct. Okay, let's do the old excel trick on the assumption that every vehicle in each bin is traveling at the average speed for that bin. In that case the 85th percentile is 47 mph. Do it, go ahead, test your hypothesis.
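For anyone without a spreadsheet handy, the same test can be sketched in Python. The 40-44 and 45-49 counts and averages come from figures quoted in this thread; the speeds assigned to the 60 slower and 3 faster vehicles are placeholders (an assumption), since their split does not change which value sits at rank 85.

```python
# (speed assigned to every vehicle in the group, number of vehicles)
bins = [
    (37.0, 60),  # placeholder speed for all vehicles below 40 mph (assumption)
    (42.0, 24),  # 40-44 mph bin, every vehicle at the bin average
    (47.0, 13),  # 45-49 mph bin, every vehicle at the bin average
    (52.0, 3),   # placeholder speed for vehicles above 49 mph (assumption)
]

# Expand the counts into one speed per vehicle, then rank slowest to fastest.
speeds = []
for speed, count in bins:
    speeds.extend([speed] * count)
speeds.sort()

rank_85_speed = speeds[85 - 1]  # 85th vehicle in rank order (1-based)
print(rank_85_speed)  # 47.0
```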

MH. I totally see what you are saying. And, yes, if you were to simply take the 85th fastest car using my averaging suggestion, that car would be going 47mph. But (and maybe I am looking at this incorrectly) that's not my understanding of how the 85th percentile is established. I was taught to use the formula I showed in my post #32 above.

SD = [ (PD - PMin) / (PMax - PMin) ] (SMax - SMin) + SMin

Where SD = Speed based on your chosen percentile; PD = Your selected percentile (85% in our case); PMin = The cumulative percentage below PD in your distribution table (84% in our case); PMax = The cumulative percentage above PD (97% in our case); SMin = The speed corresponding to PMin (42mph in our case) and SMax = The speed corresponding to PMax (47mph in our case).

You do not simply take the 85th fastest car.

For example, let's say you recorded 25 (out of 100) cars at 35mph and 15 cars at 38mph, with the 35mph speed falling in the 75% cumulative percent column of your frequency distribution table and the 38mph speed falling in the 88% cumulative percent column on your table. Well, even though the 85th fastest car was going 38mph, that is not the 85th percentile speed. In this example:

SD = [ (85 - 75) / (88 - 75) ] (38 - 35) + 35 = 37.3mph

Applying this formula to the original data provided in post #1 (and using an averaged speed for each interval), SD = 42.4mph

Anyway, that's how I look at it. I hope I made sense.
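The interpolation formula above is easy to check numerically. A minimal sketch, with the function and argument names chosen here for illustration (they mirror SD, PD, PMin, PMax, SMin and SMax from the post):

```python
def interpolated_percentile_speed(pd, p_min, p_max, s_min, s_max):
    """Linear interpolation between the two cumulative-percent rows that
    bracket the chosen percentile pd."""
    return (pd - p_min) / (p_max - p_min) * (s_max - s_min) + s_min


# Worked example from the post: 75% at 35 mph, 88% at 38 mph.
example = interpolated_percentile_speed(85, 75, 88, 35, 38)

# Original data with averaged bin speeds: 84% at 42 mph, 97% at 47 mph.
original = interpolated_percentile_speed(85, 84, 97, 42, 47)

print(round(example, 1))   # 37.3
print(round(original, 1))  # 42.4
```

This reproduces the two numbers quoted in the post. It says nothing about whether interpolating between bin averages is the right model, which is the point under debate.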

##### Share on other sites

I was taught to use the formula I showed in my post #32 above.

SD = [ (PD - PMin) / (PMax - PMin) ] (SMax - SMin) + SMin

Where SD = Speed based on your chosen percentile; PD = Your selected percentile (85% in our case); PMin = The cumulative percentage below PD in your distribution table (84% in our case); PMax = The cumulative percentage above PD (97% in our case); SMin = The speed corresponding to PMin (42mph in our case) and SMax = The speed corresponding to PMax (47mph in our case).

You do not simply take the 85th fastest car.

OK... I think I understand where your confusion lies. The procedure you've given (linear interpolation) is used to establish the percentile when there is not enough data (i.e. observations) such that one of the speeds is not exactly equal to the percentile of interest. Take a look at http://en.wikipedia.org/wiki/Percentile for an explanation of other methods and note the exception for linear interpolation:

If there is some integer k for which P = p_k, then we take v = v_k.

By *definition*, when there are 100 values (as in this problem), the 85th percentile is the 85th largest value when put in rank order. So, yes, you *DO* simply take the 85th fastest car.
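The rank-order reading can be sketched with the nearest-rank definition, where the ordinal rank is the ceiling of P/100 × N. Treat that choice as an assumption: it is one of several percentile definitions in use, and the function names below are illustrative.

```python
def nearest_rank(percentile, n):
    """1-based ordinal rank under the nearest-rank definition,
    ceil(percentile / 100 * n), computed with integers to avoid
    floating-point rounding."""
    return (percentile * n + 99) // 100


def nearest_rank_value(values, percentile):
    """Value at the nearest-rank position after sorting slowest to fastest."""
    ordered = sorted(values)
    return ordered[nearest_rank(percentile, len(ordered)) - 1]


# Rank of the 85th percentile for a few sample sizes.
for n in (99, 100, 101, 200):
    print(n, nearest_rank(85, n))
```

With 100 observations this picks the 85th ranked value; with 99, 101 or 200 observations it picks ranks 85, 86 and 170, so under this definition the same rule covers sample sizes other than 100.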

##### Share on other sites

Ill, relax dude - this is a friendly discussion.

This, of course, only works if you can make the assumption that, for example, all 18 cars traveling in the 30-34mph interval were traveling at 32mph. Which they probably weren't but, without additional data, I think this is a valid assumption. For all we know, all 18 cars could have been traveling at 30mph, all at 34mph, etc.

I'm relaxed... just frustrated that for some (e.g. sac) it's as simple as "I'm undoubtedly right" without enough discussion to come to a resolution. And despite all the posts on this topic, it's still very useful because it will clarify a fundamental understanding of percentiles. Your previous post was very helpful in understanding our different approaches.

The central limit theorem does not support your assumption that all observations in the bin are all equal to the average of the bin size (or any other size). It would be reasonable to assume a normal distribution of the entire data set or assume a linear distribution within the bin. It is intuitively *VERY* unlikely that all 24 observations in a bin have the same speed unless they were dependent observations (e.g. all were observed as part of a convoy). If they were independent observations, I'm sure we could calculate the probability.

##### Share on other sites

According to DISTRIBUTION OF VEHICLE SPEEDS AND TRAVEL TIMES by DONALD S. BERRY AND DANIEL M. BELMONT of UNIVERSITY OF CALIFORNIA:

"The speeds of vehicles past a point on a highway tend to have a roughly normal distribution except when traffic volume exceeds the practical capacity of the highway."

The data bins certainly look normally distributed to me.

##### Share on other sites

By *definition*, when there are 100 values (as in this problem), the 85th percentile is the 85th largest value when put in rank order. So, yes, you *DO* simply take the 85th fastest car.

This is incorrect. If the vehicle counts were doubled (i.e. total of 200 cars in the sample set), you would still think the 85th percentile speed is 45 mph because of your "bin" theory. This is why the vehicle count versus percentile must be mutually exclusive.

Assuming a normal/linear distribution within each bin is unrealistic and will not achieve anything closer to reality.

Post #32 is the clear winner in this debate and would get the correct answer in the exam.... which is what it's all about, right?

##### Share on other sites

According to DISTRIBUTION OF VEHICLE SPEEDS AND TRAVEL TIMES by DONALD S. BERRY AND DANIEL M. BELMONT of UNIVERSITY OF CALIFORNIA:

"The speeds of vehicles past a point on a highway tend to have a roughly normal distribution except when traffic volume exceeds the practical capacity of the highway."

The data bins certainly look normally distributed to me.

Now you're just being contrary. Your bin theory does not conform to a normal distribution graph; it would be a stepped (layer cake) graph. If you visualize the data as a bell-curve graph, then you have to assign a single value within each bin, like a trend line. You would then use this line to determine your percentile vs. speed graph (i.e. the area under the bell curve at each interval divided by the total number in the sample set).

As for your suggestion to assume a normal/linear distribution within each bin, your boundary conditions for each bin would be based on the output of the previous bin and the input of the next bin. Does this sound like a realistic approach?

##### Share on other sites

Sometimes the accepted method isn't exactly correct, but oh well, I concede.

C'mon...Really? Are you going to die on the sword for this one?

If you had to do this calculation for a client ... say in Florida ...and you were required to put your stamp on it, would you really stray from the standard? I didn't think so.

##### Share on other sites

I clearly see something isn't right.

##### Share on other sites

That's what I found too and since 85 would be into the next speed range the 45 MPH answer seemed logical.

In the solutions they graphed the frequency vs. the speed. However, when they drew a horizontal line from 85% they didn't intersect the graphed information and called it 43.5 MPH, and I was confused because if they had intersected the graphed line with the 85% line it would've been higher than 43.5 MPH. Seems like a fine line between answers C and D.

The traffic engineer in my office told me there is an equation for questions like that but couldn't remember what it was. I would think if we had to graph anything in the actual test the answer would be a lot clearer and have a larger difference in the answers than 2 MPH.

ceg, could you please post the actual solution? Thanks.

##### Share on other sites

By *definition*, when there are 100 values (as in this problem), the 85th percentile is the 85th largest value when put in rank order. So, yes, you *DO* simply take the 85th fastest car.

This is incorrect. If the vehicle counts were doubled (i.e. total of 200 cars in the sample set), you would still think the 85th percentile speed is 45 mph because of your "bin" theory. This is why the vehicle count versus percentile must be mutually exclusive.

OK... you refuse to address the questions/points I make, so there's not much more I can do for you with this discussion. You don't understand percentiles well enough to know that when there are 100 values (as in this problem), the 85th percentile is, by definition, the 85th largest value when put in rank order. Your counter about what happens when there are 200 vehicles or binned values has nothing to do with the DEFINITION of percentile.

##### Share on other sites

If you had to do this calculation for a client ... say in Florida ...and you were required to put your stamp on it, would you really stray from the standard? I didn't think so.

Hell, yeah... I absolutely would. This isn't the case of ignoring some mandatory code requirements (e.g. IBC, NEC, ACI, etc.). Major provided a state publication which has the explicit purpose "to provide guidelines and recommended procedures for establishing uniform speed zones on State, Municipal, and County roadways throughout the State of Florida." It's a *guide* with *recommended procedures*. If you're not smart enough to understand the science and math behind the topic, you've got no business being an engineer.

And, yeah, this is one of my pet peeves: engineers who simply plug-and-chug and claim the incorrect answer isn't their fault because they followed someone's formula.

##### Share on other sites

According to DISTRIBUTION OF VEHICLE SPEEDS AND TRAVEL TIMES by DONALD S. BERRY AND DANIEL M. BELMONT of UNIVERSITY OF CALIFORNIA:

"The speeds of vehicles past a point on a highway tend to have a roughly normal distribution except when traffic volume exceeds the practical capacity of the highway."

The data bins certainly look normally distributed to me.

Now you're just being contrary. Your bin theory does not conform to a normal distribution graph; it would be a stepped (layer cake) graph. If you visualize the data as a bell-curve graph, then you have to assign a single value within each bin, like a trend line. You would then use this line to determine your percentile vs. speed graph (i.e. the area under the bell curve at each interval divided by the total number in the sample set).

As for your suggestion to assume a normal/linear distribution within each bin, your boundary conditions for each bin would be based on the output of the previous bin and the input of the next bin. Does this sound like a realistic approach?

Now you're just f*cking with me... bin theory (what the hell is that?) has nothing to do with how random values are distributed. Do you deny the (intuitively obvious) fact that traffic spot speeds are usually normally distributed? Of course there are exceptions (I gave two) but that doesn't change the fact that the bin counts in this problem appear to be normally distributed, too.

I don't understand what you wrote: "your boundary conditions for each bin would be based on the output of the previous bin and the input of the next bin. Does this sound like a realistic approach?"

You must recognize that each vehicle has a specific speed (measured to whatever number of significant digits your instrument gives you) regardless of how you tally counts on a form.

Would you please answer this question... if the 85th vehicle (in rank order from slowest to fastest) was really going 43 mph, then doesn't the bin count for 40-44 have to be 25 instead of 24?

##### Share on other sites

This is one of those classic examples of people blindly applying a standard that somebody established years ago without actually thinking about it critically.

This is where an educated engineer with a license would bang their heads against a wall with an unlicensed, state DOT plan reviewer, because it doesn't follow the book answer, and you can clearly see that the book answer is not right.

What he said!

##### Share on other sites

Post #32 is the clear winner in this debate and would get the correct answer in the exam.... which is what it's all about, right?

You wouldn't find a problem like this on the exam because exam questions are written by engineers who understand the principles behind the subjects and every question is peer reviewed. 43 is an answer that is inconsistent with the reality of the data provided.

##### Share on other sites

For what it's worth, I'll throw my hat in and formally agree with sac and ptatohed. From my understanding and experience, the 85th percentile speed is most definitely a calculated speed, not a counted speed. More specifically, a speed that's calculated specifically using the equation previously provided by sac and the numerous examples provided by others.

Here is a link to the FDOT standard for determining the 85th percentile speed. You'll note at the bottom of the worksheet that the 85th percentile speed is calculated based upon the cumulative totals in each speed range.

http://www.dot.state.fl.us/trafficoperations/Operations/Studies/MUTS/Chapter13.pdf

As for....

By *definition*, when there are 100 values (as in this problem), the 85th percentile is the 85th largest value when put in rank order. So, yes, you *DO* simply take the 85th fastest car.

I suppose my response to this would be, what would you do when there aren't 100 values? What if there were, say, 99 or 101 values? I certainly don't agree with the concept of taking two completely different approaches to determining the speed when there are 100 values versus 99 values.

Also, as to people blindly following industry standard.... the use of the 85th percentile IS an industry standard so, by default, the industry standard for determining it should be employed.

Finally, I'd also like to mention that I don't particularly support, or respect, the idea of telling someone they're not "smart enough" to understand the concept behind a problem simply because you don't agree with their solution.


##### Share on other sites

You don't understand percentiles well enough to know that when there are 100 values (as in this problem), the 85th percentile is, by definition, the 85th largest value when put in rank order.

Please cite your reference for the statement above. I believe you're fabricating the definition of the 85th percentile in this example, where the sample set has 100 values. Regardless of the number of values in a sample set, percentiles and actual values are not the same.

##### Share on other sites

Bravo mrt406!

The process of solving this question is about calculation, not just by adding some numbers then determining which bin the next highest number falls into.

Another example of an acceptable calculated value is the number of people per household: 2.59. Common sense tells you that there's no such thing as a 0.59 person, but it's calculated, just like the 85th percentile. Maybe we need to occupy the offices of the US Census Bureau and stand up for the 0.59!

##### Share on other sites
Finally, I'd also like to mention that I don't particularly support, or respect, the idea of telling someone they're not "smart enough" to understand the concept behind a problem simply because you don't agree with their solution.

My comment wasn't directed at anyone in particular but rather it was addressed to those engineers who use guides without understanding the concepts behind them. I apologize for not writing more clearly.

I stand by my point: if you don't understand the fundamentals of a topic, you'd better not just plug-and-chug your way to an answer and then throw a stamp on it.