Monday, February 17, 2014

What's the Score? It's time for a new judging system

 
Yet again, during and after an important snowboarding competition, one of the main things everyone was talking about was the judging. Questioning the judging of a sport isn't a topic unique to snowboarding, but the level and consistency of it is. When we should be talking about the snowboarding itself, we're forced to spend time trying to figure out what's going through the minds of a group of anonymous guys sitting in a little temporary hut. But the problem isn't the judges themselves; the problem is the judging system. The current system is fundamentally flawed and it needs to change...


What's the current system?

At some point, back in the early days of snowboarding, someone decided that the best way to judge events like the pipe, slopestyle and big air would be to use an extremely simple 100-point scoring system based entirely on the subjective opinion of a small group of judges. A generation later, the scoring of snowboarding events hasn't evolved at all.

1986 Snowboarding World Championships, same rules, slightly more dogs.
 photo by Bryce Kanights

These events can't be decided by a simple element like time or distance, so some judgement of execution/style has to be used. At the Olympics, in the two relevant snowboard events, this was decided by six judges. The judges had to decide how they would positively score each successful trick/run and how they would punish mistakes. They then tried very hard to judge all the runs on the same merits. Like all snowboard comps, the most technical run performed without error can in theory get 100 points.

And that's about as detailed as it gets.


Why doesn't the system work?

Firstly, it's not a system. It's a bunch of people figuring out a series of subjective scores in what is essentially just a controlled squabble.

Humans are naturally flawed judges. People make mistakes, they differ in terms of what they like or don't like, who they like or don't like, the level of experience they have or what mood they are in. Worst of all, people's brains are inherently inconsistent when it comes to processing what they are seeing and what they remember. That inconsistency over the length of a competition means that competitors at the start and end of the event are invariably judged differently. The solution is to understand this limitation and add a system to reduce its effects. No matter how good any judge is, you can make their decisions better.

There's inconsistency over time and over competitions. Over the last few days people have spent a lot of time arguing over the relative merits of triple corks versus flat spins and how they were judged. The judges of previous events had been favouring the triple cork, but in Sochi the 1440 came out on top. There's no obvious reason for the change, but the worst thing is the inconsistency.

Our judges are being overwhelmed with information. There are a large number of factors in snowboarding that need to be judged, from amplitude to creativity, and from style to mistakes. It's extremely difficult for any one judge to factor all those elements in accurately in the 30 seconds or so they have to watch the event.

photo from Whitelines
"Bloody hell, did you catch any of that?"
"Sorry mate I blinked."
"Shit. We'll just have to copy what Dave wrote again."


All those judging factors are lumped together. When you do that you leave it open for more variation over the competition. The judges might start by favouring technicality but end up inadvertently focusing on execution.

In every competition the first suckers who have to make a run have no clue about how those judging factors will be balanced. While they try hard to fight the impediment of running early, everyone else is left to guess what the rules might be by watching how they fare. At the Olympics this went on for entire rounds of competition, because unlike other competitions there was no communication between the judges and the riders. Better communication speeds up the learning process, but it doesn't remove the problem. Everyone should be able to go into a competition knowing ahead of time what kind of score they could get if they land their chosen run.

A closed scale of 100 doesn't allow for progression. This scoring system doesn't in any way reflect the huge steps that have happened over a generation, and the tricks are only getting bigger. The riders of the YOLO flip era are getting the same scores as the riders from the 720s era as if nothing has changed. When you stick with a limited scale like this you can't even compare the performances of the same rider in different competitions.

Was Gian Simmen's gold-winning run in 1998 really the equivalent of what we were seeing this year?

He probably should have lost some points for losing his hat.


They were using six judges. That's not an odd number, so you can't go to a vote if people don't agree. You end up needing a head judge to make the ultimate decision, and one person is far more likely to make a bad call than a group.

This isn't just a FIS issue. In this particular case, it's not simply the fault of perennial scapegoat and everybody's favourite bad guys, the FIS. This problem doesn't just occur at the Olympics; it happens in all the snowboarding competitions all of the time, irrespective of size, scale or the particular group of people organising them.



How do other sports solve the problem?

There are a range of other sports that face the same problem of trying to marry up technicality and creativity through judging, but no other Olympic sport is lumbered with such a simple and flawed system as snowboarding. Over the years they have all added a more rigorous system into their judging to significantly reduce the type of problems that we have been experiencing in snowboarding. There's a long page on Wikipedia detailing the judging systems of each of these sports, but there's no similar page for snowboarding. If these sports can do it better, so can we.


Diving 

They have a better scoring system, but at least we get to use goggles to help keep our faces tidy.

All dives are given a pre-agreed degree of difficulty number. In the event, the dives are judged on execution on a scale of 10 (3 for the take-off, 3 for the flight, 3 for the entry and 1 for the judge's discretion), then that number is multiplied by the degree of difficulty to provide the final score. As the difficulties of the dives increase over time, the scores get higher.
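To make the arithmetic concrete, here's a rough Python sketch of that calculation. It follows the simplified description above (one execution mark out of 10, multiplied by the degree of difficulty) rather than the official diving rulebook, and the function name and example marks are made up for illustration.

```python
def dive_score(takeoff, flight, entry, discretion, degree_of_difficulty):
    """Execution out of 10, multiplied by the pre-agreed degree of difficulty."""
    # Maximum marks per part, as split in the text: 3 + 3 + 3 + 1 = 10.
    caps = {"takeoff": 3, "flight": 3, "entry": 3, "discretion": 1}
    marks = {
        "takeoff": takeoff,
        "flight": flight,
        "entry": entry,
        "discretion": discretion,
    }
    for part, mark in marks.items():
        if not 0 <= mark <= caps[part]:
            raise ValueError(f"{part} mark must be between 0 and {caps[part]}")
    execution = sum(marks.values())
    return execution * degree_of_difficulty


# Hypothetical dive: 9 out of 10 for execution on a 3.0 difficulty dive.
score = dive_score(2.5, 3, 2.5, 1, degree_of_difficulty=3.0)
```

The execution mark never goes above 10, but the final score has no ceiling: a harder dive simply carries a bigger multiplier.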


Gymnastics

They have a better scoring system, but at least we have heads.

They break things down into two scores:

Difficulty score. Different skills are given scores (e.g. a back layout salto with a full twist [love that trick] is given a difficulty of G and a G skill earns you 0.7 points). The scoring is open-ended so as the skills become more difficult the scores can continue to grow over time.

Execution score. You start with a score of 10 (although because they use decimals it's essentially 100) and if you make any mistakes marks are taken away.
Add the two together and you get your score. If two people end up with the same score the person with the higher execution score wins.
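As a quick illustration, here's a minimal Python sketch of the two-score idea and the tie-break, based only on the description above. The `Routine` class, the gymnast names and the numbers are all invented for the example; real gymnastics scoring has more moving parts.

```python
from dataclasses import dataclass


@dataclass
class Routine:
    name: str
    difficulty: float   # open-ended: the sum of the skill values
    deductions: float   # marks taken away for mistakes

    @property
    def execution(self):
        # Everyone starts from 10 and loses marks for errors.
        return 10.0 - self.deductions

    @property
    def total(self):
        return self.difficulty + self.execution


def rank(routines):
    """Highest total first; ties go to the higher execution score."""
    return sorted(routines, key=lambda r: (r.total, r.execution), reverse=True)


# Hypothetical field: the first two gymnasts tie on 15.5.
field = [
    Routine("Gymnast B", difficulty=7.0, deductions=1.5),
    Routine("Gymnast C", difficulty=6.0, deductions=1.0),
    Routine("Gymnast A", difficulty=6.5, deductions=1.0),
]
podium = [r.name for r in rank(field)]
```

Gymnast A and Gymnast B both total 15.5, but A's cleaner routine (9.0 execution against 8.5) puts them on top.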

Here's what that looked like for the men's pommel horse at the 2012 Summer Olympics




Ice Dancing 

They have a better scoring system, clearly.

It's another one where they have two scores which they add together:

Program Components Score. It's the sum of the scores of 5 elements: skating skills, transitions, performance/execution, choreography and interpretation. Each element is scored out of 10 in chunks of 0.25 (so essentially out of 40).

Technical Element Score. It's an open scale score judged like this:
Each element is judged first by a technical specialist who identifies the specific element and determines its base value. The technical specialist uses instant replay video to verify things that distinguish different elements; e.g., the exact foot position at take-off and landing of a jump. The decision of the technical specialist determines the base value of the element. A panel of twelve judges then each award a mark for the quality and execution of the element. This mark is called the grade of execution (GOE) that is an integer from −3 to +3. The GOE mark is then translated into another value by using the table of values in ISU rule 322. The GOE value from the twelve judges is then processed with a computerized random selection of nine judges, then discarding the high and low value, and finally averaging the remaining seven. This average value is then added to (or subtracted from) the base value to get the total value for the element.

They use 9 judges and they discard the highest and lowest scores.
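For the curious, the averaging step described above can be sketched in a few lines of Python. This is a rough illustration only: it assumes the judges' GOE marks have already been translated into values via the ISU table, and the function names and the sample numbers are my own.

```python
import random


def trimmed_mean(values):
    """Drop the single highest and lowest value, then average the rest."""
    if len(values) < 3:
        raise ValueError("need at least three values to trim")
    kept = sorted(values)[1:-1]
    return sum(kept) / len(kept)


def element_score(base_value, goe_values, panel_size=9, rng=None):
    """Base value plus the trimmed mean of a randomly selected panel.

    goe_values: one already-translated GOE value per judge (twelve in the
    description above). rng lets you pass a seeded random.Random for
    repeatable results.
    """
    rng = rng or random.Random()
    selected = rng.sample(goe_values, panel_size)
    return base_value + trimmed_mean(selected)


# Hypothetical element with a base value of 5.0 and twelve identical marks.
total = element_score(5.0, [1.0] * 12)
```

Dropping the top and bottom marks means one generous or grumpy judge can't drag the result around on their own.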

Here's all those numbers from the Men's singles event at Sochi




There's obviously a lot there we never want to see judged in a snowboarding competition, but the thing is, if they can do all of that judging as quickly as they do in the ice dancing, then we can clearly do a little better than just coming up with one number.

All these sports ultimately are scored on a base of roughly 100 units, and at first glance they seem the same, but the level of detail is significantly different in all the established sports. We've been doing this for a generation now. It's time that we sorted this out.



The solution

Snowboarding needs a system. We need to learn from the experiences of other sports, but we also need to create something that works for our unique sport. Here's what we need to start doing:


1. Break out the elements and score them separately.

To judge a snowboarding competition you are probably looking at these four elements:

  • Amplitude 
  • Technicality
  • Style
  • Execution

Amplitude, the height out of the pipe or the size of the slopestyle air can be measured accurately.

Technicality, the difficulty of the attempted trick set. Each trick should have a specific technical score, agreed and shared in advance by the judges. We would then know the relative merit of the different tricks before the competition starts. It would also clarify how the rails and jumps are valued in comparison in the slopestyle.

Style. Now that's the tricky one. It is almost totally subjective, and styles can and do change, but by breaking it out you do at least have a shot at consistency through a competition. If you stick to a fixed scale for this one, at least the subjectivity is reduced down to just affecting 1/4 of the overall score. I say 'almost totally subjective' because I think we can at least all agree that pulling a Tindy should always be punished.

Like fellow bloggist Agnarchy, caught in the act while riding Pine Knob. 
(Apparently that's the name of a ski hill and he's not been getting acquainted with what Pinocchio keeps in his pants)

Execution. Again slightly subjective, but we could produce a pre-agreed scale of negative scores for mistakes. A hand down is bad, a butt check is worse, a fall is awful. We could also set out whether a mistake on one object will affect the overall score or just the score on that portion.


2. Provide total visibility. Before the competition the technical and execution scoring system should be shared, and after the competition each aspect of the results should be shared. People will be less likely to argue with open and transparent scores. The new stats would also bring a new element to competitive snowboarding - the ability to better compare the values of different snowboarders, the ability to set and break records.


3. Always have an odd number of judges.


4. Remove the highest and lowest scores to reduce bias and short-term inconsistency. It means that we should try to have at least 7 judges to allow for that. In the Olympics we should aim for 9.


5. For big events you could have specific people measuring or judging each element. Simplify each person's job and it reduces the risk that they will suffer from information overload.


6. Slow down the judging. There's no benefit to having a quick result. Why not let them look at replays when they are comparing two very close runs? Waiting a minute for some of the key results didn't affect the Olympic competition in any negative way. In fact it actually added an extra element of suspense. It's something TV talent shows have been milking for years and American Football has built a whole sport around the premise.


7. Use an open scoring scale. If you award a certain amount of points for each achievement, the scores can increase as the sport progresses. You can also keep a consistent score for each element (for example, a 720 like Gian Simmen threw in 1998 could always be worth 10 points, and Iouri Podladtchikov's Cab Double Cork 1440 YOLO Flip could now be a 24 point score). For mistakes you'd use the same thing (-2 points for squirrelling the landing, -10 points for a hand down, and so on). Stick with a system like that and instead of talking about the strange judging, the press could have been talking about I-Pod's new Olympic record.


8. Make sure that the right balance of technicality and style is achieved. This is all about setting the balance of the overall scoring system so that the balance makes sense. It's important that snowboarding doesn't become a spin-to-win event or just a style event like ice dancing. By using a scoring system like this and controlling the balance we can make sure it doesn't go too far down either route. It would solve yet another almost constant argument in the world of snowboarding.
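To show how the points above might hang together, here's a back-of-the-envelope Python sketch. The trick values and penalties are the example numbers from point 7. Everything else is an assumption for illustration: the function names, the idea that amplitude arrives as already-converted points, and style being marked by each judge on a fixed 0 to 10 scale.

```python
from statistics import mean

# Values taken from the examples in point 7; a real table would be agreed
# and published before the competition.
TRICK_VALUES = {"720": 10, "cab double cork 1440": 24}
MISTAKE_PENALTIES = {"squirrelly landing": -2, "hand down": -10}


def panel_score(marks):
    """Average an odd-sized panel after dropping the highest and lowest mark."""
    if len(marks) < 7 or len(marks) % 2 == 0:
        raise ValueError("need an odd number of judges, at least 7")
    return mean(sorted(marks)[1:-1])


def run_breakdown(tricks, mistakes, amplitude_points, style_marks):
    """Score each element separately, then add them up on an open scale."""
    breakdown = {
        "amplitude": amplitude_points,  # assumed: measured height already converted to points
        "technicality": sum(TRICK_VALUES[t] for t in tricks),
        "style": panel_score(style_marks),  # assumed: fixed 0-10 scale per judge
        "execution": sum(MISTAKE_PENALTIES[m] for m in mistakes),
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical run: two tricks, one sketchy landing, seven style judges.
result = run_breakdown(
    tricks=["720", "cab double cork 1440"],
    mistakes=["squirrelly landing"],
    amplitude_points=8,
    style_marks=[7, 7, 7, 7, 7, 2, 10],
)
```

Because every element is reported separately, the breakdown can be published alongside the total, and the balance between technicality and style can be tuned by adjusting the published values rather than by guesswork in the hut.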



I don't know about you, but I'm tired of the status quo, tired of the lack of progression and tired of snowboard events being dominated by the opaque judging format and questionable judging decisions. Let's get on and fix this so we can get back to snowboarding and talking about snowboarding.







4 comments:

  1. but if you sort out the judging system AND shaun white... what will snowboarding be left with? Opposition and disaffection are an integral part of our human rights. How very dare you take that away from us

  2. Not sure about the comparison with ice dancing, snowboarding doesn't need any of this:
    http://www.theguardian.com/sport/2014/feb/21/sochi-2014-south-korea-russia-figure-skating-gold-sotnikova-kim-yuna
    Something more like TTR's live scoring system with dedicated scores per feature would be more transparent, along with a consistent degree of emphasis on a clean run.

  3. Ahahahah I can't believe I didn't see this back in February. I also can't believe that you re-published my tindy picture!


 