How to Name Genes With Excel


By Gloria, 15 Dec 2020. Tags: Excel, Genes, MARCH1, HGNC


Human DNA contains a large number of genes. To study them, we need to tell them apart, so scientists have developed a set of rules for naming them. Every gene has a name and a short symbol, typically written in capital letters and numbers. Only if scientists follow the same rules can they exchange research information reliably.

Microsoft Excel, meanwhile, dominates the spreadsheet world. Although it was not designed specifically for scientific work, it has long been an irreplaceable tool for scientists.

But the combination causes a problem.


As the list of gene symbols grows, some symbols happen to look like ordinary words or abbreviations, such as dates, and Excel automatically converts them into the format it assumes was intended.

For example, when a scientist types the symbol “MARCH1”, Excel “corrects” it to the date 1 March, typically displayed as “1-Mar”.
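To see which symbols are at risk, one rough approach is to test whether a symbol is a month name or abbreviation followed by a day number. The Python sketch below is a simplified heuristic, not Excel's actual parsing rules; the pattern and the function names `looks_like_date` and `at_risk` are assumptions made for illustration.

```python
import re

# Month names and common abbreviations (an illustrative, simplified list).
_MONTH = (
    r"JAN(?:UARY)?|FEB(?:RUARY)?|MAR(?:CH)?|APR(?:IL)?|MAY|JUNE?|JULY?|"
    r"AUG(?:UST)?|SEP(?:T(?:EMBER)?)?|OCT(?:OBER)?|NOV(?:EMBER)?|DEC(?:EMBER)?"
)

# A month token immediately followed by one or two digits, e.g. "MARCH1".
DATE_LIKE = re.compile(rf"^(?:{_MONTH})\d{{1,2}}$", re.IGNORECASE)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol resembles a month plus a day number."""
    return DATE_LIKE.match(symbol.strip()) is not None


def at_risk(symbols):
    """Return the symbols that a spreadsheet might read as dates."""
    return [s for s in symbols if looks_like_date(s)]


if __name__ == "__main__":
    sample = ["MARCH1", "SEPT1", "MARCHF1", "SEPTIN1", "TP53"]
    print(at_risk(sample))
```

Under this heuristic the old symbols MARCH1 and SEPT1 are flagged, while symbols such as MARCHF1 and SEPTIN1 are not, because a letter separates the month-like prefix from the number.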


This is frustrating at best, and at worst a dangerous, destructive kind of helpfulness. Scientists have to restore the symbols by hand, and even a small oversight can cause serious data errors. A 2016 study examined the genetic data shared alongside 3,597 published papers and found that roughly one fifth had been affected by Excel’s misreading.

“It’s really, really annoying,” Dezső Módos, a systems biologist at the Quadram Institute in the UK, told The Verge. He says Excel errors happen all the time, simply because the software is often the first thing to hand when scientists process numerical data. “It’s a widespread tool and if you are a bit computationally illiterate you will use it,” he says. “During my PhD studies I did as well!”

There’s no easy fix. Excel doesn’t offer an option to turn off this auto-formatting, and the only way to avoid it is to change the data type of individual columns. Even then, the errors can reappear when another scientist opens the same data without that formatting.

In response, the HGNC (HUGO Gene Nomenclature Committee) this week published new guidelines for gene naming, including for “symbols that affect data handling and retrieval.” From now on, they say, human genes and the proteins they express will be named with one eye on Excel’s auto-formatting. That means the symbol MARCH1 has now become MARCHF1, SEPT1 has become SEPTIN1, and so on. The HGNC will keep a record of the old symbols and names to avoid confusion in the future.
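A researcher with an older symbol list could apply such renames using a small lookup table. The Python sketch below is illustrative only: the table holds just the two renames mentioned in this article (the complete record is kept by the HGNC), and `RENAMED` and `update_symbols` are hypothetical names.

```python
# Old symbol -> new symbol. Only the two examples from this article are
# included; a real table would come from the HGNC's own records.
RENAMED = {
    "MARCH1": "MARCHF1",
    "SEPT1": "SEPTIN1",
}


def update_symbols(symbols, renamed=RENAMED):
    """Replace outdated symbols and report which ones were changed.

    Returns a tuple (updated, changes): `updated` is the new list of
    symbols, and `changes` is a list of (old, new) pairs for auditing.
    """
    updated = []
    changes = []
    for symbol in symbols:
        new_symbol = renamed.get(symbol, symbol)
        if new_symbol != symbol:
            changes.append((symbol, new_symbol))
        updated.append(new_symbol)
    return updated, changes


if __name__ == "__main__":
    updated, changes = update_symbols(["MARCH1", "TP53", "SEPT1"])
    print(updated)
    print(changes)
```

Returning the list of changes alongside the updated symbols keeps a trace of what was renamed, in the same spirit as the record of old symbols described above.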


So far, the names of some 27 genes have been changed like this over the past year, Elspeth Bruford, the coordinator of HGNC, tells The Verge. “We consulted the respective research communities to discuss the proposed updates, and we also notified researchers who had published on these genes specifically when the changes were being put into effect,” says Bruford.

After HGNC’s new guidelines were announced, the response from the community was jubilant. Some even cried tears of joy. “Greatest news of the day!” said a pseudonymous Twitter user.

Of course, there has been some dissent about the decision: why was it easier to rename human genes than it was to change how Excel works?

“This is quite a limited use case of the Excel software,” Bruford says. “There is very little incentive for Microsoft to make a significant change to features that are used extremely widely by the rest of the massive community of Excel users.”

“We do not need to be upset. Microsoft Excel may be fleeting, but human genes will be around for as long as we are. It’s best to give them names that work,” said Bruford.
