This guy has built a searchable directory of 107 million research articles

One of the biggest obstacles to effective research is the inability to access previous scientific work that could inform a study. Technologist Carl Malamud is trying to change that with the “General Index,” a huge directory covering more than 107 million journal articles.
Released on October 7, the catalog is free to all and holds more than 355 billion words and sentence fragments, along with the articles in which they appear. The goal, according to Malamud, is to help scientists draw on work by their peers that may not be legally available to them.
In a conversation with Nature, Malamud said that, to avoid violating the articles’ copyright, he built the index from excerpts of up to five words. If a researcher finds something valuable for their work, they can then turn to the publisher’s original article, which may not be free.
Why this is a big deal
This can be a game-changer for researchers who are constantly looking for references and studies to support their current projects. Usually, scientists have to stick to publicly available documents or other scientific studies that their institution can pay for.
Malamud told Nature he began his work with the aim of opening up information that has traditionally been kept under lock and key, and he naturally ran into legal obstacles along the way. His focus, however, has shifted from government-produced information to scientific literature.
The aim was to help scientists with text mining, even if not to give them full access to the articles, and that work is still ongoing. The server for it would be located in India.
Even so, the General Index does not yet come with a search engine. To search it effectively, scientists would have to download its contents and write their own programs that work like search engines. Malamud hopes, however, that those who are able to create such programs will share them with others.
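To give a rough idea of what such a home-made search program might look like, here is a minimal Python sketch. It assumes a toy, in-memory version of the idea: every run of up to five words is mapped to the articles that contain it. The article IDs, the sample texts and the function names (`ngrams`, `build_index`, `search`) are invented for illustration, and the sketch does not reflect the actual file format or scale of the General Index.

```python
from collections import defaultdict

MAX_N = 5  # the article describes excerpts of up to five words


def ngrams(text, max_n=MAX_N):
    """Yield every run of 1..max_n consecutive lowercase words in text."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def build_index(articles):
    """Map each n-gram to the set of article IDs that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for gram in ngrams(text):
            index[gram].add(article_id)
    return index


def search(index, phrase):
    """Return the sorted IDs of articles containing the phrase (at most MAX_N words)."""
    key = " ".join(phrase.lower().split())
    return sorted(index.get(key, set()))


# Hypothetical toy data, not taken from the real index.
articles = {
    "doi:example-1": "Gene expression in yeast under heat stress",
    "doi:example-2": "Heat stress alters gene expression in maize",
}
index = build_index(articles)

print(search(index, "gene expression in"))
print(search(index, "in yeast"))
```

A lookup returns only article identifiers, never the full text, which mirrors the point made above: the researcher still has to go to the publisher for the original article. At the real index's size, a program like this would need a database or on-disk index instead of a Python dictionary.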
In total, the compressed files in the General Index are 5 terabytes in size. As for the legal side, Malamud is “very confident” that the index is legal. His goal, he told Nature, is not to bring about a lawsuit, but to “advance science.”
Reference
Editorial Nature. (2021). A giant free index of global research articles published online. Nature.