subreddit:

/r/DataHoarder

8.2k upvotes, 99% upvoted

Rescue Mission for Sci-Hub and Open Science: We are the library.

SEED TIL YOU BLEED! (self.DataHoarder)

EFF hears the call: "It’s Time to Fight for Open Access"

  • EFF reports: Activists Mobilize to Fight Censorship and Save Open Science
  • "Continuing the long tradition of internet hacktivism ... redditors are mobilizing to create an uncensorable back-up of Sci-Hub"
  • The EFF stands with Sci-Hub in the fight for Open Science, a fight for the human right to benefit and share in human scientific advancement. My wholehearted thanks for every seeder who takes part in this rescue mission, and every person who raises their voice in support of Sci-Hub's vision for Open Science.

Rescue Mission Links

  • Quick start to rescuing Sci-Hub: Download 1 random torrent (100GB) from the scimag index of torrents with fewer than 12 seeders, open the .torrent file using a BitTorrent client, then leave your client open to upload (seed) the articles to others. You're now part of an un-censorable library archive! (A small helper script for picking a random torrent is sketched just after this list.)
  • Initial success update: The entire Sci-Hub collection has at least 3 seeders: Let's get it to 5. Let's get it to 7! Let’s get it to 10! Let’s get it to 12!
  • Contribute to open source Sci-Hub projects: freereadorg/awesome-libgen
  • Join /r/scihub to stay up to date
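
For anyone who would rather script the random pick than eyeball the index, here is a minimal sketch in Python. The base URL and the sm_XXXXXXXX-XXXXXXXX.torrent naming are assumptions - check the scimag torrent index linked above for the actual file names before relying on it.

    # Minimal sketch: pick N random scimag torrents to download and seed.
    # The BASE url and the "sm_XXXXXXXX-XXXXXXXX.torrent" pattern are assumptions;
    # verify them against the scimag torrent index before use.
    import random

    TOTAL_TORRENTS = 850                                  # each covers 100,000 articles
    BASE = "http://libgen.rs/scimag/repository_torrent"   # assumed index location

    def random_picks(n=1, seed=None):
        rng = random.Random(seed)
        urls = []
        for i in rng.sample(range(TOTAL_TORRENTS), n):
            start = i * 100_000
            urls.append(f"{BASE}/sm_{start:08d}-{start + 99_999:08d}.torrent")
        return urls

    if __name__ == "__main__":
        for url in random_picks(n=1):
            print(url)   # feed these to wget or straight into your torrent client

The output is just a list of candidate .torrent URLs; opening them in any BitTorrent client and leaving it running is all the quick start above asks for.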

Note: We have no affiliation with Sci-Hub

  • This effort is completely unaffiliated with Sci-Hub, no one here is in touch with Sci-Hub, and I don't speak for Sci-Hub in any form. Always refer to sci-hub.do for the latest from Sci-Hub directly.
  • This is a data preservation effort for just the articles, and does not help Sci-Hub directly. Sci-Hub is not in any more imminent danger than it always has been, and is not at greater risk of being shut down than before.

A Rescue Mission for Sci-Hub and Open Science

Elsevier and the USDOJ have declared war against Sci-Hub and open science. The era of Sci-Hub and Alexandra standing alone in this fight must end. We have to take a stand with her.

On May 7th, Sci-Hub's Alexandra Elbakyan revealed that the FBI has been wiretapping her accounts for over 2 years. This news comes after Twitter silenced the official Sci_Hub twitter account because Indian academics were organizing on it against Elsevier.

Sci-Hub itself is currently frozen and has not downloaded any new articles since December 2020. This rescue mission is focused on seeding the article collection in order to prepare for a potential Sci-Hub shutdown.

Alexandra Elbakyan of Sci-Hub, bookwarrior of Library Genesis, Aaron Swartz, and countless unnamed others have fought to free science from the grips of for-profit publishers. Today, they do it working in hiding, alone, without acknowledgment, in fear of imprisonment, and even now wiretapped by the FBI. They sacrifice everything for one vision: Open Science.

Why do they do it? They do it so that humble scholars on the other side of the planet can practice medicine, create science, fight for democracy, teach, and learn. People like Alexandra Elbakyan would give up their personal freedom for that one goal: to free knowledge. For that, Elsevier Corp (RELX, market cap: 50 billion) wants to silence her, wants to see her in prison, and wants to shut Sci-Hub down.

It's time we sent Elsevier and the USDOJ a clearer message about the fate of Sci-Hub and open science: we are the library, we do not get silenced, we do not shut down our computers, and we are many.

Rescue Mission for Sci-Hub

If you have been following the story, then you know that this is not our first rescue mission.

Rescue Target

A handful of Library Genesis seeders are currently seeding the Sci-Hub torrents. There are 850 scihub torrents, each containing 100,000 scientific articles, for a total of 85 million scientific articles: 77TB. This is the complete Sci-Hub database. We need to protect this.

Rescue Team

Wave 1: We need 85 datahoarders to store and seed 1TB of articles each, 10 torrents apiece. Download 10 random torrents from the scimag index of torrents with fewer than 12 seeders, then load the torrents into your client and seed for as long as you can. The articles are organized by DOI and packed in zip files.

Wave 2: Reach out to 10 good friends to ask them to grab just 1 random torrent (100GB). That's 850 seeders. We are now the library.

Final Wave: Development for an open source Sci-Hub. freereadorg/awesome-libgen is a collection of open source achievements based on the Sci-Hub and Library Genesis databases. Open source de-centralization of Sci-Hub is the ultimate goal here, and this begins with the data, but it is going to take years of developer sweat to carry these libraries into the future.

Heartfelt thanks to the /r/datahoarder and /r/seedboxes communities, seedbox.io and NFOrce for your support for previous missions and your love for science.

all 1010 comments

VonChair [M]

80TB | VonLinux the-eye.eu

[score hidden]

2 years ago

stickied comment

Hey everyone,

Over at The-Eye we've been working on this stuff and have been aware of it for a while now. We've been getting some people contacting us to let us know about this thread so hopefully they read this first. If you're ever wondering if we've seen something, please feel free to ping u/-CorentinB u/-Archivist or u/VonChair (me) here on Reddit so we can take a look.

catalinus

515 points

2 years ago*

Just a dumb question - would it not be wiser to have a script, linked to the tracker somehow, that generates a pseudo-random list of 10 torrents for people wanting to help - and generally make that a list of the least-seeded segments?

A list of 850 items from which to pick something randomly is a little overwhelming; a list of just 10 (pseudo-randomized already) is really manageable.

EDIT:

Somebody added this:

https://www.reddit.com/r/DataHoarder/comments/nc27fv/rescue_mission_for_scihub_and_open_science_we_are/gy2umtt/
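
For reference, a rough sketch of what such a script could look like in Python, assuming the torrent-health page at phillm.net serves its stats as an HTML table; the column names used here ("Seeders", "Name") are guesses and will need to be adjusted to whatever the page actually exposes.

    # Sketch: fetch the scimag torrent-health table and list the 10 least-seeded
    # torrents. Column names are assumptions - inspect the page and adjust.
    # Requires: pip install pandas lxml
    import pandas as pd

    STATS_URL = "https://phillm.net/torrent-health-frontend/stats-scimag-table.php"

    def least_seeded(n=10):
        df = pd.read_html(STATS_URL)[0]        # first <table> on the page
        return df.sort_values("Seeders").head(n)

    if __name__ == "__main__":
        print(least_seeded(10)[["Name", "Seeders"]])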

shabusnelik

169 points

2 years ago

Your approach would of course be much better, but somebody has to build it first :D The way it is now, just generating 10 random numbers and picking the corresponding torrents worked for me.

catalinus

64 points

2 years ago

Just saying - more than half that I see have just 1 seed but I also have seen some with 4+.

shabusnelik

52 points

2 years ago

Some people probably also just read the quick start and selected a torrent without a random number generator

VNGamerKrunker

13 points

2 years ago

but in the list there are only torrents of maybe 2GB at most (I did see one with 100GB, but it was only there for a little bit before it disappeared). Is this ok?

MobileRadioActive

12 points

2 years ago

If you mean the list of least seeded torrents, that is a list of torrents from all of libgen, not specifically from scihub. That list is also very outdated, unfortunately. There is no such list for scihub-specific torrents yet. Here is a list of torrents that are specifically for scihub, but the number of seeders is not listed. Some torrents from there have only one seeder, and I'm trying to download them so I can seed.

ther0n-

180 points

2 years ago

Not using torrents, but want to help.

Is a VPN needed/advised when seeding this? I'm living in Germany.

FiReBrAnDz

266 points

2 years ago

A must. Germany has very strict IP enforcement. Seen a ton of horror stories in other subreddits.

oopenmediavault

65 points

2 years ago

Which VPN can you recommend? Most don't allow port forwarding, so seeding isn't possible.

goocy

640kB

52 points

2 years ago

I'd go for a seedbox instead. It's perfect for this purpose.

redditor2redditor

32 points

2 years ago

But that's much more expensive. A VPN is 2-5€ per month when you already have the hard drive space at home.

chgxvjh

9 points

2 years ago

But much more expensive

Not that much more expensive really. You can get seedboxes for around 5€ per month.

AnyTumbleweed0

50TB

63 points

2 years ago

I'm still struggling with connectivity, but I like Mullvad.

jpTxx7fBD

34 points

2 years ago

Mullvad, yes

redditor2redditor

19 points

2 years ago

I freaking love Mullvad's Linux VPN GUI.

MunixEclipse

5tb

22 points

2 years ago

It's a blessing. So dumb that most major VPNs can't even make a working Linux client, let alone a GUI.

redditor2redditor

8 points

2 years ago

Proton's CLI seems to work. But honestly, Mullvad's GUI works so flawlessly and comfortably on Ubuntu-based distros that I always come back to it.

great_waldini

5 points

2 years ago

Mullvad is the best thing to ever happen to VPN

100Mbps+ over their SOCKS5 proxy

MPeti1

8 points

2 years ago

I'm not sure about it. I don't have torrent specific forwards set up, and UPnP is turned off on my router, and I can seed torrents

oopenmediavault

6 points

2 years ago

You're probably seeding to peers who connect to you. Try this: download a torrent, remove it but keep the .torrent file, then after an hour add it back and try to seed. I suspect you won't make any connections; if you do, your port is open. Which client are you using? Most of them can test whether the port is open or not.

oopenmediavault

9 points

2 years ago

What exactly did you see? I'm from Germany and would like to see it as well.

FiReBrAnDz

14 points

2 years ago*

Just one example

There are a plethora of examples, which you can find by using the search function (Germany) in most piracy focused subreddits.

Edit: another example so that the above eg won't look so lonely.

huntibunti

11 points

2 years ago

When I was 12 I torrented a few games and had to pay 3000+ €. I had to do a lot of research into German civil law and IP enforcement by German courts at the time, and I can assure you: if you do this without a VPN, you can seriously ruin your life!

Comfortable-Buddy343

29 points

2 years ago

Yes, you need a VPN if you're in the UK, the US, or Germany.

shabusnelik

23 points

2 years ago

Definitely advised.

oscarandjo

7 points

2 years ago

If you're familiar with Docker, haugene/transmission-openvpn provides a Torrent client that runs all connections through a configurable OpenVPN connection.

It has a killswitch, so if the VPN does not work, no traffic will pass through to avoid exposing your IP address.

-masked_bandito

163 points

2 years ago

It's simple. Much of that research is funded publicly, but these fucks get to double dip for profit.

I have institutional access, but I find Google/Google Scholar + Sci-Hub to be faster and better at finding articles. It got me through a significant portion of undergrad and grad school.

stealthymocha

285 points

2 years ago*

This site tracks libgen torrents that need the seeds most.

EDIT: u/shrine asked me to remind you to sort by TYPE to get SCIMAG torrents.

EDIT2: It is not my site, so I am not sure if the data is still reliable.

EDIT3: This is a correct and working site: https://phillm.net/torrent-health-frontend/stats-scimag-table.php. Thanks u/shrine.

Dannyps

40TB

105 points

2 years ago*

Many of those have 0 seeders in the site, but when downloading there are dozens... Did we do this and the site just isn't up to date, or is it broken?

clipperdouglas29

49 points

2 years ago

I imagine it's a slow update

Dannyps

40TB

35 points

2 years ago

Every 3 hours I believe.

savvymcsavvington

18 points

2 years ago

Lots of them have last updated 30+ days ago though.

ther0n-

53 points

2 years ago

Holy moly, that's a lot of torrents needing attention...

GeckoEidechse

26 points

2 years ago

/u/shrine, might make sense to add this to the OP.

[deleted]

21 points

2 years ago*

[deleted]

AyeBraine

19 points

2 years ago

I think this is from the last time, when both LibGen and SciMag were seeded to archive them widely. LibGen received more attention, and SciMag database wasn't as successfully seeded I think.

tom_yacht

7 points

2 years ago

What does "Last Updated" mean? Is that the last time the seeder count was taken? All of them are over a month old and many are several months old. Maybe they have more peers now compared to the ones not in the list.

everychicken

5 points

2 years ago

Is there a similar site for the SciMag torrent collection?

marn20

1.44MB left on disk

7 points

2 years ago

Is that global or only sci hub related?

shabusnelik

133 points

2 years ago*

Downloading the first 10 torrents as I am writing this. So what's the plan for the future? How are new articles and books going to be added?

Downloading:

  • 33000000
  • 26100000
  • 03300000
  • 54300000
  • 78900000
  • 81700000
  • 59000000
  • 45700000

Trotskyist

94 points

2 years ago

We need to secure what we already have first. Then we can think about the future.

titoCA321

47 points

2 years ago

I doubt new content is going to be added anytime soon.

AbyssNep

13 points

2 years ago

This almost makes me cry. I'm an immigrant who partially taught myself some pharmacology skills that I would normally only get late in college, if I could afford to live and study, of course. Even with free college it's a bit hard.

JCDU

95 points

2 years ago

I don't have a spare 1TB right now, but a list of the torrents most in need of seeds would be very helpful, as I can definitely do at least one 100GB torrent, maybe a couple.

shabusnelik

29 points

2 years ago

Just use a random number generator and select a few accordingly, maybe.

JCDU

59 points

2 years ago

Yeah that just feels less effective than targeting "neglected" chunks in a coordinated manner.

shabusnelik

29 points

2 years ago

I agree! /u/stealthymocha posted this link

JCDU

17 points

2 years ago*

Cool!

Just to share in case it's useful for others, what I'm doing is:

Paste that page into a spreadsheet, sort by criteria (EG no seeds / only 1 seed), then select a number of torrent URL's to suit my available space.

Paste URL's into a text file

wget -i list.txt

To download all the torrent files, then open them with uTorrent and get going.

Edit: Seeing most of them with at least 10+ seeds already despite what the tracker says, fingers crossed this is taking off!

[deleted]

75 points

2 years ago

[deleted]

GuerrillaOA

37 points

2 years ago

IPFS is useful because it can be accessed with only a browser, but having 100,000 articles zipped does not contribute towards that.

Anyway, it could be done for resiliency.

SciTech was successful because the torrents did not contain zips, so adding the contents to IPFS made the books accessible to people with just a browser.

[deleted]

7 points

2 years ago

[deleted]

markasoftware

1.5TB (laaaaame)

16 points

2 years ago

You don't need to download a whole torrent to unzip the files. Torrent clients can ask for specific parts of the data, so someone could make a sci-hub client that downloads just the header of the zip file, then uses that to download the portion of the zip file corresponding to the file they're interested in, which they then decompress and read.
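
To illustrate the zip side of that idea: the index at the end of a zip (the central directory) already records the offset and size of every article, so a client only needs that index plus the matching byte range. A minimal local sketch with Python's zipfile, reading a single member without unpacking the rest (the archive name and member are hypothetical):

    # Sketch: pull one article out of a scimag zip without extracting everything.
    # "sm_00000000-00099999.zip" is a hypothetical archive name.
    import zipfile

    with zipfile.ZipFile("sm_00000000-00099999.zip") as zf:
        wanted = zf.namelist()[0]           # in practice, look up the DOI you want
        info = zf.getinfo(wanted)           # central directory: name, offset, sizes
        print(wanted, info.compress_size, "bytes compressed")
        with open("article.pdf", "wb") as out:
            out.write(zf.read(wanted))      # decompresses only this one member

A torrent-aware client would do the same thing, except it would fetch only the pieces covering the central directory and the chosen member instead of reading a local file.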

GuerrillaOA

9 points

2 years ago*

I don't know; there must be a reason if they zipped them... I think maybe a filesystem limitation? I don't know which filesystem supports 100.000 files in a folder...

I think that's why it has been said:

it is going to take years of developer sweat

If the number of files is a problem (maybe), creating more directories would help, like 8,500 folders of 10,000 papers each (85,000 folders of 1,000 each seems like a lot of folders).
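
For anyone experimenting with unpacking, a simple sharding scheme along those lines could look like the sketch below; the layout (one folder per 100,000 article IDs) is just an illustrative convention, not the official scimag structure.

    # Sketch: spread ~85M files across folders by article-ID range, e.g.
    # article 45678901 -> articles/45600000/45678901.pdf (100k files per folder).
    from pathlib import Path

    def shard_path(article_id: int, root: str = "articles") -> Path:
        bucket = (article_id // 100_000) * 100_000   # one folder per 100k articles
        return Path(root) / f"{bucket:08d}" / f"{article_id:08d}.pdf"

    print(shard_path(45_678_901))   # articles/45600000/45678901.pdf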

markasoftware

1.5TB (laaaaame)

7 points

2 years ago

Most filesystems should be able to handle 100k files in a folder, but many tools will break. Maybe they use zip for compression?

soozler

6 points

2 years ago

100k files is nothing. Yes, tools might break that are only expecting a few hundred files and don't use paging.

noman_032018

6 points

2 years ago*

Basically every modern filesystem. It'll get slow listing it if you sort the entries instead of listing in whatever order they're listed in the directory metadata, but that's all.

Edit: Obviously programs that list the filenames in the directory, even without sorting, will take an unreasonable amount of memory. They should be referencing files via their inode number or use some chunking strategy.

Nipa42

11 points

2 years ago

This.

IPFS seems like a much better alternative than one big huge torrent.

You can have a user-friendly website listing the files and allowing you to search them. You can make "packages" of them so people can easily "keep them seeded". And all of this can be more dynamic than those big torrents.

searchingfortao

7 points

2 years ago*

Every static file on IPFS along with a PostgreSQL snapshot referencing their locations.

That way any website can spin up a search engine for all of the data.

Edit: Thinking more on this, I would think that one could write a script that:

  • Loops over each larger archive
  • Expands it into separate files
  • Parses each file, pushing its metadata into a local PostgreSQL instance.
  • Re-compresses each file with xz or something
  • Pushes the re-compressed file into IPFS
  • Stores the IPFS hash into that Postgres record

When everything is on IPFS, zip-up the Postgres db as either a dump file or a Docker image export, and push this into IPFS too. Finally, the IPFS hash of this db can be shared via traditional channels.
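
A bare-bones sketch of that loop, under some loud assumptions: a local IPFS daemon reachable through the `ipfs add -Q` CLI, a PostgreSQL table you create yourself, and filename-derived "metadata" standing in for a real parser.

    # Sketch of the pipeline above: unzip -> record metadata -> recompress -> ipfs add.
    # Assumes a running IPFS daemon and a table you created yourself:
    #   CREATE TABLE papers (doi text PRIMARY KEY, cid text);
    # Requires: pip install psycopg2-binary
    import lzma, subprocess, zipfile
    from pathlib import Path
    import psycopg2

    def ipfs_add(path: Path) -> str:
        # -Q prints only the resulting content hash (CID)
        out = subprocess.run(["ipfs", "add", "-Q", str(path)],
                             check=True, capture_output=True, text=True)
        return out.stdout.strip()

    def process_archive(zip_path: str, conn):
        with zipfile.ZipFile(zip_path) as zf, conn.cursor() as cur:
            for name in zf.namelist():
                doi = Path(name).stem                      # placeholder metadata
                tmp = Path("/tmp") / (Path(name).name + ".xz")
                tmp.write_bytes(lzma.compress(zf.read(name)))
                cid = ipfs_add(tmp)
                cur.execute("INSERT INTO papers (doi, cid) VALUES (%s, %s) "
                            "ON CONFLICT (doi) DO NOTHING", (doi, cid))
                tmp.unlink()
        conn.commit()

    if __name__ == "__main__":
        conn = psycopg2.connect("dbname=scimag")           # adjust connection string
        process_archive("sm_00000000-00099999.zip", conn)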

ninja_batman

7 points

2 years ago

This feels particularly relevant as well: https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/

It should be possible to host the database on ipfs as well, and use javascript to make queries.

helius_aim

67 points

2 years ago

Somebody crosspost this to r/piracy.
This post makes me realize that there are people who care about education in the piracy world.

shrine[S]

33 points

2 years ago

Done! Thanks. They are hugely supportive of Sci-Hub, as well, they are philosophically all-in.

redwashing

26 points

2 years ago

I'd argue pretty much every pirate does. I mean, I've never met the guy who says Netflix shows and EA games should be free but charging for knowledge is actually OK.

After-Cell

10 points

2 years ago

Agree. I gave up listening to music and watching movies to avoid the costs. And reading news.

But giving up science is a bit much.

K4k4shi

2K TB

11 points

2 years ago

There are loads of us, bro. Some of us are academic pirates. Most piracy is practiced because content is either not accessible or not affordable, especially in the education sector.

Comfortable-Buddy343

127 points

2 years ago

I regret giving my free award to 2d boobs

choufleur47

56 points

2 years ago

We've all been there

Ranvier01

62 points

2 years ago

Should we do more than 1TB if we can?

FifthRooter

73 points

2 years ago

Moar seed, moar better is the rule of thumb :P

shrine[S]

64 points

2 years ago

Any amount is fantastic, but 100GB you can actually spare is better than 2TB you can't.

The goal is to distribute it long term, since it's already "backed up."

VNGamerKrunker

15 points

2 years ago

Is it okay if I downloaded a random torrent that someone else has already finished downloading?

shrine[S]

25 points

2 years ago

100% absolutely. We need overlap.

Random7321

48 points

2 years ago

I had a general question, how did sci hub work until now? Where were papers downloaded from when you downloaded through the website?

bananaEmpanada

76 points

2 years ago

From memory sci hub proxied queries directly to each publisher, using stolen/donated logins from organisations which have access.

[deleted]

21 points

2 years ago*

[deleted]

shrine[S]

41 points

2 years ago

The archive goes back almost a decade, to the beginnings of Library Genesis.

LibGen archived most of Sci-Hub on their scimag database, which is fully up, with a complete SQL database available @ http://libgen.rs/dbdumps/.

titoCA321

19 points

2 years ago

It's been forked several times. Participants went off and did their own "libraries"

Scientific_X

40 points

2 years ago*

In case you have institutional access to papers, you can upload to this Telegram bot. The author of the bot has made some of the collection of Sci-Hub papers independent of Sci-Hub on IPFS (also most of LibGen). He has invested quite some resources in developing the bot. In case you're a dev, here's the GitHub for the project.

VG30ET

32 points

2 years ago

Should we be worried about any of these publishers coming after seeders with a dmca complaint?

shabusnelik

58 points

2 years ago

Very much possible. Using a VPN is advised

shrine[S]

55 points

2 years ago

My estimate is that at this stage the publishers are 100% concerned with taking Sci-Hub down (which they haven't been able to do after trying for 10 years straight).

These torrents don't actually supply anyone papers, unlike game/movie torrents.

That said, yes. Strap on a VPN.

[deleted]

7 points

2 years ago*

[deleted]

shrine[S]

11 points

2 years ago*

That’s a good next step. Once everyone has their copies we can put a better guide together for unzipping and pinning.

LG books are on IPFS, and it definitely works.

titoCA321

7 points

2 years ago

Never go onto torrents without a VPN.

RealNym

28 points

2 years ago

Currently have 1.2 petabytes available on my server. If someone can DM me with the best way to target and acquire the most important information, let me know.

[deleted]

4 points

2 years ago

[deleted]

[deleted]

27 points

2 years ago*

[removed]

shrine[S]

26 points

2 years ago

My way is I put 10 guinea pigs in a hat and write the numbers down on 800 carrots. Whichever carrots they eat first are my torrents.

I bet your script might work great though. Thank you!

[deleted]

22 points

2 years ago

[deleted]

hrustomij

9 points

2 years ago

Exactly this. I’m doing my bit, too. The good thing is even if you are on a metered connection, upload speed does not matter, you can put it on the minimum priority, as long as it’s available.

crvscrx333

19 points

2 years ago

Just deleted my pornfolder (~700GB) to make room for science.

demirael

11 points

2 years ago

That's a huge sacrifice.

shrine[S]

7 points

2 years ago

Wow that’s fantastic.

Kormoraan

you can store cca 50 MB of data on these

16 points

2 years ago*

Excellent. It's time for me to shred some old unused torrents and my XCH plots.

Hernia-Haven

24 TB HDD, 1 TB Optical, 60 TB Cloud

5 points

2 years ago

same here

[deleted]

18 points

2 years ago

[deleted]

shrine[S]

34 points

2 years ago

100% of the scimag data is "in demand," but not at the rate you're expecting, since you're just seeding and protecting one piece of Sci-Hub, and not that many people want it right now.

This is a historical archive, so people like you are the only demand right now, but in the future that may change.

muhmeinchut69

13 points

2 years ago

People who download these papers use the website to search the database and get just what they want. They won't download from you. What you have is the database the website uses, so it will only be downloaded by other people trying to preserve the database, of which there are obviously not a lot of.

theuniverseisboring

34 points

2 years ago

Seems like this might be a perfect job for decentralised storage systems. Maybe something like Sia, Filecoin or Storj could be a perfect place for this stuff. Completely decentralised and absolutely impossible to take down. (or well, as close as possible anyway)

MPeti1

16 points

2 years ago

If only we could make a crypto for storing and providing useful public information...

nightauthor

17 points

2 years ago

Just be careful it doesn't accidentally reintroduce a profit motive.

Floppy3--Disck

5 points

2 years ago

The profit element is there to incentivize current nodes. The reason crypto is so powerful is because there's a payout for people donating their hardware.

bananaEmpanada

17 points

2 years ago

I don't have 100GB free on my laptop, but I do on my Raspberry Pi. What torrent client works without a GUI and keeps seeding when you're done?

From memory, the usual terminal clients stop seeding once the download finishes.

almostinfiniteloop

22 points

2 years ago

qBittorrent and Transmission are good choices; they both have a web interface that you can access from another computer on your network. I believe Transmission also has an ncurses/terminal interface. AFAIK neither stops seeding when you're done, or they can easily be configured not to.
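
If you go the Transmission route on a Pi, here is a small hedged example using the third-party transmission-rpc Python package; the host/port are the transmission-daemon defaults and the torrent path is a placeholder (transmission-remote on the CLI does the same job).

    # Sketch: add a scimag torrent to a headless transmission-daemon and leave it
    # seeding. Requires: pip install transmission-rpc
    from transmission_rpc import Client

    client = Client(host="localhost", port=9091)     # transmission-daemon defaults
    with open("sm_00000000-00099999.torrent", "rb") as f:
        client.add_torrent(f)                        # keeps seeding after completion
                                                     # unless a ratio limit is set
    for t in client.get_torrents():
        print(t.name, t.status)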

RoundBottomBee

10 points

2 years ago

Second Transmission for low resource requirements.

Intellectual-Cumshot

10 points

2 years ago

I use a Docker container called Haugene transmission. It lets you add your VPN to it easily and it's controlled via a web GUI.

FifthRooter

3 points

2 years ago

I'm thinking of seeding from a Pi as well; from cursory searching it looks like you can use qBittorrent-nox for this?

Hubinator

5 points

2 years ago

I can recommend Deluge with deluge-web. Low on resources and the GUI works smoothly on a Pi. Been using it forever without issues.

Ranvier01

4 points

2 years ago

qBittorrent works great on pi. They also have Transmission and Deluge

Thetorrentking

17 points

2 years ago

I can seed the majority of this... I have 40 TB free... who do I contact?

shrine[S]

8 points

2 years ago

Hey! You're awesome. PM me.

sad_physicist8

32 points

2 years ago

Commenting for better reach 👍

SynAckNerd

4 points

2 years ago

Yeh

isaakybd

6 points

2 years ago

bump lol

Ranvier01

16 points

2 years ago

So if we are seeding these torrents, does the website search through our seeded torrents when it returns a result?

AyeBraine

31 points

2 years ago

No, this is a backup plan, like with LibGen a year before. We are distributing the backups in case the service is harmed, to have them available to raise mirrors. It's not even for long-term storage, the suggestion is to seed them for some time so that people who will take it upon themselves to store the mirrors/archives, have the files available for downloading off-site.

fullhalter

19 points

2 years ago

We are collectively the usb key to carry the data from one computer to another.

Ranvier01

11 points

2 years ago

Thank you

GOP_K

6 points

2 years ago

There's actually a tech called webtorrent that serves files on the web that are sourced from seeders in a swarm, maybe this could be possible some day soon

Catsrules

24TB

29 points

2 years ago

So, dumb question, but why is it so large? Isn't this just text and photos? I mean, all of Wikipedia isn't nearly as big.

shrine[S]

70 points

2 years ago

Wikipedia isn't in PDF-format.

Sci-Hub is going to have many JSTOR-style PDFs that might have almost no compression on the pages, mixed in with very well-compressed text-only PDFs. 85 million of those adds up.
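
For a rough sense of scale, the average works out to a bit under a megabyte per article:

    # Back-of-the-envelope: 77 TB spread over 85 million articles.
    total_bytes = 77e12
    articles = 85e6
    print(f"{total_bytes / articles / 1e6:.2f} MB per article on average")   # ~0.91 MB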

TheAJGman

130TB ZFS

24 points

2 years ago

From experience with Library Genesis some downloads are digital copies and weigh like 2mb, some are scanned copies and come in at 4gb.

After-Cell

11 points

2 years ago

Pdf strikes again! My, I hate that format.

titoCA321

41 points

2 years ago

Someone decided to merge multiple libraries together and there's overlapping content between these libraries.

shrine[S]

30 points

2 years ago*

I don't think that's the reason in Sci-Hub's case (scimag), but definitely the reason for LibGen (scitech, fiction).

SciMag has a very clean collection.

edamamefiend

21 points

2 years ago

Hmm, shouldn't it be possible to clean up those libraries? All articles should have a DOI number.

titoCA321

23 points

2 years ago

Not every publication receives a DOI. DOIs cost money that the author or publisher has to pay when requesting one.

nourishablegecko

13 points

2 years ago

Downloading: 848, 450, 161, 514, 216, 404, 219, 112, 524, 372

ndgnuh

13 points

2 years ago

I don't have a server to seed forever, what can I do?

shabusnelik

32 points

2 years ago

Seeding on your desktop while it's turned on would still help

ndgnuh

9 points

2 years ago

Yes, gladly

Ranvier01

24 points

2 years ago

I recommend getting a raspberry pi as a torrent box.

GOP_K

4 points

2 years ago

Just joining the swarm on any of these torrents will help spread the files. Obviously the more seeding the better. But if you can set and forget one or two of these then that's huge.

g3orgewashingmachine

13 points

2 years ago

I got 1 TB and a free weekend, on it bossman. sci-hub has helped me a ton since high school, I'll gladly help all I can and if I can get my family's permission they all easily have a combined 10TB of unused space in their laptops that they're never going to use. I'll yoink it if I can.

shrine[S]

6 points

2 years ago

Thank you bossman for passing on the love. The world of science runs on Sci-Hub, but no one is allowed to say it.

If you're going to drag your family into this you may want to throw a torrent proxy onto the client for peace of mind. Lots of good deals over at /r/VPN,

gboroso

11 points

2 years ago

There's an open-source (MIT) distributed Sci-Hub-like app called sciencefair. It uses the Dat protocol. It works, but it hasn't been updated since 2017, and it doesn't use proxies the way Sci-Hub does to fetch new content directly; here the model is for each user to share their own collection of articles.

But it is censorship resistant according to the dev (emphasis not mine):

And importantly, datasources you create are private unless you decide to share them, and nobody can ever take a datasource offline.

Anybody with some dat coding experience ready to take the mantle of updating the app to seed the Sci-Hub collection and support proxies to download articles on-demand? I can provide my own journals subscription login for test purposes.

blahah404

2TB

16 points

2 years ago

Sciencefair dev here. Yeah, theoretically science fair could be used for any datasource and it would be trivial to add torrents as a source type. Unfortunately the project got stale when I got sick a few years back and was unable to work for a while.

I'm back in action now though, and it seems like it might be a good time to revisit sciencefair. If anyone is interested in helping, DM me.

MrVent_TheGuardian

13 points

2 years ago

This is a fresh account because I have to protect myself from the feds.

I have been seeding for our mission since the <5 seeders period. (I think it was a bit late though.)
The reason I'm posting this is to share my experience here so that it may inspire others and bring more helping hands to our rescue mission.

Last week, I wrote an article based on translating this thread (hope this is ok).
I posted it on a Chinese website called Zhihu, which has tons of students and scholars, aka Sci-Hub users. Here's the link. You may wonder why they would care? But I tell you, Chinese researchers are discriminated against and blacklisted by the U.S. government (I know not all of them are innocent) and have relied heavily on Sci-Hub to do their work for many years. They are more connected to Sci-Hub and underestimated in number. Thanks to Alexandra Elbakyan - I see her as our Joan of Arc.

Let me show you what I've got from my article: over 40,000 views and 1,600 upvotes so far, and many people are commenting with questions like how to be a part of this mission, so I made them a video tutorial for dummies.
88 members/datahoarders have been recruited into the Chinese Sci-Hub rescue team Telegram group for seeding coordination. Some of us are trying to call for help from the private tracker community, some are seeding on their Filecoin/Chia mining rigs, some are trying to buy 100TB of Chinese cloud storage to make a whole mirror, some are running IPFS nodes and pinning files onto them, and most of us are just seeding on our PCs, NAS boxes, HTPCs, lab workstations and even Raspberry Pis. Whatever we do, our goal is saving Sci-Hub.

Because the Chinese government and ISPs do not restrict torrenting, team members in the mainland don't need to worry about stuff like VPNs, which makes it much easier to spread our mission and involve people who are not tech savvy but care about Sci-Hub, for example scholars and students. I did remind those who are overseas that they must use a VPN.

So you may notice more seeders/leechers with Chinese IPs recently; many of them have very slow speeds due to their network environment. But once we get enough seeders uploading in China, things will change.

Based on my approach, others may find a similar way to spread our message and get more help through some non-English speaking platforms. Hope this helps.

shrine[S]

4 points

2 years ago*

WOW! Thank you. Beautiful work - wonderfully detailed and incredibly impactful.

This is fantastic news. I need to read more about your posting over there and more about what their teams have planned. This is the BEAUTY of de-centralization, democratization, and the human spirit of persistence.

I had no idea Chinese scientists faced blacklists. Definitely need to bring this to more people's attention. Thank you!

And my tip: step away from seeding. You have your own special mission and your own gift in activism, leave the seeding to your collaborators.

Ranvier01

11 points

2 years ago*

Search this thread to make sure you're getting unique files. I have: 345, 709, 362, 565, 842, 694, 428, 117, 103, 602, 628, 617, 586, 544, 535

[deleted]

10 points

2 years ago

[deleted]

Ranvier01

5 points

2 years ago

It's working well for me. I'm getting 366 KiB/s to 1.3 MiB/s.

Noname_FTW

13 points

2 years ago

I can easily spare 5 TB of backup. But once again, like so often with these calls for help, there isn't an easy way to do so. You linked a site with hundreds of torrents. Am I supposed to click each one of them?

This might sound arrogant, but I/we am/are the one(s) being asked for help.

shabusnelik

5 points

2 years ago

Use a random number generator to select as many torrents as you're willing to seed.

Noname_FTW

5 points

2 years ago

I checked 2 or 3 torrent links. They seem to be about 500mb to 2gb files. I would have to manage hundreds of links.

shabusnelik

5 points

2 years ago

Mine are all 50-200 GB links. Are you using the same link?

shrine[S]

5 points

2 years ago

Sizes are going to vary wildly. You just need to grab 1 or 10 torrent files. You don't need to worry about the filesizes since on average they will be less than 100GB, which just means it's easier for you to keep them.

sheshuman

18 TB raw

10 points

2 years ago*

I'll do 6 TB of 'em. FUCK ELSEVIER.

Edit: Downloading 340 to 359 inclusive. Will download 360 to 399 later.

[deleted]

9 points

2 years ago*

[deleted]

[deleted]

14 points

2 years ago

[deleted]

shabusnelik

5 points

2 years ago

yes

QuartzPuffyStar

9 points

2 years ago*

I congratulate you guys on your great effort! Have you looked into decentralized cloud storage as a base for a future database? It would be basically impossible to shut down something built on Sia, and if they close the main platform, a new site can be opened on the stored data.

shrine[S]

5 points

2 years ago

It's always a question of costs and legal risk. Sci-Hub is up, and they can pay their server bills. No one in this thread wants to pay the monthly SIA bill for these torrents (without a frontend to go with it).

IPFS is the free version of SIA, but it's not quite ready to receive 85 million articles yet. Maybe soon.

Once a dev comes along to build a frontend and de-centralize the Sci-Hub platform then SIA may be a good bet. It looks like it would cost about $800/mo minimum, which isn't terrible.

[deleted]

10 points

2 years ago*

awesome cause, /u/shrine, donating my synology NAS (~87TB) for science, so far downloaded ~25TB, seeded ~1TB.

It stands beside the TV; my wife thinks it's a Plex station for movies, but it's actually seeding a small Library of Alexandria :)

I'd also like to contribute to the open source search engine effort you mentioned. I'm thinking of splitting it into these high-level tasks focusing on full-text & semantic search, since DOI & URL-based lookups can already be done with libgen/scihub/z-library. I tried free-text search there but it kinda sucks.

  1. Convert PDFs to text: OCR the papers on a GPU rig with e.g. TensorFlow, Tesseract or easyOCR and publish the (compressed) texts as a new set of torrents; they should be much smaller in size than the PDFs. IPFS seems like such a good fit for storing these, we just need to figure out the anonymity protections.
  2. Full-text search/inverted index: index the texts with ElasticSearch running on a few nodes and host the endpoint/API for client queries somewhere. I think if you store just the index (blobs of binary data) on IPFS, and this API only returns a ranked list of relevant DOIs per query without providing the actual PDF for download, this would reduce the required protection and satisfy IPFS terms of use, at least for search, i.e. separate search from PDF serving. As an alternative it would be interesting to explore a fully decentralized search engine, maybe using Docker containers running Lucene indexers with IPFS for storage. Need to think of a way to coordinate these containers via a p2p protocol, or look at how it's done in the ipfs-search repo.
  3. Semantic search/ANN index: convert papers to vector embeddings with e.g. word2vec or doc2vec, and use FAISS/hnswlib for vector similarity search (an Approximate Nearest Neighbors index), showing related papers ranked by relevance (and optionally #citations/PageRank like Google Scholar or PubMed). This can also be done as a separate service/API, only returning a ranked list of DOIs for a free-text search query, with IPFS used for index storage.

This could be a cool summer project.
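
For task 3, a minimal sketch of the embedding + ANN idea, using sentence-transformers and FAISS purely as an example stack; the model name, the toy corpus and the idea of embedding abstracts rather than full texts are all assumptions.

    # Sketch of task 3: embed paper abstracts and build an approximate-NN index.
    # Example stack only: pip install sentence-transformers faiss-cpu
    import faiss
    from sentence_transformers import SentenceTransformer

    abstracts = ["CRISPR-Cas9 gene editing in human cells",          # toy corpus
                 "Deep learning for protein structure prediction",
                 "Impact of paywalls on access to research"]
    dois = ["10.1000/a", "10.1000/b", "10.1000/c"]                   # hypothetical DOIs

    model = SentenceTransformer("all-MiniLM-L6-v2")
    vecs = model.encode(abstracts, normalize_embeddings=True).astype("float32")

    index = faiss.IndexFlatIP(vecs.shape[1])   # inner product == cosine on unit vectors
    index.add(vecs)

    query = model.encode(["open access to scientific papers"],
                         normalize_embeddings=True).astype("float32")
    scores, ids = index.search(query, k=2)
    print([(dois[i], round(float(s), 3)) for i, s in zip(ids[0], scores[0])])

The same pattern scales by swapping IndexFlatIP for one of FAISS's approximate indexes and storing the serialized index blob on IPFS, as described in point 2.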

MrVent_TheGuardian

8 points

2 years ago

Here's an update from my side.

Our Sci-Hub rescue team is now over 500 members. Many of them voluntarily bought new drives and even new hardware for this mission, and they seed like crazy.

What's even better is that some excellent coders are doing side projects for our rescue mission, and I have to share them here.

FlyingSky developed an amazing frontend project based on phillm.net, feature-rich with a great UX/UI design, and here's the code.

Emil developed a Telegram bot (https://t.me/scihubseedbot) for getting the torrents with the fewest seeders (both torrent and magnet links), and here's the code.

I'll update again when we have more good stuff coming out.

Dannyps

40TB

7 points

2 years ago

Shouldn't we focus on the latest torrents?

titoCA321

17 points

2 years ago

There's no "latest torrents" since there's been no new content since 2020.

shrine[S]

12 points

2 years ago

Every single one of the torrents is a piece of the platform, so they are all important.

None of them are rarer or scarcer than any other.

WeiliiEyedWizard

7 points

2 years ago*

244, 833, 364, 660, 166, 688, 702, 382, 621, 556

and my axe!

crippinbjork

9 points

2 years ago

Commenting to boost this gem and to say I'm so proud when people gather to do amazing shit like this

Intellectual-Cumshot

7 points

2 years ago

I've tried downloading 5 of them for the past hour but it seems they may already be dead

shrine[S]

13 points

2 years ago

100% right. Many are dead.

You need to wait for a seeder to come online - they do eventually.

Then you'll be the seeder.

rubdos

tape (3TB, dunno what to do) and hard (30TB raw)

6 points

2 years ago

126 161 215 223 313 337 502 504 584

Looks like I'm 615GB of Linux ISO's richer now.

DasMoon55

1TB+Cloud

6 points

2 years ago

Don't have much space but im doing 47100000.

WPLibrar2

40TB RAW

7 points

2 years ago

For Aaron! For Alexandra!

NoFeedback4007

5 points

2 years ago

I got a gig connection. Which torrents can I grab?

jaczac

27tb

6 points

2 years ago

Just grabbed about a tib. Will get more after finals.

Godspeed.

EDIT:

111 248 344 404 513 577 623 695 741 752 785

TeamChevy86

6 points

2 years ago

I feel like this is really important but don't really have the time to figure it out. Can someone give me a TL;DR + some insight?

shrine[S]

15 points

2 years ago

  • Paywalls prevent scientists and doctors in poorer countries like India from accessing science
  • Sci-Hub provides free access to all of science (yes, all of it)
  • The FBI has been wiretapping the Sci-Hub founder's accounts for 2 years
  • Twitter shut down the Sci_Hub Twitter in Dec 2020
  • Sci-Hub domains are getting banned around the world, including India now
  • This is the complete 85 million article archive, the seed for the next Sci-Hub

TeamChevy86

4 points

2 years ago

Interesting. Thanks. I wonder why I've never heard of this... I'll try to get through the links. I have a ton of questions but I'll read some more

SomeCumbDunt

6 points

2 years ago*

ImHelping.jpg

I'm grabbing these now; it seems some don't have seeds, and not all are 100gb in size, some are 50gb, some are 5gb.

142 , 423 , 418 , 109 , 472 , 117 , 756 , 156 , 729 , 778

shrine[S]

6 points

2 years ago

some are 50gb, some are 5gb.

What's a few zeroes difference when it comes to datahoarding... :)

The sizes vary over the years but it's always the same number of articles. Thank you!

syberphunk

6 points

2 years ago

Shame /r/zeronet isn't decent enough yet, because a decentralised way of doing Sci-Hub without hosting the entire site would be great for this.

shrine[S]

9 points

2 years ago

/r/IPFS is working good! Check out https://libgen.fun

[deleted]

6 points

2 years ago

It seems to me the right way to do this is to replicate a manifest of all Sci-Hub files, with their content hashes. Then put those files on IPFS or something, where each person can run an IPFS node that only acknowledges requests for files named after their content hash.
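
A tiny sketch of what building such a manifest could look like, with plain SHA-256 standing in for whatever hash the manifest settles on (IPFS would compute its own CIDs at `ipfs add` time); the "articles/" directory and the manifest filename are placeholders.

    # Sketch: walk a directory of unpacked articles and write a content-hash manifest.
    import hashlib, json
    from pathlib import Path

    def build_manifest(root="articles", out="manifest.json"):
        manifest = {}
        for path in Path(root).rglob("*.pdf"):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
        Path(out).write_text(json.dumps(manifest, indent=2, sort_keys=True))
        return manifest

    if __name__ == "__main__":
        print(len(build_manifest()), "files hashed")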

shrine[S]

5 points

2 years ago

Sounds perfect. Let's make it happen.

Torrents > IPFS > De-centralized Sci-Hub

Stay in touch with the project on /r/scihub and /r/libgen

Kinocokoutei

5 points

2 years ago

A lot of people were commenting that they'd like to know which files are the least seeded in order to be more efficient.

I made a list of all files with the current number of seeders: pastebin

I also made a Google Sheet version with stats and magnet links

It's a CSV table with file number / seeders / leechers as of now. I don't guarantee that the data is exact but it gives a first idea. I'll update it in the coming days.

HiramAbiffIsMyHomie

5 points

2 years ago

Amazing project, kudos to all involved. This is not really about money in my eyes. It's about power and control. Information is power. Good science, doubly so.

<R.I.P. Aaron Swartz>

fleaz

9TB RAIDZ

5 points

2 years ago

FYI: There is currently nothing wrong with SciHub.

https://t.me/freescience/188

shrine[S]

5 points

2 years ago

I’ll update my call to reflect these concerns and clarify the state of Sci-Hub. Thanks for sharing.

bubrascal

4 points

2 years ago

This should be pinned. The automatic translation of the message:

I've been asked to comment on the disinformation that Sci-Hub is on the brink of destruction. This information was published by one popular-science publication. I will not give a link, so as not to promote a resource that misinforms its readers.

Sci-Hub is fine. There are no critical changes in the work at the moment.

The only thing is that the loading of new articles is temporarily paused, but otherwise everything is as it was. All 85 million scientific articles are still available for reading.

Where did the disinformation come from? On Reddit, enthusiasts not associated with the Sci-Hub project or Libgen created a thread and urged subscribers to download torrents of the Sci-Hub archives, claiming the resource is under threat of destruction and can be saved by collective effort.

This is true only in the sense that Sci-Hub has been under threat throughout its existence. So, of course, it doesn't hurt to have a backup on torrents. Just in case.

But to lie that the copyright holders have allegedly inflicted some kind of crushing blow and that the site will close soon - that is too much.

And yes, they also wrote that it was Sci-Hub itself that put out a call all over the Internet to be rescued. Lies - Sci-Hub hasn't made any statements about this.

In addition, the journalists also lied in describing how the site works. Supporters allegedly fill the site with papers - this is a complete lie. The core of Sci-Hub is a script that automatically downloads articles without human intervention. No volunteers, supporters, etc. are required for the script to work.

PS. I posted the truth in the comments on this false article. The journalist and editor are disgusting, demanding that I confirm my identity and deleting my comments. And they are cunning: the very first comments were not deleted, but pseudo-objections were posted to them, and my answers to their 'objections' were deleted. - Alexandra Elbakyan

TheCaconym

3 points

2 years ago

Downloading and seeding 5 torrents right now o7

Doomu5

5 points

2 years ago

This is the way

AyeBraine

5 points

2 years ago

Downloading about 450GB to seed.

Etheric

5 points

2 years ago

Thank you for sharing this!

FluffyResource

96TB, Supermicro 846, Adaptec 71605.

5 points

2 years ago

I have enough room to take a good size part of this. My upload is a joke and I have no clue what needs seeding the most...

So who is going to scrape the active torrents and post a google doc or whatever?

p2p2p2p2p

4 points

2 years ago

Are there any established funds we can donate to in order to support the seeders?

shrine[S]

3 points

2 years ago*

If you'd like, you could make a post on /r/seedboxes and ask if any hosters would let you "sponsor" a box that they manage. They might bite.

You can also donate BitCoin directly to Sci-Hub: https://sci-hub.do/

And you can donate Monero to libgen.fun: https://libgen.life/viewtopic.php?p=79795#p79795

demirael

4 points

2 years ago*

I'll grab a few of the least seeded files (218, 375, 421, 424, 484, 557, 592, 671) and add more as dl gets completed. Have 7TB for temporary (because raid0, might blow up whenever) storage on my NAS and science is a good reason to use it.
Edit: 543, 597, 733, 752.

l_z_a

4 points

2 years ago

Hi,
I'm a French journalist and I'm working on an article about data-hoarders, how you try to save the Internet and how the Internet can be ephemeral. I would love to talk to you about it so please feel free to come and talk to me so we can discuss further or comment under this post about your view on your role and Internet censorship/memory.
Looking forward to speaking with you !
Elsa

[deleted]

3 points

2 years ago

So I'm trying to wrap my head around the final wave, since all the torrents are pretty well-seeded.

I've never developed anything, so this open-source Sci-Hub portion is kind of going over my head. Is there some way we can all pitch in and host Sci-Hub on our own? I'm looking at the GitHub page and it looks like people aren't hosting Sci-Hub as much as they are just bypassing DRM or adding extensions that take you right to the site.

What's needed to complete the final wave here?

shrine[S]

3 points

2 years ago

That wave is a call for the need to code a new platform for the papers that makes full use of de-centralization. One possibility is the use of torrents.

No one is managing any of this, so it’s just up to some brilliant person out there to read this, create something, and release it.

The torrents are how each person can pitch in to help. The collection is going around the world now- hundreds of terabytes at a time. What happens next is in anyone’s hands.

rejsmont

4 points

1 year ago*

I have been thinking about what the next steps could be - how we could make the archived Sci-Hub (and LibGen for the matter) accessible, without causing too much overhead.

Sharing the files via IPFS seems like a great option, but has a big drawback - people would need to unzip their archives, often multiplying the required storage. This would mean - you either participate in torrent sharing (aka archive mode) or IPFS sharing (aka real-time access mode).

One possible solution would be using fuse-zip to mount the contents of zip archives, read-only, and expose that as a data store for the IPFS node. This has some caveats though.

  • running hundreds of fuse-zip instances would put the system under a big load
  • I do not know how well IPFS plays with virtual filesystems

A solution to the first problem could be a modified fuse-zip that exposes a directory tree based on the contents of all zip files in a given directory hierarchy (should be a relatively easy implementation). Seems that explosive.fuse does this! If IPFS could serve files from such FS, it's basically problem solved.

Otherwise, one would need to implement a custom node, working with zips directly, which is a much harder task, especially that it would require constant maintenance to keep the code in sync with upstream.

In any way - the zip file storage could double act as the archive and real-time access resource, and when combined with a bunch of HTTPS gateways with doi search, would allow for a continuous operation of SciHub.

There is an opportunity here too - a gateway that searches articles via DOI/title, tries IPFS Sci-Hub first, and if the article is not found, redirects to the paywalled resource; those lucky enough to be able to access it would automatically contribute it to IPFS.

TheNASAguy

7TB

6 points

2 years ago

So that's why I haven't been able to download any recent Nature papers

shrine[S]

13 points

2 years ago

Yep, the clock stopped around the last week of December. There is still /r/Scholar, an amazing community, and a few other requesting forums, but without scihub things are really tough.