How often do people actually copy and paste from Stack Overflow? Now we know.

[Ed. note: While we take some time to rest up over the holidays and prepare for next year, we are re-publishing our top ten posts for the year. Please enjoy our favorite work this year and we’ll see you in 2022.]

They say there’s a kernel of truth behind every joke. In the case of our recent April Fools gag, it might be more like a whole cob, maybe a bushel of truth. We wanted to embrace a classic Stack Overflow meme and tweak one of our core principles. Our company was inspired by the founders’ frustration with websites that kept answers to coding questions behind paywalls. What would the world look like if we suddenly decided to monetize the act of copying code from Stack Overflow?

Okay, joke’s over, hope everyone had a good laugh and nobody got too freaked out. But wait, there’s more. Once we set up a system to react every time someone typed Command+C, we realized there was also an opportunity to learn about how people use our site. We were able to catalog every copy command made on Stack Overflow over the course of two weeks, and here’s what we found.

You aren’t alone

One out of every four users who visits a Stack Overflow question copies something within five minutes of hitting the page. That adds up to 40,623,987 copies across 7,305,042 posts and comments between March 26th and April 9th. People copy from answers about ten times as often as they do from questions and about 35 times as often as they do from comments. People copy from code blocks more than ten times as often as they do from the surrounding text, and surprisingly, we see more copies being made on questions without accepted answers than we do on questions with an accepted answer.

So, if you’ve ever felt bad about copying code from our site instead of writing it from scratch, forgive yourself! Why recreate the wheel when someone else has done the hard work? We call this knowledge reuse: you’re reusing what others have already learned, created, and proven. Knowledge reuse isn’t a bad thing. It helps you learn, gets you working code faster, and reduces your frustration. Our whole site runs on knowledge reuse; it’s the altruistic mentorship that makes Stack Overflow such a powerful community.

You can stand on the shoulders of giants and use their prior lessons learned to build new things of value. You should still follow some basic best practices to prevent bugs or security issues from sneaking into your code when copying, so be sure to educate yourself before grabbing and pasting. And of course, be aware that some code requires a certain license to use. Beyond that, we encourage everyone to share in the benefits of what the community has created.

That’s the high-level TL;DR, but for folks who want a deep dive into all the things we learned while studying the copy data, please read on for some marvelous insights and charts from David Gibson, a data analyst on our product marketing team. If you want to hear about how we built the software modal and physical keyboard behind our April Fools joke, check out the podcast below.


As someone who has been unapologetically copying from Stack Overflow for years, I was not surprised to see the millions of copy events rolling in. What did surprise me was the number of questions we could finally answer. How many people really are copying from Stack Overflow? Are people just copying code? Are people more likely to copy the accepted answer?

To add some direction to the analysis, the team and I came up with a list of questions that we wanted to answer. What started as a joke has snowballed into a worthwhile exploration, producing new insights and sparking many internal conversations about how we can continue to innovate our public platform and bring more value to Stack Overflow for Teams.

The data

Using our homegrown web tracking tool, we created custom events to capture when a user copied from the site. With these events we were able to capture many different attributes: tags; whether the post was a question, answer, or comment; code block or plain text; copier reputation and post score; region; and whether the post was accepted or not. We pretty much captured everything except the actual text being copied.
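As a rough illustration of what one such event might carry, here is a minimal Python sketch. The class name `CopyEvent` and every field name are assumptions made for this example, not the actual schema of the tracking tool. The only things taken from the description above are the list of attributes and the fact that the copied text itself is not stored.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass(frozen=True)
class CopyEvent:
    """Hypothetical shape of one copy event; all field names are illustrative."""
    tags: Tuple[str, ...]            # tags on the question, e.g. ("python", "pandas")
    post_type: str                   # "question", "answer", or "comment"
    in_code_block: bool              # True for a code block, False for plain text
    copier_reputation: int           # 0 for anonymous users
    post_score: int
    region: str                      # e.g. "Europe"
    answer_accepted: Optional[bool]  # None when the post is not an answer


# A made-up event, for illustration only.
event = CopyEvent(
    tags=("python", "pandas"),
    post_type="answer",
    in_code_block=True,
    copier_reputation=0,
    post_score=12,
    region="Europe",
    answer_accepted=False,
)

# Note what is absent: no field holds the copied text.
print(sorted(asdict(event)))
```

Keeping the record to metadata only is what makes questions like "code block or plain text?" answerable, while leaving "what exactly was copied?" open.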

We collected data for two full weeks, from March 26th 2021 to April 9th 2021. The following analysis is based on the behavior during that time.

Questions

Ben already mentioned some of the high-level stats that quickly proved what people had long joked about: everyone is copying from Stack Overflow. We also quickly realized that the overall copy behavior closely followed what we already knew about our site traffic. Most copies occurred during the work week and during working hours. Our largest geographies make up the majority of copies: Asia 33%, Europe 30%, and North America 26%. Finally, 86% of all copies came from anonymous users, aka users with 0 rep.

Things started to become more interesting when we asked more detailed questions about who was copying and what they were copying.

Are higher rep users copying more?

To start, we wanted to see if our higher reputation users are copying more.

We can see that the majority of copies are coming from users with 0 reputation. These are our anonymous users, because you immediately get 1 rep by creating an account. It’s possible that some of these copies are from users who have an account but are not logged in. Unfortunately, there is not a way for us to test this theory.

Since the majority of the users on our platform have a lower rep, let’s remove the groupings to see if we can normalize our data. By looking at Count of Copies Per User instead of Total Copies, we can see the average number of copies a user makes by their reputation.

Looking at this visualization, it appears that as Reputation increases, the Count of Copies Per User decreases. So the higher a user’s reputation, the less often they’re copying. This relationship is present but is not very strong, so I’m not confident in saying either higher or lower reputation users copy more. Developers who are learning often have a lower reputation and are looking for things that can accelerate their learning and get them started quickly. As developers build their expertise, they also build their reputation, and they focus on more precise challenges, things that may not be possible to copy from Stack Overflow.

Are accepted posts copied more?

When we think of an accepted answer, we may think it’s the best one, and infer it’s copied much more than non-accepted answers. Looking at the data, however, we find 52.4% of copies come from answers that are not accepted. But on average, accepted answers get seven copies per unique post while non-accepted answers get five copies per unique post. So more copies come from non-accepted answers, but there is higher knowledge reuse from accepted answers. At Stack Overflow, we define knowledge reuse as reusing what others have already learned, created, and proved.

Answer accepted   Total copies   Unique posts   % of copies   Copies per post
FALSE             18,773,517     3,934,860      52.44         5
TRUE              17,028,108     2,614,073      47.56         7
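The share and copies-per-post columns can be re-derived from the first two columns. The sketch below uses only the figures in the table above; the variable and function names are illustrative.

```python
# Figures copied from the table above: (total copies, unique posts).
answers = {
    "not accepted": (18_773_517, 3_934_860),
    "accepted": (17_028_108, 2_614_073),
}

total_copies = sum(copies for copies, _ in answers.values())


def summarize(copies: int, posts: int) -> tuple:
    """Return (share of all answer copies in %, copies per unique post)."""
    share = 100 * copies / total_copies
    per_post = copies / posts
    return round(share, 2), round(per_post, 2)


for label, (copies, posts) in answers.items():
    share, per_post = summarize(copies, posts)
    print(f"{label}: {share}% of copies, {per_post} copies per unique post")
```

Before rounding to whole numbers, the copies-per-post ratios come out to roughly 4.8 for non-accepted answers and 6.5 for accepted ones. The gap between the two groups is therefore a little narrower than the rounded 5 and 7 suggest.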

It’s worth noting that a question may not even have an accepted answer. Take this answer: it has nearly 5,000 up-votes (4,984) and was copied 7,943 unique times during our study, but is not accepted. Actually, none of the answers have been accepted. It could be because the question poster has not been seen since 2010, but also many of the other answers are valid.

Are higher scored posts copied more?

So if accepted answers are not copied more, then answers with a higher score must be copied more, right? Let’s find out!

We see that for answers, copies seem to be pretty evenly split across our defined score groupings from 1 to 1000. As for questions, the majority of copies are from posts with 1-5 points. I suspect that’s because users are copying the question to reproduce it and ultimately post an answer.

As with user reputation, the majority of posts on the site have a lower score. To normalize this, let’s look at the copies per post.

We can plainly see that as a post increases in Post Score, so does the Copies Per Post. This makes sense because as a post increases in score, it’s more likely that the knowledge is being reused by our community.

Do people copy downvoted answers?

But what about those blue dots with a negative score? Why would anyone copy down-voted answers? Well, we never want to judge a book by its cover.

Take a look at this answer. It was our most copied down-voted answer, with a score of -2 and a total of 288 copies. Looking closer, it appears to be a more concise version of the accepted answer above it, which has a score of 29 and had a total of 493 copies. Although our negative score post did not have more copies, it is the perfect example of a “too long didn’t read” post.

What are the most copied tags?

Now for the question I was most excited to answer: what tags are being copied the most? Unfortunately, due to the scale of the data and available resources, I was unable to parse out nested tags. For example, the html tag will not include posts within the |html|css| tag grouping.
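To make that limitation concrete, the sketch below contrasts counting by exact tag grouping with counting by individual tag. The three rows and their copy counts are made-up illustration data, and the helper names are hypothetical. Only the `|tag|tag|` grouping format comes from the text above.

```python
from collections import Counter

# Made-up example rows: (tag grouping string, copies). Not real study data.
rows = [
    ("|html|", 10),
    ("|html|css|", 25),
    ("|python|pandas|", 40),
]


def copies_by_grouping(rows):
    """Count copies per exact tag grouping, as in the analysis described above."""
    totals = Counter()
    for grouping, copies in rows:
        totals[grouping] += copies
    return totals


def copies_by_individual_tag(rows):
    """Split each grouping so every tag is credited with the post's copies."""
    totals = Counter()
    for grouping, copies in rows:
        for tag in grouping.strip("|").split("|"):
            totals[tag] += copies
    return totals


by_grouping = copies_by_grouping(rows)
by_tag = copies_by_individual_tag(rows)

print(by_grouping["|html|"])  # copies on posts tagged only html
print(by_tag["html"])         # copies on every post that carries html
```

Counting by grouping keeps each post in exactly one bucket, which is cheap at scale but understates broad tags such as html. Counting by individual tag credits a post to every tag it carries, so the per-tag totals no longer sum to the overall copy count.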

Top ten tags copied

Not to my surprise, the tags receiving the most copies are some of the most popular and active tags on Stack Overflow. The one thing that jumped out to me is that python appears in four of the top tag groupings. Three of them are data analytics specific tag groups: |python|pandas|, |python|pandas|dataframe| and |python|matplotlib|. As a data nerd myself, I love to see more people learning these tools.

Tags                        Total copies   Unique posts   Copies per post
|html|css|                  265,143        36,198         7
|javascript|                245,709        33,419         7
|python|                    232,077        35,852         6
|python|pandas|             222,643        19,220         12
|javascript|jquery|         177,353        26,696         7
|python|pandas|dataframe|   146,731        7,728          19
|python|matplotlib|         138,404        8,045          17
|git|                       135,480        9,682          14
|php|                       117,373        20,771         6
|jquery|                    111,454        15,058         7

Top ten tags with most copies per post

In addition to looking at the tags with the most copies, I wanted to see what tags have the highest copies per post. Filtering for tags with at least ten unique posts, we can plainly see that as tags become more specific, they receive more Copies Per Post.

Tags                                                  Total copies   Unique posts   Copies per post
|python|suppress-warnings|                            5,031          10             503
|node.js|npm|npm-install|npm-start|npm-live-server|   4,925          11             448
|python|graph|matplotlib|plot|visualization|          12,650         29             436
|sql|sql-server|tsql|system-tables|                   8,590          20             430
|windows|cmd|localhost|port|command-prompt|           10,915         26             420
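The filter-then-rank step can be sketched in a few lines of Python. The rows below are a selection copied from the two tables above; the function name and the choice to round to whole copies are assumptions for illustration, not a description of the original analysis code.

```python
# (tag grouping, total copies, unique posts), copied from the two tables above.
rows = [
    ("|html|css|", 265_143, 36_198),
    ("|python|pandas|dataframe|", 146_731, 7_728),
    ("|git|", 135_480, 9_682),
    ("|python|suppress-warnings|", 5_031, 10),
    ("|node.js|npm|npm-install|npm-start|npm-live-server|", 4_925, 11),
    ("|sql|sql-server|tsql|system-tables|", 8_590, 20),
]

MIN_UNIQUE_POSTS = 10  # threshold stated in the text


def rank_by_copies_per_post(rows, min_posts=MIN_UNIQUE_POSTS):
    """Drop groupings below the post threshold, then sort by copies per post."""
    kept = [
        (tags, round(copies / posts))
        for tags, copies, posts in rows
        if posts >= min_posts
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


ranking = rank_by_copies_per_post(rows)
for tags, per_post in ranking:
    print(f"{per_post:>4}  {tags}")
```

Ranking this way pushes the narrow groupings to the top even though their total copies are a small fraction of those for |html|css|. The minimum-posts threshold keeps a single heavily copied post from dominating the list.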

What are the most copied posts?

Now to answer the question I’m sure many of you are interested in. What post received the most copies?

Answer with code block

With a post score of 3,497 and 11,829 copies, I’m happy to announce that How to iterate over rows in a DataFrame in Pandas received the most copies. Answered in 2013, this question continues to help thousands of people each week.

Answer plain text

As for the most copied answer with plain text, we have TypeError: this.getOptions is not a function [closed] with a post score of 218 and 1,570 total copies. Although we were unable to confirm this, I suspect that the `sass-loader@10.1.1` is being copied.

Question code block

And for the most copied question with a code block, with a post score of 2,147 and 3,665 copies, we have How to create an HTML button that acts like a link?

Question plain text

Finally, for the most copied question with plain text, with a post score of 322 and 261 copies, we have Updates were rejected because the tip of your current branch is behind its remote counterpart. This one is a bit tricky because there are a handful of git commands not in code blocks that could easily be the copied part of the question. But as we’re not capturing the actual copied text, we cannot confirm this.

Comment

It’s important to note that answers are not everything on Stack Overflow. Sometimes all you need is one helpful comment. Here are the most copied comments!

The first comment is our most copied comment across the site, and the second comment is our “unsung hero” since it only has a post score of 5 but was our sixth most copied comment.

UPDATE: There was a lot of interest in purchasing a real life version of our prank. The good news is we anticipated this might happen and we’ve been working on something along those lines. Stay tuned for more!

Tags: april fools, copying code, data science
