U.S. Copyright History 1923–1964

By Sean Redmond, Senior Developer/Front-End Lead
May 31, 2019

A little over a year ago, Greg Cram wrote about a pilot project we at NYPL were just beginning that aims to unlock the record of American creativity. At that point, he and our then-colleague Josh Hadro (now managing director of the IIIF Consortium) got the ball rolling, wrote a fair amount of initial documentation, and selected a vendor, DCL, to convert the first batch of volumes of the Catalog of Copyright Entries (CCE) from scanned images to parsed XML.

At the beginning of May, we completed the full run of the "Book" volumes of the CCE dating from 1923 to 1964. This gives us the best view, to date, of the number of books registered for copyright during this period in the U.S., as well as how many of these had their copyright renewed and extended.

Bar chart showing number of copyright registrations and renewals per year

See Chart Data below for raw numbers

The rough totals: 642,000 registered copyrights with 162,000, or 25%, renewed. Those renewed books are still in copyright today, and the copyrights on the other 480,000 books have expired and are (probably) in the Public Domain. These are initial results, so some important details and caveats follow.

Why Is This Important? Why 1923–1964?

In libraries, we like to have digital versions of books. We buy thousands of ebooks from publishers (which you can get for free with our SimplyE app); but we also have millions of older books in our research divisions that we would love to digitize and make more easily available to people doing research around the world.

During the term of copyright protection, the rights holder has the exclusive right to make and distribute copies of their book. When we want to make a digital copy at NYPL, we need either to have the rights holder's permission, or to rely on an exception or limitation in copyright law. You can go to Hathi Trust, or its non-library equivalent, Google Books, and search more books than you could ever read because indexing the content of an in-copyright book for search is considered fair use; however, presenting the full text may be an infringement.

For many years, the rule of thumb has been that any book published in or after 1923 was in copyright (in the U.S.). It takes a bit of convoluted history to explain this date (but Sonny Bono appears at the end):

So, a book published in 1922 and renewed after its first 28-year term had a copyright that lasted until 1978 (1922 + 28 + 28). If a book was published in 1923 and renewed for a second term, it would have been just at the end of its copyright in 1978 when its term was extended to 75 years. It would have been about to enter public domain again in 1998, when this was extended to 95 years, through 2018. This is why January 1, 2019 was the first day that any books had entered the public domain in more than 20 years.

But what if the book wasn't renewed? After its first copyright term, a book published in 1923 became public domain in 1951 (1923 + 28). A book published in 1963 was subject to the same copyright law. If it wasn't renewed in 1991 (1963 + 28), it became public domain at the start of 1992.
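The term arithmetic in the last two paragraphs can be sketched in a few lines of Python. This is only an illustration of the examples given here, not legal logic: the function names are made up for this sketch, and it assumes a Renewal Era book whose protection runs 28 years if unrenewed and 95 years if renewed, ignoring the caveats discussed below.

```python
FIRST_TERM = 28     # years in the first copyright term
RENEWED_TERM = 95   # total years once a renewed copyright was extended

def first_term_ends(pub_year: int) -> int:
    """Year in which the first 28-year term runs out (hypothetical helper)."""
    return pub_year + FIRST_TERM

def renewed_term_ends(pub_year: int) -> int:
    """Last year of protection for a renewed book (hypothetical helper)."""
    return pub_year + RENEWED_TERM

print(first_term_ends(1923))    # 1951
print(first_term_ends(1963))    # 1991
print(renewed_term_ends(1923))  # 2018
```

The last line matches the statement above that a renewed 1923 book was protected through 2018.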

For a long time, any book published before 1923 has surely been in the Public Domain and any book published after 1963 has positively been in copyright. Between those two dates though there is a more complex zone I'll call the Renewal Era. Of course, the lack of a renewal is not quite enough to say that something is no longer in copyright. As John Ockerbloom pointed out when I initially tweeted about these results, unrenewed books might "include previously published material still under copyright, or [have been] published abroad 1st & meet certain other URAA conditions."

Registrations and Renewals

Assuming, for simplicity's sake, that none of those considerations are relevant to a particular book, if it was published during the Renewal Era and not renewed, then it is in the public domain. To figure out if a book has been renewed, you turn to the Catalog of Copyright Entries, many dozens of thick volumes published every year until 1977 (after which the copyright records became electronic).

A large volume of the Catalog of Copyright Entries, several inches thick

The various volumes of the CCE contain registrations and renewals of every kind of copyrightable work including books, music, movies, artworks, and labels on commercial products. They are, in Greg Cram's words "one of the best records of American creativity." We're interested in all these things at the Library, but because books are relatively easy to digitize and use in digital form, we would like to know which ones are still in copyright and which aren't.

Since 2007, renewals have been in a searchable database at Stanford making it fairly simple to find books that have been renewed. Proving the negative, that something wasn't renewed, hasn't been as easy—typos, slight changes in titles, and other complications might cause a renewal search to fail. It has also been difficult to say what percentage of books have been renewed. Estimates, based on samples, have ranged from 7% to 33%.

While there is still plenty of work to do to clean up this data and understand some nuances of the entries, for the first time we have both ends of the copyright lifetime in a digital, ultimately searchable form for a full category of works, over a complete and continuous period of time. With the registrations now in digital form, not only do we have more information about the renewed books, we can also identify all those that do not have corresponding renewals.
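As a rough illustration of that last point, finding registrations without renewals amounts to comparing two sets of identifiers. The sketch below is an assumption about how one might do it, not a description of our actual pipeline: the function name is invented and the registration numbers are made-up placeholders. Given the typos mentioned above, a missing match only flags a candidate for closer inspection.

```python
def split_by_renewal(registration_ids, renewed_ids):
    """Separate registrations that have a matching renewal from those that do not."""
    renewed = set(renewed_ids)  # set membership keeps each lookup cheap
    with_renewal = [r for r in registration_ids if r in renewed]
    without_renewal = [r for r in registration_ids if r not in renewed]
    return with_renewal, without_renewal

# Placeholder identifiers, not real registration numbers.
registrations = ["A0000001", "A0000002", "A0000003"]
renewals = ["A0000002"]

matched, unmatched = split_by_renewal(registrations, renewals)
print(matched)    # ['A0000002']
print(unmatched)  # ['A0000001', 'A0000003']
```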

What's in the Data

We are publishing the data in two repositories:

The bulk of the effort has been to convert book registrations from 1923 to 1964 into XML format. This includes Part 1, Group 1 (1923–1946), Part 1A (1947–1953), and Part 1 (1953–1964) of the CCE. In addition, we have created a new version of the renewals in tab-delimited format (the same information found in the Stanford database, but parsed differently to work more accurately with the registrations).

The renewal data contains both halves of Part 1 (Groups 1 and 2, Parts 1A and 1B) as well as their combined versions for 1950-1977, parsed from a transcription made by Project Gutenberg. For the years 1978 on, there are renewals for all classes, taken from a version of the renewal records exported from the Copyright Office database and hosted by Google.

Beginning with July 1953, the "Book" volume is Part 1, "Books and Pamphlets, Including Serials and Contributions to Periodicals." Prior to this, pamphlets, serials, and contributions to serials (and sermons, lectures, and many other things) were published separately as Group 2 or Part 1B, which are not included in this data yet. For the first half of 1953, there are about 8,200 entries from 3rd series, volume 7, part 1A, number 1; for the second half of the same year, there are more than 20,000 entries because 3rd series, volume 7, part 1, number 2 included everything that previously would have been published separately in part 1B.

Books and Not-Books

Every registration is assigned to a class as indicated by the letter prefix of its registration number: "A" for books, "B" for serials, "D" for dramas, etc. This nominally corresponds to the division into volumes so we would expect all the "D"s to be in the "Dramatic Compositions" volume (Part 1, Group 3, later Part 3). In practice this is not the case—Eugene O'Neill's A Moon for the Misbegotten, for instance, is included in Part 1, Group 1 (1952; DP1117) along with a few hundred other class "D" registrations.
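Reading the class off a registration number can be sketched as follows. This is a minimal example under stated assumptions: the function name and the regular expression are mine, the class table lists only the three letters named above, and the pattern expects one or more letters followed by digits, so prefixes that mix letters and digits are not handled.

```python
import re

# Only the class letters named in this post; deliberately incomplete.
CLASS_NAMES = {"A": "books", "B": "serials", "D": "dramas"}

# Assumed shape: a run of letters, then a run of digits (e.g. DP1117).
REG_NUMBER = re.compile(r"^([A-Z]+)(\d+)$")

def split_registration_number(reg_number):
    """Return (prefix, class_letter, serial), or None if the shape is unexpected."""
    match = REG_NUMBER.match(reg_number.strip().upper())
    if match is None:
        return None
    prefix, serial = match.groups()
    return prefix, prefix[0], serial

print(split_registration_number("DP1117"))   # ('DP', 'D', '1117')
print(split_registration_number("A696442"))  # ('A', 'A', '696442')
print(CLASS_NAMES.get("D", "unknown"))       # dramas
```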

We might wonder why DP1117 wasn't published in Group 3 with the other "D"s or why, if it's more like a book somehow, it wasn't given an "A" number. That raises the question, though: are there any class "A" entries in Group 1 or Part 1A that someone might class as plays? I was able to find 100 entries that have "… a play in …" in the title, from Hilda; a play in four acts by Frances Guignard Gibbes (1923; A696442) to Seven devils from Magdala; a play in three acts.

Because of examples like this, I think it's fairly fruitless to try to determine what is a book or a "book proper" from the information in the CCE, so we have simply counted the contents of the volumes we have digitized. "A Moon for the Misbegotten" was renewed as were about 15% of the class 'D' entries in Group 1/Part 1A. The situation is worse with class "A" entries, where the not-very-well-held distinction between books (class "A") and non-books (classes such as "AA" and "A5") is partly erased after 1953. "AA" is done away with and presumably collapsed into "A". "A5" continued first as "B5" and then as "BB".

That said, inclusion in Group 1/Part 1A turns out to be a pretty good predictor of the kinds of things that tend to be renewed. If we look again at 1953, Part 1 Number 2 (the second half of the year, with the two groups combined) has 153% more entries than Part 1A Number 1 (20,811 vs. 8,217), but only about 30% more renewals (2,820 vs. 2,154). This implies something like a 5% renewal rate for Group 2/Part 1B entries. Many of those few renewed items may, in fact, be books. We recently learned that children's books, for instance, were routinely lumped in with "pamphlets."
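The implied rate can be checked with a little arithmetic. The sketch below rests on one loud assumption: that the second half of 1953 contained about as many Part 1A-type entries and renewals as the first half, so that everything extra in the combined volume is Group 2/Part 1B material. The function name is invented for this example.

```python
def implied_extra_rate(combined_entries, combined_renewed,
                       base_entries, base_renewed):
    """Renewal rate among the entries the combined volume adds over the baseline."""
    extra_entries = combined_entries - base_entries
    extra_renewed = combined_renewed - base_renewed
    return extra_renewed / extra_entries

# 1953: Part 1 Number 2 (combined) vs. Part 1A Number 1
rate = implied_extra_rate(20_811, 2_820, 8_217, 2_154)
print(f"{rate:.1%}")  # 5.3%
```

That is 666 additional renewals among 12,594 additional entries.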

Because of this change in the way the CCE was arranged, the count of renewals presented for 1953-63 must include some things that aren't "books". We also imagine some things that are "books" aren't counted for the years before 1953 because they are in Group 2/Part 1B, which we haven't converted yet. Also, because the count of unrenewed entries ("books" and "non-books") would be so much higher for 1953-63, I chose to estimate what would have been in Part 1A if the 1A/1B distinction had continued. Non-renewed entries are estimated at 2.7 times the number of renewals. This is based on two generalizations: everything renewed is a book (close to true) and the 27% average renewal rate for 1946-1952 held for 1953-1964.
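For readers who want to see the estimate spelled out: if 27% of entries are renewed, then the non-renewed entries are (1 - 0.27) / 0.27, or roughly 2.7, times the renewals. The sketch below applies that ratio with simple rounding. The function name and the rounding choice are assumptions of this example, though the results agree with the estimated column in Chart Data for the years shown.

```python
ASSUMED_RENEWAL_RATE = 0.27  # average renewal rate for 1946-1952

def estimate_not_renewed(renewed, rate=ASSUMED_RENEWAL_RATE):
    """Estimate non-renewed entries from a renewal count and an assumed rate."""
    return round(renewed * (1 - rate) / rate)

print(estimate_not_renewed(5160))  # 13951 (1953)
print(estimate_not_renewed(8740))  # 23630 (1963)
```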

The only class of entries excluded from the count is interim registrations (class "AI"), since they would be an obvious source of undercounting or double counting, depending on how renewals are matched to registrations. Ultimately, what we really want to be able to do is count copyrights rather than entries by grouping interim (AI) and foreign (AF) registrations together with corresponding A entries as a single entity. A handful of entries in each volume are very complicated to parse and have also been ignored for now. These tend to be things like dozens of issues of Bell System technical bulletins and aren't particularly interesting for this analysis.

Further Work

Two obvious tasks lie before us: correcting the data and adding more data. Beyond that, I'm sure many people would like to see an online interface for exploring the entries. Linking the data both internally (entry to entry) and to external identifiers would make it really useful in the library world.

Correcting the Data

The XML files for the completed volumes of the CCE amount to 687 MB of data, all of which has been scanned, OCRed, keyed, and tagged, so we expect that some errors were introduced at each step. We are focusing mostly on the accuracy of ID numbers so that registrations and renewals can be correctly paired; fortunately, there are things we can do to chase down many mistakes. For instance, within the new series or third series, registration numbers should be unique and duplicates can be investigated (the light printing of some pages makes 0's, 3's, 6's, and 8's especially difficult for OCR to distinguish). Frequently, the errors are typos in the CCE entries themselves.
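A duplicate check of this kind might look like the sketch below. It is a minimal illustration, not our actual tooling: both function names are invented, the sample list is made up for the example, and the second helper simply lists the numbers one confusable digit away from a given number so that a suspect duplicate can be compared against plausible intended values.

```python
from collections import Counter

CONFUSABLE = "0368"  # digits that light printing makes hard to tell apart

def find_duplicates(reg_numbers):
    """Return registration numbers that occur more than once (after normalizing)."""
    counts = Counter(n.strip().upper() for n in reg_numbers)
    return sorted(n for n, c in counts.items() if c > 1)

def confusable_variants(reg_number):
    """Numbers that differ from reg_number by one swapped confusable digit."""
    variants = set()
    for i, ch in enumerate(reg_number):
        if ch in CONFUSABLE:
            for alt in CONFUSABLE:
                if alt != ch:
                    variants.add(reg_number[:i] + alt + reg_number[i + 1:])
    return variants

# Made-up sample: the third item repeats the first with different case and spacing.
sample = ["A696442", "A696443", "a696442 ", "DP1117"]
print(find_duplicates(sample))                     # ['A696442']
print("A698442" in confusable_variants("A696442"))  # True
```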

Anyone who works with bibliographic data knows how difficult the many variations of authors' and publishers' names can be to deal with. Though the tagging of these fields is currently accurate enough to be very useful, this is probably the area most in need of correction. Even better would be to link authors and publishers to VIAF (Virtual International Authority File) and other identifiers.

We welcome correction from any source. If you think you have spotted an error, you can add an issue in the repository for registrations or renewals.

More Data

It is clear from the discussion above that, even if your interest is only books, the pre-1953 "pamphlet" volumes (Part 1 Group 2 and Part 1B) are still important. Beyond the books, the CCE covers every kind of creative endeavor and these volumes have a great deal of value as an historical record. Having a complete historical record, however, would mean converting not only the volumes for the years in which copyright is in question, but also the pre-1923 and post-1964 volumes. We are, at the moment, planning to do later volumes of Part 1, and would be happy to collaborate with anyone who wanted to take on any part of the CCE.

Linking Data

There are internal and external links that can be made. Links between registrations and renewals are explicit, but links between a registration and a previous interim registration, or to an original entry when new matter is being registered, are not always present.

Probably the most useful links would be between the registrations and equivalent records in other sources. Through 1937, the entries contain Library of Congress Control Numbers (LCCNs), which are a key to linking them to OCLC (Online Computer Library Center) records and Hathi Trust. It would be wonderful to have a way to make connections between these sources and entries from other years. Having an LCCN or OCLC number corresponding to a registration would make it easier to correctly link VIAF ids for authors and publishers, in order to make those searches more accurate.

Chart Data

Books are counted under the year of their registration rather than publication in the CCE. That is, a book with a 1950 registration date may be published in the 1950 volume of the CCE, but there is a good chance it appears in the 1951 volume, a smaller chance in the 1952 volume, and so on. Therefore, these numbers will not match the entry counts given in each printed volume since those are counts by publication rather than registration year.

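| Year | # Renewed | # Not Renewed | # Not Renewed (estimated) | Total | Percentage Renewed |
|------|-----------|---------------|---------------------------|-------|--------------------|
| 1923 | 1593 | 7198 | | 8791 | 18.12% |
| 1924 | 1633 | 7819 | | 9452 | 17.28% |
| 1925 | 1796 | 8869 | | 10665 | 16.84% |
| 1926 | 1955 | 9436 | | 11391 | 17.16% |
| 1927 | 2185 | 10413 | | 12598 | 17.34% |
| 1928 | 2384 | 11822 | | 14206 | 16.78% |
| 1929 | 2697 | 11161 | | 13858 | 19.46% |
| 1930 | 2559 | 11844 | | 14403 | 17.77% |
| 1931 | 2726 | 10761 | | 13487 | 20.21% |
| 1932 | 2677 | 9880 | | 12557 | 21.32% |
| 1933 | 2495 | 8925 | | 11420 | 21.85% |
| 1934 | 2666 | 9454 | | 12120 | 22.00% |
| 1935 | 2875 | 9691 | | 12566 | 22.88% |
| 1936 | 2989 | 9939 | | 12928 | 23.12% |
| 1937 | 3201 | 9674 | | 12875 | 24.86% |
| 1938 | 3242 | 10020 | | 13262 | 24.45% |
| 1939 | 3109 | 8990 | | 12099 | 25.70% |
| 1940 | 3374 | 9068 | | 12442 | 27.12% |
| 1941 | 3451 | 7353 | | 10804 | 31.94% |
| 1942 | 3229 | 5896 | | 9125 | 35.39% |
| 1943 | 2814 | 5198 | | 8012 | 35.12% |
| 1944 | 2585 | 4868 | | 7453 | 34.68% |
| 1945 | 2444 | 5971 | | 8415 | 29.04% |
| 1946 | 2954 | 8751 | | 11705 | 25.24% |
| 1947 | 3583 | 9788 | | 13371 | 26.80% |
| 1948 | 3544 | 8901 | | 12445 | 28.48% |
| 1949 | 3568 | 9930 | | 13498 | 26.43% |
| 1950 | 4257 | 11122 | | 15379 | 27.68% |
| 1951 | 4255 | 11167 | | 15422 | 27.59% |
| 1952 | 4138 | 11920 | | 16058 | 25.77% |
| 1953 | 5160 | | 13951 | 19111 | 27.00% |
| 1954 | 5915 | | 15992 | 21907 | 27.00% |
| 1955 | 5984 | | 16179 | 22163 | 27.00% |
| 1956 | 5925 | | 16019 | 21944 | 27.00% |
| 1957 | 6731 | | 18199 | 24930 | 27.00% |
| 1958 | 6787 | | 18350 | 25137 | 27.00% |
| 1959 | 7256 | | 19618 | 26874 | 27.00% |
| 1960 | 7420 | | 20061 | 27481 | 27.00% |
| 1961 | 7503 | | 20286 | 27789 | 27.00% |
| 1962 | 8017 | | 21676 | 29693 | 27.00% |
| 1963 | 8740 | | 23630 | 32370 | 27.00% |
| Total | 162416 | | | 642206 | 25.29% |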