The Hechinger Report is a national nonprofit newsroom that reports on one topic: education.

High school art teacher Karen Ladd and her colleagues meet in Concord, N.H., to compare artwork created by their students for tests and to try to find common ground in scoring it. Credit: Sarah Butrymowicz

KINGSTON, N.H. — It sounded like an ordinary assignment for a visual arts class: Teacher Karen Ladd asked her freshmen at Sanborn Regional High School to research an artist, create a piece of art inspired by the artist’s work and then write a reflection about the experience.

Dressed in tank tops and shorts that heralded the arrival of summer weather, some students studied the assignment while others listened to headphones as they browsed for artists online. One girl begged to be allowed to use Bob Ross as her inspiration; another searched determinedly for paintings of bowling to use.

But this was no ordinary class project: It was a test.

This spring, with a six-district pilot, New Hampshire joined a small but growing list of at least a half-dozen states experimenting with large-scale arts testing. Educators prefer to call the new exams assessments, because they’re so different in form and format from traditional standardized tests. The goal, though, is to create a common “test” — often in the form of a project — that can be given to students in different classrooms across the state and used to help compare the performance of schools and districts.

But coming up with a uniform and efficient way to measure a subject that’s all about creativity is difficult. In its arts tests, Florida has incorporated multiple-choice and short-answer questions that are easy to score efficiently. New Hampshire and Michigan are trying something more ambitious: devising tasks that require a student to submit a finished piece of artwork or perform a piece of music. These tests are time-intensive to administer and grade, however, and the results are difficult to translate into a single numeric score.

The push to find the best way to test the arts is coming from arts educators themselves in many instances. They hope to foster not only student improvement, but also a sense that the arts are as valuable to curriculum and society as such long-tested subjects as math and reading.

“It’s very important for arts to be seen as a subject that can be and should be tested,” said Frank Philip, an arts education assessment consultant. “It’s a parity thing.”

Research has shown that arts education can improve student achievement in reading and math, as well as increase critical-thinking skills and engage students in school. One study by the National Endowment for the Arts found that low-income students who take multiple arts classes are significantly more likely to enroll in a four-year college.

Yet access to arts education remains unequal. A federal survey released in 2012, for instance, revealed that roughly 95 percent of the highest-income high schools offered visual arts courses, while only 80 percent of the lowest-income ones did. The same was true of music courses.

Scott Shuler, a consultant for Solutions Music Group and former president of the National Association for Music Education, hopes that including the arts in state testing systems will highlight these inequities. Disparities in opportunities to learn math and English “pale in comparison to the disparities in the opportunity to learn the arts,” he said.

While arts educators want the arts to be given equal weight in the curriculum, they understand that the arts can’t be treated the same way when it comes to testing. Multiple-choice questions might demonstrate if a student knows the quadratic formula or the timeline of World War II, but they can’t measure whether a student knows how to draw with perspective or keep a steady rhythm.

That’s why, when the National Assessment of Educational Progress, or NAEP, included an arts test in 1997, it required students to produce real works of art in addition to answering standard multiple-choice questions. (That year’s test famously led to tractor-trailers full of student-created clay bunnies; since then, efforts have been made to digitize work in photos or videos.) NAEP gave arts tests again in 2008 and 2016, but some experts expressed concern that the tests had become too standardized by, for example, overrelying on multiple-choice questions and requiring students to respond to music rather than perform it.

Elementary school art teacher Justina Austin holds up a student self-portrait that earned a 3 while talking to elementary school music teachers about the method she and a fellow teacher devised for scoring the artwork on a scale of 1 to 4. Credit: Sarah Butrymowicz

New Hampshire has eschewed multiple-choice questions altogether in favor of open-ended tasks that require students to make or perform something. “You want to create a task that allows kids to demonstrate in their own way what they know and can do,” said Marcia McCaffrey, an arts education consultant for the New Hampshire Department of Education. “An assessment can also allow kids creativity if it’s designed in the right way.”

The arts are just one of many subjects for which the state is developing so-called performance assessments — tests that are really multistep assignments that require students to solve a problem or produce something. English, math and science performance tests may someday be a mandatory part of the state’s accountability system; arts assessments will likely remain voluntary.

McCaffrey and a group of 11 teachers from around the state spent months developing the music and visual arts tests, which evaluate students on a scale of 1 to 4. Shortly after school let out in June, the teachers met in Concord to share their students’ test work and begin the complicated process of reducing subjective impressions to a single number. As they discovered, that’s not so simple.

Elementary school art teachers Sarah Boudreau and Justina Austin commandeered a separate room and laid out about two dozen self-portraits drawn by their fourth-grade students. They needed to agree on a score of 1, 2, 3 or 4 for each piece, based on predetermined grading criteria, such as drawing skills and oil pastel blending technique.

“This one you thought could be a 1,” Boudreau said. “I thought 2.”

“I just thought the control was lacking,” Austin replied.

They paused over another piece that they had both awarded a 3, scoring guidelines in hand, and justified why they hadn’t marked the girl down despite the unrealistic placement of her eyes in the middle of her forehead. The scoring system calls for students to use “deliberate placement” — but who were Boudreau and Austin to say the choice was not an artistically deliberate one?

As the elementary school art teachers discussed whether they needed to tweak the grading system, music teachers in another classroom struggled to distill improvised student performances on the recorder into one of the four numbers. The guidelines called for rating the students on pitch, tone and rhythm.

They debated how much minor imperfections mattered. Could a student receive a 4 if they made any errors in tone, for instance? But the bigger issue for teacher Virginia Avery was how to combine those three separate measurements into an overall score. She worried not everyone would do that the same way.

“You’re just coming up with a number to fill a box, and that angers me,” she told her colleagues. “I don’t feel comfortable saying, ‘This kid is a 2.’ ”

This kind of resistance is understandable, said Timothy Brophy, director of institutional assessment and professor of music education at the University of Florida. That’s because even the best predetermined scoring systems might not capture everything they need to capture.

A possible solution to this problem, he says, is “consensus moderation,” in which a group of experts, such as practicing artists, get together to view an artwork or listen to or watch a performance. They discuss it and come to an agreement on a final grade. This process is more labor-intensive than scoring off a checklist, Brophy said, but more closely aligns with how artists really operate.

“We’re all pretty glad that Monet and Da Vinci didn’t go to a school that said, ‘You need to [paint] in this way to meet a rubric,’ ” he said.

Several teachers at the Concord meeting recognized this tension, but still felt enthusiastic about moving forward with the project. They’ll offer the revised assessments again this fall and once more meet to refine the scoring process before expanding the testing program to encompass more schools and students.

Ladd was one of four high school teachers who gave the assignment to create a work of art inspired by an artist. The four teachers easily agreed on small changes to the assignment: Students’ statements should be typed instead of handwritten; students should be required to submit their “inspiration” pieces so that scorers can tell how derivative their work is.

But the teachers debated whether to make more substantive changes. Ladd’s freshman students generally struggled more with the project than the older students taught by her peers did. High school seniors got stronger results overall, although plenty of students still received 1s and 2s for lack of creativity, poor craftsmanship or weak written statements.

Sarah Kiley, a teacher at Epping High School, argued that it was okay — even expected — for freshmen to score lower and for all students to be docked points if they fell short. Altering the task to get more passing scores wasn’t the answer. “We need that high level,” she said. “It’s going to be a challenge and they have to work for it.”

This story was produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.

Unlike most of our stories, this piece is an exclusive collaboration and may not be republished.
