May 31, 2017  
TJ 2016 Best Research Paper: Mining Big Data to improve pulping and papermaking operations

By Monica Shaw

TAPPI and the TAPPI Journal (TJ) Editorial Board have selected the 2016 TAPPI Journal Best Research Paper Award-winning paper: “Leveraging mill-wide big data sets for process and quality improvement in paperboard production,” authored by Jianzhong Fu and Peter W. Hart. The paper appeared on p. 309 of the May 2016 issue. The Editorial Board recognized this research for its innovation, creativity, scientific merit, and clear expression of ideas among the eight nominated papers.

In their paper, Fu and Hart described how Big Data analytics were used to develop predictive models and critical insights regarding a long-term paper indents problem on a 1990s-era machine at the MWV (now WestRock) mill in Covington, VA, USA. The issue resulted in significant amounts of internal rejects and production downtime to the point that the machine was averaging just 50 percent of its capacity.

The three-month Big Data project at Covington, led by Hart with seven other full-time team members, involved selecting 6,000 operating variables from mill process logs and 60,000+ PI data points from one of the data historians, amounting to 9 billion data points during an approximately three-year period. The data were then cleaned and classified, and software was used to develop decision trees for root cause analysis that led to specific projects for the pulp mill and paper machine that ultimately addressed the indents issue.
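The decision-tree approach described above can be illustrated with a minimal sketch. This is not the team's actual pipeline (they used GE Intelligent Platforms Csense Troubleshooter); it assumes scikit-learn and uses two made-up stand-in variables for the thousands of real process tags:

```python
# Hypothetical sketch of the workflow described above: label each time
# window "good" or "bad", fit a decision tree, and read off which
# variables best separate the two classes for root cause analysis.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins for two of the thousands of process variables.
washer_carryover = rng.normal(5.0, 1.0, n)
wet_end_ph = rng.normal(7.0, 0.3, n)
# "Bad" (indent-prone) periods arise only from a combination of
# conditions, not any single variable -- mirroring the paper's finding.
bad = (washer_carryover > 5.5) & (wet_end_ph < 6.9)

X = np.column_stack([washer_carryover, wet_end_ph])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, bad)
for name, imp in zip(["washer_carryover", "wet_end_ph"],
                     tree.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

Reading the fitted tree's splits (or feature importances) points the analyst toward the handful of variables, and combinations of variables, worth investigating on the mill floor.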

Fu is senior technical engineer for WestRock in Covington, and Hart is director of Fiber Science for the company. Hart accepted the TJ Best Research Paper Award from TAPPI President and CEO Larry N. Montague at the PaperCon Awards Dinner on April 25 in Minneapolis. TJ recently spoke with Hart about the Covington mill’s use of Big Data analytics for the indents issues and in the future.

Peter W. Hart (left) receives the 2016 TAPPI Journal Best Research Paper Award from TAPPI President and CEO Larry N. Montague at the 2017 PaperCon Awards Dinner.

TJ: How long did it take you to narrow down the variables from over 60,000 to under 8,000? About how much data did you pull from the PI server as compared with what you collected from the manual mill logs?

Hart: It took the team around three weeks to select the 6,000 variables, and probably 80% of it came from the PI system. We then programmed 15 laptops to pull all the data from PI into Microsoft Excel, which was a convenient way to load it into the GE Intelligent Platforms Csense Troubleshooter to model the decision trees for analysis.

Your research paper noted that you relied a great deal on the in-depth process knowledge of your team for the data analysis. Had the people on your team been at the mill for a while?

We had a full range of people. We had young process engineers who had been at the mill two to three years, and we brought back a few retired folks as consultants who had worked at the mill for over 30 years, along with my own industry experience as team lead. We also reviewed these steps with several other senior people within operations or who had extensive operational experience.

What was the biggest discovery in your use of Big Data analytics?

It was that no one particular thing caused the indents issue; it was always a combination of events, which is why we couldn’t diagnose the problem with traditional tools in the first place. To solve it, we had to open up some of the operational windows, like operating the bleach plant in a better manner.

Did the bleach plant see the most changes after your analysis?

Of all the tweaks we made, the bleach plant was one of the biggest, because the pulp mill was much more responsive to change than the paper mill was. There were some issues around the wet end of the paper machine that were fixed, but the majority were in the bleach plant and approach flow systems, such as giving operators a better window to respond to brownstock washer carryover. Still, there were days when the bleach plant was running well but the paper machine would be a bit off and the problems would occur even though pulp quality was good. All of these changes resulted in providing the machine with a larger operational window to be successful.

You mentioned that the analysis and manipulation of Big Data sets was very difficult. Was sifting through the variables the worst part?

Cleaning up bad data and identifying good and bad timeframes that met our established criteria were quite difficult. For instance, if there was downtime or a sensor went bad, the variables were recorded as zeroes, throwing off the global average. The PI system provided data from every minute of the three years examined in the project, and there were many data points like these that you couldn’t just randomly throw out.
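The effect of those zero-recorded values can be sketched in a few lines. The readings below are made-up minute data, not mill figures:

```python
# Hypothetical sketch of the cleanup step described above: values
# logged as zero during downtime or sensor failure must be excluded
# before averaging, or they drag the mean far below reality.
readings = [812.0, 805.0, 0.0, 0.0, 798.0, 0.0, 820.0]  # made-up minute data

naive_mean = sum(readings) / len(readings)
valid = [r for r in readings if r > 0.0]  # drop downtime/sensor zeroes
clean_mean = sum(valid) / len(valid)

print(round(naive_mean, 1))   # 462.1 -- skewed by the zeroes
print(round(clean_mean, 1))   # 808.8
```

In practice the filter cannot simply discard every zero, as Hart notes; each excluded point has to be justified against the established criteria.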

We also had misunderstandings with less experienced team members doing the manual sifting work who weren’t exactly sure what we were looking for. When we identified time periods of good versus bad data, our criterion required that there be no downtime in the 24 hours before the beginning of the data set. On the first pass of generating this data, we didn't always get that level of detail in what was declared a good period of time.
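That screening rule is simple to state precisely in code, which is one way to avoid the misunderstandings Hart describes. A minimal sketch, with invented timestamps:

```python
# Hypothetical sketch of the screening rule described above: a candidate
# "good" window qualifies only if no downtime event occurred in the
# 24 hours before it starts.
from datetime import datetime, timedelta

# Made-up downtime event log.
downtime_events = [datetime(2014, 3, 1, 6, 0), datetime(2014, 3, 5, 14, 30)]

def is_good_window(start, events, lookback=timedelta(hours=24)):
    """Reject the window if any downtime falls in the lookback before start."""
    return not any(start - lookback <= e < start for e in events)

print(is_good_window(datetime(2014, 3, 1, 20, 0), downtime_events))  # False
print(is_good_window(datetime(2014, 3, 3, 8, 0), downtime_events))   # True
```

Encoding the criterion as a function means every pass over the data applies it identically, rather than relying on each team member's interpretation.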

What could have helped with the analysis and manipulation of the big data sets?

This was one of our first forays into Big Data analysis. Since that time, the software has come a long way. We've been looking at alternative software packages that will allow us to substantially improve our capability for data manipulation.

Right now, we are in a four-month beta test of new software that sits on top of your PI system and allows you to pull data directly from PI and manipulate it. We actually think we could have done the Covington project in three to four weeks instead of three months if we’d had this newer, more powerful software, which is friendly enough for mill process engineers to use on a routine basis. If this beta test goes as we think it will, we’ll consider it for all 13 of our chemical pulp mills.

My basic thought is we have absolute gold mines worth of information locked in our mill data historians, but we just don't know how to routinely turn the key and release that wealth of information so that it’s useful. We can't repeat the Covington process where it takes a dedicated team of eight a full quarter of the year to solve a single problem. We've got to be able to mine this data much faster. What this Big Data project did prove is that there’s tremendous value in the data for problem solving, but we just need better software to manipulate and understand it.

Monica Shaw is editorial director of TAPPI Journal, an online publication of relevant and timely peer-reviewed research delivered via email and free to all TAPPI members. For information on publishing your research in TAPPI Journal, please visit our submissions page or contact Shaw at

