Quantifying CO2 savings from wind power redux: Ireland 2012

The Sustainable Energy Authority of Ireland (SEAI) have a new report which looks at CO_2 emission savings from wind energy in 2012. The electricity grid is simulated using half-hourly system demand, known outages, and inter-connector flows as boundary conditions. One simulation uses actual wind generation in 2012 while another sets wind generation to zero. The difference in total CO_2 between the two runs corresponds to the emissions savings. The SEAI study was carried out using off-the-shelf PLEXOS dispatch modelling software.

Here is a summary of SEAI’s findings (Table 4 p 31 of the report):

[Table not reproduced: SEAI summary of CO_2 savings, Table 4 of the report]

The savings figure for RoI is far lower than SEAI’s earlier number of 0.49 tCO_2/MWh. In terms of “effectiveness”, 1 MWh of wind generation displaces the CO_2 equivalent of 0.65 MWh of average thermal generation. The earlier SEAI number corresponds to approximately one-for-one displacement. So this is a big change.
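The “effectiveness” ratio can be sketched as savings per MWh of wind divided by the average emissions intensity of thermal generation. The function name and the thermal intensity below are my assumptions for illustration: the intensity is simply set to the earlier 0.49 tCO_2/MWh, which is what one-for-one displacement would imply. The implied savings value is back-of-envelope arithmetic, not a figure taken from the SEAI report.

```python
def effectiveness(savings_t_per_mwh, thermal_intensity_t_per_mwh):
    """Fraction of average thermal CO2 displaced per MWh of wind."""
    if thermal_intensity_t_per_mwh <= 0:
        raise ValueError("thermal intensity must be positive")
    return savings_t_per_mwh / thermal_intensity_t_per_mwh


# Assumption: average thermal intensity ~ 0.49 tCO2/MWh, i.e. the earlier
# SEAI savings number treated as one-for-one displacement.
ASSUMED_THERMAL_INTENSITY = 0.49

# Earlier number: one-for-one by construction under the assumption above.
earlier = effectiveness(0.49, ASSUMED_THERMAL_INTENSITY)

# Savings implied by an effectiveness of 0.65 under the same assumption.
implied_savings = 0.65 * ASSUMED_THERMAL_INTENSITY

print(f"earlier effectiveness: {earlier:.2f}")
print(f"savings implied by 0.65 effectiveness: {implied_savings:.2f} tCO2/MWh")
```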

SEAI’s simulation findings can be compared to results based on empirical estimates. This method (described here and here) uses the commercially metered generation data from SEMO to calculate grid CO_2 emissions. An ARMA model then relates these emissions to wind generation and other variables. Very good fits to the empirical CO_2 time-series can be obtained.^*
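To give a flavour of the empirical approach, here is a minimal sketch on synthetic data. It is a simplification: ordinary least squares stands in for the ARMA model, and the demand, wind and CO_2 series, together with every coefficient, are invented for illustration rather than taken from SEMO data. The coefficient on wind plays the role of the CO_2 savings per unit of wind generation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # number of half-hourly periods (synthetic)

# Synthetic inputs; the numbers are invented for illustration only.
demand = 3000 + 500 * rng.standard_normal(n)
wind = np.clip(600 + 300 * rng.standard_normal(n), 0, None)

# AR(1) noise mimics the serial correlation that an ARMA error term
# would capture; plain OLS below ignores it.
noise = np.zeros(n)
eps = 20 * rng.standard_normal(n)
for t in range(1, n):
    noise[t] = 0.8 * noise[t - 1] + eps[t]

TRUE_WIND_COEF = -0.3   # hypothetical CO2 change per unit of wind
TRUE_DEMAND_COEF = 0.5  # hypothetical CO2 change per unit of demand
co2 = TRUE_DEMAND_COEF * demand + TRUE_WIND_COEF * wind + noise

# Design matrix: intercept, demand, wind.
X = np.column_stack([np.ones(n), demand, wind])
coef, *_ = np.linalg.lstsq(X, co2, rcond=None)
wind_coef = coef[2]

print(f"estimated CO2 change per unit of wind: {wind_coef:.3f}")
```

A proper treatment would model the autocorrelated errors explicitly, which mainly affects the uncertainty attached to the estimate.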

Here are findings using the empirical method for 2011:

[Table not reproduced: empirical-method findings for 2011]

The NI CO_2 savings number is much lower than found by SEAI. In fact, SEAI’s 2012 savings of -0.8 tCO_2/MWh is hard to understand, because only about one third of NI generation came from coal.

Simulation and empirical approaches each have advantages and disadvantages. Both are sensitive to imperfections in the wind generation/system demand dataset. However, SEAI make a number of claims about the 2011 empirical method results which seem to me to be wrong. Firstly, system constraints and outages are automatically taken into account in the empirical method. Secondly, despite the absence of pumped storage and reduced inter-connector flow in 2011, emissions intensity was lower than in 2012. This is because 2012 fuel prices favoured coal relative to gas. If anything, the grid was less flexible in 2012.

^* System demand, tie-in flows between RoI/NI grids and Moyle interconnector flow are included as additional regression variables.


June 24, 2014 · joe · No Comments
Posted in: Uncategorized, Wind Energy

random forest or gradient boosting?

Random forest and gradient boosting are leading data mining techniques. They are designed to improve upon the poor predictive accuracy of single decision trees. Random forest is by far the more popular, if the Google Trends chart below is anything to go by.

Correlation between predictors is the data miners’ bugbear. It is an inevitable fact of life in many situations. Multicollinearity can lead to misleading conclusions and degrade predictive power. A natural question is: Which approach handles multicollinearity better? Random forest or gradient boosting?

Suppose there are n observations \{y\} and p potential predictors \{x_1, \ldots, x_p\}. Assume that

(A)   \[ {y = x_1 +x_2 + \sigma\mathcal{N}} \]

where \sigma is the amplitude of Gaussian noise \mathcal{N} (mean zero and unit variance). Only 2 of the p potential predictors actually play a role in generating the observations. The \{x_1, \ldots, x_p\} are independently distributed (\mathcal{N}), with the exception of x_3, which is correlated with x_1 (correlation \rho).

(B)   \[ x_3 =  \rho x_1 + \sqrt{1-\rho^2} \mathcal{N} \]

As the correlation \rho increases, it becomes harder for a data mining algorithm to ignore x_3, even though x_3 is not present in (A) and is not a “true” explanatory variable.
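The setup in (A) and (B) can be simulated in a few lines. This is a Python sketch rather than the R code linked at the end of the post, and the default values of n, p, \rho and \sigma are my assumptions, since none are fixed in the text above.

```python
import numpy as np


def simulate(n=1000, p=10, rho=0.7, sigma=1.0, seed=0):
    """Draw predictors and response following equations (A) and (B)."""
    rng = np.random.default_rng(seed)

    # All predictors start as independent standard normals.
    X = rng.standard_normal((n, p))

    # (B): x_3 = rho * x_1 + sqrt(1 - rho^2) * N. Columns are 0-indexed,
    # so x_1 is column 0 and x_3 is column 2.
    X[:, 2] = rho * X[:, 0] + np.sqrt(1 - rho**2) * rng.standard_normal(n)

    # (A): y = x_1 + x_2 + sigma * N, with an independent noise draw.
    y = X[:, 0] + X[:, 1] + sigma * rng.standard_normal(n)
    return X, y


X, y = simulate()
corr_13 = np.corrcoef(X[:, 0], X[:, 2])[0, 1]
print(f"sample correlation between x_1 and x_3: {corr_13:.2f}")
```

The \sqrt{1-\rho^2} factor in (B) keeps x_3 at unit variance, so it stays on the same scale as the other predictors whatever the value of \rho.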



Variable importance charts for this class of problem show that gradient boosting does a better job of handling multicollinearity than random forest. The complex trees used by random forest tend to spread variable importance more widely, particularly to variables which are correlated with the “true” predictors. The simpler base learner trees of gradient boosting (4 terminal nodes in the above example) seem to have greater immunity from the evils of multicollinearity.
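For anyone wanting to try this kind of comparison, a rough sketch using scikit-learn follows. It is not the R code behind the charts: the sample size, number of predictors, \rho, \sigma and number of trees are all assumptions, and scikit-learn's impurity-based `feature_importances_` may differ from the importance measure used in the original analysis. The boosting trees are limited to 4 terminal nodes via `max_leaf_nodes=4`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
n, p, rho, sigma = 500, 10, 0.7, 1.0  # illustrative values, not from the post

# Data following (A) and (B); columns are 0-indexed, so x_3 is column 2.
X = rng.standard_normal((n, p))
X[:, 2] = rho * X[:, 0] + np.sqrt(1 - rho**2) * rng.standard_normal(n)
y = X[:, 0] + X[:, 1] + sigma * rng.standard_normal(n)

# Random forest with fully grown trees (library defaults).
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Gradient boosting with small base learners (4 terminal nodes each).
gb = GradientBoostingRegressor(
    n_estimators=100, max_leaf_nodes=4, random_state=0
).fit(X, y)

# Impurity-based importances, normalised to sum to 1 within each model.
for name, model in [("random forest", rf), ("gradient boosting", gb)]:
    imp = model.feature_importances_
    print(f"{name}: x_1={imp[0]:.3f}  x_2={imp[1]:.3f}  x_3={imp[2]:.3f}")
```

The quantity to watch is the importance assigned to x_3, and how it changes as \rho is varied.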

Random forest is an excellent data mining technique, but its greater popularity compared to boosting seems unjustified.

R code



February 11, 2014 · joe · No Comments
Posted in: Uncategorized