The Sustainable Energy Authority of Ireland (SEAI) has a new report which looks at CO2 emission savings from wind energy in 2012. The electricity grid is simulated using half-hourly system demand, known outages, and interconnector flows as boundary conditions. One simulation uses actual wind generation in 2012 while another sets wind generation to zero. The difference in total CO2 corresponds to the emissions savings. The SEAI study was carried out using off-the-shelf PLEXOS dispatch modelling software.
Here is a summary of SEAI’s findings (Table 4, p. 31 of the report):
The savings for RoI are far lower than SEAI’s earlier figure of 0.49 tCO2/MWh. In terms of “effectiveness”, 1 MWh of wind generation displaces the CO2 equivalent of 0.65 MWh of average thermal generation. The earlier SEAI number corresponds to approximately one-for-one displacement, so this is a big change.
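The “effectiveness” ratio is just the wind savings intensity divided by the average CO2 intensity of thermal generation. A minimal sketch, using illustrative numbers (the 0.32 figure is an assumption consistent with the 0.65 ratio, not a value quoted from the report):

```python
# Displacement "effectiveness": wind's CO2 savings per MWh of wind generation,
# divided by the average CO2 intensity of thermal generation.
avg_thermal_intensity = 0.49   # tCO2/MWh, consistent with one-for-one displacement
wind_savings = 0.32            # tCO2/MWh, illustrative value for the new report

effectiveness = wind_savings / avg_thermal_intensity
print(f"1 MWh of wind displaces the CO2 of {effectiveness:.2f} MWh of thermal generation")
```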
SEAI’s simulation findings can be compared with results based on empirical estimates. This method (described here and here) uses the commercially metered generation data from SEMO to calculate grid CO2 emissions. An ARMA model then relates these emissions to wind generation and other variables. Very good fits to the empirical CO2 time series can be obtained.
Here are findings using the empirical method for 2011:
The NI CO2 savings number is much lower than that found by SEAI. In fact, SEAI’s 2012 savings of 0.8 tCO2/MWh is hard to understand, because only a small share of NI generation came from coal.
The simulation and empirical approaches each have advantages and disadvantages. Both are sensitive to imperfections in the wind generation/system demand dataset. However, SEAI makes a number of claims about the 2011 empirical method results which seem to me to be wrong. Firstly, system constraints and outages are automatically taken into account in the empirical method. Secondly, despite the absence of pumped storage and reduced interconnector flow in 2011, emissions intensity was lower compared to 2012. This is because 2012 fuel prices favoured coal relative to gas. If anything, the grid was less flexible in 2012.
System demand, tie-line flows between the RoI and NI grids, and Moyle interconnector flow are included as additional regression variables.
Random forest and gradient boosting are leading data mining techniques, both designed to improve upon the poor predictive accuracy of decision trees. Random forest is by far the more popular, if the Google Trends chart below is anything to go by.
Correlation between predictors is the data miner’s bugbear, and an inevitable fact of life in many situations. Multicollinearity can lead to misleading conclusions and degrade predictive power. A natural question is: which approach handles multicollinearity better, random forest or gradient boosting?
Suppose there are n observations y and p potential predictors x1, …, xp. Assume that

y = x1 + x2 + σε    (A)

where σ is the amplitude of the Gaussian noise ε (mean zero and unit variance). Only 2 of the p potential predictors actually play a role in generating the observations. The xj are independently distributed (standard normal), with the exception of x3, which is correlated with x1 (correlation ρ).

As the correlation ρ increases, it becomes harder for a data mining algorithm to ignore x3, even though x3 is not present in (A) and is not a “true” explanatory variable.
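The experiment can be sketched with scikit-learn. The specific settings (p = 10, σ = 1, ρ = 0.9, tree counts) are assumptions for illustration, and x3 is taken as the correlated spurious predictor:

```python
# Synthetic multicollinearity experiment: y depends only on x1 and x2, but
# x3 is correlated with x1. Compare how much importance each method
# "leaks" to the spurious predictor x3.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(42)
n, p, rho, sigma = 2000, 10, 0.9, 1.0

X = rng.standard_normal((n, p))
# Make x3 (column index 2) correlated with x1, keeping unit variance
X[:, 2] = rho * X[:, 0] + np.sqrt(1 - rho**2) * X[:, 2]
y = X[:, 0] + X[:, 1] + sigma * rng.standard_normal(n)

# Random feature subsets per split, in the classic random-forest style
rf = RandomForestRegressor(n_estimators=200, max_features=0.3,
                           random_state=0).fit(X, y)
# Shallow base learners: 4 terminal nodes, as in the example in the text
gbm = GradientBoostingRegressor(n_estimators=200, max_leaf_nodes=4,
                                random_state=0).fit(X, y)

print("RF  importance of x3:", rf.feature_importances_[2])
print("GBM importance of x3:", gbm.feature_importances_[2])
```

With settings like these, random forest typically assigns noticeably more importance to x3 than gradient boosting does, illustrating the point below.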
Variable importance charts for this class of problem show that gradient boosting does a better job of handling multicollinearity than random forest. The complex trees used by random forest tend to spread variable importance more widely, particularly to variables which are correlated with the “true” predictors. The simpler base-learner trees of gradient boosting (4 terminal nodes in the above example) seem to have greater immunity from the evils of multicollinearity.
Random forest is an excellent data mining technique, but its greater popularity compared to boosting seems unjustified.