
biddca22/morocco-smart-meter-analytics


Finding Patterns and Anomalies in Real Data

Abstract

This project analyses high-frequency electricity consumption data from four Moroccan cities to identify typical demand patterns, structural changes, and anomalies. Using unsupervised clustering, dimensionality reduction, change-point detection, and an ensemble of anomaly detectors, it identifies distinct residential, commercial, and industrial load profiles. The results show pronounced spatial and temporal heterogeneity, providing useful information for energy planning and grid operation.

1 Introduction

Understanding electricity consumption behaviour is an important step towards better energy planning, infrastructure management, and demand forecasting. Morocco, for instance, has seen rapid urbanisation alongside a growing mix of housing, businesses, and industrial activity, making electricity demand increasingly varied and unpredictable across cities and zones. These patterns vary not only spatially, between different areas of the same city, but also temporally, as energy usage evolves in response to seasonal effects, socioeconomic activity, and infrastructure upgrades. Capturing this variability requires data-driven methods that can uncover relevant structure in large-scale, granular electricity consumption data.

The main goal of this project is to characterise electricity consumption patterns in both domestic and industrial zones across Morocco. In particular, it asks whether distinct daily load profiles can be identified, such as continuous industrial demand or household patterns marked by evening peaks. It also examines geographic heterogeneity by assessing whether zones within the same city show comparable consumption behaviour or form distinct subgroups. Finally, it aims to capture temporal trends by detecting meaningful changes in consumption over time, including seasonal variations and sudden shifts that may result from infrastructure improvements or operational adjustments.

To address these objectives, the analysis relies on machine learning techniques. To classify daily load profiles, we used an unsupervised framework comparing K-Means, hierarchical clustering, and Gaussian mixture models (GMM). Because the data are high-dimensional, we used both linear (PCA) and nonlinear (t-SNE) projections to visualise their structure. We then applied the PELT algorithm to detect temporal regime shifts and structural volatility. Finally, we built a combined anomaly detection system that merges isolation forest, Mahalanobis distance, and local outlier factor through consensus voting, in order to pinpoint and analyse irregular consumption events.

2 Data Description

The electricity consumption data used in this project come from smart meters deployed in four Moroccan cities: Laayoune, Boujdour, Foum Eloued, and Marrakech. The datasets were provided as separate sheets of a single data file and differ in time coverage, temporal resolution, and measurement units. Laayoune, Boujdour, and Foum Eloued each span nearly 20 months, from mid-September 2022 to late May 2024, whereas the Marrakech dataset covers 12 months, from January 2023 to January 2024. Despite this mismatch, the overlapping 2023 period allows reliable cross-city comparisons. After loading, the datasets contained 88,890 raw observations per city for Laayoune, Boujdour, and Foum Eloued, and 17,501 observations for Marrakech.

To ensure temporal consistency across all four cities, we resampled every series to a 30-minute resolution. Because Laayoune, Boujdour, and Foum Eloued were recorded at a finer granularity, their data were resampled to match Marrakech, yielding 29,631 half-hourly observations per city on the common 30-minute grid. Start and end timestamps were checked to confirm correct time synchronisation. A few data quality issues were also fixed: a small number of missing values were found in Laayoune (20), Boujdour (12), and Foum Eloued (28), a negligible fraction of the more than 29,000 data points in each set. These gaps were filled by interpolation, a reasonable choice given their scarcity and the continuous nature of power usage data. Marrakech had no missing values.
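
A minimal pandas sketch of this preprocessing step follows. It assumes each raw series is a `pd.Series` with a `DatetimeIndex`; averaging the readings within each 30-minute bin and filling gaps by time-weighted linear interpolation are assumptions, since the exact aggregation and interpolation settings are not documented. The ratio 88,890 / 29,631 ≈ 3 suggests 10-minute raw readings, which the synthetic example mimics. `to_half_hourly` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def to_half_hourly(raw: pd.Series):
    """Resample a finer-grained meter series to 30 minutes and fill the (rare) gaps.

    Hypothetical helper: averaging within each bin is an assumption that suits
    current (A) or power (kW) readings, not cumulative energy counters.
    """
    half_hourly = raw.sort_index().resample("30min").mean()
    n_missing = int(half_hourly.isna().sum())  # bins with no valid reading at all
    # Time-weighted linear interpolation; only sensible because gaps are short and rare
    filled = half_hourly.interpolate(method="time", limit_direction="both")
    return filled, n_missing

# Two days of synthetic 10-minute readings with a one-hour logger outage
idx = pd.date_range("2023-01-01", periods=6 * 24 * 2, freq="10min")
raw = pd.Series(5.0 + np.sin(np.arange(len(idx)) / 20.0), index=idx)
raw.iloc[30:36] = np.nan
series, n_missing = to_half_hourly(raw)
print(len(series), n_missing)  # 96 2
```

Resampling before interpolating means the missing-value count refers to empty 30-minute bins, which matches how the gaps are reported above for the half-hourly series.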

Apart from temporal alignment, the datasets also differ in measurement units: consumption is recorded in kilowatts (kW) for Marrakech and in amperes (A) for the other cities. To handle this heterogeneity and allow joint analysis, we applied Z-score normalisation. This lets the subsequent clustering focus on the shape of consumption patterns (such as morning peaks or night-time drops) rather than on absolute magnitudes.

[Figure: Z-score normalised load curves for the four cities]

The visualisation of scaled load curves above highlights this effect: although they come from different measurement units and show distinct raw scales, the standardised profiles fall within similar numerical ranges and reveal comparable time-based patterns. This preprocessing step creates a consistent and reliable foundation for the use of machine learning techniques throughout this project.
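
The normalisation can be sketched as below, assuming each zone's series is standardised separately; the text does not say whether the mean and standard deviation are computed per zone, per city, or per period, and the column names are hypothetical. Because the transform removes location and scale, two series that differ only by a positive scale factor and an offset end up identical after standardisation.

```python
import numpy as np
import pandas as pd

def zscore_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Standardise each column (one zone's series) to mean 0 and standard deviation 1."""
    # Population standard deviation (ddof=0); the choice of ddof is an assumption
    return (df - df.mean()) / df.std(ddof=0)

# Hypothetical columns: a current series in A and a power series in kW with the same shape
df = pd.DataFrame({
    "laayoune_zone1_A": [120.0, 95.0, 180.0, 210.0],
    "marrakech_zone1_kW": [12.0, 9.5, 18.0, 21.0],
})
z = zscore_columns(df)
print(z.round(3))
```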

3 Methodology

To transfer the processed data structures efficiently from the data preparation notebook to the subsequent analysis notebooks, we serialised them with Pickle. Finding the best way to classify consumption patterns required a multi-method strategy rather than reliance on a single algorithm.
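
The clustering and anomaly steps operate on one 48-value vector per day. The sketch below builds that matrix from a 30-minute series and passes it between notebooks with Python's standard `pickle` module. The reshaping approach, the helper name `daily_profiles`, and the file name are assumptions rather than the project's documented code.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

def daily_profiles(series: pd.Series) -> pd.DataFrame:
    """One row per day, one column per half-hour slot (0 = 00:00, 47 = 23:30)."""
    frame = series.to_frame("value")
    frame["date"] = frame.index.normalize()
    frame["slot"] = frame.index.hour * 2 + frame.index.minute // 30
    wide = frame.pivot(index="date", columns="slot", values="value")
    return wide.dropna()  # keep only days with all 48 slots present

idx = pd.date_range("2023-01-01", periods=48 * 3, freq="30min")
profiles = daily_profiles(pd.Series(np.arange(len(idx), dtype=float), index=idx))

# Hypothetical artefact name: serialise in the preparation notebook ...
path = os.path.join(tempfile.gettempdir(), "daily_profiles.pkl")
with open(path, "wb") as fh:
    pickle.dump({"profiles": profiles}, fh)
# ... and load it back in an analysis notebook
with open(path, "rb") as fh:
    restored = pickle.load(fh)
print(restored["profiles"].shape)  # (3, 48)
```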

Within the unsupervised learning framework, we used K-Means, hierarchical clustering, and Gaussian mixture models. To select the number of clusters (K), we combined the elbow method (inertia), the silhouette score (which measures cohesion and separation), and the Calinski-Harabasz index (analogous to an F-statistic: the ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom). To check the hierarchical structure, we generated a dendrogram using Ward's method, which at each step merges the pair of clusters that produces the smallest increase in total within-cluster variance. As a soft alternative to hard clustering, we fitted GMMs, which model the data probabilistically, and used the Bayesian information criterion (BIC) to penalise model complexity and choose the most parsimonious model.
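
The sketch below computes these diagnostics together with scikit-learn. For reference, with $n$ samples and $K$ clusters, $\mathrm{CH} = \dfrac{\operatorname{tr}(B_K)/(K-1)}{\operatorname{tr}(W_K)/(n-K)}$, where $B_K$ and $W_K$ are the between- and within-cluster dispersion matrices. The data are synthetic: four made-up prototype shapes plus noise. The silhouette and Calinski-Harabasz scores use the K-Means labels, and the GMMs use full covariance matrices. These settings, and the helper name `selection_table`, are assumptions; the project's actual `n_init`, covariance type, and range of K are not documented here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, silhouette_score
from sklearn.mixture import GaussianMixture

def selection_table(X, k_values=range(2, 9), seed=0):
    """Diagnostics for choosing K on an (n_days, 48) matrix of standardised profiles."""
    rows = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        gmm = GaussianMixture(n_components=k, covariance_type="full", random_state=seed).fit(X)
        rows.append({
            "k": k,
            "inertia": km.inertia_,                                       # elbow method
            "silhouette": silhouette_score(X, km.labels_),                # higher is better
            "calinski_harabasz": calinski_harabasz_score(X, km.labels_),  # higher is better
            "gmm_bic": gmm.bic(X),                                        # lower is better
        })
    return rows

# Synthetic stand-in for daily profiles: four prototype shapes plus noise
rng = np.random.default_rng(0)
t = np.arange(48)
prototypes = np.vstack([
    np.full(48, 1.5),                                   # flat, high
    np.full(48, -1.0),                                  # flat, low
    2.5 * np.exp(-0.5 * ((t - 44) / 3.0) ** 2) - 0.5,   # evening peak (slot 44 = 22:00)
    2.0 * np.exp(-0.5 * ((t - 30) / 6.0) ** 2) - 0.3,   # daytime hump
])
X = np.vstack([p + rng.normal(0.0, 0.2, size=(150, 48)) for p in prototypes])
table = selection_table(X, k_values=range(2, 7))
for row in table:
    print({key: round(float(val), 2) for key, val in row.items()})
```

Each metric answers a different question, which is why they can disagree (as they do for K = 2 versus K = 4 in the results below).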

To validate our clusters and interpret high-dimensional patterns, we used both linear and nonlinear dimensionality reduction. We applied PCA to project the daily profiles into a lower-dimensional space, using the scree plot to assess how much variance is retained and the factor loadings to interpret what the principal axes mean physically (baseline consumption vs. day-night variation). To compensate for the limitations of linear projections, we added t-SNE, which preserves local neighbourhoods and can reveal nonlinear structure.
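
The projections can be sketched with scikit-learn as below. For illustration, the synthetic data are constructed so that a baseline-level factor and a day-night contrast factor dominate, mirroring the interpretation of PC1 and PC2 in the text. On such data, PC1 loads with the same sign on every half-hour slot, while PC2 loads with opposite signs on daytime and night-time slots. The perplexity and initialisation shown are common choices, not the project's documented settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def project_profiles(X, seed=0):
    """PCA scree/loadings plus 2-D PCA and t-SNE coordinates for an (n_days, 48) matrix."""
    pca = PCA().fit(X)                      # keep all components for the scree plot
    scree = pca.explained_variance_ratio_   # variance share of each component
    loadings = pca.components_[:2]          # weight of each half-hour slot on PC1 and PC2
    pcs = pca.transform(X)[:, :2]
    tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=seed)
    embedding = tsne.fit_transform(X)
    return scree, loadings, pcs, embedding

# Synthetic profiles built from a daily level (baseline) and a day-night contrast
rng = np.random.default_rng(0)
slots = np.arange(48)
daytime = np.where((slots >= 14) & (slots < 38), 1.0, -1.0)  # 07:00-19:00 vs. night
level = rng.normal(0.0, 2.0, size=(300, 1))
contrast = rng.normal(0.0, 1.0, size=(300, 1))
X = level * np.ones(48) + contrast * daytime + rng.normal(0.0, 0.1, size=(300, 48))
scree, loadings, pcs, embedding = project_profiles(X)
print(np.round(scree[:3], 3))
```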

This was complemented by time-series techniques, in particular change-point detection, aimed at finding structural instability and regime changes. To manage the computational cost and the high-frequency noise of 30-minute sampling, we first aggregated the data into daily mean series. This step acts as a low-pass filter, suppressing intraday variability to isolate long-term trends. For the core segmentation, we used the Pruned Exact Linear Time (PELT) algorithm from the ruptures library with a Radial Basis Function (RBF) kernel cost. Unlike a least-squares (L2) cost, which only detects shifts in the mean, the RBF kernel cost can detect general changes in the distribution of the signal without assuming a parametric probability density, which suits the complex shifts found in electricity usage.
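
To make the kernel cost concrete, here is a self-contained NumPy sketch of exact PELT with an RBF kernel cost. It is not the ruptures implementation; with ruptures, the comparable call would be along the lines of `rpt.Pelt(model="rbf").fit(signal).predict(pen=7)`. The sketch assumes the input is the daily mean series (for example `series.resample("D").mean()` in pandas). It also assumes a minimum segment length of one observation and sets the kernel bandwidth with the median heuristic. The segment cost is the within-segment scatter in the kernel feature space, $C(s,e) = \sum_{i=s}^{e-1} k(x_i,x_i) - \frac{1}{e-s}\sum_{i=s}^{e-1}\sum_{j=s}^{e-1} k(x_i,x_j)$, so any change in the distribution, not only in the mean, increases it.

```python
import numpy as np

def rbf_gram(signal, gamma=None):
    """RBF Gram matrix of a 1-D or 2-D signal; gamma defaults to the median heuristic."""
    x = np.asarray(signal, dtype=float).reshape(len(signal), -1)
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    if gamma is None:
        med = np.median(sq[np.triu_indices(len(x), k=1)])
        gamma = 1.0 / med if med > 0 else 1.0
    return np.exp(-gamma * sq)

def pelt_rbf(signal, pen):
    """Exact PELT with a kernel (RBF) cost; returns segment ends, the last one = len(signal)."""
    n = len(signal)
    gram = rbf_gram(signal)
    prefix = np.zeros((n + 1, n + 1))               # 2-D prefix sums of the Gram matrix
    prefix[1:, 1:] = gram.cumsum(axis=0).cumsum(axis=1)

    def cost(s, e):
        # Diagonal sum (k(x, x) = 1 for RBF) minus the within-block Gram sum over its size
        block = prefix[e, e] - prefix[s, e] - prefix[e, s] + prefix[s, s]
        return (e - s) - block / (e - s)

    best = np.empty(n + 1)
    best[0] = -pen
    last = np.zeros(n + 1, dtype=int)
    candidates = [0]
    for t in range(1, n + 1):
        totals = np.array([best[s] + cost(s, t) for s in candidates])
        i = int(np.argmin(totals))
        best[t] = totals[i] + pen
        last[t] = candidates[i]
        # PELT pruning: a start already worse than best[t] can never become optimal later
        candidates = [s for s, v in zip(candidates, totals) if v <= best[t]] + [t]

    bkps, t = [], n
    while t > 0:                                    # backtrack through the optimal partition
        bkps.append(t)
        t = last[t]
    return sorted(bkps)

rng = np.random.default_rng(1)
daily_mean = np.r_[np.zeros(120), np.full(120, 2.0), np.full(125, 0.5)] + rng.normal(0.0, 0.1, 365)
bkps = pelt_rbf(daily_mean, pen=7.0)
print(bkps)  # [120, 240, 365]
```

The pruning step is what keeps PELT close to linear time in practice while still returning the optimal segmentation for the chosen penalty.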

Setting the sensitivity of the detection algorithm required data-driven hyperparameter tuning. We used the elbow method to choose the penalty value, the parameter that controls the trade-off between goodness-of-fit and the number of detected change-points. By sweeping penalties on a logarithmic scale (from 1 to 1000) and plotting the resulting segmentation cost against the number of change-points, we located the threshold beyond which additional model complexity brought diminishing reductions in error. To verify the reliability of these boundaries, we cross-validated the PELT results with binary segmentation using an L2 cost. While PELT chooses the number of change-points through the penalty, binary segmentation was run with a fixed number of change-points, which lets us check whether the main regime changes persist under different algorithmic assumptions. Finally, a statistical post-processing step computed the mean and standard deviation of the Z-scores in each segment, enabling descriptive labelling of regimes based on numerical cutoffs.
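
The binary segmentation cross-check can be sketched in NumPy as follows. For a fixed number of change-points, it repeatedly splits the segment whose best single split most reduces the L2 cost (the sum of squared deviations from the segment mean). This is an illustration, not the ruptures implementation; with ruptures, the comparable call would be `rpt.Binseg(model="l2").fit(signal).predict(n_bkps=2)`. The minimum segment length of 2 and the synthetic drop-then-peak signal are assumptions.

```python
import numpy as np

def _sse(x):
    """L2 cost: sum of squared deviations from the segment mean."""
    return float(((x - x.mean()) ** 2).sum()) if len(x) else 0.0

def _best_split(x, start, end, min_size):
    """Best single split of x[start:end] under the L2 cost; returns (gain, index)."""
    total = _sse(x[start:end])
    best_gain, best_k = 0.0, None
    for k in range(start + min_size, end - min_size + 1):
        gain = total - _sse(x[start:k]) - _sse(x[k:end])
        if gain > best_gain:
            best_gain, best_k = gain, k
    return best_gain, best_k

def binseg_l2(signal, n_bkps, min_size=2):
    """Greedy binary segmentation with a fixed number of change-points."""
    x = np.asarray(signal, dtype=float)
    bounds = [0, len(x)]
    for _ in range(n_bkps):
        # Evaluate the best split inside every current segment and keep the largest gain
        splits = [_best_split(x, s, e, min_size) for s, e in zip(bounds[:-1], bounds[1:])]
        gain, k = max(splits, key=lambda g: g[0])
        if k is None:
            break  # no segment can be split any further
        bounds = sorted(bounds + [k])
    return bounds[1:]  # same convention as ruptures: the last entry equals len(signal)

rng = np.random.default_rng(0)
signal = np.r_[np.zeros(100), np.full(100, -2.0), np.full(100, 1.0)] + rng.normal(0.0, 0.1, 300)
bkps = binseg_l2(signal, n_bkps=2)
print(bkps)  # [100, 200, 300]
```

Because it is greedy, binary segmentation is not guaranteed to find the optimal partition, which is exactly why agreement with PELT is informative.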

Then, to evaluate structural stability at the regional level, we automated this pipeline for all cities and zones with a single global penalty. Counting the validated change-points in each series gives a volatility indicator for that series. This let us organise zones not only by their consumption levels but also by their stability over time, distinguishing rigid, continuous loads from highly volatile, seasonal profiles.

Finally, to complement the analysis of long-term structural shifts, we developed a multi-method anomaly detection framework to find individual daily profiles that deviate considerably from standard behaviour. Because different algorithms are sensitive to different types of irregularity, we built an ensemble of three unsupervised techniques, applied to the 48-dimensional daily profiles. First, the isolation forest, a tree-based method, efficiently identifies global outliers by recursively partitioning the feature space at random. Second, we computed the Mahalanobis distance of each daily profile; unlike the Euclidean distance, this multivariate statistic accounts for the covariance between time steps, flagging days that are statistically extreme relative to the overall distribution (with the $99^{\text{th}}$ percentile as threshold). Third, we used the local outlier factor (LOF) with a neighbourhood size of 20, a density-based method that compares a profile's local density with that of its nearest neighbours to spot local outliers: points that may not be globally extreme but are unusual within their own cluster.
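
For reference, the Mahalanobis score and flagging rule can be written as below. This assumes $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the sample mean and covariance of all daily profiles; the text does not say whether a robust covariance estimate was used. With $\boldsymbol{\Sigma} = I$, the score reduces to the Euclidean distance from the mean profile.

$$
D_M(\mathbf{x}_d) = \sqrt{(\mathbf{x}_d - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}_d - \boldsymbol{\mu})}, \qquad \mathbf{x}_d \in \mathbb{R}^{48},
$$

$$
\text{day } d \text{ is flagged} \iff D_M(\mathbf{x}_d) > q_{0.99}\big(\{D_M(\mathbf{x}_{d'})\}_{d'}\big),
$$

where $q_{0.99}$ is the empirical 99th percentile of the scores over all days.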

To reduce the risk of false positives associated with single detectors, we combined their outputs through consensus voting. We defined a robust anomaly as any daily profile flagged by at least 2 of the 3 algorithms. This acts as a credibility filter, rejecting method-dependent noise and keeping only events with multi-method confirmation. To give physical meaning to these statistical outliers, we averaged the robust anomalies into a mean anomalous profile and compared it with the normal baseline. This let us characterise grid disruptions, distinguishing anomalies caused by sudden drops (such as blackouts) from those caused by extreme surges, and map the frequency of these severe events across cities and zones.
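
Here is a compact scikit-learn sketch of the three detectors and the 2-of-3 vote. The 1% contamination for isolation forest and LOF is an assumption: it matches the Mahalanobis 99th-percentile rule, so each detector flags roughly the same number of days. `consensus_anomalies` is a hypothetical helper. The data are synthetic, with five inverted days injected so that the vote has something to find.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

def consensus_anomalies(X, contamination=0.01, n_neighbors=20, min_votes=2, seed=0):
    """Flag rows of X (daily profiles) with three detectors; keep days with >= min_votes."""
    # 1) Isolation forest: global outliers isolated by random partitioning
    iso_flag = IsolationForest(contamination=contamination, random_state=seed).fit_predict(X) == -1

    # 2) Mahalanobis distance to the mean profile, thresholded at the 99th percentile
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pseudo-inverse tolerates a singular covariance
    maha = np.sqrt(np.maximum(np.einsum("ij,jk,ik->i", diff, cov_inv, diff), 0.0))
    maha_flag = maha > np.percentile(maha, 99)

    # 3) Local outlier factor: low density relative to the nearest neighbours
    lof_flag = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=contamination).fit_predict(X) == -1

    votes = iso_flag.astype(int) + maha_flag.astype(int) + lof_flag.astype(int)
    return votes >= min_votes, votes, maha

rng = np.random.default_rng(0)
shape = np.sin(2 * np.pi * (np.arange(48) - 12) / 48)      # hypothetical normal daily shape
X = shape + rng.normal(0.0, 0.3, size=(500, 48))
X[:5] = -2.0 * shape + rng.normal(0.0, 0.3, size=(5, 48))  # five injected inverted days
robust, votes, maha = consensus_anomalies(X)
anomalous_mean = X[robust].mean(axis=0)  # average anomalous profile
normal_mean = X[~robust].mean(axis=0)    # baseline for comparison
print(np.flatnonzero(robust))
```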

4 Results

4.1 Clustering

We begin by determining the optimal number of clusters (K) for segmenting the 6,182 extracted daily profiles. Although an initial look at the normalised profiles revealed substantial variability, establishing clear subgroups required agreement among statistical indicators. The elbow method showed a smooth curve with a slight bend between $\mathrm{K} = 3$ and $\mathrm{K} = 4$, suggesting that adding clusters beyond this point yields smaller reductions in within-cluster variance. The hierarchical clustering dendrogram (Ward linkage) showed a strong separation into 2 main branches, while cutting the tree at a linkage distance of about 25 yields 4 clusters. The Gaussian mixture models provided further support: the BIC levelled off at 4 components, the best trade-off between model fit and complexity. Even though the silhouette score favoured 2 clusters and the Calinski-Harabasz index declined steadily, the agreement of the GMM with the local silhouette peak at $\mathrm{K} = 4$ indicated that 4 clusters offer a clearer physical interpretation of the different consumption behaviours (domestic vs. industrial).

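[Figures: cluster-selection diagnostics (elbow curve, silhouette and Calinski-Harabasz scores, Ward dendrogram, GMM BIC)]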
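
The tree cut described above can be reproduced with SciPy's hierarchical clustering functions, as in the sketch below. The data are synthetic (four made-up prototype shapes plus noise), and `ward_cut` is a hypothetical helper. A distance threshold such as the 25 mentioned above depends on the scale and size of the data. For that reason, the sketch cuts by a target number of clusters (`criterion="maxclust"`) by default and uses `criterion="distance"` only when a threshold is given.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def ward_cut(X, n_clusters=4, distance=None):
    """Ward-linkage tree cut into a fixed number of clusters or at a linkage distance."""
    Z = linkage(X, method="ward")
    if distance is not None:
        return Z, fcluster(Z, t=distance, criterion="distance")
    return Z, fcluster(Z, t=n_clusters, criterion="maxclust")

rng = np.random.default_rng(0)
t = np.arange(48)
prototypes = np.vstack([
    np.full(48, 1.5),                                   # flat, high
    np.full(48, -1.0),                                  # flat, low
    2.5 * np.exp(-0.5 * ((t - 44) / 3.0) ** 2) - 0.5,   # evening peak
    2.0 * np.exp(-0.5 * ((t - 30) / 6.0) ** 2) - 0.3,   # daytime hump
])
X = np.vstack([p + rng.normal(0.0, 0.2, size=(100, 48)) for p in prototypes])
Z, labels = ward_cut(X, n_clusters=4)
print(np.round(Z[-4:, 2], 1))  # heights of the last four merges in the dendrogram
```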

After fixing $\mathrm{K} = 4$, four distinct "typical consumption" profiles emerged, capturing the diverse energy habits across the region. Cluster 0 (the smallest cluster) is a high-intensity profile, sustaining consumption well above the average $(\mathrm{Z} > 1)$ for most of the 24-hour period, a typical feature of round-the-clock industrial production. Cluster 1 is a low/baseload pattern, in which consumption stays steadily below the standardised mean $(\mathrm{Z} < 0)$ with minimal variation, likely reflecting empty buildings or low-intensity night-time activity. Cluster 2 (the largest cluster) shows the classic domestic pattern: lower usage during the day and an evening spike peaking around 22:00, associated with household lighting and appliance use. Finally, Cluster 3 exhibits a clear daytime profile with a double hump, rising during working hours and peaking sharply around 18:00, which strongly points to business or office use.

[Figure: mean daily profile of each of the four clusters]
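
Descriptions such as "peaking around 22:00" follow from the position of the maximum of each cluster's mean profile. The small sketch below assumes slot 0 starts at 00:00 and consecutive slots are 30 minutes apart. `peak_time` and `cluster_summaries` are hypothetical helpers, and the profiles are synthetic.

```python
import numpy as np

def peak_time(profile):
    """Clock time (HH:MM) of the maximum of a 48-slot half-hourly profile; slot 0 = 00:00."""
    slot = int(np.argmax(profile))
    return f"{slot // 2:02d}:{(slot % 2) * 30:02d}"

def cluster_summaries(X, labels):
    """Size, mean level, and peak time of each cluster's mean profile."""
    out = {}
    for c in np.unique(labels):
        centroid = X[labels == c].mean(axis=0)
        out[int(c)] = {
            "n_days": int((labels == c).sum()),
            "mean_z": float(centroid.mean()),
            "peak": peak_time(centroid),
        }
    return out

slots = np.arange(48)
evening = np.exp(-0.5 * ((slots - 44) / 2.0) ** 2)   # synthetic peak at slot 44
daytime = np.exp(-0.5 * ((slots - 36) / 4.0) ** 2)   # synthetic peak at slot 36
X = np.vstack([evening] * 3 + [daytime] * 2)
labels = np.array([2, 2, 2, 3, 3])
summary = cluster_summaries(X, labels)
print(summary[2]["peak"], summary[3]["peak"])  # 22:00 18:00
```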

The geographic spread of these clusters confirms the heterogeneity of the studied zones. Comparing clusters across cities shows that Laayoune contributes little to Clusters 0 and 1 but is highly prevalent in Clusters 2 and 3. This dual dominance is indicative of a complete urban centre, with dense residential usage as well as notable administrative and commercial activity. Boujdour, by contrast, is prominent in Cluster 2 and has almost no presence in Cluster 3, which signals a primarily residential profile with little business-hour consumption. Foum Eloued has an unusually high concentration of Cluster 1, consistent with a sparsely populated area with low-intensity baseloads, and it also dominates Cluster 0, which suggests continuous high-intensity operations (related to industry and infrastructure) running 24 hours a day. Finally, Marrakech displays a mixed profile, contributing moderately to both the commercial (Cluster 3) and baseload (Cluster 1) patterns while being largely absent from the domestic group. This uneven distribution supports our hypothesis that electricity demand is spatially heterogeneous. Instead of a uniform "city profile", some areas show low demand because of low population density, whereas others follow a more cyclical, time-dependent rhythm.

[Figure: distribution of clusters by city]
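
A comparison like this is typically a cross-tabulation of city against cluster label. In the sketch below, the labels are made up and `cluster_composition` is a hypothetical helper. The choice of normalisation matters because the cities contribute different numbers of zones: five for Laayoune, three for Boujdour, seven for Foum Eloued, and two for Marrakech, according to the change-point summary below. Column normalisation answers "which city dominates this cluster?", while row normalisation answers "how are this city's days distributed across clusters?".

```python
import pandas as pd

def cluster_composition(meta: pd.DataFrame, by: str = "columns") -> pd.DataFrame:
    """Cross-tabulate cities against clusters.

    by="columns": share of each cluster contributed by each city (columns sum to 1).
    by="index":   share of each city's days falling in each cluster (rows sum to 1).
    """
    return pd.crosstab(meta["city"], meta["cluster"], normalize=by)

# Hypothetical per-day labels (one row per zone-day in practice)
meta = pd.DataFrame({
    "city": ["Laayoune", "Laayoune", "Laayoune", "Boujdour", "Boujdour", "Foum Eloued"],
    "cluster": [2, 3, 2, 2, 2, 0],
})
comp = cluster_composition(meta)
comp_rows = cluster_composition(meta, by="index")
print(comp.round(2))
```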

4.2 Linear & Nonlinear Mapping

To assess how well separated these clusters are, we used dimensionality reduction to visualise the high-dimensional data. PCA showed that the first 2 components account for more than 74% of the total variance. The factor loadings show that PC1 (roughly 60% of the variance) captures the overall level of consumption, separating the high-intensity Cluster 0 (far right of the projection) from the low-usage Cluster 1 (far left). PC2 (around 14% of the variance) captures the day-night contrast: its loadings drop considerably during daylight hours, so this component distinguishes the daytime-oriented Cluster 3 (bottom of the projection) from the evening-focused Cluster 2. Projecting the daily profiles onto these principal axes shows clear boundaries between groups, and the t-SNE visualisation supports this. The t-SNE plot, which preserves local nonlinear relationships, displays 4 distinct islands with little overlap, indicating that the identified clusters are not simply artefacts of the K-Means algorithm but reflect genuinely different consumption behaviours intrinsic to the data.

[Figures: PCA scree plot, PC1/PC2 loadings, PCA projection, and t-SNE embedding of the daily profiles]

4.3 Change-Point Detection

To complement the analysis of daily profiles, we examined the temporal stability of these usage behaviours using change-point detection. Since 30-minute fluctuations can mask longer-term structural changes, we first aggregated the data into daily average series. This removed within-day volatility and let us focus on actual regime changes, such as seasonal effects or operational adjustments. To locate these breaks, we used the PELT algorithm with an RBF kernel, which is well suited to detecting non-parametric shifts in complex signals. We calibrated the algorithm's sensitivity with the elbow method: plotting the segmentation cost against the number of change-points revealed a point where the curve stabilised, corresponding to an optimal penalty of around 7. This calibration ensured that the model captured meaningful structural changes without overfitting to minor noise.

[Figure: segmentation cost vs. number of change-points used to select the PELT penalty]
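
The visual elbow can be made reproducible with a simple geometric rule: normalise both axes and pick the point farthest from the straight line joining the first and last points of the curve. This is shown as one common option, not necessarily the method used here. The cost values are made up for illustration, and `elbow_index` is a hypothetical helper.

```python
import numpy as np

def elbow_index(x, y):
    """Index of the point farthest from the chord joining the first and last points."""
    p = np.column_stack([x, y]).astype(float)
    span = np.ptp(p, axis=0)
    span[span == 0] = 1.0                 # avoid dividing by zero on a flat axis
    p = (p - p.min(axis=0)) / span        # put both axes on [0, 1]
    chord = (p[-1] - p[0]) / np.linalg.norm(p[-1] - p[0])
    rel = p - p[0]
    # Perpendicular distance = length of the part of rel orthogonal to the chord
    perp = rel - np.outer(rel @ chord, chord)
    return int(np.argmax(np.linalg.norm(perp, axis=1)))

n_changepoints = np.arange(11)
cost = np.array([100, 60, 35, 22, 18, 16, 15, 14, 13.5, 13, 12.6])  # made-up segmentation costs
print(elbow_index(n_changepoints, cost))  # 3
```

In a penalty sweep, the chosen elbow maps back to the penalty that produced that number of change-points.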

To turn these change-points into useful findings, we statistically examined the segments identified by PELT. Computing the mean and standard deviation of the Z-scores for each segment allowed us to categorise the different usage regimes. As the detailed validation for Laayoune shows, this approach distinguishes operational states well: a "Stable Normal" period, where usage stayed at or below the historical baseline (mean Z of -0.44), and a "Stable Peak" period, where demand rose considerably $(Z > 0.5)$ and stayed high for months. This statistical assessment indicates that the detected change-points reflect real changes in electricity consumption rather than model artefacts, most likely the onset of peak-demand periods or lasting changes in baseline consumption.

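[Figure: detailed PELT validation for Laayoune]

Segment Statistics for Laayoune zone1

| Start Date | End Date | Mean (Z) | Std (Z) | Description |
|---|---|---|---|---|
| 2023-01-09 | 2023-07-23 | -0.44 | 0.48 | Stable Normal |
| 2023-07-23 | 2024-01-08 | 0.51 | 0.59 | Stable Peak |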
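
The segment summary can be sketched as follows. The cutoffs (`peak=0.5`, `low=-0.5`, `volatile_std=1.0`) are hypothetical, chosen only so that the two Laayoune zone1 rows in the table would receive the labels shown: a mean of -0.44 gives "Normal", a mean of 0.51 gives "Peak", and both standard deviations are below 1.0. The project's actual cutoffs are not documented, and the series is synthetic.

```python
import numpy as np
import pandas as pd

def describe_segments(z, bkps, dates, peak=0.5, low=-0.5, volatile_std=1.0):
    """Mean/std of the Z-scores in each segment plus a label from hypothetical cutoffs."""
    rows, start = [], 0
    for end in bkps:  # ruptures-style breakpoints: the last entry equals len(z)
        seg = np.asarray(z[start:end], dtype=float)
        mean, std = seg.mean(), seg.std()
        stability = "Volatile" if std > volatile_std else "Stable"
        level = "Peak" if mean > peak else ("Low" if mean < low else "Normal")
        rows.append({
            "start": dates[start], "end": dates[end - 1],
            "mean_z": round(float(mean), 2), "std_z": round(float(std), 2),
            "description": f"{stability} {level}",
        })
        start = end
    return pd.DataFrame(rows)

rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-09", periods=365, freq="D")
z = np.r_[np.full(195, -0.4), np.full(170, 0.7)] + rng.normal(0.0, 0.1, 365)
table = describe_segments(z, [195, 365], dates)
print(table)
```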

To cross-validate the structural shifts found by the tuned PELT configuration, we used binary segmentation, a greedy method that lets us fix the number of change-points in advance. This constraint forces a simplified view of the time series and checks whether the main patterns persist under different algorithmic assumptions. The resulting segmentation closely mirrored the PELT results, isolating the most significant features of the load curve: a pronounced drop in consumption (around the $170^{\text{th}}$ observation) and a distinct high-usage peak (near the $240^{\text{th}}$ observation). This agreement between PELT's penalty-based approach and binary segmentation's fixed constraint indicates that these consumption regimes reflect genuine changes in electricity consumption behaviour rather than artefacts of a particular parameter choice.

[Figure: binary segmentation (L2 cost) with a fixed number of change-points]

Applying the PELT methodology across all zones revealed notable geographic differences in structural stability. The summary of detected change-points shows that some areas remain statistically stable while others undergo frequent regime shifts. Foum Eloued shows the most extreme polarisation: it includes a zone (zone7) with no detected change-points, reflecting stable demand consistent with the continuous industrial profile of Cluster 0, while other zones in the same city recorded up to 6 change-points (zone6), suggesting highly variable consumption, likely driven by seasonality or intermittent usage. Boujdour displayed moderate instability (3 to 4 change-points per zone), as did Marrakech zone1 (3), while Marrakech zone2 recorded only 1; these counts point to consumption following regular seasonal cycles rather than unpredictable industrial shocks.

Structural Instability Summary

| City | Zone | Change-points | Series length (days) |
|---|---|---|---|
| Laayoune | zone2 | 6 | 365 |
| Foum_Eloued | zone6 | 6 | 365 |
| Foum_Eloued | zone2 | 4 | 365 |
| Foum_Eloued | zone1 | 4 | 365 |
| Laayoune | zone5 | 4 | 365 |
| Boujdour | zone1 | 4 | 365 |
| Marrakech | zone1 | 3 | 365 |
| Boujdour | zone2 | 3 | 365 |
| Laayoune | zone3 | 3 | 365 |
| Boujdour | zone3 | 3 | 365 |
| Laayoune | zone4 | 2 | 365 |
| Laayoune | zone1 | 1 | 365 |
| Foum_Eloued | zone3 | 1 | 365 |
| Foum_Eloued | zone5 | 1 | 365 |
| Foum_Eloued | zone4 | 1 | 365 |
| Marrakech | zone2 | 1 | 365 |
| Foum_Eloued | zone7 | 0 | 365 |

[Figure: number of detected change-points per city and zone]

4.4 Anomaly Detection

To complement the analysis of long-term structural changes, we also examined short-term irregularities by identifying anomalous daily demand patterns with a combination of methods. By cross-referencing isolation forest (global outliers), local outlier factor (local density differences), and Mahalanobis distance (multivariate statistical distance), we obtained a set of robust anomalies: days flagged by at least two of the three algorithms. This consensus strategy filtered out noise and retained only substantial deviations. As the comparison of daily mean profiles shows, these anomalous days differ markedly from normal behaviour. While the average normal day follows the typical dual-peak pattern expected from routine activity, the average anomalous day follows a drastic, inverted trajectory: consumption falls sharply in the early morning (to Z-scores below -1.0) before rising to very high values in the late evening. This distinctive shape suggests that these anomalies are not random noise but distinct events, likely major outages followed by recovery loads or operational peaks.

[Figure: mean daily profile of normal days vs. robust anomalous days]

The geographic distribution of the robust anomalies reveals notable variation in the reliability and stability of the electricity network and its usage. Laayoune is the main contributor, with the highest count of confirmed anomalous days (11), followed closely by Boujdour (9) and Foum Eloued (7).

[Figure: number of robust anomalies by city]

However, the table of the five most extreme anomalies, ranked by Mahalanobis distance, shows that while Laayoune has the highest frequency, Boujdour experiences some of the most statistically extreme individual events (e.g., zone2 in December 2023). This suggests that Laayoune's grid experiences irregular operation more routinely, probably because of its diverse administrative and commercial activity, whereas Boujdour's predominantly residential network is vulnerable to sharper, more intense individual shocks. Marrakech, on the other hand, appears very stable in this respect, with minimal presence in the severe anomaly list, reinforcing its description as a well-organised, predictable demand environment.

Top 5 Most Extreme Anomalies (Highest Mahalanobis Score)

| City | Zone | Date | Vote_Sum (detectors flagging the day, of 3) |
|---|---|---|---|
| Boujdour | zone2 | 2023-12-06 | 2 |
| Boujdour | zone2 | 2023-01-19 | 1 |
| Laayoune | zone3 | 2023-10-12 | 1 |
| Laayoune | zone4 | 2023-08-29 | 2 |
| Marrakech | zone2 | 2023-08-06 | 1 |

A more detailed temporal breakdown of these anomalies by city shows when during the day the deviations occur. As shown below, Boujdour presents a distinctive, sustained volatility: consumption stays above normal for nearly the entire daylight period, spiking around noon and remaining high until midnight. This implies that anomalies in Boujdour are likely caused by extended periods of high demand, potentially related to specific daytime household needs or weather events. In contrast, Laayoune exhibits a recovery-shaped anomaly: its anomalous days begin with very low Z-scores in the early hours, then return to the baseline and rise above it by evening. This distinction is key: it characterises Boujdour's anomalies as surges, while Laayoune's correspond to drops or service interruptions.

[Figure: mean anomalous daily profile by city]

The agreement between the detection methods provides a measure of the reliability of these findings. The overlap analysis shows that while Mahalanobis distance and local outlier factor each flagged 62 candidate days, only 10 days were flagged by both. This limited overlap reflects the complexity of the data: LOF identifies local density anomalies (days that are unusual relative to their immediate neighbours), whereas Mahalanobis distance finds globally extreme statistical deviations. By restricting the analysis to the robust set on which the methods agree, we reduced false positives and minor variations. The resulting 27 confirmed anomalies represent the most important events in the dataset, moments when the behaviour of the electricity network diverged fundamentally from its typical patterns, and provide a focus for further root-cause analysis of the reliability of Morocco's infrastructure or the impact of extreme weather.
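
The per-method and pairwise counts quoted above can be tallied from boolean flag arrays, as sketched below. The six-day flags are made up, and `overlap_report` is a hypothetical helper.

```python
from itertools import combinations

import numpy as np

def overlap_report(flags):
    """flags: mapping of detector name -> boolean array over the same days."""
    per_method = {name: int(f.sum()) for name, f in flags.items()}
    pairwise = {(a, b): int((flags[a] & flags[b]).sum()) for a, b in combinations(flags, 2)}
    votes = np.sum([f.astype(int) for f in flags.values()], axis=0)
    return per_method, pairwise, int((votes >= 2).sum())  # last value: robust (2-of-3) days

flags = {
    "isolation_forest": np.array([1, 1, 0, 0, 1, 0], dtype=bool),
    "mahalanobis": np.array([1, 0, 1, 0, 1, 0], dtype=bool),
    "lof": np.array([0, 1, 1, 0, 0, 1], dtype=bool),
}
per_method, pairwise, n_robust = overlap_report(flags)
print(per_method, pairwise, n_robust)
```

A day can become robust through any pair of detectors, so the robust count can exceed every individual pairwise overlap, as it does above (27 robust days versus 10 Mahalanobis-LOF agreements).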

5 Discussion

The main drawback of our unsupervised approach is its sensitivity to hyperparameter selection, particularly the compromise between statistical optimality and interpretability. In the clustering analysis, while the silhouette score favoured a simple two-way split, we opted for $\mathrm{K} = 4$ to better represent the complexity of the "business" profile. This choice highlights a common difficulty in unsupervised learning: statistical metrics often reward compactness over real-world relevance. Similarly, in the change-point analysis, we selected the PELT penalty using the elbow heuristic. While this helped prevent over-segmentation, it remains a subjective choice. A slightly lower penalty would have split the consistent seasonal regimes into smaller, noisier segments, potentially producing false signals of infrastructure variability.

Another methodological constraint was the need to aggregate the data to manage high-frequency noise. To apply change-point detection properly and reduce computational cost, we aggregated the 30-minute series into daily means. While this smoothing worked well for recognising long-term seasonal shifts, it also discarded information about intra-day structural changes. For instance, if a zone's peak shifted from 18:00 to 20:00 while its average daily demand stayed the same, our daily-mean approach would not detect it. In addition, Z-score normalisation, although necessary to compare Marrakech with the other cities, hides the absolute size of events. An anomaly in a low-voltage residential area receives the same statistical weight as a major industrial surge, overlooking the very different economic impacts of these events on the real grid infrastructure.
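
A tiny NumPy illustration of this blind spot follows. Shifting a synthetic evening peak from 18:00 (slot 36) to 20:00 (slot 40) changes the profile substantially but leaves its daily mean unchanged, so a change-point detector run on daily means has nothing to detect. The profile shape is hypothetical.

```python
import numpy as np

slots = np.arange(48)                                    # half-hour slots, 0 = 00:00
before = 1.0 + np.exp(-0.5 * ((slots - 36) / 3.0) ** 2)  # peak at 18:00
after = np.roll(before, 4)                               # same shape, peak moved to 20:00
print(np.argmax(before), np.argmax(after))               # 36 40
print(np.isclose(before.mean(), after.mean()))           # True: daily means are identical
print(np.abs(before - after).max())                      # yet the profiles clearly differ
```

Running change-point detection on the 48-dimensional daily profiles, or on features such as the peak slot, would be one way to recover this information.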

The anomaly detection phase also exposed the challenge of applying a single universal algorithm to heterogeneous datasets. We noticed significant discrepancies between the detection approaches. In particular, the isolation forest struggled to find anomalies in the Marrakech data, likely because its irregularities appear as statistical outliers in the covariance structure (better captured by Mahalanobis distance) rather than as the easily separable points that tree-based methods excel at isolating. This motivated the consensus voting mechanism which, although reliable, introduced a new layer of complexity: choosing the appropriate voting threshold. As observed, the strict consensus requirement (2 votes) effectively removed noise but masked valid anomalies in Marrakech. This prompts a reconsideration of what constitutes a robust anomaly in different urban contexts.

6 Conclusion

This study showed that Morocco's complex electricity consumption patterns can be separated into distinct, interpretable functional profiles, going beyond simple aggregate measures to offer useful operational information. Using an unsupervised framework including K-Means and GMM, we showed that demand is not geographically uniform but is driven by four characteristic behaviours: continuous industrial operations, low-intensity baseloads, domestic evening peaks, and business daytime cycles. Validation with PCA and t-SNE indicates that these patterns are inherent in the data rather than created by the algorithm. In practice, this segmentation allows grid operators to move from a generalised distribution approach to a zone-specific, targeted one, in which infrastructure planning is tailored to each area's main function: residential, commercial, or industrial.

The geographic spread of these profiles and irregularities gives a clear functional description of each studied city. Laayoune appears to be a fully developed urban hub, balancing concentrated residential and commercial activity; however, its frequent recovery anomalies (sudden drops followed by returns to baseline) indicate a grid vulnerable to service interruptions or instability. Boujdour stands out as a predominantly residential location that is operationally steady but prone to high-intensity events, with anomalies taking the form of surges in which demand jumps and stays high, likely because of weather events or household load. Foum Eloued, on the other hand, is a case of extreme polarisation, combining industrial zones of continuous, high-level consumption (Cluster 0) with less developed areas of minimal load. This distinction matters for managing the electricity network: Boujdour needs spare capacity to absorb spikes, Laayoune requires better reliability to prevent disruptions, and Foum Eloued demands consistent baseload support for round-the-clock industrial activity.

Finally, the temporal analysis with PELT and consensus-based anomaly detection showed that consumption stability varies as much as demand itself. While some zones show stable normal behaviour, others undergo considerable structural shifts and seasonal regime changes. Although the methodology involved trade-offs, such as the loss of within-day resolution due to daily aggregation and the subjectivity of hyperparameter tuning, consensus voting effectively filtered out noise to reveal genuine grid stress events. In summary, this project shows that machine learning can turn raw smart meter data into strategic information, supporting better demand forecasting, more resilient infrastructure planning, and a deeper understanding of the socioeconomic dynamics of Moroccan cities.

About

Python-based analysis of high-resolution smart meter data from four Moroccan cities, applying time-series clustering, dimensionality reduction, change-point detection, and anomaly detection to identify consumption profiles and structural shifts. Includes data preprocessing and visualisation for energy demand insights.
