Update site with 2.1 release blog

eyala · eyala · commit b5797653c8fb · 2025-05-20T14:31:53.000+03:00
diff --git a/datafu-spark/README.md b/datafu-spark/README.md
@@ -11,7 +11,7 @@ This matrix represents versions of Spark that DataFu has been compiled and teste
 | 1.7.0 | 2.2.0 to 2.2.2, 2.3.0 to 2.3.2 and 2.4.0 to 2.4.3|
 | 1.8.0 | 2.2.3, 2.3.3, and 2.4.4 to 2.4.5|
 | 2.0.0 | 3.0.x - 3.1.x |
-| 2.1.0 (not released yet) | 3.0.x - 3.4.x |
+| 2.1.0 | 3.0.x - 3.4.x |
 
 # Examples
 
diff --git a/site/source/blog/2025-04-27-datafu-2-1-0-released.markdown b/site/source/blog/2025-04-27-datafu-2-1-0-released.markdown
@@ -0,0 +1,52 @@
+---
+title: Apache DataFu-Spark 2.1.0 Released
+author: Eyal Allweil
+license: >
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+---
+
+I'd like to announce the release of Apache DataFu-Spark 2.1.0.
+
+In this release, Spark versions 3.0.0 to 3.4.2 are supported.
+
+<br>
+
+**Additions**
+
+* Add dedupByAllExcept method (DATAFU-167). This is a new method for removing duplicate rows when there is one column whose value is not important, without losing any actual data from the other columns. For example, suppose a server creates events with an autogenerated event id, and events are sometimes duplicated. You don't want duplicate rows that differ only in their event ids, but if any of the other fields are distinct you want to keep the rows (with their event ids).
+
+* Add collectNumberOrderedElements (DATAFU-176). This is a new UDAF for aggregating and collecting data that may be skewed. For example, suppose you want to create a list of top customers for a company. Using a window function would require sending all the data for a given company to the same executor, whereas this method filters rows out in the combiner stage.
+
+**Improvements**
+
+* Spark 3.0.0 - 3.4.x supported (DATAFU-175, DATAFU-179)
+* Expose dedupRandomN in Python (DATAFU-180)
+
+**Breaking changes**
+
+* The four deprecated classes in SparkUDAFs (MultiSet, MultiArraySet, MapMerge and CountDistinctUpTo) have been removed. They are replaced by new versions which use the Spark Aggregator API.
+
+<br>
+
+The source release can be obtained from:
+
+http://www.apache.org/dyn/closer.cgi/datafu/apache-datafu-2.1.0/apache-datafu-sources-2.1.0.tgz
+
+Artifacts for DataFu are published in Apache's Maven Repository:
+
+https://repository.apache.org/content/groups/public/org/apache/datafu/
+
+Please visit the [Download](/docs/download.html) page for instructions on building from source or retrieving the artifacts in your build system.
diff --git a/site/source/docs/download.html.markdown.erb b/site/source/docs/download.html.markdown.erb
@@ -1,7 +1,7 @@
 ---
 title: Download - Apache DataFu
 section_name: Getting Started
-version: 2.0.0
+version: 2.1.0
 license: >
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
diff --git a/site/source/docs/spark/getting-started.html.markdown.erb b/site/source/docs/spark/getting-started.html.markdown.erb
@@ -1,6 +1,6 @@
 ---
 title: Apache DataFu Spark - Getting Started
-version: 2.0.0
+version: 2.1.0
 section_name: Getting Started
 license: >
    Licensed to the Apache Software Foundation (ASF) under one or more
@@ -36,7 +36,7 @@ This matrix represents versions of Spark that DataFu has been compiled and teste
 | 1.7.0 | 2.2.0 to 2.2.2, 2.3.0 to 2.3.2 and 2.4.0 to 2.4.3 |
 | 1.8.0 | 2.2.3, 2.3.3, and 2.4.4 to 2.4.5 |
 | 2.0.0 | 3.0.x - 3.1.x |
-| 2.1.0 (unreleased) | 3.2.x and up |
+| 2.1.0 | 3.0.x - 3.4.2 |
 
 <br>
 ## Examples
diff --git a/site/source/docs/spark/guide.html.markdown.erb b/site/source/docs/spark/guide.html.markdown.erb
@@ -1,6 +1,6 @@
 ---
 title: Guide - Apache DataFu Spark
-version: 2.0.0
+version: 2.1.0
 section_name: Apache DataFu Spark
 license: >
    Licensed to the Apache Software Foundation (ASF) under one or more
@@ -26,7 +26,7 @@ It has a number of useful functions available.  This guide will provide examples
 
 ## Spark Compatibility
 
-The current version of DataFu has been tested against Spark versions 3.0.0 - 3.1.3, in Scala 2.12.  The jars have been published to the [Apache Maven Repository](https://repository.apache.org/content/groups/public/org/apache/datafu/).  Other versions can be built by [downloading the source](/docs/download.html) and following the build instructions.
+The current version of DataFu has been tested against Spark versions 3.0.0 - 3.4.2, in Scala 2.12.  The jars have been published to the [Apache Maven Repository](https://repository.apache.org/content/groups/public/org/apache/datafu/).  Other versions can be built by [downloading the source](/docs/download.html) and following the build instructions.
 
 ## Calling DataFu Spark functions from PySpark
 
diff --git a/site/source/layouts/_docs_nav.erb b/site/source/layouts/_docs_nav.erb
@@ -29,7 +29,7 @@
 <ul class="nav nav-pills nav-stacked">
 
   <li><a href="/docs/spark/guide.html">Guide</a></li>
-  <li><a href="https://datafu.apache.org/docs/spark/2.0.0/">Scaladocs</a></li>
+  <li><a href="https://datafu.apache.org/docs/spark/2.1.0/">Scaladocs</a></li>
 </ul>
 
 <h4>DataFu Pig Docs</h4>
diff --git a/site/source/layouts/_footer.erb b/site/source/layouts/_footer.erb
@@ -24,7 +24,7 @@
 </div>
 
 <div class="copyright">
-  Copyright &copy; 2011-2024 The Apache Software Foundation, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br>
+  Copyright &copy; 2011-2025 The Apache Software Foundation, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br>
   Apache DataFu, DataFu, Apache Pig, Apache Hadoop, Hadoop, Apache, and the Apache feather logo are either registered trademarks or trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a> in the United States and other countries.
 </div>
 </div>