Commit b579765

Update site with 2.1 release blog
1 parent b829851 commit b579765

7 files changed: +60 -8 lines changed

datafu-spark/README.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ This matrix represents versions of Spark that DataFu has been compiled and teste
 | 1.7.0 | 2.2.0 to 2.2.2, 2.3.0 to 2.3.2 and 2.4.0 to 2.4.3|
 | 1.8.0 | 2.2.3, 2.3.3, and 2.4.4 to 2.4.5|
 | 2.0.0 | 3.0.x - 3.1.x |
-| 2.1.0 (not released yet) | 3.0.x - 3.4.x |
+| 2.1.0 | 3.0.x - 3.4.x |
 
 # Examples
 
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+---
+title: Apache DataFu-Spark 2.1.0 Released
+author: Eyal Allweil
+license: >
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+---
+
+I'd like to announce the release of Apache DataFu-Spark 2.1.0.
+
+In this release, Spark versions 3.0.0 to 3.4.2 are supported.
+
+<br>
+
+**Additions**
+
+* Add dedupByAllExcept method (DATAFU-167). This is a new method for reducing rows when there is one column whose value is not important, but you don't want to lose any actual data from the other rows. For example if a server creates events with an autogenerated event id, and sometimes events are duplicated. You don't want double rows just for the event ids, but if any of the other fields are distinct you want to keep the rows (with their event ids)
+
+* Add collectNumberOrderedElements (DATAFU-176). This is a new UDAF for aggregating and collecting data with a possibility of skew. For example if you want to create a list of top customers for a company. Using a window function would require sending all the data for a given company to the same executor. This method will filter rows out in the combiner stage.
+
+**Improvements**
+
+* Spark 3.0.0 - 3.4.x supported (DATAFU-175, DATAFU-179)
+* Expose dedupRandomN in Python (DATAFU-180)
+
+**Breaking changes**
+
+* The four deprecated classes in SparkUDAFs - MultiSet, MultiArraySet, MapMerge and CountDistinctUpTo have been removed. Instead of them, there are new versions which use the Spark Aggregator API.
+
+<br>
+
+The source release can be obtained from:
+
+http://www.apache.org/dyn/closer.cgi/datafu/apache-datafu-2.1.0/apache-datafu-sources-2.1.0.tgz
+
+Artifacts for DataFu are published in Apache's Maven Repository:
+
+https://repository.apache.org/content/groups/public/org/apache/datafu/
+
+Please visit the [Download](/docs/download.html) page for instructions on building from source or retrieving the artifacts in your build system.
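
Reviewer note: the two additions described in the release post above can be illustrated with a plain-Python sketch of their semantics. This is not DataFu's API or implementation. The function names `dedup_by_all_except` and `top_n_per_key`, the tuple layout of the input, and the choice to keep the first row seen are all assumptions made for illustration; the real methods operate on Spark DataFrames.

```python
import heapq
from collections import defaultdict


def dedup_by_all_except(rows, ignored):
    """Keep one row per distinct combination of every field except `ignored`.

    Hypothetical helper: rows are dicts with hashable values. Keeping the
    first row seen is an assumption, not documented DataFu behavior.
    """
    seen = {}
    for row in rows:
        # The key covers all columns other than the ignored one.
        key = tuple(sorted((k, v) for k, v in row.items() if k != ignored))
        seen.setdefault(key, row)
    return list(seen.values())


def top_n_per_key(records, n):
    """Collect the n largest (score, item) pairs per key with a bounded heap.

    Hypothetical helper: each record is (key, score, item). Because at most
    n entries are retained per key, rows can be discarded early, which is
    the idea behind filtering in a combiner stage.
    """
    heaps = defaultdict(list)
    for key, score, item in records:
        heap = heaps[key]
        if len(heap) < n:
            heapq.heappush(heap, (score, item))
        elif (score, item) > heap[0]:
            # Replace the current smallest entry with the larger one.
            heapq.heapreplace(heap, (score, item))
    return {k: sorted(h, reverse=True) for k, h in heaps.items()}
```

With duplicated events that differ only in an autogenerated id, `dedup_by_all_except(events, "event_id")` keeps one row per distinct payload, while rows whose other fields differ are all retained along with their ids.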

site/source/docs/download.html.markdown.erb

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 title: Download - Apache DataFu
 section_name: Getting Started
-version: 2.0.0
+version: 2.1.0
 license: >
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with

site/source/docs/spark/getting-started.html.markdown.erb

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 ---
 title: Apache DataFu Spark - Getting Started
-version: 2.0.0
+version: 2.1.0
 section_name: Getting Started
 license: >
 Licensed to the Apache Software Foundation (ASF) under one or more
@@ -36,7 +36,7 @@ This matrix represents versions of Spark that DataFu has been compiled and teste
 | 1.7.0 | 2.2.0 to 2.2.2, 2.3.0 to 2.3.2 and 2.4.0 to 2.4.3 |
 | 1.8.0 | 2.2.3, 2.3.3, and 2.4.4 to 2.4.5 |
 | 2.0.0 | 3.0.x - 3.1.x |
-| 2.1.0 (unreleased) | 3.2.x and up |
+| 2.1.0 | 3.0.x - 3.4.2 |
 
 <br>
 ## Examples

site/source/docs/spark/guide.html.markdown.erb

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 ---
 title: Guide - Apache DataFu Spark
-version: 2.0.0
+version: 2.1.0
 section_name: Apache DataFu Spark
 license: >
 Licensed to the Apache Software Foundation (ASF) under one or more
@@ -26,7 +26,7 @@ It has a number of useful functions available. This guide will provide examples
 
 ## Spark Compatibility
 
-The current version of DataFu has been tested against Spark versions 3.0.0 - 3.1.3, in Scala 2.12. The jars have been published to the [Apache Maven Repository](https://repository.apache.org/content/groups/public/org/apache/datafu/). Other versions can be built by [downloading the source](/docs/download.html) and following the build instructions.
+The current version of DataFu has been tested against Spark versions 3.0.0 - 3.4.2, in Scala 2.12. The jars have been published to the [Apache Maven Repository](https://repository.apache.org/content/groups/public/org/apache/datafu/). Other versions can be built by [downloading the source](/docs/download.html) and following the build instructions.
 
 ## Calling DataFu Spark functions from PySpark
 
site/source/layouts/_docs_nav.erb

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
 <ul class="nav nav-pills nav-stacked">
 
 <li><a href="/docs/spark/guide.html">Guide</a></li>
-<li><a href="https://datafu.apache.org/docs/spark/2.0.0/">Scaladocs</a></li>
+<li><a href="https://datafu.apache.org/docs/spark/2.1.0/">Scaladocs</a></li>
 </ul>
 
 <h4>DataFu Pig Docs</h4>

site/source/layouts/_footer.erb

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
 </div>
 
 <div class="copyright">
-Copyright &copy; 2011-2024 The Apache Software Foundation, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br>
+Copyright &copy; 2011-2025 The Apache Software Foundation, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br>
 Apache DataFu, DataFu, Apache Pig, Apache Hadoop, Hadoop, Apache, and the Apache feather logo are either registered trademarks or trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a> in the United States and other countries.
 </div>
 </div>
