3 changes: 2 additions & 1 deletion docs/Community/Research.md
@@ -97,7 +97,8 @@ This solution suffices in some applications, but for other applications the chun

## References

- **[ABL+17]** Daniel Anderson, Pryce Bevan, Kevin J. Lang, Edo Liberty, Lee Rhodes, and Justin Thaler. A high-performance algorithm for identifying frequent items in data streams. In *ACM IMC 2017 (To Appear)*, 2017. [Preliminary paper](https://arxiv.org/abs/1705.07001).
+ **[ABL+17]** Daniel Anderson, Pryce Bevan, Kevin J. Lang, Edo Liberty, Lee Rhodes, and Justin Thaler. A high-performance algorithm for identifying frequent items in data streams. In *ACM IMC 2017*, 2017.
+ (dl.acm.org),(arxiv.org/abs/1705.07001).

**[AC+13]** Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff M. Phillips, Zhewei Wei, Ke Yi. Mergeable summaries. In *ACM Trans. Database Syst.* 38(4): 26:1-26:28, 2013

4 changes: 2 additions & 2 deletions docs/Frequency/FrequentDistinctTuplesSketch.md
@@ -165,9 +165,9 @@ When the Group is printed as a string, it will output seven columns as follows:
### Error Behavior

Note: the code for the following study can be found in the characterization repository
- [here](https://github.com/DataSketches/characterization/tree/master/src/main/java/org/apache/datasketches/characterization/fdt) and the configuration file can be found [here](https://github.com/DataSketches/characterization/tree/master/src/main/resources/fdt).
+ [here](https://github.com/apache/datasketches-characterization/tree/master/java-base/src/main/java/org/apache/datasketches/characterization/fdt) and the configuration file can be found [here](https://github.com/apache/datasketches-characterization/blob/master/java-base/src/main/resources/fdt/FdtAccuracyJob.conf). A login to GitHub will be required.

- In order to study the error behavior of this sketch a power-law distribution with a slope of -1 was created. The head of the distribution was a single item with a cardinality of 16384, and the tail of the distribution was 16384 items each with a cardinality of one. All the points inbetween were items that have multiplicities and cardinalities that would fall on a straight line plotted on a Log-X, Log-Y graph. This generated an input stream of about 850K (Key, value) pairs, which was input into the sketch and is considered one trial. The sketch was constructed with a target
+ In order to study the error behavior of this sketch a power-law distribution with a slope of -1 was created. The head of the distribution was a single item with a cardinality of 16384, and the tail of the distribution was 16384 items each with a cardinality of one. All the points in between were items that have multiplicities and cardinalities that would fall on a straight line plotted on a Log-X, Log-Y graph. This generated an input stream of about 850K (Key, value) pairs, which was input into the sketch and is considered one trial. The sketch was constructed with a target
threshold of 1% and a target RSE of 5%.

Twenty such trials were run and the error distribution quantiles of the results were computed and is shown in the following graph.
5 changes: 3 additions & 2 deletions pom.xml
@@ -85,7 +85,7 @@ under the License.

<properties>
<!-- UNIQUE FOR THIS JAVA COMPONENT -->
- <org-json.version>20231013</org-json.version> <!-- 20231013 -->
+ <org-json.version>20230227</org-json.version> <!-- was 20231013 -->
<!-- END:UNIQUE FOR THIS JAVA COMPONENT -->

<!-- Test -->
@@ -403,7 +403,7 @@ under the License.
This profile is only active when the property "m2e.version" is set,
which is the case when building in Eclipse with m2e.
The ignore below tells m2eclipse to skip the execution.
- -->
+
<profile>
<id>m2e</id>
<activation>
@@ -442,6 +442,7 @@ under the License.
</pluginManagement>
</build>
</profile>
+ -->

<profile>
<id>strict</id>
3 changes: 3 additions & 0 deletions src/main/java/org/apache/datasketches/ByteArrayBuilder.java
@@ -36,6 +36,9 @@ public class ByteArrayBuilder {
private int count_ = 0;
private int capacity_;

+ /**
+ * Constructor, no arguments
+ */
public ByteArrayBuilder() {
this(1024);
}
2 changes: 2 additions & 0 deletions src/main/java/org/apache/datasketches/Files.java
@@ -51,6 +51,8 @@ public final class Files {
private static final String LS = System.getProperty("line.separator");
private static final byte CR = 0xD;
private static final byte LF = 0xA;
+
+ /** DEFAULT_BUFSIZE */
public static final int DEFAULT_BUFSIZE = 8192;

// Common IO & NIO file methods