Specialization for Scala 3 #18044
Replies: 4 comments 2 replies
-
|
It's an interesting scheme, though I think a worked-through example would help clarify some of the details and potential rough edges. In particular, I'm not sure that it interacts well enough with opaque types to be clearly superior to the tedious but somewhat-workable alternative of inlining-based specialization by hand. A great deal of the pointless boilerplate that was previously required by hand-based specialization can be avoided by judicious use of inlining. For instance, if you have sealed abstract class Thing[A, B] {
def foo(a: A, b: B): Thing[A, B]
}and final class ThingIntLong extends Thing[A, B] {
def foo(a: Int, b: Long): ThingIntLong = ...
}you can very often write an inline function final class ThingIntLong() extends Thing[A, B] {
def foo(a: Int, b: Long): ThingIntLong = fooImpl[Int, Long, ThingIntLong](a, b, this, (i: Int, l: Long) => ..., ...)
}This is still a lot of boilerplate. But it's far less boilerplate than manually specializing everything by hand, and it's very flexible, so it raises the bar for what specialization needs to offer. The key is, if you do it this way, if you have Anyway, at least for me, it's very hard to judge whether this type of specialization would be worth it without some examples that let us judge whether the assumption that we won't cross abstraction boundaries too much seems to be true in practice. For instance, one of the problems that image processing libraries tend to run into is that pixel-by-pixel access with the right specialized data type for the pixel, from at least two images at once, is essential for performance. With Scala 3, you can do that with inlining, but it isn't clear to me whether the above specialization scheme would allow it. |
Beta Was this translation helpful? Give feedback.
-
|
Yes, the way I look at it are two complementary things:
|
Beta Was this translation helpful? Give feedback.
-
|
any updates on this? actually boxing cause a lot of performance hit and memory allocation hit if lack of specialized, I'm trying out some HFT ideas, and need a way to eliminate the boxing given most of the calculation is pure numeric, and manually create specialized impl for all number types is not reliable. |
Beta Was this translation helpful? Give feedback.
-
|
The more I replace reliance on specialized, which is hard to predict, with inlined functions that range over primitives, the happier I am. It doesn't cover every conceivable scenario, but it covers a large enough fraction that one can manually create traits that cover the rest of the use cases when they're really important. Note that because of SAM adaptation, you can That was already in Scala 2, incidentally. But combined with inline functions in Scala 3, I really don't miss user-land specialization at all for my own high-performance code. It's faster than ever. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Specialization for Scala 3
Specialization is one of the few remaining desirable features from Scala 2 that's are as yet missing in Scala 3. We could try to port the Scala 2 scheme, which would be non-trivial since the implementation is quite complex. But that scheme is problematic enough to suggest that we also look for alternatives. A possible alternative is described in this note. It is meant to complement the proposal on inline traits. That proposal also contains a more detailed critique of Scala 2 specialization.
The main problem of Scala-2 specialization is code bloat. We have to pro-actively generate up to 11 copies of functions and classes when they have a specialized type parameter, and this grows exponentially with the number of such type parameters. Miniboxing tries to reduce the number under the exponent from ~10 to 3 or 4, but it has problems dealing with arrays.
Languages like C++, Rust, Go, D, or Zig avoid the proactive generation of all possible specializations by monomorphizing the whole program. This means we only need to generate a specialized version of a function or class if it is actually used in the program. On the other hand, a global monomorphization can lead itself to code bloat and long compile times. It is also a problematic choice for binary APIs.
This note discusses a different scheme to get specialization for Scala 3, which is somewhat between Scala 2's selective specialization and full monomorphization. As in Scala 2, specialized type parameters are tagged explicitly (but not with a annotation). But as for monomorphization, specializations are only generated if a specialized type is referenced in the program. To make this work efficiently, we need a way to transport information about possible specialization types through generic code (full monomorphization does not need that since it eliminates all generic code).
We do that using a type class
Specializedthat is typically used as a context bound on a type parameter of some class. It indicates that we want to create specialized versions of that class where the type parameter is instantiated to the type argument. The specialized versions offer optimization opportunities compared to the generic class.Example
As an example, consider a
Vecclass for vectors over a numeric type.The idea is that we want to specialize vectors on the type parameter
Tin order to get important efficiency gains, including the following:arrspecialized to the actual element instead of a fully generic array that has to be accessed via reflectionresultscalarProductNumericclass instance forT, so that calls tonum's methods have static targets and can be inlined.Terminology and Restrictions
A class such as
Vecwhich has some type parameterXthat comes with a context parameter of typeSpecialized[X]is called a specialized class, and its type parameterXis called a specialized type parameter.Restrictions:
legal if the class is translated to an inline trait.
The types which can be substituted for a
Specializedparameter areAnyRefor a primitive typeByte,Short,Char,Int,Long,Float,Double,Boolean,Unit. We call these types the specializable argument types.Besides
Specialized, there is also aSpecializedAttype class which allows to specify the specializable argument types explicitly. Given a context boundX: SpecializedAt[T_1, ..., T_n]the specializable argument types forXareT1, ..., Tn. As forSpecialized, these types must be primitive types orAnyRef.Examples:
[X: SpecializedAt[Long | Double | AnyRef]]areLong,Double, andAnyRef.[X: SpecializedAt[Int]]consist of justInt.(The precise definition of specializable argument types can be changed without changing the essence of the proposal, we keep it here for concreteness).
A Closer Look At Specialized
Specializedarguments can be inter-converted withClassTags. For instanceSpecializedcould be a subclass ofClassTagand an additional given could produce aSpecializedvalue from a classtag.Specializeddiffers fromClassTagmainly in that it has some additional fields that help make specialization efficient. For instance, it could carry a tag field that maps every specializable type to a valid index, and maps all other types to a sentinel number.A concrete implementation of
SpecializedandSpecializedAtcould be as follows:Note that the first
Sparameter ofSpecializedAtis a phantom type that is not used on the right hand side. In fact the type argument forSis only used to determine the specialized argument types for a type variable with theSpecializedAtannotation. When it comes to parameter passing, allSpecializedAtdictionaries are equivalent toSpecialized.Expansion of Specialized Classes
A general form of a specialized class modelled after
Vecis:We will use this class as an example to explain how specialized classes are compiled. In general, we can have
X,a,Specializedcontext parameters likes, each over a separate type parameter,bthat take specialized type parameters as arguments.A specialized class expands to several artifacts.
First, there is an inline trait with the parents and body as given in the class.
For instance, here is the inline trait for class
Cabove:Second, there are zero or more specialized versions of the class.The name of the specialized version is of the form
C$sp$NwhereCis the specialized class andNis a tag of a specialized argument type. We propose for now to use the same naming schemes as for Scala 2 specialization, but this aspect could be modified.The definition of the specialized version depends on whether the specialized argument
Tis final or not. IfTis final, we determine a canonical implementations of each typeclass argument overTusingIn this case, we assume that any specialized typeclass arguments like
bwill be the same whereeverC[T]gets instantiated. We test that this assumption is correct at least at the toplevel by comparing the class of any specialized instantiation argument with the class of the corresponding argument with whichC$sp$Twas first instantiated. Fixing type class arguments in this way creates important potential for optimizations when inlining traitCatT.If
Tis not final, we expand instead to:If
Tis not final, we can still specialize the class to an element type upper-bounded byT. This could be important in some cases. For instance, if classCis specialized atAnyRefand there are generic arrays of typeArray[X]inC, those arrays would be specialized to anArray[Object]representation. This is much more efficient than an unconstrainedArray[X]which needs reflection for element access.On the other hand, we cannot fix the type class argument
bsince different subtypes ofTmight have different instances, even if we assume global coherence.Third, there is a generic implementation of the class which is used for all type arguments that are not specializable. Here is the one for
C:Creating Specialized Versions
Scala erasure maps the type
C[T]toC$sp$SorC$sp$S[T]ifTconforms to a specializable argumentSforC. This step is important for performance. Without it, any interaction with classCwould go through the inefficient generic API ofCthat relies on boxing.To make this erasure work, the compiler needs to create a specialized version of
Cfor every specializable argument typeSwhereC[S]is referred to in a program. A compiler can scan its classpath to see whether a required specialized version already exists. If none exists, the specialized version of classCatSwill be created on the fly. This typically involves reading the inline trait forCfrom Tasty and expanding it in the freshly created classC$sp$S.Now consider a construction of a specializable class, say
new C[T](a)(using s, b). IfTis statically known to conform to a specializable typeS, we can rewrite this tonew C$sp$S(a)ornew C$sp$[T](a)(using s, b)depending on whetherSis final or not. IfTis a monomorphic type that does not conform to any specializable typeS, we rewrite tonew C$generic[T](a)(s, b)instead.That leaves type variables
Twhere the actual runtime instance is unknown at compile time. In this case we must still create the appropriate value of classC$sp$Tin the case where at runtime theSpecializedvaluesindicatesTis a specializableargument for
C. This is necessary, since otherwise it would not be sound to erase all occurrences of typeC[T]to their specialized versions.We do this by implementing
newexpressions as calls to a special factory method in the companion object of the specializable class. For instance, the new expression above would be rewritten toC.newInstance(s)[T](a, s, b), where the companion object ofCcontains a factory methodnewInstancedefined like this:The implementation of
newInstancedepends on whether we assume static linking or not.With static linking, the linker could generate code that maps every
Specialized[T]entry whereTis a specializable argument forCmentioned in the code to the functionSpecializedentries could contain consecutive tags for specializable types and a separate tag for other types, so this mapping could be a simple table lookup on the tag.If we don't assume static linking, we can get equivalent functionality using reflection. For instance, on the JVM, we can implement
newInstanceby callingClass.forNamewith the name of the specialized version corresponding to a type and the class loader oftrait
Citself. If such a class exists, we can then extract the calling function by getting a method handle on the constructor.If the specialized version does not exist we would instead return the creation method for the generic version of
C(this could happen if classes are created only from generic code, but never mentioned explicitly at the specialized type in the source). We can use hashes to make sure that any specialized version found matches the version of the inline trait it implements. We would cache these computations in a lambda factory so all calls after the first one are simple lambda invocations.AOT Specialization
Our scheme does specialization on demand. A type
C[T]gets specialized ifTis a specializable argument forCand the type is mentioned in the program (either explicitly or inferred).A typical use case is where a library defines specializable generic classes and then applications using that library would use these classes at specific argument types. Then the classes would be specialized on demand in the binary code of such applications.
We might also want to define some common specializations that are generated with the library and thus can be shared by all applications using them. To do this it is enough to mention the specialized type in the library at any point whatsoever.
As a convenience feature, we can define a method
Then libraries can force specializations by calling
specializedlike this:Discussion
Upsides of this proposal relative to Scala 2 specialization:
Numericwhich could be very expensive if compiled as is.Downsides:
Class.forNameor a linking step.The characteristics of the new scheme mean that it is ideally suited for larger classes such as
The larger the specialized code body is, the more likely it is that we will not cross abstraction boundaries requiring boxing, and the fewer lookups are needed to find the right constructor. On the other hand, for Scala 2 specialization, exponential code blow up makes all these usage scenarios unrealistic. So Scala 2 specialization is best used for small classes and traits such as functions and tuples.
Scala 3 already supports specialized functions in the Scala 2 style. It should not be too hard to support specialized tuples in that style as well. For most other things, the scheme described here looks like a better option.
Beta Was this translation helpful? Give feedback.
All reactions