Releases: google/highway

1.2.0

31 May 17:04
  • Add InterleaveEven/InterleaveOdd, BitShuffle, GatherIndexNOr
  • Add IsNegative, IfNegativeThenElseZero, IfNegativeThenZeroElse
  • Add NEON_BF16, HWY_VERSION_GE/LT, HWY_EXPORT_T/HWY_DYNAMIC_DISPATCH_T
  • Add PromoteInRangeTo/ConvertInRangeTo/DemoteInRangeTo
  • Add Rol/Ror, RotateLeft/RotateLeftSame/RotateRightSame
  • Add SatWidenMulPairwiseAccumulate, SatWidenMulAccumFixedPoint
  • Add stats.h, bit_set.h, IsEitherNaN
  • Add UI8/UI32/UI64 MulHigh, I64 MulEven/MulOdd/Mul128
  • Add WidenMulAccumulate, MulEvenAdd, MulOddAdd
  • contrib/bit_pack: support 32/64-bit lanes
  • contrib/math: Add Exp2, Hypot
  • contrib/matvec: Add MatVecAdd
  • contrib/sort: Add VQ/HeapSelect, partial sort
  • contrib/topology: add affinity, detect topology/cache size/CPU name
  • Enable runtime dispatch for NEON/RVV, bazel modules, abort handler
  • Remove DASSERT for negative Gather indices
  • Support opting out of GUnit dependency
  • Use SPR/ZEN4 bf16 dot product
  • Known GCC 13 RVV issue: parts of sort_test and bit_pack_test disabled
  • Known Clang RVV/QEMU issue: incorrect rounding mode in upper/lower halves
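The UI8/UI32/UI64 MulHigh entry extends MulHigh to more integer lane sizes. Per lane, MulHigh returns the upper half of the exact double-width product. The sketch below is a scalar model of that per-lane result for unsigned 8- and 32-bit lanes. It is not Highway code: `MulHighU32Ref`, `MulHighU8Ref` and `MulHighLanes` are hypothetical helpers, and a `std::vector` stands in for a vector register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model, not Highway code: the upper 32 bits of the
// exact 64-bit product of two unsigned 32-bit lanes.
inline uint32_t MulHighU32Ref(uint32_t a, uint32_t b) {
  const uint64_t product = static_cast<uint64_t>(a) * b;  // cannot overflow
  return static_cast<uint32_t>(product >> 32);
}

// Same idea for unsigned 8-bit lanes: keep the upper 8 bits of the 16-bit product.
inline uint8_t MulHighU8Ref(uint8_t a, uint8_t b) {
  const uint16_t product = static_cast<uint16_t>(a * b);  // at most 255 * 255 = 65025
  return static_cast<uint8_t>(product >> 8);
}

// Applies the 32-bit model to every lane, as a vector op would.
inline std::vector<uint32_t> MulHighLanes(const std::vector<uint32_t>& a,
                                          const std::vector<uint32_t>& b) {
  assert(a.size() == b.size());
  std::vector<uint32_t> out(a.size());
  for (size_t i = 0; i < a.size(); ++i) out[i] = MulHighU32Ref(a[i], b[i]);
  return out;
}
```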

1.1.0

18 Feb 01:33
  • Add BitCastScalar, DispatchedTarget, Foreach
  • Add Div/Mod and MaskedDiv/ModOr, SaturatedAbs, SaturatedNeg
  • Add InterleaveWholeLower/Upper, Dup128VecFromValues
  • Add IsInteger, IsIntegerLaneType, RemoveVolatile, RemoveCvRef
  • Add MaskedAdd/Sub/Mul/Div/Gather/Min/Max/SatAdd/SatSubOr
  • Add MaskFalse, IfNegativeThenNegOrUndefIfZero, PromoteEven/OddTo
  • Add ReduceMin/Max, 8-bit reductions, f16 <-> f64 conversions
  • Add Span, AlignedArray, matrix-vector mul
  • Add SumsOf2/4, I8 SumsOf8, SumsOfAdjQuadAbsDiff, SumsOfShuffledQuadAbsDiff
  • Add ThreadPool, hierarchical profiler
  • Build: use bazel_platforms
  • Enable clang16 Arm/PPC runtime dispatch, F16 for GCC AVX3_SPR
  • Extend Dot to f32*bf16, FMA to integer
  • Fix: RVV 8-bit overflow, UB in vqsort, big-endian bugs, PPC HTM
  • Improved codegen in various ops, fp16/bf16 tests and conversions
  • New targets: HWY_Z14, HWY_Z15
  • Test: add foreign_arch builders, CodeQL
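SaturatedAbs and SaturatedNeg, added in 1.1.0, only differ from the wrapping result for the most negative value, whose negation does not fit in the lane type. The following is a scalar sketch of that rule for signed integer lanes. `SaturatedNegRef` and `SaturatedAbsRef` are hypothetical helpers, not the library's API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical scalar model of SaturatedNeg: negating the minimum value
// would overflow, so it saturates to the maximum instead of wrapping.
template <typename T>
T SaturatedNegRef(T a) {
  static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                "signed integer lanes only");
  if (a == std::numeric_limits<T>::min()) return std::numeric_limits<T>::max();
  return static_cast<T>(-a);
}

// Hypothetical scalar model of SaturatedAbs: reuses the saturating negation,
// so the absolute value of the minimum is the maximum.
template <typename T>
T SaturatedAbsRef(T a) {
  return a < 0 ? SaturatedNegRef(a) : a;
}
```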

1.0.7

30 Aug 07:06
  • Add LoadNOr, GatherIndexN, ScatterIndexN
  • Add additional float<->int conversions
  • Codegen improvements for 8-bit shift, PPC Compress/Expand
  • Fixes for MSVC, PPC, RVV, WASM, GCC 13, GCC 8.2, i686, f16 type, QEMU 7.2
  • Support CMake args in Debian packaging
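LoadNOr (1.0.7) is a bounds-aware load. It reads at most `max_lanes_to_load` lanes from memory and fills the remaining lanes from a fallback vector `no`. The scalar model below assumes that the template parameter `N` stands in for `Lanes(d)`. `LoadNOrRef` is a hypothetical name, and the real op takes a descriptor `d` rather than operating on a `std::array`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of LoadNOr; N stands in for Lanes(d).
// The first min(N, max_lanes_to_load) lanes come from p, the rest from `no`.
// Memory beyond those lanes is never read.
template <typename T, size_t N>
std::array<T, N> LoadNOrRef(const std::array<T, N>& no, const T* p,
                            size_t max_lanes_to_load) {
  const size_t n = std::min(N, max_lanes_to_load);
  assert(n == 0 || p != nullptr);
  std::array<T, N> out = no;  // start from the fallback lanes
  for (size_t i = 0; i < n; ++i) out[i] = p[i];
  return out;
}
```

With an all-zero `no`, the same model describes LoadN from 1.0.6. A typical use is the tail of a loop, where fewer than `Lanes(d)` elements remain and reading past the end of the buffer must be avoided.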

1.0.6

11 Aug 15:01
  • Add MaskedGatherIndex, MaskedScatterIndex, LoadN, StoreN
  • Add SatWidenMulPairwiseAdd, SumOfMulQuadAccumulate, PromoteUpperLowerTo
  • Add F64 for Wasm, F64 AbsDiff
  • Add F16 support to AVX3_SPR, RVV tuple (both not yet enabled)
  • Validate all D args in x86 function signatures
  • License: now dual Apache2/BSD3
  • Doc: new users, vcpkg install instructions, AVX10 plans
  • Doc: advice on dynamic dispatch plus -march flags
  • Build: avoid installing hwy_test if !HWY_ENABLE_TESTS
  • Codegen: improved PPC9 Find*True, variable-length CopyBytes
  • Fix: GCC 8.2, MSVC, ICC, PPC9, SVE, arm64 MSVC issues
  • Fix: IfNegativeThenElse, MulFixedPoint15, Debian changelog format
  • Tests: faster builds (split up), use release builds
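SatWidenMulPairwiseAdd (1.0.6) widens 8-bit lanes to 16-bit products, adds each adjacent pair, and saturates the sum. The sketch below assumes an unsigned `a` and a signed `b`, the same pairing as x86's PMADDUBSW. `SatWidenMulPairwiseAddRef` is a hypothetical scalar model, not Highway's signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical scalar model, assuming unsigned 8-bit a and signed 8-bit b:
// out[i] = saturate_i16(a[2i] * b[2i] + a[2i+1] * b[2i+1]).
inline std::vector<int16_t> SatWidenMulPairwiseAddRef(const std::vector<uint8_t>& a,
                                                      const std::vector<int8_t>& b) {
  assert(a.size() == b.size() && a.size() % 2 == 0);
  constexpr int32_t kMin = std::numeric_limits<int16_t>::min();
  constexpr int32_t kMax = std::numeric_limits<int16_t>::max();
  std::vector<int16_t> out(a.size() / 2);
  for (size_t i = 0; i < out.size(); ++i) {
    // Widen before multiplying so neither the products nor the sum can overflow.
    const int32_t sum =
        int32_t{a[2 * i]} * b[2 * i] + int32_t{a[2 * i + 1]} * b[2 * i + 1];
    out[i] = static_cast<int16_t>(sum < kMin ? kMin : (sum > kMax ? kMax : sum));
  }
  return out;
}
```

Two products of 255 and 127 already sum to 64770, well above `INT16_MAX`, so the saturation step is not just a corner case.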

1.0.5

19 Jul 16:10
  • Add Insert/ExtractBlock, BroadcastBlock/Lane, NumBlocks
  • Add integer Le/Ge and [Neg]MulAdd, extend DemoteTo/PromoteTo
  • Add Leading/TrailingZeroCount, HighestSetBitIndex, ReverseBits
  • Add MaskedLoadOr, tuple Get/Set/Create, ReduceSum, WidenMulPairwiseAdd
  • Add [ZeroExtend]ResizeBitCast, BitwiseIfThenElse, Find[Known]LastTrue
  • Add AESRoundInv, AESKeyGenAssist
  • Add contrib/math Atan2/SinCos, contrib/unroller
  • Add fp16/bf16 support (Armv8, SVE, RVV), HWY_DYNAMIC_POINTER
  • Add OrderedTruncate2To, Per4LaneBlockShuffle, TwoTablesLookupLanes
  • Add SlideUp/Down[Blocks/Lanes], Slide1Up/Down, ReverseLaneBytes
  • Add SetBeforeFirst, SetAtOrBefore/AfterFirst, SetOnlyFirst
  • Add 8-bit Reverse2/4/8, Shl/Shr, RotateRight, Reverse, Mul
  • Add 8/16-bit DupEven/Odd, TableLookupLanes
  • Add F64 ApproximateReciprocal[Sqrt], 32/64-bit SaturatedAdd/Sub
  • Build: Support Bazel modules
  • Codegen improvements
  • Compiler: support Clang 15/16
  • Doc: add Github pages, support policy, evaluation
  • Doc: publish AVX-512 throttling/startup findings
  • Release: add signing
  • Test: add GCC to Github Actions
  • VQSort: small N speedups: fix seeding, func ptr, 8-wide network
  • VQSort: add BenchAllColdSort, VQSortStatic
  • VQSort: fix subnormal/inf/NaN, support fp16, fix KV types
  • Workarounds: RVV VXRM, x87 excess precision, missing intrinsics
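1.0.5 adds lane-granular slides. Below is a scalar model of SlideUpLanes and SlideDownLanes. It assumes that `N` stands in for `Lanes(d)` and that vacated lanes are zero-filled. The `*Ref` helpers are hypothetical. Slide1Up and Slide1Down correspond to `amt == 1`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of SlideUpLanes: out[i] = v[i - amt] for i >= amt,
// and the lowest amt lanes become zero.
template <typename T, size_t N>
std::array<T, N> SlideUpLanesRef(const std::array<T, N>& v, size_t amt) {
  assert(amt <= N);
  std::array<T, N> out{};  // value-initialized: all lanes zero
  for (size_t i = amt; i < N; ++i) out[i] = v[i - amt];
  return out;
}

// Hypothetical scalar model of SlideDownLanes: out[i] = v[i + amt] while that
// index is in range, and the highest amt lanes become zero.
template <typename T, size_t N>
std::array<T, N> SlideDownLanesRef(const std::array<T, N>& v, size_t amt) {
  assert(amt <= N);
  std::array<T, N> out{};
  for (size_t i = 0; i + amt < N; ++i) out[i] = v[i + amt];
  return out;
}
```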

1.0.4

17 Mar 15:33
  • Add PPC8..10, SSE2, AVX3_ZEN4, NEON_WITHOUT_AES targets
  • Add Expand, LoadExpand, integer AbsDiff, SumsOf8AbsDiff
  • Improved Half/Twice support, codegen for Shift*Same
  • Support Wasm in Godbolt
  • Faster KV128 sorting
  • Fix armv7 build config, CMake config mode
  • Update RVV intrinsics for 1.0-draft
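Expand (1.0.4) is the inverse of Compress. Consecutive lanes of `v` go to the positions where the mask is true, and all other lanes become zero. The following is a scalar sketch, with `ExpandRef` as a hypothetical helper and `N` standing in for `Lanes(d)`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of Expand: selected lanes receive v[0], v[1], ...
// in order; unselected lanes are zero.
template <typename T, size_t N>
std::array<T, N> ExpandRef(const std::array<T, N>& v, const std::array<bool, N>& mask) {
  std::array<T, N> out{};
  size_t next = 0;  // index of the next source lane to consume
  for (size_t i = 0; i < N; ++i) {
    if (mask[i]) out[i] = v[next++];
  }
  return out;
}
```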

1.0.3

19 Jan 15:20
  • Add RearrangeToOddPlusEven, Xor3, 8-bit CompressStore, HWY_ASSUME
  • Add contrib/bit_pack for 8/16-bit lanes
  • Add WASM_EMU256 target
  • Documentation improvements
  • Allow opting out of C++ stdlib usage for Compiler Explorer
  • Update for new RVV intrinsics; faster WASM min/max and extmul/q15mul
  • Fix UB, GCC atomic
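Xor3 (1.0.3) combines two XORs into one op. Per lane it computes `a ^ b ^ c`. On targets with a three-input logic instruction, such as AVX-512's VPTERNLOG, this can be a single instruction instead of two. The following is a scalar model with a hypothetical `Xor3Ref` helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of Xor3: each output lane is a ^ b ^ c.
inline std::vector<uint32_t> Xor3Ref(const std::vector<uint32_t>& a,
                                     const std::vector<uint32_t>& b,
                                     const std::vector<uint32_t>& c) {
  assert(a.size() == b.size() && b.size() == c.size());
  std::vector<uint32_t> out(a.size());
  for (size_t i = 0; i < a.size(); ++i) out[i] = a[i] ^ b[i] ^ c[i];
  return out;
}
```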

1.0.2

28 Oct 11:05
  • Add ExclusiveNeither, FindKnownFirstTrue, Ne128
  • Add 16-bit SumOfLanes/ReorderWidenMulAccumulate/ReorderDemote2To
  • Faster sort for low-entropy input, improved pivot selection
  • Add GN build system, Highway FAQ, k32v32 type to vqsort
  • CMake: Support find_package(GTest), add rvv-inl.h, add HWY_ENABLE_TESTS
  • Fix MIPS and C++20 build, Apple LLVM 10.3 detection, EMU128 AllTrue on RVV
  • Fix missing exec_prefix, RVV build, warnings, libatomic linking
  • Work around GCC 10.4 issue, disabled RDCYCLE, arm7 with vfpv3
  • Documentation/example improvements
  • Support static dispatch to SVE2_128 and SVE_256
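FindKnownFirstTrue (1.0.2) differs from FindFirstTrue in its precondition. The caller guarantees that at least one mask lane is true, so the result is a plain `size_t` index rather than an `intptr_t` that may be -1. The following is a scalar sketch. The `*Ref` helpers are hypothetical, and the real ops also take a descriptor `d`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of FindFirstTrue: index of the first true lane,
// or -1 if no lane is true.
template <size_t N>
intptr_t FindFirstTrueRef(const std::array<bool, N>& mask) {
  for (size_t i = 0; i < N; ++i) {
    if (mask[i]) return static_cast<intptr_t>(i);
  }
  return -1;
}

// Hypothetical scalar model of FindKnownFirstTrue: same search, but the
// caller promises a true lane exists, so the result is always a valid index.
template <size_t N>
size_t FindKnownFirstTrueRef(const std::array<bool, N>& mask) {
  const intptr_t idx = FindFirstTrueRef(mask);
  assert(idx >= 0 && "precondition: at least one lane is true");
  return static_cast<size_t>(idx);
}
```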

1.0.1

24 Aug 16:43
  • Add Eq128, i64 Mul, unsigned->float ConvertTo
  • Faster sort for few unique keys, more robust pivot selection
  • Fix: floating-point generator for sort tests, Min/MaxOfLanes for i16
  • Fix: avoid always_inline in debug, link atomic
  • GCC warnings: string.h, maybe-uninitialized, ignored-attributes
  • GCC warnings: preprocessor int overflow, spurious use-after-free/overflow
  • Doc: <=HWY_AVX3, Full32/64/128, how to use generic-inl
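Eq128 (1.0.1) treats each adjacent pair of 64-bit lanes as one unsigned 128-bit value. The lower half is in the even lane and the upper half in the odd lane, and both mask lanes of a pair receive the comparison result. The following is a scalar model with a hypothetical `Eq128Ref` helper, using `N` in place of `Lanes(d)`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of Eq128: lanes (2i, 2i+1) form one 128-bit value
// (lower half in lane 2i). Both mask lanes of a pair hold the same result.
template <size_t N>
std::array<bool, N> Eq128Ref(const std::array<uint64_t, N>& a,
                             const std::array<uint64_t, N>& b) {
  static_assert(N % 2 == 0, "64-bit lanes come in pairs");
  std::array<bool, N> out{};
  for (size_t i = 0; i < N; i += 2) {
    const bool eq = a[i] == b[i] && a[i + 1] == b[i + 1];
    out[i] = eq;
    out[i + 1] = eq;
  }
  return out;
}
```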

1.0.0

27 Jul 14:56
  • ABI change: 64-bit target values, more room for expansion
  • Add CompressBlocksNot, CompressNot, Lt128Upper, Min/Max128Upper, TruncateTo
  • Add HWY_SVE2_128 target
  • Sort speedups especially for 128-bit
  • Documentation clarifications
  • Faster NEON CountTrue/FindFirstTrue/AllFalse/AllTrue
  • Improved SVE codegen
  • Fix u16x8 ConcatEven/Odd, SSSE3 i64 Lt
  • MSVC 2017 workarounds
  • Support for runtime dispatch on Arm/GCC/Linux
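TruncateTo (1.0.0) narrows lanes by discarding their upper bits, so out-of-range values wrap. The saturating integer DemoteTo clamps them instead. The following is a scalar model of both behaviors for u32 to u8. `TruncateToU8Ref` and `SaturatingNarrowU8Ref` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of TruncateTo (u32 -> u8): keep the low 8 bits,
// i.e. the value modulo 256.
inline std::vector<uint8_t> TruncateToU8Ref(const std::vector<uint32_t>& v) {
  std::vector<uint8_t> out(v.size());
  for (size_t i = 0; i < v.size(); ++i) out[i] = static_cast<uint8_t>(v[i] & 0xFFu);
  return out;
}

// Hypothetical scalar model of a saturating narrowing (DemoteTo-style) for
// comparison: values above 255 clamp to 255.
inline std::vector<uint8_t> SaturatingNarrowU8Ref(const std::vector<uint32_t>& v) {
  std::vector<uint8_t> out(v.size());
  for (size_t i = 0; i < v.size(); ++i) {
    out[i] = static_cast<uint8_t>(v[i] > 255u ? 255u : v[i]);
  }
  return out;
}
```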

The 1.0 release signals an increased focus on backwards compatibility.
Applications using documented functionality will remain compatible with future updates that have the same major version number.