Optimize HashContainer until the performance is on par with ES6 #105

ZLY201 · 2022-09-07T14:14:13Z

ZLY201
Sep 7, 2022
Maintainer

ZLY201 · 2022-10-04T13:11:55Z

ZLY201
Oct 4, 2022
Maintainer Author

Draft

I have to saw the source code of chorme v8 engine.

To really optimize hash container we need to know the principle of Js-array.

Consider the following code:

console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = new Array(num);
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 11.60986328125 ms

console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = new Array(num);
arr[arr.length + num] = 1;
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 619.56884765625 ms

console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = {};
arr[arr.length + num] = 1;
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 81.559814453125 ms

The length only increased one but the running time became ten times over.

The reason is the array in js has two mode: fast and slow.

The fast mode will create a memory contiguous array in C++.

But the slow mode will create a HashTable to reduce the memory.

You know all the members in js is Object including Array.

When you change the length of Array directly it will be changed to a HashTable like { 0: xxx, 1: xxx, ... }.

But in Java and C++, the array is always contiguous.

So we don't need to make the reAllocate function to expand or save the memory.

Just change the hashTable's length to (1 << 30).

Test

Do a simple verification of the above theory.

// After change.
import { HashSet } from 'js-sdsl';
const testNum = 1000000;
function generateRandomString() {
    const length = Math.max(Math.floor(Math.random() * 6), 1);
    const base = 'a'.charCodeAt(0);
    let str = '';
    for (let i = 0; i < length; ++i) {
        const random = Math.floor(Math.random() * 26);
        str += String.fromCharCode(base + random);
    }
    return str;
}
const arr = [];
for (let i = 0; i < testNum; ++i) {
    arr.push(generateRandomString());
    arr.push(Math.random() * testNum);
    arr.push(i);
}
const st = new Set();
console.time('ES6');
for (let i = 0; i < testNum; ++i) {
    st.add(arr[i]);
}
console.timeEnd('ES6');
const hst = new HashSet<number>();
console.time('js-sdsl');
for (let i = 0; i < testNum; ++i) {
    hst.insert(arr[i]);
}
console.timeEnd('js-sdsl');
// ES6: 552.02ms
// js-sdsl: 999.697ms

Much faster than before.

I'll do memory test later.

0 replies

ZLY201 · 2022-10-04T13:28:50Z

ZLY201
Oct 4, 2022
Maintainer Author

About hash function

I find the source hash function for Integer in v8.

inline uint32_t ComputeIntegerHash(uint32_t key, uint32_t seed) {
  uint32_t hash = key;
  hash = hash ^ seed;
  hash = ~hash + (hash << 15);  // hash = (hash << 15) - hash - 1;
  hash = hash ^ (hash >> 12);
  hash = hash + (hash << 2);
  hash = hash ^ (hash >> 4);
  hash = hash * 2057;  // hash = (hash + (hash << 3)) + (hash << 11);
  hash = hash ^ (hash >> 16);
  return hash & 0x3fffffff;
}

That's easy to implement.

But the hash function of object it use the address of the pointer.

int DescriptorLookupCache::Hash(Object* source, Name* name) {
  // Uses only lower 32 bits if pointers are larger.
  uint32_t source_hash =
      static_cast<uint32_t>(reinterpret_cast<uintptr_t>(source)) >>
      kPointerSizeLog2;
  uint32_t name_hash = name->hash_field();
  return (source_hash ^ name_hash) % kLength;
}

I have no idea how to implement it.

It seems like we have no way to reduce the time to get the hash code.

0 replies

ZLY201 · 2022-10-04T14:42:04Z

ZLY201
Oct 4, 2022
Maintainer Author

before

=================================== HashSet ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │  1537   │
│    1    │      'insert'       │ 1000000 │    2000000    │  5486   │
│    2    │       'find'        │ 2000000 │    2000000    │  2380   │
│    3    │ 'eraseElementByKey' │ 2000000 │    2000000    │  1703   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== HashMap ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │  2308   │
│    1    │    'setElement'     │ 1000000 │    1000000    │  4395   │
│    2    │  'getElementByKey'  │ 1000000 │    1000000    │  3872   │
│    3    │ 'eraseElementByKey' │ 1000000 │    1000000    │  4138   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== Report ===================================

after

=================================== HashSet ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │  1064   │
│    1    │      'insert'       │ 1000000 │    2000000    │  1140   │
│    2    │       'find'        │ 2000000 │    2000000    │  4839   │
│    3    │ 'eraseElementByKey' │ 2000000 │    2000000    │  2911   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== HashMap ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │      testFunc       │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│    0    │    'constructor'    │    1    │    1000000    │   988   │
│    1    │    'setElement'     │ 1000000 │    1000000    │   271   │
│    2    │  'getElementByKey'  │ 1000000 │    1000000    │  1205   │
│    3    │ 'eraseElementByKey' │ 1000000 │    1000000    │   863   │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== Report ===================================

0 replies

ZLY201 · 2022-10-11T06:43:37Z

ZLY201
Oct 11, 2022
Maintainer Author

I find that slow mode for Array is really slow.

I think we should find a new way to solve this problem.

The best idea I consider is that change hashTable type to Node<K, V>[] like java.

0 replies

ZLY201 · 2022-10-11T19:58:47Z

ZLY201
Oct 11, 2022
Maintainer Author

The source code of ES6 Set: https://codereview.chromium.org/220293002/diff/20001/src/objects.cc.

0 replies

noname0310 · 2022-10-20T11:08:34Z

noname0310
Oct 20, 2022
Collaborator

About hash function

I find the source hash function for Integer in v8.

inline uint32_t ComputeIntegerHash(uint32_t key, uint32_t seed) {
  uint32_t hash = key;
  hash = hash ^ seed;
  hash = ~hash + (hash << 15);  // hash = (hash << 15) - hash - 1;
  hash = hash ^ (hash >> 12);
  hash = hash + (hash << 2);
  hash = hash ^ (hash >> 4);
  hash = hash * 2057;  // hash = (hash + (hash << 3)) + (hash << 11);
  hash = hash ^ (hash >> 16);
  return hash & 0x3fffffff;
}

That's easy to implement.

But the hash function of object it use the address of the pointer.

int DescriptorLookupCache::Hash(Object* source, Name* name) {
  // Uses only lower 32 bits if pointers are larger.
  uint32_t source_hash =
      static_cast<uint32_t>(reinterpret_cast<uintptr_t>(source)) >>
      kPointerSizeLog2;
  uint32_t name_hash = name->hash_field();
  return (source_hash ^ name_hash) % kLength;
}

I have no idea how to implement it.

It seems like we have no way to reduce the time to get the hash code.

Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?

0 replies

ZLY201 · 2022-10-20T11:19:01Z

ZLY201
Oct 20, 2022
Maintainer Author

Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?

Do you mean { value: xxx, id: xxx }?

It will use object {} and it's a hash container too.

0 replies

noname0310 · 2022-10-20T14:49:27Z

noname0310
Oct 20, 2022
Collaborator

Wouldn't it be possible to create a hash value for a reference or something like that by putting the references in the JS set and giving them an ID?

Do you mean { value: xxx, id: xxx }?

It will use object {} and it's a hash container too.

const ObjectIdAllocator = new class ObjectIdAllocator {
    private _nextId = 0;
    private _map = new WeakMap<object, number>();

    public generateOrGetId(obj: object): number {
        let id = this._map.get(obj);
        if (id === undefined) {
            id = this._nextId;
            this._nextId += 1;
            this._map.set(obj, id);
        }
        return id;
    }
}

const obj1 = {
    a: 1,
    b: 2
};

const obj2 = {
    a: 1,
    b: 2
};

console.log(ObjectIdAllocator.generateOrGetId(obj1)); // 0
console.log(ObjectIdAllocator.generateOrGetId(obj2)); // 1

It seems that the hashing cost can be reduced by replacing the hashing function with an object that gives a unique ID according to the reference.

0 replies

ZLY201 · 2022-10-29T14:44:44Z

ZLY201
Oct 29, 2022
Maintainer Author

I carefully studied the internal implementation of Js v8, I think it is impossible to make the hash table perform better than the native object or set.

The reasons are as follows:

object is super fast, 70 times faster than the hash table I optimized.
The source code of js is written by c++, array in Js are not consistent with any other language. I can't use it as well as other static languages.
I can't do some low-level operations like getting the variable address. It make my hash function so slow.
If I want to optimize further, I may need to use some functions that some browsers do not support, but object is fully compatible for all the browsers.

I decided to hold off on this problem until I find a good solution.

It's worth mentioning that while I can't make it as fast as native set, it's not impossible to get performance that is orders of magnitude close.

If the object do something by 20ms I think I can make my hash table do it wthin 100ms.

Because the current hash table does not fully refer to the Java implementation due to some initial design flaws. I'll try to solve this problem as much as possible but it may take long long time to re-design js-sdsl. The version should be 5.x.

I think the only way to optimize the performance of the hash table as soon as possible is to encapsulate the native object or set. It turns out that popular hash libraries do this. But that should be discussed.

0 replies

ZLY201 · 2024-01-22T07:19:47Z

ZLY201
Jan 22, 2024
Maintainer Author

@noname0310 Hey, I send a email for you ([email protected]).

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Js-sdsl

Optimize HashContainer until the performance is on par with ES6 #105

{{title}}

Replies: 10 comments

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

About hash function

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Js-sdsl

Optimize HashContainer until the performance is on par with ES6 #105

ZLY201 Sep 7, 2022 Maintainer

Replies: 10 comments

ZLY201 Oct 4, 2022 Maintainer Author

Draft

Test

ZLY201 Oct 4, 2022 Maintainer Author

About hash function

ZLY201 Oct 4, 2022 Maintainer Author

ZLY201 Oct 11, 2022 Maintainer Author

ZLY201 Oct 11, 2022 Maintainer Author

noname0310 Oct 20, 2022 Collaborator

About hash function

ZLY201 Oct 20, 2022 Maintainer Author

noname0310 Oct 20, 2022 Collaborator

ZLY201 Oct 29, 2022 Maintainer Author

ZLY201 Jan 22, 2024 Maintainer Author

ZLY201
Sep 7, 2022
Maintainer

ZLY201
Oct 4, 2022
Maintainer Author

ZLY201
Oct 4, 2022
Maintainer Author

ZLY201
Oct 4, 2022
Maintainer Author

ZLY201
Oct 11, 2022
Maintainer Author

ZLY201
Oct 11, 2022
Maintainer Author

noname0310
Oct 20, 2022
Collaborator

ZLY201
Oct 20, 2022
Maintainer Author

noname0310
Oct 20, 2022
Collaborator

ZLY201
Oct 29, 2022
Maintainer Author

ZLY201
Jan 22, 2024
Maintainer Author