Replies: 10 comments
-
**Draft**

I have read the source code of the Chrome V8 engine. To really optimize the hash container we need to understand how JS arrays work. Consider the following code:

```js
console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = new Array(num);
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 11.60986328125 ms
```

Now write one element far past the end before filling the array:

```js
console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = new Array(num);
arr[arr.length + num] = 1; // writing far past the end converts the array to dictionary (slow) mode
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 619.56884765625 ms
```

And the same workload on a plain object instead of an array:

```js
console.time('test-js-array');
const num = 1024 * 1024 * 6;
const arr = {};
arr[arr.length + num] = 1; // arr.length is undefined on a plain object, so this just sets the key 'NaN'
for (let i = 0; i < num; ++i) arr[i] = i;
console.timeEnd('test-js-array');
// test-js-array: 81.559814453125 ms
```

Only one extra element was assigned, but the running time grew by roughly fifty times. The reason is that a JS array has two modes: fast and slow. In fast mode V8 backs the array with a contiguous block of memory in C++. In slow mode it backs the array with a dictionary, that is, a hash table; all members of a JS object are ultimately stored in hash tables. When you write to an index far beyond the current length, V8 gives up the contiguous representation and converts the array to dictionary mode, and every access after that pays the hash-table cost. But the same workload on a plain object, which is a dictionary from the start, runs in about 81 ms, far faster than the array that degraded into dictionary mode. So we don't need to keep the bucket storage in an Array at all; just change the sparse array to a plain object (see the sketch at the end of this comment).

**Test**

Do a simple verification of the above theory.

```ts
// After change.
import { HashSet } from 'js-sdsl';
const testNum = 1000000;
function generateRandomString() {
const length = Math.max(Math.floor(Math.random() * 6), 1);
const base = 'a'.charCodeAt(0);
let str = '';
for (let i = 0; i < length; ++i) {
const random = Math.floor(Math.random() * 26);
str += String.fromCharCode(base + random);
}
return str;
}
const arr: (string | number)[] = [];
for (let i = 0; i < testNum; ++i) {
arr.push(generateRandomString());
arr.push(Math.random() * testNum);
arr.push(i);
}
const st = new Set();
console.time('ES6');
for (let i = 0; i < testNum; ++i) {
st.add(arr[i]);
}
console.timeEnd('ES6');
const hst = new HashSet<string | number>();
console.time('js-sdsl');
for (let i = 0; i < testNum; ++i) {
hst.insert(arr[i]);
}
console.timeEnd('js-sdsl');
// ES6: 552.02ms
// js-sdsl: 999.697ms
```

Much faster than before. I'll do a memory test later.
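As promised above, here is a sketch of the storage idea (hypothetical code, not the actual js-sdsl internals): keep the buckets in a plain object keyed by hash index, so no pre-sized Array is ever forced into dictionary mode.

```ts
// Hypothetical sketch: buckets live in a plain object instead of a
// pre-sized Array, so no array ever degrades from fast to slow mode.
class BucketStore<T> {
  private _buckets: Record<number, T[]> = {};
  insert(index: number, value: T): void {
    const bucket = this._buckets[index];
    if (bucket === undefined) this._buckets[index] = [value];
    else bucket.push(value);
  }
  get(index: number): T[] | undefined {
    return this._buckets[index];
  }
}
```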
-
**About hash function**

I found the hash function V8 uses for integer keys:

```cpp
inline uint32_t ComputeIntegerHash(uint32_t key, uint32_t seed) {
uint32_t hash = key;
hash = hash ^ seed;
hash = ~hash + (hash << 15); // hash = (hash << 15) - hash - 1;
hash = hash ^ (hash >> 12);
hash = hash + (hash << 2);
hash = hash ^ (hash >> 4);
hash = hash * 2057; // hash = (hash + (hash << 3)) + (hash << 11);
hash = hash ^ (hash >> 16);
return hash & 0x3fffffff;
}
```

That's easy to implement. But the hash function that involves an object is based on the object's pointer address:

```cpp
int DescriptorLookupCache::Hash(Object* source, Name* name) {
// Uses only lower 32 bits if pointers are larger.
uint32_t source_hash =
static_cast<uint32_t>(reinterpret_cast<uintptr_t>(source)) >>
kPointerSizeLog2;
uint32_t name_hash = name->hash_field();
return (source_hash ^ name_hash) % kLength;
}
```

I have no idea how to implement that, because JavaScript code cannot read an object's address. It seems we have no way to reduce the time spent computing an object's hash code.
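For integer keys, though, ComputeIntegerHash can be ported directly. A minimal sketch (the `>>> 0` coercions are my addition to keep every intermediate value in unsigned 32-bit space, matching the C++ semantics):

```ts
// Port of V8's ComputeIntegerHash: `>>> 0` forces unsigned 32-bit
// arithmetic and Math.imul performs the overflowing 32-bit multiply.
function computeIntegerHash(key: number, seed: number): number {
  let hash = (key ^ seed) >>> 0;
  hash = (~hash + (hash << 15)) >>> 0; // hash = (hash << 15) - hash - 1
  hash = (hash ^ (hash >>> 12)) >>> 0;
  hash = (hash + (hash << 2)) >>> 0;
  hash = (hash ^ (hash >>> 4)) >>> 0;
  hash = Math.imul(hash, 2057) >>> 0;  // hash = hash + (hash << 3) + (hash << 11)
  hash = (hash ^ (hash >>> 16)) >>> 0;
  return hash & 0x3fffffff;
}
```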
-
=================================== HashSet ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │ testFunc │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│ 0 │ 'constructor' │ 1 │ 1000000 │ 1537 │
│ 1 │ 'insert' │ 1000000 │ 2000000 │ 5486 │
│ 2 │ 'find' │ 2000000 │ 2000000 │ 2380 │
│ 3 │ 'eraseElementByKey' │ 2000000 │ 2000000 │ 1703 │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== HashMap ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │ testFunc │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│ 0 │ 'constructor' │ 1 │ 1000000 │ 2308 │
│ 1 │ 'setElement' │ 1000000 │ 1000000 │ 4395 │
│ 2 │ 'getElementByKey' │ 1000000 │ 1000000 │ 3872 │
│ 3 │ 'eraseElementByKey' │ 1000000 │ 1000000 │ 4138 │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== Report ===================================
=================================== HashSet ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │ testFunc │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│ 0 │ 'constructor' │ 1 │ 1000000 │ 1064 │
│ 1 │ 'insert' │ 1000000 │ 2000000 │ 1140 │
│ 2 │ 'find' │ 2000000 │ 2000000 │ 4839 │
│ 3 │ 'eraseElementByKey' │ 2000000 │ 2000000 │ 2911 │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== HashMap ===================================
┌─────────┬─────────────────────┬─────────┬───────────────┬─────────┐
│ (index) │ testFunc │ testNum │ containerSize │ runTime │
├─────────┼─────────────────────┼─────────┼───────────────┼─────────┤
│ 0 │ 'constructor' │ 1 │ 1000000 │ 988 │
│ 1 │ 'setElement' │ 1000000 │ 1000000 │ 271 │
│ 2 │ 'getElementByKey' │ 1000000 │ 1000000 │ 1205 │
│ 3 │ 'eraseElementByKey' │ 1000000 │ 1000000 │ 863 │
└─────────┴─────────────────────┴─────────┴───────────────┴─────────┘
=================================== Report ===================================
-
I find that slow mode is still the bottleneck. I think we should find a new way to solve this problem. The best idea I can think of is to change the underlying storage structure.
-
The source code of the ES6 Set: https://codereview.chromium.org/220293002/diff/20001/src/objects.cc
-
Wouldn't it be possible to create a hash value for a reference, or something like that, by putting the references in a JS container and giving each of them an ID?
-
Do you mean something like a WeakMap? It will use the object reference as the key.
-
```ts
const ObjectIdAllocator = new class ObjectIdAllocator {
  private _nextId = 0;
  // WeakMap keys are held weakly, so the allocator never keeps objects alive.
  private _map = new WeakMap<object, number>();
  public generateOrGetId(obj: object): number {
    let id = this._map.get(obj);
    if (id === undefined) {
      id = this._nextId;
      this._nextId += 1;
      this._map.set(obj, id);
    }
    return id;
  }
};

const obj1 = {
  a: 1,
  b: 2
};
const obj2 = {
  a: 1,
  b: 2
};

console.log(ObjectIdAllocator.generateOrGetId(obj1)); // 0
console.log(ObjectIdAllocator.generateOrGetId(obj2)); // 1
```

It seems that the hashing cost can be reduced by replacing the hash function with an allocator that hands out a unique ID per reference.
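Building on this, here is a hedged sketch of combining the allocator with the integer hash ported earlier in this thread (computeIntegerHash); the string fallback is a simple polynomial hash added here for completeness:

```ts
// Sketch: objects hash via their allocated ID, integers via the ported
// V8 integer hash, everything else via a simple polynomial string hash.
function universalHash(key: unknown, seed: number): number {
  if (typeof key === 'object' && key !== null) {
    return computeIntegerHash(ObjectIdAllocator.generateOrGetId(key), seed);
  }
  if (typeof key === 'number' && Number.isInteger(key)) {
    return computeIntegerHash(key >>> 0, seed);
  }
  // Strings (and non-integer numbers) fall back to a polynomial hash.
  const str = String(key);
  let hash = seed >>> 0;
  for (let i = 0; i < str.length; ++i) {
    hash = (Math.imul(hash, 31) + str.charCodeAt(i)) >>> 0;
  }
  return hash & 0x3fffffff;
}
```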
-
I carefully studied the internal implementation of V8, and I think it is impossible to make our hash table perform better than the native Set and Map. The main reason is that the native containers hash objects by their memory address, which JavaScript code cannot access.
I decided to hold off on this problem until I find a good solution. It's worth mentioning that while I can't make it as fast as the native Set and Map, it is already much faster than before. Because the current hash table does not fully follow the Java implementation due to some initial design flaws, I'll try to solve this problem as much as possible, but it may take a long, long time to re-design the container. I think the only way to optimize the performance of the hash table in the short term is to encapsulate the native Set and Map.
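To make that last idea concrete, here is a minimal sketch of wrapping the native Map behind a js-sdsl-style interface; the method names (setElement, getElementByKey, eraseElementByKey) are borrowed from the benchmark tables above, and the real js-sdsl API may differ:

```ts
// Sketch: delegate all hashing to the native Map while keeping a
// js-sdsl-style method surface.
class WrappedHashMap<K, V> {
  private readonly _map = new Map<K, V>();
  get size(): number {
    return this._map.size;
  }
  setElement(key: K, value: V): void {
    this._map.set(key, value);
  }
  getElementByKey(key: K): V | undefined {
    return this._map.get(key);
  }
  eraseElementByKey(key: K): boolean {
    return this._map.delete(key);
  }
}
```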
-
@noname0310 Hey, I sent you an email ([email protected]).