finagle-chirper fails on x86-64 Linux #231

Open
piyush286 opened this issue May 5, 2020 · 4 comments

@piyush286

piyush286 commented May 5, 2020

Problem Description

I get the following errors when running finagle-chirper on x86-64 Linux with OpenJDK 11 (OpenJ9). Earlier, I could run this benchmark successfully on this platform, as mentioned in #211.

Errors

12:01:16  Resetting master, feed map size: 5000
12:01:21  ====== finagle-chirper (twitter-finagle), iteration 14 completed (9824.963 ms) ======
12:01:21  ====== finagle-chirper (twitter-finagle), iteration 15 started ======
12:01:21  Resetting master, feed map size: 5000
12:01:26  Exception in thread "Thread-1140" Exception in thread "Thread-1084" Exception in thread "Thread-1087" Exception in thread "Thread-1147" Exception in thread "Thread-1148" Exception in thread "Thread-1119" Exception in thread "Thread-1130" Exception in thread "Thread-1136" Exception in thread "Thread-1102" Exception in thread "Thread-1122" Exception in thread "Thread-1155" Exception in thread "Thread-1123" Exception in thread "Thread-1134" Exception in thread "Thread-1129" Exception in thread "Thread-1108" Exception in thread "Thread-1121" Exception in thread "Thread-1124" Exception in thread "Thread-1106" Exception in thread "Thread-1101" Exception in thread "Thread-1135" Exception in thread "Thread-1138" Exception in thread "Thread-1086" Exception in thread "Thread-1104" Exception in thread "Thread-1146" Exception in thread "Thread-1114" Exception in thread "Thread-1095" Exception in thread "Thread-1141" Exception in thread "Thread-1153" Exception in thread "Thread-1098" Exception in thread "Thread-1150" Exception in thread "Thread-1131" Exception in thread "Thread-1117" Exception in thread "Thread-1091" Exception in thread "Thread-1105" Exception in thread "Thread-1093" Exception in thread "Thread-1139" Failure(connection timed out: localhost/127.0.0.1:37255 at remote address: localhost/127.0.0.1:37255. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:37255, Downstream label: :37255, Trace Id: 7ae0eb8657c17c60.7ae0eb8657c17c60<:7ae0eb8657c17c60 with Service -> :37255Failure(connection timed out: localhost/127.0.0.1:46501 at remote address: localhost/127.0.0.1:46501. 
Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:46501, Downstream label: :46501, Trace Id: 5f55821328870c8f.5f55821328870c8f<:5f55821328870c8f with Service -> :46501Failure(connection timed out: localhost/127.0.0.1:38355 at remote address: localhost/127.0.0.1:38355. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:38355, Downstream label: :38355, Trace Id: 4bfaf6661fa74a2f.4bfaf6661fa74a2f<:4bfaf6661fa74a2f with Service -> :38355Failure(connection timed out: localhost/127.0.0.1:41496 at remote address: localhost/127.0.0.1:41496. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:41496, Downstream label: :41496, Trace Id: 46eb25346382bfc4.46eb25346382bfc4<:46eb25346382bfc4 with Service -> :41496Failure(connection timed out: localhost/127.0.0.1:37794 at remote address: localhost/127.0.0.1:37794. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:37794, Downstream label: :37794, Trace Id: 076c59f94b627628.076c59f94b627628<:076c59f94b627628 with Service -> :37794Failure(connection timed out: localhost/127.0.0.1:43615 at remote address: localhost/127.0.0.1:43615. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:43615, Downstream label: :43615, Trace Id: 49d171f379962e7c.49d171f379962e7c<:49d171f379962e7c with Service -> :43615Failure(connection timed out: localhost/127.0.0.1:42070 at remote address: localhost/127.0.0.1:42070. 
Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:42070, Downstream label: :42070, Trace Id: 1c70dfc606542b87.1c70dfc606542b87<:1c70dfc606542b87 with Service -> :42070
12:01:26  
12:01:26  
12:01:26  Caused by: Caused by: 
12:01:26  com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:41496 at remote address: localhost/127.0.0.1:41496. Remote Info: Not AvailableCaused by: com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:43615 at remote address: localhost/127.0.0.1:43615. Remote Info: Not Available
12:01:26  com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:37794 at remote address: localhost/127.0.0.1:37794. Remote Info: Not AvailableCaused by: 
12:01:26  com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:42070 at remote address: localhost/127.0.0.1:42070. Remote Info: Not Available
12:01:26  	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)
12:01:26  
12:01:26  	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)
12:01:26  
12:01:26  	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
12:01:26  Exception in thread "Thread-1152" 
12:01:26  
12:01:26  	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)Failure(connection timed out: localhost/127.0.0.1:44631 at remote address: localhost/127.0.0.1:44631. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:44631, Downstream label: :44631, Trace Id: 1440fc2c6d90890a.1440fc2c6d90890a<:1440fc2c6d90890a with Service -> :44631
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)Caused by: 
12:01:26  com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:44631 at remote address: localhost/127.0.0.1:44631. Remote Info: Not Available
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:570)
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
12:01:26  
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:570)
12:01:26  	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
12:01:26  	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
12:01:26  	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:570)
12:01:26  
12:01:26  	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)Exception in thread "Thread-1137" 
12:01:26  
12:01:26  	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)Failure(connection timed out: localhost/127.0.0.1:42545 at remote address: localhost/127.0.0.1:42545. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:42545, Downstream label: :42545, Trace Id: fad5076b6173754c.fad5076b6173754c<:fad5076b6173754c with Service -> :42545	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
12:01:26  	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335)	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
12:01:26  
12:01:26  
12:01:26  	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)
12:01:26  
12:01:26  	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:570)Caused by: 
12:01:26  	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
12:01:26  
12:01:26  
12:01:26  	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:42545 at remote address: localhost/127.0.0.1:42545. Remote Info: Not Available	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
12:01:26  
12:01:26  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)Exception in thread "Thread-1144" 	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:99)
12:01:26  	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335)
12:01:26  	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
12:01:26  
12:01:26  	at com.twitter.finagle.netty4.ConnectionBuilder$$anon$1.operationComplete(ConnectionBuilder.scala:78)	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
12:01:26  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)Failure(connection timed out: localhost/127.0.0.1:34803 at remote address: localhost/127.0.0.1:34803. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:34803, Downstream label: :34803, Trace Id: 758e432f8c8caca7.758e432f8c8caca7<:758e432f8c8caca7 with Service -> :34803
12:01:26  
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
12:01:26  	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335)	at com.twitter.finagle.util.BlockingTimeTrackingThreadFactory$$anon$1.run(BlockingTimeTrackingThreadFactory.scala:23)
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
12:01:26  
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
12:01:26  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
12:01:26  Caused by: 
12:01:26  
12:01:26  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)com.twitter.finagle.ConnectionFailedException: connection timed out: localhost/127.0.0.1:34803 at remote address: localhost/127.0.0.1:34803. Remote Info: Not Available	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
12:01:26  
12:01:26  	at java.base/java.lang.Thread.run(Thread.java:834)	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
12:01:26  
12:01:26  
12:01:26  
12:01:26  	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)	at 

To Reproduce

  • Direct command to run the benchmark
    OR
  1. Add (or enable if it already exists) the target(s) in the Renaissance playlist (https://github.com/AdoptOpenJDK/openjdk-tests/blob/master/perf/renaissance/playlist.xml)
  2. Use the Adopt Grinder: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/build?delay=0sec if you have permissions or else, run locally
  3. Set the TARGET to the relevant benchmark
@ceresek
Collaborator

ceresek commented May 7, 2020

I'm running finagle-chirper on x86_64 (Fedora Linux) and OpenJDK 11 with no obvious issues. Could you please provide a bit more info to help us reproduce the error? (Did you use Renaissance HEAD or the 0.10.0 release? Is the machine you are using special in any way, e.g. a high number of cores, lots of RAM, etc.?) Thanks.

@piyush286
Author

piyush286 commented May 7, 2020

I used the 0.9.0 release from here: https://github.com/renaissance-benchmarks/renaissance/releases/download/v0.9.0/renaissance-mit-0.9.0.jar

OpenJ9 JDK: https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11.0.7%2B10_openj9-0.20.0/OpenJDK11U-jdk_x64_linux_openj9_11.0.7_10_openj9-0.20.0.tar.gz

Here's the info about the machine:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               1869.451
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              4599.93
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts

@ceresek
Collaborator

ceresek commented May 7, 2020

I've just finished running Renaissance 393adff with the OpenJ9 11.0.8 x86_64 build from May 5 on a machine with 80 processors, both with and without forced GC between iterations, using the default number of iterations (90). There were no errors, so this looks a bit more difficult to reproduce.

Can you please try with Renaissance built from the current HEAD?

Also, is it possible that the machine where you see the problem is also loaded by other workloads?
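
One quick way to answer the load question is to compare the system load average with the number of processors on the affected machine. The following is a minimal sketch and not part of Renaissance: the class name `LoadCheck` and the helper `loadPerCpu` are hypothetical, and it relies only on the standard `java.lang.management.OperatingSystemMXBean` API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of Renaissance: reports how busy the machine
// already is before the benchmark starts.
final class LoadCheck {

    // Load average divided by the number of processors.
    static double loadPerCpu(double loadAverage, int cpus) {
        return loadAverage / cpus;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // One-minute load average; negative when the platform does not provide it.
        double load = os.getSystemLoadAverage();
        int cpus = os.getAvailableProcessors();

        if (load < 0) {
            System.out.println("System load average is not available on this platform.");
            return;
        }
        System.out.printf("load average: %.2f, processors: %d, load per processor: %.2f%n",
                load, cpus, loadPerCpu(load, cpus));
    }
}
```

A ratio near or above 1.0 before the benchmark starts would suggest that other workloads are competing for the processors, which could plausibly contribute to connection timeouts.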

@farquet farquet added this to the 1.0.0 milestone Apr 27, 2021
@farquet
Collaborator

farquet commented Apr 27, 2021

I can reproduce this problem on a specific machine. I'll have a look, but I suspect the benchmark can't open some ports or reach localhost somehow.
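
To check the hypothesis about ports and localhost independently of the benchmark, one could repeatedly open a listening socket on an ephemeral loopback port and connect to it with a timeout. The following is a minimal sketch under stated assumptions: the class name `LoopbackProbe`, the 1-second timeout, and the 100 attempts are all arbitrary choices for illustration. It uses plain JDK sockets only, so it does not exercise the Finagle or Netty connection path seen in the stack traces.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical diagnostic, not part of Renaissance: checks that "localhost"
// resolves and that a TCP connection to an ephemeral loopback port succeeds.
final class LoopbackProbe {

    // Returns the connect time in milliseconds, or -1 if resolving, binding,
    // or connecting failed (including a connect timeout).
    static long probe(int timeoutMillis) {
        try {
            InetAddress localhost = InetAddress.getByName("localhost");
            // Port 0 asks the OS for any free ephemeral port.
            try (ServerSocket server = new ServerSocket(0, 50, localhost);
                 Socket client = new Socket()) {
                long start = System.nanoTime();
                client.connect(
                        new InetSocketAddress(localhost, server.getLocalPort()),
                        timeoutMillis);
                return (System.nanoTime() - start) / 1_000_000;
            }
        } catch (IOException e) {
            System.err.println("probe failed: " + e);
            return -1;
        }
    }

    public static void main(String[] args) {
        int attempts = 100;   // arbitrary number of attempts
        int failures = 0;
        long slowest = 0;
        for (int i = 0; i < attempts; i++) {
            long ms = probe(1000);
            if (ms < 0) {
                failures++;
            } else {
                slowest = Math.max(slowest, ms);
            }
        }
        System.out.printf("attempts: %d, failures: %d, slowest connect: %d ms%n",
                attempts, failures, slowest);
    }
}
```

Failures or unusually slow connects here would point at the machine's loopback or port configuration rather than at the benchmark itself; a clean run would not rule out a problem that only appears under the benchmark's load.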

@farquet farquet self-assigned this Apr 27, 2021