Skip to content

Conversation

juj
Copy link
Collaborator

@juj juj commented Sep 9, 2025

Two problems that I run into with the parallel browser harness on Windows with Firefox:

  1. On Windows, all the browser windows open up in the exact same X,Y coordinate. Firefox has "is browser in the background" detection, which causes all but one test foreground harness to hang when running any requestAnimationFrame() based test.

  2. Killing a Firefox browser process with

proc = subprocess.Popen(['firefox'])
proc.kill()

does not work. This is because Firefox is a multiprocess browser, and does not have a process hierarchy where the first launched process would be some kind of a master process. (this is an old known issue, that can be seen also in existing emrun.py code) There seems to be about 12 processes that come alive when running subprocess.Popen(['firefox']).


To fix these issues:

  1. ask the Windows Win32 API to move each spawned process to its own X,Y location. Use a file-based multiprocessing lock + counter file to get unique counter to each process.

  2. Snapshot alive processes before spawning Firefox, and after. Then acquire the delta of these two to find which process IDs correspond to the launched Firefox. (I am not 100% sure how watertight this scheme is, but I'm at least getting successful runs with retries now)

@juj juj requested a review from brendandahl September 9, 2025 21:55
@brendandahl
Copy link
Collaborator

Have you tried using -new-instance (use to be -no-remote)? IIRC, that should ensure a new process and the process will be the one that can be killed to bring down the rest.

@juj
Copy link
Collaborator Author

juj commented Sep 9, 2025

Have you tried using -new-instance

Do you mean set EMTEST_BROWSER="C:\Program Files\Mozilla Firefox\firefox.exe" -new-instance before running?

@brendandahl
Copy link
Collaborator

Yeah

@juj
Copy link
Collaborator Author

juj commented Sep 9, 2025

Thanks - running that now, all the browser windows do at least spawn on top of each other like before this PR, causing several tests to fail.

I'll have to do more tests to see if -new-instance fixes the subprocess killing.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 9, 2025

Strange that -new-instance is not implicit in the separate data directories. This seems to be the chromium behavior.

@brendandahl
Copy link
Collaborator

I'm having flashbacks to doing this exact thing in pdf.js. I thought we got to a point where we could reliably kill the process, but now that I look back we had some code to tear down browser processes by name too.

Eventually we started using puppeteer and didn't have to deal with this.

@juj
Copy link
Collaborator Author

juj commented Sep 9, 2025

Unfortunately -new-instance is not working out well, because the returned proc from subprocess.Popen() cannot still be used to move the Firefox window.

I'll add that flag into the default Firefox spawn flags, though I don't see any difference between using it and not using it.. I think it was probably already in effect.

@brendandahl
Copy link
Collaborator

  1. On Windows, all the browser windows open up in the exact same X,Y coordinate. Firefox has "is browser in the background" detection, which causes all but one test foreground harness to hang when running any requestAnimationFrame() based test.

There's a pref to disable this dom.min_background_timeout_value

@juj
Copy link
Collaborator Author

juj commented Sep 9, 2025

There's a pref to disable this dom.min_background_timeout_value

It looks like the background tabs don't even navigate to the browser test pages if they are not foregrounded. The pref value is 1000 on my Firefox, not sure if that would solve anything.

@brendandahl
Copy link
Collaborator

1000 means every setTimeout will take 1s

@juj
Copy link
Collaborator Author

juj commented Sep 9, 2025

1000 means every setTimeout will take 1s

Thanks, hmm, that doesn't seem like the solution here then.

mypy is saying it wants to install some libraries.. I wonder how to resolve that..

@juj
Copy link
Collaborator Author

juj commented Sep 9, 2025

Btw, an unrelated item: I noticed I had to create a file called policies.json with the content

{ "policies": { "DisableAppUpdate": true } }

and place it in C:\Program Files\Mozilla Firefox\distribution\policies.json in order to get Firefox auto-updater to not trigger and break the CI.

The auto-updater is not tied to a specific user profile, so Windows UAC would pop up without that policies.json file.

E.g. https://support.mozilla.org/en-US/questions/1304175#answer-1350120

You may want to do that on CircleCI as well, if possible (or maybe already done)

@brendandahl
Copy link
Collaborator

Disabling throttling should fix the requestAnimationFrame issue. There's definitely a magic set of prefs that turns off throttling, but I'm not sure what the current prefs are. I ran into this same issue when I added headless mode to firefox.

@juj
Copy link
Collaborator Author

juj commented Sep 9, 2025

Disabling throttling should fix the requestAnimationFrame issue.

What I was saying above is that the issue is also that the background browser windows do not even load up the page contents, if they are on the background. So it is not only a rAF issue.

# safely.
if not BrowserCore.original_EMTEST_BROWSER:
BrowserCore.original_EMTEST_BROWSER = EMTEST_BROWSER
EMTEST_BROWSER = BrowserCore.original_EMTEST_BROWSER
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can instead maybe do this whole EMTEST_BROWSER during startup instead of in browser_open?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea. I'll refactor to do that in final form.

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this extra complexity to manage firefox's lifetime makes me sad. How comes we don't seem to need any of this in circleci where we are currently running firefox tests?

I wonder if it makes sense to move it out the common.py and into its own browser_manger.py maybe?

@sbc100
Copy link
Collaborator

sbc100 commented Sep 10, 2025

This is because Firefox is a multiprocess browser, and does not have a process hierarchy where the first launched process would be some kind of a master process.

But chrome is also multi-process and this does work for chrome? That does proc.kill() actually end up killing then? Anything at all?

Is this a windows-only thing or should I be able to reproduce on linux too?

test/common.py Outdated
@@ -2461,13 +2467,77 @@ def init_worker(counter, lock):
counter.value += 1


def list_processes_by_name(exe_name):
try:
import psutil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just add this to requirements-dev.txt and import it unconditionally at the top.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we require all end users to have python psutil module installed? Are we ok to require that unconditionally?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it not end users, its just emscripten developers (that was the -dev part means). We should have a relatively low bar for adding stuff to requirements-dev.txt because it doesn't effect our actual users.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, we have sphinx in there, but we don't expect emscripten users to install sphinx.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm happy to do it, since it simplifies my trouble.. just went through all the trouble expecting that would have been a problem with this PR.

@juj
Copy link
Collaborator Author

juj commented Sep 10, 2025

How comes we don't seem to need any of this in circleci where we are currently running firefox tests?

Are you testing Firefox on Windows? I thought Firefox testing was on Ubuntu Focal only?

@juj
Copy link
Collaborator Author

juj commented Sep 10, 2025

I wonder if it makes sense to move it out the common.py and into its own browser_manger.py maybe?

Sure, I can do that.

But chrome is also multi-process and this does work for chrome? That does proc.kill() actually end up killing then? Anything at all?

I don't know if it works for Chrome on Windows, I haven't tested (yet). I did Firefox on Windows first, and looking on Firefox on macOS now. The way I wrote it is that it won't disturb the Chrome side.

Is this a windows-only thing or should I be able to reproduce on linux too?

I don't know if this is a Windows only thing. On Firefox on Linux both reasons to track/kill Firefox processes are avoided:
a) at least on Linux Mint 22 Cinnamon where I am testing Linux Firefox on, the windowing system spawns the Firefox browser windows automatically +20 pixels apart on each browser launch, so I don't need to do the manual tracking. (on macOS it doesn't and I see it has the same problem)
b) on Linux, it is not critical to kill Firefox in order to free up file handles, since Linux filesystem allows deleting files that are in use (and they just orphan out and vanish after use)

@sbc100
Copy link
Collaborator

sbc100 commented Sep 10, 2025

How comes we don't seem to need any of this in circleci where we are currently running firefox tests?

Are you testing Firefox on Windows? I thought Firefox testing was on Ubuntu Focal only?

We run them on linux only. Are all these fixes windows-specific? (even the "proc.kill doesn't work", thing?)

If it adds this much complexity perhaps we should say that the browser test suite does not run on windows? or at least the auto-launch and terminate stuff?

@juj
Copy link
Collaborator Author

juj commented Sep 10, 2025

If it adds this much complexity perhaps we should say that the browser test suite does not run on windows?

No, that is not acceptable. We just need to eat whatever the complexity is, and maybe later beg on https://bugzilla.mozilla.org/home to help make Firefox CI integration simpler.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 10, 2025

If it adds this much complexity perhaps we should say that the browser test suite does not run on windows?

No, that is not acceptable. We just need to eat whatever the complexity is, and maybe later beg on https://bugzilla.mozilla.org/home to help make Firefox CI integration simpler.

We've managed fine without running browser tests on non-linux for many many years now. I get that more coverage is good but is there a specific reason why you think this is needed now?

@sbc100
Copy link
Collaborator

sbc100 commented Sep 10, 2025

If it adds this much complexity perhaps we should say that the browser test suite does not run on windows?

No, that is not acceptable. We just need to eat whatever the complexity is, and maybe later beg on https://bugzilla.mozilla.org/home to help make Firefox CI integration simpler.

We've managed fine without running browser tests on non-linux for many many years now. I get that more coverage is good but is there a specific reason why you think this is needed now?

To be clear here, I'm not actually trying to block the landing of this PR, just trying understand the requirements.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 10, 2025

If / when this change does land I guess we should add at least some basic chrome and ff testing on Windows (and Mac) in circleci, otherwise this code will be very prone to breakage.

@juj
Copy link
Collaborator Author

juj commented Sep 10, 2025

We've managed fine without ...

Err.. yeah. We don't have that perception at Unity. Windows support in Emscripten is bitrotting every month I don't give it a look, and I count more than 100 pull requests now that I have landed to clean up on that.

... running browser tests on non-linux for many many years now. I get that more coverage is good but is there a specific reason why you think this is needed now?

We cannot risk to depend on Emscripten when it does not have the same testing coverage of the platforms that we are shipping Unity on.

@juj
Copy link
Collaborator Author

juj commented Sep 10, 2025

I guess we should add at least some basic chrome and ff testing on Windows (and Mac)

That sounds great 👍

@sbc100
Copy link
Collaborator

sbc100 commented Sep 10, 2025

We've managed fine without ...

Err.. yeah. We don't have that perception at Unity. Windows support in Emscripten is bitrotting every month I don't give it a look, and I count more than 100 pull requests now that I have landed to clean up on that.

I get that a lot of configuration don't have enough coverage but really appropriate all those recent fixes, but specifically do you think that there are a lot of those PR are that only reproduce in the windows browser tests?

Maybe my perception is wrong but most of the recent fixes seems like they fix things with the core/other tests suite (which I agree could would be good to run more of on windows... but is unrelated specifically to windows+browser testing).

My hope is that there should not be a lot of bugs that slip through due to lack of cross-platform browser testing. Can you point of one or two changes that are only caught in these modes, so I can better understand the type of bugs that we would be catching? I can imagine maybe webGPU-related stuff?

@juj
Copy link
Collaborator Author

juj commented Sep 10, 2025

but most of the recent fixes seems like they fix things with the core/other tests suite

Well yes, because that is where I started cleaning up the suite.

Here is the current set of browser tests I have disabled on Windows to get Firefox to pass:

            browser_test_skips = ['test_webgpu_required_limits',
                                  'test_webgpu_basic_rendering_pthreads',
                                  'test_webgpu_basic_rendering_main_module',
                                  'test_webgpu_basic_rendering_closure_advanced',
                                  'test_webgpu_basic_rendering_closure',
                                  'test_webgpu_basic_rendering',
                                  'test_webgl_resize_offscreencanvas_from_main_thread_offscreen',
                                  'test_webgl_resize_offscreencanvas_from_main_thread_offscreen_blocking',
                                  'test_webgl_resize_offscreencanvas_from_main_thread_offscreen_proxy',
                                  'test_webgl_resize_offscreencanvas_from_main_thread_offscreen_blocking_proxy',
                                  'test_webgl_resize_offscreencanvas_from_main_thread_offscreencanvas',
                                  'test_webgl_offscreen_canvas_in_pthread_chained',
                                  'test_webgl_offscreen_canvas_in_pthread',
                                  'test_cubegeom_pre_vao_es',
                                  'test_cubegeom_pre_vao_regal',
                                  'test_cubegeom_pre_regal',
                                  'test_cubegeom_row_length',
                                  'test_sdl_ogl_proc_alias',
                                  'test_sdl_ogl_defaultmatrixmode',
                                  'test_sdl2_gl_frames_swap',
                                  'test_sdl2_canvas_proxy',
                                  'test_pthread_growth_mainthread_growable_arraybuffers',
                                  'test_pthread_growth_growable_arraybuffers',
                                  'test_sdl_audio_beep_sleep',
                                  'test_sdl_audio_beep_sleep_safeheap',
                                  'test_preload_caching_150mb',
                                  'test_preload_file',
                                  'test_preload_file_wasmfs',
                                  'test_glgears_long',
                                  'test_glgears_long_proxy',
                                  'test_cpuprofiler_memoryprofiler',
                                  'test_cpuprofiler_memoryprofiler_modularized',
                                  'test_fs_lz4fs_package',
                                  'test_rollup',
                                  'test_rollup_pthreads',
                                  'test_vite',
                                  'test_vite_pthreads',
                                  'test_webpack',
                                  'test_webpack_es6',
                                  'test_webpack_es6_pthreads',
                                  'test_webpack_es6_wasm2js',
                                  'test_webpack_es6_wasm2js_pthreads',
                                  'test_webpack_wasm2js',
                                  'test_webpack_wasm2js_pthreads',
                                  'test_webpack_pthreads',
                                  'test_dylink_dso_needed', # only fails in browser64_4gb suite
                                  'test_dylink_dso_needed_inworker', # only fails in browser64_4gb suite
                                  'test_runtimelink', # only fails in browser64_4gb suite
                                  'test_zzz_html_source_map',
                                  'test_glbook',
                                  'test_emscripten_main_loop',
                                  'test_emscripten_main_loop_pthreads'
                                  ]

If the whole harness was broken and unsupported by design on Windows, I would not be able to even come up with a list like above to see what works and what doesn't, and have to take it at word that it'll work out. That would not be appropriate.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 10, 2025

Great, that was the context I was missing, that does seems significant.

@juj
Copy link
Collaborator Author

juj commented Sep 10, 2025

Closing this PR to land in pieces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants