
Crashes on training

Posted: Thu Oct 12, 2023 12:56 am
by hullo

I've tried Realface and Dlight so far. Realface crashes within a minute. Dlight crashes after a few hours. I'm on a 2022 Mac Studio base model and the fans never kicked in, so I don't think it's my machine being overwhelmed. Here's what the terminal looked like. It seems there are some issues with my setup, but I need someone to translate :D

Code:

Last login: Wed Oct 11 09:33:41 on ttys000
/Users/joshua/faceswap/faceswap_gui_launcher.command ; exit;
joshua@Joshuas-Mac-Studio ~ % /Users/joshua/faceswap/faceswap_gui_launcher.command ; exit;
Setting Faceswap backend to APPLE_SILICON
Metal device set to: Apple M1 Max

systemMemory: 32.00 GB
maxCacheSize: 10.67 GB

2023-10-11 15:36:18.830134: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-10-11 15:36:18.830260: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
10/11/2023 15:36:18 INFO     Log level set to: INFO
2023-10-11 15:36:31.016 python[36065:5736012] +[CATransaction synchronize] called within transaction
2023-10-11 15:36:39.976 python[36065:5736012] +[CATransaction synchronize] called within transaction
2023-10-11 15:36:46.476 python[36065:5736012] +[CATransaction synchronize] called within transaction
WARNING:tensorflow:From /Users/joshua/faceswap/lib/gui/analysis/event_reader.py:532: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
2023-10-11 15:38:14.679 python[36065:5736012] +[CATransaction synchronize] called within transaction
2023-10-11 15:39:50.291 python[36065:5736012] +[CATransaction synchronize] called within transaction
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
libpng error: Read Error
Fatal Python error: PyEval_RestoreThread: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: initialized

Thread 0x000000017793f000 (most recent call first):
  File "/Users/joshua/faceswap/lib/gui/wrapper.py", line 389 in _read_stderr
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 953 in run
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000000176933000 (most recent call first):
  File "/Users/joshua/faceswap/lib/gui/custom_widgets.py", line 417 in __call__
  File "/Users/joshua/faceswap/lib/gui/custom_widgets.py", line 260 in write
  File "/Users/joshua/faceswap/lib/gui/wrapper.py", line 371 in _read_stdout
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 953 in run
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00000001dda16080 (most recent call first):
  File "/Users/joshua/faceswap/lib/training/preview_cv.py", line 43 in add_image
  File "/Users/joshua/faceswap/lib/gui/utils/image.py", line 116 in load
  File "/Users/joshua/faceswap/lib/gui/display_command.py", line 140 in display_item_set
  File "/Users/joshua/faceswap/lib/gui/display_page.py", line 266 in _update_page
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 839 in callit
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 1921 in __call__
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 1349 in update_idletasks
  File "/Users/joshua/faceswap/lib/training/preview_tk.py", line 382 in set_image
  File "/Users/joshua/faceswap/lib/training/preview_tk.py", line 818 in _update_image
  File "/Users/joshua/faceswap/lib/training/preview_tk.py", line 914 in _display_preview
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 839 in callit
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 1921 in __call__
  File "/Users/joshua/anaconda3/envs/faceswap/lib/python3.10/tkinter/__init__.py", line 1458 in mainloop
  File "/Users/joshua/faceswap/scripts/gui.py", line 183 in process
  File "/Users/joshua/faceswap/lib/cli/launcher.py", line 225 in execute_script
  File "/Users/joshua/faceswap/faceswap.py", line 52 in _main
  File "/Users/joshua/faceswap/faceswap.py", line 56 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, psutil._psutil_osx, psutil._psutil_posix, tensorflow.python.framework.fast_tensor_util, charset_normalizer.md, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, PIL._imaging, scipy.ndimage._nd_image, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, _ni_label, scipy.ndimage._ni_label, matplotlib._c_internal_utils, matplotlib._path, kiwisolver._cext, numexpr.interpreter, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, numpy.linalg.lapack_lite, scipy.optimize._lbfgsb, 
_moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, matplotlib._image, matplotlib.backends._tkagg, PIL._imagingtk (total: 117)
/Users/joshua/faceswap/faceswap_gui_launcher.command: line 4: 36065 Abort trap: 6           python "/Users/joshua/faceswap/faceswap.py" gui

Saving session.../Users/joshua/anaconda3/envs/faceswap/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

...copying shared history...
...saving history...truncating history files...
...completed.

[Process completed]


Re: Crashes on training

Posted: Thu Oct 12, 2023 9:20 pm
by torzdf

I have not seen this crash before, and I would imagine that it is being caused by Tensorflow-metal (from Apple) or another library that we use, purely based on this:
https://stackoverflow.com/questions/667 ... alled-with


Re: Crashes on training

Posted: Sat Oct 14, 2023 11:52 am
by hullo

EDIT: nvm. I misread python "3.10" as 3.1

Gonna try installing latest tensorflow metal.

EDIT 2: I now realize tensorflow and tensorflow metal are 2 different things and I'm already on the latest or near latest tf metal.


Re: Crashes on training

Posted: Tue Oct 17, 2023 9:55 am
by torzdf

The version of Tensorflow-Metal you use is important: it has to correspond with the version of Tensorflow used. Faceswap is currently on Tensorflow 2.10, which requires Tensorflow-Metal 0.6.0.
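
One way to check the pairing is to ask Python which versions are installed in the faceswap environment. This is only a rough sketch, not part of Faceswap: it assumes the pip package names are `tensorflow-macos` and `tensorflow-metal`, and that the Tensorflow-Metal release meant above is the 0.6.x series. The `EXPECTED` table and both function names are made up for illustration.

```python
from importlib import metadata

# Assumed pairing, taken from the post above; adjust if Faceswap moves on.
EXPECTED = {"tensorflow-macos": "2.10", "tensorflow-metal": "0.6"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(lookup=installed_version):
    """Return a list of mismatch messages (empty when everything matches)."""
    problems = []
    for package, prefix in EXPECTED.items():
        found = lookup(package)
        if found is None:
            problems.append(f"{package} is not installed")
        elif not (found == prefix or found.startswith(prefix + ".")):
            problems.append(f"{package} is {found}, expected {prefix}.x")
    return problems


if __name__ == "__main__":
    # Run this with the faceswap conda environment activated.
    for line in check_versions() or ["versions match the expected pairing"]:
        print(line)
```

Run it from a terminal after activating the faceswap conda environment. If it reports a mismatch, that would at least be consistent with the version problem described above, though it does not prove that is what caused the crash.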

The only other thing I can suggest is doing the macOS equivalent of this:
https://forum.faceswap.dev/app.php/faqpage#f1r1