16  Case Study: GermEval-Keras-Simple

16.1 Learning Guidance

16.1.1 Learning Objectives

After completing this chapter, you will be able to …

  • build a simple neural network with Keras to classify hate speech.

16.1.2 Overview

In this chapter we use basic neural-network methods to predict hate speech, drawing on the GermEval dataset. To keep things simple, we start with a version of the dataset that has already been preprocessed, i.e., “numericized”1: the text of the tweets has already been converted into numeric predictors, using simple (German-language) word vectors (wikipedia2vec). In this chapter we work exclusively with Python.
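The idea behind this “numericizing” can be sketched with a toy example: each word is mapped to a fixed-length vector, and a tweet can then be represented by, e.g., the mean of its word vectors. The three-dimensional vectors below are invented purely for illustration; the actual preprocessing used wikipedia2vec embeddings plus further features.

```python
import numpy as np

# Toy 3-dimensional "word vectors" (invented for illustration only)
toy_vectors = {
    "ich": np.array([0.1, 0.3, -0.2]),
    "mag": np.array([0.4, -0.1, 0.0]),
    "das": np.array([0.0, 0.2, 0.5]),
}

def numericize(tweet: str) -> np.ndarray:
    """Represent a tweet as the mean of its known word vectors."""
    vecs = [toy_vectors[w] for w in tweet.split() if w in toy_vectors]
    return np.mean(vecs, axis=0)

x = numericize("ich mag das")
print(x)  # one fixed-length numeric vector per tweet
```

However the tweet is worded, the result is always a vector of the same length, which is what makes it usable as a row of numeric predictors.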

16.1.3 Required R Packages

# none :-)

16.1.4 Python-Check

reticulate::py_available()
## [1] FALSE
reticulate::py_config()
## python:         /Users/sebastiansaueruser/.virtualenvs/r-tensorflow/bin/python
## libpython:      /Users/sebastiansaueruser/.pyenv/versions/3.8.16/lib/libpython3.8.dylib
## pythonhome:     /Users/sebastiansaueruser/.virtualenvs/r-tensorflow:/Users/sebastiansaueruser/.virtualenvs/r-tensorflow
## version:        3.8.16 (default, Sep 15 2023, 17:53:02)  [Clang 14.0.3 (clang-1403.0.22.14.1)]
## numpy:          /Users/sebastiansaueruser/.virtualenvs/r-tensorflow/lib/python3.8/site-packages/numpy
## numpy_version:  1.24.3
## 
## NOTE: Python version was forced by VIRTUAL_ENV

16.1.5 Required Python Modules

import keras
import pandas as pd

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.metrics import accuracy_score

16.2 Pipeline with 1 Hidden Layer

16.2.1 Data

d_train_baked = pd.read_csv("https://raw.githubusercontent.com/sebastiansauer/Datenwerk2/main/data/germeval/germeval_train_recipe_wordvec_senti.csv")

d_train_num = d_train_baked.select_dtypes(include='number')

d_train2 = d_train_baked.loc[:, "emo_count":"wordembed_text_V101"]

X_train = d_train2.values

d_train_baked["y"] = d_train_baked["c1"].map({"OTHER" : 0, "OFFENSE" : 1})

y_train = d_train_baked.loc[:, "y"].values

Head of y_train:

print(y_train[:6])
## [0 0 0 0 1 0]
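One detail about the recoding via `.map` used above is worth knowing: any label not found in the dictionary silently becomes `NaN`, so a typo in the class names would produce missing values rather than an error. A minimal sketch (toy series, not the actual GermEval column):

```python
import pandas as pd

s = pd.Series(["OTHER", "OFFENSE", "OTHER", "offense"])  # note the lowercase typo
y = s.map({"OTHER": 0, "OFFENSE": 1})
print(y.tolist())  # [0.0, 1.0, 0.0, nan] -- unmatched labels become NaN
```

A quick `y.isna().sum()` after mapping is a cheap safeguard against this.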

Some info on the object:

d_train2.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 5009 entries, 0 to 5008
## Columns: 119 entries, emo_count to wordembed_text_V101
## dtypes: float64(119)
## memory usage: 4.5 MB

Head of d_train2:

print(d_train2.head())
##    emo_count  schimpf_count  ...  wordembed_text_V100  wordembed_text_V101
## 0   0.574594      -0.450067  ...            -0.449265            -0.277801
## 1  -1.111107      -0.450067  ...             0.974438             0.223422
## 2   0.186402      -0.450067  ...             0.407285             0.470835
## 3   0.201551      -0.450067  ...            -0.681155             0.351565
## 4   0.168223      -0.450067  ...            -0.674108             0.543312
## 
## [5 rows x 119 columns]
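A detail about the column selection used above: a label-based slice with `.loc`, such as `"emo_count":"wordembed_text_V101"`, includes both endpoints, unlike positional slicing with `.iloc`. A minimal sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

print(df.loc[:, "a":"c"].columns.tolist())  # ['a', 'b', 'c'] -- endpoint included
print(df.iloc[:, 0:2].columns.tolist())     # ['a', 'b']      -- endpoint excluded
```

This is why the slice above yields exactly 119 columns, from emo_count through wordembed_text_V101 inclusive.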
d_test_baked = pd.read_csv("https://raw.githubusercontent.com/sebastiansauer/Datenwerk2/main/data/germeval/germeval_test_recipe_wordvec_senti.csv")

d_test_num = d_test_baked.select_dtypes(include='number')

d_test2 = d_test_baked.loc[:, "emo_count":"wordembed_text_V101"]

X_test = d_test2.values


d_test_baked["y"] = d_test_baked["c1"].map({"OTHER" : 0, "OFFENSE" : 1})

y_test = d_test_baked.loc[:, "y"].values
print(y_test[:5])
## [0 0 0 0 1]

16.2.2 Model Definition

model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
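As a plausibility check on this architecture, the number of trainable parameters can be worked out by hand: a Dense layer has inputs * units + units parameters (weights plus biases). With the 119 predictor columns shown above:

```python
# Parameters of a Dense layer: inputs * units + units (weights + biases)
n_features = 119  # number of predictor columns (see d_train2.info())

hidden = n_features * 64 + 64  # first (hidden) layer
out = 64 * 1 + 1               # sigmoid output layer
print(hidden, out, hidden + out)
```

The same totals appear in `model.summary()`, which is a quick way to catch a wrong `input_dim`.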

16.2.3 Fit

model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))
## Epoch 1/10
## 79/79 [==============================] - 1s 5ms/step - loss: 0.6366 - accuracy: 0.6396 - val_loss: 0.5578 - val_accuracy: 0.7166
## Epoch 2/10
## 79/79 [==============================] - 0s 3ms/step - loss: 0.5030 - accuracy: 0.7479 - val_loss: 0.5491 - val_accuracy: 0.7262
## Epoch 3/10
## 79/79 [==============================] - 0s 3ms/step - loss: 0.4733 - accuracy: 0.7678 - val_loss: 0.5540 - val_accuracy: 0.7273
## Epoch 4/10
## 79/79 [==============================] - 0s 3ms/step - loss: 0.4539 - accuracy: 0.7820 - val_loss: 0.5623 - val_accuracy: 0.7240
## Epoch 5/10
## 79/79 [==============================] - 0s 3ms/step - loss: 0.4380 - accuracy: 0.7944 - val_loss: 0.5607 - val_accuracy: 0.7248
## Epoch 6/10
## 79/79 [==============================] - 0s 3ms/step - loss: 0.4233 - accuracy: 0.7978 - val_loss: 0.5656 - val_accuracy: 0.7251
## Epoch 7/10
## 79/79 [==============================] - 0s 3ms/step - loss: 0.4098 - accuracy: 0.8065 - val_loss: 0.5689 - val_accuracy: 0.7282
## Epoch 8/10
## 79/79 [==============================] - 0s 3ms/step - loss: 0.3971 - accuracy: 0.8169 - val_loss: 0.5731 - val_accuracy: 0.7268
## Epoch 9/10
## 79/79 [==============================] - 0s 3ms/step - loss: 0.3849 - accuracy: 0.8271 - val_loss: 0.5836 - val_accuracy: 0.7268
## Epoch 10/10
## 79/79 [==============================] - 0s 3ms/step - loss: 0.3726 - accuracy: 0.8305 - val_loss: 0.5840 - val_accuracy: 0.7265
## <keras.src.callbacks.History object at 0x137a43ac0>

16.2.4 Conclusion

Even with this simple network, which trains quickly, we immediately beat the model performance (overall accuracy) of the shallow learners from earlier chapters.

16.3 Pipeline with 2 Hidden Layers

We use the same data as above.

We add a second hidden layer. In addition, we change the batch size.

16.3.1 Model Definition

model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(units=32, activation='relu'))  # Second hidden layer
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

16.3.2 Fit

model.fit(X_train, y_train, epochs=10, batch_size=8, validation_data=(X_test, y_test))
## Epoch 1/10
## 627/627 [==============================] - 3s 3ms/step - loss: 0.5477 - accuracy: 0.7181 - val_loss: 0.5593 - val_accuracy: 0.7214
## Epoch 2/10
## 627/627 [==============================] - 2s 3ms/step - loss: 0.4568 - accuracy: 0.7796 - val_loss: 0.5706 - val_accuracy: 0.7285
## Epoch 3/10
## 627/627 [==============================] - 2s 3ms/step - loss: 0.4140 - accuracy: 0.8075 - val_loss: 0.6027 - val_accuracy: 0.7248
## Epoch 4/10
## 627/627 [==============================] - 2s 3ms/step - loss: 0.3753 - accuracy: 0.8333 - val_loss: 0.6474 - val_accuracy: 0.7101
## Epoch 5/10
## 627/627 [==============================] - 2s 2ms/step - loss: 0.3359 - accuracy: 0.8537 - val_loss: 0.6784 - val_accuracy: 0.7129
## Epoch 6/10
## 627/627 [==============================] - 2s 3ms/step - loss: 0.2980 - accuracy: 0.8746 - val_loss: 0.6883 - val_accuracy: 0.7189
## Epoch 7/10
## 627/627 [==============================] - 2s 2ms/step - loss: 0.2609 - accuracy: 0.8962 - val_loss: 0.7289 - val_accuracy: 0.7152
## Epoch 8/10
## 627/627 [==============================] - 2s 3ms/step - loss: 0.2258 - accuracy: 0.9152 - val_loss: 0.8327 - val_accuracy: 0.6937
## Epoch 9/10
## 627/627 [==============================] - 2s 3ms/step - loss: 0.1926 - accuracy: 0.9275 - val_loss: 0.8951 - val_accuracy: 0.7143
## Epoch 10/10
## 627/627 [==============================] - 1s 2ms/step - loss: 0.1596 - accuracy: 0.9483 - val_loss: 1.0364 - val_accuracy: 0.7044
## <keras.src.callbacks.History object at 0x13828df40>

16.3.3 Model Performance

y_pred = (model.predict(X_test) > 0.5).astype("int32")
## 111/111 [==============================] - 0s 951us/step
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy}")
## Test Accuracy: 0.7044167610419027
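Overall accuracy hides the error structure; a confusion count splits it into the four outcome types. A minimal numpy-only sketch on toy 0/1 labels (standing in for `y_test` and `y_pred`, not the actual GermEval predictions):

```python
import numpy as np

# Toy 0/1 labels standing in for y_test and y_pred
y_true = np.array([0, 0, 1, 1, 1, 0])
y_hat = np.array([0, 1, 1, 0, 1, 0])

tp = int(np.sum((y_true == 1) & (y_hat == 1)))  # true positives
tn = int(np.sum((y_true == 0) & (y_hat == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_hat == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_hat == 0)))  # false negatives

accuracy = (tp + tn) / len(y_true)
print(tp, tn, fp, fn, accuracy)
```

For the real predictions, `sklearn.metrics.confusion_matrix(y_test, y_pred)` gives the same counts in one call.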

16.3.4 Conclusion

The accuracy of the second pipeline is slightly lower than that of the first. A second hidden layer, then, does not necessarily improve model performance. The same goes for the batch size, although small batch sizes should actually be a sensible choice for this rather small dataset …
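The effect of the batch size on training is easy to quantify: the number of weight updates per epoch is ceil(n_samples / batch_size). With the 5009 training tweets, this reproduces the step counts shown in the progress output (79 steps at batch size 64, 627 at batch size 8):

```python
import math

n_samples = 5009  # rows in the GermEval training set

def steps_per_epoch(batch_size: int) -> int:
    """Number of gradient updates Keras performs per epoch."""
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(64))  # 79
print(steps_per_epoch(8))   # 627
```

So batch size 8 gives roughly eight times as many (noisier) updates per epoch, which helps explain the faster overfitting seen in the validation loss above.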

16.4 Pipeline with (English) Word Embeddings

This pipeline follows this example from TensorFlow.

16.4.1 Data

import pandas as pd

train_file_path = "https://github.com/sebastiansauer/pradadata/raw/master/data-raw/germeval_train.csv"

d_train = pd.read_csv(train_file_path)

test_file_path = "https://github.com/sebastiansauer/pradadata/raw/master/data-raw/germeval_test.csv"

d_test = pd.read_csv(test_file_path)

The predictor data frames as arrays:

X_train = d_train["text"].values

X_test = d_test["text"].values

16.4.2 Modules

Incidentally, tensorflow-hub is NO longer strictly required; the package is now shipped as part of tensorflow.

import os
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

16.4.3 GPU

Check whether a GPU is available:

tf.config.list_physical_devices('GPU') 
## []
print("TF Version: ", tf.__version__)
## TF Version:  2.13.1
print("Eager mode: ", tf.executing_eagerly())
## Eager mode:  True
print("Hub version: ", hub.__version__)
## Hub version:  0.14.0
print("GPU is", "available" if tf.config.list_physical_devices("GPU") else "NOT AVAILABLE")
## GPU is NOT AVAILABLE

Well, unfortunately not.

16.4.4 Word Embeddings

embedding = "https://tfhub.dev/google/nnlm-en-dim50/2"
hub_layer = hub.KerasLayer(embedding, input_shape=[], 
                           dtype=tf.string, trainable=True)

16.4.5 Model

model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1))

model.summary()
## Model: "sequential_2"
## _________________________________________________________________
##  Layer (type)                Output Shape              Param #   
## =================================================================
##  keras_layer (KerasLayer)    (None, 50)                48190600  
##                                                                  
##  dense_5 (Dense)             (None, 16)                816       
##                                                                  
##  dense_6 (Dense)             (None, 1)                 17        
##                                                                  
## =================================================================
## Total params: 48191433 (183.84 MB)
## Trainable params: 48191433 (183.84 MB)
## Non-trainable params: 0 (0.00 Byte)
## _________________________________________________________________
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

16.4.6 Training

model.fit(X_train, y_train, 
epochs=10, 
batch_size=8, 
validation_data=(X_test, y_test),
verbose = 1)
## Epoch 1/10
## 627/627 [==============================] - 490s 781ms/step - loss: 0.6232 - accuracy: 0.6638 - val_loss: 0.6093 - val_accuracy: 0.6628
## Epoch 2/10
## 627/627 [==============================] - 477s 760ms/step - loss: 0.4541 - accuracy: 0.7686 - val_loss: 0.6536 - val_accuracy: 0.6761
## Epoch 3/10
## 627/627 [==============================] - 482s 769ms/step - loss: 0.2762 - accuracy: 0.8794 - val_loss: 0.8118 - val_accuracy: 0.6526
## Epoch 4/10
## 627/627 [==============================] - 521s 831ms/step - loss: 0.1671 - accuracy: 0.9367 - val_loss: 1.0416 - val_accuracy: 0.6467
## Epoch 5/10
## 627/627 [==============================] - 456s 727ms/step - loss: 0.0936 - accuracy: 0.9689 - val_loss: 1.2981 - val_accuracy: 0.6486
## Epoch 6/10
## 627/627 [==============================] - 455s 726ms/step - loss: 0.0478 - accuracy: 0.9872 - val_loss: 1.5631 - val_accuracy: 0.6297
## Epoch 7/10
## 627/627 [==============================] - 456s 727ms/step - loss: 0.0240 - accuracy: 0.9954 - val_loss: 1.8281 - val_accuracy: 0.6285
## Epoch 8/10
## 627/627 [==============================] - 455s 726ms/step - loss: 0.0101 - accuracy: 0.9982 - val_loss: 2.0636 - val_accuracy: 0.6334
## Epoch 9/10
## 627/627 [==============================] - 459s 732ms/step - loss: 0.0067 - accuracy: 0.9986 - val_loss: 2.2470 - val_accuracy: 0.6291
## Epoch 10/10
## 627/627 [==============================] - 455s 727ms/step - loss: 0.0046 - accuracy: 0.9992 - val_loss: 2.3786 - val_accuracy: 0.6277
## <keras.src.callbacks.History object at 0x148309730>

16.4.7 Model Performance

y_pred = (model.predict(X_test) > 0.5).astype("int32")
## 111/111 [==============================] - 17s 151ms/step
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy}")
## Test Accuracy: 0.6276896942242356
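One caveat about this prediction step: the last layer here is `Dense(1)` without a sigmoid, and the loss was compiled with `from_logits=True`, so `model.predict` returns raw logits rather than probabilities. Thresholding logits at `0.5` therefore does not correspond to a 50% probability cutoff; on the logit scale the equivalent cutoff is `0`, since sigmoid(0) = 0.5. A quick numeric check (plain Python, no TensorFlow needed):

```python
import math

def sigmoid(z: float) -> float:
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))  # 0.5 -> logit 0 is the 50%-probability cutoff
print(sigmoid(0.5))  # ~0.622 -> thresholding logits at 0.5 is stricter
```

Using `> 0` on the logits (or applying `tf.sigmoid` first and keeping `> 0.5`) would match the intended cutoff; with `> 0.5` on logits, the effective cutoff is about 62% predicted probability.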

16.4.8 Conclusion

Well, considering that these were English word vectors, not bad at all 🤣

16.5 Pipeline with German Word Embeddings

Click here for the Colab notebook.

It reaches an overall accuracy of .69. Not exactly stellar.