[Keras](https://keras.io/) is a [[base/Deep Learning/deep learning|deep learning]] library, developed at Google, that provides a high-level API on top of backends such as [[Tensorflow]] and [[PyTorch]].
You'll want to run `keras` on a [[graphics processing unit|GPU]]; your local environment will likely be too slow even on small toy models. The best way to access a GPU is to [[enable GPU or TPU in Google Colab]].
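Once a GPU runtime is active, you can confirm that TensorFlow (the default Keras backend) actually sees it. This is just a quick sanity check, not part of the model code:
```python
import tensorflow as tf

# List the GPUs visible to TensorFlow; an empty list means the runtime is CPU-only
print(tf.config.list_physical_devices('GPU'))
```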
In the example below, we'll build a simple neural network with one hidden layer that predicts 5 classes from inputs with 100 features. The number of epochs is the number of times the model is shown the full training data.
```python
import keras
from keras.models import Sequential
from keras.layers import Dense
# Create basic neural net
model = Sequential()
# Add densely connected hidden layer with ReLU activation and 100 input features
model.add(Dense(units=64, activation='relu', input_dim=100))
# Add output layer with softmax activation function for 5 classes
model.add(Dense(units=5, activation='softmax'))
# Create an optimizer and specify learning rate
opt = keras.optimizers.SGD(learning_rate=0.1)
# Print summary
model.summary()
```
To train the model, compile it with a loss function and the optimizer, then call `fit`:
```python
# Compile model
model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy']
)
# Train model (TODO: define X, y)
model.fit(X, y, epochs=5)
```
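As a quick smoke test, you can generate placeholder data with the right shapes (the random `X` and `y` below are hypothetical, not real training data): `X` needs 100 features per sample to match `input_dim=100`, and `y` must be one-hot encoded over the 5 classes because the loss is `categorical_crossentropy`.
```python
import numpy as np

# Hypothetical random data just to exercise the pipeline:
# 32 samples with 100 features, and integer labels in [0, 5) one-hot encoded
X = np.random.random((32, 100))
y = keras.utils.to_categorical(np.random.randint(5, size=(32,)), num_classes=5)
model.fit(X, y, epochs=5)
```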
You can also access `keras` from within `tensorflow`.
```python
import tensorflow as tf
model = tf.keras.Sequential()
...
```
> [!Tip]- Additional Resources
> - [Stanford Deep Learning for Computer Vision (CS231n)](https://cs231n.stanford.edu/)
## keras image
Efficient data streaming is essential for keeping the GPU busy during training. Kaggle makes both GPU and TPU accelerators available with limited free use per week. The Keras utility `image_dataset_from_directory` streams images directly from disk, which improves performance on the GPU, but it requires the images to be converted from TIFF to a supported format (e.g., JPEG) and arranged with one subfolder per label, like
```
train/
  1/
  0/
test/
  1/
  0/
```
I'll copy the files into a new directory tree matching the layout above and then read them with the Keras utility `image_dataset_from_directory` before training the model.
```python
# Assumed defined earlier: train_labels (DataFrame with 'id' and 'label' columns)
# and train_dir (directory containing the source .tif images)
import os

import keras
import pandas as pd
from PIL import Image
from tensorflow import data as tf_data
from tqdm import tqdm

# Balance training data (subsample for debugging)
n_train = train_labels['label'].value_counts().min()
n_train = round(n_train * 0.05)  # Limit to 5% for debugging
benign = train_labels[train_labels['label'] == 0].sample(n_train)
malignant = train_labels[train_labels['label'] == 1].sample(n_train)
df_train_all = pd.concat([benign, malignant], axis=0).reset_index(drop=True)

# Create one subfolder per label under train/ and test/
keras_train_dir = 'keras/train'
keras_test_dir = 'keras/test'
for fold in [keras_train_dir, keras_test_dir]:
    for subf in ["0", "1"]:
        os.makedirs(os.path.join(fold, subf), exist_ok=True)

# Copy files into the label subfolders, converting TIFF to JPEG
def copy_and_convert_images_to_jpg(src_dir, dest_dir, df_fids):
    # df_fids is expected to have exactly two columns: image id and label
    for _, (fid, label) in tqdm(
        df_fids.iterrows(), total=len(df_fids), desc="Converting images"
    ):
        src = os.path.join(src_dir, str(fid) + ".tif")
        dst = os.path.join(dest_dir, str(label), str(fid) + ".jpg")
        # Open the TIFF and re-save it as JPEG
        with Image.open(src) as img:
            img.save(dst, format="JPEG")

copy_and_convert_images_to_jpg(train_dir, keras_train_dir, df_train_all)

# Set up data streams: 80/20 train/validation split read directly from disk
image_size = (96, 96)
batch_size = 128
train_ds, val_ds = keras.utils.image_dataset_from_directory(
    keras_train_dir,
    validation_split=0.2,
    subset="both",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    shuffle=True
)
# Prefetching samples helps maximize GPU utilization
train_ds = train_ds.prefetch(tf_data.AUTOTUNE)
val_ds = val_ds.prefetch(tf_data.AUTOTUNE)

# Set up image stream for test data (no shuffle so predictions align with file order)
test_ds = keras.utils.image_dataset_from_directory(
    keras_test_dir,
    image_size=image_size,
    batch_size=batch_size,
    shuffle=False
)
```
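With the datasets in place, training just means passing them to `fit`. Below is a minimal sketch, not the original notebook's model: a small CNN sized for the 96×96 RGB patches, using a sigmoid output and binary cross-entropy since `image_dataset_from_directory` yields integer labels (0/1) taken from the folder names.
```python
# Illustrative model only; any architecture with a (96, 96, 3) input works here
model = keras.Sequential([
    keras.Input(shape=(96, 96, 3)),
    keras.layers.Rescaling(1.0 / 255),  # scale pixel values to [0, 1]
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),  # benign (0) vs malignant (1)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The streamed datasets plug straight into fit()
model.fit(train_ds, validation_data=val_ds, epochs=5)
```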