[Keras](https://keras.io/) is a [[base/Deep Learning/deep learning|deep learning]] library, developed at Google, that provides a high-level API on top of backends such as [[Tensorflow]] and [[PyTorch]].
You'll want to run `keras` on a [[graphics processing unit|GPU]]; your local environment will likely be too slow even on small toy models. The best way to access a GPU is to [[enable GPU or TPU in Google Colab]].
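Once a GPU runtime is active, you can confirm that TensorFlow (the default Keras backend) actually sees it. This is just a quick sanity check, not part of the model code:
```python
import tensorflow as tf

# List the GPUs visible to TensorFlow; an empty list means the runtime is CPU-only
print(tf.config.list_physical_devices('GPU'))
```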
In the example below, we'll build a simple neural network with one hidden layer that predicts 5 classes from inputs with 100 features. The number of epochs is the number of times the model is shown the full training data.
```python
import keras
from keras.models import Sequential
from keras.layers import Dense
# Create basic neural net
model = Sequential()
# Add densely connected hidden layer with ReLU activation and 100 input features
model.add(Dense(units=64, activation='relu', input_dim=100))
# Add output layer with softmax activation function for 5 classes
model.add(Dense(units=5, activation='softmax'))
# Create an optimizer and specify learning rate
opt = keras.optimizers.SGD(learning_rate=0.1)
# Print summary
model.summary()
```
To train the model, compile it with a loss function and the optimizer, then call `fit`:
```python
# Compile model
model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy']
)
# Train model (TODO: define X, y)
model.fit(X, y, epochs=5)
```
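As a quick smoke test, you can generate placeholder data with the right shapes (the random `X` and `y` below are hypothetical, not real training data): `X` needs 100 features per sample to match `input_dim=100`, and `y` must be one-hot encoded over the 5 classes because the loss is `categorical_crossentropy`.
```python
import numpy as np

# Hypothetical random data just to exercise the pipeline:
# 32 samples with 100 features, and integer labels in [0, 5) one-hot encoded
X = np.random.random((32, 100))
y = keras.utils.to_categorical(np.random.randint(5, size=(32,)), num_classes=5)
model.fit(X, y, epochs=5)
```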
You can also access `keras` from within `tensorflow`.
```python
import tensorflow as tf
model = tf.keras.Sequential()
...
```
> [!Tip]- Additional Resources
> - [Stanford Deep Learning for Computer Vision (CS231n)](https://cs231n.stanford.edu/)
## keras image
Efficient data streaming is essential for keeping the GPU busy during training. Kaggle makes both GPU and TPU accelerators available with limited free use per week. The Keras utility `image_dataset_from_directory` streams images directly from disk, which improves performance on the GPU, but it requires the images to be converted from TIFF to a supported format (e.g., JPEG) and arranged with one subfolder per label, like
```
train/
  1/
  0/
test/
  1/
  0/
```
I'll copy the files into a new directory tree matching the layout above and then read them with the Keras utility `image_dataset_from_directory` before training the model.
```python
# Assumed defined earlier: train_labels (DataFrame with 'id' and 'label' columns)
# and train_dir (directory containing the source .tif images)
import os

import keras
import pandas as pd
from PIL import Image
from tensorflow import data as tf_data
from tqdm import tqdm

# Balance training data (subsample for debugging)
n_train = train_labels['label'].value_counts().min()
n_train = round(n_train * 0.05)  # Limit to 5% for debugging
benign = train_labels[train_labels['label'] == 0].sample(n_train)
malignant = train_labels[train_labels['label'] == 1].sample(n_train)
df_train_all = pd.concat([benign, malignant], axis=0).reset_index(drop=True)

# Create one subfolder per label under train/ and test/
keras_train_dir = 'keras/train'
keras_test_dir = 'keras/test'
for fold in [keras_train_dir, keras_test_dir]:
    for subf in ["0", "1"]:
        os.makedirs(os.path.join(fold, subf), exist_ok=True)

# Copy files into the label subfolders, converting TIFF to JPEG
def copy_and_convert_images_to_jpg(src_dir, dest_dir, df_fids):
    # df_fids is expected to have exactly two columns: image id and label
    for _, (fid, label) in tqdm(
        df_fids.iterrows(), total=len(df_fids), desc="Converting images"
    ):
        src = os.path.join(src_dir, str(fid) + ".tif")
        dst = os.path.join(dest_dir, str(label), str(fid) + ".jpg")
        # Open the TIFF and re-save it as JPEG
        with Image.open(src) as img:
            img.save(dst, format="JPEG")

copy_and_convert_images_to_jpg(train_dir, keras_train_dir, df_train_all)

# Set up data streams: 80/20 train/validation split read directly from disk
image_size = (96, 96)
batch_size = 128
train_ds, val_ds = keras.utils.image_dataset_from_directory(
    keras_train_dir,
    validation_split=0.2,
    subset="both",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    shuffle=True
)
# Prefetching samples helps maximize GPU utilization
train_ds = train_ds.prefetch(tf_data.AUTOTUNE)
val_ds = val_ds.prefetch(tf_data.AUTOTUNE)

# Set up image stream for test data (no shuffle so predictions align with file order)
test_ds = keras.utils.image_dataset_from_directory(
    keras_test_dir,
    image_size=image_size,
    batch_size=batch_size,
    shuffle=False
)
```
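With the datasets in place, training just means passing them to `fit`. Below is a minimal sketch, not the original notebook's model: a small CNN sized for the 96×96 RGB patches, using a sigmoid output and binary cross-entropy since `image_dataset_from_directory` yields integer labels (0/1) taken from the folder names.
```python
# Illustrative model only; any architecture with a (96, 96, 3) input works here
model = keras.Sequential([
    keras.Input(shape=(96, 96, 3)),
    keras.layers.Rescaling(1.0 / 255),  # scale pixel values to [0, 1]
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),  # benign (0) vs malignant (1)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The streamed datasets plug straight into fit()
model.fit(train_ds, validation_data=val_ds, epochs=5)
```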